Methodology Series Module 5: Sampling Strategies.
Setia, Maninder Singh
2016-01-01
Once the research question and the research design have been finalised, it is important to select the appropriate sample for the study. The method by which the researcher selects the sample is the ' Sampling Method'. There are essentially two types of sampling methods: 1) probability sampling - based on chance events (such as random numbers, flipping a coin etc.); and 2) non-probability sampling - based on researcher's choice, population that accessible & available. Some of the non-probability sampling methods are: purposive sampling, convenience sampling, or quota sampling. Random sampling method (such as simple random sample or stratified random sample) is a form of probability sampling. It is important to understand the different sampling methods used in clinical studies and mention this method clearly in the manuscript. The researcher should not misrepresent the sampling method in the manuscript (such as using the term ' random sample' when the researcher has used convenience sample). The sampling method will depend on the research question. For instance, the researcher may want to understand an issue in greater detail for one particular population rather than worry about the ' generalizability' of these results. In such a scenario, the researcher may want to use ' purposive sampling' for the study.
Chao, Li-Wei; Szrek, Helena; Peltzer, Karl; Ramlagan, Shandir; Fleming, Peter; Leite, Rui; Magerman, Jesswill; Ngwenya, Godfrey B.; Pereira, Nuno Sousa; Behrman, Jere
2011-01-01
Finding an efficient method for sampling micro- and small-enterprises (MSEs) for research and statistical reporting purposes is a challenge in developing countries, where registries of MSEs are often nonexistent or outdated. This lack of a sampling frame creates an obstacle in finding a representative sample of MSEs. This study uses computer simulations to draw samples from a census of businesses and non-businesses in the Tshwane Municipality of South Africa, using three different sampling methods: the traditional probability sampling method, the compact segment sampling method, and the World Health Organization’s Expanded Programme on Immunization (EPI) sampling method. Three mechanisms by which the methods could differ are tested, the proximity selection of respondents, the at-home selection of respondents, and the use of inaccurate probability weights. The results highlight the importance of revisits and accurate probability weights, but the lesser effect of proximity selection on the samples’ statistical properties. PMID:22582004
Methodology Series Module 5: Sampling Strategies
Setia, Maninder Singh
2016-01-01
Once the research question and the research design have been finalised, it is important to select the appropriate sample for the study. The method by which the researcher selects the sample is the ‘ Sampling Method’. There are essentially two types of sampling methods: 1) probability sampling – based on chance events (such as random numbers, flipping a coin etc.); and 2) non-probability sampling – based on researcher's choice, population that accessible & available. Some of the non-probability sampling methods are: purposive sampling, convenience sampling, or quota sampling. Random sampling method (such as simple random sample or stratified random sample) is a form of probability sampling. It is important to understand the different sampling methods used in clinical studies and mention this method clearly in the manuscript. The researcher should not misrepresent the sampling method in the manuscript (such as using the term ‘ random sample’ when the researcher has used convenience sample). The sampling method will depend on the research question. For instance, the researcher may want to understand an issue in greater detail for one particular population rather than worry about the ‘ generalizability’ of these results. In such a scenario, the researcher may want to use ‘ purposive sampling’ for the study. PMID:27688438
Sampling Methods in Cardiovascular Nursing Research: An Overview.
Kandola, Damanpreet; Banner, Davina; O'Keefe-McCarthy, Sheila; Jassal, Debbie
2014-01-01
Cardiovascular nursing research covers a wide array of topics from health services to psychosocial patient experiences. The selection of specific participant samples is an important part of the research design and process. The sampling strategy employed is of utmost importance to ensure that a representative sample of participants is chosen. There are two main categories of sampling methods: probability and non-probability. Probability sampling is the random selection of elements from the population, where each element of the population has an equal and independent chance of being included in the sample. There are five main types of probability sampling including simple random sampling, systematic sampling, stratified sampling, cluster sampling, and multi-stage sampling. Non-probability sampling methods are those in which elements are chosen through non-random methods for inclusion into the research study and include convenience sampling, purposive sampling, and snowball sampling. Each approach offers distinct advantages and disadvantages and must be considered critically. In this research column, we provide an introduction to these key sampling techniques and draw on examples from the cardiovascular research. Understanding the differences in sampling techniques may aid nurses in effective appraisal of research literature and provide a reference pointfor nurses who engage in cardiovascular research.
Nonprobability and probability-based sampling strategies in sexual science.
Catania, Joseph A; Dolcini, M Margaret; Orellana, Roberto; Narayanan, Vasudah
2015-01-01
With few exceptions, much of sexual science builds upon data from opportunistic nonprobability samples of limited generalizability. Although probability-based studies are considered the gold standard in terms of generalizability, they are costly to apply to many of the hard-to-reach populations of interest to sexologists. The present article discusses recent conclusions by sampling experts that have relevance to sexual science that advocates for nonprobability methods. In this regard, we provide an overview of Internet sampling as a useful, cost-efficient, nonprobability sampling method of value to sex researchers conducting modeling work or clinical trials. We also argue that probability-based sampling methods may be more readily applied in sex research with hard-to-reach populations than is typically thought. In this context, we provide three case studies that utilize qualitative and quantitative techniques directed at reducing limitations in applying probability-based sampling to hard-to-reach populations: indigenous Peruvians, African American youth, and urban men who have sex with men (MSM). Recommendations are made with regard to presampling studies, adaptive and disproportionate sampling methods, and strategies that may be utilized in evaluating nonprobability and probability-based sampling methods.
The estimation of tree posterior probabilities using conditional clade probability distributions.
Larget, Bret
2013-07-01
In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample.
The Estimation of Tree Posterior Probabilities Using Conditional Clade Probability Distributions
Larget, Bret
2013-01-01
In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample. [Bayesian phylogenetics; conditional clade distributions; improved accuracy; posterior probabilities of trees.] PMID:23479066
Jeffrey H. Gove
2003-01-01
Many of the most popular sampling schemes used in forestry are probability proportional to size methods. These methods are also referred to as size biased because sampling is actually from a weighted form of the underlying population distribution. Length- and area-biased sampling are special cases of size-biased sampling where the probability weighting comes from a...
Haynes, Trevor B.; Rosenberger, Amanda E.; Lindberg, Mark S.; Whitman, Matthew; Schmutz, Joel A.
2013-01-01
Studies examining species occurrence often fail to account for false absences in field sampling. We investigate detection probabilities of five gear types for six fish species in a sample of lakes on the North Slope, Alaska. We used an occupancy modeling approach to provide estimates of detection probabilities for each method. Variation in gear- and species-specific detection probability was considerable. For example, detection probabilities for the fyke net ranged from 0.82 (SE = 0.05) for least cisco (Coregonus sardinella) to 0.04 (SE = 0.01) for slimy sculpin (Cottus cognatus). Detection probabilities were also affected by site-specific variables such as depth of the lake, year, day of sampling, and lake connection to a stream. With the exception of the dip net and shore minnow traps, each gear type provided the highest detection probability of at least one species. Results suggest that a multimethod approach may be most effective when attempting to sample the entire fish community of Arctic lakes. Detection probability estimates will be useful for designing optimal fish sampling and monitoring protocols in Arctic lakes.
Multi-scale occupancy estimation and modelling using multiple detection methods
Nichols, James D.; Bailey, Larissa L.; O'Connell, Allan F.; Talancy, Neil W.; Grant, Evan H. Campbell; Gilbert, Andrew T.; Annand, Elizabeth M.; Husband, Thomas P.; Hines, James E.
2008-01-01
Occupancy estimation and modelling based on detection–nondetection data provide an effective way of exploring change in a species’ distribution across time and space in cases where the species is not always detected with certainty. Today, many monitoring programmes target multiple species, or life stages within a species, requiring the use of multiple detection methods. When multiple methods or devices are used at the same sample sites, animals can be detected by more than one method.We develop occupancy models for multiple detection methods that permit simultaneous use of data from all methods for inference about method-specific detection probabilities. Moreover, the approach permits estimation of occupancy at two spatial scales: the larger scale corresponds to species’ use of a sample unit, whereas the smaller scale corresponds to presence of the species at the local sample station or site.We apply the models to data collected on two different vertebrate species: striped skunks Mephitis mephitis and red salamanders Pseudotriton ruber. For striped skunks, large-scale occupancy estimates were consistent between two sampling seasons. Small-scale occupancy probabilities were slightly lower in the late winter/spring when skunks tend to conserve energy, and movements are limited to males in search of females for breeding. There was strong evidence of method-specific detection probabilities for skunks. As anticipated, large- and small-scale occupancy areas completely overlapped for red salamanders. The analyses provided weak evidence of method-specific detection probabilities for this species.Synthesis and applications. Increasingly, many studies are utilizing multiple detection methods at sampling locations. The modelling approach presented here makes efficient use of detections from multiple methods to estimate occupancy probabilities at two spatial scales and to compare detection probabilities associated with different detection methods. The models can be viewed as another variation of Pollock's robust design and may be applicable to a wide variety of scenarios where species occur in an area but are not always near the sampled locations. The estimation approach is likely to be especially useful in multispecies conservation programmes by providing efficient estimates using multiple detection devices and by providing device-specific detection probability estimates for use in survey design.
Smart, Adam S; Tingley, Reid; Weeks, Andrew R; van Rooyen, Anthony R; McCarthy, Michael A
2015-10-01
Effective management of alien species requires detecting populations in the early stages of invasion. Environmental DNA (eDNA) sampling can detect aquatic species at relatively low densities, but few studies have directly compared detection probabilities of eDNA sampling with those of traditional sampling methods. We compare the ability of a traditional sampling technique (bottle trapping) and eDNA to detect a recently established invader, the smooth newt Lissotriton vulgaris vulgaris, at seven field sites in Melbourne, Australia. Over a four-month period, per-trap detection probabilities ranged from 0.01 to 0.26 among sites where L. v. vulgaris was detected, whereas per-sample eDNA estimates were much higher (0.29-1.0). Detection probabilities of both methods varied temporally (across days and months), but temporal variation appeared to be uncorrelated between methods. Only estimates of spatial variation were strongly correlated across the two sampling techniques. Environmental variables (water depth, rainfall, ambient temperature) were not clearly correlated with detection probabilities estimated via trapping, whereas eDNA detection probabilities were negatively correlated with water depth, possibly reflecting higher eDNA concentrations at lower water levels. Our findings demonstrate that eDNA sampling can be an order of magnitude more sensitive than traditional methods, and illustrate that traditional- and eDNA-based surveys can provide independent information on species distributions when occupancy surveys are conducted over short timescales.
Sampling considerations for disease surveillance in wildlife populations
Nusser, S.M.; Clark, W.R.; Otis, D.L.; Huang, L.
2008-01-01
Disease surveillance in wildlife populations involves detecting the presence of a disease, characterizing its prevalence and spread, and subsequent monitoring. A probability sample of animals selected from the population and corresponding estimators of disease prevalence and detection provide estimates with quantifiable statistical properties, but this approach is rarely used. Although wildlife scientists often assume probability sampling and random disease distributions to calculate sample sizes, convenience samples (i.e., samples of readily available animals) are typically used, and disease distributions are rarely random. We demonstrate how landscape-based simulation can be used to explore properties of estimators from convenience samples in relation to probability samples. We used simulation methods to model what is known about the habitat preferences of the wildlife population, the disease distribution, and the potential biases of the convenience-sample approach. Using chronic wasting disease in free-ranging deer (Odocoileus virginianus) as a simple illustration, we show that using probability sample designs with appropriate estimators provides unbiased surveillance parameter estimates but that the selection bias and coverage errors associated with convenience samples can lead to biased and misleading results. We also suggest practical alternatives to convenience samples that mix probability and convenience sampling. For example, a sample of land areas can be selected using a probability design that oversamples areas with larger animal populations, followed by harvesting of individual animals within sampled areas using a convenience sampling method.
Systematic sampling for suspended sediment
Robert B. Thomas
1991-01-01
Abstract - Because of high costs or complex logistics, scientific populations cannot be measured entirely and must be sampled. Accepted scientific practice holds that sample selection be based on statistical principles to assure objectivity when estimating totals and variances. Probability sampling--obtaining samples with known probabilities--is the only method that...
Iachan, Ronaldo; H. Johnson, Christopher; L. Harding, Richard; Kyle, Tonja; Saavedra, Pedro; L. Frazier, Emma; Beer, Linda; L. Mattson, Christine; Skarbinski, Jacek
2016-01-01
Background: Health surveys of the general US population are inadequate for monitoring human immunodeficiency virus (HIV) infection because the relatively low prevalence of the disease (<0.5%) leads to small subpopulation sample sizes. Objective: To collect a nationally and locally representative probability sample of HIV-infected adults receiving medical care to monitor clinical and behavioral outcomes, supplementing the data in the National HIV Surveillance System. This paper describes the sample design and weighting methods for the Medical Monitoring Project (MMP) and provides estimates of the size and characteristics of this population. Methods: To develop a method for obtaining valid, representative estimates of the in-care population, we implemented a cross-sectional, three-stage design that sampled 23 jurisdictions, then 691 facilities, then 9,344 HIV patients receiving medical care, using probability-proportional-to-size methods. The data weighting process followed standard methods, accounting for the probabilities of selection at each stage and adjusting for nonresponse and multiplicity. Nonresponse adjustments accounted for differing response at both facility and patient levels. Multiplicity adjustments accounted for visits to more than one HIV care facility. Results: MMP used a multistage stratified probability sampling design that was approximately self-weighting in each of the 23 project areas and nationally. The probability sample represents the estimated 421,186 HIV-infected adults receiving medical care during January through April 2009. Methods were efficient (i.e., induced small, unequal weighting effects and small standard errors for a range of weighted estimates). Conclusion: The information collected through MMP allows monitoring trends in clinical and behavioral outcomes and informs resource allocation for treatment and prevention activities. PMID:27651851
Probability Issues in without Replacement Sampling
ERIC Educational Resources Information Center
Joarder, A. H.; Al-Sabah, W. S.
2007-01-01
Sampling without replacement is an important aspect in teaching conditional probabilities in elementary statistics courses. Different methods proposed in different texts for calculating probabilities of events in this context are reviewed and their relative merits and limitations in applications are pinpointed. An alternative representation of…
Viana, Duarte S; Santamaría, Luis; Figuerola, Jordi
2016-02-01
Propagule retention time is a key factor in determining propagule dispersal distance and the shape of "seed shadows". Propagules dispersed by animal vectors are either ingested and retained in the gut until defecation or attached externally to the body until detachment. Retention time is a continuous variable, but it is commonly measured at discrete time points, according to pre-established sampling time-intervals. Although parametric continuous distributions have been widely fitted to these interval-censored data, the performance of different fitting methods has not been evaluated. To investigate the performance of five different fitting methods, we fitted parametric probability distributions to typical discretized retention-time data with known distribution using as data-points either the lower, mid or upper bounds of sampling intervals, as well as the cumulative distribution of observed values (using either maximum likelihood or non-linear least squares for parameter estimation); then compared the estimated and original distributions to assess the accuracy of each method. We also assessed the robustness of these methods to variations in the sampling procedure (sample size and length of sampling time-intervals). Fittings to the cumulative distribution performed better for all types of parametric distributions (lognormal, gamma and Weibull distributions) and were more robust to variations in sample size and sampling time-intervals. These estimated distributions had negligible deviations of up to 0.045 in cumulative probability of retention times (according to the Kolmogorov-Smirnov statistic) in relation to original distributions from which propagule retention time was simulated, supporting the overall accuracy of this fitting method. In contrast, fitting the sampling-interval bounds resulted in greater deviations that ranged from 0.058 to 0.273 in cumulative probability of retention times, which may introduce considerable biases in parameter estimates. We recommend the use of cumulative probability to fit parametric probability distributions to propagule retention time, specifically using maximum likelihood for parameter estimation. Furthermore, the experimental design for an optimal characterization of unimodal propagule retention time should contemplate at least 500 recovered propagules and sampling time-intervals not larger than the time peak of propagule retrieval, except in the tail of the distribution where broader sampling time-intervals may also produce accurate fits.
Chen, Peichen; Liu, Shih-Chia; Liu, Hung-I; Chen, Tse-Wei
2011-01-01
For quarantine sampling, it is of fundamental importance to determine the probability of finding an infestation when a specified number of units are inspected. In general, current sampling procedures assume 100% probability (perfect) of detecting a pest if it is present within a unit. Ideally, a nematode extraction method should remove all stages of all species with 100% efficiency regardless of season, temperature, or other environmental conditions; in practice however, no method approaches these criteria. In this study we determined the probability of detecting nematode infestations for quarantine sampling with imperfect extraction efficacy. Also, the required sample and the risk involved in detecting nematode infestations with imperfect extraction efficacy are presented. Moreover, we developed a computer program to calculate confidence levels for different scenarios with varying proportions of infestation and efficacy of detection. In addition, a case study, presenting the extraction efficacy of the modified Baermann's Funnel method on Aphelenchoides besseyi, is used to exemplify the use of our program to calculate the probability of detecting nematode infestations in quarantine sampling with imperfect extraction efficacy. The result has important implications for quarantine programs and highlights the need for a very large number of samples if perfect extraction efficacy is not achieved in such programs. We believe that the results of the study will be useful for the determination of realistic goals in the implementation of quarantine sampling. PMID:22791911
Nonlinear Demodulation and Channel Coding in EBPSK Scheme
Chen, Xianqing; Wu, Lenan
2012-01-01
The extended binary phase shift keying (EBPSK) is an efficient modulation technique, and a special impacting filter (SIF) is used in its demodulator to improve the bit error rate (BER) performance. However, the conventional threshold decision cannot achieve the optimum performance, and the SIF brings more difficulty in obtaining the posterior probability for LDPC decoding. In this paper, we concentrate not only on reducing the BER of demodulation, but also on providing accurate posterior probability estimates (PPEs). A new approach for the nonlinear demodulation based on the support vector machine (SVM) classifier is introduced. The SVM method which selects only a few sampling points from the filter output was used for getting PPEs. The simulation results show that the accurate posterior probability can be obtained with this method and the BER performance can be improved significantly by applying LDPC codes. Moreover, we analyzed the effect of getting the posterior probability with different methods and different sampling rates. We show that there are more advantages of the SVM method under bad condition and it is less sensitive to the sampling rate than other methods. Thus, SVM is an effective method for EBPSK demodulation and getting posterior probability for LDPC decoding. PMID:23213281
Nonlinear demodulation and channel coding in EBPSK scheme.
Chen, Xianqing; Wu, Lenan
2012-01-01
The extended binary phase shift keying (EBPSK) is an efficient modulation technique, and a special impacting filter (SIF) is used in its demodulator to improve the bit error rate (BER) performance. However, the conventional threshold decision cannot achieve the optimum performance, and the SIF brings more difficulty in obtaining the posterior probability for LDPC decoding. In this paper, we concentrate not only on reducing the BER of demodulation, but also on providing accurate posterior probability estimates (PPEs). A new approach for the nonlinear demodulation based on the support vector machine (SVM) classifier is introduced. The SVM method which selects only a few sampling points from the filter output was used for getting PPEs. The simulation results show that the accurate posterior probability can be obtained with this method and the BER performance can be improved significantly by applying LDPC codes. Moreover, we analyzed the effect of getting the posterior probability with different methods and different sampling rates. We show that there are more advantages of the SVM method under bad condition and it is less sensitive to the sampling rate than other methods. Thus, SVM is an effective method for EBPSK demodulation and getting posterior probability for LDPC decoding.
Multinomial mixture model with heterogeneous classification probabilities
Holland, M.D.; Gray, B.R.
2011-01-01
Royle and Link (Ecology 86(9):2505-2512, 2005) proposed an analytical method that allowed estimation of multinomial distribution parameters and classification probabilities from categorical data measured with error. While useful, we demonstrate algebraically and by simulations that this method yields biased multinomial parameter estimates when the probabilities of correct category classifications vary among sampling units. We address this shortcoming by treating these probabilities as logit-normal random variables within a Bayesian framework. We use Markov chain Monte Carlo to compute Bayes estimates from a simulated sample from the posterior distribution. Based on simulations, this elaborated Royle-Link model yields nearly unbiased estimates of multinomial and correct classification probability estimates when classification probabilities are allowed to vary according to the normal distribution on the logit scale or according to the Beta distribution. The method is illustrated using categorical submersed aquatic vegetation data. ?? 2010 Springer Science+Business Media, LLC.
Farnsworth, G.L.; Nichols, J.D.; Sauer, J.R.; Fancy, S.G.; Pollock, K.H.; Shriner, S.A.; Simons, T.R.; Ralph, C. John; Rich, Terrell D.
2005-01-01
Point counts are a standard sampling procedure for many bird species, but lingering concerns still exist about the quality of information produced from the method. It is well known that variation in observer ability and environmental conditions can influence the detection probability of birds in point counts, but many biologists have been reluctant to abandon point counts in favor of more intensive approaches to counting. However, over the past few years a variety of statistical and methodological developments have begun to provide practical ways of overcoming some of the problems with point counts. We describe some of these approaches, and show how they can be integrated into standard point count protocols to greatly enhance the quality of the information. Several tools now exist for estimation of detection probability of birds during counts, including distance sampling, double observer methods, time-depletion (removal) methods, and hybrid methods that combine these approaches. Many counts are conducted in habitats that make auditory detection of birds much more likely than visual detection. As a framework for understanding detection probability during such counts, we propose separating two components of the probability a bird is detected during a count into (1) the probability a bird vocalizes during the count and (2) the probability this vocalization is detected by an observer. In addition, we propose that some measure of the area sampled during a count is necessary for valid inferences about bird populations. This can be done by employing fixed-radius counts or more sophisticated distance-sampling models. We recommend any studies employing point counts be designed to estimate detection probability and to include a measure of the area sampled.
Schillaci, Michael A; Schillaci, Mario E
2009-02-01
The use of small sample sizes in human and primate evolutionary research is commonplace. Estimating how well small samples represent the underlying population, however, is not commonplace. Because the accuracy of determinations of taxonomy, phylogeny, and evolutionary process are dependant upon how well the study sample represents the population of interest, characterizing the uncertainty, or potential error, associated with analyses of small sample sizes is essential. We present a method for estimating the probability that the sample mean is within a desired fraction of the standard deviation of the true mean using small (n<10) or very small (n < or = 5) sample sizes. This method can be used by researchers to determine post hoc the probability that their sample is a meaningful approximation of the population parameter. We tested the method using a large craniometric data set commonly used by researchers in the field. Given our results, we suggest that sample estimates of the population mean can be reasonable and meaningful even when based on small, and perhaps even very small, sample sizes.
Cao, Youfang; Liang, Jie
2013-01-01
Critical events that occur rarely in biological processes are of great importance, but are challenging to study using Monte Carlo simulation. By introducing biases to reaction selection and reaction rates, weighted stochastic simulation algorithms based on importance sampling allow rare events to be sampled more effectively. However, existing methods do not address the important issue of barrier crossing, which often arises from multistable networks and systems with complex probability landscape. In addition, the proliferation of parameters and the associated computing cost pose significant problems. Here we introduce a general theoretical framework for obtaining optimized biases in sampling individual reactions for estimating probabilities of rare events. We further describe a practical algorithm called adaptively biased sequential importance sampling (ABSIS) method for efficient probability estimation. By adopting a look-ahead strategy and by enumerating short paths from the current state, we estimate the reaction-specific and state-specific forward and backward moving probabilities of the system, which are then used to bias reaction selections. The ABSIS algorithm can automatically detect barrier-crossing regions, and can adjust bias adaptively at different steps of the sampling process, with bias determined by the outcome of exhaustively generated short paths. In addition, there are only two bias parameters to be determined, regardless of the number of the reactions and the complexity of the network. We have applied the ABSIS method to four biochemical networks: the birth-death process, the reversible isomerization, the bistable Schlögl model, and the enzymatic futile cycle model. For comparison, we have also applied the finite buffer discrete chemical master equation (dCME) method recently developed to obtain exact numerical solutions of the underlying discrete chemical master equations of these problems. This allows us to assess sampling results objectively by comparing simulation results with true answers. Overall, ABSIS can accurately and efficiently estimate rare event probabilities for all examples, often with smaller variance than other importance sampling algorithms. The ABSIS method is general and can be applied to study rare events of other stochastic networks with complex probability landscape. PMID:23862966
NASA Astrophysics Data System (ADS)
Cao, Youfang; Liang, Jie
2013-07-01
Critical events that occur rarely in biological processes are of great importance, but are challenging to study using Monte Carlo simulation. By introducing biases to reaction selection and reaction rates, weighted stochastic simulation algorithms based on importance sampling allow rare events to be sampled more effectively. However, existing methods do not address the important issue of barrier crossing, which often arises from multistable networks and systems with complex probability landscape. In addition, the proliferation of parameters and the associated computing cost pose significant problems. Here we introduce a general theoretical framework for obtaining optimized biases in sampling individual reactions for estimating probabilities of rare events. We further describe a practical algorithm called adaptively biased sequential importance sampling (ABSIS) method for efficient probability estimation. By adopting a look-ahead strategy and by enumerating short paths from the current state, we estimate the reaction-specific and state-specific forward and backward moving probabilities of the system, which are then used to bias reaction selections. The ABSIS algorithm can automatically detect barrier-crossing regions, and can adjust bias adaptively at different steps of the sampling process, with bias determined by the outcome of exhaustively generated short paths. In addition, there are only two bias parameters to be determined, regardless of the number of the reactions and the complexity of the network. We have applied the ABSIS method to four biochemical networks: the birth-death process, the reversible isomerization, the bistable Schlögl model, and the enzymatic futile cycle model. For comparison, we have also applied the finite buffer discrete chemical master equation (dCME) method recently developed to obtain exact numerical solutions of the underlying discrete chemical master equations of these problems. This allows us to assess sampling results objectively by comparing simulation results with true answers. Overall, ABSIS can accurately and efficiently estimate rare event probabilities for all examples, often with smaller variance than other importance sampling algorithms. The ABSIS method is general and can be applied to study rare events of other stochastic networks with complex probability landscape.
Cao, Youfang; Liang, Jie
2013-07-14
Critical events that occur rarely in biological processes are of great importance, but are challenging to study using Monte Carlo simulation. By introducing biases to reaction selection and reaction rates, weighted stochastic simulation algorithms based on importance sampling allow rare events to be sampled more effectively. However, existing methods do not address the important issue of barrier crossing, which often arises from multistable networks and systems with complex probability landscape. In addition, the proliferation of parameters and the associated computing cost pose significant problems. Here we introduce a general theoretical framework for obtaining optimized biases in sampling individual reactions for estimating probabilities of rare events. We further describe a practical algorithm called adaptively biased sequential importance sampling (ABSIS) method for efficient probability estimation. By adopting a look-ahead strategy and by enumerating short paths from the current state, we estimate the reaction-specific and state-specific forward and backward moving probabilities of the system, which are then used to bias reaction selections. The ABSIS algorithm can automatically detect barrier-crossing regions, and can adjust bias adaptively at different steps of the sampling process, with bias determined by the outcome of exhaustively generated short paths. In addition, there are only two bias parameters to be determined, regardless of the number of the reactions and the complexity of the network. We have applied the ABSIS method to four biochemical networks: the birth-death process, the reversible isomerization, the bistable Schlögl model, and the enzymatic futile cycle model. For comparison, we have also applied the finite buffer discrete chemical master equation (dCME) method recently developed to obtain exact numerical solutions of the underlying discrete chemical master equations of these problems. This allows us to assess sampling results objectively by comparing simulation results with true answers. Overall, ABSIS can accurately and efficiently estimate rare event probabilities for all examples, often with smaller variance than other importance sampling algorithms. The ABSIS method is general and can be applied to study rare events of other stochastic networks with complex probability landscape.
Kang, Leni; Zhang, Shaokai; Zhao, Fanghui; Qiao, Youlin
2014-03-01
To evaluate and adjust the verification bias existed in the screening or diagnostic tests. Inverse-probability weighting method was used to adjust the sensitivity and specificity of the diagnostic tests, with an example of cervical cancer screening used to introduce the Compare Tests package in R software which could be implemented. Sensitivity and specificity calculated from the traditional method and maximum likelihood estimation method were compared to the results from Inverse-probability weighting method in the random-sampled example. The true sensitivity and specificity of the HPV self-sampling test were 83.53% (95%CI:74.23-89.93)and 85.86% (95%CI: 84.23-87.36). In the analysis of data with randomly missing verification by gold standard, the sensitivity and specificity calculated by traditional method were 90.48% (95%CI:80.74-95.56)and 71.96% (95%CI:68.71-75.00), respectively. The adjusted sensitivity and specificity under the use of Inverse-probability weighting method were 82.25% (95% CI:63.11-92.62) and 85.80% (95% CI: 85.09-86.47), respectively, whereas they were 80.13% (95%CI:66.81-93.46)and 85.80% (95%CI: 84.20-87.41) under the maximum likelihood estimation method. The inverse-probability weighting method could effectively adjust the sensitivity and specificity of a diagnostic test when verification bias existed, especially when complex sampling appeared.
Red-shouldered hawk occupancy surveys in central Minnesota, USA
Henneman, C.; McLeod, M.A.; Andersen, D.E.
2007-01-01
Forest-dwelling raptors are often difficult to detect because many species occur at low density or are secretive. Broadcasting conspecific vocalizations can increase the probability of detecting forest-dwelling raptors and has been shown to be an effective method for locating raptors and assessing their relative abundance. Recent advances in statistical techniques based on presence-absence data use probabilistic arguments to derive probability of detection when it is <1 and to provide a model and likelihood-based method for estimating proportion of sites occupied. We used these maximum-likelihood models with data from red-shouldered hawk (Buteo lineatus) call-broadcast surveys conducted in central Minnesota, USA, in 1994-1995 and 2004-2005. Our objectives were to obtain estimates of occupancy and detection probability 1) over multiple sampling seasons (yr), 2) incorporating within-season time-specific detection probabilities, 3) with call type and breeding stage included as covariates in models of probability of detection, and 4) with different sampling strategies. We visited individual survey locations 2-9 times per year, and estimates of both probability of detection (range = 0.28-0.54) and site occupancy (range = 0.81-0.97) varied among years. Detection probability was affected by inclusion of a within-season time-specific covariate, call type, and breeding stage. In 2004 and 2005 we used survey results to assess the effect that number of sample locations, double sampling, and discontinued sampling had on parameter estimates. We found that estimates of probability of detection and proportion of sites occupied were similar across different sampling strategies, and we suggest ways to reduce sampling effort in a monitoring program.
Public Attitudes toward Stuttering in Turkey: Probability versus Convenience Sampling
ERIC Educational Resources Information Center
Ozdemir, R. Sertan; St. Louis, Kenneth O.; Topbas, Seyhun
2011-01-01
Purpose: A Turkish translation of the "Public Opinion Survey of Human Attributes-Stuttering" ("POSHA-S") was used to compare probability versus convenience sampling to measure public attitudes toward stuttering. Method: A convenience sample of adults in Eskisehir, Turkey was compared with two replicates of a school-based,…
Roh, Min K; Gillespie, Dan T; Petzold, Linda R
2010-11-07
The weighted stochastic simulation algorithm (wSSA) was developed by Kuwahara and Mura [J. Chem. Phys. 129, 165101 (2008)] to efficiently estimate the probabilities of rare events in discrete stochastic systems. The wSSA uses importance sampling to enhance the statistical accuracy in the estimation of the probability of the rare event. The original algorithm biases the reaction selection step with a fixed importance sampling parameter. In this paper, we introduce a novel method where the biasing parameter is state-dependent. The new method features improved accuracy, efficiency, and robustness.
O'Connell, Allan F.; Talancy, Neil W.; Bailey, Larissa L.; Sauer, John R.; Cook, Robert; Gilbert, Andrew T.
2006-01-01
Large-scale, multispecies monitoring programs are widely used to assess changes in wildlife populations but they often assume constant detectability when documenting species occurrence. This assumption is rarely met in practice because animal populations vary across time and space. As a result, detectability of a species can be influenced by a number of physical, biological, or anthropogenic factors (e.g., weather, seasonality, topography, biological rhythms, sampling methods). To evaluate some of these influences, we estimated site occupancy rates using species-specific detection probabilities for meso- and large terrestrial mammal species on Cape Cod, Massachusetts, USA. We used model selection to assess the influence of different sampling methods and major environmental factors on our ability to detect individual species. Remote cameras detected the most species (9), followed by cubby boxes (7) and hair traps (4) over a 13-month period. Estimated site occupancy rates were similar among sampling methods for most species when detection probabilities exceeded 0.15, but we question estimates obtained from methods with detection probabilities between 0.05 and 0.15, and we consider methods with lower probabilities unacceptable for occupancy estimation and inference. Estimated detection probabilities can be used to accommodate variation in sampling methods, which allows for comparison of monitoring programs using different protocols. Vegetation and seasonality produced species-specific differences in detectability and occupancy, but differences were not consistent within or among species, which suggests that our results should be considered in the context of local habitat features and life history traits for the target species. We believe that site occupancy is a useful state variable and suggest that monitoring programs for mammals using occupancy data consider detectability prior to making inferences about species distributions or population change.
Probability of coincidental similarity among the orbits of small bodies - I. Pairing
NASA Astrophysics Data System (ADS)
Jopek, Tadeusz Jan; Bronikowska, Małgorzata
2017-09-01
Probability of coincidental clustering among orbits of comets, asteroids and meteoroids depends on many factors like: the size of the orbital sample searched for clusters or the size of the identified group, it is different for groups of 2,3,4,… members. Probability of coincidental clustering is assessed by the numerical simulation, therefore, it depends also on the method used for the synthetic orbits generation. We have tested the impact of some of these factors. For a given size of the orbital sample we have assessed probability of random pairing among several orbital populations of different sizes. We have found how these probabilities vary with the size of the orbital samples. Finally, keeping fixed size of the orbital sample we have shown that the probability of random pairing can be significantly different for the orbital samples obtained by different observation techniques. Also for the user convenience we have obtained several formulae which, for given size of the orbital sample can be used to calculate the similarity threshold corresponding to the small value of the probability of coincidental similarity among two orbits.
A re-evaluation of a case-control model with contaminated controls for resource selection studies
Christopher T. Rota; Joshua J. Millspaugh; Dylan C. Kesler; Chad P. Lehman; Mark A. Rumble; Catherine M. B. Jachowski
2013-01-01
A common sampling design in resource selection studies involves measuring resource attributes at sample units used by an animal and at sample units considered available for use. Few models can estimate the absolute probability of using a sample unit from such data, but such approaches are generally preferred over statistical methods that estimate a relative probability...
Lognormal Approximations of Fault Tree Uncertainty Distributions.
El-Shanawany, Ashraf Ben; Ardron, Keith H; Walker, Simon P
2018-01-26
Fault trees are used in reliability modeling to create logical models of fault combinations that can lead to undesirable events. The output of a fault tree analysis (the top event probability) is expressed in terms of the failure probabilities of basic events that are input to the model. Typically, the basic event probabilities are not known exactly, but are modeled as probability distributions: therefore, the top event probability is also represented as an uncertainty distribution. Monte Carlo methods are generally used for evaluating the uncertainty distribution, but such calculations are computationally intensive and do not readily reveal the dominant contributors to the uncertainty. In this article, a closed-form approximation for the fault tree top event uncertainty distribution is developed, which is applicable when the uncertainties in the basic events of the model are lognormally distributed. The results of the approximate method are compared with results from two sampling-based methods: namely, the Monte Carlo method and the Wilks method based on order statistics. It is shown that the closed-form expression can provide a reasonable approximation to results obtained by Monte Carlo sampling, without incurring the computational expense. The Wilks method is found to be a useful means of providing an upper bound for the percentiles of the uncertainty distribution while being computationally inexpensive compared with full Monte Carlo sampling. The lognormal approximation method and Wilks's method appear attractive, practical alternatives for the evaluation of uncertainty in the output of fault trees and similar multilinear models. © 2018 Society for Risk Analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Geist, William H.
2017-09-15
The objectives for this presentation are to describe the method that the IAEA uses to determine a sampling plan for nuclear material measurements; describe the terms detection probability and significant quantity; list the three nuclear materials measurement types; describe the sampling method applied to an item facility; and describe multiple method sampling.
Modulation Based on Probability Density Functions
NASA Technical Reports Server (NTRS)
Williams, Glenn L.
2009-01-01
A proposed method of modulating a sinusoidal carrier signal to convey digital information involves the use of histograms representing probability density functions (PDFs) that characterize samples of the signal waveform. The method is based partly on the observation that when a waveform is sampled (whether by analog or digital means) over a time interval at least as long as one half cycle of the waveform, the samples can be sorted by frequency of occurrence, thereby constructing a histogram representing a PDF of the waveform during that time interval.
A New Method for Generating Probability Tables in the Unresolved Resonance Region
Holcomb, Andrew M.; Leal, Luiz C.; Rahnema, Farzad; ...
2017-04-18
One new method for constructing probability tables in the unresolved resonance region (URR) has been developed. This new methodology is an extensive modification of the single-level Breit-Wigner (SLBW) pseudo-resonance pair sequence method commonly used to generate probability tables in the URR. The new method uses a Monte Carlo process to generate many pseudo-resonance sequences by first sampling the average resonance parameter data in the URR and then converting the sampled resonance parameters to the more robust R-matrix limited (RML) format. Furthermore, for each sampled set of pseudo-resonance sequences, the temperature-dependent cross sections are reconstructed on a small grid around themore » energy of reference using the Reich-Moore formalism and the Leal-Hwang Doppler broadening methodology. We then use the effective cross sections calculated at the energies of reference to construct probability tables in the URR. The RML cross-section reconstruction algorithm has been rigorously tested for a variety of isotopes, including 16O, 19F, 35Cl, 56Fe, 63Cu, and 65Cu. The new URR method also produced normalized cross-section factor probability tables for 238U that were found to be in agreement with current standards. The modified 238U probability tables were shown to produce results in excellent agreement with several standard benchmarks, including the IEU-MET-FAST-007 (BIG TEN), IEU-MET-FAST-003, and IEU-COMP-FAST-004 benchmarks.« less
Assessing performance and validating finite element simulations using probabilistic knowledge
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dolin, Ronald M.; Rodriguez, E. A.
Two probabilistic approaches for assessing performance are presented. The first approach assesses probability of failure by simultaneously modeling all likely events. The probability each event causes failure along with the event's likelihood of occurrence contribute to the overall probability of failure. The second assessment method is based on stochastic sampling using an influence diagram. Latin-hypercube sampling is used to stochastically assess events. The overall probability of failure is taken as the maximum probability of failure of all the events. The Likelihood of Occurrence simulation suggests failure does not occur while the Stochastic Sampling approach predicts failure. The Likelihood of Occurrencemore » results are used to validate finite element predictions.« less
Wilcox, Taylor M; Mckelvey, Kevin S.; Young, Michael K.; Sepulveda, Adam; Shepard, Bradley B.; Jane, Stephen F; Whiteley, Andrew R.; Lowe, Winsor H.; Schwartz, Michael K.
2016-01-01
Environmental DNA sampling (eDNA) has emerged as a powerful tool for detecting aquatic animals. Previous research suggests that eDNA methods are substantially more sensitive than traditional sampling. However, the factors influencing eDNA detection and the resulting sampling costs are still not well understood. Here we use multiple experiments to derive independent estimates of eDNA production rates and downstream persistence from brook trout (Salvelinus fontinalis) in streams. We use these estimates to parameterize models comparing the false negative detection rates of eDNA sampling and traditional backpack electrofishing. We find that using the protocols in this study eDNA had reasonable detection probabilities at extremely low animal densities (e.g., probability of detection 0.18 at densities of one fish per stream kilometer) and very high detection probabilities at population-level densities (e.g., probability of detection > 0.99 at densities of ≥ 3 fish per 100 m). This is substantially more sensitive than traditional electrofishing for determining the presence of brook trout and may translate into important cost savings when animals are rare. Our findings are consistent with a growing body of literature showing that eDNA sampling is a powerful tool for the detection of aquatic species, particularly those that are rare and difficult to sample using traditional methods.
Method for predicting peptide detection in mass spectrometry
Kangas, Lars [West Richland, WA; Smith, Richard D [Richland, WA; Petritis, Konstantinos [Richland, WA
2010-07-13
A method of predicting whether a peptide present in a biological sample will be detected by analysis with a mass spectrometer. The method uses at least one mass spectrometer to perform repeated analysis of a sample containing peptides from proteins with known amino acids. The method then generates a data set of peptides identified as contained within the sample by the repeated analysis. The method then calculates the probability that a specific peptide in the data set was detected in the repeated analysis. The method then creates a plurality of vectors, where each vector has a plurality of dimensions, and each dimension represents a property of one or more of the amino acids present in each peptide and adjacent peptides in the data set. Using these vectors, the method then generates an algorithm from the plurality of vectors and the calculated probabilities that specific peptides in the data set were detected in the repeated analysis. The algorithm is thus capable of calculating the probability that a hypothetical peptide represented as a vector will be detected by a mass spectrometry based proteomic platform, given that the peptide is present in a sample introduced into a mass spectrometer.
Zhou, Hanzhi; Elliott, Michael R; Raghunathan, Trivellore E
2016-06-01
Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in "Delta-V," a key crash severity measure.
Zhou, Hanzhi; Elliott, Michael R.; Raghunathan, Trivellore E.
2017-01-01
Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in “Delta-V,” a key crash severity measure. PMID:29226161
Perpendicular distance sampling: an alternative method for sampling downed coarse woody debris
Michael S. Williams; Jeffrey H. Gove
2003-01-01
Coarse woody debris (CWD) plays an important role in many forest ecosystem processes. In recent years, a number of new methods have been proposed to sample CWD. These methods select individual logs into the sample using some form of unequal probability sampling. One concern with most of these methods is the difficulty in estimating the volume of each log. A new method...
NASA Astrophysics Data System (ADS)
Zhang, Jiaxin; Shields, Michael D.
2018-01-01
This paper addresses the problem of uncertainty quantification and propagation when data for characterizing probability distributions are scarce. We propose a methodology wherein the full uncertainty associated with probability model form and parameter estimation are retained and efficiently propagated. This is achieved by applying the information-theoretic multimodel inference method to identify plausible candidate probability densities and associated probabilities that each method is the best model in the Kullback-Leibler sense. The joint parameter densities for each plausible model are then estimated using Bayes' rule. We then propagate this full set of probability models by estimating an optimal importance sampling density that is representative of all plausible models, propagating this density, and reweighting the samples according to each of the candidate probability models. This is in contrast with conventional methods that try to identify a single probability model that encapsulates the full uncertainty caused by lack of data and consequently underestimate uncertainty. The result is a complete probabilistic description of both aleatory and epistemic uncertainty achieved with several orders of magnitude reduction in computational cost. It is shown how the model can be updated to adaptively accommodate added data and added candidate probability models. The method is applied for uncertainty analysis of plate buckling strength where it is demonstrated how dataset size affects the confidence (or lack thereof) we can place in statistical estimates of response when data are lacking.
For what applications can probability and non-probability sampling be used?
H. T. Schreuder; T. G. Gregoire; J. P. Weyer
2001-01-01
Almost any type of sample has some utility when estimating population quantities. The focus in this paper is to indicate what type or combination of types of sampling can be used in various situations ranging from a sample designed to establish cause-effect or legal challenge to one involving a simple subjective judgment. Several of these methods have little or no...
Sampling and estimating recreational use.
Timothy G. Gregoire; Gregory J. Buhyoff
1999-01-01
Probability sampling methods applicable to estimate recreational use are presented. Both single- and multiple-access recreation sites are considered. One- and two-stage sampling methods are presented. Estimation of recreational use is presented in a series of examples.
USDA-ARS?s Scientific Manuscript database
Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...
NASA Astrophysics Data System (ADS)
Christen, Alejandra; Escarate, Pedro; Curé, Michel; Rial, Diego F.; Cassetti, Julia
2016-10-01
Aims: Knowing the distribution of stellar rotational velocities is essential for understanding stellar evolution. Because we measure the projected rotational speed v sin I, we need to solve an ill-posed problem given by a Fredholm integral of the first kind to recover the "true" rotational velocity distribution. Methods: After discretization of the Fredholm integral we apply the Tikhonov regularization method to obtain directly the probability distribution function for stellar rotational velocities. We propose a simple and straightforward procedure to determine the Tikhonov parameter. We applied Monte Carlo simulations to prove that the Tikhonov method is a consistent estimator and asymptotically unbiased. Results: This method is applied to a sample of cluster stars. We obtain confidence intervals using a bootstrap method. Our results are in close agreement with those obtained using the Lucy method for recovering the probability density distribution of rotational velocities. Furthermore, Lucy estimation lies inside our confidence interval. Conclusions: Tikhonov regularization is a highly robust method that deconvolves the rotational velocity probability density function from a sample of v sin I data directly without the need for any convergence criteria.
Statistics provide guidance for indigenous organic carbon detection on Mars missions.
Sephton, Mark A; Carter, Jonathan N
2014-08-01
Data from the Viking and Mars Science Laboratory missions indicate the presence of organic compounds that are not definitively martian in origin. Both contamination and confounding mineralogies have been suggested as alternatives to indigenous organic carbon. Intuitive thought suggests that we are repeatedly obtaining data that confirms the same level of uncertainty. Bayesian statistics may suggest otherwise. If an organic detection method has a true positive to false positive ratio greater than one, then repeated organic matter detection progressively increases the probability of indigeneity. Bayesian statistics also reveal that methods with higher ratios of true positives to false positives give higher overall probabilities and that detection of organic matter in a sample with a higher prior probability of indigenous organic carbon produces greater confidence. Bayesian statistics, therefore, provide guidance for the planning and operation of organic carbon detection activities on Mars. Suggestions for future organic carbon detection missions and instruments are as follows: (i) On Earth, instruments should be tested with analog samples of known organic content to determine their true positive to false positive ratios. (ii) On the mission, for an instrument with a true positive to false positive ratio above one, it should be recognized that each positive detection of organic carbon will result in a progressive increase in the probability of indigenous organic carbon being present; repeated measurements, therefore, can overcome some of the deficiencies of a less-than-definitive test. (iii) For a fixed number of analyses, the highest true positive to false positive ratio method or instrument will provide the greatest probability that indigenous organic carbon is present. (iv) On Mars, analyses should concentrate on samples with highest prior probability of indigenous organic carbon; intuitive desires to contrast samples of high prior probability and low prior probability of indigenous organic carbon should be resisted.
Nielson, Ryan M.; Gray, Brian R.; McDonald, Lyman L.; Heglund, Patricia J.
2011-01-01
Estimation of site occupancy rates when detection probabilities are <1 is well established in wildlife science. Data from multiple visits to a sample of sites are used to estimate detection probabilities and the proportion of sites occupied by focal species. In this article we describe how site occupancy methods can be applied to estimate occupancy rates of plants and other sessile organisms. We illustrate this approach and the pitfalls of ignoring incomplete detection using spatial data for 2 aquatic vascular plants collected under the Upper Mississippi River's Long Term Resource Monitoring Program (LTRMP). Site occupancy models considered include: a naïve model that ignores incomplete detection, a simple site occupancy model assuming a constant occupancy rate and a constant probability of detection across sites, several models that allow site occupancy rates and probabilities of detection to vary with habitat characteristics, and mixture models that allow for unexplained variation in detection probabilities. We used information theoretic methods to rank competing models and bootstrapping to evaluate the goodness-of-fit of the final models. Results of our analysis confirm that ignoring incomplete detection can result in biased estimates of occupancy rates. Estimates of site occupancy rates for 2 aquatic plant species were 19–36% higher compared to naive estimates that ignored probabilities of detection <1. Simulations indicate that final models have little bias when 50 or more sites are sampled, and little gains in precision could be expected for sample sizes >300. We recommend applying site occupancy methods for monitoring presence of aquatic species.
Surveying Europe's Only Cave-Dwelling Chordate Species (Proteus anguinus) Using Environmental DNA.
Vörös, Judit; Márton, Orsolya; Schmidt, Benedikt R; Gál, Júlia Tünde; Jelić, Dušan
2017-01-01
In surveillance of subterranean fauna, especially in the case of rare or elusive aquatic species, traditional techniques used for epigean species are often not feasible. We developed a non-invasive survey method based on environmental DNA (eDNA) to detect the presence of the red-listed cave-dwelling amphibian, Proteus anguinus, in the caves of the Dinaric Karst. We tested the method in fifteen caves in Croatia, from which the species was previously recorded or expected to occur. We successfully confirmed the presence of P. anguinus from ten caves and detected the species for the first time in five others. Using a hierarchical occupancy model we compared the availability and detection probability of eDNA of two water sampling methods, filtration and precipitation. The statistical analysis showed that both availability and detection probability depended on the method and estimates for both probabilities were higher using filter samples than for precipitation samples. Combining reliable field and laboratory methods with robust statistical modeling will give the best estimates of species occurrence.
Sampling in epidemiological research: issues, hazards and pitfalls.
Tyrer, Stephen; Heyman, Bob
2016-04-01
Surveys of people's opinions are fraught with difficulties. It is easier to obtain information from those who respond to text messages or to emails than to attempt to obtain a representative sample. Samples of the population that are selected non-randomly in this way are termed convenience samples as they are easy to recruit. This introduces a sampling bias. Such non-probability samples have merit in many situations, but an epidemiological enquiry is of little value unless a random sample is obtained. If a sufficient number of those selected actually complete a survey, the results are likely to be representative of the population. This editorial describes probability and non-probability sampling methods and illustrates the difficulties and suggested solutions in performing accurate epidemiological research.
Sampling in epidemiological research: issues, hazards and pitfalls
Tyrer, Stephen; Heyman, Bob
2016-01-01
Surveys of people's opinions are fraught with difficulties. It is easier to obtain information from those who respond to text messages or to emails than to attempt to obtain a representative sample. Samples of the population that are selected non-randomly in this way are termed convenience samples as they are easy to recruit. This introduces a sampling bias. Such non-probability samples have merit in many situations, but an epidemiological enquiry is of little value unless a random sample is obtained. If a sufficient number of those selected actually complete a survey, the results are likely to be representative of the population. This editorial describes probability and non-probability sampling methods and illustrates the difficulties and suggested solutions in performing accurate epidemiological research. PMID:27087985
Thorndahl, S; Willems, P
2008-01-01
Failure of urban drainage systems may occur due to surcharge or flooding at specific manholes in the system, or due to overflows from combined sewer systems to receiving waters. To quantify the probability or return period of failure, standard approaches make use of the simulation of design storms or long historical rainfall series in a hydrodynamic model of the urban drainage system. In this paper, an alternative probabilistic method is investigated: the first-order reliability method (FORM). To apply this method, a long rainfall time series was divided in rainstorms (rain events), and each rainstorm conceptualized to a synthetic rainfall hyetograph by a Gaussian shape with the parameters rainstorm depth, duration and peak intensity. Probability distributions were calibrated for these three parameters and used on the basis of the failure probability estimation, together with a hydrodynamic simulation model to determine the failure conditions for each set of parameters. The method takes into account the uncertainties involved in the rainstorm parameterization. Comparison is made between the failure probability results of the FORM method, the standard method using long-term simulations and alternative methods based on random sampling (Monte Carlo direct sampling and importance sampling). It is concluded that without crucial influence on the modelling accuracy, the FORM is very applicable as an alternative to traditional long-term simulations of urban drainage systems.
Pendleton, G.W.; Ralph, C. John; Sauer, John R.; Droege, Sam
1995-01-01
Many factors affect the use of point counts for monitoring bird populations, including sampling strategies, variation in detection rates, and independence of sample points. The most commonly used sampling plans are stratified sampling, cluster sampling, and systematic sampling. Each of these might be most useful for different objectives or field situations. Variation in detection probabilities and lack of independence among sample points can bias estimates and measures of precision. All of these factors should be con-sidered when using point count methods.
Use and interpretation of logistic regression in habitat-selection studies
Keating, Kim A.; Cherry, Steve
2004-01-01
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use-availability studies.
Mixed Methods Sampling: A Typology with Examples
ERIC Educational Resources Information Center
Teddlie, Charles; Yu, Fen
2007-01-01
This article presents a discussion of mixed methods (MM) sampling techniques. MM sampling involves combining well-established qualitative and quantitative techniques in creative ways to answer research questions posed by MM research designs. Several issues germane to MM sampling are presented including the differences between probability and…
Estimating rare events in biochemical systems using conditional sampling.
Sundar, V S
2017-01-28
The paper focuses on development of variance reduction strategies to estimate rare events in biochemical systems. Obtaining this probability using brute force Monte Carlo simulations in conjunction with the stochastic simulation algorithm (Gillespie's method) is computationally prohibitive. To circumvent this, important sampling tools such as the weighted stochastic simulation algorithm and the doubly weighted stochastic simulation algorithm have been proposed. However, these strategies require an additional step of determining the important region to sample from, which is not straightforward for most of the problems. In this paper, we apply the subset simulation method, developed as a variance reduction tool in the context of structural engineering, to the problem of rare event estimation in biochemical systems. The main idea is that the rare event probability is expressed as a product of more frequent conditional probabilities. These conditional probabilities are estimated with high accuracy using Monte Carlo simulations, specifically the Markov chain Monte Carlo method with the modified Metropolis-Hastings algorithm. Generating sample realizations of the state vector using the stochastic simulation algorithm is viewed as mapping the discrete-state continuous-time random process to the standard normal random variable vector. This viewpoint opens up the possibility of applying more sophisticated and efficient sampling schemes developed elsewhere to problems in stochastic chemical kinetics. The results obtained using the subset simulation method are compared with existing variance reduction strategies for a few benchmark problems, and a satisfactory improvement in computational time is demonstrated.
Estimating abundance of mountain lions from unstructured spatial sampling
Russell, Robin E.; Royle, J. Andrew; Desimone, Richard; Schwartz, Michael K.; Edwards, Victoria L.; Pilgrim, Kristy P.; Mckelvey, Kevin S.
2012-01-01
Mountain lions (Puma concolor) are often difficult to monitor because of their low capture probabilities, extensive movements, and large territories. Methods for estimating the abundance of this species are needed to assess population status, determine harvest levels, evaluate the impacts of management actions on populations, and derive conservation and management strategies. Traditional mark–recapture methods do not explicitly account for differences in individual capture probabilities due to the spatial distribution of individuals in relation to survey effort (or trap locations). However, recent advances in the analysis of capture–recapture data have produced methods estimating abundance and density of animals from spatially explicit capture–recapture data that account for heterogeneity in capture probabilities due to the spatial organization of individuals and traps. We adapt recently developed spatial capture–recapture models to estimate density and abundance of mountain lions in western Montana. Volunteers and state agency personnel collected mountain lion DNA samples in portions of the Blackfoot drainage (7,908 km2) in west-central Montana using 2 methods: snow back-tracking mountain lion tracks to collect hair samples and biopsy darting treed mountain lions to obtain tissue samples. Overall, we recorded 72 individual capture events, including captures both with and without tissue sample collection and hair samples resulting in the identification of 50 individual mountain lions (30 females, 19 males, and 1 unknown sex individual). We estimated lion densities from 8 models containing effects of distance, sex, and survey effort on detection probability. Our population density estimates ranged from a minimum of 3.7 mountain lions/100 km2 (95% Cl 2.3–5.7) under the distance only model (including only an effect of distance on detection probability) to 6.7 (95% Cl 3.1–11.0) under the full model (including effects of distance, sex, survey effort, and distance x sex on detection probability). These numbers translate to a total estimate of 293 mountain lions (95% Cl 182–451) to 529 (95% Cl 245–870) within the Blackfoot drainage. Results from the distance model are similar to previous estimates of 3.6 mountain lions/100 km2 for the study area; however, results from all other models indicated greater numbers of mountain lions. Our results indicate that unstructured spatial sampling combined with spatial capture–recapture analysis can be an effective method for estimating large carnivore densities.
On estimating probability of presence from use-availability or presence-background data.
Phillips, Steven J; Elith, Jane
2013-06-01
A fundamental ecological modeling task is to estimate the probability that a species is present in (or uses) a site, conditional on environmental variables. For many species, available data consist of "presence" data (locations where the species [or evidence of it] has been observed), together with "background" data, a random sample of available environmental conditions. Recently published papers disagree on whether probability of presence is identifiable from such presence-background data alone. This paper aims to resolve the disagreement, demonstrating that additional information is required. We defined seven simulated species representing various simple shapes of response to environmental variables (constant, linear, convex, unimodal, S-shaped) and ran five logistic model-fitting methods using 1000 presence samples and 10 000 background samples; the simulations were repeated 100 times. The experiment revealed a stark contrast between two groups of methods: those based on a strong assumption that species' true probability of presence exactly matches a given parametric form had highly variable predictions and much larger RMS error than methods that take population prevalence (the fraction of sites in which the species is present) as an additional parameter. For six species, the former group grossly under- or overestimated probability of presence. The cause was not model structure or choice of link function, because all methods were logistic with linear and, where necessary, quadratic terms. Rather, the experiment demonstrates that an estimate of prevalence is not just helpful, but is necessary (except in special cases) for identifying probability of presence. We therefore advise against use of methods that rely on the strong assumption, due to Lele and Keim (recently advocated by Royle et al.) and Lancaster and Imbens. The methods are fragile, and their strong assumption is unlikely to be true in practice. We emphasize, however, that we are not arguing against standard statistical methods such as logistic regression, generalized linear models, and so forth, none of which requires the strong assumption. If probability of presence is required for a given application, there is no panacea for lack of data. Presence-background data must be augmented with an additional datum, e.g., species' prevalence, to reliably estimate absolute (rather than relative) probability of presence.
Yura, Harold T; Hanson, Steen G
2012-04-01
Methods for simulation of two-dimensional signals with arbitrary power spectral densities and signal amplitude probability density functions are disclosed. The method relies on initially transforming a white noise sample set of random Gaussian distributed numbers into a corresponding set with the desired spectral distribution, after which this colored Gaussian probability distribution is transformed via an inverse transform into the desired probability distribution. In most cases the method provides satisfactory results and can thus be considered an engineering approach. Several illustrative examples with relevance for optics are given.
Surveying Europe’s Only Cave-Dwelling Chordate Species (Proteus anguinus) Using Environmental DNA
Márton, Orsolya; Schmidt, Benedikt R.; Gál, Júlia Tünde; Jelić, Dušan
2017-01-01
In surveillance of subterranean fauna, especially in the case of rare or elusive aquatic species, traditional techniques used for epigean species are often not feasible. We developed a non-invasive survey method based on environmental DNA (eDNA) to detect the presence of the red-listed cave-dwelling amphibian, Proteus anguinus, in the caves of the Dinaric Karst. We tested the method in fifteen caves in Croatia, from which the species was previously recorded or expected to occur. We successfully confirmed the presence of P. anguinus from ten caves and detected the species for the first time in five others. Using a hierarchical occupancy model we compared the availability and detection probability of eDNA of two water sampling methods, filtration and precipitation. The statistical analysis showed that both availability and detection probability depended on the method and estimates for both probabilities were higher using filter samples than for precipitation samples. Combining reliable field and laboratory methods with robust statistical modeling will give the best estimates of species occurrence. PMID:28129383
High throughput nonparametric probability density estimation.
Farmer, Jenny; Jacobs, Donald
2018-01-01
In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.
High throughput nonparametric probability density estimation
Farmer, Jenny
2018-01-01
In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference. PMID:29750803
Deterministic multidimensional nonuniform gap sampling.
Worley, Bradley; Powers, Robert
2015-12-01
Born from empirical observations in nonuniformly sampled multidimensional NMR data relating to gaps between sampled points, the Poisson-gap sampling method has enjoyed widespread use in biomolecular NMR. While the majority of nonuniform sampling schemes are fully randomly drawn from probability densities that vary over a Nyquist grid, the Poisson-gap scheme employs constrained random deviates to minimize the gaps between sampled grid points. We describe a deterministic gap sampling method, based on the average behavior of Poisson-gap sampling, which performs comparably to its random counterpart with the additional benefit of completely deterministic behavior. We also introduce a general algorithm for multidimensional nonuniform sampling based on a gap equation, and apply it to yield a deterministic sampling scheme that combines burst-mode sampling features with those of Poisson-gap schemes. Finally, we derive a relationship between stochastic gap equations and the expectation value of their sampling probability densities. Copyright © 2015 Elsevier Inc. All rights reserved.
Sampling designs matching species biology produce accurate and affordable abundance indices
Farley, Sean; Russell, Gareth J.; Butler, Matthew J.; Selinger, Jeff
2013-01-01
Wildlife biologists often use grid-based designs to sample animals and generate abundance estimates. Although sampling in grids is theoretically sound, in application, the method can be logistically difficult and expensive when sampling elusive species inhabiting extensive areas. These factors make it challenging to sample animals and meet the statistical assumption of all individuals having an equal probability of capture. Violating this assumption biases results. Does an alternative exist? Perhaps by sampling only where resources attract animals (i.e., targeted sampling), it would provide accurate abundance estimates more efficiently and affordably. However, biases from this approach would also arise if individuals have an unequal probability of capture, especially if some failed to visit the sampling area. Since most biological programs are resource limited, and acquiring abundance data drives many conservation and management applications, it becomes imperative to identify economical and informative sampling designs. Therefore, we evaluated abundance estimates generated from grid and targeted sampling designs using simulations based on geographic positioning system (GPS) data from 42 Alaskan brown bears (Ursus arctos). Migratory salmon drew brown bears from the wider landscape, concentrating them at anadromous streams. This provided a scenario for testing the targeted approach. Grid and targeted sampling varied by trap amount, location (traps placed randomly, systematically or by expert opinion), and traps stationary or moved between capture sessions. We began by identifying when to sample, and if bears had equal probability of capture. We compared abundance estimates against seven criteria: bias, precision, accuracy, effort, plus encounter rates, and probabilities of capture and recapture. One grid (49 km2 cells) and one targeted configuration provided the most accurate results. Both placed traps by expert opinion and moved traps between capture sessions, which raised capture probabilities. The grid design was least biased (−10.5%), but imprecise (CV 21.2%), and used most effort (16,100 trap-nights). The targeted configuration was more biased (−17.3%), but most precise (CV 12.3%), with least effort (7,000 trap-nights). Targeted sampling generated encounter rates four times higher, and capture and recapture probabilities 11% and 60% higher than grid sampling, in a sampling frame 88% smaller. Bears had unequal probability of capture with both sampling designs, partly because some bears never had traps available to sample them. Hence, grid and targeted sampling generated abundance indices, not estimates. Overall, targeted sampling provided the most accurate and affordable design to index abundance. Targeted sampling may offer an alternative method to index the abundance of other species inhabiting expansive and inaccessible landscapes elsewhere, provided their attraction to resource concentrations. PMID:24392290
Influence of item distribution pattern and abundance on efficiency of benthic core sampling
Behney, Adam C.; O'Shaughnessy, Ryan; Eichholz, Michael W.; Stafford, Joshua D.
2014-01-01
ore sampling is a commonly used method to estimate benthic item density, but little information exists about factors influencing the accuracy and time-efficiency of this method. We simulated core sampling in a Geographic Information System framework by generating points (benthic items) and polygons (core samplers) to assess how sample size (number of core samples), core sampler size (cm2), distribution of benthic items, and item density affected the bias and precision of estimates of density, the detection probability of items, and the time-costs. When items were distributed randomly versus clumped, bias decreased and precision increased with increasing sample size and increased slightly with increasing core sampler size. Bias and precision were only affected by benthic item density at very low values (500–1,000 items/m2). Detection probability (the probability of capturing ≥ 1 item in a core sample if it is available for sampling) was substantially greater when items were distributed randomly as opposed to clumped. Taking more small diameter core samples was always more time-efficient than taking fewer large diameter samples. We are unable to present a single, optimal sample size, but provide information for researchers and managers to derive optimal sample sizes dependent on their research goals and environmental conditions.
Multiple data sources improve DNA-based mark-recapture population estimates of grizzly bears.
Boulanger, John; Kendall, Katherine C; Stetz, Jeffrey B; Roon, David A; Waits, Lisette P; Paetkau, David
2008-04-01
A fundamental challenge to estimating population size with mark-recapture methods is heterogeneous capture probabilities and subsequent bias of population estimates. Confronting this problem usually requires substantial sampling effort that can be difficult to achieve for some species, such as carnivores. We developed a methodology that uses two data sources to deal with heterogeneity and applied this to DNA mark-recapture data from grizzly bears (Ursus arctos). We improved population estimates by incorporating additional DNA "captures" of grizzly bears obtained by collecting hair from unbaited bear rub trees concurrently with baited, grid-based, hair snag sampling. We consider a Lincoln-Petersen estimator with hair snag captures as the initial session and rub tree captures as the recapture session and develop an estimator in program MARK that treats hair snag and rub tree samples as successive sessions. Using empirical data from a large-scale project in the greater Glacier National Park, Montana, USA, area and simulation modeling we evaluate these methods and compare the results to hair-snag-only estimates. Empirical results indicate that, compared with hair-snag-only data, the joint hair-snag-rub-tree methods produce similar but more precise estimates if capture and recapture rates are reasonably high for both methods. Simulation results suggest that estimators are potentially affected by correlation of capture probabilities between sample types in the presence of heterogeneity. Overall, closed population Huggins-Pledger estimators showed the highest precision and were most robust to sparse data, heterogeneity, and capture probability correlation among sampling types. Results also indicate that these estimators can be used when a segment of the population has zero capture probability for one of the methods. We propose that this general methodology may be useful for other species in which mark-recapture data are available from multiple sources.
Richard, David; Speck, Thomas
2018-03-28
We investigate the kinetics and the free energy landscape of the crystallization of hard spheres from a supersaturated metastable liquid though direct simulations and forward flux sampling. In this first paper, we describe and test two different ways to reconstruct the free energy barriers from the sampled steady state probability distribution of cluster sizes without sampling the equilibrium distribution. The first method is based on mean first passage times, and the second method is based on splitting probabilities. We verify both methods for a single particle moving in a double-well potential. For the nucleation of hard spheres, these methods allow us to probe a wide range of supersaturations and to reconstruct the kinetics and the free energy landscape from the same simulation. Results are consistent with the scaling predicted by classical nucleation theory although a quantitative fit requires a rather large effective interfacial tension.
NASA Astrophysics Data System (ADS)
Richard, David; Speck, Thomas
2018-03-01
We investigate the kinetics and the free energy landscape of the crystallization of hard spheres from a supersaturated metastable liquid though direct simulations and forward flux sampling. In this first paper, we describe and test two different ways to reconstruct the free energy barriers from the sampled steady state probability distribution of cluster sizes without sampling the equilibrium distribution. The first method is based on mean first passage times, and the second method is based on splitting probabilities. We verify both methods for a single particle moving in a double-well potential. For the nucleation of hard spheres, these methods allow us to probe a wide range of supersaturations and to reconstruct the kinetics and the free energy landscape from the same simulation. Results are consistent with the scaling predicted by classical nucleation theory although a quantitative fit requires a rather large effective interfacial tension.
NASA Astrophysics Data System (ADS)
Xia, Xintao; Wang, Zhongyu
2008-10-01
For some methods of stability analysis of a system using statistics, it is difficult to resolve the problems of unknown probability distribution and small sample. Therefore, a novel method is proposed in this paper to resolve these problems. This method is independent of probability distribution, and is useful for small sample systems. After rearrangement of the original data series, the order difference and two polynomial membership functions are introduced to estimate the true value, the lower bound and the supper bound of the system using fuzzy-set theory. Then empirical distribution function is investigated to ensure confidence level above 95%, and the degree of similarity is presented to evaluate stability of the system. Cases of computer simulation investigate stable systems with various probability distribution, unstable systems with linear systematic errors and periodic systematic errors and some mixed systems. The method of analysis for systematic stability is approved.
Williams, Michael S; Cao, Yong; Ebel, Eric D
2013-07-15
Levels of pathogenic organisms in food and water have steadily declined in many parts of the world. A consequence of this reduction is that the proportion of samples that test positive for the most contaminated product-pathogen pairings has fallen to less than 0.1. While this is unequivocally beneficial to public health, datasets with very few enumerated samples present an analytical challenge because a large proportion of the observations are censored values. One application of particular interest to risk assessors is the fitting of a statistical distribution function to datasets collected at some point in the farm-to-table continuum. The fitted distribution forms an important component of an exposure assessment. A number of studies have compared different fitting methods and proposed lower limits on the proportion of samples where the organisms of interest are identified and enumerated, with the recommended lower limit of enumerated samples being 0.2. This recommendation may not be applicable to food safety risk assessments for a number of reasons, which include the development of new Bayesian fitting methods, the use of highly sensitive screening tests, and the generally larger sample sizes found in surveys of food commodities. This study evaluates the performance of a Markov chain Monte Carlo fitting method when used in conjunction with a screening test and enumeration of positive samples by the Most Probable Number technique. The results suggest that levels of contamination for common product-pathogen pairs, such as Salmonella on poultry carcasses, can be reliably estimated with the proposed fitting method and samples sizes in excess of 500 observations. The results do, however, demonstrate that simple guidelines for this application, such as the proportion of positive samples, cannot be provided. Published by Elsevier B.V.
Electrofishing capture probability of smallmouth bass in streams
Dauwalter, D.C.; Fisher, W.L.
2007-01-01
Abundance estimation is an integral part of understanding the ecology and advancing the management of fish populations and communities. Mark-recapture and removal methods are commonly used to estimate the abundance of stream fishes. Alternatively, abundance can be estimated by dividing the number of individuals sampled by the probability of capture. We conducted a mark-recapture study and used multiple repeated-measures logistic regression to determine the influence of fish size, sampling procedures, and stream habitat variables on the cumulative capture probability for smallmouth bass Micropterus dolomieu in two eastern Oklahoma streams. The predicted capture probability was used to adjust the number of individuals sampled to obtain abundance estimates. The observed capture probabilities were higher for larger fish and decreased with successive electrofishing passes for larger fish only. Model selection suggested that the number of electrofishing passes, fish length, and mean thalweg depth affected capture probabilities the most; there was little evidence for any effect of electrofishing power density and woody debris density on capture probability. Leave-one-out cross validation showed that the cumulative capture probability model predicts smallmouth abundance accurately. ?? Copyright by the American Fisheries Society 2007.
Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E
2014-06-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.
Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.
2017-01-01
Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608
Nonlinear Spatial Inversion Without Monte Carlo Sampling
NASA Astrophysics Data System (ADS)
Curtis, A.; Nawaz, A.
2017-12-01
High-dimensional, nonlinear inverse or inference problems usually have non-unique solutions. The distribution of solutions are described by probability distributions, and these are usually found using Monte Carlo (MC) sampling methods. These take pseudo-random samples of models in parameter space, calculate the probability of each sample given available data and other information, and thus map out high or low probability values of model parameters. However, such methods would converge to the solution only as the number of samples tends to infinity; in practice, MC is found to be slow to converge, convergence is not guaranteed to be achieved in finite time, and detection of convergence requires the use of subjective criteria. We propose a method for Bayesian inversion of categorical variables such as geological facies or rock types in spatial problems, which requires no sampling at all. The method uses a 2-D Hidden Markov Model over a grid of cells, where observations represent localized data constraining the model in each cell. The data in our example application are seismic properties such as P- and S-wave impedances or rock density; our model parameters are the hidden states and represent the geological rock types in each cell. The observations at each location are assumed to depend on the facies at that location only - an assumption referred to as `localized likelihoods'. However, the facies at a location cannot be determined solely by the observation at that location as it also depends on prior information concerning its correlation with the spatial distribution of facies elsewhere. Such prior information is included in the inversion in the form of a training image which represents a conceptual depiction of the distribution of local geologies that might be expected, but other forms of prior information can be used in the method as desired. The method provides direct (pseudo-analytic) estimates of posterior marginal probability distributions over each variable, so these do not need to be estimated from samples as is required in MC methods. On a 2-D test example the method is shown to outperform previous methods significantly, and at a fraction of the computational cost. In many foreseeable applications there are therefore no serious impediments to extending the method to 3-D spatial models.
Piecewise SALT sampling for estimating suspended sediment yields
Robert B. Thomas
1989-01-01
A probability sampling method called SALT (Selection At List Time) has been developed for collecting and summarizing data on delivery of suspended sediment in rivers. It is based on sampling and estimating yield using a suspended-sediment rating curve for high discharges and simple random sampling for low flows. The method gives unbiased estimates of total yield and...
Ryan, K; Williams, D Gareth; Balding, David J
2016-11-01
Many DNA profiles recovered from crime scene samples are of a quality that does not allow them to be searched against, nor entered into, databases. We propose a method for the comparison of profiles arising from two DNA samples, one or both of which can have multiple donors and be affected by low DNA template or degraded DNA. We compute likelihood ratios to evaluate the hypothesis that the two samples have a common DNA donor, and hypotheses specifying the relatedness of two donors. Our method uses a probability distribution for the genotype of the donor of interest in each sample. This distribution can be obtained from a statistical model, or we can exploit the ability of trained human experts to assess genotype probabilities, thus extracting much information that would be discarded by standard interpretation rules. Our method is compatible with established methods in simple settings, but is more widely applicable and can make better use of information than many current methods for the analysis of mixed-source, low-template DNA profiles. It can accommodate uncertainty arising from relatedness instead of or in addition to uncertainty arising from noisy genotyping. We describe a computer program GPMDNA, available under an open source licence, to calculate LRs using the method presented in this paper. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Probabilistic Design of a Mars Sample Return Earth Entry Vehicle Thermal Protection System
NASA Technical Reports Server (NTRS)
Dec, John A.; Mitcheltree, Robert A.
2002-01-01
The driving requirement for design of a Mars Sample Return mission is to assure containment of the returned samples. Designing to, and demonstrating compliance with, such a requirement requires physics based tools that establish the relationship between engineer's sizing margins and probabilities of failure. The traditional method of determining margins on ablative thermal protection systems, while conservative, provides little insight into the actual probability of an over-temperature during flight. The objective of this paper is to describe a new methodology for establishing margins on sizing the thermal protection system (TPS). Results of this Monte Carlo approach are compared with traditional methods.
Woldegebriel, Michael; Vivó-Truyols, Gabriel
2016-10-04
A novel method for compound identification in liquid chromatography-high resolution mass spectrometry (LC-HRMS) is proposed. The method, based on Bayesian statistics, accommodates all possible uncertainties involved, from instrumentation up to data analysis into a single model yielding the probability of the compound of interest being present/absent in the sample. This approach differs from the classical methods in two ways. First, it is probabilistic (instead of deterministic); hence, it computes the probability that the compound is (or is not) present in a sample. Second, it answers the hypothesis "the compound is present", opposed to answering the question "the compound feature is present". This second difference implies a shift in the way data analysis is tackled, since the probability of interfering compounds (i.e., isomers and isobaric compounds) is also taken into account.
A novel method for correcting scanline-observational bias of discontinuity orientation
Huang, Lei; Tang, Huiming; Tan, Qinwen; Wang, Dingjian; Wang, Liangqing; Ez Eldin, Mutasim A. M.; Li, Changdong; Wu, Qiong
2016-01-01
Scanline observation is known to introduce an angular bias into the probability distribution of orientation in three-dimensional space. In this paper, numerical solutions expressing the functional relationship between the scanline-observational distribution (in one-dimensional space) and the inherent distribution (in three-dimensional space) are derived using probability theory and calculus under the independence hypothesis of dip direction and dip angle. Based on these solutions, a novel method for obtaining the inherent distribution (also for correcting the bias) is proposed, an approach which includes two procedures: 1) Correcting the cumulative probabilities of orientation according to the solutions, and 2) Determining the distribution of the corrected orientations using approximation methods such as the one-sample Kolmogorov-Smirnov test. The inherent distribution corrected by the proposed method can be used for discrete fracture network (DFN) modelling, which is applied to such areas as rockmass stability evaluation, rockmass permeability analysis, rockmass quality calculation and other related fields. To maximize the correction capacity of the proposed method, the observed sample size is suggested through effectiveness tests for different distribution types, dispersions and sample sizes. The performance of the proposed method and the comparison of its correction capacity with existing methods are illustrated with two case studies. PMID:26961249
Probabilistic methods for rotordynamics analysis
NASA Technical Reports Server (NTRS)
Wu, Y.-T.; Torng, T. Y.; Millwater, H. R.; Fossum, A. F.; Rheinfurth, M. H.
1991-01-01
This paper summarizes the development of the methods and a computer program to compute the probability of instability of dynamic systems that can be represented by a system of second-order ordinary linear differential equations. Two instability criteria based upon the eigenvalues or Routh-Hurwitz test functions are investigated. Computational methods based on a fast probability integration concept and an efficient adaptive importance sampling method are proposed to perform efficient probabilistic analysis. A numerical example is provided to demonstrate the methods.
Jung, Minsoo
2015-01-01
When there is no sampling frame within a certain group or the group is concerned that making its population public would bring social stigma, we say the population is hidden. It is difficult to approach this kind of population survey-methodologically because the response rate is low and its members are not quite honest with their responses when probability sampling is used. The only alternative known to address the problems caused by previous methods such as snowball sampling is respondent-driven sampling (RDS), which was developed by Heckathorn and his colleagues. RDS is based on a Markov chain, and uses the social network information of the respondent. This characteristic allows for probability sampling when we survey a hidden population. We verified through computer simulation whether RDS can be used on a hidden population of cancer survivors. According to the simulation results of this thesis, the chain-referral sampling of RDS tends to minimize as the sample gets bigger, and it becomes stabilized as the wave progresses. Therefore, it shows that the final sample information can be completely independent from the initial seeds if a certain level of sample size is secured even if the initial seeds were selected through convenient sampling. Thus, RDS can be considered as an alternative which can improve upon both key informant sampling and ethnographic surveys, and it needs to be utilized for various cases domestically as well.
Change-in-ratio estimators for populations with more than two subclasses
Udevitz, Mark S.; Pollock, Kenneth H.
1991-01-01
Change-in-ratio methods have been developed to estimate the size of populations with two or three population subclasses. Most of these methods require the often unreasonable assumption of equal sampling probabilities for individuals in all subclasses. This paper presents new models based on the weaker assumption that ratios of sampling probabilities are constant over time for populations with three or more subclasses. Estimation under these models requires that a value be assumed for one of these ratios when there are two samples. Explicit expressions are given for the maximum likelihood estimators under models for two samples with three or more subclasses and for three samples with two subclasses. A numerical method using readily available statistical software is described for obtaining the estimators and their standard errors under all of the models. Likelihood ratio tests that can be used in model selection are discussed. Emphasis is on the two-sample, three-subclass models for which Monte-Carlo simulation results and an illustrative example are presented.
New methods for sampling sparse populations
Anna Ringvall
2007-01-01
To improve surveys of sparse objects, methods that use auxiliary information have been suggested. Guided transect sampling uses prior information, e.g., from aerial photographs, for the layout of survey strips. Instead of being laid out straight, the strips will wind between potentially more interesting areas. 3P sampling (probability proportional to prediction) uses...
Intervals for posttest probabilities: a comparison of 5 methods.
Mossman, D; Berger, J O
2001-01-01
Several medical articles discuss methods of constructing confidence intervals for single proportions and the likelihood ratio, but scant attention has been given to the systematic study of intervals for the posterior odds, or the positive predictive value, of a test. The authors describe 5 methods of constructing confidence intervals for posttest probabilities when estimates of sensitivity, specificity, and the pretest probability of a disorder are derived from empirical data. They then evaluate each method to determine how well the intervals' coverage properties correspond to their nominal value. When the estimates of pretest probabilities, sensitivity, and specificity are derived from more than 80 subjects and are not close to 0 or 1, all methods generate intervals with appropriate coverage properties. When these conditions are not met, however, the best-performing method is an objective Bayesian approach implemented by a simple simulation using a spreadsheet. Physicians and investigators can generate accurate confidence intervals for posttest probabilities in small-sample situations using the objective Bayesian approach.
Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies
Theis, Fabian J.
2017-01-01
Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits from only the parametric inverse-probability bagging proposed by us. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and reason for different behaviors between the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
45 CFR 286.260 - May Tribes use sampling and electronic filing?
Code of Federal Regulations, 2011 CFR
2011-10-01
... method” means a probability sampling method in which every sampling unit has a known, non-zero chance to... quarterly reports electronically, based on format specifications that we will provide. Tribes who do not...
Prah, Philip; Hickson, Ford; Bonell, Chris; McDaid, Lisa M; Johnson, Anne M; Wayal, Sonali; Clifton, Soazig; Sonnenberg, Pam; Nardone, Anthony; Erens, Bob; Copas, Andrew J; Riddell, Julie; Weatherburn, Peter; Mercer, Catherine H
2016-01-01
Objective To examine sociodemographic and behavioural differences between men who have sex with men (MSM) participating in recent UK convenience surveys and a national probability sample survey. Methods We compared 148 MSM aged 18–64 years interviewed for Britain's third National Survey of Sexual Attitudes and Lifestyles (Natsal-3) undertaken in 2010–2012, with men in the same age range participating in contemporaneous convenience surveys of MSM: 15 500 British resident men in the European MSM Internet Survey (EMIS); 797 in the London Gay Men's Sexual Health Survey; and 1234 in Scotland's Gay Men's Sexual Health Survey. Analyses compared men reporting at least one male sexual partner (past year) on similarly worded questions and multivariable analyses accounted for sociodemographic differences between the surveys. Results MSM in convenience surveys were younger and better educated than MSM in Natsal-3, and a larger proportion identified as gay (85%–95% vs 62%). Partner numbers were higher and same-sex anal sex more common in convenience surveys. Unprotected anal intercourse was more commonly reported in EMIS. Compared with Natsal-3, MSM in convenience surveys were more likely to report gonorrhoea diagnoses and HIV testing (both past year). Differences between the samples were reduced when restricting analysis to gay-identifying MSM. Conclusions National probability surveys better reflect the population of MSM but are limited by their smaller samples of MSM. Convenience surveys recruit larger samples of MSM but tend to over-represent MSM identifying as gay and reporting more sexual risk behaviours. Because both sampling strategies have strengths and weaknesses, methods are needed to triangulate data from probability and convenience surveys. PMID:26965869
NASA Technical Reports Server (NTRS)
Munoz, E. F.; Silverman, M. P.
1979-01-01
A single-step most-probable-number method for determining the number of fecal coliform bacteria present in sewage treatment plant effluents is discussed. A single growth medium based on that of Reasoner et al. (1976) and consisting of 5.0 gr. proteose peptone, 3.0 gr. yeast extract, 10.0 gr. lactose, 7.5 gr. NaCl, 0.2 gr. sodium lauryl sulfate, and 0.1 gr. sodium desoxycholate per liter is used. The pH is adjusted to 6.5, and samples are incubated at 44.5 deg C. Bacterial growth is detected either by measuring the increase with time in the electrical impedance ratio between the innoculated sample vial and an uninnoculated reference vial or by visual examination for turbidity. Results obtained by the single-step method for chlorinated and unchlorinated effluent samples are in excellent agreement with those obtained by the standard method. It is suggested that in automated treatment plants impedance ratio data could be automatically matched by computer programs with the appropriate dilution factors and most probable number tables already in the computer memory, with the corresponding result displayed as fecal coliforms per 100 ml of effluent.
Schwarcz, Sandra; Spindler, Hilary; Scheer, Susan; Valleroy, Linda; Lansky, Amy
2007-07-01
Convenience samples are used to determine HIV-related behaviors among men who have sex with men (MSM) without measuring the extent to which the results are representative of the broader MSM population. We compared results from a cross-sectional survey of MSM recruited from gay bars between June and October 2001 to a random digit dial telephone survey conducted between June 2002 and January 2003. The men in the probability sample were older, better educated, and had higher incomes than men in the convenience sample, the convenience sample enrolled more employed men and men of color. Substance use around the time of sex was higher in the convenience sample but other sexual behaviors were similar. HIV testing was common among men in both samples. Periodic validation, through comparison of data collected by different sampling methods, may be useful when relying on survey data for program and policy development.
Butler, Troy; Wildey, Timothy
2018-01-01
In thist study, we develop a procedure to utilize error estimates for samples of a surrogate model to compute robust upper and lower bounds on estimates of probabilities of events. We show that these error estimates can also be used in an adaptive algorithm to simultaneously reduce the computational cost and increase the accuracy in estimating probabilities of events using computationally expensive high-fidelity models. Specifically, we introduce the notion of reliability of a sample of a surrogate model, and we prove that utilizing the surrogate model for the reliable samples and the high-fidelity model for the unreliable samples gives preciselymore » the same estimate of the probability of the output event as would be obtained by evaluation of the original model for each sample. The adaptive algorithm uses the additional evaluations of the high-fidelity model for the unreliable samples to locally improve the surrogate model near the limit state, which significantly reduces the number of high-fidelity model evaluations as the limit state is resolved. Numerical results based on a recently developed adjoint-based approach for estimating the error in samples of a surrogate are provided to demonstrate (1) the robustness of the bounds on the probability of an event, and (2) that the adaptive enhancement algorithm provides a more accurate estimate of the probability of the QoI event than standard response surface approximation methods at a lower computational cost.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Butler, Troy; Wildey, Timothy
In thist study, we develop a procedure to utilize error estimates for samples of a surrogate model to compute robust upper and lower bounds on estimates of probabilities of events. We show that these error estimates can also be used in an adaptive algorithm to simultaneously reduce the computational cost and increase the accuracy in estimating probabilities of events using computationally expensive high-fidelity models. Specifically, we introduce the notion of reliability of a sample of a surrogate model, and we prove that utilizing the surrogate model for the reliable samples and the high-fidelity model for the unreliable samples gives preciselymore » the same estimate of the probability of the output event as would be obtained by evaluation of the original model for each sample. The adaptive algorithm uses the additional evaluations of the high-fidelity model for the unreliable samples to locally improve the surrogate model near the limit state, which significantly reduces the number of high-fidelity model evaluations as the limit state is resolved. Numerical results based on a recently developed adjoint-based approach for estimating the error in samples of a surrogate are provided to demonstrate (1) the robustness of the bounds on the probability of an event, and (2) that the adaptive enhancement algorithm provides a more accurate estimate of the probability of the QoI event than standard response surface approximation methods at a lower computational cost.« less
He, Hua; McDermott, Michael P.
2012-01-01
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified. PMID:21856650
Probability techniques for reliability analysis of composite materials
NASA Technical Reports Server (NTRS)
Wetherhold, Robert C.; Ucci, Anthony M.
1994-01-01
Traditional design approaches for composite materials have employed deterministic criteria for failure analysis. New approaches are required to predict the reliability of composite structures since strengths and stresses may be random variables. This report will examine and compare methods used to evaluate the reliability of composite laminae. The two types of methods that will be evaluated are fast probability integration (FPI) methods and Monte Carlo methods. In these methods, reliability is formulated as the probability that an explicit function of random variables is less than a given constant. Using failure criteria developed for composite materials, a function of design variables can be generated which defines a 'failure surface' in probability space. A number of methods are available to evaluate the integration over the probability space bounded by this surface; this integration delivers the required reliability. The methods which will be evaluated are: the first order, second moment FPI methods; second order, second moment FPI methods; the simple Monte Carlo; and an advanced Monte Carlo technique which utilizes importance sampling. The methods are compared for accuracy, efficiency, and for the conservativism of the reliability estimation. The methodology involved in determining the sensitivity of the reliability estimate to the design variables (strength distributions) and importance factors is also presented.
NASA Technical Reports Server (NTRS)
Hudson, Nicolas; Lin, Ying; Barengoltz, Jack
2010-01-01
A method for evaluating the probability of a Viable Earth Microorganism (VEM) contaminating a sample during the sample acquisition and handling (SAH) process of a potential future Mars Sample Return mission is developed. A scenario where multiple core samples would be acquired using a rotary percussive coring tool, deployed from an arm on a MER class rover is analyzed. The analysis is conducted in a structured way by decomposing sample acquisition and handling process into a series of discrete time steps, and breaking the physical system into a set of relevant components. At each discrete time step, two key functions are defined: The probability of a VEM being released from each component, and the transport matrix, which represents the probability of VEM transport from one component to another. By defining the expected the number of VEMs on each component at the start of the sampling process, these decompositions allow the expected number of VEMs on each component at each sampling step to be represented as a Markov chain. This formalism provides a rigorous mathematical framework in which to analyze the probability of a VEM entering the sample chain, as well as making the analysis tractable by breaking the process down into small analyzable steps.
NASA Astrophysics Data System (ADS)
Wang, C.; Rubin, Y.
2014-12-01
Spatial distribution of important geotechnical parameter named compression modulus Es contributes considerably to the understanding of the underlying geological processes and the adequate assessment of the Es mechanics effects for differential settlement of large continuous structure foundation. These analyses should be derived using an assimilating approach that combines in-situ static cone penetration test (CPT) with borehole experiments. To achieve such a task, the Es distribution of stratum of silty clay in region A of China Expo Center (Shanghai) is studied using the Bayesian-maximum entropy method. This method integrates rigorously and efficiently multi-precision of different geotechnical investigations and sources of uncertainty. Single CPT samplings were modeled as a rational probability density curve by maximum entropy theory. Spatial prior multivariate probability density function (PDF) and likelihood PDF of the CPT positions were built by borehole experiments and the potential value of the prediction point, then, preceding numerical integration on the CPT probability density curves, the posterior probability density curve of the prediction point would be calculated by the Bayesian reverse interpolation framework. The results were compared between Gaussian Sequential Stochastic Simulation and Bayesian methods. The differences were also discussed between single CPT samplings of normal distribution and simulated probability density curve based on maximum entropy theory. It is shown that the study of Es spatial distributions can be improved by properly incorporating CPT sampling variation into interpolation process, whereas more informative estimations are generated by considering CPT Uncertainty for the estimation points. Calculation illustrates the significance of stochastic Es characterization in a stratum, and identifies limitations associated with inadequate geostatistical interpolation techniques. This characterization results will provide a multi-precision information assimilation method of other geotechnical parameters.
USDA-ARS?s Scientific Manuscript database
Whether a required Salmonella test series is passed or failed depends not only on the presence of the bacteria, but also on the methods for taking samples, the methods for culturing samples, and the statistics associated with the sampling plan. The pass-fail probabilities of the two-class attribute...
A statistical treatment of bioassay pour fractions
NASA Astrophysics Data System (ADS)
Barengoltz, Jack; Hughes, David
A bioassay is a method for estimating the number of bacterial spores on a spacecraft surface for the purpose of demonstrating compliance with planetary protection (PP) requirements (Ref. 1). The details of the process may be seen in the appropriate PP document (e.g., for NASA, Ref. 2). In general, the surface is mechanically sampled with a damp sterile swab or wipe. The completion of the process is colony formation in a growth medium in a plate (Petri dish); the colonies are counted. Consider a set of samples from randomly selected, known areas of one spacecraft surface, for simplicity. One may calculate the mean and standard deviation of the bioburden density, which is the ratio of counts to area sampled. The standard deviation represents an estimate of the variation from place to place of the true bioburden density commingled with the precision of the individual sample counts. The accuracy of individual sample results depends on the equipment used, the collection method, and the culturing method. One aspect that greatly influences the result is the pour fraction, which is the quantity of fluid added to the plates divided by the total fluid used in extracting spores from the sampling equipment. In an analysis of a single sample’s counts due to the pour fraction, one seeks to answer the question: What is the probability that if a certain number of spores are counted with a known pour fraction, that there are an additional number of spores in the part of the rinse not poured. This is given for specific values by the binomial distribution density, where detection (of culturable spores) is success and the probability of success is the pour fraction. A special summation over the binomial distribution, equivalent to adding for all possible values of the true total number of spores, is performed. This distribution when normalized will almost yield the desired quantity. It is the probability that the additional number of spores does not exceed a certain value. Of course, for a desired value of uncertainty, one must invert the calculation. However, this probability of finding exactly the number of spores in the poured part is correct only in the case where all values of the true number of spores greater than or equal to the adjusted count are equally probable. This is not realistic, of course, but the result can only overestimate the uncertainty. So it is useful. In probability speak, one has the conditional probability given any true total number of spores. Therefore one must multiply it by the probability of each possible true count, before the summation. If the counts for a sample set (of which this is one sample) are available, one may use the calculated variance and the normal probability distribution. In this approach, one assumes a normal distribution and neglects the contribution from spatial variation. The former is a common assumption. The latter can only add to the conservatism (over estimate the number of spores at some level of confidence). A more straightforward approach is to assume a Poisson probability distribution for the measured total sample set counts, and use the product of the number of samples and the mean number of counts per sample as the mean of the Poisson distribution. It is necessary to set the total count to 1 in the Poisson distribution when actual total count is zero. Finally, even when the planetary protection requirements for spore burden refer only to the mean values, they require an adjustment for pour fraction and method efficiency (a PP specification based on independent data). The adjusted mean values are a 50/50 proposition (e.g., the probability of the true total counts in the sample set exceeding the estimate is 0.50). However, this is highly unconservative when the total counts are zero. No adjustment to the mean values occurs for either pour fraction or efficiency. The recommended approach is once again to set the total counts to 1, but now applied to the mean values. Then one may apply the corrections to the revised counts. It can be shown by the methods developed in this work that this change is usually conservative enough to increase the level of confidence in the estimate to 0.5. 1. NASA. (2005) Planetary protection provisions for robotic extraterrestrial missions. NPR 8020.12C, April 2005, National Aeronautics and Space Administration, Washington, DC. 2. NASA. (2010) Handbook for the Microbiological Examination of Space Hardware, NASA-HDBK-6022, National Aeronautics and Space Administration, Washington, DC.
Nangia, Shikha; Jasper, Ahren W; Miller, Thomas F; Truhlar, Donald G
2004-02-22
The most widely used algorithm for Monte Carlo sampling of electronic transitions in trajectory surface hopping (TSH) calculations is the so-called anteater algorithm, which is inefficient for sampling low-probability nonadiabatic events. We present a new sampling scheme (called the army ants algorithm) for carrying out TSH calculations that is applicable to systems with any strength of coupling. The army ants algorithm is a form of rare event sampling whose efficiency is controlled by an input parameter. By choosing a suitable value of the input parameter the army ants algorithm can be reduced to the anteater algorithm (which is efficient for strongly coupled cases), and by optimizing the parameter the army ants algorithm may be efficiently applied to systems with low-probability events. To demonstrate the efficiency of the army ants algorithm, we performed atom-diatom scattering calculations on a model system involving weakly coupled electronic states. Fully converged quantum mechanical calculations were performed, and the probabilities for nonadiabatic reaction and nonreactive deexcitation (quenching) were found to be on the order of 10(-8). For such low-probability events the anteater sampling scheme requires a large number of trajectories ( approximately 10(10)) to obtain good statistics and converged semiclassical results. In contrast by using the new army ants algorithm converged results were obtained by running 10(5) trajectories. Furthermore, the results were found to be in excellent agreement with the quantum mechanical results. Sampling errors were estimated using the bootstrap method, which is validated for use with the army ants algorithm. (c) 2004 American Institute of Physics.
Accounting for Incomplete Species Detection in Fish Community Monitoring
DOE Office of Scientific and Technical Information (OSTI.GOV)
McManamay, Ryan A; Orth, Dr. Donald J; Jager, Yetta
2013-01-01
Riverine fish assemblages are heterogeneous and very difficult to characterize with a one-size-fits-all approach to sampling. Furthermore, detecting changes in fish assemblages over time requires accounting for variation in sampling designs. We present a modeling approach that permits heterogeneous sampling by accounting for site and sampling covariates (including method) in a model-based framework for estimation (versus a sampling-based framework). We snorkeled during three surveys and electrofished during a single survey in suite of delineated habitats stratified by reach types. We developed single-species occupancy models to determine covariates influencing patch occupancy and species detection probabilities whereas community occupancy models estimated speciesmore » richness in light of incomplete detections. For most species, information-theoretic criteria showed higher support for models that included patch size and reach as covariates of occupancy. In addition, models including patch size and sampling method as covariates of detection probabilities also had higher support. Detection probability estimates for snorkeling surveys were higher for larger non-benthic species whereas electrofishing was more effective at detecting smaller benthic species. The number of sites and sampling occasions required to accurately estimate occupancy varied among fish species. For rare benthic species, our results suggested that higher number of occasions, and especially the addition of electrofishing, may be required to improve detection probabilities and obtain accurate occupancy estimates. Community models suggested that richness was 41% higher than the number of species actually observed and the addition of an electrofishing survey increased estimated richness by 13%. These results can be useful to future fish assemblage monitoring efforts by informing sampling designs, such as site selection (e.g. stratifying based on patch size) and determining effort required (e.g. number of sites versus occasions).« less
Model for spectral and chromatographic data
Jarman, Kristin [Richland, WA; Willse, Alan [Richland, WA; Wahl, Karen [Richland, WA; Wahl, Jon [Richland, WA
2002-11-26
A method and apparatus using a spectral analysis technique are disclosed. In one form of the invention, probabilities are selected to characterize the presence (and in another form, also a quantification of a characteristic) of peaks in an indexed data set for samples that match a reference species, and other probabilities are selected for samples that do not match the reference species. An indexed data set is acquired for a sample, and a determination is made according to techniques exemplified herein as to whether the sample matches or does not match the reference species. When quantification of peak characteristics is undertaken, the model is appropriately expanded, and the analysis accounts for the characteristic model and data. Further techniques are provided to apply the methods and apparatuses to process control, cluster analysis, hypothesis testing, analysis of variance, and other procedures involving multiple comparisons of indexed data.
Harry T. Valentine
2002-01-01
Randomized branch sampling (RBS) is a special application of multistage probability sampling (see Sampling, environmental), which was developed originally by Jessen [3] to estimate fruit counts on individual orchard trees. In general, the method can be used to obtain estimates of many different attributes of trees or other branched plants. The usual objective of RBS is...
Gilliom, Robert J.; Helsel, Dennis R.
1986-01-01
A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations, for determining the best performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilliom, R.J.; Helsel, D.R.
1986-02-01
A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensoredmore » observations, for determining the best performing parameter estimation method for any particular data det. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification.« less
Estimation of distributional parameters for censored trace-level water-quality data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilliom, R.J.; Helsel, D.R.
1984-01-01
A recurring difficulty encountered in investigations of many metals and organic contaminants in ambient waters is that a substantial portion of water-sample concentrations are below limits of detection established by analytical laboratories. Several methods were evaluated for estimating distributional parameters for such censored data sets using only uncensored observations. Their reliabilities were evaluated by a Monte Carlo experiment in which small samples were generated from a wide range of parent distributions and censored at varying levels. Eight methods were used to estimate the mean, standard deviation, median, and interquartile range. Criteria were developed, based on the distribution of uncensored observations,more » for determining the best-performing parameter estimation method for any particular data set. The most robust method for minimizing error in censored-sample estimates of the four distributional parameters over all simulation conditions was the log-probability regression method. With this method, censored observations are assumed to follow the zero-to-censoring level portion of a lognormal distribution obtained by a least-squares regression between logarithms of uncensored concentration observations and their z scores. When method performance was separately evaluated for each distributional parameter over all simulation conditions, the log-probability regression method still had the smallest errors for the mean and standard deviation, but the lognormal maximum likelihood method had the smallest errors for the median and interquartile range. When data sets were classified prior to parameter estimation into groups reflecting their probable parent distributions, the ranking of estimation methods was similar, but the accuracy of error estimates was markedly improved over those without classification. 6 figs., 6 tabs.« less
Grain reconstruction of porous media: application to a Bentheim sandstone.
Thovert, J-F; Adler, P M
2011-05-01
The two-point correlation measured on a thin section can be used to derive the probability density of the radii of a population of penetrable spheres. The geometrical, transport, and deformation properties of samples derived by this method compare well with the properties of the digitized real sample and of the samples generated by the standard grain reconstruction method. © 2011 American Physical Society
DOE Office of Scientific and Technical Information (OSTI.GOV)
Letant, S E; Kane, S R; Murphy, G A
2008-05-30
This note presents a comparison of Most-Probable-Number Rapid Viability (MPN-RV) PCR and traditional culture methods for the quantification of Bacillus anthracis Sterne spores in macrofoam swabs generated by the Centers for Disease Control and Prevention (CDC) for a multi-center validation study aimed at testing environmental swab processing methods for recovery, detection, and quantification of viable B. anthracis spores from surfaces. Results show that spore numbers provided by the MPN RV-PCR method were in statistical agreement with the CDC conventional culture method for all three levels of spores tested (10{sup 4}, 10{sup 2}, and 10 spores) even in the presence ofmore » dirt. In addition to detecting low levels of spores in environmental conditions, the MPN RV-PCR method is specific, and compatible with automated high-throughput sample processing and analysis protocols.« less
Radiation detection method and system using the sequential probability ratio test
Nelson, Karl E [Livermore, CA; Valentine, John D [Redwood City, CA; Beauchamp, Brock R [San Ramon, CA
2007-07-17
A method and system using the Sequential Probability Ratio Test to enhance the detection of an elevated level of radiation, by determining whether a set of observations are consistent with a specified model within a given bounds of statistical significance. In particular, the SPRT is used in the present invention to maximize the range of detection, by providing processing mechanisms for estimating the dynamic background radiation, adjusting the models to reflect the amount of background knowledge at the current point in time, analyzing the current sample using the models to determine statistical significance, and determining when the sample has returned to the expected background conditions.
Conroy, M.J.; Nichols, J.D.
1984-01-01
Several important questions in evolutionary biology and paleobiology involve sources of variation in extinction rates. In all cases of which we are aware, extinction rates have been estimated from data in which the probability that an observation (e.g., a fossil taxon) will occur is related both to extinction rates and to what we term encounter probabilities. Any statistical method for analyzing fossil data should at a minimum permit separate inferences on these two components. We develop a method for estimating taxonomic extinction rates from stratigraphic range data and for testing hypotheses about variability in these rates. We use this method to estimate extinction rates and to test the hypothesis of constant extinction rates for several sets of stratigraphic range data. The results of our tests support the hypothesis that extinction rates varied over the geologic time periods examined. We also present a test that can be used to identify periods of high or low extinction probabilities and provide an example using Phanerozoic invertebrate data. Extinction rates should be analyzed using stochastic models, in which it is recognized that stratigraphic samples are random varlates and that sampling is imperfect
Methods for fitting a parametric probability distribution to most probable number data.
Williams, Michael S; Ebel, Eric D
2012-07-02
Every year hundreds of thousands, if not millions, of samples are collected and analyzed to assess microbial contamination in food and water. The concentration of pathogenic organisms at the end of the production process is low for most commodities, so a highly sensitive screening test is used to determine whether the organism of interest is present in a sample. In some applications, samples that test positive are subjected to quantitation. The most probable number (MPN) technique is a common method to quantify the level of contamination in a sample because it is able to provide estimates at low concentrations. This technique uses a series of dilution count experiments to derive estimates of the concentration of the microorganism of interest. An application for these data is food-safety risk assessment, where the MPN concentration estimates can be fitted to a parametric distribution to summarize the range of potential exposures to the contaminant. Many different methods (e.g., substitution methods, maximum likelihood and regression on order statistics) have been proposed to fit microbial contamination data to a distribution, but the development of these methods rarely considers how the MPN technique influences the choice of distribution function and fitting method. An often overlooked aspect when applying these methods is whether the data represent actual measurements of the average concentration of microorganism per milliliter or the data are real-valued estimates of the average concentration, as is the case with MPN data. In this study, we propose two methods for fitting MPN data to a probability distribution. The first method uses a maximum likelihood estimator that takes average concentration values as the data inputs. The second is a Bayesian latent variable method that uses the counts of the number of positive tubes at each dilution to estimate the parameters of the contamination distribution. The performance of the two fitting methods is compared for two data sets that represent Salmonella and Campylobacter concentrations on chicken carcasses. The results demonstrate a bias in the maximum likelihood estimator that increases with reductions in average concentration. The Bayesian method provided unbiased estimates of the concentration distribution parameters for all data sets. We provide computer code for the Bayesian fitting method. Published by Elsevier B.V.
Point-Sampling and Line-Sampling Probability Theory, Geometric Implications, Synthesis
L.R. Grosenbaugh
1958-01-01
Foresters concerned with measuring tree populations on definite areas have long employed two well-known methods of representative sampling. In list or enumerative sampling the entire tree population is tallied with a known proportion being randomly selected and measured for volume or other variables. In area sampling all trees on randomly located plots or strips...
Generalized Wishart Mixtures for Unsupervised Classification of PolSAR Data
NASA Astrophysics Data System (ADS)
Li, Lan; Chen, Erxue; Li, Zengyuan
2013-01-01
This paper presents an unsupervised clustering algorithm based upon the expectation maximization (EM) algorithm for finite mixture modelling, using the complex wishart probability density function (PDF) for the probabilities. The mixture model enables to consider heterogeneous thematic classes which could not be better fitted by the unimodal wishart distribution. In order to make it fast and robust to calculate, we use the recently proposed generalized gamma distribution (GΓD) for the single polarization intensity data to make the initial partition. Then we use the wishart probability density function for the corresponding sample covariance matrix to calculate the posterior class probabilities for each pixel. The posterior class probabilities are used for the prior probability estimates of each class and weights for all class parameter updates. The proposed method is evaluated and compared with the wishart H-Alpha-A classification. Preliminary results show that the proposed method has better performance.
Moving on From Representativeness: Testing the Utility of the Global Drug Survey.
Barratt, Monica J; Ferris, Jason A; Zahnow, Renee; Palamar, Joseph J; Maier, Larissa J; Winstock, Adam R
2017-01-01
A decline in response rates in traditional household surveys, combined with increased internet coverage and decreased research budgets, has resulted in increased attractiveness of web survey research designs based on purposive and voluntary opt-in sampling strategies. In the study of hidden or stigmatised behaviours, such as cannabis use, web survey methods are increasingly common. However, opt-in web surveys are often heavily criticised due to their lack of sampling frame and unknown representativeness. In this article, we outline the current state of the debate about the relevance of pursuing representativeness, the state of probability sampling methods, and the utility of non-probability, web survey methods especially for accessing hidden or minority populations. Our article has two aims: (1) to present a comprehensive description of the methodology we use at Global Drug Survey (GDS), an annual cross-sectional web survey and (2) to compare the age and sex distributions of cannabis users who voluntarily completed (a) a household survey or (b) a large web-based purposive survey (GDS), across three countries: Australia, the United States, and Switzerland. We find that within each set of country comparisons, the demographic distributions among recent cannabis users are broadly similar, demonstrating that the age and sex distributions of those who volunteer to be surveyed are not vastly different between these non-probability and probability methods. We conclude that opt-in web surveys of hard-to-reach populations are an efficient way of gaining in-depth understanding of stigmatised behaviours and are appropriate, as long as they are not used to estimate drug use prevalence of the general population.
Moving on From Representativeness: Testing the Utility of the Global Drug Survey
Barratt, Monica J; Ferris, Jason A; Zahnow, Renee; Palamar, Joseph J; Maier, Larissa J; Winstock, Adam R
2017-01-01
A decline in response rates in traditional household surveys, combined with increased internet coverage and decreased research budgets, has resulted in increased attractiveness of web survey research designs based on purposive and voluntary opt-in sampling strategies. In the study of hidden or stigmatised behaviours, such as cannabis use, web survey methods are increasingly common. However, opt-in web surveys are often heavily criticised due to their lack of sampling frame and unknown representativeness. In this article, we outline the current state of the debate about the relevance of pursuing representativeness, the state of probability sampling methods, and the utility of non-probability, web survey methods especially for accessing hidden or minority populations. Our article has two aims: (1) to present a comprehensive description of the methodology we use at Global Drug Survey (GDS), an annual cross-sectional web survey and (2) to compare the age and sex distributions of cannabis users who voluntarily completed (a) a household survey or (b) a large web-based purposive survey (GDS), across three countries: Australia, the United States, and Switzerland. We find that within each set of country comparisons, the demographic distributions among recent cannabis users are broadly similar, demonstrating that the age and sex distributions of those who volunteer to be surveyed are not vastly different between these non-probability and probability methods. We conclude that opt-in web surveys of hard-to-reach populations are an efficient way of gaining in-depth understanding of stigmatised behaviours and are appropriate, as long as they are not used to estimate drug use prevalence of the general population. PMID:28924351
Review of sampling hard-to-reach and hidden populations for HIV surveillance.
Magnani, Robert; Sabin, Keith; Saidel, Tobi; Heckathorn, Douglas
2005-05-01
Adequate surveillance of hard-to-reach and 'hidden' subpopulations is crucial to containing the HIV epidemic in low prevalence settings and in slowing the rate of transmission in high prevalence settings. For a variety of reasons, however, conventional facility and survey-based surveillance data collection strategies are ineffective for a number of key subpopulations, particularly those whose behaviors are illegal or illicit. This paper critically reviews alternative sampling strategies for undertaking behavioral or biological surveillance surveys of such groups. Non-probability sampling approaches such as facility-based sentinel surveillance and snowball sampling are the simplest to carry out, but are subject to a high risk of sampling/selection bias. Most of the probability sampling methods considered are limited in that they are adequate only under certain circumstances and for some groups. One relatively new method, respondent-driven sampling, an adaptation of chain-referral sampling, appears to be the most promising for general applications. However, as its applicability to HIV surveillance in resource-poor settings has yet to be established, further field trials are needed before a firm conclusion can be reached.
Digital simulation of an arbitrary stationary stochastic process by spectral representation.
Yura, Harold T; Hanson, Steen G
2011-04-01
In this paper we present a straightforward, efficient, and computationally fast method for creating a large number of discrete samples with an arbitrary given probability density function and a specified spectral content. The method relies on initially transforming a white noise sample set of random Gaussian distributed numbers into a corresponding set with the desired spectral distribution, after which this colored Gaussian probability distribution is transformed via an inverse transform into the desired probability distribution. In contrast to previous work, where the analyses were limited to auto regressive and or iterative techniques to obtain satisfactory results, we find that a single application of the inverse transform method yields satisfactory results for a wide class of arbitrary probability distributions. Although a single application of the inverse transform technique does not conserve the power spectra exactly, it yields highly accurate numerical results for a wide range of probability distributions and target power spectra that are sufficient for system simulation purposes and can thus be regarded as an accurate engineering approximation, which can be used for wide range of practical applications. A sufficiency condition is presented regarding the range of parameter values where a single application of the inverse transform method yields satisfactory agreement between the simulated and target power spectra, and a series of examples relevant for the optics community are presented and discussed. Outside this parameter range the agreement gracefully degrades but does not distort in shape. Although we demonstrate the method here focusing on stationary random processes, we see no reason why the method could not be extended to simulate non-stationary random processes. © 2011 Optical Society of America
Farmer, William H.; Koltun, Greg
2017-01-01
Study regionThe state of Ohio in the United States, a humid, continental climate.Study focusThe estimation of nonexceedance probabilities of daily streamflows as an alternative means of establishing the relative magnitudes of streamflows associated with hydrologic and water-quality observations.New hydrological insights for the regionSeveral methods for estimating nonexceedance probabilities of daily mean streamflows are explored, including single-index methodologies (nearest-neighboring index) and geospatial tools (kriging and topological kriging). These methods were evaluated by conducting leave-one-out cross-validations based on analyses of nearly 7 years of daily streamflow data from 79 unregulated streamgages in Ohio and neighboring states. The pooled, ordinary kriging model, with a median Nash–Sutcliffe performance of 0.87, was superior to the single-site index methods, though there was some bias in the tails of the probability distribution. Incorporating network structure through topological kriging did not improve performance. The pooled, ordinary kriging model was applied to 118 locations without systematic streamgaging across Ohio where instantaneous streamflow measurements had been made concurrent with water-quality sampling on at least 3 separate days. Spearman rank correlations between estimated nonexceedance probabilities and measured streamflows were high, with a median value of 0.76. In consideration of application, the degree of regulation in a set of sample sites helped to specify the streamgages required to implement kriging approaches successfully.
Using effort information with change-in-ratio data for population estimation
Udevitz, Mark S.; Pollock, Kenneth H.
1995-01-01
Most change-in-ratio (CIR) methods for estimating fish and wildlife population sizes have been based only on assumptions about how encounter probabilities vary among population subclasses. When information on sampling effort is available, it is also possible to derive CIR estimators based on assumptions about how encounter probabilities vary over time. This paper presents a generalization of previous CIR models that allows explicit consideration of a range of assumptions about the variation of encounter probabilities among subclasses and over time. Explicit estimators are derived under this model for specific sets of assumptions about the encounter probabilities. Numerical methods are presented for obtaining estimators under the full range of possible assumptions. Likelihood ratio tests for these assumptions are described. Emphasis is on obtaining estimators based on assumptions about variation of encounter probabilities over time.
Guo, Yan; Li, Xiaoming; Fang, Xiaoyi; Lin, Xiuyun; Song, Yan; Jiang, Shuling; Stanton, Bonita
2011-01-01
Sample representativeness remains one of the challenges in effective HIV/STD surveillance and prevention targeting MSM worldwide. Although convenience samples are widely used in studies of MSM, previous studies suggested that these samples might not be representative of the broader MSM population. This issue becomes even more critical in many developing countries where needed resources for conducting probability sampling are limited. We examined variations in HIV and Syphilis infections and sociodemographic and behavioral factors among 307 young migrant MSM recruited using four different convenience sampling methods (peer outreach, informal social network, Internet, and venue-based) in Beijing, China in 2009. The participants completed a self-administered survey and provided blood specimens for HIV/STD testing. Among the four MSM samples using different recruitment methods, rates of HIV infections were 5.1%, 5.8%, 7.8%, and 3.4%; rates of Syphilis infection were 21.8%, 36.2%, 11.8%, and 13.8%; rates of inconsistent condom use were 57%, 52%, 58%, and 38%. Significant differences were found in various sociodemographic characteristics (e.g., age, migration history, education, income, places of employment) and risk behaviors (e.g., age at first sex, number of sex partners, involvement in commercial sex, and substance use) among samples recruited by different sampling methods. The results confirmed the challenges of obtaining representative MSM samples and underscored the importance of using multiple sampling methods to reach MSM from diverse backgrounds and in different social segments and to improve the representativeness of the MSM samples when the use of probability sampling approach is not feasible. PMID:21711162
Guo, Yan; Li, Xiaoming; Fang, Xiaoyi; Lin, Xiuyun; Song, Yan; Jiang, Shuling; Stanton, Bonita
2011-11-01
Sample representativeness remains one of the challenges in effective HIV/STD surveillance and prevention targeting men who have sex with men (MSM) worldwide. Although convenience samples are widely used in studies of MSM, previous studies suggested that these samples might not be representative of the broader MSM population. This issue becomes even more critical in many developing countries where needed resources for conducting probability sampling are limited. We examined variations in HIV and Syphilis infections and sociodemographic and behavioral factors among 307 young migrant MSM recruited using four different convenience sampling methods (peer outreach, informal social network, Internet, and venue-based) in Beijing, China in 2009. The participants completed a self-administered survey and provided blood specimens for HIV/STD testing. Among the four MSM samples using different recruitment methods, rates of HIV infections were 5.1%, 5.8%, 7.8%, and 3.4%; rates of Syphilis infection were 21.8%, 36.2%, 11.8%, and 13.8%; and rates of inconsistent condom use were 57%, 52%, 58%, and 38%. Significant differences were found in various sociodemographic characteristics (e.g., age, migration history, education, income, and places of employment) and risk behaviors (e.g., age at first sex, number of sex partners, involvement in commercial sex, and substance use) among samples recruited by different sampling methods. The results confirmed the challenges of obtaining representative MSM samples and underscored the importance of using multiple sampling methods to reach MSM from diverse backgrounds and in different social segments and to improve the representativeness of the MSM samples when the use of probability sampling approach is not feasible.
ERIC Educational Resources Information Center
Vahabi, Mandana
2010-01-01
Objective: To test whether the format in which women receive probabilistic information about breast cancer and mammography affects their comprehension. Methods: A convenience sample of 180 women received pre-assembled randomized packages containing a breast health information brochure, with probabilities presented in either verbal or numeric…
NASA Astrophysics Data System (ADS)
Giovanis, D. G.; Shields, M. D.
2018-07-01
This paper addresses uncertainty quantification (UQ) for problems where scalar (or low-dimensional vector) response quantities are insufficient and, instead, full-field (very high-dimensional) responses are of interest. To do so, an adaptive stochastic simulation-based methodology is introduced that refines the probability space based on Grassmann manifold variations. The proposed method has a multi-element character discretizing the probability space into simplex elements using a Delaunay triangulation. For every simplex, the high-dimensional solutions corresponding to its vertices (sample points) are projected onto the Grassmann manifold. The pairwise distances between these points are calculated using appropriately defined metrics and the elements with large total distance are sub-sampled and refined. As a result, regions of the probability space that produce significant changes in the full-field solution are accurately resolved. An added benefit is that an approximation of the solution within each element can be obtained by interpolation on the Grassmann manifold. The method is applied to study the probability of shear band formation in a bulk metallic glass using the shear transformation zone theory.
Probabilistic approach to lysozyme crystal nucleation kinetics.
Dimitrov, Ivaylo L; Hodzhaoglu, Feyzim V; Koleva, Dobryana P
2015-09-01
Nucleation of lysozyme crystals in quiescent solutions at a regime of progressive nucleation is investigated under an optical microscope at conditions of constant supersaturation. A method based on the stochastic nature of crystal nucleation and using discrete time sampling of small solution volumes for the presence or absence of detectable crystals is developed. It allows probabilities for crystal detection to be experimentally estimated. One hundred single samplings were used for each probability determination for 18 time intervals and six lysozyme concentrations. Fitting of a particular probability function to experimentally obtained data made possible the direct evaluation of stationary rates for lysozyme crystal nucleation, the time for growth of supernuclei to a detectable size and probability distribution of nucleation times. Obtained stationary nucleation rates were then used for the calculation of other nucleation parameters, such as the kinetic nucleation factor, nucleus size, work for nucleus formation and effective specific surface energy of the nucleus. The experimental method itself is simple and adaptable and can be used for crystal nucleation studies of arbitrary soluble substances with known solubility at particular solution conditions.
Sample size calculation for a proof of concept study.
Yin, Yin
2002-05-01
Sample size calculation is vital for a confirmatory clinical trial since the regulatory agencies require the probability of making Type I error to be significantly small, usually less than 0.05 or 0.025. However, the importance of the sample size calculation for studies conducted by a pharmaceutical company for internal decision making, e.g., a proof of concept (PoC) study, has not received enough attention. This article introduces a Bayesian method that identifies the information required for planning a PoC and the process of sample size calculation. The results will be presented in terms of the relationships between the regulatory requirements, the probability of reaching the regulatory requirements, the goalpost for PoC, and the sample size used for PoC.
Public attitudes toward stuttering in Turkey: probability versus convenience sampling.
Ozdemir, R Sertan; St Louis, Kenneth O; Topbaş, Seyhun
2011-12-01
A Turkish translation of the Public Opinion Survey of Human Attributes-Stuttering (POSHA-S) was used to compare probability versus convenience sampling to measure public attitudes toward stuttering. A convenience sample of adults in Eskişehir, Turkey was compared with two replicates of a school-based, probability cluster sampling scheme. The two replicates of the probability sampling scheme yielded similar demographic samples, both of which were different from the convenience sample. Components of subscores on the POSHA-S were significantly different in more than half of the comparisons between convenience and probability samples, indicating important differences in public attitudes. If POSHA-S users intend to generalize to specific geographic areas, results of this study indicate that probability sampling is a better research strategy than convenience sampling. The reader will be able to: (1) discuss the difference between convenience sampling and probability sampling; (2) describe a school-based probability sampling scheme; and (3) describe differences in POSHA-S results from convenience sampling versus probability sampling. Copyright © 2011 Elsevier Inc. All rights reserved.
A risk assessment method for multi-site damage
NASA Astrophysics Data System (ADS)
Millwater, Harry Russell, Jr.
This research focused on developing probabilistic methods suitable for computing small probabilities of failure, e.g., 10sp{-6}, of structures subject to multi-site damage (MSD). MSD is defined as the simultaneous development of fatigue cracks at multiple sites in the same structural element such that the fatigue cracks may coalesce to form one large crack. MSD is modeled as an array of collinear cracks with random initial crack lengths with the centers of the initial cracks spaced uniformly apart. The data used was chosen to be representative of aluminum structures. The structure is considered failed whenever any two adjacent cracks link up. A fatigue computer model is developed that can accurately and efficiently grow a collinear array of arbitrary length cracks from initial size until failure. An algorithm is developed to compute the stress intensity factors of all cracks considering all interaction effects. The probability of failure of two to 100 cracks is studied. Lower bounds on the probability of failure are developed based upon the probability of the largest crack exceeding a critical crack size. The critical crack size is based on the initial crack size that will grow across the ligament when the neighboring crack has zero length. The probability is evaluated using extreme value theory. An upper bound is based on the probability of the maximum sum of initial cracks being greater than a critical crack size. A weakest link sampling approach is developed that can accurately and efficiently compute small probabilities of failure. This methodology is based on predicting the weakest link, i.e., the two cracks to link up first, for a realization of initial crack sizes, and computing the cycles-to-failure using these two cracks. Criteria to determine the weakest link are discussed. Probability results using the weakest link sampling method are compared to Monte Carlo-based benchmark results. The results indicate that very small probabilities can be computed accurately in a few minutes using a Hewlett-Packard workstation.
Bayesian analysis of rare events
NASA Astrophysics Data System (ADS)
Straub, Daniel; Papaioannou, Iason; Betz, Wolfgang
2016-06-01
In many areas of engineering and science there is an interest in predicting the probability of rare events, in particular in applications related to safety and security. Increasingly, such predictions are made through computer models of physical systems in an uncertainty quantification framework. Additionally, with advances in IT, monitoring and sensor technology, an increasing amount of data on the performance of the systems is collected. This data can be used to reduce uncertainty, improve the probability estimates and consequently enhance the management of rare events and associated risks. Bayesian analysis is the ideal method to include the data into the probabilistic model. It ensures a consistent probabilistic treatment of uncertainty, which is central in the prediction of rare events, where extrapolation from the domain of observation is common. We present a framework for performing Bayesian updating of rare event probabilities, termed BUS. It is based on a reinterpretation of the classical rejection-sampling approach to Bayesian analysis, which enables the use of established methods for estimating probabilities of rare events. By drawing upon these methods, the framework makes use of their computational efficiency. These methods include the First-Order Reliability Method (FORM), tailored importance sampling (IS) methods and Subset Simulation (SuS). In this contribution, we briefly review these methods in the context of the BUS framework and investigate their applicability to Bayesian analysis of rare events in different settings. We find that, for some applications, FORM can be highly efficient and is surprisingly accurate, enabling Bayesian analysis of rare events with just a few model evaluations. In a general setting, BUS implemented through IS and SuS is more robust and flexible.
Simplified Computation for Nonparametric Windows Method of Probability Density Function Estimation.
Joshi, Niranjan; Kadir, Timor; Brady, Michael
2011-08-01
Recently, Kadir and Brady proposed a method for estimating probability density functions (PDFs) for digital signals which they call the Nonparametric (NP) Windows method. The method involves constructing a continuous space representation of the discrete space and sampled signal by using a suitable interpolation method. NP Windows requires only a small number of observed signal samples to estimate the PDF and is completely data driven. In this short paper, we first develop analytical formulae to obtain the NP Windows PDF estimates for 1D, 2D, and 3D signals, for different interpolation methods. We then show that the original procedure to calculate the PDF estimate can be significantly simplified and made computationally more efficient by a judicious choice of the frame of reference. We have also outlined specific algorithmic details of the procedures enabling quick implementation. Our reformulation of the original concept has directly demonstrated a close link between the NP Windows method and the Kernel Density Estimator.
Fatemeh, Dehghan; Reza, Zolfaghari Mohammad; Mohammad, Arjomandzadegan; Salomeh, Kalantari; Reza, Ahmari Gholam; Hossein, Sarmadian; Maryam, Sadrnia; Azam, Ahmadi; Mana, Shojapoor; Negin, Najarian; Reza, Kasravi Alii; Saeed, Falahat
2014-01-01
Objective To analyse molecular detection of coliforms and shorten the time of PCR. Methods Rapid detection of coliforms by amplification of lacZ and uidA genes in a multiplex PCR reaction was designed and performed in comparison with most probably number (MPN) method for 16 artificial and 101 field samples. The molecular method was also conducted on isolated coliforms from positive MPN samples; standard sample for verification of microbial method certificated reference material; isolated strains from certificated reference material and standard bacteria. The PCR and electrophoresis parameters were changed for reducing the operation time. Results Results of PCR for lacZ and uidA genes were similar in all of standard, operational and artificial samples and showed the 876 bp and 147 bp bands of lacZ and uidA genes by multiplex PCR. PCR results were confirmed by MPN culture method by sensitivity 86% (95% CI: 0.71-0.93). Also the total execution time, with a successful change of factors, was reduced to less than two and a half hour. Conclusions Multiplex PCR method with shortened operation time was used for the simultaneous detection of total coliforms and Escherichia coli in distribution system of Arak city. It's recommended to be used at least as an initial screening test, and then the positive samples could be randomly tested by MPN. PMID:25182727
Probability of detection of defects in coatings with electronic shearography
NASA Astrophysics Data System (ADS)
Maddux, Gary A.; Horton, Charles M.; Lansing, Matthew D.; Gnacek, William J.; Newton, Patrick L.
1994-07-01
The goal of this research was to utilize statistical methods to evaluate the probability of detection (POD) of defects in coatings using electronic shearography. The coating system utilized in the POD studies was to be the paint system currently utilized on the external casings of the NASA Space Transportation System (STS) Revised Solid Rocket Motor (RSRM) boosters. The population of samples was to be large enough to determine the minimum defect size for 90 percent probability of detection of 95 percent confidence POD on these coatings. Also, the best methods to excite coatings on aerospace components to induce deformations for measurement by electronic shearography were to be determined.
Probability of detection of defects in coatings with electronic shearography
NASA Technical Reports Server (NTRS)
Maddux, Gary A.; Horton, Charles M.; Lansing, Matthew D.; Gnacek, William J.; Newton, Patrick L.
1994-01-01
The goal of this research was to utilize statistical methods to evaluate the probability of detection (POD) of defects in coatings using electronic shearography. The coating system utilized in the POD studies was to be the paint system currently utilized on the external casings of the NASA Space Transportation System (STS) Revised Solid Rocket Motor (RSRM) boosters. The population of samples was to be large enough to determine the minimum defect size for 90 percent probability of detection of 95 percent confidence POD on these coatings. Also, the best methods to excite coatings on aerospace components to induce deformations for measurement by electronic shearography were to be determined.
Polynomial chaos representation of databases on manifolds
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu
2017-04-15
Characterizing the polynomial chaos expansion (PCE) of a vector-valued random variable with probability distribution concentrated on a manifold is a relevant problem in data-driven settings. The probability distribution of such random vectors is multimodal in general, leading to potentially very slow convergence of the PCE. In this paper, we build on a recent development for estimating and sampling from probabilities concentrated on a diffusion manifold. The proposed methodology constructs a PCE of the random vector together with an associated generator that samples from the target probability distribution which is estimated from data concentrated in the neighborhood of the manifold. Themore » method is robust and remains efficient for high dimension and large datasets. The resulting polynomial chaos construction on manifolds permits the adaptation of many uncertainty quantification and statistical tools to emerging questions motivated by data-driven queries.« less
Change-in-ratio methods for estimating population size
Udevitz, Mark S.; Pollock, Kenneth H.; McCullough, Dale R.; Barrett, Reginald H.
2002-01-01
Change-in-ratio (CIR) methods can provide an effective, low cost approach for estimating the size of wildlife populations. They rely on being able to observe changes in proportions of population subclasses that result from the removal of a known number of individuals from the population. These methods were first introduced in the 1940’s to estimate the size of populations with 2 subclasses under the assumption of equal subclass encounter probabilities. Over the next 40 years, closed population CIR models were developed to consider additional subclasses and use additional sampling periods. Models with assumptions about how encounter probabilities vary over time, rather than between subclasses, also received some attention. Recently, all of these CIR models have been shown to be special cases of a more general model. Under the general model, information from additional samples can be used to test assumptions about the encounter probabilities and to provide estimates of subclass sizes under relaxations of these assumptions. These developments have greatly extended the applicability of the methods. CIR methods are attractive because they do not require the marking of individuals, and subclass proportions often can be estimated with relatively simple sampling procedures. However, CIR methods require a carefully monitored removal of individuals from the population, and the estimates will be of poor quality unless the removals induce substantial changes in subclass proportions. In this paper, we review the state of the art for closed population estimation with CIR methods. Our emphasis is on the assumptions of CIR methods and on identifying situations where these methods are likely to be effective. We also identify some important areas for future CIR research.
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerab!e progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described by rate constants. These problems are isomorphic with chemical kinetics problems. Recently, several efficient techniques for this purpose have been developed based on the approach originally proposed by Gillespie. Although the utility of the techniques mentioned above for Bayesian problems has not been determined, further research along these lines is warranted
Williams, M S; Ebel, E D; Cao, Y
2013-01-01
The fitting of statistical distributions to microbial sampling data is a common application in quantitative microbiology and risk assessment applications. An underlying assumption of most fitting techniques is that data are collected with simple random sampling, which is often times not the case. This study develops a weighted maximum likelihood estimation framework that is appropriate for microbiological samples that are collected with unequal probabilities of selection. A weighted maximum likelihood estimation framework is proposed for microbiological samples that are collected with unequal probabilities of selection. Two examples, based on the collection of food samples during processing, are provided to demonstrate the method and highlight the magnitude of biases in the maximum likelihood estimator when data are inappropriately treated as a simple random sample. Failure to properly weight samples to account for how data are collected can introduce substantial biases into inferences drawn from the data. The proposed methodology will reduce or eliminate an important source of bias in inferences drawn from the analysis of microbial data. This will also make comparisons between studies and the combination of results from different studies more reliable, which is important for risk assessment applications. © 2012 No claim to US Government works.
Mark J. Ducey; Jeffrey H. Gove; Harry T. Valentine
2008-01-01
Perpendicular distance sampling (PDS) is a fast probability-proportional-to-size method for inventory of downed wood. However, previous development of PDS had limited the method to estimating only one variable (such as volume per hectare, or surface area per hectare) at a time. Here, we develop a general design-unbiased estimator for PDS. We then show how that...
De Boni, Raquel; do Nascimento Silva, Pedro Luis; Bastos, Francisco Inácio; Pechansky, Flavio; de Vasconcellos, Mauricio Teixeira Leite
2012-01-01
Drinking alcoholic beverages in places such as bars and clubs may be associated with harmful consequences such as violence and impaired driving. However, methods for obtaining probabilistic samples of drivers who drink at these places remain a challenge – since there is no a priori information on this mobile population – and must be continually improved. This paper describes the procedures adopted in the selection of a population-based sample of drivers who drank at alcohol selling outlets in Porto Alegre, Brazil, which we used to estimate the prevalence of intention to drive under the influence of alcohol. The sampling strategy comprises a stratified three-stage cluster sampling: 1) census enumeration areas (CEA) were stratified by alcohol outlets (AO) density and sampled with probability proportional to the number of AOs in each CEA; 2) combinations of outlets and shifts (COS) were stratified by prevalence of alcohol-related traffic crashes and sampled with probability proportional to their squared duration in hours; and, 3) drivers who drank at the selected COS were stratified by their intention to drive and sampled using inverse sampling. Sample weights were calibrated using a post-stratification estimator. 3,118 individuals were approached and 683 drivers interviewed, leading to an estimate that 56.3% (SE = 3,5%) of the drivers intended to drive after drinking in less than one hour after the interview. Prevalence was also estimated by sex and broad age groups. The combined use of stratification and inverse sampling enabled a good trade-off between resource and time allocation, while preserving the ability to generalize the findings. The current strategy can be viewed as a step forward in the efforts to improve surveys and estimation for hard-to-reach, mobile populations. PMID:22514620
Sample Selection in Randomized Experiments: A New Method Using Propensity Score Stratified Sampling
ERIC Educational Resources Information Center
Tipton, Elizabeth; Hedges, Larry; Vaden-Kiernan, Michael; Borman, Geoffrey; Sullivan, Kate; Caverly, Sarah
2014-01-01
Randomized experiments are often seen as the "gold standard" for causal research. Despite the fact that experiments use random assignment to treatment conditions, units are seldom selected into the experiment using probability sampling. Very little research on experimental design has focused on how to make generalizations to well-defined…
Performance of Random Effects Model Estimators under Complex Sampling Designs
ERIC Educational Resources Information Center
Jia, Yue; Stokes, Lynne; Harris, Ian; Wang, Yan
2011-01-01
In this article, we consider estimation of parameters of random effects models from samples collected via complex multistage designs. Incorporation of sampling weights is one way to reduce estimation bias due to unequal probabilities of selection. Several weighting methods have been proposed in the literature for estimating the parameters of…
Monitoring Species of Concern Using Noninvasive Genetic Sampling and Capture-Recapture Methods
2016-11-01
ABBREVIATIONS AICc Akaike’s Information Criterion with small sample size correction AZGFD Arizona Game and Fish Department BMGR Barry M. Goldwater...MNKA Minimum Number Known Alive N Abundance Ne Effective Population Size NGS Noninvasive Genetic Sampling NGS-CR Noninvasive Genetic...parameter estimates from capture-recapture models require sufficient sample sizes , capture probabilities and low capture biases. For NGS-CR, sample
Validation of an automated fluorescein method for determining bromide in water
Fishman, M. J.; Schroder, L.J.; Friedman, L.C.
1985-01-01
Surface, atmospheric precipitation and deionized water samples were spiked with ??g l-1 concentrations of bromide, and the solutions stored in polyethylene and polytetrafluoroethylene bottles. Bromide was determined periodically for 30 days. Automated fluorescein and ion chromatography methods were used to determine bromide in these prepared samples. Analysis of the data by the paired t-test indicates that the two methods are not significantly different at a probability of 95% for samples containing from 0.015 to 0.5 mg l-1 of bromide. The correlation coefficient for the same sets of paired data is 0.9987. Recovery data, except for the surface water samples to which 0.005 mg l-1 of bromide was added, range from 89 to 112%. There appears to be no loss of bromide from solution in either type of container.Surface, atmospheric precipitation and deionized water samples were spiked with mu g l** minus **1 concentrations of bromide, and the solutions stored in polyethylene and polytetrafluoroethylene bottles. Bromide was determined periodically for 30 days. Automated fluorescein and ion chromatography methods were used to determine bromide in these prepared samples. Analysis of the data by the paired t-test indicates that the two methods are not significantly different at a probability of 95% for samples containing from 0. 015 to 0. 5 mg l** minus **1 of bromide. The correlation coefficient for the same sets of paired data is 0. 9987. Recovery data, except for the surface water samples to which 0. 005 mg l** minus **1 of bromide was added, range from 89 to 112%. Refs.
Durbin, Gregory W; Salter, Robert
2006-01-01
The Ecolite High Volume Juice (HVJ) presence-absence method for a 10-ml juice sample was compared with the U.S. Food and Drug Administration Bacteriological Analytical Manual most-probable-number (MPN) method for analysis of artificially contaminated orange juices. Samples were added to Ecolite-HVJ medium and incubated at 35 degrees C for 24 to 48 h. Fluorescent blue results were positive for glucuronidase- and galactosidase-producing microorganisms, specifically indicative of about 94% of Escherichia coli strains. Four strains of E. coli were added to juices at concentrations of 0.21 to 6.8 CFU/ ml. Mixtures of enteric bacteria (Enterobacter plus Klebsiella, Citrobacter plus Proteus, or Hafnia plus Citrobacter plus Enterobacter) were added to simulate background flora. Three orange juice types were evaluated (n = 10) with and without the addition of the E. coli strains. Ecolite-HVJ produced 90 of 90 (10 of 10 samples of three juice types, each inoculated with three different E. coli strains) positive (blue-fluorescent) results with artificially contaminated E. coli that had MPN concentrations of <0.3 to 9.3 CFU/ml. Ten of 30 E. coli ATCC 11229 samples with MPN concentrations of <0.3 CFU/ml were identified as positive with Ecolite-HVJ. Isolated colonies recovered from positive Ecolite-HVJ samples were confirmed biochemically as E. coli. Thirty (10 samples each of three juice types) negative (not fluorescent) results were obtained for samples contaminated with only enteric bacteria and for uninoculated control samples. A juice manufacturer evaluated citrus juice production with both the Ecolite-HVJ and Colicomplete methods and recorded identical negative results for 95 20-ml samples and identical positive results for 5 20-ml samples artificially contaminated with E. coli. The Ecolite-HVJ method requires no preenrichment and subsequent transfer steps, which makes it a simple and easy method for use by juice producers.
NASA Technical Reports Server (NTRS)
Ussery, Warren; Johnson, Kenneth; Walker, James; Rummel, Ward
2008-01-01
This slide presentation reviews the use of terahertz imaging and Backscatter Radiography in a probability of detection study of the foam on the external tank (ET) shedding and damaging the shuttle orbiter. Non-destructive Examination (NDE) is performed as one method of preventing critical foam debris during the launch. Conventional NDE methods for inspection of the foam are assessed and the deficiencies are reviewed. Two methods for NDE inspection are reviewed: Backscatter Radiography (BSX) and Terahertz (THZ) Imaging. The purpose of the Probability of Detection (POD) study was to assess performance and reliability of the use of BSX and or THZ as an appropriate NDE method. The study used a test article with inserted defects, and a sample of blanks included to test for false positives. The results of the POD study are reported.
[A quickly methodology for drug intelligence using profiling of illicit heroin samples].
Zhang, Jianxin; Chen, Cunyi
2012-07-01
The aim of the paper was to evaluate a link between two heroin seizures using a descriptive method. The system involved the derivation and gas chromatographic separation of samples followed by a fully automatic data analysis and transfer to a database. Comparisons used the square cosine function between two chromatograms assimilated to vectors. The method showed good discriminatory capabilities. The probability of false positives was extremely slight. In conclusion, this method proved to be efficient and reliable, which appeared suitable for estimating the links between illicit heroin samples.
Statistical methods for identifying and bounding a UXO target area or minefield
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKinstry, Craig A.; Pulsipher, Brent A.; Gilbert, Richard O.
2003-09-18
The sampling unit for minefield or UXO area characterization is typically represented by a geographical block or transect swath that lends itself to characterization by geophysical instrumentation such as mobile sensor arrays. New spatially based statistical survey methods and tools, more appropriate for these unique sampling units have been developed and implemented at PNNL (Visual Sample Plan software, ver. 2.0) with support from the US Department of Defense. Though originally developed to support UXO detection and removal efforts, these tools may also be used in current form or adapted to support demining efforts and aid in the development of newmore » sensors and detection technologies by explicitly incorporating both sampling and detection error in performance assessments. These tools may be used to (1) determine transect designs for detecting and bounding target areas of critical size, shape, and density of detectable items of interest with a specified confidence probability, (2) evaluate the probability that target areas of a specified size, shape and density have not been missed by a systematic or meandering transect survey, and (3) support post-removal verification by calculating the number of transects required to achieve a specified confidence probability that no UXO or mines have been missed.« less
Tran, Viet-Thi; Porcher, Raphael; Falissard, Bruno; Ravaud, Philippe
2016-12-01
To describe methods to determine sample sizes in surveys using open-ended questions and to assess how resampling methods can be used to determine data saturation in these surveys. We searched the literature for surveys with open-ended questions and assessed the methods used to determine sample size in 100 studies selected at random. Then, we used Monte Carlo simulations on data from a previous study on the burden of treatment to assess the probability of identifying new themes as a function of the number of patients recruited. In the literature, 85% of researchers used a convenience sample, with a median size of 167 participants (interquartile range [IQR] = 69-406). In our simulation study, the probability of identifying at least one new theme for the next included subject was 32%, 24%, and 12% after the inclusion of 30, 50, and 100 subjects, respectively. The inclusion of 150 participants at random resulted in the identification of 92% themes (IQR = 91-93%) identified in the original study. In our study, data saturation was most certainly reached for samples >150 participants. Our method may be used to determine when to continue the study to find new themes or stop because of futility. Copyright © 2016 Elsevier Inc. All rights reserved.
Ensemble-Biased Metadynamics: A Molecular Simulation Method to Sample Experimental Distributions
Marinelli, Fabrizio; Faraldo-Gómez, José D.
2015-01-01
We introduce an enhanced-sampling method for molecular dynamics (MD) simulations referred to as ensemble-biased metadynamics (EBMetaD). The method biases a conventional MD simulation to sample a molecular ensemble that is consistent with one or more probability distributions known a priori, e.g., experimental intramolecular distance distributions obtained by double electron-electron resonance or other spectroscopic techniques. To this end, EBMetaD adds an adaptive biasing potential throughout the simulation that discourages sampling of configurations inconsistent with the target probability distributions. The bias introduced is the minimum necessary to fulfill the target distributions, i.e., EBMetaD satisfies the maximum-entropy principle. Unlike other methods, EBMetaD does not require multiple simulation replicas or the introduction of Lagrange multipliers, and is therefore computationally efficient and straightforward in practice. We demonstrate the performance and accuracy of the method for a model system as well as for spin-labeled T4 lysozyme in explicit water, and show how EBMetaD reproduces three double electron-electron resonance distance distributions concurrently within a few tens of nanoseconds of simulation time. EBMetaD is integrated in the open-source PLUMED plug-in (www.plumed-code.org), and can be therefore readily used with multiple MD engines. PMID:26083917
Gray, Brian R.; Holland, Mark D.; Yi, Feng; Starcevich, Leigh Ann Harrod
2013-01-01
Site occupancy models are commonly used by ecologists to estimate the probabilities of species site occupancy and of species detection. This study addresses the influence on site occupancy and detection estimates of variation in species availability among surveys within sites. Such variation in availability may result from temporary emigration, nonavailability of the species for detection, and sampling sites spatially when species presence is not uniform within sites. We demonstrate, using Monte Carlo simulations and aquatic vegetation data, that variation in availability and heterogeneity in the probability of availability may yield biases in the expected values of the site occupancy and detection estimates that have traditionally been associated with low-detection probabilities and heterogeneity in those probabilities. These findings confirm that the effects of availability may be important for ecologists and managers, and that where such effects are expected, modification of sampling designs and/or analytical methods should be considered. Failure to limit the effects of availability may preclude reliable estimation of the probability of site occupancy.
Serang, Oliver; MacCoss, Michael J.; Noble, William Stafford
2010-01-01
The problem of identifying proteins from a shotgun proteomics experiment has not been definitively solved. Identifying the proteins in a sample requires ranking them, ideally with interpretable scores. In particular, “degenerate” peptides, which map to multiple proteins, have made such a ranking difficult to compute. The problem of computing posterior probabilities for the proteins, which can be interpreted as confidence in a protein’s presence, has been especially daunting. Previous approaches have either ignored the peptide degeneracy problem completely, addressed it by computing a heuristic set of proteins or heuristic posterior probabilities, or by estimating the posterior probabilities with sampling methods. We present a probabilistic model for protein identification in tandem mass spectrometry that recognizes peptide degeneracy. We then introduce graph-transforming algorithms that facilitate efficient computation of protein probabilities, even for large data sets. We evaluate our identification procedure on five different well-characterized data sets and demonstrate our ability to efficiently compute high-quality protein posteriors. PMID:20712337
Sampling design trade-offs in occupancy studies with imperfect detection: examples and software
Bailey, L.L.; Hines, J.E.; Nichols, J.D.
2007-01-01
Researchers have used occupancy, or probability of occupancy, as a response or state variable in a variety of studies (e.g., habitat modeling), and occupancy is increasingly favored by numerous state, federal, and international agencies engaged in monitoring programs. Recent advances in estimation methods have emphasized that reliable inferences can be made from these types of studies if detection and occupancy probabilities are simultaneously estimated. The need for temporal replication at sampled sites to estimate detection probability creates a trade-off between spatial replication (number of sample sites distributed within the area of interest/inference) and temporal replication (number of repeated surveys at each site). Here, we discuss a suite of questions commonly encountered during the design phase of occupancy studies, and we describe software (program GENPRES) developed to allow investigators to easily explore design trade-offs focused on particularities of their study system and sampling limitations. We illustrate the utility of program GENPRES using an amphibian example from Greater Yellowstone National Park, USA.
Convective Weather Forecast Quality Metrics for Air Traffic Management Decision-Making
NASA Technical Reports Server (NTRS)
Chatterji, Gano B.; Gyarfas, Brett; Chan, William N.; Meyn, Larry A.
2006-01-01
Since numerical weather prediction models are unable to accurately forecast the severity and the location of the storm cells several hours into the future when compared with observation data, there has been a growing interest in probabilistic description of convective weather. The classical approach for generating uncertainty bounds consists of integrating the state equations and covariance propagation equations forward in time. This step is readily recognized as the process update step of the Kalman Filter algorithm. The second well known method, known as the Monte Carlo method, consists of generating output samples by driving the forecast algorithm with input samples selected from distributions. The statistical properties of the distributions of the output samples are then used for defining the uncertainty bounds of the output variables. This method is computationally expensive for a complex model compared to the covariance propagation method. The main advantage of the Monte Carlo method is that a complex non-linear model can be easily handled. Recently, a few different methods for probabilistic forecasting have appeared in the literature. A method for computing probability of convection in a region using forecast data is described in Ref. 5. Probability at a grid location is computed as the fraction of grid points, within a box of specified dimensions around the grid location, with forecast convection precipitation exceeding a specified threshold. The main limitation of this method is that the results are dependent on the chosen dimensions of the box. The examples presented Ref. 5 show that this process is equivalent to low-pass filtering of the forecast data with a finite support spatial filter. References 6 and 7 describe the technique for computing percentage coverage within a 92 x 92 square-kilometer box and assigning the value to the center 4 x 4 square-kilometer box. This technique is same as that described in Ref. 5. Characterizing the forecast, following the process described in Refs. 5 through 7, in terms of percentage coverage or confidence level is notionally sound compared to characterizing in terms of probabilities because the probability of the forecast being correct can only be determined using actual observations. References 5 through 7 only use the forecast data and not the observations. The method for computing the probability of detection, false alarm ratio and several forecast quality metrics (Skill Scores) using both the forecast and observation data are given in Ref. 2. This paper extends the statistical verification method in Ref. 2 to determine co-occurrence probabilities. The method consists of computing the probability that a severe weather cell (grid location) is detected in the observation data in the neighborhood of the severe weather cell in the forecast data. Probabilities of occurrence at the grid location and in its neighborhood with higher severity, and with lower severity in the observation data compared to that in the forecast data are examined. The method proposed in Refs. 5 through 7 is used for computing the probability that a certain number of cells in the neighborhood of severe weather cells in the forecast data are seen as severe weather cells in the observation data. Finally, the probability of existence of gaps in the observation data in the neighborhood of severe weather cells in forecast data is computed. Gaps are defined as openings between severe weather cells through which an aircraft can safely fly to its intended destination. The rest of the paper is organized as follows. Section II summarizes the statistical verification method described in Ref. 2. The extension of this method for computing the co-occurrence probabilities in discussed in Section HI. Numerical examples using NCWF forecast data and NCWD observation data are presented in Section III to elucidate the characteristics of the co-occurrence probabilities. This section also discusses the procedure for computing throbabilities that the severity of convection in the observation data will be higher or lower in the neighborhood of grid locations compared to that indicated at the grid locations in the forecast data. The probability of coverage of neighborhood grid cells is also described via examples in this section. Section IV discusses the gap detection algorithm and presents a numerical example to illustrate the method. The locations of the detected gaps in the observation data are used along with the locations of convective weather cells in the forecast data to determine the probability of existence of gaps in the neighborhood of these cells. Finally, the paper is concluded in Section V.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Romero, Vicente; Bonney, Matthew; Schroeder, Benjamin
When very few samples of a random quantity are available from a source distribution of unknown shape, it is usually not possible to accurately infer the exact distribution from which the data samples come. Under-estimation of important quantities such as response variance and failure probabilities can result. For many engineering purposes, including design and risk analysis, we attempt to avoid under-estimation with a strategy to conservatively estimate (bound) these types of quantities -- without being overly conservative -- when only a few samples of a random quantity are available from model predictions or replicate experiments. This report examines a classmore » of related sparse-data uncertainty representation and inference approaches that are relatively simple, inexpensive, and effective. Tradeoffs between the methods' conservatism, reliability, and risk versus number of data samples (cost) are quantified with multi-attribute metrics use d to assess method performance for conservative estimation of two representative quantities: central 95% of response; and 10 -4 probability of exceeding a response threshold in a tail of the distribution. Each method's performance is characterized with 10,000 random trials on a large number of diverse and challenging distributions. The best method and number of samples to use in a given circumstance depends on the uncertainty quantity to be estimated, the PDF character, and the desired reliability of bounding the true value. On the basis of this large data base and study, a strategy is proposed for selecting the method and number of samples for attaining reasonable credibility levels in bounding these types of quantities when sparse samples of random variables or functions are available from experiments or simulations.« less
Wang, Dan; Silkie, Sarah S; Nelson, Kara L; Wuertz, Stefan
2010-09-01
Cultivation- and library-independent, quantitative PCR-based methods have become the method of choice in microbial source tracking. However, these qPCR assays are not 100% specific and sensitive for the target sequence in their respective hosts' genome. The factors that can lead to false positive and false negative information in qPCR results are well defined. It is highly desirable to have a way of removing such false information to estimate the true concentration of host-specific genetic markers and help guide the interpretation of environmental monitoring studies. Here we propose a statistical model based on the Law of Total Probability to predict the true concentration of these markers. The distributions of the probabilities of obtaining false information are estimated from representative fecal samples of known origin. Measurement error is derived from the sample precision error of replicated qPCR reactions. Then, the Monte Carlo method is applied to sample from these distributions of probabilities and measurement error. The set of equations given by the Law of Total Probability allows one to calculate the distribution of true concentrations, from which their expected value, confidence interval and other statistical characteristics can be easily evaluated. The output distributions of predicted true concentrations can then be used as input to watershed-wide total maximum daily load determinations, quantitative microbial risk assessment and other environmental models. This model was validated by both statistical simulations and real world samples. It was able to correct the intrinsic false information associated with qPCR assays and output the distribution of true concentrations of Bacteroidales for each animal host group. Model performance was strongly affected by the precision error. It could perform reliably and precisely when the standard deviation of the precision error was small (≤ 0.1). Further improvement on the precision of sample processing and qPCR reaction would greatly improve the performance of the model. This methodology, built upon Bacteroidales assays, is readily transferable to any other microbial source indicator where a universal assay for fecal sources of that indicator exists. Copyright © 2010 Elsevier Ltd. All rights reserved.
Use of Internet panels to conduct surveys.
Hays, Ron D; Liu, Honghu; Kapteyn, Arie
2015-09-01
The use of Internet panels to collect survey data is increasing because it is cost-effective, enables access to large and diverse samples quickly, takes less time than traditional methods to obtain data for analysis, and the standardization of the data collection process makes studies easy to replicate. A variety of probability-based panels have been created, including Telepanel/CentERpanel, Knowledge Networks (now GFK KnowledgePanel), the American Life Panel, the Longitudinal Internet Studies for the Social Sciences panel, and the Understanding America Study panel. Despite the advantage of having a known denominator (sampling frame), the probability-based Internet panels often have low recruitment participation rates, and some have argued that there is little practical difference between opting out of a probability sample and opting into a nonprobability (convenience) Internet panel. This article provides an overview of both probability-based and convenience panels, discussing potential benefits and cautions for each method, and summarizing the approaches used to weight panel respondents in order to better represent the underlying population. Challenges of using Internet panel data are discussed, including false answers, careless responses, giving the same answer repeatedly, getting multiple surveys from the same respondent, and panelists being members of multiple panels. More is to be learned about Internet panels generally and about Web-based data collection, as well as how to evaluate data collected using mobile devices and social-media platforms.
An information diffusion technique to assess integrated hazard risks.
Huang, Chongfu; Huang, Yundong
2018-02-01
An integrated risk is a scene in the future associated with some adverse incident caused by multiple hazards. An integrated probability risk is the expected value of disaster. Due to the difficulty of assessing an integrated probability risk with a small sample, weighting methods and copulas are employed to avoid this obstacle. To resolve the problem, in this paper, we develop the information diffusion technique to construct a joint probability distribution and a vulnerability surface. Then, an integrated risk can be directly assessed by using a small sample. A case of an integrated risk caused by flood and earthquake is given to show how the suggested technique is used to assess the integrated risk of annual property loss. Copyright © 2017 Elsevier Inc. All rights reserved.
He, Fu-yuan; Deng, Kai-wen; Huang, Sheng; Liu, Wen-long; Shi, Ji-lian
2013-09-01
The paper aims to elucidate and establish a new mathematic model: the total quantum statistical moment standard similarity (TQSMSS) on the base of the original total quantum statistical moment model and to illustrate the application of the model to medical theoretical research. The model was established combined with the statistical moment principle and the normal distribution probability density function properties, then validated and illustrated by the pharmacokinetics of three ingredients in Buyanghuanwu decoction and of three data analytical method for them, and by analysis of chromatographic fingerprint for various extracts with different solubility parameter solvents dissolving the Buyanghanwu-decoction extract. The established model consists of four mainly parameters: (1) total quantum statistical moment similarity as ST, an overlapped area by two normal distribution probability density curves in conversion of the two TQSM parameters; (2) total variability as DT, a confidence limit of standard normal accumulation probability which is equal to the absolute difference value between the two normal accumulation probabilities within integration of their curve nodical; (3) total variable probability as 1-Ss, standard normal distribution probability within interval of D(T); (4) total variable probability (1-beta)alpha and (5) stable confident probability beta(1-alpha): the correct probability to make positive and negative conclusions under confident coefficient alpha. With the model, we had analyzed the TQSMS similarities of pharmacokinetics of three ingredients in Buyanghuanwu decoction and of three data analytical methods for them were at range of 0.3852-0.9875 that illuminated different pharmacokinetic behaviors of each other; and the TQSMS similarities (ST) of chromatographic fingerprint for various extracts with different solubility parameter solvents dissolving Buyanghuanwu-decoction-extract were at range of 0.6842-0.999 2 that showed different constituents with various solvent extracts. The TQSMSS can characterize the sample similarity, by which we can quantitate the correct probability with the test of power under to make positive and negative conclusions no matter the samples come from same population under confident coefficient a or not, by which we can realize an analysis at both macroscopic and microcosmic levels, as an important similar analytical method for medical theoretical research.
Bayesian analysis of rare events
DOE Office of Scientific and Technical Information (OSTI.GOV)
Straub, Daniel, E-mail: straub@tum.de; Papaioannou, Iason; Betz, Wolfgang
2016-06-01
In many areas of engineering and science there is an interest in predicting the probability of rare events, in particular in applications related to safety and security. Increasingly, such predictions are made through computer models of physical systems in an uncertainty quantification framework. Additionally, with advances in IT, monitoring and sensor technology, an increasing amount of data on the performance of the systems is collected. This data can be used to reduce uncertainty, improve the probability estimates and consequently enhance the management of rare events and associated risks. Bayesian analysis is the ideal method to include the data into themore » probabilistic model. It ensures a consistent probabilistic treatment of uncertainty, which is central in the prediction of rare events, where extrapolation from the domain of observation is common. We present a framework for performing Bayesian updating of rare event probabilities, termed BUS. It is based on a reinterpretation of the classical rejection-sampling approach to Bayesian analysis, which enables the use of established methods for estimating probabilities of rare events. By drawing upon these methods, the framework makes use of their computational efficiency. These methods include the First-Order Reliability Method (FORM), tailored importance sampling (IS) methods and Subset Simulation (SuS). In this contribution, we briefly review these methods in the context of the BUS framework and investigate their applicability to Bayesian analysis of rare events in different settings. We find that, for some applications, FORM can be highly efficient and is surprisingly accurate, enabling Bayesian analysis of rare events with just a few model evaluations. In a general setting, BUS implemented through IS and SuS is more robust and flexible.« less
The relationship between species detection probability and local extinction probability
Alpizar-Jara, R.; Nichols, J.D.; Hines, J.E.; Sauer, J.R.; Pollock, K.H.; Rosenberry, C.S.
2004-01-01
In community-level ecological studies, generally not all species present in sampled areas are detected. Many authors have proposed the use of estimation methods that allow detection probabilities that are < 1 and that are heterogeneous among species. These methods can also be used to estimate community-dynamic parameters such as species local extinction probability and turnover rates (Nichols et al. Ecol Appl 8:1213-1225; Conserv Biol 12:1390-1398). Here, we present an ad hoc approach to estimating community-level vital rates in the presence of joint heterogeneity of detection probabilities and vital rates. The method consists of partitioning the number of species into two groups using the detection frequencies and then estimating vital rates (e.g., local extinction probabilities) for each group. Estimators from each group are combined in a weighted estimator of vital rates that accounts for the effect of heterogeneity. Using data from the North American Breeding Bird Survey, we computed such estimates and tested the hypothesis that detection probabilities and local extinction probabilities were negatively related. Our analyses support the hypothesis that species detection probability covaries negatively with local probability of extinction and turnover rates. A simulation study was conducted to assess the performance of vital parameter estimators as well as other estimators relevant to questions about heterogeneity, such as coefficient of variation of detection probabilities and proportion of species in each group. Both the weighted estimator suggested in this paper and the original unweighted estimator for local extinction probability performed fairly well and provided no basis for preferring one to the other.
Probability sampling in legal cases: Kansas cellphone users
NASA Astrophysics Data System (ADS)
Kadane, Joseph B.
2012-10-01
Probability sampling is a standard statistical technique. This article introduces the basic ideas of probability sampling, and shows in detail how probability sampling was used in a particular legal case.
Measurement of absolute gamma emission probabilities
NASA Astrophysics Data System (ADS)
Sumithrarachchi, Chandana S.; Rengan, Krish; Griffin, Henry C.
2003-06-01
The energies and emission probabilities (intensities) of gamma-rays emitted in radioactive decays of particular nuclides are the most important characteristics by which to quantify mixtures of radionuclides. Often, quantification is limited by uncertainties in measured intensities. A technique was developed to reduce these uncertainties. The method involves obtaining a pure sample of a nuclide using radiochemical techniques, and using appropriate fractions for beta and gamma measurements. The beta emission rates were measured using a liquid scintillation counter, and the gamma emission rates were measured with a high-purity germanium detector. Results were combined to obtain absolute gamma emission probabilities. All sources of uncertainties greater than 0.1% were examined. The method was tested with 38Cl and 88Rb.
Estimating Model Probabilities using Thermodynamic Markov Chain Monte Carlo Methods
NASA Astrophysics Data System (ADS)
Ye, M.; Liu, P.; Beerli, P.; Lu, D.; Hill, M. C.
2014-12-01
Markov chain Monte Carlo (MCMC) methods are widely used to evaluate model probability for quantifying model uncertainty. In a general procedure, MCMC simulations are first conducted for each individual model, and MCMC parameter samples are then used to approximate marginal likelihood of the model by calculating the geometric mean of the joint likelihood of the model and its parameters. It has been found the method of evaluating geometric mean suffers from the numerical problem of low convergence rate. A simple test case shows that even millions of MCMC samples are insufficient to yield accurate estimation of the marginal likelihood. To resolve this problem, a thermodynamic method is used to have multiple MCMC runs with different values of a heating coefficient between zero and one. When the heating coefficient is zero, the MCMC run is equivalent to a random walk MC in the prior parameter space; when the heating coefficient is one, the MCMC run is the conventional one. For a simple case with analytical form of the marginal likelihood, the thermodynamic method yields more accurate estimate than the method of using geometric mean. This is also demonstrated for a case of groundwater modeling with consideration of four alternative models postulated based on different conceptualization of a confining layer. This groundwater example shows that model probabilities estimated using the thermodynamic method are more reasonable than those obtained using the geometric method. The thermodynamic method is general, and can be used for a wide range of environmental problem for model uncertainty quantification.
Detection of a Serum Siderophore by LC-MS/MS as a Potential Biomarker of Invasive Aspergillosis
Carroll, Cassandra S.; Amankwa, Lawrence N.; Pinto, Linda J.; Fuller, Jeffrey D.; Moore, Margo M.
2016-01-01
Invasive aspergillosis (IA) is a life-threatening systemic mycosis caused primarily by Aspergillus fumigatus. Early diagnosis of IA is based, in part, on an immunoassay for circulating fungal cell wall carbohydrate, galactomannan (GM). However, a wide range of sensitivity and specificity rates have been reported for the GM test across various patient populations. To obtain iron in vivo, A. fumigatus secretes the siderophore, N,N',N"-triacetylfusarinine C (TAFC) and we hypothesize that TAFC may represent a possible biomarker for early detection of IA. We developed an ultra performance liquid chromatography tandem mass spectrometry (UPLC-MS/MS) method for TAFC analysis from serum, and measured TAFC in serum samples collected from patients at risk for IA. The method showed lower and upper limits of quantitation (LOQ) of 5 ng/ml and 750 ng/ml, respectively, and complete TAFC recovery from spiked serum. As proof of concept, we evaluated 76 serum samples from 58 patients with suspected IA that were investigated for the presence of GM. Fourteen serum samples obtained from 11 patients diagnosed with probable or proven IA were also analyzed for the presence of TAFC. Control sera (n = 16) were analyzed to establish a TAFC cut-off value (≥6 ng/ml). Of the 36 GM-positive samples (≥0.5 GM index) from suspected IA patients, TAFC was considered positive in 25 (69%). TAFC was also found in 28 additional GM-negative samples. TAFC was detected in 4 of the 14 samples (28%) from patients with proven/probable aspergillosis. Log-transformed TAFC and GM values from patients with proven/probable IA, healthy individuals and SLE patients showed a significant correlation with a Pearson r value of 0.77. In summary, we have developed a method for the detection of TAFC in serum that revealed this fungal product in the sera of patients at risk for invasive aspergillosis. A prospective study is warranted to determine whether this method provides improved early detection of IA. PMID:26974544
An Alternative View of Forest Sampling
Francis A. Roesch; Edwin J. Green; Charles T. Scott
1993-01-01
A generalized concept is presented for all of the commonly used methods of forest sampling. The concept views the forest as a two-dimensional picture which is cut up into pieces like a jigsaw puzzle, with the pieces defined by the individual selection probabilities of the trees in the forest. This concept results in a finite number of independently selected sample...
On Two-Stage Multiple Comparison Procedures When There Are Unequal Sample Sizes in the First Stage.
ERIC Educational Resources Information Center
Wilcox, Rand R.
1984-01-01
Two stage multiple-comparison procedures give an exact solution to problems of power and Type I errors, but require equal sample sizes in the first stage. This paper suggests a method of evaluating the experimentwise Type I error probability when the first stage has unequal sample sizes. (Author/BW)
Mohammadkhani, Parvaneh; Khanipour, Hamid; Azadmehr, Hedieh; Mobramm, Ardeshir; Naseri, Esmaeil
2015-01-01
The aim of this study was to evaluate suicide probability in Iranian males with substance abuse or dependence disorder and to investigate the predictors of suicide probability based on trait mindfulness, reasons for living and severity of general psychiatric symptoms. Participants were 324 individuals with substance abuse or dependence in an outpatient setting and prison. Reasons for living questionnaire, Mindfulness Attention Awareness Scale and Suicide probability Scale were used as instruments. Sample was selected based on convenience sampling method. Data were analyzed using SPSS and AMOS. The life-time prevalence of suicide attempt in the outpatient setting was35% and it was 42% in the prison setting. Suicide probability in the prison setting was significantly higher than in the outpatient setting (p<0.001). The severity of general symptom strongly correlated with suicide probability. Trait mindfulness, not reasons for living beliefs, had a mediating effect in the relationship between the severity of general symptoms and suicide probability. Fear of social disapproval, survival and coping beliefs and child-related concerns significantly predicted suicide probability (p<0.001). It could be suggested that trait mindfulness was more effective in preventing suicide probability than beliefs about reasons for living in individuals with substance abuse or dependence disorders. The severity of general symptom should be regarded as an important risk factor of suicide probability.
Jimenez, Maribel; Chaidez, Cristobal
2012-07-01
Monitoring of waterborne pathogens is improved by using concentration methods prior to detection; however, direct microbial enumeration is desired to study microbial ecology and human health risks. The aim of this work was to determine Salmonella presence in river water with an ultrafiltration system coupled with the ISO 6579:1993 isolation standard method (UFS-ISO). Most probable number (MPN) method was used directly in water samples to estimate Salmonella populations. Additionally, the effect between Salmonella determination and water turbidity was evaluated. Ten liters or three tenfold dilutions (1, 0.1, and 0.01 mL) of water were processed for Salmonella detection and estimation by the UFS-ISO and MPN methods, respectively. A total of 84 water samples were tested, and Salmonella was confirmed in 64/84 (76%) and 38/84 (44%) when UFS-ISO and MPN were used, respectively. Salmonella populations were less than 5 × 10(3) MPN/L in 73/84 of samples evaluated (87%), and only three (3.5%) showed contamination with numbers greater than 4.5 × 10(4) MPN/L. Water turbidity did not affect Salmonella determination regardless of the performed method. These findings suggest that Salmonella abundance in Sinaloa rivers is not a health risk for human infections in spite of its persistence. Thus, choosing the appropriate strategy to study Salmonella in river water samples is necessary to clarify its behavior and transport in the environment.
A linear programming model for protein inference problem in shotgun proteomics.
Huang, Ting; He, Zengyou
2012-11-15
Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a formula in terms of the linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named as ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. zyhe@dlut.edu.cn. Supplementary data are available at Bioinformatics Online.
Evaluating impacts using a BACI design, ratios, and a Bayesian approach with a focus on restoration.
Conner, Mary M; Saunders, W Carl; Bouwes, Nicolaas; Jordan, Chris
2015-10-01
Before-after-control-impact (BACI) designs are an effective method to evaluate natural and human-induced perturbations on ecological variables when treatment sites cannot be randomly chosen. While effect sizes of interest can be tested with frequentist methods, using Bayesian Markov chain Monte Carlo (MCMC) sampling methods, probabilities of effect sizes, such as a ≥20 % increase in density after restoration, can be directly estimated. Although BACI and Bayesian methods are used widely for assessing natural and human-induced impacts for field experiments, the application of hierarchal Bayesian modeling with MCMC sampling to BACI designs is less common. Here, we combine these approaches and extend the typical presentation of results with an easy to interpret ratio, which provides an answer to the main study question-"How much impact did a management action or natural perturbation have?" As an example of this approach, we evaluate the impact of a restoration project, which implemented beaver dam analogs, on survival and density of juvenile steelhead. Results indicated the probabilities of a ≥30 % increase were high for survival and density after the dams were installed, 0.88 and 0.99, respectively, while probabilities for a higher increase of ≥50 % were variable, 0.17 and 0.82, respectively. This approach demonstrates a useful extension of Bayesian methods that can easily be generalized to other study designs from simple (e.g., single factor ANOVA, paired t test) to more complicated block designs (e.g., crossover, split-plot). This approach is valuable for estimating the probabilities of restoration impacts or other management actions.
Löfström, Charlotta; Knutsson, Rickard; Axelsson, Charlotta Engdahl; Rådström, Peter
2004-01-01
A PCR procedure has been developed for routine analysis of viable Salmonella spp. in feed samples. The objective was to develop a simple PCR-compatible enrichment procedure to enable DNA amplification without any sample pretreatment such as DNA extraction or cell lysis. PCR inhibition by 14 different feed samples and natural background flora was circumvented by the use of the DNA polymerase Tth. This DNA polymerase was found to exhibit a high level of resistance to PCR inhibitors present in these feed samples compared to DyNAzyme II, FastStart Taq, Platinum Taq, Pwo, rTth, Taq, and Tfl. The specificity of the Tth assay was confirmed by testing 101 Salmonella and 43 non-Salmonella strains isolated from feed and food samples. A sample preparation method based on culture enrichment in buffered peptone water and DNA amplification with Tth DNA polymerase was developed. The probability of detecting small numbers of salmonellae in feed, in the presence of natural background flora, was accurately determined and found to follow a logistic regression model. From this model, the probability of detecting 1 CFU per 25 g of feed in artificially contaminated soy samples was calculated and found to be 0.81. The PCR protocol was evaluated on 155 naturally contaminated feed samples and compared to an established culture-based method, NMKL-71. Eight percent of the samples were positive by PCR, compared with 3% with the conventional method. The reasons for the differences in sensitivity are discussed. Use of this method in the routine analysis of animal feed samples would improve safety in the food chain. PMID:14711627
[Determination of wine original regions using information fusion of NIR and MIR spectroscopy].
Xiang, Ling-Li; Li, Meng-Hua; Li, Jing-Mingz; Li, Jun-Hui; Zhang, Lu-Da; Zhao, Long-Lian
2014-10-01
Geographical origins of wine grapes are significant factors affecting wine quality and wine prices. Tasters' evaluation is a good method but has some limitations. It is important to discriminate different wine original regions quickly and accurately. The present paper proposed a method to determine wine original regions based on Bayesian information fusion that fused near-infrared (NIR) transmission spectra information and mid-infrared (MIR) ATR spectra information of wines. This method improved the determination results by expanding the sources of analysis information. NIR spectra and MIR spectra of 153 wine samples from four different regions of grape growing were collected by near-infrared and mid-infrared Fourier transform spe trometer separately. These four different regions are Huailai, Yantai, Gansu and Changli, which areall typical geographical originals for Chinese wines. NIR and MIR discriminant models for wine regions were established using partial least squares discriminant analysis (PLS-DA) based on NIR spectra and MIR spectra separately. In PLS-DA, the regions of wine samples are presented in group of binary code. There are four wine regions in this paper, thereby using four nodes standing for categorical variables. The output nodes values for each sample in NIR and MIR models were normalized first. These values stand for the probabilities of each sample belonging to each category. They seemed as the input to the Bayesian discriminant formula as a priori probability value. The probabilities were substituteed into the Bayesian formula to get posterior probabilities, by which we can judge the new class characteristics of these samples. Considering the stability of PLS-DA models, all the wine samples were divided into calibration sets and validation sets randomly for ten times. The results of NIR and MIR discriminant models of four wine regions were as follows: the average accuracy rates of calibration sets were 78.21% (NIR) and 82.57% (MIR), and the average accuracy rates of validation sets were 82.50% (NIR) and 81.98% (MIR). After using the method proposed in this paper, the accuracy rates of calibration and validation changed to 87.11% and 90.87% separately, which all achieved better results of determination than individual spectroscopy. These results suggest that Bayesian information fusion of NIR and MIR spectra is feasible for fast identification of wine original regions.
Determining the authenticity of athlete urine in doping control by DNA analysis.
Devesse, Laurence; Syndercombe Court, Denise; Cowan, David
2015-10-01
The integrity of urine samples collected from athletes for doping control is essential. The authenticity of samples may be contested, leading to the need for a robust sample identification method. DNA typing using short tandem repeats (STR) can be used for identification purposes, but its application to cellular DNA in urine has so far been limited. Here, a reliable and accurate method is reported for the successful identification of urine samples, using reduced final extraction volumes and the STR multiplex kit, Promega® PowerPlex ESI 17, with capillary electrophoretic characterisation of the alleles. Full DNA profiles were obtained for all samples (n = 20) stored for less than 2 days at 4 °C. The effect of different storage conditions on yield of cellular DNA and probability of obtaining a full profile were also investigated. Storage for 21 days at 4 °C resulted in allelic drop-out in some samples, but the random match probabilities obtained demonstrate the high power of discrimination achieved through targeting a large number of STRs. The best solution for long-term storage was centrifugation and removal of supernatant prior to freezing at -20 °C. The method is robust enough for incorporation into current anti-doping protocols, and was successfully applied to 44 athlete samples for anti-doping testing with 100% concordant typing. Copyright © 2015 John Wiley & Sons, Ltd.
Probability machines: consistent probability estimation using nonparametric learning machines.
Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A
2012-01-01
Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
POF-Darts: Geometric adaptive sampling for probability of failure
Ebeida, Mohamed S.; Mitchell, Scott A.; Swiler, Laura P.; ...
2016-06-18
We introduce a novel technique, POF-Darts, to estimate the Probability Of Failure based on random disk-packing in the uncertain parameter space. POF-Darts uses hyperplane sampling to explore the unexplored part of the uncertain space. We use the function evaluation at a sample point to determine whether it belongs to failure or non-failure regions, and surround it with a protection sphere region to avoid clustering. We decompose the domain into Voronoi cells around the function evaluations as seeds and choose the radius of the protection sphere depending on the local Lipschitz continuity. As sampling proceeds, regions uncovered with spheres will shrink,more » improving the estimation accuracy. After exhausting the function evaluation budget, we build a surrogate model using the function evaluations associated with the sample points and estimate the probability of failure by exhaustive sampling of that surrogate. In comparison to other similar methods, our algorithm has the advantages of decoupling the sampling step from the surrogate construction one, the ability to reach target POF values with fewer samples, and the capability of estimating the number and locations of disconnected failure regions, not just the POF value. Furthermore, we present various examples to demonstrate the efficiency of our novel approach.« less
Martinez, Marie-José; Durand, Benoit; Calavas, Didier; Ducrot, Christian
2010-06-01
Demonstrating disease freedom is becoming important in different fields including animal disease control. Most methods consider sampling only from a homogeneous population in which each animal has the same probability of becoming infected. In this paper, we propose a new methodology to calculate the probability of detecting the disease if it is present in a heterogeneous population of small size with potentially different risk groups, differences in risk being defined using relative risks. To calculate this probability, for each possible arrangement of the infected animals in the different groups, the probability that all the animals tested are test-negative given this arrangement is multiplied by the probability that this arrangement occurs. The probability formula is developed using the assumption of a perfect test and hypergeometric sampling for finite small size populations. The methodology is applied to scrapie, a disease affecting small ruminants and characterized in sheep by a strong genetic susceptibility defining different risk groups. It illustrates that the genotypes of the tested animals influence heavily the confidence level of detecting scrapie. The results present the statistical power for substantiating disease freedom in a small heterogeneous population as a function of the design prevalence, the structure of the sample tested, the structure of the herd and the associated relative risks. (c) 2010 Elsevier B.V. All rights reserved.
Probability bounds analysis for nonlinear population ecology models.
Enszer, Joshua A; Andrei Măceș, D; Stadtherr, Mark A
2015-09-01
Mathematical models in population ecology often involve parameters that are empirically determined and inherently uncertain, with probability distributions for the uncertainties not known precisely. Propagating such imprecise uncertainties rigorously through a model to determine their effect on model outputs can be a challenging problem. We illustrate here a method for the direct propagation of uncertainties represented by probability bounds though nonlinear, continuous-time, dynamic models in population ecology. This makes it possible to determine rigorous bounds on the probability that some specified outcome for a population is achieved, which can be a core problem in ecosystem modeling for risk assessment and management. Results can be obtained at a computational cost that is considerably less than that required by statistical sampling methods such as Monte Carlo analysis. The method is demonstrated using three example systems, with focus on a model of an experimental aquatic food web subject to the effects of contamination by ionic liquids, a new class of potentially important industrial chemicals. Copyright © 2015. Published by Elsevier Inc.
Aitken, C G
1999-07-01
It is thought that, in a consignment of discrete units, a certain proportion of the units contain illegal material. A sample of the consignment is to be inspected. Various methods for the determination of the sample size are compared. The consignment will be considered as a random sample from some super-population of units, a certain proportion of which contain drugs. For large consignments, a probability distribution, known as the beta distribution, for the proportion of the consignment which contains illegal material is obtained. This distribution is based on prior beliefs about the proportion. Under certain specific conditions the beta distribution gives the same numerical results as an approach based on the binomial distribution. The binomial distribution provides a probability for the number of units in a sample which contain illegal material, conditional on knowing the proportion of the consignment which contains illegal material. This is in contrast to the beta distribution which provides probabilities for the proportion of a consignment which contains illegal material, conditional on knowing the number of units in the sample which contain illegal material. The interpretation when the beta distribution is used is much more intuitively satisfactory. It is also much more flexible in its ability to cater for prior beliefs which may vary given the different circumstances of different crimes. For small consignments, a distribution, known as the beta-binomial distribution, for the number of units in the consignment which are found to contain illegal material, is obtained, based on prior beliefs about the number of units in the consignment which are thought to contain illegal material. As with the beta and binomial distributions for large samples, it is shown that, in certain specific conditions, the beta-binomial and hypergeometric distributions give the same numerical results. However, the beta-binomial distribution, as with the beta distribution, has a more intuitively satisfactory interpretation and greater flexibility. The beta and the beta-binomial distributions provide methods for the determination of the minimum sample size to be taken from a consignment in order to satisfy a certain criterion. The criterion requires the specification of a proportion and a probability.
Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos
2014-04-01
This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.
Erus, Guray; Zacharaki, Evangelia I.; Davatzikos, Christos
2014-01-01
This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a “target-specific” feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject’s images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an “estimability” criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. PMID:24607564
Latin hypercube approach to estimate uncertainty in ground water vulnerability
Gurdak, J.J.; McCray, J.E.; Thyne, G.; Qi, S.L.
2007-01-01
A methodology is proposed to quantify prediction uncertainty associated with ground water vulnerability models that were developed through an approach that coupled multivariate logistic regression with a geographic information system (GIS). This method uses Latin hypercube sampling (LHS) to illustrate the propagation of input error and estimate uncertainty associated with the logistic regression predictions of ground water vulnerability. Central to the proposed method is the assumption that prediction uncertainty in ground water vulnerability models is a function of input error propagation from uncertainty in the estimated logistic regression model coefficients (model error) and the values of explanatory variables represented in the GIS (data error). Input probability distributions that represent both model and data error sources of uncertainty were simultaneously sampled using a Latin hypercube approach with logistic regression calculations of probability of elevated nonpoint source contaminants in ground water. The resulting probability distribution represents the prediction intervals and associated uncertainty of the ground water vulnerability predictions. The method is illustrated through a ground water vulnerability assessment of the High Plains regional aquifer. Results of the LHS simulations reveal significant prediction uncertainties that vary spatially across the regional aquifer. Additionally, the proposed method enables a spatial deconstruction of the prediction uncertainty that can lead to improved prediction of ground water vulnerability. ?? 2007 National Ground Water Association.
Perry, Russell W.; Kirsch, Joseph E.; Hendrix, A. Noble
2016-06-17
Resource managers rely on abundance or density metrics derived from beach seine surveys to make vital decisions that affect fish population dynamics and assemblage structure. However, abundance and density metrics may be biased by imperfect capture and lack of geographic closure during sampling. Currently, there is considerable uncertainty about the capture efficiency of juvenile Chinook salmon (Oncorhynchus tshawytscha) by beach seines. Heterogeneity in capture can occur through unrealistic assumptions of closure and from variation in the probability of capture caused by environmental conditions. We evaluated the assumptions of closure and the influence of environmental conditions on capture efficiency and abundance estimates of Chinook salmon from beach seining within the Sacramento–San Joaquin Delta and the San Francisco Bay. Beach seine capture efficiency was measured using a stratified random sampling design combined with open and closed replicate depletion sampling. A total of 56 samples were collected during the spring of 2014. To assess variability in capture probability and the absolute abundance of juvenile Chinook salmon, beach seine capture efficiency data were fitted to the paired depletion design using modified N-mixture models. These models allowed us to explicitly test the closure assumption and estimate environmental effects on the probability of capture. We determined that our updated method allowing for lack of closure between depletion samples drastically outperformed traditional data analysis that assumes closure among replicate samples. The best-fit model (lowest-valued Akaike Information Criterion model) included the probability of fish being available for capture (relaxed closure assumption), capture probability modeled as a function of water velocity and percent coverage of fine sediment, and abundance modeled as a function of sample area, temperature, and water velocity. Given that beach seining is a ubiquitous sampling technique for many species, our improved sampling design and analysis could provide significant improvements in density and abundance estimation.
The exact probability distribution of the rank product statistics for replicated experiments.
Eisinga, Rob; Breitling, Rainer; Heskes, Tom
2013-03-18
The rank product method is a widely accepted technique for detecting differentially regulated genes in replicated microarray experiments. To approximate the sampling distribution of the rank product statistic, the original publication proposed a permutation approach, whereas recently an alternative approximation based on the continuous gamma distribution was suggested. However, both approximations are imperfect for estimating small tail probabilities. In this paper we relate the rank product statistic to number theory and provide a derivation of its exact probability distribution and the true tail probabilities. Copyright © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Taylor M. Wilcox; Kevin S. McKelvey; Michael K. Young; Adam J. Sepulveda; Bradley B. Shepard; Stephen F. Jane; Andrew R. Whiteley; Winsor H. Lowe; Michael K. Schwartz
2016-01-01
Environmental DNA sampling (eDNA) has emerged as a powerful tool for detecting aquatic animals. Previous research suggests that eDNA methods are substantially more sensitive than traditional sampling. However, the factors influencing eDNA detection and the resulting sampling costs are still not well understood. Here we use multiple experiments to derive...
Rare Event Simulation in Radiation Transport
NASA Astrophysics Data System (ADS)
Kollman, Craig
This dissertation studies methods for estimating extremely small probabilities by Monte Carlo simulation. Problems in radiation transport typically involve estimating very rare events or the expected value of a random variable which is with overwhelming probability equal to zero. These problems often have high dimensional state spaces and irregular geometries so that analytic solutions are not possible. Monte Carlo simulation must be used to estimate the radiation dosage being transported to a particular location. If the area is well shielded the probability of any one particular particle getting through is very small. Because of the large number of particles involved, even a tiny fraction penetrating the shield may represent an unacceptable level of radiation. It therefore becomes critical to be able to accurately estimate this extremely small probability. Importance sampling is a well known technique for improving the efficiency of rare event calculations. Here, a new set of probabilities is used in the simulation runs. The results are multiplied by the likelihood ratio between the true and simulated probabilities so as to keep our estimator unbiased. The variance of the resulting estimator is very sensitive to which new set of transition probabilities are chosen. It is shown that a zero variance estimator does exist, but that its computation requires exact knowledge of the solution. A simple random walk with an associated killing model for the scatter of neutrons is introduced. Large deviation results for optimal importance sampling in random walks are extended to the case where killing is present. An adaptive "learning" algorithm for implementing importance sampling is given for more general Markov chain models of neutron scatter. For finite state spaces this algorithm is shown to give, with probability one, a sequence of estimates converging exponentially fast to the true solution. In the final chapter, an attempt to generalize this algorithm to a continuous state space is made. This involves partitioning the space into a finite number of cells. There is a tradeoff between additional computation per iteration and variance reduction per iteration that arises in determining the optimal grid size. All versions of this algorithm can be thought of as a compromise between deterministic and Monte Carlo methods, capturing advantages of both techniques.
Prah, Philip; Hickson, Ford; Bonell, Chris; McDaid, Lisa M; Johnson, Anne M; Wayal, Sonali; Clifton, Soazig; Sonnenberg, Pam; Nardone, Anthony; Erens, Bob; Copas, Andrew J; Riddell, Julie; Weatherburn, Peter; Mercer, Catherine H
2016-09-01
To examine sociodemographic and behavioural differences between men who have sex with men (MSM) participating in recent UK convenience surveys and a national probability sample survey. We compared 148 MSM aged 18-64 years interviewed for Britain's third National Survey of Sexual Attitudes and Lifestyles (Natsal-3) undertaken in 2010-2012, with men in the same age range participating in contemporaneous convenience surveys of MSM: 15 500 British resident men in the European MSM Internet Survey (EMIS); 797 in the London Gay Men's Sexual Health Survey; and 1234 in Scotland's Gay Men's Sexual Health Survey. Analyses compared men reporting at least one male sexual partner (past year) on similarly worded questions and multivariable analyses accounted for sociodemographic differences between the surveys. MSM in convenience surveys were younger and better educated than MSM in Natsal-3, and a larger proportion identified as gay (85%-95% vs 62%). Partner numbers were higher and same-sex anal sex more common in convenience surveys. Unprotected anal intercourse was more commonly reported in EMIS. Compared with Natsal-3, MSM in convenience surveys were more likely to report gonorrhoea diagnoses and HIV testing (both past year). Differences between the samples were reduced when restricting analysis to gay-identifying MSM. National probability surveys better reflect the population of MSM but are limited by their smaller samples of MSM. Convenience surveys recruit larger samples of MSM but tend to over-represent MSM identifying as gay and reporting more sexual risk behaviours. Because both sampling strategies have strengths and weaknesses, methods are needed to triangulate data from probability and convenience surveys. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Constructing inverse probability weights for continuous exposures: a comparison of methods.
Naimi, Ashley I; Moodie, Erica E M; Auger, Nathalie; Kaufman, Jay S
2014-03-01
Inverse probability-weighted marginal structural models with binary exposures are common in epidemiology. Constructing inverse probability weights for a continuous exposure can be complicated by the presence of outliers, and the need to identify a parametric form for the exposure and account for nonconstant exposure variance. We explored the performance of various methods to construct inverse probability weights for continuous exposures using Monte Carlo simulation. We generated two continuous exposures and binary outcomes using data sampled from a large empirical cohort. The first exposure followed a normal distribution with homoscedastic variance. The second exposure followed a contaminated Poisson distribution, with heteroscedastic variance equal to the conditional mean. We assessed six methods to construct inverse probability weights using: a normal distribution, a normal distribution with heteroscedastic variance, a truncated normal distribution with heteroscedastic variance, a gamma distribution, a t distribution (1, 3, and 5 degrees of freedom), and a quantile binning approach (based on 10, 15, and 20 exposure categories). We estimated the marginal odds ratio for a single-unit increase in each simulated exposure in a regression model weighted by the inverse probability weights constructed using each approach, and then computed the bias and mean squared error for each method. For the homoscedastic exposure, the standard normal, gamma, and quantile binning approaches performed best. For the heteroscedastic exposure, the quantile binning, gamma, and heteroscedastic normal approaches performed best. Our results suggest that the quantile binning approach is a simple and versatile way to construct inverse probability weights for continuous exposures.
Xu, Jason; Minin, Vladimir N
2015-07-01
Branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous applications. A general difficulty in statistical inference under partially observed CTMC models arises in computing transition probabilities when the discrete state space is large or uncountable. Classical methods such as matrix exponentiation are infeasible for large or countably infinite state spaces, and sampling-based alternatives are computationally intensive, requiring integration over all possible hidden events. Recent work has successfully applied generating function techniques to computing transition probabilities for linear multi-type branching processes. While these techniques often require significantly fewer computations than matrix exponentiation, they also become prohibitive in applications with large populations. We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. We demonstrate accurate and efficient transition probability computations in branching process models for blood cell formation and evolution of self-replicating transposable elements in bacterial genomes.
Xu, Jason; Minin, Vladimir N.
2016-01-01
Branching processes are a class of continuous-time Markov chains (CTMCs) with ubiquitous applications. A general difficulty in statistical inference under partially observed CTMC models arises in computing transition probabilities when the discrete state space is large or uncountable. Classical methods such as matrix exponentiation are infeasible for large or countably infinite state spaces, and sampling-based alternatives are computationally intensive, requiring integration over all possible hidden events. Recent work has successfully applied generating function techniques to computing transition probabilities for linear multi-type branching processes. While these techniques often require significantly fewer computations than matrix exponentiation, they also become prohibitive in applications with large populations. We propose a compressed sensing framework that significantly accelerates the generating function method, decreasing computational cost up to a logarithmic factor by only assuming the probability mass of transitions is sparse. We demonstrate accurate and efficient transition probability computations in branching process models for blood cell formation and evolution of self-replicating transposable elements in bacterial genomes. PMID:26949377
Sampling probability distributions of lesions in mammograms
NASA Astrophysics Data System (ADS)
Looney, P.; Warren, L. M.; Dance, D. R.; Young, K. C.
2015-03-01
One approach to image perception studies in mammography using virtual clinical trials involves the insertion of simulated lesions into normal mammograms. To facilitate this, a method has been developed that allows for sampling of lesion positions across the cranio-caudal and medio-lateral radiographic projections in accordance with measured distributions of real lesion locations. 6825 mammograms from our mammography image database were segmented to find the breast outline. The outlines were averaged and smoothed to produce an average outline for each laterality and radiographic projection. Lesions in 3304 mammograms with malignant findings were mapped on to a standardised breast image corresponding to the average breast outline using piecewise affine transforms. A four dimensional probability distribution function was found from the lesion locations in the cranio-caudal and medio-lateral radiographic projections for calcification and noncalcification lesions. Lesion locations sampled from this probability distribution function were mapped on to individual mammograms using a piecewise affine transform which transforms the average outline to the outline of the breast in the mammogram. The four dimensional probability distribution function was validated by comparing it to the two dimensional distributions found by considering each radiographic projection and laterality independently. The correlation of the location of the lesions sampled from the four dimensional probability distribution function across radiographic projections was shown to match the correlation of the locations of the original mapped lesion locations. The current system has been implemented as a web-service on a server using the Python Django framework. The server performs the sampling, performs the mapping and returns the results in a javascript object notation format.
Fancher, Chris M.; Han, Zhen; Levin, Igor; Page, Katharine; Reich, Brian J.; Smith, Ralph C.; Wilson, Alyson G.; Jones, Jacob L.
2016-01-01
A Bayesian inference method for refining crystallographic structures is presented. The distribution of model parameters is stochastically sampled using Markov chain Monte Carlo. Posterior probability distributions are constructed for all model parameters to properly quantify uncertainty by appropriately modeling the heteroskedasticity and correlation of the error structure. The proposed method is demonstrated by analyzing a National Institute of Standards and Technology silicon standard reference material. The results obtained by Bayesian inference are compared with those determined by Rietveld refinement. Posterior probability distributions of model parameters provide both estimates and uncertainties. The new method better estimates the true uncertainties in the model as compared to the Rietveld method. PMID:27550221
A Model-Free Machine Learning Method for Risk Classification and Survival Probability Prediction.
Geng, Yuan; Lu, Wenbin; Zhang, Hao Helen
2014-01-01
Risk classification and survival probability prediction are two major goals in survival data analysis since they play an important role in patients' risk stratification, long-term diagnosis, and treatment selection. In this article, we propose a new model-free machine learning framework for risk classification and survival probability prediction based on weighted support vector machines. The new procedure does not require any specific parametric or semiparametric model assumption on data, and is therefore capable of capturing nonlinear covariate effects. We use numerous simulation examples to demonstrate finite sample performance of the proposed method under various settings. Applications to a glioma tumor data and a breast cancer gene expression survival data are shown to illustrate the new methodology in real data analysis.
Carnegie, Nicole Bohme; Wang, Rui; Novitsky, Vladimir; De Gruttola, Victor
2014-01-01
Linkage analysis is useful in investigating disease transmission dynamics and the effect of interventions on them, but estimates of probabilities of linkage between infected people from observed data can be biased downward when missingness is informative. We investigate variation in the rates at which subjects' viral genotypes link across groups defined by viral load (low/high) and antiretroviral treatment (ART) status using blood samples from household surveys in the Northeast sector of Mochudi, Botswana. The probability of obtaining a sequence from a sample varies with viral load; samples with low viral load are harder to amplify. Pairwise genetic distances were estimated from aligned nucleotide sequences of HIV-1C env gp120. It is first shown that the probability that randomly selected sequences are linked can be estimated consistently from observed data. This is then used to develop estimates of the probability that a sequence from one group links to at least one sequence from another group under the assumption of independence across pairs. Furthermore, a resampling approach is developed that accounts for the presence of correlation across pairs, with diagnostics for assessing the reliability of the method. Sequences were obtained for 65% of subjects with high viral load (HVL, n = 117), 54% of subjects with low viral load but not on ART (LVL, n = 180), and 45% of subjects on ART (ART, n = 126). The probability of linkage between two individuals is highest if both have HVL, and lowest if one has LVL and the other has LVL or is on ART. Linkage across groups is high for HVL and lower for LVL and ART. Adjustment for missing data increases the group-wise linkage rates by 40–100%, and changes the relative rates between groups. Bias in inferences regarding HIV viral linkage that arise from differential ability to genotype samples can be reduced by appropriate methods for accommodating missing data. PMID:24415932
Carnegie, Nicole Bohme; Wang, Rui; Novitsky, Vladimir; De Gruttola, Victor
2014-01-01
Linkage analysis is useful in investigating disease transmission dynamics and the effect of interventions on them, but estimates of probabilities of linkage between infected people from observed data can be biased downward when missingness is informative. We investigate variation in the rates at which subjects' viral genotypes link across groups defined by viral load (low/high) and antiretroviral treatment (ART) status using blood samples from household surveys in the Northeast sector of Mochudi, Botswana. The probability of obtaining a sequence from a sample varies with viral load; samples with low viral load are harder to amplify. Pairwise genetic distances were estimated from aligned nucleotide sequences of HIV-1C env gp120. It is first shown that the probability that randomly selected sequences are linked can be estimated consistently from observed data. This is then used to develop estimates of the probability that a sequence from one group links to at least one sequence from another group under the assumption of independence across pairs. Furthermore, a resampling approach is developed that accounts for the presence of correlation across pairs, with diagnostics for assessing the reliability of the method. Sequences were obtained for 65% of subjects with high viral load (HVL, n = 117), 54% of subjects with low viral load but not on ART (LVL, n = 180), and 45% of subjects on ART (ART, n = 126). The probability of linkage between two individuals is highest if both have HVL, and lowest if one has LVL and the other has LVL or is on ART. Linkage across groups is high for HVL and lower for LVL and ART. Adjustment for missing data increases the group-wise linkage rates by 40-100%, and changes the relative rates between groups. Bias in inferences regarding HIV viral linkage that arise from differential ability to genotype samples can be reduced by appropriate methods for accommodating missing data.
Estimating survival and breeding probability for pond-breeding amphibians: a modified robust design
Bailey, L.L.; Kendall, W.L.; Church, D.R.; Wilbur, H.M.
2004-01-01
Many studies of pond-breeding amphibians involve sampling individuals during migration to and from breeding habitats. Interpreting population processes and dynamics from these studies is difficult because (1) only a proportion of the population is observable each season, while an unknown proportion remains unobservable (e.g., non-breeding adults) and (2) not all observable animals are captured. Imperfect capture probability can be easily accommodated in capture?recapture models, but temporary transitions between observable and unobservable states, often referred to as temporary emigration, is known to cause problems in both open- and closed-population models. We develop a multistate mark?recapture (MSMR) model, using an open-robust design that permits one entry and one exit from the study area per season. Our method extends previous temporary emigration models (MSMR with an unobservable state) in two ways. First, we relax the assumption of demographic closure (no mortality) between consecutive (secondary) samples, allowing estimation of within-pond survival. Also, we add the flexibility to express survival probability of unobservable individuals (e.g., ?non-breeders?) as a function of the survival probability of observable animals while in the same, terrestrial habitat. This allows for potentially different annual survival probabilities for observable and unobservable animals. We apply our model to a relictual population of eastern tiger salamanders (Ambystoma tigrinum tigrinum). Despite small sample sizes, demographic parameters were estimated with reasonable precision. We tested several a priori biological hypotheses and found evidence for seasonal differences in pond survival. Our methods could be applied to a variety of pond-breeding species and other taxa where individuals are captured entering or exiting a common area (e.g., spawning or roosting area, hibernacula).
Mauro, John C; Loucks, Roger J; Balakrishnan, Jitendra; Raghavan, Srikanth
2007-05-21
The thermodynamics and kinetics of a many-body system can be described in terms of a potential energy landscape in multidimensional configuration space. The partition function of such a landscape can be written in terms of a density of states, which can be computed using a variety of Monte Carlo techniques. In this paper, a new self-consistent Monte Carlo method for computing density of states is described that uses importance sampling and a multiplicative update factor to achieve rapid convergence. The technique is then applied to compute the equilibrium quench probability of the various inherent structures (minima) in the landscape. The quench probability depends on both the potential energy of the inherent structure and the volume of its corresponding basin in configuration space. Finally, the methodology is extended to the isothermal-isobaric ensemble in order to compute inherent structure quench probabilities in an enthalpy landscape.
Detection of image structures using the Fisher information and the Rao metric.
Maybank, Stephen J
2004-12-01
In many detection problems, the structures to be detected are parameterized by the points of a parameter space. If the conditional probability density function for the measurements is known, then detection can be achieved by sampling the parameter space at a finite number of points and checking each point to see if the corresponding structure is supported by the data. The number of samples and the distances between neighboring samples are calculated using the Rao metric on the parameter space. The Rao metric is obtained from the Fisher information which is, in turn, obtained from the conditional probability density function. An upper bound is obtained for the probability of a false detection. The calculations are simplified in the low noise case by making an asymptotic approximation to the Fisher information. An application to line detection is described. Expressions are obtained for the asymptotic approximation to the Fisher information, the volume of the parameter space, and the number of samples. The time complexity for line detection is estimated. An experimental comparison is made with a Hough transform-based method for detecting lines.
ERIC Educational Resources Information Center
Peterson, Ivars
1991-01-01
A method that enables people to obtain the benefits of statistics and probability theory without the shortcomings of conventional methods because it is free of mathematical formulas and is easy to understand and use is described. A resampling technique called the "bootstrap" is discussed in terms of application and development. (KR)
Asquith, William H.; Kiang, Julie E.; Cohn, Timothy A.
2017-07-17
The U.S. Geological Survey (USGS), in cooperation with the U.S. Nuclear Regulatory Commission, has investigated statistical methods for probabilistic flood hazard assessment to provide guidance on very low annual exceedance probability (AEP) estimation of peak-streamflow frequency and the quantification of corresponding uncertainties using streamgage-specific data. The term “very low AEP” implies exceptionally rare events defined as those having AEPs less than about 0.001 (or 1 × 10–3 in scientific notation or for brevity 10–3). Such low AEPs are of great interest to those involved with peak-streamflow frequency analyses for critical infrastructure, such as nuclear power plants. Flood frequency analyses at streamgages are most commonly based on annual instantaneous peak streamflow data and a probability distribution fit to these data. The fitted distribution provides a means to extrapolate to very low AEPs. Within the United States, the Pearson type III probability distribution, when fit to the base-10 logarithms of streamflow, is widely used, but other distribution choices exist. The USGS-PeakFQ software, implementing the Pearson type III within the Federal agency guidelines of Bulletin 17B (method of moments) and updates to the expected moments algorithm (EMA), was specially adapted for an “Extended Output” user option to provide estimates at selected AEPs from 10–3 to 10–6. Parameter estimation methods, in addition to product moments and EMA, include L-moments, maximum likelihood, and maximum product of spacings (maximum spacing estimation). This study comprehensively investigates multiple distributions and parameter estimation methods for two USGS streamgages (01400500 Raritan River at Manville, New Jersey, and 01638500 Potomac River at Point of Rocks, Maryland). The results of this study specifically involve the four methods for parameter estimation and up to nine probability distributions, including the generalized extreme value, generalized log-normal, generalized Pareto, and Weibull. Uncertainties in streamflow estimates for corresponding AEP are depicted and quantified as two primary forms: quantile (aleatoric [random sampling] uncertainty) and distribution-choice (epistemic [model] uncertainty). Sampling uncertainties of a given distribution are relatively straightforward to compute from analytical or Monte Carlo-based approaches. Distribution-choice uncertainty stems from choices of potentially applicable probability distributions for which divergence among the choices increases as AEP decreases. Conventional goodness-of-fit statistics, such as Cramér-von Mises, and L-moment ratio diagrams are demonstrated in order to hone distribution choice. The results generally show that distribution choice uncertainty is larger than sampling uncertainty for very low AEP values.
Kuan, C H; Goh, S G; Loo, Y Y; Chang, W S; Lye, Y L; Puspanadan, S; Tang, J Y H; Nakaguchi, Y; Nishibuchi, M; Mahyudin, N A; Radu, S
2013-06-01
A total of 216 chicken offal samples (chicken liver = 72; chicken heart = 72; chicken gizzard = 72) from wet markets and hypermarkets in Selangor, Malaysia, were examined for the presence and density of Listeria monocytogenes by using a combination of the most probable number and PCR method. The prevalence of L. monocytogenes in 216 chicken offal samples examined was 26.39%, and among the positive samples, the chicken gizzard showed the highest percentage at 33.33% compared with chicken liver (25.00%) and chicken heart (20.83%). The microbial load of L. monocytogenes in chicken offal samples ranged from <3 to 93.0 most probable number per gram. The presence of L. monocytogenes in chicken offal samples may indicate that chicken offal can act as a possible vehicle for the occurrence of foodborne listeriosis. Hence, there is a need to investigate the biosafety level of chicken offal in Malaysia.
NASA Astrophysics Data System (ADS)
Yang, P.; Ng, T. L.; Yang, W.
2015-12-01
Effective water resources management depends on the reliable estimation of the uncertainty of drought events. Confidence intervals (CIs) are commonly applied to quantify this uncertainty. A CI seeks to be at the minimal length necessary to cover the true value of the estimated variable with the desired probability. In drought analysis where two or more variables (e.g., duration and severity) are often used to describe a drought, copulas have been found suitable for representing the joint probability behavior of these variables. However, the comprehensive assessment of the parameter uncertainties of copulas of droughts has been largely ignored, and the few studies that have recognized this issue have not explicitly compared the various methods to produce the best CIs. Thus, the objective of this study to compare the CIs generated using two widely applied uncertainty estimation methods, bootstrapping and Markov Chain Monte Carlo (MCMC). To achieve this objective, (1) the marginal distributions lognormal, Gamma, and Generalized Extreme Value, and the copula functions Clayton, Frank, and Plackett are selected to construct joint probability functions of two drought related variables. (2) The resulting joint functions are then fitted to 200 sets of simulated realizations of drought events with known distribution and extreme parameters and (3) from there, using bootstrapping and MCMC, CIs of the parameters are generated and compared. The effect of an informative prior on the CIs generated by MCMC is also evaluated. CIs are produced for different sample sizes (50, 100, and 200) of the simulated drought events for fitting the joint probability functions. Preliminary results assuming lognormal marginal distributions and the Clayton copula function suggest that for cases with small or medium sample sizes (~50-100), MCMC to be superior method if an informative prior exists. Where an informative prior is unavailable, for small sample sizes (~50), both bootstrapping and MCMC yield the same level of performance, and for medium sample sizes (~100), bootstrapping is better. For cases with a large sample size (~200), there is little difference between the CIs generated using bootstrapping and MCMC regardless of whether or not an informative prior exists.
Latin Hypercube Sampling (LHS) UNIX Library/Standalone
DOE Office of Scientific and Technical Information (OSTI.GOV)
2004-05-13
The LHS UNIX Library/Standalone software provides the capability to draw random samples from over 30 distribution types. It performs the sampling by a stratified sampling method called Latin Hypercube Sampling (LHS). Multiple distributions can be sampled simultaneously, with user-specified correlations amongst the input distributions, LHS UNIX Library/ Standalone provides a way to generate multi-variate samples. The LHS samples can be generated either as a callable library (e.g., from within the DAKOTA software framework) or as a standalone capability. LHS UNIX Library/Standalone uses the Latin Hypercube Sampling method (LHS) to generate samples. LHS is a constrained Monte Carlo sampling scheme. Inmore » LHS, the range of each variable is divided into non-overlapping intervals on the basis of equal probability. A sample is selected at random with respect to the probability density in each interval, If multiple variables are sampled simultaneously, then values obtained for each are paired in a random manner with the n values of the other variables. In some cases, the pairing is restricted to obtain specified correlations amongst the input variables. Many simulation codes have input parameters that are uncertain and can be specified by a distribution, To perform uncertainty analysis and sensitivity analysis, random values are drawn from the input parameter distributions, and the simulation is run with these values to obtain output values. If this is done repeatedly, with many input samples drawn, one can build up a distribution of the output as well as examine correlations between input and output variables.« less
Rare event simulation in radiation transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kollman, Craig
1993-10-01
This dissertation studies methods for estimating extremely small probabilities by Monte Carlo simulation. Problems in radiation transport typically involve estimating very rare events or the expected value of a random variable which is with overwhelming probability equal to zero. These problems often have high dimensional state spaces and irregular geometries so that analytic solutions are not possible. Monte Carlo simulation must be used to estimate the radiation dosage being transported to a particular location. If the area is well shielded the probability of any one particular particle getting through is very small. Because of the large number of particles involved,more » even a tiny fraction penetrating the shield may represent an unacceptable level of radiation. It therefore becomes critical to be able to accurately estimate this extremely small probability. Importance sampling is a well known technique for improving the efficiency of rare event calculations. Here, a new set of probabilities is used in the simulation runs. The results are multiple by the likelihood ratio between the true and simulated probabilities so as to keep the estimator unbiased. The variance of the resulting estimator is very sensitive to which new set of transition probabilities are chosen. It is shown that a zero variance estimator does exist, but that its computation requires exact knowledge of the solution. A simple random walk with an associated killing model for the scatter of neutrons is introduced. Large deviation results for optimal importance sampling in random walks are extended to the case where killing is present. An adaptive ``learning`` algorithm for implementing importance sampling is given for more general Markov chain models of neutron scatter. For finite state spaces this algorithm is shown to give with probability one, a sequence of estimates converging exponentially fast to the true solution.« less
Space Object Collision Probability via Monte Carlo on the Graphics Processing Unit
NASA Astrophysics Data System (ADS)
Vittaldev, Vivek; Russell, Ryan P.
2017-09-01
Fast and accurate collision probability computations are essential for protecting space assets. Monte Carlo (MC) simulation is the most accurate but computationally intensive method. A Graphics Processing Unit (GPU) is used to parallelize the computation and reduce the overall runtime. Using MC techniques to compute the collision probability is common in literature as the benchmark. An optimized implementation on the GPU, however, is a challenging problem and is the main focus of the current work. The MC simulation takes samples from the uncertainty distributions of the Resident Space Objects (RSOs) at any time during a time window of interest and outputs the separations at closest approach. Therefore, any uncertainty propagation method may be used and the collision probability is automatically computed as a function of RSO collision radii. Integration using a fixed time step and a quartic interpolation after every Runge Kutta step ensures that no close approaches are missed. Two orders of magnitude speedups over a serial CPU implementation are shown, and speedups improve moderately with higher fidelity dynamics. The tool makes the MC approach tractable on a single workstation, and can be used as a final product, or for verifying surrogate and analytical collision probability methods.
Infectious Cryptosporidium parvum oocysts in final reclaimed effluent
Gennaccaro, A.L.; McLaughlin, M.R.; Quintero-Betancourt, W.; Huffman, D.E.; Rose, J.B.
2003-01-01
Water samples collected throughout several reclamation facilities were analyzed for the presence of infectious Cryptosporidium parvum by the focus detection method-most-probable-number cell culture technique. Results revealed the presence of infectious C. parvum oocysts in 40% of the final disinfected effluent samples. Sampled effluent contained on average seven infectious oocysts per 100 liters. Thus, reclaimed water is not pathogen free but contains infectious C. parvum.
Scheid, Anika; Nebel, Markus E
2012-07-09
Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case - without sacrificing much of the accuracy of the results. Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms.
2012-01-01
Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case – without sacrificing much of the accuracy of the results. Conclusions Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms. PMID:22776037
Comparing population size estimators for plethodontid salamanders
Bailey, L.L.; Simons, T.R.; Pollock, K.H.
2004-01-01
Despite concern over amphibian declines, few studies estimate absolute abundances because of logistic and economic constraints and previously poor estimator performance. Two estimation approaches recommended for amphibian studies are mark-recapture and depletion (or removal) sampling. We compared abundance estimation via various mark-recapture and depletion methods, using data from a three-year study of terrestrial salamanders in Great Smoky Mountains National Park. Our results indicate that short-term closed-population, robust design, and depletion methods estimate surface population of salamanders (i.e., those near the surface and available for capture during a given sampling occasion). In longer duration studies, temporary emigration violates assumptions of both open- and closed-population mark-recapture estimation models. However, if the temporary emigration is completely random, these models should yield unbiased estimates of the total population (superpopulation) of salamanders in the sampled area. We recommend using Pollock's robust design in mark-recapture studies because of its flexibility to incorporate variation in capture probabilities and to estimate temporary emigration probabilities.
Yu, Yun; Degnan, James H.; Nakhleh, Luay
2012-01-01
Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data is consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa. PMID:22536161
Horton, Bethany Jablonski; Wages, Nolan A.; Conaway, Mark R.
2016-01-01
Toxicity probability interval designs have received increasing attention as a dose-finding method in recent years. In this study, we compared the two-stage, likelihood-based continual reassessment method (CRM), modified toxicity probability interval (mTPI), and the Bayesian optimal interval design (BOIN) in order to evaluate each method's performance in dose selection for Phase I trials. We use several summary measures to compare the performance of these methods, including percentage of correct selection (PCS) of the true maximum tolerable dose (MTD), allocation of patients to doses at and around the true MTD, and an accuracy index. This index is an efficiency measure that describes the entire distribution of MTD selection and patient allocation by taking into account the distance between the true probability of toxicity at each dose level and the target toxicity rate. The simulation study considered a broad range of toxicity curves and various sample sizes. When considering PCS, we found that CRM outperformed the two competing methods in most scenarios, followed by BOIN, then mTPI. We observed a similar trend when considering the accuracy index for dose allocation, where CRM most often outperformed both the mTPI and BOIN. These trends were more pronounced with increasing number of dose levels. PMID:27435150
NASA Astrophysics Data System (ADS)
von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo
2014-06-01
Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatrics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramer-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.
Point counts are a common method for sampling avian distribution and abundance. Though methods for estimating detection probabilities are available, many analyses use raw counts and do not correct for detectability. We use a removal model of detection within an N-mixture approa...
Spiegelhalter, D J; Freedman, L S
1986-01-01
The 'textbook' approach to determining sample size in a clinical trial has some fundamental weaknesses which we discuss. We describe a new predictive method which takes account of prior clinical opinion about the treatment difference. The method adopts the point of clinical equivalence (determined by interviewing the clinical participants) as the null hypothesis. Decision rules at the end of the study are based on whether the interval estimate of the treatment difference (classical or Bayesian) includes the null hypothesis. The prior distribution is used to predict the probabilities of making the decisions to use one or other treatment or to reserve final judgement. It is recommended that sample size be chosen to control the predicted probability of the last of these decisions. An example is given from a multi-centre trial of superficial bladder cancer.
Improving inferences from fisheries capture-recapture studies through remote detection of PIT tags
Hewitt, David A.; Janney, Eric C.; Hayes, Brian S.; Shively, Rip S.
2010-01-01
Models for capture-recapture data are commonly used in analyses of the dynamics of fish and wildlife populations, especially for estimating vital parameters such as survival. Capture-recapture methods provide more reliable inferences than other methods commonly used in fisheries studies. However, for rare or elusive fish species, parameter estimation is often hampered by small probabilities of re-encountering tagged fish when encounters are obtained through traditional sampling methods. We present a case study that demonstrates how remote antennas for passive integrated transponder (PIT) tags can increase encounter probabilities and the precision of survival estimates from capture-recapture models. Between 1999 and 2007, trammel nets were used to capture and tag over 8,400 endangered adult Lost River suckers (Deltistes luxatus) during the spawning season in Upper Klamath Lake, Oregon. Despite intensive sampling at relatively discrete spawning areas, encounter probabilities from Cormack-Jolly-Seber models were consistently low (< 0.2) and the precision of apparent annual survival estimates was poor. Beginning in 2005, remote PIT tag antennas were deployed at known spawning locations to increase the probability of re-encountering tagged fish. We compare results based only on physical recaptures with results based on both physical recaptures and remote detections to demonstrate the substantial improvement in estimates of encounter probabilities (approaching 100%) and apparent annual survival provided by the remote detections. The richer encounter histories provided robust inferences about the dynamics of annual survival and have made it possible to explore more realistic models and hypotheses about factors affecting the conservation and recovery of this endangered species. Recent advances in technology related to PIT tags have paved the way for creative implementation of large-scale tagging studies in systems where they were previously considered impracticable.
Representation of complex probabilities and complex Gibbs sampling
NASA Astrophysics Data System (ADS)
Salcedo, Lorenzo Luis
2018-03-01
Complex weights appear in Physics which are beyond a straightforward importance sampling treatment, as required in Monte Carlo calculations. This is the wellknown sign problem. The complex Langevin approach amounts to effectively construct a positive distribution on the complexified manifold reproducing the expectation values of the observables through their analytical extension. Here we discuss the direct construction of such positive distributions paying attention to their localization on the complexified manifold. Explicit localized representations are obtained for complex probabilities defined on Abelian and non Abelian groups. The viability and performance of a complex version of the heat bath method, based on such representations, is analyzed.
More than Just Convenient: The Scientific Merits of Homogeneous Convenience Samples
Jager, Justin; Putnick, Diane L.; Bornstein, Marc H.
2017-01-01
Despite their disadvantaged generalizability relative to probability samples, non-probability convenience samples are the standard within developmental science, and likely will remain so because probability samples are cost-prohibitive and most available probability samples are ill-suited to examine developmental questions. In lieu of focusing on how to eliminate or sharply reduce reliance on convenience samples within developmental science, here we propose how to augment their advantages when it comes to understanding population effects as well as subpopulation differences. Although all convenience samples have less clear generalizability than probability samples, we argue that homogeneous convenience samples have clearer generalizability relative to conventional convenience samples. Therefore, when researchers are limited to convenience samples, they should consider homogeneous convenience samples as a positive alternative to conventional or heterogeneous) convenience samples. We discuss future directions as well as potential obstacles to expanding the use of homogeneous convenience samples in developmental science. PMID:28475254
[Isolation and identification of Cronobacter (Enterobacter sakazakii) strains from food].
Dong, Xiaohui; Li, Chengsi; Wu, Qingping; Zhang, Jumei; Mo, Shuping; Guo, Weipeng; Yang, Xiaojuan; Xu, Xiaoke
2013-05-04
This study aimed to detect and quantify Cronobacter in 300 powdered milk samples and 50 non-powdered milk samples. Totally, 24 Cronobacter (formerly Enterobacter sakazakii) strains isolated from powdered milk and other foods were identified and confirmed. Cronobacter strains were detected quantitatively using most probable number (MPN) method and molecular detection method. We identified 24 Cronobacter strains using biochemical patterns, including indole production and dulcitol, malonate, melezitose, turanose, and myo-Inositol utilization. Of the 24 strains, their 16S rRNA genes were sequenced, and constructed phylogenetic tree by N-J (Neighbour-Joining) with the 16S rRNA gene sequences of 17 identified Cronobacter strains and 10 non-Cronobacter strains. Quantitative detection showed that Cronobacter strains were detected in 23 out of 350 samples yielding 6.6% detection rate. Twenty-four Cronobacter strains were isolated from 23 samples and the Cronobacter was more than 100 MPN/100g in 4 samples out of 23 samples. The 24 Cronobacter spp. isolates strains were identified and confirmed, including 19 Cronobacter sakazakii strains, 2 C. malonaticus strains, 2 C. dubliensis subsp. lactaridi strains, and 1 C. muytjensii strain. The combination of molecular detection method and most probable number (MPN) method could be suitable for the detection of Cronobacter in powdered milk, with low rate of contamination and high demand of quantitative detection. 24 isolated strains were confirmed and identified by biochemical patterns and molecular technology, and C. sakazakii could be the dominant species. The problem of Cronobacter in powdered milk should be a hidden danger to nurseling, and should catch the government and consumer's attention.
NASA Astrophysics Data System (ADS)
Liu, Y.; Pau, G. S. H.; Finsterle, S.
2015-12-01
Parameter inversion involves inferring the model parameter values based on sparse observations of some observables. To infer the posterior probability distributions of the parameters, Markov chain Monte Carlo (MCMC) methods are typically used. However, the large number of forward simulations needed and limited computational resources limit the complexity of the hydrological model we can use in these methods. In view of this, we studied the implicit sampling (IS) method, an efficient importance sampling technique that generates samples in the high-probability region of the posterior distribution and thus reduces the number of forward simulations that we need to run. For a pilot-point inversion of a heterogeneous permeability field based on a synthetic ponded infiltration experiment simulated with TOUGH2 (a subsurface modeling code), we showed that IS with linear map provides an accurate Bayesian description of the parameterized permeability field at the pilot points with just approximately 500 forward simulations. We further studied the use of surrogate models to improve the computational efficiency of parameter inversion. We implemented two reduced-order models (ROMs) for the TOUGH2 forward model. One is based on polynomial chaos expansion (PCE), of which the coefficients are obtained using the sparse Bayesian learning technique to mitigate the "curse of dimensionality" of the PCE terms. The other model is Gaussian process regression (GPR) for which different covariance, likelihood and inference models are considered. Preliminary results indicate that ROMs constructed based on the prior parameter space perform poorly. It is thus impractical to replace this hydrological model by a ROM directly in a MCMC method. However, the IS method can work with a ROM constructed for parameters in the close vicinity of the maximum a posteriori probability (MAP) estimate. We will discuss the accuracy and computational efficiency of using ROMs in the implicit sampling procedure for the hydrological problem considered. This work was supported, in part, by the U.S. Dept. of Energy under Contract No. DE-AC02-05CH11231
Fractional Gaussian model in global optimization
NASA Astrophysics Data System (ADS)
Dimri, V. P.; Srivastava, R. P.
2009-12-01
Earth system is inherently non-linear and it can be characterized well if we incorporate no-linearity in the formulation and solution of the problem. General tool often used for characterization of the earth system is inversion. Traditionally inverse problems are solved using least-square based inversion by linearizing the formulation. The initial model in such inversion schemes is often assumed to follow posterior Gaussian probability distribution. It is now well established that most of the physical properties of the earth follow power law (fractal distribution). Thus, the selection of initial model based on power law probability distribution will provide more realistic solution. We present a new method which can draw samples of posterior probability density function very efficiently using fractal based statistics. The application of the method has been demonstrated to invert band limited seismic data with well control. We used fractal based probability density function which uses mean, variance and Hurst coefficient of the model space to draw initial model. Further this initial model is used in global optimization inversion scheme. Inversion results using initial models generated by our method gives high resolution estimates of the model parameters than the hitherto used gradient based liner inversion method.
Meta-analysis with missing study-level sample variance data.
Chowdhry, Amit K; Dworkin, Robert H; McDermott, Michael P
2016-07-30
We consider a study-level meta-analysis with a normally distributed outcome variable and possibly unequal study-level variances, where the object of inference is the difference in means between a treatment and control group. A common complication in such an analysis is missing sample variances for some studies. A frequently used approach is to impute the weighted (by sample size) mean of the observed variances (mean imputation). Another approach is to include only those studies with variances reported (complete case analysis). Both mean imputation and complete case analysis are only valid under the missing-completely-at-random assumption, and even then the inverse variance weights produced are not necessarily optimal. We propose a multiple imputation method employing gamma meta-regression to impute the missing sample variances. Our method takes advantage of study-level covariates that may be used to provide information about the missing data. Through simulation studies, we show that multiple imputation, when the imputation model is correctly specified, is superior to competing methods in terms of confidence interval coverage probability and type I error probability when testing a specified group difference. Finally, we describe a similar approach to handling missing variances in cross-over studies. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Gotelli, Nicholas J.; Dorazio, Robert M.; Ellison, Aaron M.; Grossman, Gary D.
2010-01-01
Quantifying patterns of temporal trends in species assemblages is an important analytical challenge in community ecology. We describe methods of analysis that can be applied to a matrix of counts of individuals that is organized by species (rows) and time-ordered sampling periods (columns). We first developed a bootstrapping procedure to test the null hypothesis of random sampling from a stationary species abundance distribution with temporally varying sampling probabilities. This procedure can be modified to account for undetected species. We next developed a hierarchical model to estimate species-specific trends in abundance while accounting for species-specific probabilities of detection. We analysed two long-term datasets on stream fishes and grassland insects to demonstrate these methods. For both assemblages, the bootstrap test indicated that temporal trends in abundance were more heterogeneous than expected under the null model. We used the hierarchical model to estimate trends in abundance and identified sets of species in each assemblage that were steadily increasing, decreasing or remaining constant in abundance over more than a decade of standardized annual surveys. Our methods of analysis are broadly applicable to other ecological datasets, and they represent an advance over most existing procedures, which do not incorporate effects of incomplete sampling and imperfect detection.
Probability of identification: adulteration of American Ginseng with Asian Ginseng.
Harnly, James; Chen, Pei; Harrington, Peter De B
2013-01-01
The AOAC INTERNATIONAL guidelines for validation of botanical identification methods were applied to the detection of Asian Ginseng [Panax ginseng (PG)] as an adulterant for American Ginseng [P. quinquefolius (PQ)] using spectral fingerprints obtained by flow injection mass spectrometry (FIMS). Samples of 100% PQ and 100% PG were physically mixed to provide 90, 80, and 50% PQ. The multivariate FIMS fingerprint data were analyzed using soft independent modeling of class analogy (SIMCA) based on 100% PQ. The Q statistic, a measure of the degree of non-fit of the test samples with the calibration model, was used as the analytical parameter. FIMS was able to discriminate between 100% PQ and 100% PG, and between 100% PQ and 90, 80, and 50% PQ. The probability of identification (POI) curve was estimated based on the SD of 90% PQ. A digital model of adulteration, obtained by mathematically summing the experimentally acquired spectra of 100% PQ and 100% PG in the desired ratios, agreed well with the physical data and provided an easy and more accurate method for constructing the POI curve. Two chemometric modeling methods, SIMCA and fuzzy optimal associative memories, and two classification methods, partial least squares-discriminant analysis and fuzzy rule-building expert systems, were applied to the data. The modeling methods correctly identified the adulterated samples; the classification methods did not.
Quantifying prognosis with risk predictions.
Pace, Nathan L; Eberhart, Leopold H J; Kranke, Peter R
2012-01-01
Prognosis is a forecast, based on present observations in a patient, of their probable outcome from disease, surgery and so on. Research methods for the development of risk probabilities may not be familiar to some anaesthesiologists. We briefly describe methods for identifying risk factors and risk scores. A probability prediction rule assigns a risk probability to a patient for the occurrence of a specific event. Probability reflects the continuum between absolute certainty (Pi = 1) and certified impossibility (Pi = 0). Biomarkers and clinical covariates that modify risk are known as risk factors. The Pi as modified by risk factors can be estimated by identifying the risk factors and their weighting; these are usually obtained by stepwise logistic regression. The accuracy of probabilistic predictors can be separated into the concepts of 'overall performance', 'discrimination' and 'calibration'. Overall performance is the mathematical distance between predictions and outcomes. Discrimination is the ability of the predictor to rank order observations with different outcomes. Calibration is the correctness of prediction probabilities on an absolute scale. Statistical methods include the Brier score, coefficient of determination (Nagelkerke R2), C-statistic and regression calibration. External validation is the comparison of the actual outcomes to the predicted outcomes in a new and independent patient sample. External validation uses the statistical methods of overall performance, discrimination and calibration and is uniformly recommended before acceptance of the prediction model. Evidence from randomised controlled clinical trials should be obtained to show the effectiveness of risk scores for altering patient management and patient outcomes.
The COLA Collision Avoidance Method
NASA Astrophysics Data System (ADS)
Assmann, K.; Berger, J.; Grothkopp, S.
2009-03-01
In the following we present a collision avoidance method named COLA. The method has been designed to predict collisions for Earth orbiting spacecraft on any orbits, including orbit changes, with other space-born objects. The point in time of a collision and the collision probability are determined. To guarantee effective processing the COLA method uses a modular design and is composed of several components which are either developed within this work or deduced from existing algorithms: A filtering module, the close approach determination, the collision detection and the collision probability calculation. A software tool which implements the COLA method has been verified using various test cases built from sample missions. This software has been implemented in the C++ programming language and serves as a universal collision detection tool at LSE Space Engineering & Operations AG.
External validation of a Cox prognostic model: principles and methods
2013-01-01
Background A prognostic model should not enter clinical practice unless it has been demonstrated that it performs a useful role. External validation denotes evaluation of model performance in a sample independent of that used to develop the model. Unlike for logistic regression models, external validation of Cox models is sparsely treated in the literature. Successful validation of a model means achieving satisfactory discrimination and calibration (prediction accuracy) in the validation sample. Validating Cox models is not straightforward because event probabilities are estimated relative to an unspecified baseline function. Methods We describe statistical approaches to external validation of a published Cox model according to the level of published information, specifically (1) the prognostic index only, (2) the prognostic index together with Kaplan-Meier curves for risk groups, and (3) the first two plus the baseline survival curve (the estimated survival function at the mean prognostic index across the sample). The most challenging task, requiring level 3 information, is assessing calibration, for which we suggest a method of approximating the baseline survival function. Results We apply the methods to two comparable datasets in primary breast cancer, treating one as derivation and the other as validation sample. Results are presented for discrimination and calibration. We demonstrate plots of survival probabilities that can assist model evaluation. Conclusions Our validation methods are applicable to a wide range of prognostic studies and provide researchers with a toolkit for external validation of a published Cox model. PMID:23496923
The utility of Bayesian predictive probabilities for interim monitoring of clinical trials
Connor, Jason T.; Ayers, Gregory D; Alvarez, JoAnn
2014-01-01
Background Bayesian predictive probabilities can be used for interim monitoring of clinical trials to estimate the probability of observing a statistically significant treatment effect if the trial were to continue to its predefined maximum sample size. Purpose We explore settings in which Bayesian predictive probabilities are advantageous for interim monitoring compared to Bayesian posterior probabilities, p-values, conditional power, or group sequential methods. Results For interim analyses that address prediction hypotheses, such as futility monitoring and efficacy monitoring with lagged outcomes, only predictive probabilities properly account for the amount of data remaining to be observed in a clinical trial and have the flexibility to incorporate additional information via auxiliary variables. Limitations Computational burdens limit the feasibility of predictive probabilities in many clinical trial settings. The specification of prior distributions brings additional challenges for regulatory approval. Conclusions The use of Bayesian predictive probabilities enables the choice of logical interim stopping rules that closely align with the clinical decision making process. PMID:24872363
Kolmogorov-Smirnov test for spatially correlated data
Olea, R.A.; Pawlowsky-Glahn, V.
2009-01-01
The Kolmogorov-Smirnov test is a convenient method for investigating whether two underlying univariate probability distributions can be regarded as undistinguishable from each other or whether an underlying probability distribution differs from a hypothesized distribution. Application of the test requires that the sample be unbiased and the outcomes be independent and identically distributed, conditions that are violated in several degrees by spatially continuous attributes, such as topographical elevation. A generalized form of the bootstrap method is used here for the purpose of modeling the distribution of the statistic D of the Kolmogorov-Smirnov test. The innovation is in the resampling, which in the traditional formulation of bootstrap is done by drawing from the empirical sample with replacement presuming independence. The generalization consists of preparing resamplings with the same spatial correlation as the empirical sample. This is accomplished by reading the value of unconditional stochastic realizations at the sampling locations, realizations that are generated by simulated annealing. The new approach was tested by two empirical samples taken from an exhaustive sample closely following a lognormal distribution. One sample was a regular, unbiased sample while the other one was a clustered, preferential sample that had to be preprocessed. Our results show that the p-value for the spatially correlated case is always larger that the p-value of the statistic in the absence of spatial correlation, which is in agreement with the fact that the information content of an uncorrelated sample is larger than the one for a spatially correlated sample of the same size. ?? Springer-Verlag 2008.
Rand E. Eads; Mark R. Boolootian; Steven C. [Inventors] Hankin
1987-01-01
Abstract - A programmable calculator is connected to a pumping sampler by an interface circuit board. The calculator has a sediment sampling program stored therein and includes a timer to periodically wake up the calculator. Sediment collection is controlled by a Selection At List Time (SALT) scheme in which the probability of taking a sample is proportional to its...
NASA Astrophysics Data System (ADS)
Kang, Zhizhong
2013-10-01
This paper presents a new approach to automatic registration of terrestrial laser scanning (TLS) point clouds utilizing a novel robust estimation method by an efficient BaySAC (BAYes SAmpling Consensus). The proposed method directly generates reflectance images from 3D point clouds, and then using SIFT algorithm extracts keypoints to identify corresponding image points. The 3D corresponding points, from which transformation parameters between point clouds are computed, are acquired by mapping the 2D ones onto the point cloud. To remove false accepted correspondences, we implement a conditional sampling method to select the n data points with the highest inlier probabilities as a hypothesis set and update the inlier probabilities of each data point using simplified Bayes' rule for the purpose of improving the computation efficiency. The prior probability is estimated by the verification of the distance invariance between correspondences. The proposed approach is tested on four data sets acquired by three different scanners. The results show that, comparing with the performance of RANSAC, BaySAC leads to less iterations and cheaper computation cost when the hypothesis set is contaminated with more outliers. The registration results also indicate that, the proposed algorithm can achieve high registration accuracy on all experimental datasets.
Estimation and applications of size-biased distributions in forestry
Jeffrey H. Gove
2003-01-01
Size-biased distributions arise naturally in several contexts in forestry and ecology. Simple power relationships (e.g. basal area and diameter at breast height) between variables are one such area of interest arising from a modelling perspective. Another, probability proportional to size PPS) sampling, is found in the most widely used methods for sampling standing or...
ERIC Educational Resources Information Center
Jia, Yue; Stokes, Lynne; Harris, Ian; Wang, Yan
2011-01-01
Estimation of parameters of random effects models from samples collected via complex multistage designs is considered. One way to reduce estimation bias due to unequal probabilities of selection is to incorporate sampling weights. Many researchers have been proposed various weighting methods (Korn, & Graubard, 2003; Pfeffermann, Skinner,…
Wohlsen, T; Bates, J; Vesey, G; Robinson, W A; Katouli, M
2006-04-01
To use BioBall cultures as a precise reference standard to evaluate methods for enumeration of Escherichia coli and other coliform bacteria in water samples. Eight methods were evaluated including membrane filtration, standard plate count (pour and spread plate methods), defined substrate technology methods (Colilert and Colisure), the most probable number method and the Petrifilm disposable plate method. Escherichia coli and Enterobacter aerogenes BioBall cultures containing 30 organisms each were used. All tests were performed using 10 replicates. The mean recovery of both bacteria varied with the different methods employed. The best and most consistent results were obtained with Petrifilm and the pour plate method. Other methods either yielded a low recovery or showed significantly high variability between replicates. The BioBall is a very suitable quality control tool for evaluating the efficiency of methods for bacterial enumeration in water samples.
Raman spectroscopy detection of platelet for Alzheimer’s disease with predictive probabilities
NASA Astrophysics Data System (ADS)
Wang, L. J.; Du, X. Q.; Du, Z. W.; Yang, Y. Y.; Chen, P.; Tian, Q.; Shang, X. L.; Liu, Z. C.; Yao, X. Q.; Wang, J. Z.; Wang, X. H.; Cheng, Y.; Peng, J.; Shen, A. G.; Hu, J. M.
2014-08-01
Alzheimer’s disease (AD) is a common form of dementia. Early and differential diagnosis of AD has always been an arduous task for the medical expert due to the unapparent early symptoms and the currently imperfect imaging examination methods. Therefore, obtaining reliable markers with clinical diagnostic value in easily assembled samples is worthy and significant. Our previous work with laser Raman spectroscopy (LRS), in which we detected platelet samples of different ages of AD transgenic mice and non-transgenic controls, showed great effect in the diagnosis of AD. In addition, a multilayer perception network (MLP) classification method was adopted to discriminate the spectral data. However, there were disturbances, which were induced by noise from the machines and so on, in the data set; thus the MLP method had to be trained with large-scale data. In this paper, we aim to re-establish the classification models of early and advanced AD and the control group with fewer features, and apply some mechanism of noise reduction to improve the accuracy of models. An adaptive classification method based on the Gaussian process (GP) featured, with predictive probabilities, is proposed, which could tell when a data set is related to some kind of disease. Compared with MLP on the same feature set, GP showed much better performance in the experimental results. What is more, since the spectra of platelets are isolated from AD, GP has good expansibility and can be applied in diagnosis of many other similar diseases, such as Parkinson’s disease (PD). Spectral data of 4 month and 12 month AD platelets, as well as control data, were collected. With predictive probabilities, the proposed GP classification method improved the diagnostic sensitivity to nearly 100%. Samples were also collected from PD platelets as classification and comparison to the 12 month AD. The presented approach and our experiments indicate that utilization of GP with predictive probabilities in platelet LRS detection analysis turns out to be more accurate for early and differential diagnosis of AD and has a wide application prospect.
Hyper-polyhedron model applied to molecular screening of guanidines as Na/H exchange inhibitors.
Bao, Xin-Hua; Lu, Wen-Cong; Liu, Liang; Chen, Nian-Yi
2003-05-01
To investigate structure-activity relationships of N-(3-Oxo-3,4-dihydro-2H-benzo[1,4]oxazine-6-carbonyl) guanidines in Na/H exchange inhibitory activities and probe into a new method of the computer-aided molecular screening. The hyper-polyhedron model (HPM) was proposed in our lab. The samples with probably higher activities could be determined in such a way that their representing points should be in the hyper-polyhedron region where all known samples with high activities were distributed. And the predictive ability of different methods available was tested by the cross-validation experiment. The accurate rate of molecular screening of N-(3-Oxo-3,4-dihydro-2H-benzo[1,4]oxazine-6-carbonyl) guanidines by HPM was much higher than that obtained by PCA (principal component analysis) and Fisher methods for the data set available here. Therefore, HPM could be used as a powerful tool for screening new compounds with probably higher activities.
Inferring probabilistic stellar rotation periods using Gaussian processes
NASA Astrophysics Data System (ADS)
Angus, Ruth; Morton, Timothy; Aigrain, Suzanne; Foreman-Mackey, Daniel; Rajpaul, Vinesh
2018-02-01
Variability in the light curves of spotted, rotating stars is often non-sinusoidal and quasi-periodic - spots move on the stellar surface and have finite lifetimes, causing stellar flux variations to slowly shift in phase. A strictly periodic sinusoid therefore cannot accurately model a rotationally modulated stellar light curve. Physical models of stellar surfaces have many drawbacks preventing effective inference, such as highly degenerate or high-dimensional parameter spaces. In this work, we test an appropriate effective model: a Gaussian Process with a quasi-periodic covariance kernel function. This highly flexible model allows sampling of the posterior probability density function of the periodic parameter, marginalizing over the other kernel hyperparameters using a Markov Chain Monte Carlo approach. To test the effectiveness of this method, we infer rotation periods from 333 simulated stellar light curves, demonstrating that the Gaussian process method produces periods that are more accurate than both a sine-fitting periodogram and an autocorrelation function method. We also demonstrate that it works well on real data, by inferring rotation periods for 275 Kepler stars with previously measured periods. We provide a table of rotation periods for these and many more, altogether 1102 Kepler objects of interest, and their posterior probability density function samples. Because this method delivers posterior probability density functions, it will enable hierarchical studies involving stellar rotation, particularly those involving population modelling, such as inferring stellar ages, obliquities in exoplanet systems, or characterizing star-planet interactions. The code used to implement this method is available online.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abulencia, A.; Acosta, D.; Adelman, Jahred A.
2006-02-01
The authors describe a measurement of the top quark mass from events produced in p{bar p} collisions at a center-of-mass energy of 1.96 TeV, using the Collider Detector at Fermilab. They identify t{bar t} candidates where both W bosons from the top quarks decay into leptons (e{nu}, {mu}{nu}, or {tau}{nu}) from a data sample of 360 pb{sup -1}. The top quark mass is reconstructed in each event separately by three different methods, which draw upon simulated distributions of the neutrino pseudorapidity, t{bar t} longitudinal momentum, or neutrino azimuthal angle in order to extract probability distributions for the top quark mass.more » For each method, representative mass distributions, or templates, are constructed from simulated samples of signal and background events, and parameterized to form continuous probability density functions. A likelihood fit incorporating these parameterized templates is then performed on the data sample masses in order to derive a final top quark mass. Combining the three template methods, taking into account correlations in their statistical and systematic uncertainties, results in a top quark mass measurement of 170.1 {+-} 6.0(stat.) {+-} 4.1(syst.) GeV/c{sup 2}.« less
Probability-based hazard avoidance guidance for planetary landing
NASA Astrophysics Data System (ADS)
Yuan, Xu; Yu, Zhengshi; Cui, Pingyuan; Xu, Rui; Zhu, Shengying; Cao, Menglong; Luan, Enjie
2018-03-01
Future landing and sample return missions on planets and small bodies will seek landing sites with high scientific value, which may be located in hazardous terrains. Autonomous landing in such hazardous terrains and highly uncertain planetary environments is particularly challenging. Onboard hazard avoidance ability is indispensable, and the algorithms must be robust to uncertainties. In this paper, a novel probability-based hazard avoidance guidance method is developed for landing in hazardous terrains on planets or small bodies. By regarding the lander state as probabilistic, the proposed guidance algorithm exploits information on the uncertainty of lander position and calculates the probability of collision with each hazard. The collision probability serves as an accurate safety index, which quantifies the impact of uncertainties on the lander safety. Based on the collision probability evaluation, the state uncertainty of the lander is explicitly taken into account in the derivation of the hazard avoidance guidance law, which contributes to enhancing the robustness to the uncertain dynamics of planetary landing. The proposed probability-based method derives fully analytic expressions and does not require off-line trajectory generation. Therefore, it is appropriate for real-time implementation. The performance of the probability-based guidance law is investigated via a set of simulations, and the effectiveness and robustness under uncertainties are demonstrated.
Ruzik, L; Obarski, N; Papierz, A; Mojski, M
2015-06-01
High-performance liquid chromatography (HPLC) with UV/VIS spectrophotometric detection combined with the chemometric method of cluster analysis (CA) was used for the assessment of repeatability of composition of nine types of perfumed waters. In addition, the chromatographic method of separating components of the perfume waters under analysis was subjected to an optimization procedure. The chromatograms thus obtained were used as sources of data for the chemometric method of cluster analysis (CA). The result was a classification of a set comprising 39 perfumed water samples with a similar composition at a specified level of probability (level of agglomeration). A comparison of the classification with the manufacturer's declarations reveals a good degree of consistency and demonstrates similarity between samples in different classes. A combination of the chromatographic method with cluster analysis (HPLC UV/VIS - CA) makes it possible to quickly assess the repeatability of composition of perfumed waters at selected levels of probability. © 2014 Society of Cosmetic Scientists and the Société Française de Cosmétologie.
Gaussian process surrogates for failure detection: A Bayesian experimental design approach
NASA Astrophysics Data System (ADS)
Wang, Hongqiao; Lin, Guang; Li, Jinglai
2016-05-01
An important task of uncertainty quantification is to identify the probability of undesired events, in particular, system failures, caused by various sources of uncertainties. In this work we consider the construction of Gaussian process surrogates for failure detection and failure probability estimation. In particular, we consider the situation that the underlying computer models are extremely expensive, and in this setting, determining the sampling points in the state space is of essential importance. We formulate the problem as an optimal experimental design for Bayesian inferences of the limit state (i.e., the failure boundary) and propose an efficient numerical scheme to solve the resulting optimization problem. In particular, the proposed limit-state inference method is capable of determining multiple sampling points at a time, and thus it is well suited for problems where multiple computer simulations can be performed in parallel. The accuracy and performance of the proposed method is demonstrated by both academic and practical examples.
Santra, Kalyan; Smith, Emily A.; Petrich, Jacob W.; ...
2016-12-12
It is often convenient to know the minimum amount of data needed in order to obtain a result of desired accuracy and precision. It is a necessity in the case of subdiffraction-limited microscopies, such as stimulated emission depletion (STED) microscopy, owing to the limited sample volumes and the extreme sensitivity of the samples to photobleaching and photodamage. We present a detailed comparison of probability-based techniques (the maximum likelihood method and methods based on the binomial and the Poisson distributions) with residual minimization-based techniques for retrieving the fluorescence decay parameters for various two-fluorophore mixtures, as a function of the total numbermore » of photon counts, in time-correlated, single-photon counting experiments. The probability-based techniques proved to be the most robust (insensitive to initial values) in retrieving the target parameters and, in fact, performed equivalently to 2-3 significant figures. This is to be expected, as we demonstrate that the three methods are fundamentally related. Furthermore, methods based on the Poisson and binomial distributions have the desirable feature of providing a bin-by-bin analysis of a single fluorescence decay trace, which thus permits statistics to be acquired using only the one trace for not only the mean and median values of the fluorescence decay parameters but also for the associated standard deviations. Lastly, these probability-based methods lend themselves well to the analysis of the sparse data sets that are encountered in subdiffraction-limited microscopies.« less
Valid statistical inference methods for a case-control study with missing data.
Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun
2018-04-01
The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.
Chen, Yi; Pouillot, Régis; S Burall, Laurel; Strain, Errol A; Van Doren, Jane M; De Jesus, Antonio J; Laasri, Anna; Wang, Hua; Ali, Laila; Tatavarthy, Aparna; Zhang, Guodong; Hu, Lijun; Day, James; Sheth, Ishani; Kang, Jihun; Sahu, Surasri; Srinivasan, Devayani; Brown, Eric W; Parish, Mickey; Zink, Donald L; Datta, Atin R; Hammack, Thomas S; Macarisin, Dumitru
2017-01-16
A precise and accurate method for enumeration of low level of Listeria monocytogenes in foods is critical to a variety of studies. In this study, paired comparison of most probable number (MPN) and direct plating enumeration of L. monocytogenes was conducted on a total of 1730 outbreak-associated ice cream samples that were naturally contaminated with low level of L. monocytogenes. MPN was performed on all 1730 samples. Direct plating was performed on all samples using the RAPID'L.mono (RLM) agar (1600 samples) and agar Listeria Ottaviani and Agosti (ALOA; 130 samples). Probabilistic analysis with Bayesian inference model was used to compare paired direct plating and MPN estimates of L. monocytogenes in ice cream samples because assumptions implicit in ordinary least squares (OLS) linear regression analyses were not met for such a comparison. The probabilistic analysis revealed good agreement between the MPN and direct plating estimates, and this agreement showed that the MPN schemes and direct plating schemes using ALOA or RLM evaluated in the present study were suitable for enumerating low levels of L. monocytogenes in these ice cream samples. The statistical analysis further revealed that OLS linear regression analyses of direct plating and MPN data did introduce bias that incorrectly characterized systematic differences between estimates from the two methods. Published by Elsevier B.V.
Härkänen, Tommi; Kaikkonen, Risto; Virtala, Esa; Koskinen, Seppo
2014-11-06
To assess the nonresponse rates in a questionnaire survey with respect to administrative register data, and to correct the bias statistically. The Finnish Regional Health and Well-being Study (ATH) in 2010 was based on a national sample and several regional samples. Missing data analysis was based on socio-demographic register data covering the whole sample. Inverse probability weighting (IPW) and doubly robust (DR) methods were estimated using the logistic regression model, which was selected using the Bayesian information criteria. The crude, weighted and true self-reported turnout in the 2008 municipal election and prevalences of entitlements to specially reimbursed medication, and the crude and weighted body mass index (BMI) means were compared. The IPW method appeared to remove a relatively large proportion of the bias compared to the crude prevalence estimates of the turnout and the entitlements to specially reimbursed medication. Several demographic factors were shown to be associated with missing data, but few interactions were found. Our results suggest that the IPW method can improve the accuracy of results of a population survey, and the model selection provides insight into the structure of missing data. However, health-related missing data mechanisms are beyond the scope of statistical methods, which mainly rely on socio-demographic information to correct the results.
Effects of Tube Processing on the Fatigue Life of Nitinol
NASA Astrophysics Data System (ADS)
Adler, Paul; Frei, Rudolf; Kimiecik, Michael; Briant, Paul; James, Brad; Liu, Chuan
2018-03-01
Nitinol tubes were manufactured from Standard Grade VIM-VAR ingots using Tube Manufacturing method "TM-1." Diamond-shaped samples were laser cut, shape set, then fatigued at 37 °C to 107 cycles. The 50, 5, and 1% probabilities of fracture were calculated as a function of number of cycles to fracture and compared with probabilities determined for fatigue data published by Robertson et al. (J Mech Behav Biomater 51:119-131, 2015). Robertson tested similar diamonds made from the same standard grade of Nitinol as in the current study, two other standard grades of Nitinol, and two high-purity grades of Nitinol expressly designed to improve fatigue life. Robertson's tubes were manufactured using Tube Manufacturing method "TM-2." Fatigue performance of TM-1 and TM-2 diamonds were compared: At 107 cycles, strain amplitudes corresponding to the three probabilities of fracture of the TM-1 diamonds were 2-3 times those of the TM-2 diamonds made from the same grade of Nitinol, and comparable to TM-2 diamonds made from the higher-purity materials. This difference is likely a result of the differences in tube manufacturing techniques and effects on resulting microstructures. Microstructural analyses of samples revealed a correlation between the median probability of fracture and median inclusion diameter that follows an inverse power-law function of the form y ≈ x -1.
Computational methods for efficient structural reliability and reliability sensitivity analysis
NASA Technical Reports Server (NTRS)
Wu, Y.-T.
1993-01-01
This paper presents recent developments in efficient structural reliability analysis methods. The paper proposes an efficient, adaptive importance sampling (AIS) method that can be used to compute reliability and reliability sensitivities. The AIS approach uses a sampling density that is proportional to the joint PDF of the random variables. Starting from an initial approximate failure domain, sampling proceeds adaptively and incrementally with the goal of reaching a sampling domain that is slightly greater than the failure domain to minimize over-sampling in the safe region. Several reliability sensitivity coefficients are proposed that can be computed directly and easily from the above AIS-based failure points. These probability sensitivities can be used for identifying key random variables and for adjusting design to achieve reliability-based objectives. The proposed AIS methodology is demonstrated using a turbine blade reliability analysis problem.
Statistical computation of tolerance limits
NASA Technical Reports Server (NTRS)
Wheeler, J. T.
1993-01-01
Based on a new theory, two computer codes were developed specifically to calculate the exact statistical tolerance limits for normal distributions within unknown means and variances for the one-sided and two-sided cases for the tolerance factor, k. The quantity k is defined equivalently in terms of the noncentral t-distribution by the probability equation. Two of the four mathematical methods employ the theory developed for the numerical simulation. Several algorithms for numerically integrating and iteratively root-solving the working equations are written to augment the program simulation. The program codes generate some tables of k's associated with the varying values of the proportion and sample size for each given probability to show accuracy obtained for small sample sizes.
Asteroid orbital inversion using uniform phase-space sampling
NASA Astrophysics Data System (ADS)
Muinonen, K.; Pentikäinen, H.; Granvik, M.; Oszkiewicz, D.; Virtanen, J.
2014-07-01
We review statistical inverse methods for asteroid orbit computation from a small number of astrometric observations and short time intervals of observations. With the help of Markov-chain Monte Carlo methods (MCMC), we present a novel inverse method that utilizes uniform sampling of the phase space for the orbital elements. The statistical orbital ranging method (Virtanen et al. 2001, Muinonen et al. 2001) was set out to resolve the long-lasting challenges in the initial computation of orbits for asteroids. The ranging method starts from the selection of a pair of astrometric observations. Thereafter, the topocentric ranges and angular deviations in R.A. and Decl. are randomly sampled. The two Cartesian positions allow for the computation of orbital elements and, subsequently, the computation of ephemerides for the observation dates. Candidate orbital elements are included in the sample of accepted elements if the χ^2-value between the observed and computed observations is within a pre-defined threshold. The sample orbital elements obtain weights based on a certain debiasing procedure. When the weights are available, the full sample of orbital elements allows the probabilistic assessments for, e.g., object classification and ephemeris computation as well as the computation of collision probabilities. The MCMC ranging method (Oszkiewicz et al. 2009; see also Granvik et al. 2009) replaces the original sampling algorithm described above with a proposal probability density function (p.d.f.), and a chain of sample orbital elements results in the phase space. MCMC ranging is based on a bivariate Gaussian p.d.f. for the topocentric ranges, and allows for the sampling to focus on the phase-space domain with most of the probability mass. In the virtual-observation MCMC method (Muinonen et al. 2012), the proposal p.d.f. for the orbital elements is chosen to mimic the a posteriori p.d.f. for the elements: first, random errors are simulated for each observation, resulting in a set of virtual observations; second, corresponding virtual least-squares orbital elements are derived using the Nelder-Mead downhill simplex method; third, repeating the procedure two times allows for a computation of a difference for two sets of virtual orbital elements; and, fourth, this orbital-element difference constitutes a symmetric proposal in a random-walk Metropolis-Hastings algorithm, avoiding the explicit computation of the proposal p.d.f. In a discrete approximation, the allowed proposals coincide with the differences that are based on a large number of pre-computed sets of virtual least-squares orbital elements. The virtual-observation MCMC method is thus based on the characterization of the relevant volume in the orbital-element phase space. Here we utilize MCMC to map the phase-space domain of acceptable solutions. We can make use of the proposal p.d.f.s from the MCMC ranging and virtual-observation methods. The present phase-space mapping produces, upon convergence, a uniform sampling of the solution space within a pre-defined χ^2-value. The weights of the sampled orbital elements are then computed on the basis of the corresponding χ^2-values. The present method resembles the original ranging method. On one hand, MCMC mapping is insensitive to local extrema in the phase space and efficiently maps the solution space. This is somewhat contrary to the MCMC methods described above. On the other hand, MCMC mapping can suffer from producing a small number of sample elements with small χ^2-values, in resemblance to the original ranging method. We apply the methods to example near-Earth, main-belt, and transneptunian objects, and highlight the utilization of the methods in the data processing and analysis pipeline of the ESA Gaia space mission.
Biased Metropolis Sampling for Rugged Free Energy Landscapes
NASA Astrophysics Data System (ADS)
Berg, Bernd A.
2003-11-01
Metropolis simulations of all-atom models of peptides (i.e. small proteins) are considered. Inspired by the funnel picture of Bryngelson and Wolyness, a transformation of the updating probabilities of the dihedral angles is defined, which uses probability densities from a higher temperature to improve the algorithmic performance at a lower temperature. The method is suitable for canonical as well as for generalized ensemble simulations. A simple approximation to the full transformation is tested at room temperature for Met-Enkephalin in vacuum. Integrated autocorrelation times are found to be reduced by factors close to two and a similar improvement due to generalized ensemble methods enters multiplicatively.
Probability of detection of defects in coatings with electronic shearography
NASA Technical Reports Server (NTRS)
Russell, S. S.; Lansing, M. D.; Horton, C. M.; Gnacek, W. J.
1995-01-01
The goal of this research was to utilize statistical methods to evaluate the probability of detection (POD) of defects in coatings using electronic shearography. The coating system utilized in the POD studies was to be the paint system currently utilized on the external casings of the NASA space transportation system reusable solid rocket motor boosters. The population of samples was to be large enough to determine the minimum defect size for 90-percent POD of 95-percent confidence POD on these coatings. Also, the best methods to excite coatings on aerospace components to induce deformations for measurement by electronic shearography were to be determined.
Stan : A Probabilistic Programming Language
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carpenter, Bob; Gelman, Andrew; Hoffman, Matthew D.
Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectationmore » propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can also be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.« less
Stan : A Probabilistic Programming Language
Carpenter, Bob; Gelman, Andrew; Hoffman, Matthew D.; ...
2017-01-01
Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectationmore » propagation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can also be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces support sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.« less
Peck, Michael W.; Plowman, June; Aldus, Clare F.; Wyatt, Gary M.; Penaloza Izurieta, Walter; Stringer, Sandra C.; Barker, Gary C.
2010-01-01
The highly potent botulinum neurotoxins are responsible for botulism, a severe neuroparalytic disease. Strains of nonproteolytic Clostridium botulinum form neurotoxins of types B, E, and F and are the main hazard associated with minimally heated refrigerated foods. Recent developments in quantitative microbiological risk assessment (QMRA) and food safety objectives (FSO) have made food safety more quantitative and include, as inputs, probability distributions for the contamination of food materials and foods. A new method that combines a selective enrichment culture with multiplex PCR has been developed and validated to enumerate specifically the spores of nonproteolytic C. botulinum. Key features of this new method include the following: (i) it is specific for nonproteolytic C. botulinum (and does not detect proteolytic C. botulinum), (ii) the detection limit has been determined for each food tested (using carefully structured control samples), and (iii) a low detection limit has been achieved by the use of selective enrichment and large test samples. The method has been used to enumerate spores of nonproteolytic C. botulinum in 637 samples of 19 food materials included in pasta-based minimally heated refrigerated foods and in 7 complete foods. A total of 32 samples (5 egg pastas and 27 scallops) contained spores of nonproteolytic C. botulinum type B or F. The majority of samples contained <100 spores/kg, but one sample of scallops contained 444 spores/kg. Nonproteolytic C. botulinum type E was not detected. Importantly, for QMRA and FSO, the construction of probability distributions will enable the frequency of packs containing particular levels of contamination to be determined. PMID:20709854
Peck, Michael W; Plowman, June; Aldus, Clare F; Wyatt, Gary M; Izurieta, Walter Penaloza; Stringer, Sandra C; Barker, Gary C
2010-10-01
The highly potent botulinum neurotoxins are responsible for botulism, a severe neuroparalytic disease. Strains of nonproteolytic Clostridium botulinum form neurotoxins of types B, E, and F and are the main hazard associated with minimally heated refrigerated foods. Recent developments in quantitative microbiological risk assessment (QMRA) and food safety objectives (FSO) have made food safety more quantitative and include, as inputs, probability distributions for the contamination of food materials and foods. A new method that combines a selective enrichment culture with multiplex PCR has been developed and validated to enumerate specifically the spores of nonproteolytic C. botulinum. Key features of this new method include the following: (i) it is specific for nonproteolytic C. botulinum (and does not detect proteolytic C. botulinum), (ii) the detection limit has been determined for each food tested (using carefully structured control samples), and (iii) a low detection limit has been achieved by the use of selective enrichment and large test samples. The method has been used to enumerate spores of nonproteolytic C. botulinum in 637 samples of 19 food materials included in pasta-based minimally heated refrigerated foods and in 7 complete foods. A total of 32 samples (5 egg pastas and 27 scallops) contained spores of nonproteolytic C. botulinum type B or F. The majority of samples contained <100 spores/kg, but one sample of scallops contained 444 spores/kg. Nonproteolytic C. botulinum type E was not detected. Importantly, for QMRA and FSO, the construction of probability distributions will enable the frequency of packs containing particular levels of contamination to be determined.
Steinka, Izabela
2004-01-01
Immunoassay methods were used to identify the presence of staphylococcal enterotoxins in lactic acid cheese vacuum and non-vacuum packed. There was assessed the probability of encountering staphylococcal enterotoxin in cheese dependent on different systems of packaging, count of staphylococcal cells, intensiveness of coagulase synthesis and tightness of packaging. The presence of enterotoxin was identified in 5% of researched samples of products stored for 14 days. The influence of packaging system and tightness on presence of enterotoxin was observed. The probability of presence of staphylococcal and enterotoxin in relation to researched factors was presented by the mathematical models.
Design and operation of the national home health aide survey: 2007-2008.
Bercovitz, Anita; Moss, Abigail J; Sengupta, Manisha; Harris-Kojetin, Lauren D; Squillace, Marie R; Emily, Rosenoff; Branden, Laura
2010-03-01
This report provides an overview of the National Home Health Aide Survey (NHHAS), the first national probability survey of home health aides. NHHAS was designed to provide national estimates of home health aides who provided assistance in activities of daily living (ADLs) and were directly employed by agencies that provide home health and/or hospice care. This report discusses the need for and objectives of the survey, the design process, the survey methods, and data availability. METHODS NHHAS, a multistage probability sample survey, was conducted as a supplement to the 2007 National Home and Hospice Care Survey (NHHCS). Agencies providing home health and/or hospice care were sampled, and then aides employed by these agencies were sampled and interviewed by telephone. Survey topics included recruitment, training, job history, family life, client relations, work-related injuries, and demographics. NHHAS was virtually identical to the 2004 National Nursing Assistant Survey of certified nursing assistants employed in sampled nursing homes with minor changes to account for differences in workplace environment and responsibilities. RESULTS From September 2007 to April 2008, interviews were completed with 3,416 aides. A public-use data file that contains the interview responses, sampling weights, and design variables is available. The NHHAS overall response rate weighted by the inverse of the probability of selection was 41 percent. This rate is the product of the weighted first-stage agency response rate of 57 percent (i.e., weighted response rate of 59 percent for agency participation in NHHCS times the weighted response rate of 97 percent for agencies participating in NHHCS that also participated in NHHAS) and the weighted second-stage aide response rate of 72 percent to NHHAS.
Ji, Yuan; Wang, Sue-Jane
2013-01-01
The 3 + 3 design is the most common choice among clinicians for phase I dose-escalation oncology trials. In recent reviews, more than 95% of phase I trials have been based on the 3 + 3 design. Given that it is intuitive and its implementation does not require a computer program, clinicians can conduct 3 + 3 dose escalations in practice with virtually no logistic cost, and trial protocols based on the 3 + 3 design pass institutional review board and biostatistics reviews quickly. However, the performance of the 3 + 3 design has rarely been compared with model-based designs in simulation studies with matched sample sizes. In the vast majority of statistical literature, the 3 + 3 design has been shown to be inferior in identifying true maximum-tolerated doses (MTDs), although the sample size required by the 3 + 3 design is often orders-of-magnitude smaller than model-based designs. In this article, through comparative simulation studies with matched sample sizes, we demonstrate that the 3 + 3 design has higher risks of exposing patients to toxic doses above the MTD than the modified toxicity probability interval (mTPI) design, a newly developed adaptive method. In addition, compared with the mTPI design, the 3 + 3 design does not yield higher probabilities in identifying the correct MTD, even when the sample size is matched. Given that the mTPI design is equally transparent, costless to implement with free software, and more flexible in practical situations, we highly encourage its adoption in early dose-escalation studies whenever the 3 + 3 design is also considered. We provide free software to allow direct comparisons of the 3 + 3 design with other model-based designs in simulation studies with matched sample sizes. PMID:23569307
Capel, P.D.; Larson, S.J.
1995-01-01
Minimizing the loss of target organic chemicals from environmental water samples between the time of sample collection and isolation is important to the integrity of an investigation. During this sample holding time, there is a potential for analyte loss through volatilization from the water to the headspace, sorption to the walls and cap of the sample bottle; and transformation through biotic and/or abiotic reactions. This paper presents a chemodynamic-based, generalized approach to estimate the most probable loss processes for individual target organic chemicals. The basic premise is that the investigator must know which loss process(es) are important for a particular analyte, based on its chemodynamic properties, when choosing the appropriate method(s) to prevent loss.
Aurora, R. Nisha; Putcha, Nirupama; Swartz, Rachel; Punjabi, Naresh M.
2016-01-01
Background Obstructive sleep apnea is a prevalent yet underdiagnosed condition associated with cardiovascular morbidity and mortality. Home sleep testing offers an efficient means for diagnosing obstructive sleep apnea but has primarily been deployed in clinical samples with a high pretest probability. The current study sought to assess if obstructive sleep apnea can be diagnosed with home sleep testing in a non-referred sample without involvement of a sleep medicine specialist. Methods A study of community-based adults with untreated obstructive sleep apnea was undertaken. Misclassification of disease severity based on home sleep testing with and without involvement of a sleep medicine specialist was assessed, and agreement was characterized using scatter plots, Pearson's correlation coefficient, Bland-Altman analysis, and the kappa statistic. Analyses were also conducted to assess whether any observed differences varied as a function of pretest probability of obstructive sleep apnea or subjective sleepiness. Results The sample consisted of 191 subjects with over half (56.5%) having obstructive sleep apnea. Without involvement of a sleep medicine specialist, obstructive sleep apnea was not identified in only 5.8% of the sample. Analyses comparing the categorical assessment of disease severity with and without a sleep medicine specialist showed that in total, 32 subjects (16.8%) were misclassified. Agreement in the disease severity with and without a sleep medicine specialist was not influenced by the pretest probability or daytime sleep tendency. Conclusion Obstructive sleep apnea can be reliably identified with home sleep testing in a non-referred sample irrespective of the pretest probability of the disease. PMID:26968467
Nichols, James D.; Hines, James E.; Pollock, Kenneth H.; Hinz, Robert L.; Link, William A.
1994-01-01
The proportion of animals in a population that breeds is an important determinant of population growth rate. Usual estimates of this quantity from field sampling data assume that the probability of appearing in the capture or count statistic is the same for animals that do and do not breed. A similar assumption is required by most existing methods used to test ecologically interesting hypotheses about reproductive costs using field sampling data. However, in many field sampling situations breeding and nonbreeding animals are likely to exhibit different probabilities of being seen or caught. In this paper, we propose the use of multistate capture-recapture models for these estimation and testing problems. This methodology permits a formal test of the hypothesis of equal capture/sighting probabilities for breeding and nonbreeding individuals. Two estimators of breeding proportion (and associated standard errors) are presented, one for the case of equal capture probabilities and one for the case of unequal capture probabilities. The multistate modeling framework also yields formal tests of hypotheses about reproductive costs to future reproduction or survival or both fitness components. The general methodology is illustrated using capture-recapture data on female meadow voles, Microtus pennsylvanicus. Resulting estimates of the proportion of reproductively active females showed strong seasonal variation, as expected, with low breeding proportions in midwinter. We found no evidence of reproductive costs extracted in subsequent survival or reproduction. We believe that this methodological framework has wide application to problems in animal ecology concerning breeding proportions and phenotypic reproductive costs.
Abramyan, Tigran M.; Hyde-Volpe, David L.; Stuart, Steven J.; Latour, Robert A.
2017-01-01
The use of standard molecular dynamics simulation methods to predict the interactions of a protein with a material surface have the inherent limitations of lacking the ability to determine the most likely conformations and orientations of the adsorbed protein on the surface and to determine the level of convergence attained by the simulation. In addition, standard mixing rules are typically applied to combine the nonbonded force field parameters of the solution and solid phases the system to represent interfacial behavior without validation. As a means to circumvent these problems, the authors demonstrate the application of an efficient advanced sampling method (TIGER2A) for the simulation of the adsorption of hen egg-white lysozyme on a crystalline (110) high-density polyethylene surface plane. Simulations are conducted to generate a Boltzmann-weighted ensemble of sampled states using force field parameters that were validated to represent interfacial behavior for this system. The resulting ensembles of sampled states were then analyzed using an in-house-developed cluster analysis method to predict the most probable orientations and conformations of the protein on the surface based on the amount of sampling performed, from which free energy differences between the adsorbed states were able to be calculated. In addition, by conducting two independent sets of TIGER2A simulations combined with cluster analyses, the authors demonstrate a method to estimate the degree of convergence achieved for a given amount of sampling. The results from these simulations demonstrate that these methods enable the most probable orientations and conformations of an adsorbed protein to be predicted and that the use of our validated interfacial force field parameter set provides closer agreement to available experimental results compared to using standard CHARMM force field parameterization to represent molecular behavior at the interface. PMID:28514864
Hard choices in assessing survival past dams — a comparison of single- and paired-release strategies
Zydlewski, Joseph D.; Stich, Daniel S.; Sigourney, Douglas B.
2017-01-01
Mark–recapture models are widely used to estimate survival of salmon smolts migrating past dams. Paired releases have been used to improve estimate accuracy by removing components of mortality not attributable to the dam. This method is accompanied by reduced precision because (i) sample size is reduced relative to a single, large release; and (ii) variance calculations inflate error. We modeled an idealized system with a single dam to assess trade-offs between accuracy and precision and compared methods using root mean squared error (RMSE). Simulations were run under predefined conditions (dam mortality, background mortality, detection probability, and sample size) to determine scenarios when the paired release was preferable to a single release. We demonstrate that a paired-release design provides a theoretical advantage over a single-release design only at large sample sizes and high probabilities of detection. At release numbers typical of many survival studies, paired release can result in overestimation of dam survival. Failures to meet model assumptions of a paired release may result in further overestimation of dam-related survival. Under most conditions, a single-release strategy was preferable.
Economic Intervention and Parenting: A Randomized Experiment of Statewide Child Development Accounts
ERIC Educational Resources Information Center
Nam, Yunju; Wikoff, Nora; Sherraden, Michael
2016-01-01
Objective: We examine the effects of Child Development Accounts (CDAs) on parenting stress and practices. Methods: We use data from the SEED for Oklahoma Kids (SEED OK) experiment. SEED OK selected caregivers of infants from Oklahoma birth certificates using a probability sampling method, randomly assigned caregivers to the treatment (n = 1,132)…
Random Numbers and Monte Carlo Methods
NASA Astrophysics Data System (ADS)
Scherer, Philipp O. J.
Many-body problems often involve the calculation of integrals of very high dimension which cannot be treated by standard methods. For the calculation of thermodynamic averages Monte Carlo methods are very useful which sample the integration volume at randomly chosen points. After summarizing some basic statistics, we discuss algorithms for the generation of pseudo-random numbers with given probability distribution which are essential for all Monte Carlo methods. We show how the efficiency of Monte Carlo integration can be improved by sampling preferentially the important configurations. Finally the famous Metropolis algorithm is applied to classical many-particle systems. Computer experiments visualize the central limit theorem and apply the Metropolis method to the traveling salesman problem.
Flood Frequency Curves - Use of information on the likelihood of extreme floods
NASA Astrophysics Data System (ADS)
Faber, B.
2011-12-01
Investment in the infrastructure that reduces flood risk for flood-prone communities must incorporate information on the magnitude and frequency of flooding in that area. Traditionally, that information has been a probability distribution of annual maximum streamflows developed from the historical gaged record at a stream site. Practice in the United States fits a Log-Pearson type3 distribution to the annual maximum flows of an unimpaired streamflow record, using the method of moments to estimate distribution parameters. The procedure makes the assumptions that annual peak streamflow events are (1) independent, (2) identically distributed, and (3) form a representative sample of the overall probability distribution. Each of these assumptions can be challenged. We rarely have enough data to form a representative sample, and therefore must compute and display the uncertainty in the estimated flood distribution. But, is there a wet/dry cycle that makes precipitation less than independent between successive years? Are the peak flows caused by different types of events from different statistical populations? How does the watershed or climate changing over time (non-stationarity) affect the probability distribution floods? Potential approaches to avoid these assumptions vary from estimating trend and shift and removing them from early data (and so forming a homogeneous data set), to methods that estimate statistical parameters that vary with time. A further issue in estimating a probability distribution of flood magnitude (the flood frequency curve) is whether a purely statistical approach can accurately capture the range and frequency of floods that are of interest. A meteorologically-based analysis produces "probable maximum precipitation" (PMP) and subsequently a "probable maximum flood" (PMF) that attempts to describe an upper bound on flood magnitude in a particular watershed. This analysis can help constrain the upper tail of the probability distribution, well beyond the range of gaged data or even historical or paleo-flood data, which can be very important in risk analyses performed for flood risk management and dam and levee safety studies.
Approximation of Failure Probability Using Conditional Sampling
NASA Technical Reports Server (NTRS)
Giesy. Daniel P.; Crespo, Luis G.; Kenney, Sean P.
2008-01-01
In analyzing systems which depend on uncertain parameters, one technique is to partition the uncertain parameter domain into a failure set and its complement, and judge the quality of the system by estimating the probability of failure. If this is done by a sampling technique such as Monte Carlo and the probability of failure is small, accurate approximation can require so many sample points that the computational expense is prohibitive. Previous work of the authors has shown how to bound the failure event by sets of such simple geometry that their probabilities can be calculated analytically. In this paper, it is shown how to make use of these failure bounding sets and conditional sampling within them to substantially reduce the computational burden of approximating failure probability. It is also shown how the use of these sampling techniques improves the confidence intervals for the failure probability estimate for a given number of sample points and how they reduce the number of sample point analyses needed to achieve a given level of confidence.
Jung, R.E.; Royle, J. Andrew; Sauer, J.R.; Addison, C.; Rau, R.D.; Shirk, J.L.; Whissel, J.C.
2005-01-01
Stream salamanders in the family Plethodontidae constitute a large biomass in and near headwater streams in the eastern United States and are promising indicators of stream ecosystem health. Many studies of stream salamanders have relied on population indices based on counts rather than population estimates based on techniques such as capture-recapture and removal. Application of estimation procedures allows the calculation of detection probabilities (the proportion of total animals present that are detected during a survey) and their associated sampling error, and may be essential for determining salamander population sizes and trends. In 1999, we conducted capture-recapture and removal population estimation methods for Desmognathus salamanders at six streams in Shenandoah National Park, Virginia, USA. Removal sampling appeared more efficient and detection probabilities from removal data were higher than those from capture-recapture. During 2001-2004, we used removal estimation at eight streams in the park to assess the usefulness of this technique for long-term monitoring of stream salamanders. Removal detection probabilities ranged from 0.39 to 0.96 for Desmognathus, 0.27 to 0.89 for Eurycea and 0.27 to 0.75 for northern spring (Gyrinophilus porphyriticus) and northern red (Pseudotriton ruber) salamanders across stream transects. Detection probabilities did not differ across years for Desmognathus and Eurycea, but did differ among streams for Desmognathus. Population estimates of Desmognathus decreased between 2001-2002 and 2003-2004 which may be related to changes in stream flow conditions. Removal-based procedures may be a feasible approach for population estimation of salamanders, but field methods should be designed to meet the assumptions of the sampling procedures. New approaches to estimating stream salamander populations are discussed.
[Respondent-Driven Sampling: a new sampling method to study visible and hidden populations].
Mantecón, Alejandro; Juan, Montse; Calafat, Amador; Becoña, Elisardo; Román, Encarna
2008-01-01
The paper introduces a variant of chain-referral sampling: respondent-driven sampling (RDS). This sampling method shows that methods based on network analysis can be combined with the statistical validity of standard probability sampling methods. In this sense, RDS appears to be a mathematical improvement of snowball sampling oriented to the study of hidden populations. However, we try to prove its validity with populations that are not within a sampling frame but can nonetheless be contacted without difficulty. The basics of RDS are explained through our research on young people (aged 14 to 25) who go clubbing, consume alcohol and other drugs, and have sex. Fieldwork was carried out between May and July 2007 in three Spanish regions: Baleares, Galicia and Comunidad Valenciana. The presentation of the study shows the utility of this type of sampling when the population is accessible but there is a difficulty deriving from the lack of a sampling frame. However, the sample obtained is not a random representative one in statistical terms of the target population. It must be acknowledged that the final sample is representative of a 'pseudo-population' that approximates to the target population but is not identical to it.
Technology Development Risk Assessment for Space Transportation Systems
NASA Technical Reports Server (NTRS)
Mathias, Donovan L.; Godsell, Aga M.; Go, Susie
2006-01-01
A new approach for assessing development risk associated with technology development projects is presented. The method represents technology evolution in terms of sector-specific discrete development stages. A Monte Carlo simulation is used to generate development probability distributions based on statistical models of the discrete transitions. Development risk is derived from the resulting probability distributions and specific program requirements. Two sample cases are discussed to illustrate the approach, a single rocket engine development and a three-technology space transportation portfolio.
Park, Douglas L; Coates, Scott; Brewer, Vickery A; Garber, Eric A E; Abouzied, Mohamed; Johnson, Kurt; Ritter, Bruce; McKenzie, Deborah
2005-01-01
Performance Tested Method multiple laboratory validations for the detection of peanut protein in 4 different food matrixes were conducted under the auspices of the AOAC Research Institute. In this blind study, 3 commercially available ELISA test kits were validated: Neogen Veratox for Peanut, R-Biopharm RIDASCREEN FAST Peanut, and Tepnel BioKits for Peanut Assay. The food matrixes used were breakfast cereal, cookies, ice cream, and milk chocolate spiked at 0 and 5 ppm peanut. Analyses of the samples were conducted by laboratories representing industry and international and U.S governmental agencies. All 3 commercial test kits successfully identified spiked and peanut-free samples. The validation study required 60 analyses on test samples at the target level 5 microg peanut/g food and 60 analyses at a peanut-free level, which was designed to ensure that the lower 95% confidence limit for the sensitivity and specificity would not be <90%. The probability that a test sample contains an allergen given a prevalence rate of 5% and a positive test result using a single test kit analysis with 95% sensitivity and 95% specificity, which was demonstrated for these test kits, would be 50%. When 2 test kits are run simultaneously on all samples, the probability becomes 95%. It is therefore recommended that all field samples be analyzed with at least 2 of the validated kits.
Data Analysis Recipes: Using Markov Chain Monte Carlo
NASA Astrophysics Data System (ADS)
Hogg, David W.; Foreman-Mackey, Daniel
2018-05-01
Markov Chain Monte Carlo (MCMC) methods for sampling probability density functions (combined with abundant computational resources) have transformed the sciences, especially in performing probabilistic inferences, or fitting models to data. In this primarily pedagogical contribution, we give a brief overview of the most basic MCMC method and some practical advice for the use of MCMC in real inference problems. We give advice on method choice, tuning for performance, methods for initialization, tests of convergence, troubleshooting, and use of the chain output to produce or report parameter estimates with associated uncertainties. We argue that autocorrelation time is the most important test for convergence, as it directly connects to the uncertainty on the sampling estimate of any quantity of interest. We emphasize that sampling is a method for doing integrals; this guides our thinking about how MCMC output is best used. .
Meirovitch, Hagai
2010-01-01
The commonly used simulation techniques, Metropolis Monte Carlo (MC) and molecular dynamics (MD) are of a dynamical type which enables one to sample system configurations i correctly with the Boltzmann probability, P(i)(B), while the value of P(i)(B) is not provided directly; therefore, it is difficult to obtain the absolute entropy, S approximately -ln P(i)(B), and the Helmholtz free energy, F. With a different simulation approach developed in polymer physics, a chain is grown step-by-step with transition probabilities (TPs), and thus their product is the value of the construction probability; therefore, the entropy is known. Because all exact simulation methods are equivalent, i.e. they lead to the same averages and fluctuations of physical properties, one can treat an MC or MD sample as if its members have rather been generated step-by-step. Thus, each configuration i of the sample can be reconstructed (from nothing) by calculating the TPs with which it could have been constructed. This idea applies also to bulk systems such as fluids or magnets. This approach has led earlier to the "local states" (LS) and the "hypothetical scanning" (HS) methods, which are approximate in nature. A recent development is the hypothetical scanning Monte Carlo (HSMC) (or molecular dynamics, HSMD) method which is based on stochastic TPs where all interactions are taken into account. In this respect, HSMC(D) can be viewed as exact and the only approximation involved is due to insufficient MC(MD) sampling for calculating the TPs. The validity of HSMC has been established by applying it first to liquid argon, TIP3P water, self-avoiding walks (SAW), and polyglycine models, where the results for F were found to agree with those obtained by other methods. Subsequently, HSMD was applied to mobile loops of the enzymes porcine pancreatic alpha-amylase and acetylcholinesterase in explicit water, where the difference in F between the bound and free states of the loop was calculated. Currently, HSMD is being extended for calculating the absolute and relative free energies of ligand-enzyme binding. We describe the whole approach and discuss future directions. 2009 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Liu, Xiao-Ming; Jiang, Jun; Hong, Ling; Tang, Dafeng
In this paper, a new method of Generalized Cell Mapping with Sampling-Adaptive Interpolation (GCMSAI) is presented in order to enhance the efficiency of the computation of one-step probability transition matrix of the Generalized Cell Mapping method (GCM). Integrations with one mapping step are replaced by sampling-adaptive interpolations of third order. An explicit formula of interpolation error is derived for a sampling-adaptive control to switch on integrations for the accuracy of computations with GCMSAI. By applying the proposed method to a two-dimensional forced damped pendulum system, global bifurcations are investigated with observations of boundary metamorphoses including full to partial and partial to partial as well as the birth of fully Wada boundary. Moreover GCMSAI requires a computational time of one thirtieth up to one fiftieth compared to that of the previous GCM.
Thieke, Christian; Nill, Simeon; Oelfke, Uwe; Bortfeld, Thomas
2002-05-01
In inverse planning for intensity-modulated radiotherapy, the dose calculation is a crucial element limiting both the maximum achievable plan quality and the speed of the optimization process. One way to integrate accurate dose calculation algorithms into inverse planning is to precalculate the dose contribution of each beam element to each voxel for unit fluence. These precalculated values are stored in a big dose calculation matrix. Then the dose calculation during the iterative optimization process consists merely of matrix look-up and multiplication with the actual fluence values. However, because the dose calculation matrix can become very large, this ansatz requires a lot of computer memory and is still very time consuming, making it not practical for clinical routine without further modifications. In this work we present a new method to significantly reduce the number of entries in the dose calculation matrix. The method utilizes the fact that a photon pencil beam has a rapid radial dose falloff, and has very small dose values for the most part. In this low-dose part of the pencil beam, the dose contribution to a voxel is only integrated into the dose calculation matrix with a certain probability. Normalization with the reciprocal of this probability preserves the total energy, even though many matrix elements are omitted. Three probability distributions were tested to find the most accurate one for a given memory size. The sampling method is compared with the use of a fully filled matrix and with the well-known method of just cutting off the pencil beam at a certain lateral distance. A clinical example of a head and neck case is presented. It turns out that a sampled dose calculation matrix with only 1/3 of the entries of the fully filled matrix does not sacrifice the quality of the resulting plans, whereby the cutoff method results in a suboptimal treatment plan.
Environmental DNA as a Tool for Inventory and Monitoring of Aquatic Vertebrates
2017-07-01
geomorphic calculations and description of each reach. Methods Channel Surveys We initially selected reaches based on access and visual indicators...WA 99164 I-2 Environmental DNA lab protocol: designing species-specific qPCR assays Species-specific surveys should use quantitative polymerase...to traditional field sampling with respect to sensitivity, detection probabilities, and cost efficiency. Compared to field surveys , eDNA sampling
Ding, Aidong Adam; Hsieh, Jin-Jian; Wang, Weijing
2015-01-01
Bivariate survival analysis has wide applications. In the presence of covariates, most literature focuses on studying their effects on the marginal distributions. However covariates can also affect the association between the two variables. In this article we consider the latter issue by proposing a nonstandard local linear estimator for the concordance probability as a function of covariates. Under the Clayton copula, the conditional concordance probability has a simple one-to-one correspondence with the copula parameter for different data structures including those subject to independent or dependent censoring and dependent truncation. The proposed method can be used to study how covariates affect the Clayton association parameter without specifying marginal regression models. Asymptotic properties of the proposed estimators are derived and their finite-sample performances are examined via simulations. Finally, for illustration, we apply the proposed method to analyze a bone marrow transplant data set.
Method and apparatus for detecting a desired behavior in digital image data
Kegelmeyer, Jr., W. Philip
1997-01-01
A method for detecting stellate lesions in digitized mammographic image data includes the steps of prestoring a plurality of reference images, calculating a plurality of features for each of the pixels of the reference images, and creating a binary decision tree from features of randomly sampled pixels from each of the reference images. Once the binary decision tree has been created, a plurality of features, preferably including an ALOE feature (analysis of local oriented edges), are calculated for each of the pixels of the digitized mammographic data. Each of these plurality of features of each pixel are input into the binary decision tree and a probability is determined, for each of the pixels, corresponding to the likelihood of the presence of a stellate lesion, to create a probability image. Finally, the probability image is spatially filtered to enforce local consensus among neighboring pixels and the spatially filtered image is output.
Method and apparatus for detecting a desired behavior in digital image data
Kegelmeyer, Jr., W. Philip
1997-01-01
A method for detecting stellate lesions in digitized mammographic image data includes the steps of prestoring a plurality of reference images, calculating a plurality of features for each of the pixels of the reference images, and creating a binary decision tree from features of randomly sampled pixels from each of the reference images. Once the binary decision tree has been created, a plurality of features, preferably including an ALOE feature (analysis of local oriented edges), are calculated for each of the pixels of the digitized mammographic data. Each of these plurality of features of each pixel are input into the binary decision tree and a probability is determined, for each of the pixels, corresponding to the likelihood of the presence of a stellate lesion, to create a probability image. Finally, the probability image is spacially filtered to enforce local consensus among neighboring pixels and the spacially filtered image is output.
A method for the extraction and quantitation of phycoerythrin from algae
NASA Technical Reports Server (NTRS)
Stewart, D. E.
1982-01-01
A summary of a new technique for the extraction and quantitation of phycoerythrin (PHE) from algal samples is described. Results of analysis of four extracts representing three PHE types from algae including cryptomonad and cyanophyte types are presented. The method of extraction and an equation for quantitation are given. A graph showing the relationship of concentration and fluorescence units that may be used with samples fluorescing around 575-580 nm (probably dominated by cryptophytes in estuarine waters) and 560 nm (dominated by cyanophytes characteristics of the open ocean) is provided.
Methods of scaling threshold color difference using printed samples
NASA Astrophysics Data System (ADS)
Huang, Min; Cui, Guihua; Liu, Haoxue; Luo, M. Ronnier
2012-01-01
A series of printed samples on substrate of semi-gloss paper and with the magnitude of threshold color difference were prepared for scaling the visual color difference and to evaluate the performance of different method. The probabilities of perceptibly was used to normalized to Z-score and different color differences were scaled to the Z-score. The visual color difference was got, and checked with the STRESS factor. The results indicated that only the scales have been changed but the relative scales between pairs in the data are preserved.
Eskiyurt, Reyhan; Ozkan, Birgul
2017-01-01
Aim: This study was carried out to determine the reasons of the suicide probability and reasons for living of the inpatients hospitalized at the psychiatry clinic and to analyze the relationship between them. Materials and Methods: The sample of the study consisted of 192 patients who were hospitalized in psychiatric clinics between February and May 2016 and who agreed to participate in the study. In collecting data, personal information form, suicide probability scale (SPS), reasons for living inventory (RFL), and Beck's depression inventory (BDI) were used. Stepwise regression method was used to determine the factors that predict suicide probability. Results: In the study, as a result of analyses made, the median score on the SPS was found 76.0, the median score on the RFL was found 137.0, the median score on the BDI of the patients was found 13.5, and it was found that patients with a high probability of suicide had less reasons for living and that their depression levels were very high. As a result of stepwise regression analysis, it was determined that suicidal ideation, reasons for living, maltreatment, education level, age, and income status were the predictors of suicide probability (F = 61.125; P < 0.001). Discussion: It was found that the patients who hospitalized in the psychiatric clinic have high suicide probability and the reasons of living are strong predictors of suicide probability in accordance with the literature. PMID:29497185
Audio feature extraction using probability distribution function
NASA Astrophysics Data System (ADS)
Suhaib, A.; Wan, Khairunizam; Aziz, Azri A.; Hazry, D.; Razlan, Zuradzman M.; Shahriman A., B.
2015-05-01
Voice recognition has been one of the popular applications in robotic field. It is also known to be recently used for biometric and multimedia information retrieval system. This technology is attained from successive research on audio feature extraction analysis. Probability Distribution Function (PDF) is a statistical method which is usually used as one of the processes in complex feature extraction methods such as GMM and PCA. In this paper, a new method for audio feature extraction is proposed which is by using only PDF as a feature extraction method itself for speech analysis purpose. Certain pre-processing techniques are performed in prior to the proposed feature extraction method. Subsequently, the PDF result values for each frame of sampled voice signals obtained from certain numbers of individuals are plotted. From the experimental results obtained, it can be seen visually from the plotted data that each individuals' voice has comparable PDF values and shapes.
Mori, Yoshiharu; Okumura, Hisashi
2015-12-05
Simulated tempering (ST) is a useful method to enhance sampling of molecular simulations. When ST is used, the Metropolis algorithm, which satisfies the detailed balance condition, is usually applied to calculate the transition probability. Recently, an alternative method that satisfies the global balance condition instead of the detailed balance condition has been proposed by Suwa and Todo. In this study, ST method with the Suwa-Todo algorithm is proposed. Molecular dynamics simulations with ST are performed with three algorithms (the Metropolis, heat bath, and Suwa-Todo algorithms) to calculate the transition probability. Among the three algorithms, the Suwa-Todo algorithm yields the highest acceptance ratio and the shortest autocorrelation time. These suggest that sampling by a ST simulation with the Suwa-Todo algorithm is most efficient. In addition, because the acceptance ratio of the Suwa-Todo algorithm is higher than that of the Metropolis algorithm, the number of temperature states can be reduced by 25% for the Suwa-Todo algorithm when compared with the Metropolis algorithm. © 2015 Wiley Periodicals, Inc.
Williamson, Graham R
2003-11-01
This paper discusses the theoretical limitations of the use of random sampling and probability theory in the production of a significance level (or P-value) in nursing research. Potential alternatives, in the form of randomization tests, are proposed. Research papers in nursing, medicine and psychology frequently misrepresent their statistical findings, as the P-values reported assume random sampling. In this systematic review of studies published between January 1995 and June 2002 in the Journal of Advanced Nursing, 89 (68%) studies broke this assumption because they used convenience samples or entire populations. As a result, some of the findings may be questionable. The key ideas of random sampling and probability theory for statistical testing (for generating a P-value) are outlined. The result of a systematic review of research papers published in the Journal of Advanced Nursing is then presented, showing how frequently random sampling appears to have been misrepresented. Useful alternative techniques that might overcome these limitations are then discussed. REVIEW LIMITATIONS: This review is limited in scope because it is applied to one journal, and so the findings cannot be generalized to other nursing journals or to nursing research in general. However, it is possible that other nursing journals are also publishing research articles based on the misrepresentation of random sampling. The review is also limited because in several of the articles the sampling method was not completely clearly stated, and in this circumstance a judgment has been made as to the sampling method employed, based on the indications given by author(s). Quantitative researchers in nursing should be very careful that the statistical techniques they use are appropriate for the design and sampling methods of their studies. If the techniques they employ are not appropriate, they run the risk of misinterpreting findings by using inappropriate, unrepresentative and biased samples.
Irish study of high-density Schizophrenia families: Field methods and power to detect linkage
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kendler, K.S.; Straub, R.E.; MacLean, C.J.
Large samples of multiplex pedigrees will probably be needed to detect susceptibility loci for schizophrenia by linkage analysis. Standardized ascertainment of such pedigrees from culturally and ethnically homogeneous populations may improve the probability of detection and replication of linkage. The Irish Study of High-Density Schizophrenia Families (ISHDSF) was formed from standardized ascertainment of multiplex schizophrenia families in 39 psychiatric facilities covering over 90% of the population in Ireland and Northern Ireland. We here describe a phenotypic sample and a subset thereof, the linkage sample. Individuals were included in the phenotypic sample if adequate diagnostic information, based on personal interview and/ormore » hospital record, was available. Only individuals with available DNA were included in the linkage sample. Inclusion of a pedigree into the phenotypic sample required at least two first, second, or third degree relatives with non-affective psychosis (NAP), one of whom had schizophrenia (S) or poor-outcome schizoaffective disorder (PO-SAD). Entry into the linkage sample required DNA samples on at least two individuals with NAP, of whom at least one had S or PO-SAD. Affection was defined by narrow, intermediate, and broad criteria. 75 refs., 6 tabs.« less
Schmelzle, Molly C; Kinziger, Andrew P
2016-07-01
Environmental DNA (eDNA) monitoring approaches promise to greatly improve detection of rare, endangered and invasive species in comparison with traditional field approaches. Herein, eDNA approaches and traditional seining methods were applied at 29 research locations to compare method-specific estimates of detection and occupancy probabilities for endangered tidewater goby (Eucyclogobius newberryi). At each location, multiple paired seine hauls and water samples for eDNA analysis were taken, ranging from two to 23 samples per site, depending upon habitat size. Analysis using a multimethod occupancy modelling framework indicated that the probability of detection using eDNA was nearly double (0.74) the rate of detection for seining (0.39). The higher detection rates afforded by eDNA allowed determination of tidewater goby occupancy at two locations where they have not been previously detected and at one location considered to be locally extirpated. Additionally, eDNA concentration was positively related to tidewater goby catch per unit effort, suggesting eDNA could potentially be used as a proxy for local tidewater goby abundance. Compared to traditional field sampling, eDNA provided improved occupancy parameter estimates and can be applied to increase management efficiency across a broad spatial range and within a diversity of habitats. © 2015 John Wiley & Sons Ltd.
Experimental transition probabilities for Mn II spectral lines
NASA Astrophysics Data System (ADS)
Manrique, J.; Aguilera, J. A.; Aragón, C.
2018-06-01
Transition probabilities for 46 spectral lines of Mn II with wavelengths in the range 2000-3500 Å have been measured by CSigma laser-induced breakdown spectroscopy (Cσ-LIBS). For 28 of the lines, experimental data had not been reported previously. The Cσ-LIBS method, based in the construction of generalized curves of growth called Cσ graphs, avoids the error due to self-absorption. The samples used to generate the laser-induced plasmas are fused glass disks prepared from pure MnO. The Mn concentrations in the samples and the lines included in the study are selected to ensure the validity of the model of homogeneous plasma used. The results are compared to experimental and theoretical values available in the literature.
Factors related to nonuse of seat belts in Michigan.
DOT National Transportation Integrated Search
1987-09-01
This study combined direct observation of seat belt use with interview methods to : identify factors related to seat belt use in a state with a mandatory seat belt use law. Trained : observers recorded restraint use for a probability sample of motori...
Sampling design for long-term regional trends in marine rocky intertidal communities
Irvine, Gail V.; Shelley, Alice
2013-01-01
Probability-based designs reduce bias and allow inference of results to the pool of sites from which they were chosen. We developed and tested probability-based designs for monitoring marine rocky intertidal assemblages at Glacier Bay National Park and Preserve (GLBA), Alaska. A multilevel design was used that varied in scale and inference. The levels included aerial surveys, extensive sampling of 25 sites, and more intensive sampling of 6 sites. Aerial surveys of a subset of intertidal habitat indicated that the original target habitat of bedrock-dominated sites with slope ≤30° was rare. This unexpected finding illustrated one value of probability-based surveys and led to a shift in the target habitat type to include steeper, more mixed rocky habitat. Subsequently, we evaluated the statistical power of different sampling methods and sampling strategies to detect changes in the abundances of the predominant sessile intertidal taxa: barnacles Balanomorpha, the mussel Mytilus trossulus, and the rockweed Fucus distichus subsp. evanescens. There was greatest power to detect trends in Mytilus and lesser power for barnacles and Fucus. Because of its greater power, the extensive, coarse-grained sampling scheme was adopted in subsequent years over the intensive, fine-grained scheme. The sampling attributes that had the largest effects on power included sampling of “vertical” line transects (vs. horizontal line transects or quadrats) and increasing the number of sites. We also evaluated the power of several management-set parameters. Given equal sampling effort, sampling more sites fewer times had greater power. The information gained through intertidal monitoring is likely to be useful in assessing changes due to climate, including ocean acidification; invasive species; trampling effects; and oil spills.
NASA Technical Reports Server (NTRS)
Kim, Hyun Jung; Choi, Sang H.; Bae, Hyung-Bin; Lee, Tae Woo
2012-01-01
The National Aeronautics and Space Administration-invented X-ray diffraction (XRD) methods, including the total defect density measurement method and the spatial wafer mapping method, have confirmed super hetero epitaxy growth for rhombohedral single crystalline silicon germanium (Si1-xGex) on a c-plane sapphire substrate. However, the XRD method cannot observe the surface morphology or roughness because of the method s limited resolution. Therefore the authors used transmission electron microscopy (TEM) with samples prepared in two ways, the focused ion beam (FIB) method and the tripod method to study the structure between Si1-xGex and sapphire substrate and Si1?xGex itself. The sample preparation for TEM should be as fast as possible so that the sample should contain few or no artifacts induced by the preparation. The standard sample preparation method of mechanical polishing often requires a relatively long ion milling time (several hours), which increases the probability of inducing defects into the sample. The TEM sampling of the Si1-xGex on sapphire is also difficult because of the sapphire s high hardness and mechanical instability. The FIB method and the tripod method eliminate both problems when performing a cross-section TEM sampling of Si1-xGex on c-plane sapphire, which shows the surface morphology, the interface between film and substrate, and the crystal structure of the film. This paper explains the FIB sampling method and the tripod sampling method, and why sampling Si1-xGex, on a sapphire substrate with TEM, is necessary.
Enumeration of Vibrio cholerae O1 in Bangladesh waters by fluorescent-antibody direct viable count.
Brayton, P R; Tamplin, M L; Huq, A; Colwell, R R
1987-01-01
A field trial to enumerate Vibrio cholerae O1 in aquatic environments in Bangladesh was conducted, comparing fluorescent-antibody direct viable count with culture detection by the most-probable-number index. Specificity of a monoclonal antibody prepared against the O1 antigen was assessed and incorporated into the fluorescence staining method. All pond and water samples yielded higher counts of viable V. cholerae O1 by fluorescent-antibody direct viable count than by the most-probable-number index. Fluorescence microscopy is a more sensitive detection system than culture methods because it allows the enumeration of both culturable and nonculturable cells and therefore provides more precise monitoring of microbiological water quality. PMID:3324967
Optimized nested Markov chain Monte Carlo sampling: theory
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coe, Joshua D; Shaw, M Sam; Sewell, Thomas D
2009-01-01
Metropolis Monte Carlo sampling of a reference potential is used to build a Markov chain in the isothermal-isobaric ensemble. At the endpoints of the chain, the energy is reevaluated at a different level of approximation (the 'full' energy) and a composite move encompassing all of the intervening steps is accepted on the basis of a modified Metropolis criterion. By manipulating the thermodynamic variables characterizing the reference system we maximize the average acceptance probability of composite moves, lengthening significantly the random walk made between consecutive evaluations of the full energy at a fixed acceptance probability. This provides maximally decorrelated samples ofmore » the full potential, thereby lowering the total number required to build ensemble averages of a given variance. The efficiency of the method is illustrated using model potentials appropriate to molecular fluids at high pressure. Implications for ab initio or density functional theory (DFT) treatment are discussed.« less
Validation of Statistical Sampling Algorithms in Visual Sample Plan (VSP): Summary Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nuffer, Lisa L; Sego, Landon H.; Wilson, John E.
2009-02-18
The U.S. Department of Homeland Security, Office of Technology Development (OTD) contracted with a set of U.S. Department of Energy national laboratories, including the Pacific Northwest National Laboratory (PNNL), to write a Remediation Guidance for Major Airports After a Chemical Attack. The report identifies key activities and issues that should be considered by a typical major airport following an incident involving release of a toxic chemical agent. Four experimental tasks were identified that would require further research in order to supplement the Remediation Guidance. One of the tasks, Task 4, OTD Chemical Remediation Statistical Sampling Design Validation, dealt with statisticalmore » sampling algorithm validation. This report documents the results of the sampling design validation conducted for Task 4. In 2005, the Government Accountability Office (GAO) performed a review of the past U.S. responses to Anthrax terrorist cases. Part of the motivation for this PNNL report was a major GAO finding that there was a lack of validated sampling strategies in the U.S. response to Anthrax cases. The report (GAO 2005) recommended that probability-based methods be used for sampling design in order to address confidence in the results, particularly when all sample results showed no remaining contamination. The GAO also expressed a desire that the methods be validated, which is the main purpose of this PNNL report. The objective of this study was to validate probability-based statistical sampling designs and the algorithms pertinent to within-building sampling that allow the user to prescribe or evaluate confidence levels of conclusions based on data collected as guided by the statistical sampling designs. Specifically, the designs found in the Visual Sample Plan (VSP) software were evaluated. VSP was used to calculate the number of samples and the sample location for a variety of sampling plans applied to an actual release site. Most of the sampling designs validated are probability based, meaning samples are located randomly (or on a randomly placed grid) so no bias enters into the placement of samples, and the number of samples is calculated such that IF the amount and spatial extent of contamination exceeds levels of concern, at least one of the samples would be taken from a contaminated area, at least X% of the time. Hence, "validation" of the statistical sampling algorithms is defined herein to mean ensuring that the "X%" (confidence) is actually met.« less
A fast and objective multidimensional kernel density estimation method: fastKDE
O'Brien, Travis A.; Kashinath, Karthik; Cavanaugh, Nicholas R.; ...
2016-03-07
Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive, since they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchiamore » and Pigolotti (2011). While this solution theoretically applies to multidimensional KDEs, it has not been clear how to practically do so. A method for practically extending the Bernacchia-Pigolotti KDE to multidimensions is introduced. This multidimensional extension is combined with a recently-developed computational improvement to their method that makes it computationally efficient: a 2D KDE on 10 5 samples only takes 1 s on a modern workstation. This fast and objective KDE method, called the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy that is comparable to state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method does an excellent job of encoding covariance information for bivariate samples. This property allows for direct calculation of conditional PDFs with fastKDE. It is demonstrated how this capability might be leveraged for detecting non-trivial relationships between quantities in physical systems, such as transitional behavior.« less
A sampling algorithm for segregation analysis
Tier, Bruce; Henshall, John
2001-01-01
Methods for detecting Quantitative Trait Loci (QTL) without markers have generally used iterative peeling algorithms for determining genotype probabilities. These algorithms have considerable shortcomings in complex pedigrees. A Monte Carlo Markov chain (MCMC) method which samples the pedigree of the whole population jointly is described. Simultaneous sampling of the pedigree was achieved by sampling descent graphs using the Metropolis-Hastings algorithm. A descent graph describes the inheritance state of each allele and provides pedigrees guaranteed to be consistent with Mendelian sampling. Sampling descent graphs overcomes most, if not all, of the limitations incurred by iterative peeling algorithms. The algorithm was able to find the QTL in most of the simulated populations. However, when the QTL was not modeled or found then its effect was ascribed to the polygenic component. No QTL were detected when they were not simulated. PMID:11742631
Azzolina, Nicholas A; Small, Mitchell J; Nakles, David V; Glazewski, Kyle A; Peck, Wesley D; Gorecki, Charles D; Bromhal, Grant S; Dilmore, Robert M
2015-01-20
This work uses probabilistic methods to simulate a hypothetical geologic CO2 storage site in a depleted oil and gas field, where the large number of legacy wells would make it cost-prohibitive to sample all wells for all measurements as part of the postinjection site care. Deep well leakage potential scores were assigned to the wells using a random subsample of 100 wells from a detailed study of 826 legacy wells that penetrate the basal Cambrian formation on the U.S. side of the U.S./Canadian border. Analytical solutions and Monte Carlo simulations were used to quantify the statistical power of selecting a leaking well. Power curves were developed as a function of (1) the number of leaking wells within the Area of Review; (2) the sampling design (random or judgmental, choosing first the wells with the highest deep leakage potential scores); (3) the number of wells included in the monitoring sampling plan; and (4) the relationship between a well’s leakage potential score and its relative probability of leakage. Cases where the deep well leakage potential scores are fully or partially informative of the relative leakage probability are compared to a noninformative base case in which leakage is equiprobable across all wells in the Area of Review. The results show that accurate prior knowledge about the probability of well leakage adds measurable value to the ability to detect a leaking well during the monitoring program, and that the loss in detection ability due to imperfect knowledge of the leakage probability can be quantified. This work underscores the importance of a data-driven, risk-based monitoring program that incorporates uncertainty quantification into long-term monitoring sampling plans at geologic CO2 storage sites.
Evaluation of seven aquatic sampling methods for amphibians and other aquatic fauna
Gunzburger, M.S.
2007-01-01
To design effective and efficient research and monitoring programs researchers must have a thorough understanding of the capabilities and limitations of their sampling methods. Few direct comparative studies exist for aquatic sampling methods for amphibians. The objective of this study was to simultaneously employ seven aquatic sampling methods in 10 wetlands to compare amphibian species richness and number of individuals detected with each method. Four sampling methods allowed counts of individuals (metal dipnet, D-frame dipnet, box trap, crayfish trap), whereas the other three methods allowed detection of species (visual encounter, aural, and froglogger). Amphibian species richness was greatest with froglogger, box trap, and aural samples. For anuran species, the sampling methods by which each life stage was detected was related to relative length of larval and breeding periods and tadpole size. Detection probability of amphibians varied across sampling methods. Box trap sampling resulted in the most precise amphibian count, but the precision of all four count-based methods was low (coefficient of variation > 145 for all methods). The efficacy of the four count sampling methods at sampling fish and aquatic invertebrates was also analyzed because these predatory taxa are known to be important predictors of amphibian habitat distribution. Species richness and counts were similar for fish with the four methods, whereas invertebrate species richness and counts were greatest in box traps. An effective wetland amphibian monitoring program in the southeastern United States should include multiple sampling methods to obtain the most accurate assessment of species community composition at each site. The combined use of frogloggers, crayfish traps, and dipnets may be the most efficient and effective amphibian monitoring protocol. ?? 2007 Brill Academic Publishers.
NASA Astrophysics Data System (ADS)
Lusiana, Evellin Dewi
2017-12-01
The parameters of binary probit regression model are commonly estimated by using Maximum Likelihood Estimation (MLE) method. However, MLE method has limitation if the binary data contains separation. Separation is the condition where there are one or several independent variables that exactly grouped the categories in binary response. It will result the estimators of MLE method become non-convergent, so that they cannot be used in modeling. One of the effort to resolve the separation is using Firths approach instead. This research has two aims. First, to identify the chance of separation occurrence in binary probit regression model between MLE method and Firths approach. Second, to compare the performance of binary probit regression model estimator that obtained by MLE method and Firths approach using RMSE criteria. Those are performed using simulation method and under different sample size. The results showed that the chance of separation occurrence in MLE method for small sample size is higher than Firths approach. On the other hand, for larger sample size, the probability decreased and relatively identic between MLE method and Firths approach. Meanwhile, Firths estimators have smaller RMSE than MLEs especially for smaller sample sizes. But for larger sample sizes, the RMSEs are not much different. It means that Firths estimators outperformed MLE estimator.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Conover, W.J.; Cox, D.D.; Martz, H.F.
1997-12-01
When using parametric empirical Bayes estimation methods for estimating the binomial or Poisson parameter, the validity of the assumed beta or gamma conjugate prior distribution is an important diagnostic consideration. Chi-square goodness-of-fit tests of the beta or gamma prior hypothesis are developed for use when the binomial sample sizes or Poisson exposure times vary. Nine examples illustrate the application of the methods, using real data from such diverse applications as the loss of feedwater flow rates in nuclear power plants, the probability of failure to run on demand and the failure rates of the high pressure coolant injection systems atmore » US commercial boiling water reactors, the probability of failure to run on demand of emergency diesel generators in US commercial nuclear power plants, the rate of failure of aircraft air conditioners, baseball batting averages, the probability of testing positive for toxoplasmosis, and the probability of tumors in rats. The tests are easily applied in practice by means of corresponding Mathematica{reg_sign} computer programs which are provided.« less
Royle, J. Andrew; Chandler, Richard B.; Yackulic, Charles; Nichols, James D.
2012-01-01
1. Understanding the factors affecting species occurrence is a pre-eminent focus of applied ecological research. However, direct information about species occurrence is lacking for many species. Instead, researchers sometimes have to rely on so-called presence-only data (i.e. when no direct information about absences is available), which often results from opportunistic, unstructured sampling. MAXENT is a widely used software program designed to model and map species distribution using presence-only data. 2. We provide a critical review of MAXENT as applied to species distribution modelling and discuss how it can lead to inferential errors. A chief concern is that MAXENT produces a number of poorly defined indices that are not directly related to the actual parameter of interest – the probability of occurrence (ψ). This focus on an index was motivated by the belief that it is not possible to estimate ψ from presence-only data; however, we demonstrate that ψ is identifiable using conventional likelihood methods under the assumptions of random sampling and constant probability of species detection. 3. The model is implemented in a convenient r package which we use to apply the model to simulated data and data from the North American Breeding Bird Survey. We demonstrate that MAXENT produces extreme under-predictions when compared to estimates produced by logistic regression which uses the full (presence/absence) data set. We note that MAXENT predictions are extremely sensitive to specification of the background prevalence, which is not objectively estimated using the MAXENT method. 4. As with MAXENT, formal model-based inference requires a random sample of presence locations. Many presence-only data sets, such as those based on museum records and herbarium collections, may not satisfy this assumption. However, when sampling is random, we believe that inference should be based on formal methods that facilitate inference about interpretable ecological quantities instead of vaguely defined indices.
Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard
2007-01-01
Background Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. Methods We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. Application We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. Conclusion This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy. PMID:17543100
O’Brien, Kathryn; Edwards, Adrian; Hood, Kerenza; Butler, Christopher C
2013-01-01
Background Urinary tract infection (UTI) in children may be associated with long-term complications that could be prevented by prompt treatment. Aim To determine the prevalence of UTI in acutely ill children ≤ 5 years presenting in general practice and to explore patterns of presenting symptoms and urine sampling strategies. Design and setting Prospective observational study with systematic urine sampling, in general practices in Wales, UK. Method In total, 1003 children were recruited from 13 general practices between March 2008 and July 2010. The prevalence of UTI was determined and multivariable analysis performed to determine the probability of UTI. Result Out of 597 (60.0%) children who provided urine samples within 2 days, the prevalence of UTI was 5.9% (95% confidence interval [CI] = 4.3% to 8.0%) overall, 7.3% in those < 3 years and 3.2% in 3–5 year olds. Neither a history of fever nor the absence of an alternative source of infection was associated with UTI (P = 0.64; P = 0.69, respectively). The probability of UTI in children aged ≥3 years without increased urinary frequency or dysuria was 2%. The probability of UTI was ≥5% in all other groups. Urine sampling based purely on GP suspicion would have missed 80% of UTIs, while a sampling strategy based on current guidelines would have missed 50%. Conclusion Approximately 6% of acutely unwell children presenting to UK general practice met the criteria for a laboratory diagnosis of UTI. This higher than previously recognised prior probability of UTI warrants raised awareness of the condition and suggests clinicians should lower their threshold for urine sampling in young children. The absence of fever or presence of an alternative source of infection, as emphasised in current guidelines, may not rule out UTI in young children with adequate certainty. PMID:23561695
NASA Astrophysics Data System (ADS)
Maymandi, Nahal; Kerachian, Reza; Nikoo, Mohammad Reza
2018-03-01
This paper presents a new methodology for optimizing Water Quality Monitoring (WQM) networks of reservoirs and lakes using the concept of the value of information (VOI) and utilizing results of a calibrated numerical water quality simulation model. With reference to the value of information theory, water quality of every checkpoint with a specific prior probability differs in time. After analyzing water quality samples taken from potential monitoring points, the posterior probabilities are updated using the Baye's theorem, and VOI of the samples is calculated. In the next step, the stations with maximum VOI is selected as optimal stations. This process is repeated for each sampling interval to obtain optimal monitoring network locations for each interval. The results of the proposed VOI-based methodology is compared with those obtained using an entropy theoretic approach. As the results of the two methodologies would be partially different, in the next step, the results are combined using a weighting method. Finally, the optimal sampling interval and location of WQM stations are chosen using the Evidential Reasoning (ER) decision making method. The efficiency and applicability of the methodology are evaluated using available water quantity and quality data of the Karkheh Reservoir in the southwestern part of Iran.
Quantifying Density Fluctuations in Volumes of All Shapes and Sizes Using Indirect Umbrella Sampling
NASA Astrophysics Data System (ADS)
Patel, Amish J.; Varilly, Patrick; Chandler, David; Garde, Shekhar
2011-10-01
Water density fluctuations are an important statistical mechanical observable and are related to many-body correlations, as well as hydrophobic hydration and interactions. Local water density fluctuations at a solid-water surface have also been proposed as a measure of its hydrophobicity. These fluctuations can be quantified by calculating the probability, P v ( N), of observing N waters in a probe volume of interest v. When v is large, calculating P v ( N) using molecular dynamics simulations is challenging, as the probability of observing very few waters is exponentially small, and the standard procedure for overcoming this problem (umbrella sampling in N) leads to undesirable impulsive forces. Patel et al. (J. Phys. Chem. B 114:1632, 2010) have recently developed an indirect umbrella sampling (INDUS) method, that samples a coarse-grained particle number to obtain P v ( N) in cuboidal volumes. Here, we present and demonstrate an extension of that approach to volumes of other basic shapes, like spheres and cylinders, as well as to collections of such volumes. We further describe the implementation of INDUS in the NPT ensemble and calculate P v ( N) distributions over a broad range of pressures. Our method may be of particular interest in characterizing the hydrophobicity of interfaces of proteins, nanotubes and related systems.
II. MORE THAN JUST CONVENIENT: THE SCIENTIFIC MERITS OF HOMOGENEOUS CONVENIENCE SAMPLES.
Jager, Justin; Putnick, Diane L; Bornstein, Marc H
2017-06-01
Despite their disadvantaged generalizability relative to probability samples, nonprobability convenience samples are the standard within developmental science, and likely will remain so because probability samples are cost-prohibitive and most available probability samples are ill-suited to examine developmental questions. In lieu of focusing on how to eliminate or sharply reduce reliance on convenience samples within developmental science, here we propose how to augment their advantages when it comes to understanding population effects as well as subpopulation differences. Although all convenience samples have less clear generalizability than probability samples, we argue that homogeneous convenience samples have clearer generalizability relative to conventional convenience samples. Therefore, when researchers are limited to convenience samples, they should consider homogeneous convenience samples as a positive alternative to conventional (or heterogeneous) convenience samples. We discuss future directions as well as potential obstacles to expanding the use of homogeneous convenience samples in developmental science. © 2017 The Society for Research in Child Development, Inc.
Participant Recruitment for Studies on Disability and Work: Challenges and Solutions.
Lysaght, Rosemary; Kranenburg, Rachelle; Armstrong, Carolyn; Krupa, Terry
2016-06-01
Purpose A number of key issues related to employment of persons with disabilities demand ongoing and effective lines of inquiry. There is evidence, however, that work researchers struggle with recruitment of participants, and that this may limit the types and appropriateness of methods selected. This two phase study sought to identify the nature of recruitment challenges in workplace-based disability research, and to identify strategies for addressing identified barriers. Methods The first phase of this study was a scoping review of the literature to identify the study designs and approaches frequently used in this field of inquiry, and the success of the various recruitment methods in use. In the second phase, we used qualitative methods to explore with employers and other stakeholders in the field their perceived challenges related to participating in disability-related research, and approaches that might address these. Results The most frequently used recruitment methods identified in the literature were non-probability approaches for qualitative studies, and sampling from existing worker databases for survey research. Struggles in participant recruitment were evidenced by the use of multiple recruitment strategies, and heavy reliance on convenience sampling. Employers cited a number of barriers to participation, including time pressures, fear of legal reprisal, and perceived lack of relevance to the organization. Conclusions Participant recruitment in disability-related research is a concern, particularly in studies that require collection of new data from organizations and individuals, and where large probability samples and/or stratified or purposeful samples are desirable. A number of strategies may contribute to improved success, including development of participatory research models that will enhance benefits and perceived benefits of workplace involvement.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ebeida, Mohamed S.; Mitchell, Scott A.; Swiler, Laura P.
We introduce a novel technique, POF-Darts, to estimate the Probability Of Failure based on random disk-packing in the uncertain parameter space. POF-Darts uses hyperplane sampling to explore the unexplored part of the uncertain space. We use the function evaluation at a sample point to determine whether it belongs to failure or non-failure regions, and surround it with a protection sphere region to avoid clustering. We decompose the domain into Voronoi cells around the function evaluations as seeds and choose the radius of the protection sphere depending on the local Lipschitz continuity. As sampling proceeds, regions uncovered with spheres will shrink,more » improving the estimation accuracy. After exhausting the function evaluation budget, we build a surrogate model using the function evaluations associated with the sample points and estimate the probability of failure by exhaustive sampling of that surrogate. In comparison to other similar methods, our algorithm has the advantages of decoupling the sampling step from the surrogate construction one, the ability to reach target POF values with fewer samples, and the capability of estimating the number and locations of disconnected failure regions, not just the POF value. Furthermore, we present various examples to demonstrate the efficiency of our novel approach.« less
Inferences about landbird abundance from count data: recent advances and future directions
Nichols, J.D.; Thomas, L.; Conn, P.B.; Thomson, David L.; Cooch, Evan G.; Conroy, Michael J.
2009-01-01
We summarize results of a November 2006 workshop dealing with recent research on the estimation of landbird abundance from count data. Our conceptual framework includes a decomposition of the probability of detecting a bird potentially exposed to sampling efforts into four separate probabilities. Primary inference methods are described and include distance sampling, multiple observers, time of detection, and repeated counts. The detection parameters estimated by these different approaches differ, leading to different interpretations of resulting estimates of density and abundance. Simultaneous use of combinations of these different inference approaches can not only lead to increased precision but also provides the ability to decompose components of the detection process. Recent efforts to test the efficacy of these different approaches using natural systems and a new bird radio test system provide sobering conclusions about the ability of observers to detect and localize birds in auditory surveys. Recent research is reported on efforts to deal with such potential sources of error as bird misclassification, measurement error, and density gradients. Methods for inference about spatial and temporal variation in avian abundance are outlined. Discussion topics include opinions about the need to estimate detection probability when drawing inference about avian abundance, methodological recommendations based on the current state of knowledge and suggestions for future research.
Mining Rare Events Data for Assessing Customer Attrition Risk
NASA Astrophysics Data System (ADS)
Au, Tom; Chin, Meei-Ling Ivy; Ma, Guangqin
Customer attrition refers to the phenomenon whereby a customer leaves a service provider. As competition intensifies, preventing customers from leaving is a major challenge to many businesses such as telecom service providers. Research has shown that retaining existing customers is more profitable than acquiring new customers due primarily to savings on acquisition costs, the higher volume of service consumption, and customer referrals. For a large enterprise, its customer base consists of tens of millions service subscribers, more often the events, such as switching to competitors or canceling services are large in absolute number, but rare in percentage, far less than 5%. Based on a simple random sample, popular statistical procedures, such as logistic regression, tree-based method and neural network, can sharply underestimate the probability of rare events, and often result a null model (no significant predictors). To improve efficiency and accuracy for event probability estimation, a case-based data collection technique is then considered. A case-based sample is formed by taking all available events and a small, but representative fraction of nonevents from a dataset of interest. In this article we showed a consistent prior correction method for events probability estimation and demonstrated the performance of the above data collection techniques in predicting customer attrition with actual telecommunications data.
On the objective identification of flood seasons
NASA Astrophysics Data System (ADS)
Cunderlik, Juraj M.; Ouarda, Taha B. M. J.; BobéE, Bernard
2004-01-01
The determination of seasons of high and low probability of flood occurrence is a task with many practical applications in contemporary hydrology and water resources management. Flood seasons are generally identified subjectively by visually assessing the temporal distribution of flood occurrences and, then at a regional scale, verified by comparing the temporal distribution with distributions obtained at hydrologically similar neighboring sites. This approach is subjective, time consuming, and potentially unreliable. The main objective of this study is therefore to introduce a new, objective, and systematic method for the identification of flood seasons. The proposed method tests the significance of flood seasons by comparing the observed variability of flood occurrences with the theoretical flood variability in a nonseasonal model. The method also addresses the uncertainty resulting from sampling variability by quantifying the probability associated with the identified flood seasons. The performance of the method was tested on an extensive number of samples with different record lengths generated from several theoretical models of flood seasonality. The proposed approach was then applied on real data from a large set of sites with different flood regimes across Great Britain. The results show that the method can efficiently identify flood seasons from both theoretical and observed distributions of flood occurrence. The results were used for the determination of the main flood seasonality types in Great Britain.
On the inequivalence of the CH and CHSH inequalities due to finite statistics
NASA Astrophysics Data System (ADS)
Renou, M. O.; Rosset, D.; Martin, A.; Gisin, N.
2017-06-01
Different variants of a Bell inequality, such as CHSH and CH, are known to be equivalent when evaluated on nonsignaling outcome probability distributions. However, in experimental setups, the outcome probability distributions are estimated using a finite number of samples. Therefore the nonsignaling conditions are only approximately satisfied and the robustness of the violation depends on the chosen inequality variant. We explain that phenomenon using the decomposition of the space of outcome probability distributions under the action of the symmetry group of the scenario, and propose a method to optimize the statistical robustness of a Bell inequality. In the process, we describe the finite group composed of relabeling of parties, measurement settings and outcomes, and identify correspondences between the irreducible representations of this group and properties of outcome probability distributions such as normalization, signaling or having uniform marginals.
Estimating nest detection probabilities for white-winged dove nest transects in Tamaulipas, Mexico
Nichols, J.D.; Tomlinson, R.E.; Waggerman, G.
1986-01-01
Nest transects in nesting colonies provide one source of information on White-winged Dove (Zenaida asiatica asiatica) population status and reproduction. Nests are counted along transects using standardized field methods each year in Texas and northeastern Mexico by personnel associated with Mexico's Office of Flora and Fauna, the Texas Parks and Wildlife Department, and the U.S. Fish and Wildlife Service. Nest counts on transects are combined with information on the size of nesting colonies to estimate total numbers of nests in sampled colonies. Historically, these estimates have been based on the actual nest counts on transects and thus have required the assumption that all nests lying within transect boundaries are detected (seen) with a probability of one. Our objectives were to test the hypothesis that nest detection probability is one and, if rejected, to estimate this probability.
Isolation and clinical sample typing of human leptospirosis cases in Argentina.
Chiani, Yosena; Jacob, Paulina; Varni, Vanina; Landolt, Noelia; Schmeling, María Fernanda; Pujato, Nazarena; Caimi, Karina; Vanasco, Bibiana
2016-01-01
Leptospira typing is carried out using isolated strains. Because of difficulties in obtaining them, direct identification of infective Leptospira in clinical samples is a high priority. Multilocus sequence typing (MLST) proved highly discriminatory for seven pathogenic species of Leptospira, allowing isolate characterization and robust assignment to species, in addition to phylogenetic evidence for the relatedness between species. In this study we characterized Leptospira strains circulating in Argentina, using typing methods applied to human clinical samples and isolates. Phylogenetic studies based on 16S ribosomal RNA gene sequences enabled typing of 8 isolates (6 Leptospira interrogans, one Leptospira wolffii and one Leptospira broomii) and 58 out of 85 (68.2%) clinical samples (55 L. interrogans, 2 Leptospira meyeri, and one Leptospira kirschneri). MLST results for the L. interrogans isolates indicated that five were probably Canicola serogroup (ST37) and one was probably Icterohaemorrhagiae serogroup (ST17). Eleven clinical samples (21.6%), provided MLST interpretable data: five were probably Pyrogenes serogroup (ST13), four Sejroe (ST20), one Autumnalis (ST22) and one Canicola (ST37). To the best of our knowledge this study is the first report of the use of an MLST typing scheme with seven loci to identify Leptospira directly from clinical samples in Argentina. The use of clinical samples presents the advantage of the possibility of knowing the infecting strain without resorting to isolates. This study also allowed, for the first time, the characterization of isolates of intermediate pathogenicity species (L. wolffii and L. broomii) from symptomatic patients. Copyright © 2015 Elsevier B.V. All rights reserved.
Yilmaz, Yildiz E; Bull, Shelley B
2011-11-29
Use of trait-dependent sampling designs in whole-genome association studies of sequence data can reduce total sequencing costs with modest losses of statistical efficiency. In a quantitative trait (QT) analysis of data from the Genetic Analysis Workshop 17 mini-exome for unrelated individuals in the Asian subpopulation, we investigate alternative designs that sequence only 50% of the entire cohort. In addition to a simple random sampling design, we consider extreme-phenotype designs that are of increasing interest in genetic association analysis of QTs, especially in studies concerned with the detection of rare genetic variants. We also evaluate a novel sampling design in which all individuals have a nonzero probability of being selected into the sample but in which individuals with extreme phenotypes have a proportionately larger probability. We take differential sampling of individuals with informative trait values into account by inverse probability weighting using standard survey methods which thus generalizes to the source population. In replicate 1 data, we applied the designs in association analysis of Q1 with both rare and common variants in the FLT1 gene, based on knowledge of the generating model. Using all 200 replicate data sets, we similarly analyzed Q1 and Q4 (which is known to be free of association with FLT1) to evaluate relative efficiency, type I error, and power. Simulation study results suggest that the QT-dependent selection designs generally yield greater than 50% relative efficiency compared to using the entire cohort, implying cost-effectiveness of 50% sample selection and worthwhile reduction of sequencing costs.
Clare, John; McKinney, Shawn T.; DePue, John E.; Loftin, Cynthia S.
2017-01-01
It is common to use multiple field sampling methods when implementing wildlife surveys to compare method efficacy or cost efficiency, integrate distinct pieces of information provided by separate methods, or evaluate method-specific biases and misclassification error. Existing models that combine information from multiple field methods or sampling devices permit rigorous comparison of method-specific detection parameters, enable estimation of additional parameters such as false-positive detection probability, and improve occurrence or abundance estimates, but with the assumption that the separate sampling methods produce detections independently of one another. This assumption is tenuous if methods are paired or deployed in close proximity simultaneously, a common practice that reduces the additional effort required to implement multiple methods and reduces the risk that differences between method-specific detection parameters are confounded by other environmental factors. We develop occupancy and spatial capture–recapture models that permit covariance between the detections produced by different methods, use simulation to compare estimator performance of the new models to models assuming independence, and provide an empirical application based on American marten (Martes americana) surveys using paired remote cameras, hair catches, and snow tracking. Simulation results indicate existing models that assume that methods independently detect organisms produce biased parameter estimates and substantially understate estimate uncertainty when this assumption is violated, while our reformulated models are robust to either methodological independence or covariance. Empirical results suggested that remote cameras and snow tracking had comparable probability of detecting present martens, but that snow tracking also produced false-positive marten detections that could potentially substantially bias distribution estimates if not corrected for. Remote cameras detected marten individuals more readily than passive hair catches. Inability to photographically distinguish individual sex did not appear to induce negative bias in camera density estimates; instead, hair catches appeared to produce detection competition between individuals that may have been a source of negative bias. Our model reformulations broaden the range of circumstances in which analyses incorporating multiple sources of information can be robustly used, and our empirical results demonstrate that using multiple field-methods can enhance inferences regarding ecological parameters of interest and improve understanding of how reliably survey methods sample these parameters.
Bayesian adaptive survey protocols for resource management
Halstead, Brian J.; Wylie, Glenn D.; Coates, Peter S.; Casazza, Michael L.
2011-01-01
Transparency in resource management decisions requires a proper accounting of uncertainty at multiple stages of the decision-making process. As information becomes available, periodic review and updating of resource management protocols reduces uncertainty and improves management decisions. One of the most basic steps to mitigating anthropogenic effects on populations is determining if a population of a species occurs in an area that will be affected by human activity. Species are rarely detected with certainty, however, and falsely declaring a species absent can cause improper conservation decisions or even extirpation of populations. We propose a method to design survey protocols for imperfectly detected species that accounts for multiple sources of uncertainty in the detection process, is capable of quantitatively incorporating expert opinion into the decision-making process, allows periodic updates to the protocol, and permits resource managers to weigh the severity of consequences if the species is falsely declared absent. We developed our method using the giant gartersnake (Thamnophis gigas), a threatened species precinctive to the Central Valley of California, as a case study. Survey date was negatively related to the probability of detecting the giant gartersnake, and water temperature was positively related to the probability of detecting the giant gartersnake at a sampled location. Reporting sampling effort, timing and duration of surveys, and water temperatures would allow resource managers to evaluate the probability that the giant gartersnake occurs at sampled sites where it is not detected. This information would also allow periodic updates and quantitative evaluation of changes to the giant gartersnake survey protocol. Because it naturally allows multiple sources of information and is predicated upon the idea of updating information, Bayesian analysis is well-suited to solving the problem of developing efficient sampling protocols for species of conservation concern.
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, Travis A.; Kashinath, Karthik; Cavanaugh, Nicholas R.
Numerous facets of scientific research implicitly or explicitly call for the estimation of probability densities. Histograms and kernel density estimates (KDEs) are two commonly used techniques for estimating such information, with the KDE generally providing a higher fidelity representation of the probability density function (PDF). Both methods require specification of either a bin width or a kernel bandwidth. While techniques exist for choosing the kernel bandwidth optimally and objectively, they are computationally intensive, since they require repeated calculation of the KDE. A solution for objectively and optimally choosing both the kernel shape and width has recently been developed by Bernacchiamore » and Pigolotti (2011). While this solution theoretically applies to multidimensional KDEs, it has not been clear how to practically do so. A method for practically extending the Bernacchia-Pigolotti KDE to multidimensions is introduced. This multidimensional extension is combined with a recently-developed computational improvement to their method that makes it computationally efficient: a 2D KDE on 10 5 samples only takes 1 s on a modern workstation. This fast and objective KDE method, called the fastKDE method, retains the excellent statistical convergence properties that have been demonstrated for univariate samples. The fastKDE method exhibits statistical accuracy that is comparable to state-of-the-science KDE methods publicly available in R, and it produces kernel density estimates several orders of magnitude faster. The fastKDE method does an excellent job of encoding covariance information for bivariate samples. This property allows for direct calculation of conditional PDFs with fastKDE. It is demonstrated how this capability might be leveraged for detecting non-trivial relationships between quantities in physical systems, such as transitional behavior.« less
Berkas, Wayne R.
2000-01-01
Water samples from 27 wells completed in and near the Shell Valley aquifer were analyzed for benzene, toluene, ethylbenzene, and xylene (BTEX), polynuclear aromatic hydrocarbons (PAH), polychlorinated biphenyls (PCB), and pentachlorophenol (PCP) using the enzyme-linked immunoassay method. The analyses indicated the presence of PAH, PCB, and PCP in the study area. However, an individual compound at a high concentration or many compounds at low concentrations could cause the detections. Therefore, selected samples were analyzed using the gas chromatography (GC) method, which can detect individual compounds and determine the concentrations of those compounds. Concentrations for all compounds detected using the GC method were less than the minimum reporting levels (MRLs) for each constituent, indicating numerous compounds at low concentrations probably caused the immunoassay detections. The GC method also can detect compounds other than BTEX, PAH, PCB, and PCP. Concentrations for 81 of the additional compounds were determined and were less than the MRLs.Four compounds that could not be quantified accurately using the requested analytical methods also were detected. Acetone was detected in 4 of the 27 wells, 2-butanone was detected in 3 of the 27 wells, prometon was detected in 1 of the 27 wells, and tetrahydrofuran was detected in 9 of the 27 wells. Acetone, 2-butanone, and tetrahydrofuran probably leached from the polyvinyl chloride (PVC) pipe and joint glue and probably are not contaminants from the aquifer. Prometon is a herbicide that controls most annual and many perennial broadleaf weeds and primarily is used on roads and railroad tracks. The one occurrence of prometon could be caused by overspraying for weeds.
A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.
Gao, Xiang; Lin, Huaiying; Dong, Qunfeng
2017-01-01
Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
Espino, L; Way, M O; Wilson, L T
2008-02-01
Commercial rice, Oryza sativa L., fields in southeastern Texas were sampled during 2003 and 2004, and visual samples were compared with sweep net samples. Fields were sampled at different stages of panicle development, times of day, and by different operators. Significant differences were found between perimeter and within field sweep net samples, indicating that samples taken 9 m from the field margin overestimate within field Oebalus pugnax (F.) (Hemiptera: Pentatomidae) populations. Time of day did not significantly affect the number of O. pugnax caught with the sweep net; however, there was a trend to capture more insects during morning than afternoon. For all sampling methods evaluated during this study, O. pugnax was found to have an aggregated spatial pattern at most densities. When comparing sweep net with visual sampling methods, one sweep of the "long stick" and two sweeps of the "sweep stick" correlated well with the sweep net (r2 = 0.639 and r2 = 0.815, respectively). This relationship was not affected by time of day of sampling, stage of panicle development, type of planting or operator. Relative cost-reliability, which incorporates probability of adoption, indicates the visual methods are more cost-reliable than the sweep net for sampling O.
Probabilistic inference using linear Gaussian importance sampling for hybrid Bayesian networks
NASA Astrophysics Data System (ADS)
Sun, Wei; Chang, K. C.
2005-05-01
Probabilistic inference for Bayesian networks is in general NP-hard using either exact algorithms or approximate methods. However, for very complex networks, only the approximate methods such as stochastic sampling could be used to provide a solution given any time constraint. There are several simulation methods currently available. They include logic sampling (the first proposed stochastic method for Bayesian networks, the likelihood weighting algorithm) the most commonly used simulation method because of its simplicity and efficiency, the Markov blanket scoring method, and the importance sampling algorithm. In this paper, we first briefly review and compare these available simulation methods, then we propose an improved importance sampling algorithm called linear Gaussian importance sampling algorithm for general hybrid model (LGIS). LGIS is aimed for hybrid Bayesian networks consisting of both discrete and continuous random variables with arbitrary distributions. It uses linear function and Gaussian additive noise to approximate the true conditional probability distribution for continuous variable given both its parents and evidence in a Bayesian network. One of the most important features of the newly developed method is that it can adaptively learn the optimal important function from the previous samples. We test the inference performance of LGIS using a 16-node linear Gaussian model and a 6-node general hybrid model. The performance comparison with other well-known methods such as Junction tree (JT) and likelihood weighting (LW) shows that LGIS-GHM is very promising.
Monte Carlo methods to calculate impact probabilities
NASA Astrophysics Data System (ADS)
Rickman, H.; Wiśniowski, T.; Wajer, P.; Gabryszewski, R.; Valsecchi, G. B.
2014-09-01
Context. Unraveling the events that took place in the solar system during the period known as the late heavy bombardment requires the interpretation of the cratered surfaces of the Moon and terrestrial planets. This, in turn, requires good estimates of the statistical impact probabilities for different source populations of projectiles, a subject that has received relatively little attention, since the works of Öpik (1951, Proc. R. Irish Acad. Sect. A, 54, 165) and Wetherill (1967, J. Geophys. Res., 72, 2429). Aims: We aim to work around the limitations of the Öpik and Wetherill formulae, which are caused by singularities due to zero denominators under special circumstances. Using modern computers, it is possible to make good estimates of impact probabilities by means of Monte Carlo simulations, and in this work, we explore the available options. Methods: We describe three basic methods to derive the average impact probability for a projectile with a given semi-major axis, eccentricity, and inclination with respect to a target planet on an elliptic orbit. One is a numerical averaging of the Wetherill formula; the next is a Monte Carlo super-sizing method using the target's Hill sphere. The third uses extensive minimum orbit intersection distance (MOID) calculations for a Monte Carlo sampling of potentially impacting orbits, along with calculations of the relevant interval for the timing of the encounter allowing collision. Numerical experiments are carried out for an intercomparison of the methods and to scrutinize their behavior near the singularities (zero relative inclination and equal perihelion distances). Results: We find an excellent agreement between all methods in the general case, while there appear large differences in the immediate vicinity of the singularities. With respect to the MOID method, which is the only one that does not involve simplifying assumptions and approximations, the Wetherill averaging impact probability departs by diverging toward infinity, while the Hill sphere method results in a severely underestimated probability. We provide a discussion of the reasons for these differences, and we finally present the results of the MOID method in the form of probability maps for the Earth and Mars on their current orbits. These maps show a relatively flat probability distribution, except for the occurrence of two ridges found at small inclinations and for coinciding projectile/target perihelion distances. Conclusions: Our results verify the standard formulae in the general case, away from the singularities. In fact, severe shortcomings are limited to the immediate vicinity of those extreme orbits. On the other hand, the new Monte Carlo methods can be used without excessive consumption of computer time, and the MOID method avoids the problems associated with the other methods. Appendices are available in electronic form at http://www.aanda.org
Kundu, Anupam; Sabhapandit, Sanjib; Dhar, Abhishek
2011-03-01
We present an algorithm for finding the probabilities of rare events in nonequilibrium processes. The algorithm consists of evolving the system with a modified dynamics for which the required event occurs more frequently. By keeping track of the relative weight of phase-space trajectories generated by the modified and the original dynamics one can obtain the required probabilities. The algorithm is tested on two model systems of steady-state particle and heat transport where we find a huge improvement from direct simulation methods.
Zappelini, Lincohn; Martone-Rocha, Solange; Dropa, Milena; Matté, Maria Helena; Tiba, Monique Ribeiro; Breternitz, Bruna Suellen; Razzolini, Maria Tereza Pepe
2017-02-01
Nontyphoidal Salmonella (NTS) is a relevant pathogen involved in gastroenteritis outbreaks worldwide. In this study, we determined the capacity to combine the most probable number (MPN) and multiplex polymerase chain reaction (PCR) methods to characterize the most important Salmonella serotypes in raw sewage. A total of 499 isolates were recovered from 27 raw sewage samples and screened using two previously described multiplex PCR methods. From those, 123 isolates were selected based on PCR banding pattern-identical or similar to Salmonella Enteritidis and Salmonella Typhimurium-and submitted to conventional serotyping. Results showed that both PCR assays correctly serotyped Salmonella Enteritidis, however, they presented ambiguous results for Salmonella Typhimurium identification. These data highlight that MPN and multiplex PCR can be useful methods to describe microbial quality in raw sewage and suggest two new PCR patterns for Salmonella Enteritidis identification.
DNA Identification of Skeletal Remains from World War II Mass Graves Uncovered in Slovenia
Marjanović, Damir; Durmić-Pašić, Adaleta; Bakal, Narcisa; Haverić, Sanin; Kalamujić, Belma; Kovačević, Lejla; Ramić, Jasmin; Pojskić, Naris; Škaro, Vedrana; Projić, Petar; Bajrović, Kasim; Hadžiselimović, Rifat; Drobnič, Katja; Huffine, Ed; Davoren, Jon; Primorac, Dragan
2007-01-01
Aim To present the joint effort of three institutions in the identification of human remains from the World War II found in two mass graves in the area of Škofja Loka, Slovenia. Methods The remains of 27 individuals were found in two small and closely located mass graves. The DNA was isolated from bone and teeth samples using either standard phenol/chloroform alcohol extraction or optimized Qiagen DNA extraction procedure. Some recovered samples required the employment of additional DNA purification methods, such as N-buthanol treatment. QuantifilerTM Human DNA Quantification Kit was used for DNA quantification. PowerPlex 16 kit was used to simultaneously amplify 15 short tandem repeat (STR) loci. Matching probabilities were estimated using the DNA View program. Results Out of all processed samples, 15 remains were fully profiled at all 15 STR loci. The other 12 profiles were partial. The least successful profile included 13 loci. Also, 69 referent samples (buccal swabs) from potential living relatives were collected and profiled. Comparison of victims' profile against referent samples database resulted in 4 strong matches. In addition, 5 other profiles were matched to certain referent samples with lower probability. Conclusion Our results show that more than 6 decades after the end of the World War II, DNA analysis may significantly contribute to the identification of the remains from that period. Additional analysis of Y-STRs and mitochondrial DNA (mtDNA) markers will be performed in the second phase of the identification project. PMID:17696306
Importance Sampling in the Evaluation and Optimization of Buffered Failure Probability
2015-07-01
12th International Conference on Applications of Statistics and Probability in Civil Engineering, ICASP12 Vancouver, Canada, July 12-15, 2015...Importance Sampling in the Evaluation and Optimization of Buffered Failure Probability Marwan M. Harajli Graduate Student, Dept. of Civil and Environ...criterion is usually the failure probability . In this paper, we examine the buffered failure probability as an attractive alternative to the failure
Particle Filtering Methods for Incorporating Intelligence Updates
2017-03-01
methodology for incorporating intelligence updates into a stochastic model for target tracking. Due to the non -parametric assumptions of the PF...samples are taken with replacement from the remaining non -zero weighted particles at each iteration. With this methodology , a zero-weighted particle is...incorporation of information updates. A common method for incorporating information updates is Kalman filtering. However, given the probable nonlinear and non
ERIC Educational Resources Information Center
Stigler, James W.; Gonzales, Patrick; Kawanaka, Takako; Knoll, Steffen; Serrano, Ana
1999-01-01
Describes the methods and preliminary findings of the Videotape Classroom Study, a video survey of eighth-grade mathematics lessons in Germany, Japan, and the United States. Part of the Third International Mathematics and Science study, this research project is the first study of videotaped records from national probability samples. (SLD)
Le, Quang A; Doctor, Jason N
2011-05-01
As quality-adjusted life years have become the standard metric in health economic evaluations, mapping health-profile or disease-specific measures onto preference-based measures to obtain quality-adjusted life years has become a solution when health utilities are not directly available. However, current mapping methods are limited due to their predictive validity, reliability, and/or other methodological issues. We employ probability theory together with a graphical model, called a Bayesian network, to convert health-profile measures into preference-based measures and to compare the results to those estimated with current mapping methods. A sample of 19,678 adults who completed both the 12-item Short Form Health Survey (SF-12v2) and EuroQoL 5D (EQ-5D) questionnaires from the 2003 Medical Expenditure Panel Survey was split into training and validation sets. Bayesian networks were constructed to explore the probabilistic relationships between each EQ-5D domain and 12 items of the SF-12v2. The EQ-5D utility scores were estimated on the basis of the predicted probability of each response level of the 5 EQ-5D domains obtained from the Bayesian inference process using the following methods: Monte Carlo simulation, expected utility, and most-likely probability. Results were then compared with current mapping methods including multinomial logistic regression, ordinary least squares, and censored least absolute deviations. The Bayesian networks consistently outperformed other mapping models in the overall sample (mean absolute error=0.077, mean square error=0.013, and R overall=0.802), in different age groups, number of chronic conditions, and ranges of the EQ-5D index. Bayesian networks provide a new robust and natural approach to map health status responses into health utility measures for health economic evaluations.
Survey of South African fruit juices using a fast screening HILIC-MS method.
Stander, Marietjie A; Kühn, Wernich; Hiten, Nicholas F
2013-01-01
Adulteration of fruit juices--by the addition of sugar or other less expensive fruit juices as well as preservatives, artificial sweeteners and colours--was tested for by using a developed screening method. The method employs hydrophilic interaction liquid chromatography-mass spectrometry (HILIC-MS) using electrospray ionisation in the negative mode and ultraviolet light detection. Different fruit juices can be differentiated by the content of marker compounds like sorbitol, certain phenolic molecules and their saccharide profile. This method was used to test 46 fruit juice samples from the retail market as well as 12 control samples. The study focused on the main types of fruit juices consumed on the South African market including apple, orange, grape and blends of these juices with other fruits like mango, pear and guava. Overall, the 46 samples tested mostly agreed with label claims. One grape juice sample was adulterated, probably with apple juice. Natamycin above the legal limits was found in two samples. In addition, two samples contained natamycin and one sample benzoate without it being indicated on the label. The method is well suited as a quick screening method for fruit juice adulteration and if used routinely would reduce fruit juice adulteration without the cost of the current array of tests needed for authenticity testing.
NASA Astrophysics Data System (ADS)
Houben, Georg J.; Koeniger, Paul; Schloemer, Stefan; Gröger-Trampe, Jens; Sültenfuß, Jürgen
2018-02-01
Depth-specific sampling of groundwater is important for a variety of hydrogeological applications. Several sampling methods are available but comparably little is known about how their results compare. Therefore, samples from regular observation wells (short screen), micro-filters and direct push were compared for two sites with differing hydrogeological conditions and land use, both located in the Fuhrberger Feld, Germany. The encountered hydrochemical zonation requires a high resolution of 1 m or better, which the available small number of regular observation wells could only roughly mirror. Because the three methods employ significantly varying pumping rates and therefore, have varying spatial origins of the sample, individual concentrations at similar depths may differ significantly. In a hydrologically and chemically dynamical environment such as the agricultural site, this effect becomes more pronounced than for the more stable forest site. The micro-filters are probably the most depth-specific, but showed distinctly lower concentrations for dissolved gases than the other two methods, due to degassing during sampling. They should thus not be used for any method that relies on dissolved gas analysis.
WEIGHTED LIKELIHOOD ESTIMATION UNDER TWO-PHASE SAMPLING
Saegusa, Takumi; Wellner, Jon A.
2013-01-01
We develop asymptotic theory for weighted likelihood estimators (WLE) under two-phase stratified sampling without replacement. We also consider several variants of WLEs involving estimated weights and calibration. A set of empirical process tools are developed including a Glivenko–Cantelli theorem, a theorem for rates of convergence of M-estimators, and a Donsker theorem for the inverse probability weighted empirical processes under two-phase sampling and sampling without replacement at the second phase. Using these general results, we derive asymptotic distributions of the WLE of a finite-dimensional parameter in a general semiparametric model where an estimator of a nuisance parameter is estimable either at regular or nonregular rates. We illustrate these results and methods in the Cox model with right censoring and interval censoring. We compare the methods via their asymptotic variances under both sampling without replacement and the more usual (and easier to analyze) assumption of Bernoulli sampling at the second phase. PMID:24563559
Khan, Hafiz; Saxena, Anshul; Perisetti, Abhilash; Rafiq, Aamrin; Gabbidon, Kemesha; Mende, Sarah; Lyuksyutova, Maria; Quesada, Kandi; Blakely, Summre; Torres, Tiffany; Afesse, Mahlet
2016-12-01
Background: Breast cancer is a worldwide public health concern and is the most prevalent type of cancer in women in the United States. This study concerned the best fit of statistical probability models on the basis of survival times for nine state cancer registries: California, Connecticut, Georgia, Hawaii, Iowa, Michigan, New Mexico, Utah, and Washington. Materials and Methods: A probability random sampling method was applied to select and extract records of 2,000 breast cancer patients from the Surveillance Epidemiology and End Results (SEER) database for each of the nine state cancer registries used in this study. EasyFit software was utilized to identify the best probability models by using goodness of fit tests, and to estimate parameters for various statistical probability distributions that fit survival data. Results: Statistical analysis for the summary of statistics is reported for each of the states for the years 1973 to 2012. Kolmogorov-Smirnov, Anderson-Darling, and Chi-squared goodness of fit test values were used for survival data, the highest values of goodness of fit statistics being considered indicative of the best fit survival model for each state. Conclusions: It was found that California, Connecticut, Georgia, Iowa, New Mexico, and Washington followed the Burr probability distribution, while the Dagum probability distribution gave the best fit for Michigan and Utah, and Hawaii followed the Gamma probability distribution. These findings highlight differences between states through selected sociodemographic variables and also demonstrate probability modeling differences in breast cancer survival times. The results of this study can be used to guide healthcare providers and researchers for further investigations into social and environmental factors in order to reduce the occurrence of and mortality due to breast cancer. Creative Commons Attribution License
The oilspill risk analysis model of the U. S. Geological Survey
Smith, R.A.; Slack, J.R.; Wyant, Timothy; Lanfear, K.J.
1982-01-01
The U.S. Geological Survey has developed an oilspill risk analysis model to aid in estimating the environmental hazards of developing oil resources in Outer Continental Shelf (OCS) lease areas. The large, computerized model analyzes the probability of spill occurrence, as well as the likely paths or trajectories of spills in relation to the locations of recreational and biological resources which may be vulnerable. The analytical methodology can easily incorporate estimates of weathering rates , slick dispersion, and possible mitigating effects of cleanup. The probability of spill occurrence is estimated from information on the anticipated level of oil production and method of route of transport. Spill movement is modeled in Monte Carlo fashion with a sample of 500 spills per season, each transported by monthly surface current vectors and wind velocities sampled from 3-hour wind transition matrices. Transition matrices are based on historic wind records grouped in 41 wind velocity classes, and are constructed seasonally for up to six wind stations. Locations and monthly vulnerabilities of up to 31 categories of environmental resources are digitized within an 800,000 square kilometer study area. Model output includes tables of conditional impact probabilities (that is, the probability of hitting a target, given that a spill has occured), as well as probability distributions for oilspills occurring and contacting environmental resources within preselected vulnerability time horizons. (USGS)
The oilspill risk analysis model of the U. S. Geological Survey
Smith, R.A.; Slack, J.R.; Wyant, T.; Lanfear, K.J.
1980-01-01
The U.S. Geological Survey has developed an oilspill risk analysis model to aid in estimating the environmental hazards of developing oil resources in Outer Continental Shelf (OCS) lease areas. The large, computerized model analyzes the probability of spill occurrence, as well as the likely paths or trajectories of spills in relation to the locations of recreational and biological resources which may be vulnerable. The analytical methodology can easily incorporate estimates of weathering rates , slick dispersion, and possible mitigating effects of cleanup. The probability of spill occurrence is estimated from information on the anticipated level of oil production and method and route of transport. Spill movement is modeled in Monte Carlo fashion with a sample of 500 spills per season, each transported by monthly surface current vectors and wind velocities sampled from 3-hour wind transition matrices. Transition matrices are based on historic wind records grouped in 41 wind velocity classes, and are constructed seasonally for up to six wind stations. Locations and monthly vulnerabilities of up to 31 categories of environmental resources are digitized within an 800,000 square kilometer study area. Model output includes tables of conditional impact probabilities (that is, the probability of hitting a target, given that a spill has occurred), as well as probability distributions for oilspills occurring and contacting environmental resources within preselected vulnerability time horizons. (USGS)
A combinatorial perspective of the protein inference problem.
Yang, Chao; He, Zengyou; Yu, Weichuan
2013-01-01
In a shotgun proteomics experiment, proteins are the most biologically meaningful output. The success of proteomics studies depends on the ability to accurately and efficiently identify proteins. Many methods have been proposed to facilitate the identification of proteins from peptide identification results. However, the relationship between protein identification and peptide identification has not been thoroughly explained before. In this paper, we devote ourselves to a combinatorial perspective of the protein inference problem. We employ combinatorial mathematics to calculate the conditional protein probabilities (protein probability means the probability that a protein is correctly identified) under three assumptions, which lead to a lower bound, an upper bound, and an empirical estimation of protein probabilities, respectively. The combinatorial perspective enables us to obtain an analytical expression for protein inference. Our method achieves comparable results with ProteinProphet in a more efficient manner in experiments on two data sets of standard protein mixtures and two data sets of real samples. Based on our model, we study the impact of unique peptides and degenerate peptides (degenerate peptides are peptides shared by at least two proteins) on protein probabilities. Meanwhile, we also study the relationship between our model and ProteinProphet. We name our program ProteinInfer. Its Java source code, our supplementary document and experimental results are available at: >http://bioinformatics.ust.hk/proteininfer.
Shao, Jingyuan; Cao, Wen; Qu, Haibin; Pan, Jianyang; Gong, Xingchu
2018-01-01
The aim of this study was to present a novel analytical quality by design (AQbD) approach for developing an HPLC method to analyze herbal extracts. In this approach, critical method attributes (CMAs) and critical method parameters (CMPs) of the analytical method were determined using the same data collected from screening experiments. The HPLC-ELSD method for separation and quantification of sugars in Codonopsis Radix extract (CRE) samples and Astragali Radix extract (ARE) samples was developed as an example method with a novel AQbD approach. Potential CMAs and potential CMPs were found with Analytical Target Profile. After the screening experiments, the retention time of the D-glucose peak of CRE samples, the signal-to-noise ratio of the D-glucose peak of CRE samples, and retention time of the sucrose peak in ARE samples were considered CMAs. The initial and final composition of the mobile phase, flow rate, and column temperature were found to be CMPs using a standard partial regression coefficient method. The probability-based design space was calculated using a Monte-Carlo simulation method and verified by experiments. The optimized method was validated to be accurate and precise, and then it was applied in the analysis of CRE and ARE samples. The present AQbD approach is efficient and suitable for analysis objects with complex compositions.
Pedigree data analysis with crossover interference.
Browning, Sharon
2003-01-01
We propose a new method for calculating probabilities for pedigree genetic data that incorporates crossover interference using the chi-square models. Applications include relationship inference, genetic map construction, and linkage analysis. The method is based on importance sampling of unobserved inheritance patterns conditional on the observed genotype data and takes advantage of fast algorithms for no-interference models while using reweighting to allow for interference. We show that the method is effective for arbitrarily many markers with small pedigrees. PMID:12930760
The late Neandertal supraorbital fossils from Vindija Cave, Croatia: a biased sample?
Ahern, James C M; Lee, Sang-Hee; Hawks, John D
2002-09-01
The late Neandertal sample from Vindija (Croatia) has been described as transitional between the earlier Central European Neandertals from Krapina (Croatia) and modern humans. However, the morphological differences indicating this transition may rather be the result of different sex and/or age compositions between the samples. This study tests the hypothesis that the metric differences between the Krapina and Vindija supraorbital samples are due to sampling bias. We focus upon the supraorbital region because past studies have posited this region as particularly indicative of the Vindija sample's transitional nature. Furthermore, the supraorbital region varies significantly with both age and sex. We analyzed four chords and two derived indices of supraorbital torus form as defined by Smith & Ranyard (1980, Am. J. phys. Anthrop.93, pp. 589-610). For each variable, we analyzed relative sample bias of the Krapina and Vindija samples using three sampling methods. In order to test the hypothesis that the Vindija sample contains an over-representation of females and/or young while the Krapina sample is normal or also female/young biased, we determined the probability of drawing a sample of the same size as and with a mean equal to or less than Vindija's from a Krapina-based population. In order to test the hypothesis that the Vindija sample is female/young biased while the Krapina sample is male/old biased, we determined the probability of drawing a sample of the same size as and with a mean equal or less than Vindija's from a generated population whose mean is halfway between Krapina's and Vindija's. Finally, in order to test the hypothesis that the Vindija sample is normal while the Krapina sample contains an over-representation of males and/or old, we determined the probability of drawing a sample of the same size as and with a mean equal to or greater than Krapina's from a Vindija-based population. Unless we assume that the Vindija sample is female/young and the Krapina sample is male/old biased, our results falsify the hypothesis that the metric differences between the Krapina and Vindija samples are due to sample bias.
NASA Astrophysics Data System (ADS)
Sergeenko, N. P.
2017-11-01
An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the layer F2- foF2( t) were subjected to statistical processing. For the obtained samples {δ foF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during the disturbances. At levels of sufficiently small probability distributions, there are arbitrarily large deviations from the model of the normal process. Therefore, it is attempted to describe statistical samples {δ foF2} based on the Poisson model. For the studied samples, the exponential characteristic function is selected under the assumption that time series are a superposition of some deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic excessive-asymmetric probability-density function. The statistical distributions of the samples {δ foF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to the Kolmogorov's criterion, the probabilities of the coincidence of a posteriori distributions with the theoretical ones are P 0.7-0.9. The conducted analysis makes it possible to draw a conclusion about the applicability of a model based on the Poisson random process for the statistical description and probabilistic variation estimates during heliogeophysical disturbances of the variations {δ foF2}.
Challenges of DNA-based mark-recapture studies of American black bears
Settlage, K.E.; Van Manen, F.T.; Clark, J.D.; King, T.L.
2008-01-01
We explored whether genetic sampling would be feasible to provide a region-wide population estimate for American black bears (Ursus americanus) in the southern Appalachians, USA. Specifically, we determined whether adequate capture probabilities (p >0.20) and population estimates with a low coefficient of variation (CV <20%) could be achieved given typical agency budget and personnel constraints. We extracted DNA from hair collected from baited barbed-wire enclosures sampled over a 10-week period on 2 study areas: a high-density black bear population in a portion of Great Smoky Mountains National Park and a lower density population on National Forest lands in North Carolina, South Carolina, and Georgia. We identified individual bears by their unique genotypes obtained from 9 microsatellite loci. We sampled 129 and 60 different bears in the National Park and National Forest study areas, respectively, and applied closed mark–recapture models to estimate population abundance. Capture probabilities and precision of the population estimates were acceptable only for sampling scenarios for which we pooled weekly sampling periods. We detected capture heterogeneity biases, probably because of inadequate spatial coverage by the hair-trapping grid. The logistical challenges of establishing and checking a sufficiently high density of hair traps make DNA-based estimates of black bears impractical for the southern Appalachian region. Alternatives are to estimate population size for smaller areas, estimate population growth rates or survival using mark–recapture methods, or use independent marking and recapturing techniques to reduce capture heterogeneity.
Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard
2007-06-01
Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy.
Sightability adjustment methods for aerial surveys of wildlife populations
Steinhorst, R.K.; Samuel, M.D.
1989-01-01
Aerial surveys are routinely conducted to estimate the abundance of wildlife species and the rate of population change. However, sightability of animal groups is acknowledged as a significant source of bias in these estimates. Recent research has focused on the development of sightability models to predict the probability of sighting groups under various conditions. Given such models, we show how sightability can be incorporated into the estimator of population size as a probability of response using standard results from sample surveys. We develop formulas for the cases where the sighting probability must be estimated. An example, using data from a helicopter survey of moose in Alberta (Jacobson, Alberta Oil Sands Research Project Report, 1976), is given to illustrate the technique.
Distribution of the two-sample t-test statistic following blinded sample size re-estimation.
Lu, Kaifeng
2016-05-01
We consider the blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for the evaluation of the probability of rejecting the null hypothesis at given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margin for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Munch, Jean W; Bassett, Margarita V
2006-01-01
N-nitrosodimethylamine (NDMA) is a probable human carcinogen of concern that has been identified as a drinking water contaminant. U.S. Environmental Protection Agency Method 521 has been developed for the analysis of NDMA and 6 additional N-nitrosamines in drinking water at low ng/L concentrations. The method uses solid-phase extraction with coconut charcoal as the sorbent and dichloromethane as the eluent to concentrate 0.50 L water samples to 1 mL. The extracts are analyzed by gas chromatography-chemical ionization tandem mass spectrometry using large-volume injection. Method performance was evaluated in 2 laboratories. Typical analyte recoveries of 87-104% were demonstrated for fortified reagent water samples, and recoveries of 77-106% were demonstrated for fortified drinking water samples. All relative standard deviations on replicate analyses were < 11%.
Mondol, Samrat; Navya, R; Athreya, Vidya; Sunagar, Kartik; Selvaraj, Velu Mani; Ramakrishnan, Uma
2009-12-04
Leopards are the most widely distributed of the large cats, ranging from Africa to the Russian Far East. Because of habitat fragmentation, high human population densities and the inherent adaptability of this species, they now occupy landscapes close to human settlements. As a result, they are the most common species involved in human wildlife conflict in India, necessitating their monitoring. However, their elusive nature makes such monitoring difficult. Recent advances in DNA methods along with non-invasive sampling techniques can be used to monitor populations and individuals across large landscapes including human dominated ones. In this paper, we describe a DNA-based method for leopard individual identification where we used fecal DNA samples to obtain genetic material. Further, we apply our methods to non-invasive samples collected in a human-dominated landscape to estimate the minimum number of leopards in this human-leopard conflict area in Western India. In this study, 25 of the 29 tested cross-specific microsatellite markers showed positive amplification in 37 wild-caught leopards. These loci revealed varied levels of polymorphism (four-12 alleles) and heterozygosity (0.05-0.79). Combining data on amplification success (including non-invasive samples) and locus specific polymorphisms, we showed that eight loci provide a sibling probability of identity of 0.0005, suggesting that this panel can be used to discriminate individuals in the wild. When this microsatellite panel was applied to fecal samples collected from a human-dominated landscape, we identified 7 individuals, with a sibling probability of identity of 0.001. Amplification success of field collected scats was up to 72%, and genotype error ranged from 0-7.4%. Our results demonstrated that the selected panel of eight microsatellite loci can conclusively identify leopards from various kinds of biological samples. Our methods can be used to monitor leopards over small and large landscapes to assess population trends, as well as could be tested for population assignment in forensic applications.
2009-01-01
Background Leopards are the most widely distributed of the large cats, ranging from Africa to the Russian Far East. Because of habitat fragmentation, high human population densities and the inherent adaptability of this species, they now occupy landscapes close to human settlements. As a result, they are the most common species involved in human wildlife conflict in India, necessitating their monitoring. However, their elusive nature makes such monitoring difficult. Recent advances in DNA methods along with non-invasive sampling techniques can be used to monitor populations and individuals across large landscapes including human dominated ones. In this paper, we describe a DNA-based method for leopard individual identification where we used fecal DNA samples to obtain genetic material. Further, we apply our methods to non-invasive samples collected in a human-dominated landscape to estimate the minimum number of leopards in this human-leopard conflict area in Western India. Results In this study, 25 of the 29 tested cross-specific microsatellite markers showed positive amplification in 37 wild-caught leopards. These loci revealed varied levels of polymorphism (four-12 alleles) and heterozygosity (0.05-0.79). Combining data on amplification success (including non-invasive samples) and locus specific polymorphisms, we showed that eight loci provide a sibling probability of identity of 0.0005, suggesting that this panel can be used to discriminate individuals in the wild. When this microsatellite panel was applied to fecal samples collected from a human-dominated landscape, we identified 7 individuals, with a sibling probability of identity of 0.001. Amplification success of field collected scats was up to 72%, and genotype error ranged from 0-7.4%. Conclusion Our results demonstrated that the selected panel of eight microsatellite loci can conclusively identify leopards from various kinds of biological samples. Our methods can be used to monitor leopards over small and large landscapes to assess population trends, as well as could be tested for population assignment in forensic applications. PMID:19961605
Storkel, Holly L.; Bontempo, Daniel E.; Aschenbrenner, Andrew J.; Maekawa, Junko; Lee, Su-Yeon
2013-01-01
Purpose Phonotactic probability or neighborhood density have predominately been defined using gross distinctions (i.e., low vs. high). The current studies examined the influence of finer changes in probability (Experiment 1) and density (Experiment 2) on word learning. Method The full range of probability or density was examined by sampling five nonwords from each of four quartiles. Three- and 5-year-old children received training on nonword-nonobject pairs. Learning was measured in a picture-naming task immediately following training and 1-week after training. Results were analyzed using multi-level modeling. Results A linear spline model best captured nonlinearities in phonotactic probability. Specifically word learning improved as probability increased in the lowest quartile, worsened as probability increased in the midlow quartile, and then remained stable and poor in the two highest quartiles. An ordinary linear model sufficiently described neighborhood density. Here, word learning improved as density increased across all quartiles. Conclusion Given these different patterns, phonotactic probability and neighborhood density appear to influence different word learning processes. Specifically, phonotactic probability may affect recognition that a sound sequence is an acceptable word in the language and is a novel word for the child, whereas neighborhood density may influence creation of a new representation in long-term memory. PMID:23882005
The Sampling Design of the China Family Panel Studies (CFPS)
Xie, Yu; Lu, Ping
2018-01-01
The China Family Panel Studies (CFPS) is an on-going, nearly nationwide, comprehensive, longitudinal social survey that is intended to serve research needs on a large variety of social phenomena in contemporary China. In this paper, we describe the sampling design of the CFPS sample for its 2010 baseline survey and methods for constructing weights to adjust for sampling design and survey nonresponses. Specifically, the CFPS used a multi-stage probability strategy to reduce operation costs and implicit stratification to increase efficiency. Respondents were oversampled in five provinces or administrative equivalents for regional comparisons. We provide operation details for both sampling and weights construction. PMID:29854418
Online Reinforcement Learning Using a Probability Density Estimation.
Agostini, Alejandro; Celaya, Enric
2017-01-01
Function approximation in online, incremental, reinforcement learning needs to deal with two fundamental problems: biased sampling and nonstationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The nonstationarity comes from the recursive nature of the estimations typical of temporal difference methods. This nonstationarity has a local profile, varying not only along the learning process but also along different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a gaussian mixture model. To deal with the nonstationarity problem, we use the common approach of introducing a forgetting factor in the updating formula. However, instead of using the same forgetting factor for the whole domain, we make it dependent on the local density of samples, which we use to estimate the nonstationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the updating, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
Priority-pollutant trace elements in streambed sediments of the Cook Inlet basin, Alaska, 1998-2000
Frenzel, Steven A.
2002-01-01
Trace element concentrations in 48 streambed sediment samples collected at 47 sites in the Cook Inlet Basin, Alaska, were compared to concentrations from studies in the conterminous United States using identical methods and to Probable Effect Concentrations. Concentrations of arsenic, chromium, mercury, and nickel in the 0.063-mm size fraction of streambed sediments from the Cook Inlet Basin were elevated relative to reference sites in the conterminous United States. Concentrations of cadmium, lead, and zinc were highest at the most urbanized site in Anchorage and at two sites downstream from an ore body in Lake Clark National Park and Preserve. At least 35 percent of the 48 samples collected in the Cook Inlet Basin exceeded the Probable Effect Concentration for arsenic, chromium, or nickel. More than 50 percent of the samples were considered to have low potential toxicity for cadmium, lead, mercury, nickel, selenium, and zinc. A Probable Effect Concentration quotient that reflects the combined toxicity of arsenic, cadmium, chromium, copper, lead, mercury, nickel, and zinc was exceeded in 44 percent of the samples from the Cook Inlet Basin. The potential toxicity was high in the Denali and Lake Clark National Parks and Preserves where organic carbon concentrations in streambed sediments were low. However, potential toxicity results should be considered in context with the very small amounts of fine-grained sediment present in the streambed sediments of the Cook Inlet Basin.
Method of identifying clusters representing statistical dependencies in multivariate data
NASA Technical Reports Server (NTRS)
Borucki, W. J.; Card, D. H.; Lyle, G. C.
1975-01-01
Approach is first to cluster and then to compute spatial boundaries for resulting clusters. Next step is to compute, from set of Monte Carlo samples obtained from scrambled data, estimates of probabilities of obtaining at least as many points within boundaries as were actually observed in original data.
ERIC Educational Resources Information Center
Ashworth, Stephanie R.
2013-01-01
The study examined the relationship between secondary public school principals' emotional intelligence and school performance. The correlational study employed an explanatory sequential mixed methods model. The non-probability sample consisted of 105 secondary public school principals in Texas. The emotional intelligence characteristics of the…
Estimating abundance of mountain lions from unstructured spatial sampling
Robin E. Russell; J. Andrew Royle; Richard Desimone; Michael K. Schwartz; Victoria L. Edwards; Kristy P. Pilgrim; Kevin S. McKelvey
2012-01-01
Mountain lions (Puma concolor) are often difficult to monitor because of their low capture probabilities, extensive movements, and large territories. Methods for estimating the abundance of this species are needed to assess population status, determine harvest levels, evaluate the impacts of management actions on populations, and derive conservation and management...
Presence/absence as a metric for monitoring vertebrate populations
Len Ruggiero; Dean Pearson
2000-01-01
Developing cost effective methods for monitoring vertebrate populations is a persistent problem in wildlife biology. Population demographic data is too costly and time intensive to acquire, so researchers have begun investigating presence/absence sampling as a means for monitoring wildlife populations. We examined three important assumptions regarding the probability...
What Are Probability Surveys used by the National Aquatic Resource Surveys?
The National Aquatic Resource Surveys (NARS) use probability-survey designs to assess the condition of the nation’s waters. In probability surveys (also known as sample-surveys or statistical surveys), sampling sites are selected randomly.
Chelgren, Nathan D.; Samora, Barbara; Adams, Michael J.; McCreary, Brome
2011-01-01
High variability in abundance, cryptic coloration, and small body size of newly metamorphosed anurans have limited demographic studies of this life-history stage. We used line-transect distance sampling and Bayesian methods to estimate the abundance and spatial distribution of newly metamorphosed Western Toads (Anaxyrus boreas) in terrestrial habitat surrounding a montane lake in central Washington, USA. We completed 154 line-transect surveys from the commencement of metamorphosis (15 September 2009) to the date of first snow accumulation in fall (1 October 2009), and located 543 newly metamorphosed toads. After accounting for variable detection probability associated with the extent of barren habitats, estimates of total surface abundance ranged from a posterior median of 3,880 (95% credible intervals from 2,235 to 12,600) in the first week of sampling to 12,150 (5,543 to 51,670) during the second week of sampling. Numbers of newly metamorphosed toads dropped quickly with increasing distance from the lakeshore in a pattern that differed over the three weeks of the study and contradicted our original hypotheses. Though we hypothesized that the spatial distribution of toads would initially be concentrated near the lake shore and then spread outward from the lake over time, we observed the opposite. Ninety-five percent of individuals occurred within 20, 16, and 15 m of shore during weeks one, two, and three respectively, probably reflecting continued emergence of newly metamorphosed toads from the lake and mortality or burrow use of dispersed individuals. Numbers of toads were highest near the inlet stream of the lake. Distance sampling may provide a useful method for estimating the surface abundance of newly metamorphosed toads and relating their space use to landscape variables despite uncertain and variable probability of detection. We discuss means of improving the precision of estimates of total abundance.
NASA Astrophysics Data System (ADS)
Sonnenfeld, Alessandro; Chan, James H. H.; Shu, Yiping; More, Anupreeta; Oguri, Masamune; Suyu, Sherry H.; Wong, Kenneth C.; Lee, Chien-Hsiu; Coupon, Jean; Yonehara, Atsunori; Bolton, Adam S.; Jaelani, Anton T.; Tanaka, Masayuki; Miyazaki, Satoshi; Komiyama, Yutaka
2018-01-01
The Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP) is an excellent survey for the search for strong lenses, thanks to its area, image quality, and depth. We use three different methods to look for lenses among 43000 luminous red galaxies from the Baryon Oscillation Spectroscopic Survey (BOSS) sample with photometry from the S16A internal data release of the HSC-SSP. The first method is a newly developed algorithm, named YATTALENS, which looks for arc-like features around massive galaxies and then estimates the likelihood of an object being a lens by performing a lens model fit. The second method, CHITAH, is a modeling-based algorithm originally developed to look for lensed quasars. The third method makes use of spectroscopic data to look for emission lines from objects at a different redshift from that of the main galaxy. We find 15 definite lenses, 36 highly probable lenses, and 282 possible lenses. Among the three methods, YATTALENS, which was developed specifically for this study, performs best in terms of both completeness and purity. Nevertheless, five highly probable lenses were missed by YATTALENS but found by the other two methods, indicating that the three methods are highly complementary. Based on these numbers, we expect to find ˜300 definite or probable lenses by the end of the HSC-SSP.
NASA Astrophysics Data System (ADS)
Shayanfar, Mohsen Ali; Barkhordari, Mohammad Ali; Roudak, Mohammad Amin
2017-06-01
Monte Carlo simulation (MCS) is a useful tool for computation of probability of failure in reliability analysis. However, the large number of required random samples makes it time-consuming. Response surface method (RSM) is another common method in reliability analysis. Although RSM is widely used for its simplicity, it cannot be trusted in highly nonlinear problems due to its linear nature. In this paper, a new efficient algorithm, employing the combination of importance sampling, as a class of MCS, and RSM is proposed. In the proposed algorithm, analysis starts with importance sampling concepts and using a represented two-step updating rule of design point. This part finishes after a small number of samples are generated. Then RSM starts to work using Bucher experimental design, with the last design point and a represented effective length as the center point and radius of Bucher's approach, respectively. Through illustrative numerical examples, simplicity and efficiency of the proposed algorithm and the effectiveness of the represented rules are shown.
Probabilistic liver atlas construction.
Dura, Esther; Domingo, Juan; Ayala, Guillermo; Marti-Bonmati, Luis; Goceri, E
2017-01-13
Anatomical atlases are 3D volumes or shapes representing an organ or structure of the human body. They contain either the prototypical shape of the object of interest together with other shapes representing its statistical variations (statistical atlas) or a probability map of belonging to the object (probabilistic atlas). Probabilistic atlases are mostly built with simple estimations only involving the data at each spatial location. A new method for probabilistic atlas construction that uses a generalized linear model is proposed. This method aims to improve the estimation of the probability to be covered by the liver. Furthermore, all methods to build an atlas involve previous coregistration of the sample of shapes available. The influence of the geometrical transformation adopted for registration in the quality of the final atlas has not been sufficiently investigated. The ability of an atlas to adapt to a new case is one of the most important quality criteria that should be taken into account. The presented experiments show that some methods for atlas construction are severely affected by the previous coregistration step. We show the good performance of the new approach. Furthermore, results suggest that extremely flexible registration methods are not always beneficial, since they can reduce the variability of the atlas and hence its ability to give sensible values of probability when used as an aid in segmentation of new cases.
Geostatistical applications in environmental remediation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stewart, R.N.; Purucker, S.T.; Lyon, B.F.
1995-02-01
Geostatistical analysis refers to a collection of statistical methods for addressing data that vary in space. By incorporating spatial information into the analysis, geostatistics has advantages over traditional statistical analysis for problems with a spatial context. Geostatistics has a history of success in earth science applications, and its popularity is increasing in other areas, including environmental remediation. Due to recent advances in computer technology, geostatistical algorithms can be executed at a speed comparable to many standard statistical software packages. When used responsibly, geostatistics is a systematic and defensible tool can be used in various decision frameworks, such as the Datamore » Quality Objectives (DQO) process. At every point in the site, geostatistics can estimate both the concentration level and the probability or risk of exceeding a given value. Using these probability maps can assist in identifying clean-up zones. Given any decision threshold and an acceptable level of risk, the probability maps identify those areas that are estimated to be above or below the acceptable risk. Those areas that are above the threshold are of the most concern with regard to remediation. In addition to estimating clean-up zones, geostatistics can assist in designing cost-effective secondary sampling schemes. Those areas of the probability map with high levels of estimated uncertainty are areas where more secondary sampling should occur. In addition, geostatistics has the ability to incorporate soft data directly into the analysis. These data include historical records, a highly correlated secondary contaminant, or expert judgment. The role of geostatistics in environmental remediation is a tool that in conjunction with other methods can provide a common forum for building consensus.« less
Pointwise probability reinforcements for robust statistical inference.
Frénay, Benoît; Verleysen, Michel
2014-02-01
Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample that they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.
Comparison of methods for estimating the attributable risk in the context of survival analysis.
Gassama, Malamine; Bénichou, Jacques; Dartois, Laureen; Thiébaut, Anne C M
2017-01-23
The attributable risk (AR) measures the proportion of disease cases that can be attributed to an exposure in the population. Several definitions and estimation methods have been proposed for survival data. Using simulations, we compared four methods for estimating AR defined in terms of survival functions: two nonparametric methods based on Kaplan-Meier's estimator, one semiparametric based on Cox's model, and one parametric based on the piecewise constant hazards model, as well as one simpler method based on estimated exposure prevalence at baseline and Cox's model hazard ratio. We considered a fixed binary exposure with varying exposure probabilities and strengths of association, and generated event times from a proportional hazards model with constant or monotonic (decreasing or increasing) Weibull baseline hazard, as well as from a nonproportional hazards model. We simulated 1,000 independent samples of size 1,000 or 10,000. The methods were compared in terms of mean bias, mean estimated standard error, empirical standard deviation and 95% confidence interval coverage probability at four equally spaced time points. Under proportional hazards, all five methods yielded unbiased results regardless of sample size. Nonparametric methods displayed greater variability than other approaches. All methods showed satisfactory coverage except for nonparametric methods at the end of follow-up for a sample size of 1,000 especially. With nonproportional hazards, nonparametric methods yielded similar results to those under proportional hazards, whereas semiparametric and parametric approaches that both relied on the proportional hazards assumption performed poorly. These methods were applied to estimate the AR of breast cancer due to menopausal hormone therapy in 38,359 women of the E3N cohort. In practice, our study suggests to use the semiparametric or parametric approaches to estimate AR as a function of time in cohort studies if the proportional hazards assumption appears appropriate.
Prevalence of anxiety, depression and post-traumatic stress disorder in the Kashmir Valley
Lenglet, Annick; Ariti, Cono; Shah, Showkat; Shah, Helal; Ara, Shabnum; Viney, Kerri; Janes, Simon; Pintaldi, Giovanni
2017-01-01
Background Following the partition of India in 1947, the Kashmir Valley has been subject to continual political insecurity and ongoing conflict, the region remains highly militarised. We conducted a representative cross-sectional population-based survey of adults to estimate the prevalence and predictors of anxiety, depression and post-traumatic stress disorder (PTSD) in the 10 districts of the Kashmir Valley. Methods Between October and December 2015, we interviewed 5519 out of 5600 invited participants, ≥18 years of age, randomly sampled using a probability proportional to size cluster sampling design. We estimated the prevalence of a probable psychological disorder using the Hopkins Symptom Checklist (HSCL-25) and the Harvard Trauma Questionnaire (HTQ-16). Both screening instruments had been culturally adapted and translated. Data were weighted to account for the sampling design and multivariate logistic regression analysis was conducted to identify risk factors for developing symptoms of psychological distress. Findings The estimated prevalence of mental distress in adults in the Kashmir Valley was 45% (95% CI 42.6 to 47.0). We identified 41% (95% CI 39.2 to 43.4) of adults with probable depression, 26% (95% CI 23.8 to 27.5) with probable anxiety and 19% (95% CI 17.5 to 21.2) with probable PTSD. The three disorders were associated with the following characteristics: being female, over 55 years of age, having had no formal education, living in a rural area and being widowed/divorced or separated. A dose–response association was found between the number of traumatic events experienced or witnessed and all three mental disorders. Interpretation The implementation of mental health awareness programmes, interventions aimed at high risk groups and addressing trauma-related symptoms from all causes are needed in the Kashmir Valley. PMID:29082026
Generating intrinsically disordered protein conformational ensembles from a Markov chain
NASA Astrophysics Data System (ADS)
Cukier, Robert I.
2018-03-01
Intrinsically disordered proteins (IDPs) sample a diverse conformational space. They are important to signaling and regulatory pathways in cells. An entropy penalty must be payed when an IDP becomes ordered upon interaction with another protein or a ligand. Thus, the degree of conformational disorder of an IDP is of interest. We create a dichotomic Markov model that can explore entropic features of an IDP. The Markov condition introduces local (neighbor residues in a protein sequence) rotamer dependences that arise from van der Waals and other chemical constraints. A protein sequence of length N is characterized by its (information) entropy and mutual information, MIMC, the latter providing a measure of the dependence among the random variables describing the rotamer probabilities of the residues that comprise the sequence. For a Markov chain, the MIMC is proportional to the pair mutual information MI which depends on the singlet and pair probabilities of neighbor residue rotamer sampling. All 2N sequence states are generated, along with their probabilities, and contrasted with the probabilities under the assumption of independent residues. An efficient method to generate realizations of the chain is also provided. The chain entropy, MIMC, and state probabilities provide the ingredients to distinguish different scenarios using the terminologies: MoRF (molecular recognition feature), not-MoRF, and not-IDP. A MoRF corresponds to large entropy and large MIMC (strong dependence among the residues' rotamer sampling), a not-MoRF corresponds to large entropy but small MIMC, and not-IDP corresponds to low entropy irrespective of the MIMC. We show that MorFs are most appropriate as descriptors of IDPs. They provide a reasonable number of high-population states that reflect the dependences between neighbor residues, thus classifying them as IDPs, yet without very large entropy that might lead to a too high entropy penalty.
Short assessment of the Big Five: robust across survey methods except telephone interviewing.
Lang, Frieder R; John, Dennis; Lüdtke, Oliver; Schupp, Jürgen; Wagner, Gert G
2011-06-01
We examined measurement invariance and age-related robustness of a short 15-item Big Five Inventory (BFI-S) of personality dimensions, which is well suited for applications in large-scale multidisciplinary surveys. The BFI-S was assessed in three different interviewing conditions: computer-assisted or paper-assisted face-to-face interviewing, computer-assisted telephone interviewing, and a self-administered questionnaire. Randomized probability samples from a large-scale German panel survey and a related probability telephone study were used in order to test method effects on self-report measures of personality characteristics across early, middle, and late adulthood. Exploratory structural equation modeling was used in order to test for measurement invariance of the five-factor model of personality trait domains across different assessment methods. For the short inventory, findings suggest strong robustness of self-report measures of personality dimensions among young and middle-aged adults. In old age, telephone interviewing was associated with greater distortions in reliable personality assessment. It is concluded that the greater mental workload of telephone interviewing limits the reliability of self-report personality assessment. Face-to-face surveys and self-administrated questionnaire completion are clearly better suited than phone surveys when personality traits in age-heterogeneous samples are assessed.
Switzer, P.; Harden, J.W.; Mark, R.K.
1988-01-01
A statistical method for estimating rates of soil development in a given region based on calibration from a series of dated soils is used to estimate ages of soils in the same region that are not dated directly. The method is designed specifically to account for sampling procedures and uncertainties that are inherent in soil studies. Soil variation and measurement error, uncertainties in calibration dates and their relation to the age of the soil, and the limited number of dated soils are all considered. Maximum likelihood (ML) is employed to estimate a parametric linear calibration curve, relating soil development to time or age on suitably transformed scales. Soil variation on a geomorphic surface of a certain age is characterized by replicate sampling of soils on each surface; such variation is assumed to have a Gaussian distribution. The age of a geomorphic surface is described by older and younger bounds. This technique allows age uncertainty to be characterized by either a Gaussian distribution or by a triangular distribution using minimum, best-estimate, and maximum ages. The calibration curve is taken to be linear after suitable (in certain cases logarithmic) transformations, if required, of the soil parameter and age variables. Soil variability, measurement error, and departures from linearity are described in a combined fashion using Gaussian distributions with variances particular to each sampled geomorphic surface and the number of sample replicates. Uncertainty in age of a geomorphic surface used for calibration is described using three parameters by one of two methods. In the first method, upper and lower ages are specified together with a coverage probability; this specification is converted to a Gaussian distribution with the appropriate mean and variance. In the second method, "absolute" older and younger ages are specified together with a most probable age; this specification is converted to an asymmetric triangular distribution with mode at the most probable age. The statistical variability of the ML-estimated calibration curve is assessed by a Monte Carlo method in which simulated data sets repeatedly are drawn from the distributional specification; calibration parameters are reestimated for each such simulation in order to assess their statistical variability. Several examples are used for illustration. The age of undated soils in a related setting may be estimated from the soil data using the fitted calibration curve. A second simulation to assess age estimate variability is described and applied to the examples. ?? 1988 International Association for Mathematical Geology.
Wong, Linda; Hill, Beth L; Hunsberger, Benjamin C; Bagwell, C Bruce; Curtis, Adam D; Davis, Bruce H
2015-01-01
Leuko64™ (Trillium Diagnostics) is a flow cytometric assay that measures neutrophil CD64 expression and serves as an in vitro indicator of infection/sepsis or the presence of a systemic acute inflammatory response. Leuko64 assay currently utilizes QuantiCALC, a semiautomated software that employs cluster algorithms to define cell populations. The software reduces subjective gating decisions, resulting in interanalyst variability of <5%. We evaluated a completely automated approach to measuring neutrophil CD64 expression using GemStone™ (Verity Software House) and probability state modeling (PSM). Four hundred and fifty-seven human blood samples were processed using the Leuko64 assay. Samples were analyzed on four different flow cytometer models: BD FACSCanto II, BD FACScan, BC Gallios/Navios, and BC FC500. A probability state model was designed to identify calibration beads and three leukocyte subpopulations based on differences in intensity levels of several parameters. PSM automatically calculates CD64 index values for each cell population using equations programmed into the model. GemStone software uses PSM that requires no operator intervention, thus totally automating data analysis and internal quality control flagging. Expert analysis with the predicate method (QuantiCALC) was performed. Interanalyst precision was evaluated for both methods of data analysis. PSM with GemStone correlates well with the expert manual analysis, r(2) = 0.99675 for the neutrophil CD64 index values with no intermethod bias detected. The average interanalyst imprecision for the QuantiCALC method was 1.06% (range 0.00-7.94%), which was reduced to 0.00% with the GemStone PSM. The operator-to-operator agreement in GemStone was a perfect correlation, r(2) = 1.000. Automated quantification of CD64 index values produced results that strongly correlate with expert analysis using a standard gate-based data analysis method. PSM successfully evaluated flow cytometric data generated by multiple instruments across multiple lots of the Leuko64 kit in all 457 cases. The probability-based method provides greater objectivity, higher data analysis speed, and allows for greater precision for in vitro diagnostic flow cytometric assays. © 2015 International Clinical Cytometry Society.
NASA Astrophysics Data System (ADS)
Nemoto, Takahiro; Alexakis, Alexandros
2018-02-01
The fluctuations of turbulence intensity in a pipe flow around the critical Reynolds number is difficult to study but important because they are related to turbulent-laminar transitions. We here propose a rare-event sampling method to study such fluctuations in order to measure the time scale of the transition efficiently. The method is composed of two parts: (i) the measurement of typical fluctuations (the bulk part of an accumulative probability function) and (ii) the measurement of rare fluctuations (the tail part of the probability function) by employing dynamics where a feedback control of the Reynolds number is implemented. We apply this method to a chaotic model of turbulent puffs proposed by Barkley and confirm that the time scale of turbulence decay increases super exponentially even for high Reynolds numbers up to Re =2500 , where getting enough statistics by brute-force calculations is difficult. The method uses a simple procedure of changing Reynolds number that can be applied even to experiments.
Multivariate survivorship analysis using two cross-sectional samples.
Hill, M E
1999-11-01
As an alternative to survival analysis with longitudinal data, I introduce a method that can be applied when one observes the same cohort in two cross-sectional samples collected at different points in time. The method allows for the estimation of log-probability survivorship models that estimate the influence of multiple time-invariant factors on survival over a time interval separating two samples. This approach can be used whenever the survival process can be adequately conceptualized as an irreversible single-decrement process (e.g., mortality, the transition to first marriage among a cohort of never-married individuals). Using data from the Integrated Public Use Microdata Series (Ruggles and Sobek 1997), I illustrate the multivariate method through an investigation of the effects of race, parity, and educational attainment on the survival of older women in the United States.
Analysis of injuries in taekwondo athletes.
Ji, MinJoon
2016-01-01
[Purpose] The present study aims to provide fundamental information on injuries in taekwondo by investigating the categories of injuries that occur in taekwondo and determining the locations of these injuries. [Subjects and Methods] The data of 512 taekwondo athletes were collected. The sampling method was convenience sampling along with non-probability sampling extraction methods. Questionnaire forms were used to obtain the data. [Results] The foot, knee, ankle, thigh, and head were most frequently injured while practicing taekwondo, and contusions, strains, and sprains were the main injuries diagnosed. [Conclusion] It is desirable to decrease the possibility of injuries to the lower extremities for extending participation in taekwondo. Other than the lower extremities, injuries of other specific body parts including the head or neck could be important factors limiting the duration of participation. Therefore, it is necessary to cope with these problems before practicing taekwondo.
Validating long-term satellite-derived disturbance products: the case of burned areas
NASA Astrophysics Data System (ADS)
Boschetti, L.; Roy, D. P.
2015-12-01
The potential research, policy and management applications of satellite products place a high priority on providing statements about their accuracy. A number of NASA, ESA and EU funded global and continental burned area products have been developed using coarse spatial resolution satellite data, and have the potential to become part of a long-term fire Climate Data Record. These products have usually been validated by comparison with reference burned area maps derived by visual interpretation of Landsat or similar spatial resolution data selected on an ad hoc basis. More optimally, a design-based validation method should be adopted that is characterized by the selection of reference data via a probability sampling that can subsequently be used to compute accuracy metrics, taking into account the sampling probability. Design based techniques have been used for annual land cover and land cover change product validation, but have not been widely used for burned area products, or for the validation of global products that are highly variable in time and space (e.g. snow, floods or other non-permanent phenomena). This has been due to the challenge of designing an appropriate sampling strategy, and to the cost of collecting independent reference data. We propose a tri-dimensional sampling grid that allows for probability sampling of Landsat data in time and in space. To sample the globe in the spatial domain with non-overlapping sampling units, the Thiessen Scene Area (TSA) tessellation of the Landsat WRS path/rows is used. The TSA grid is then combined with the 16-day Landsat acquisition calendar to provide tri-dimensonal elements (voxels). This allows the implementation of a sampling design where not only the location but also the time interval of the reference data is explicitly drawn by probability sampling. The proposed sampling design is a stratified random sampling, with two-level stratification of the voxels based on biomes and fire activity (Figure 1). The novel validation approach, used for the validation of the MODIS and forthcoming VIIRS global burned area products, is a general one, and could be used for the validation of other global products that are highly variable in space and time and is required to assess the accuracy of climate records. The approach is demonstrated using a 1 year dataset of MODIS fire products.
Kendall, W.L.; Nichols, J.D.; North, P.M.; Nichols, J.D.
1995-01-01
The use of the Cormack- Jolly-Seber model under a standard sampling scheme of one sample per time period, when the Jolly-Seber assumption that all emigration is permanent does not hold, leads to the confounding of temporary emigration probabilities with capture probabilities. This biases the estimates of capture probability when temporary emigration is a completely random process, and both capture and survival probabilities when there is a temporary trap response in temporary emigration, or it is Markovian. The use of secondary capture samples over a shorter interval within each period, during which the population is assumed to be closed (Pollock's robust design), provides a second source of information on capture probabilities. This solves the confounding problem, and thus temporary emigration probabilities can be estimated. This process can be accomplished in an ad hoc fashion for completely random temporary emigration and to some extent in the temporary trap response case, but modelling the complete sampling process provides more flexibility and permits direct estimation of variances. For the case of Markovian temporary emigration, a full likelihood is required.
Deviney, Frank A.; Rice, Karen; Brown, Donald E.
2012-01-01
Natural resource managers require information concerning the frequency, duration, and long-term probability of occurrence of water-quality indicator (WQI) violations of defined thresholds. The timing of these threshold crossings often is hidden from the observer, who is restricted to relatively infrequent observations. Here, a model for the hidden process is linked with a model for the observations, and the parameters describing duration, return period, and long-term probability of occurrence are estimated using Bayesian methods. A simulation experiment is performed to evaluate the approach under scenarios based on the equivalent of a total monitoring period of 5-30 years and an observation frequency of 1-50 observations per year. Given constant threshold crossing rate, accuracy and precision of parameter estimates increased with longer total monitoring period and more-frequent observations. Given fixed monitoring period and observation frequency, accuracy and precision of parameter estimates increased with longer times between threshold crossings. For most cases where the long-term probability of being in violation is greater than 0.10, it was determined that at least 600 observations are needed to achieve precise estimates. An application of the approach is presented using 22 years of quasi-weekly observations of acid-neutralizing capacity from Deep Run, a stream in Shenandoah National Park, Virginia. The time series also was sub-sampled to simulate monthly and semi-monthly sampling protocols. Estimates of the long-term probability of violation were unbiased despite sampling frequency; however, the expected duration and return period were over-estimated using the sub-sampled time series with respect to the full quasi-weekly time series.
Probability of detection of nests and implications for survey design
Smith, P.A.; Bart, J.; Lanctot, Richard B.; McCaffery, B.J.; Brown, S.
2009-01-01
Surveys based on double sampling include a correction for the probability of detection by assuming complete enumeration of birds in an intensively surveyed subsample of plots. To evaluate this assumption, we calculated the probability of detecting active shorebird nests by using information from observers who searched the same plots independently. Our results demonstrate that this probability varies substantially by species and stage of the nesting cycle but less by site or density of nests. Among the species we studied, the estimated single-visit probability of nest detection during the incubation period varied from 0.21 for the White-rumped Sandpiper (Calidris fuscicollis), the most difficult species to detect, to 0.64 for the Western Sandpiper (Calidris mauri), the most easily detected species, with a mean across species of 0.46. We used these detection probabilities to predict the fraction of persistent nests found over repeated nest searches. For a species with the mean value for detectability, the detection rate exceeded 0.85 after four visits. This level of nest detection was exceeded in only three visits for the Western Sandpiper, but six to nine visits were required for the White-rumped Sandpiper, depending on the type of survey employed. Our results suggest that the double-sampling method's requirement of nearly complete counts of birds in the intensively surveyed plots is likely to be met for birds with nests that survive over several visits of nest searching. Individuals with nests that fail quickly or individuals that do not breed can be detected with high probability only if territorial behavior is used to identify likely nesting pairs. ?? The Cooper Ornithological Society, 2009.
Witt, E. C.; Hippe, D.J.; Giovannitti, R.M.
1992-01-01
A total of 304 nutrient samples were collected from May 1990 through September 1991 to determine concentrations and loads of nutrients in water discharged from two spring basins in Cumberland County, Pa. Fifty-four percent of these nutrient samples were for the evaluation of (1) laboratory consistency, (2) container and preservative cleanliness, (3) maintenance of analyte representativeness as affected by three different preservation methods, and (4) comparison of analyte results with the "Most Probable Value" for Standard Reference Water Samples. Results of 37 duplicate analyses indicate that the Pennsylvania Department of Environmental Resources, Bureau of Laboratories (principal laboratory) remained within its ±10 percent goal for all but one analyte. Results of the blank analysis show that the sampling containers did not compromise the water quality. However, mercuric-chloride-preservation blanks apparently contained measurable ammonium in four of five samples and ammonium plus organic nitrogen in two of five samples. Interlaboratory results indicate substantial differences in the determination of nitrate and ammonium plus organic nitrogen between the principal laboratory and the U.S. Geological Survey National Water-Quality Laboratory. In comparison with the U.S. Environmental Protection Agency Quality-Control Samples, the principal laboratory was sufficiently accurate in its determination of nutrient anafytes. Analysis of replicate samples indicated that sulfuric-acid preservative best maintained the representativeness of the anafytes nitrate and ammonium plus organic nitrogen, whereas, mercuric chloride best maintained the representativeness of orthophosphate. Comparison of nutrient analyte determinations with the Most Probable Value for each preservation method shows that two of five analytes with no chemical preservative compare well, three of five with mercuric-chloride preservative compare well, and three of five with sulfuricacid preservative compare well.
New approach to calibrating bed load samplers
Hubbell, D.W.; Stevens, H.H.; Skinner, J.V.; Beverage, J.P.
1985-01-01
Cyclic variations in bed load discharge at a point, which are an inherent part of the process of bed load movement, complicate calibration of bed load samplers and preclude the use of average rates to define sampling efficiencies. Calibration curves, rather than efficiencies, are derived by two independent methods using data collected with prototype versions of the Helley‐Smith sampler in a large calibration facility capable of continuously measuring transport rates across a 9 ft (2.7 m) width. Results from both methods agree. Composite calibration curves, based on matching probability distribution functions of samples and measured rates from different hydraulic conditions (runs), are obtained for six different versions of the sampler. Sampled rates corrected by the calibration curves agree with measured rates for individual runs.
Direct sampling of chemical weapons in water by photoionization mass spectrometry.
Syage, Jack A; Cai, Sheng-Suan; Li, Jianwei; Evans, Matthew D
2006-05-01
The vulnerability of water supplies to toxic contamination calls for fast and effective means for screening water samples for multiple threats. We describe the use of photoionization (PI) mass spectrometry (MS) for high-speed, high-throughput screening and molecular identification of chemical weapons (CW) threats and other hazardous compounds. The screening technology can detect a wide range of compounds at subacute concentrations with no sample preparation and a sampling cycle time of approximately 45 s. The technology was tested with CW agents VX, GA, GB, GD, GF, HD, HN1, and HN3, in addition to riot agents and precursors. All are sensitively detected and give simple PI mass spectra dominated by the parent ion. The target application of the PI MS method is as a routine, real-time early warning system for CW agents and other hazardous compounds in air and in water. In this work, we also present comprehensive measurements for water analysis and report on the system detection limits, linearity, quantitation accuracy, and false positive (FP) and false negative rates for concentrations at subacute levels. The latter data are presented in the form of receiver operating characteristic curves of the form of detection probability P(D) versus FP probability P(FP). These measurements were made using the CW surrogate compounds, DMMP, DEMP, DEEP, and DIMP. Method detection limits (3sigma) obtained using a capillary injection method yielded 1, 6, 3, and 2 ng/mL, respectively. These results were obtained using 1-microL injections of water samples without any preparation, corresponding to mass detection limits of 1, 6, 3, and 2 pg, respectively. The linear range was about 3-4 decades and the dynamic range about 4-5 decades. The relative standard deviations were generally <10% at CW subacute concentrations levels.
Evaluation of methods for identifying spawning sites and habitat selection for alosines
Harris, Julianne E.; Hightower, Joseph E.
2010-01-01
Characterization of riverine spawning habitat is important for the management and restoration of anadromous alosines. We examined the relative effectiveness of oblique plankton tows and spawning pads for collecting the eggs of American shad Alosa sapidissima, hickory shad A. mediocris, and “river herring” (a collective term for alewife A. pseudoharengus and blueback herring A. aestivalis) in the Roanoke River, North Carolina. Relatively nonadhesive American shad eggs were only collected by plankton tows, whereas semiadhesive hickory shad and river herring eggs were collected by both methods. Compared with spawning pads, oblique plankton tows had higher probabilities of collecting eggs and led to the identification of longer spawning periods. In assumed spawning areas, twice-weekly plankton sampling for 15 min throughout the spawning season had a 95% or greater probability of collecting at least one egg for all alosines; however, the probabilities were lower in areas with more limited spawning. Comparisons of plankton tows, spawning pads, and two other methods of identifying spawning habitat (direct observation of spawning and examination of female histology) suggested differences in effectiveness and efficiency. Riverwide information on spawning sites and timing for all alosines is most efficiently obtained by plankton sampling. Spawning pads and direct observations of spawning are the best ways to determine microhabitat selectivity for appropriate species, especially when spawning sites have previously been identified. Histological examination can help determine primary spawning sites but is most useful when information on reproductive biology and spawning periodicity is also desired. The target species, riverine habitat conditions, and research goals should be considered when selecting methods with which to evaluate alosine spawning habitat.
The Heuristic Value of p in Inductive Statistical Inference
Krueger, Joachim I.; Heck, Patrick R.
2017-01-01
Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say. PMID:28649206
The Heuristic Value of p in Inductive Statistical Inference.
Krueger, Joachim I; Heck, Patrick R
2017-01-01
Many statistical methods yield the probability of the observed data - or data more extreme - under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p -value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p -value has been subjected to much speculation, analysis, and criticism. We explore how well the p -value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p -value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p -value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.
Frison, Severine; Kerac, Marko; Checchi, Francesco; Nicholas, Jennifer
2017-01-01
The assessment of the prevalence of acute malnutrition in children under five is widely used for the detection of emergencies, planning interventions, advocacy, and monitoring and evaluation. This study examined PROBIT Methods which convert parameters (mean and standard deviation (SD)) of a normally distributed variable to a cumulative probability below any cut-off to estimate acute malnutrition in children under five using Middle-Upper Arm Circumference (MUAC). We assessed the performance of: PROBIT Method I, with mean MUAC from the survey sample and MUAC SD from a database of previous surveys; and PROBIT Method II, with mean and SD of MUAC observed in the survey sample. Specifically, we generated sub-samples from 852 survey datasets, simulating 100 surveys for eight sample sizes. Overall the methods were tested on 681 600 simulated surveys. PROBIT methods relying on sample sizes as small as 50 had better performance than the classic method for estimating and classifying the prevalence of acute malnutrition. They had better precision in the estimation of acute malnutrition for all sample sizes and better coverage for smaller sample sizes, while having relatively little bias. They classified situations accurately for a threshold of 5% acute malnutrition. Both PROBIT methods had similar outcomes. PROBIT Methods have a clear advantage in the assessment of acute malnutrition prevalence based on MUAC, compared to the classic method. Their use would require much lower sample sizes, thus enable great time and resource savings and permit timely and/or locally relevant prevalence estimates of acute malnutrition for a swift and well-targeted response.
Petitot, Maud; Manceau, Nicolas; Geniez, Philippe; Besnard, Aurélien
2014-09-01
Setting up effective conservation strategies requires the precise determination of the targeted species' distribution area and, if possible, its local abundance. However, detection issues make these objectives complex for most vertebrates. The detection probability is usually <1 and is highly dependent on species phenology and other environmental variables. The aim of this study was to define an optimized survey protocol for the Mediterranean amphibian community, that is, to determine the most favorable periods and the most effective sampling techniques for detecting all species present on a site in a minimum number of field sessions and a minimum amount of prospecting effort. We visited 49 ponds located in the Languedoc region of southern France on four occasions between February and June 2011. Amphibians were detected using three methods: nighttime call count, nighttime visual encounter, and daytime netting. The detection nondetection data obtained was then modeled using site-occupancy models. The detection probability of amphibians sharply differed between species, the survey method used and the date of the survey. These three covariates also interacted. Thus, a minimum of three visits spread over the breeding season, using a combination of all three survey methods, is needed to reach a 95% detection level for all species in the Mediterranean region. Synthesis and applications: detection nondetection surveys combined to site occupancy modeling approach are powerful methods that can be used to estimate the detection probability and to determine the prospecting effort necessary to assert that a species is absent from a site.
Thouand, Gérald; Durand, Marie-José; Maul, Armand; Gancet, Christian; Blok, Han
2011-01-01
The European REACH Regulation (Registration, Evaluation, Authorization of CHemical substances) implies, among other things, the evaluation of the biodegradability of chemical substances produced by industry. A large set of test methods is available including detailed information on the appropriate conditions for testing. However, the inoculum used for these tests constitutes a “black box.” If biodegradation is achievable from the growth of a small group of specific microbial species with the substance as the only carbon source, the result of the test depends largely on the cell density of this group at “time zero.” If these species are relatively rare in an inoculum that is normally used, the likelihood of inoculating a test with sufficient specific cells becomes a matter of probability. Normally this probability increases with total cell density and with the diversity of species in the inoculum. Furthermore the history of the inoculum, e.g., a possible pre-exposure to the test substance or similar substances will have a significant influence on the probability. A high probability can be expected for substances that are widely used and regularly released into the environment, whereas a low probability can be expected for new xenobiotic substances that have not yet been released into the environment. Be that as it may, once the inoculum sample contains sufficient specific degraders, the performance of the biodegradation will follow a typical S shaped growth curve which depends on the specific growth rate under laboratory conditions, the so called F/M ratio (ratio between food and biomass) and the more or less toxic recalcitrant, but possible, metabolites. Normally regulators require the evaluation of the growth curve using a simple approach such as half-time. Unfortunately probability and biodegradation half-time are very often confused. As the half-time values reflect laboratory conditions which are quite different from environmental conditions (after a substance is released), these values should not be used to quantify and predict environmental behavior. The probability value could be of much greater benefit for predictions under realistic conditions. The main issue in the evaluation of probability is that the result is not based on a single inoculum from an environmental sample, but on a variety of samples. These samples can be representative of regional or local areas, climate regions, water types, and history, e.g., pristine or polluted. The above concept has provided us with a new approach, namely “Probabio.” With this approach, persistence is not only regarded as a simple intrinsic property of a substance, but also as the capability of various environmental samples to degrade a substance under realistic exposure conditions and F/M ratio. PMID:21863143
Rooting phylogenetic trees under the coalescent model using site pattern probabilities.
Tian, Yuan; Kubatko, Laura
2017-12-19
Phylogenetic tree inference is a fundamental tool to estimate ancestor-descendant relationships among different species. In phylogenetic studies, identification of the root - the most recent common ancestor of all sampled organisms - is essential for complete understanding of the evolutionary relationships. Rooted trees benefit most downstream application of phylogenies such as species classification or study of adaptation. Often, trees can be rooted by using outgroups, which are species that are known to be more distantly related to the sampled organisms than any other species in the phylogeny. However, outgroups are not always available in evolutionary research. In this study, we develop a new method for rooting species tree under the coalescent model, by developing a series of hypothesis tests for rooting quartet phylogenies using site pattern probabilities. The power of this method is examined by simulation studies and by application to an empirical North American rattlesnake data set. The method shows high accuracy across the simulation conditions considered, and performs well for the rattlesnake data. Thus, it provides a computationally efficient way to accurately root species-level phylogenies that incorporates the coalescent process. The method is robust to variation in substitution model, but is sensitive to the assumption of a molecular clock. Our study establishes a computationally practical method for rooting species trees that is more efficient than traditional methods. The method will benefit numerous evolutionary studies that require rooting a phylogenetic tree without having to specify outgroups.
Ye, Qing; Pan, Hao; Liu, Changhua
2015-01-01
This research proposes a novel framework of final drive simultaneous failure diagnosis containing feature extraction, training paired diagnostic models, generating decision threshold, and recognizing simultaneous failure modes. In feature extraction module, adopt wavelet package transform and fuzzy entropy to reduce noise interference and extract representative features of failure mode. Use single failure sample to construct probability classifiers based on paired sparse Bayesian extreme learning machine which is trained only by single failure modes and have high generalization and sparsity of sparse Bayesian learning approach. To generate optimal decision threshold which can convert probability output obtained from classifiers into final simultaneous failure modes, this research proposes using samples containing both single and simultaneous failure modes and Grid search method which is superior to traditional techniques in global optimization. Compared with other frequently used diagnostic approaches based on support vector machine and probability neural networks, experiment results based on F 1-measure value verify that the diagnostic accuracy and efficiency of the proposed framework which are crucial for simultaneous failure diagnosis are superior to the existing approach. PMID:25722717
Assessing bat detectability and occupancy with multiple automated echolocation detectors
Gorresen, P.M.; Miles, A.C.; Todd, C.M.; Bonaccorso, F.J.; Weller, T.J.
2008-01-01
Occupancy analysis and its ability to account for differential detection probabilities is important for studies in which detecting echolocation calls is used as a measure of bat occurrence and activity. We examined the feasibility of remotely acquiring bat encounter histories to estimate detection probability and occupancy. We used echolocation detectors coupled to digital recorders operating at a series of proximate sites on consecutive nights in 2 trial surveys for the Hawaiian hoary bat (Lasiurus cinereus semotus). Our results confirmed that the technique is readily amenable for use in occupancy analysis. We also conducted a simulation exercise to assess the effects of sampling effort on parameter estimation. The results indicated that the precision and bias of parameter estimation were often more influenced by the number of sites sampled than number of visits. Acceptable accuracy often was not attained until at least 15 sites or 15 visits were used to estimate detection probability and occupancy. The method has significant potential for use in monitoring trends in bat activity and in comparative studies of habitat use. ?? 2008 American Society of Mammalogists.
Robustness of survival estimates for radio-marked animals
Bunck, C.M.; Chen, C.-L.
1992-01-01
Telemetry techniques are often used to study the survival of birds and mammals; particularly whcn mark-recapture approaches are unsuitable. Both parametric and nonparametric methods to estimate survival have becn developed or modified from other applications. An implicit assumption in these approaches is that the probability of re-locating an animal with a functioning transmitter is one. A Monte Carlo study was conducted to determine the bias and variance of the Kaplan-Meier estimator and an estimator based also on the assumption of constant hazard and to eva!uate the performance of the two-sample tests associated with each. Modifications of each estimator which allow a re-Iocation probability of less than one are described and evaluated. Generallv the unmodified estimators were biased but had lower variance. At low sample sizes all estimators performed poorly. Under the null hypothesis, the distribution of all test statistics reasonably approximated the null distribution when survival was low but not when it was high. The power of the two-sample tests were similar.
Ostrand, William D.; Drew, G.S.; Suryan, R.M.; McDonald, L.L.
1998-01-01
We compared strip transect and radio-tracking methods of determining foraging range of Black-legged Kittiwakes (Rissa tridactyla). The mean distance birds were observed from their colony determined by radio-tracking was significantly greater than the mean value calculated from strip transects. We determined that this difference was due to two sources of bias: (1) as distance from the colony increased, the area of available habitat also increased resulting in decreasing bird densities (bird spreading). Consequently, the probability of detecting birds during transect surveys also would decrease as distance from the colony increased, and (2) the maximum distance birds were observed from the colony during radio-tracking exceeded the extent of the strip transect survey. We compared the observed number of birds seen on the strip transect survey to the predictions of a model of the decreasing probability of detection due to bird spreading. Strip transect data were significantly different from modeled data; however, the field data were consistently equal to or below the model predictions, indicating a general conformity to the concept of declining detection at increasing distance. We conclude that radio-tracking data gave a more representative indication of foraging distances than did strip transect sampling. Previous studies of seabirds that have used strip transect sampling without accounting for bird spreading or the effects of study-area limitations probably underestimated foraging range.
Mills, T C; Stall, R; Pollack, L; Paul, J P; Binson, D; Canchola, J; Catania, J A
2001-01-01
OBJECTIVES: This study investigated the limitations of probability samples of men who have sex with men (MSM), limited to single cities and to the areas of highest concentrations of MSM ("gay ghettos"). METHODS: A probability sample of 2881 MSM in 4 American cities completed interviews by telephone. RESULTS: MSM who resided in ghettos differed from other MSM, although in different ways in each city. Non-ghetto-dwelling MSM were less involved in the gay and lesbian community. They were also less likely to have only male sexual partners, to identify as gay, and to have been tested for HIV. CONCLUSIONS: These differences between MSM who live in gay ghettos and those who live elsewhere have clear implications for HIV prevention efforts and health care planning. PMID:11392945
Entropy from State Probabilities: Hydration Entropy of Cations
2013-01-01
Entropy is an important energetic quantity determining the progression of chemical processes. We propose a new approach to obtain hydration entropy directly from probability density functions in state space. We demonstrate the validity of our approach for a series of cations in aqueous solution. Extensive validation of simulation results was performed. Our approach does not make prior assumptions about the shape of the potential energy landscape and is capable of calculating accurate hydration entropy values. Sampling times in the low nanosecond range are sufficient for the investigated ionic systems. Although the presented strategy is at the moment limited to systems for which a scalar order parameter can be derived, this is not a principal limitation of the method. The strategy presented is applicable to any chemical system where sufficient sampling of conformational space is accessible, for example, by computer simulations. PMID:23651109
DOE Office of Scientific and Technical Information (OSTI.GOV)
La Russa, D
Purpose: The purpose of this project is to develop a robust method of parameter estimation for a Poisson-based TCP model using Bayesian inference. Methods: Bayesian inference was performed using the PyMC3 probabilistic programming framework written in Python. A Poisson-based TCP regression model that accounts for clonogen proliferation was fit to observed rates of local relapse as a function of equivalent dose in 2 Gy fractions for a population of 623 stage-I non-small-cell lung cancer patients. The Slice Markov Chain Monte Carlo sampling algorithm was used to sample the posterior distributions, and was initiated using the maximum of the posterior distributionsmore » found by optimization. The calculation of TCP with each sample step required integration over the free parameter α, which was performed using an adaptive 24-point Gauss-Legendre quadrature. Convergence was verified via inspection of the trace plot and posterior distribution for each of the fit parameters, as well as with comparisons of the most probable parameter values with their respective maximum likelihood estimates. Results: Posterior distributions for α, the standard deviation of α (σ), the average tumour cell-doubling time (Td), and the repopulation delay time (Tk), were generated assuming α/β = 10 Gy, and a fixed clonogen density of 10{sup 7} cm−{sup 3}. Posterior predictive plots generated from samples from these posterior distributions are in excellent agreement with the observed rates of local relapse used in the Bayesian inference. The most probable values of the model parameters also agree well with maximum likelihood estimates. Conclusion: A robust method of performing Bayesian inference of TCP data using a complex TCP model has been established.« less
Empirical likelihood-based confidence intervals for mean medical cost with censored data.
Jeyarajah, Jenny; Qin, Gengsheng
2017-11-10
In this paper, we propose empirical likelihood methods based on influence function and jackknife techniques for constructing confidence intervals for mean medical cost with censored data. We conduct a simulation study to compare the coverage probabilities and interval lengths of our proposed confidence intervals with that of the existing normal approximation-based confidence intervals and bootstrap confidence intervals. The proposed methods have better finite-sample performances than existing methods. Finally, we illustrate our proposed methods with a relevant example. Copyright © 2017 John Wiley & Sons, Ltd.
Novel Maximum-based Timing Acquisition for Spread-Spectrum Communications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sibbetty, Taylor; Moradiz, Hussein; Farhang-Boroujeny, Behrouz
This paper proposes and analyzes a new packet detection and timing acquisition method for spread spectrum systems. The proposed method provides an enhancement over the typical thresholding techniques that have been proposed for direct sequence spread spectrum (DS-SS). The effective implementation of thresholding methods typically require accurate knowledge of the received signal-to-noise ratio (SNR), which is particularly difficult to estimate in spread spectrum systems. Instead, we propose a method which utilizes a consistency metric of the location of maximum samples at the output of a filter matched to the spread spectrum waveform to achieve acquisition, and does not require knowledgemore » of the received SNR. Through theoretical study, we show that the proposed method offers a low probability of missed detection over a large range of SNR with a corresponding probability of false alarm far lower than other methods. Computer simulations that corroborate our theoretical results are also presented. Although our work here has been motivated by our previous study of a filter bank multicarrier spread-spectrum (FB-MC-SS) system, the proposed method is applicable to DS-SS systems as well.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wen, Haiming; Lin, Yaojun; Seidman, David N.
The preparation of transmission electron microcopy (TEM) samples from powders with particle sizes larger than ~100 nm poses a challenge. The existing methods are complicated and expensive, or have a low probability of success. Herein, we report a modified methodology for preparation of TEM samples from powders, which is efficient, cost-effective, and easy to perform. This method involves mixing powders with an epoxy on a piece of weighing paper, curing the powder–epoxy mixture to form a bulk material, grinding the bulk to obtain a thin foil, punching TEM discs from the foil, dimpling the discs, and ion milling the dimpledmore » discs to electron transparency. Compared with the well established and robust grinding–dimpling–ion-milling method for TEM sample preparation for bulk materials, our modified approach for preparing TEM samples from powders only requires two additional simple steps. In this article, step-by-step procedures for our methodology are described in detail, and important strategies to ensure success are elucidated. Furthermore, our methodology has been applied successfully for preparing TEM samples with large thin areas and high quality for many different mechanically milled metallic powders.« less
Wen, Haiming; Lin, Yaojun; Seidman, David N.; ...
2015-09-09
The preparation of transmission electron microcopy (TEM) samples from powders with particle sizes larger than ~100 nm poses a challenge. The existing methods are complicated and expensive, or have a low probability of success. Herein, we report a modified methodology for preparation of TEM samples from powders, which is efficient, cost-effective, and easy to perform. This method involves mixing powders with an epoxy on a piece of weighing paper, curing the powder–epoxy mixture to form a bulk material, grinding the bulk to obtain a thin foil, punching TEM discs from the foil, dimpling the discs, and ion milling the dimpledmore » discs to electron transparency. Compared with the well established and robust grinding–dimpling–ion-milling method for TEM sample preparation for bulk materials, our modified approach for preparing TEM samples from powders only requires two additional simple steps. In this article, step-by-step procedures for our methodology are described in detail, and important strategies to ensure success are elucidated. Furthermore, our methodology has been applied successfully for preparing TEM samples with large thin areas and high quality for many different mechanically milled metallic powders.« less
Lopatka, Martin; Sigman, Michael E; Sjerps, Marjan J; Williams, Mary R; Vivó-Truyols, Gabriel
2015-07-01
Forensic chemical analysis of fire debris addresses the question of whether ignitable liquid residue is present in a sample and, if so, what type. Evidence evaluation regarding this question is complicated by interference from pyrolysis products of the substrate materials present in a fire. A method is developed to derive a set of class-conditional features for the evaluation of such complex samples. The use of a forensic reference collection allows characterization of the variation in complex mixtures of substrate materials and ignitable liquids even when the dominant feature is not specific to an ignitable liquid. Making use of a novel method for data imputation under complex mixing conditions, a distribution is modeled for the variation between pairs of samples containing similar ignitable liquid residues. Examining the covariance of variables within the different classes allows different weights to be placed on features more important in discerning the presence of a particular ignitable liquid residue. Performance of the method is evaluated using a database of total ion spectrum (TIS) measurements of ignitable liquid and fire debris samples. These measurements include 119 nominal masses measured by GC-MS and averaged across a chromatographic profile. Ignitable liquids are labeled using the American Society for Testing and Materials (ASTM) E1618 standard class definitions. Statistical analysis is performed in the class-conditional feature space wherein new forensic traces are represented based on their likeness to known samples contained in a forensic reference collection. The demonstrated method uses forensic reference data as the basis of probabilistic statements concerning the likelihood of the obtained analytical results given the presence of ignitable liquid residue of each of the ASTM classes (including a substrate only class). When prior probabilities of these classes can be assumed, these likelihoods can be connected to class probabilities. In order to compare the performance of this method to previous work, a uniform prior was assumed, resulting in an 81% accuracy for an independent test of 129 real burn samples. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Static versus dynamic sampling for data mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
John, G.H.; Langley, P.
1996-12-31
As data warehouses grow to the point where one hundred gigabytes is considered small, the computational efficiency of data-mining algorithms on large databases becomes increasingly important. Using a sample from the database can speed up the datamining process, but this is only acceptable if it does not reduce the quality of the mined knowledge. To this end, we introduce the {open_quotes}Probably Close Enough{close_quotes} criterion to describe the desired properties of a sample. Sampling usually refers to the use of static statistical tests to decide whether a sample is sufficiently similar to the large database, in the absence of any knowledgemore » of the tools the data miner intends to use. We discuss dynamic sampling methods, which take into account the mining tool being used and can thus give better samples. We describe dynamic schemes that observe a mining tool`s performance on training samples of increasing size and use these results to determine when a sample is sufficiently large. We evaluate these sampling methods on data from the UCI repository and conclude that dynamic sampling is preferable.« less
DuPont Qualicon BAX System polymerase chain reaction assay. Performance Tested Method 100201.
Tice, George; Andaloro, Bridget; Fallon, Dawn; Wallace, F Morgan
2009-01-01
A recent outbreak of Salmonella in peanut butter has highlighted the need for validation of rapid detection methods. A multilaboratory study for detecting Salmonella in peanut butter was conducted as part of the AOAC Research Institute Emergency Response Validation program for methods that detect outbreak threats to food safety. Three sites tested spiked samples from the same master mix according to the U.S. Food and Drug Administration's Bacteriological Analytical Manual (FDA-BAM) method and the BAX System method. Salmonella Typhimurium (ATCC 14028) was grown in brain heart infusion for 24 h at 37 degrees C, then diluted to appropriate levels for sample inoculation. Master samples of peanut butter were spiked at high and low target levels, mixed, and allowed to equilibrate at room temperature for 2 weeks. Spike levels were low [1.08 most probable number (MPN)/25 g]; high (11.5 MPN/25 g) and unspiked to serve as negative controls. Each master sample was divided into 25 g portions and coded to blind the samples. Twenty portions of each spiked master sample and five portions of the unspiked sample were tested at each site. At each testing site, samples were blended in 25 g portions with 225 mL prewarmed lactose broth until thoroughly homogenized, then allowed to remain at room temperature for 55-65 min. Samples were adjusted to a pH of 6.8 +/- 0.2, if necessary, and incubated for 22-26 h at 35 degrees C. Across the three reporting laboratories, the BAX System detected Salmonella in 10/60 low-spike samples and 58/60 high-spike samples. The reference FDA-BAM method yielded positive results for 11/60 low-spike and 58/60 high-spike samples. Neither method demonstrated positive results for any of the 15 unspiked samples.
Extended Importance Sampling for Reliability Analysis under Evidence Theory
NASA Astrophysics Data System (ADS)
Yuan, X. K.; Chen, B.; Zhang, B. Q.
2018-05-01
In early engineering practice, the lack of data and information makes uncertainty difficult to deal with. However, evidence theory has been proposed to handle uncertainty with limited information as an alternative way to traditional probability theory. In this contribution, a simulation-based approach, called ‘Extended importance sampling’, is proposed based on evidence theory to handle problems with epistemic uncertainty. The proposed approach stems from the traditional importance sampling for reliability analysis under probability theory, and is developed to handle the problem with epistemic uncertainty. It first introduces a nominal instrumental probability density function (PDF) for every epistemic uncertainty variable, and thus an ‘equivalent’ reliability problem under probability theory is obtained. Then the samples of these variables are generated in a way of importance sampling. Based on these samples, the plausibility and belief (upper and lower bounds of probability) can be estimated. It is more efficient than direct Monte Carlo simulation. Numerical and engineering examples are given to illustrate the efficiency and feasible of the proposed approach.
Serious Emotional Disturbance among Youths Exposed to Hurricane Katrina 2 Years Postdisaster
ERIC Educational Resources Information Center
McLaughlin, Katie A.; Fairbank, John A.; Gruber, Michael J.; Jones, Russell T.; Lakoma, Matthew D.; Pfefferbaum, Betty; Sampson, Nancy A.; Kessler, Ronald C.
2009-01-01
Objective: To estimate the prevalence of serious emotional disturbance (SED) among children and adolescents exposed to Hurricane Katrina along with the associations of SED with hurricane-related stressors, sociodemographics, and family factors 18 to 27 months after the hurricane. Method: A probability sample of prehurricane residents of areas…
Using a Calendar and Explanatory Instructions to Aid Within-Household Selection in Mail Surveys
ERIC Educational Resources Information Center
Stange, Mathew; Smyth, Jolene D.; Olson, Kristen
2016-01-01
Although researchers can easily select probability samples of addresses using the U.S. Postal Service's Delivery Sequence File, randomly selecting respondents within households for surveys remains challenging. Researchers often place within-household selection instructions, such as the next or last birthday methods, in survey cover letters to…
Katherine C. Kendall; Kevin S. McKelvey
2008-01-01
The identification of species from hair samples is probably as old as humanity, but did not receive much scientific attention until efficient and relatively inexpensive methods for amplifying DNA became available. Prior to this time, keys were used to identify species through the microscopic analysis of hair shaft morphology (Moore et al. 1974; also see Raphael 1994...
Mathematical Analysis of a Multiple-Look Concept Identification Model.
ERIC Educational Resources Information Center
Cotton, John W.
The behavior of focus samples central to the multiple-look model of Trabasso and Bower is examined by three methods. First, exact probabilities of success conditional upon a certain brief history of stimulation are determined. Second, possible states of the organism during the experiment are defined and a transition matrix for those states…
Changing the Subject: The Place of Revisions in Grammatical Development
ERIC Educational Resources Information Center
Rispoli, Matthew
2018-01-01
Purpose: This article focuses on toddlers' revisions of the sentence subject and tests the hypothesis that subject diversity (i.e., the number of different subjects produced) increases the probability of subject revision. Method: One-hour language samples were collected from 61 children (32 girls) at 27 months. Spontaneously produced, active…
2013-08-01
cost due to potential warranty costs, repairs and loss of market share. Reliability is the probability that the system will perform its intended...MCMC and splitting sampling schemes. Our proposed SS/ STP method is presented in Section 4, including accuracy bounds and computational effort
Disadvantages of the Horsfall-Barratt Scale for estimating severity of citrus canker
USDA-ARS?s Scientific Manuscript database
Direct visual estimation of disease severity to the nearest percent was compared to using the Horsfall-Barratt (H-B) scale. Data from a simulation model designed to sample two diseased populations were used to investigate the probability of the two methods to reject a null hypothesis (H0) using a t-...
Nichols, J.D.; Sauer, J.R.; Hines, J.E.; Boulinier, T.; Pollock, K.H.; Therres, Glenn D.
2001-01-01
Although many ecological monitoring programs are now in place, the use of resulting data to draw inferences about changes in biodiversity is problematic. The difficulty arises because of the inability to count all animals present in any sampled area. This inability results not only in underestimation of species richness but also in potentially misleading comparisons of species richness over time and space. We recommend the use of probabilistic estimators for estimating species richness and related parameters (e.g., rate of change in species richness, local extinction probability, local turnover, local colonization) when animal detection probabilities are <1. We illustrate these methods using data from the North American Breeding Bird Survey obtained along survey routes in Maryland. We also introduce software to implement these estimation methods.
The theory precision analyse of RFM localization of satellite remote sensing imagery
NASA Astrophysics Data System (ADS)
Zhang, Jianqing; Xv, Biao
2009-11-01
The tradition method of detecting precision of Rational Function Model(RFM) is to make use of a great deal check points, and it calculates mean square error through comparing calculational coordinate with known coordinate. This method is from theory of probability, through a large number of samples to statistic estimate value of mean square error, we can think its estimate value approaches in its true when samples are well enough. This paper is from angle of survey adjustment, take law of propagation of error as the theory basis, and it calculates theory precision of RFM localization. Then take the SPOT5 three array imagery as experiment data, and the result of traditional method and narrated method in the paper are compared, while has confirmed tradition method feasible, and answered its theory precision question from the angle of survey adjustment.
PROBABILITY SAMPLING AND POPULATION INFERENCE IN MONITORING PROGRAMS
A fundamental difference between probability sampling and conventional statistics is that "sampling" deals with real, tangible populations, whereas "conventional statistics" usually deals with hypothetical populations that have no real-world realization. he focus here is on real ...
Rothmann, Mark
2005-01-01
When testing the equality of means from two different populations, a t-test or large sample normal test tend to be performed. For these tests, when the sample size or design for the second sample is dependent on the results of the first sample, the type I error probability is altered for each specific possibility in the null hypothesis. We will examine the impact on the type I error probabilities for two confidence interval procedures and procedures using test statistics when the design for the second sample or experiment is dependent on the results from the first sample or experiment (or series of experiments). Ways for controlling a desired maximum type I error probability or a desired type I error rate will be discussed. Results are applied to the setting of noninferiority comparisons in active controlled trials where the use of a placebo is unethical.
Tsunami hazard assessments with consideration of uncertain earthquakes characteristics
NASA Astrophysics Data System (ADS)
Sepulveda, I.; Liu, P. L. F.; Grigoriu, M. D.; Pritchard, M. E.
2017-12-01
The uncertainty quantification of tsunami assessments due to uncertain earthquake characteristics faces important challenges. First, the generated earthquake samples must be consistent with the properties observed in past events. Second, it must adopt an uncertainty propagation method to determine tsunami uncertainties with a feasible computational cost. In this study we propose a new methodology, which improves the existing tsunami uncertainty assessment methods. The methodology considers two uncertain earthquake characteristics, the slip distribution and location. First, the methodology considers the generation of consistent earthquake slip samples by means of a Karhunen Loeve (K-L) expansion and a translation process (Grigoriu, 2012), applicable to any non-rectangular rupture area and marginal probability distribution. The K-L expansion was recently applied by Le Veque et al. (2016). We have extended the methodology by analyzing accuracy criteria in terms of the tsunami initial conditions. Furthermore, and unlike this reference, we preserve the original probability properties of the slip distribution, by avoiding post sampling treatments such as earthquake slip scaling. Our approach is analyzed and justified in the framework of the present study. Second, the methodology uses a Stochastic Reduced Order model (SROM) (Grigoriu, 2009) instead of a classic Monte Carlo simulation, which reduces the computational cost of the uncertainty propagation. The methodology is applied on a real case. We study tsunamis generated at the site of the 2014 Chilean earthquake. We generate earthquake samples with expected magnitude Mw 8. We first demonstrate that the stochastic approach of our study generates consistent earthquake samples with respect to the target probability laws. We also show that the results obtained from SROM are more accurate than classic Monte Carlo simulations. We finally validate the methodology by comparing the simulated tsunamis and the tsunami records for the 2014 Chilean earthquake. Results show that leading wave measurements fall within the tsunami sample space. At later times, however, there are mismatches between measured data and the simulated results, suggesting that other sources of uncertainty are as relevant as the uncertainty of the studied earthquake characteristics.
NASA Astrophysics Data System (ADS)
Dasgupta, Bhaskar; Nakamura, Haruki; Higo, Junichi
2016-10-01
Virtual-system coupled adaptive umbrella sampling (VAUS) enhances sampling along a reaction coordinate by using a virtual degree of freedom. However, VAUS and regular adaptive umbrella sampling (AUS) methods are yet computationally expensive. To decrease the computational burden further, improvements of VAUS for all-atom explicit solvent simulation are presented here. The improvements include probability distribution calculation by a Markov approximation; parameterization of biasing forces by iterative polynomial fitting; and force scaling. These when applied to study Ala-pentapeptide dimerization in explicit solvent showed advantage over regular AUS. By using improved VAUS larger biological systems are amenable.
Estimation of probability of failure for damage-tolerant aerospace structures
NASA Astrophysics Data System (ADS)
Halbert, Keith
The majority of aircraft structures are designed to be damage-tolerant such that safe operation can continue in the presence of minor damage. It is necessary to schedule inspections so that minor damage can be found and repaired. It is generally not possible to perform structural inspections prior to every flight. The scheduling is traditionally accomplished through a deterministic set of methods referred to as Damage Tolerance Analysis (DTA). DTA has proven to produce safe aircraft but does not provide estimates of the probability of failure of future flights or the probability of repair of future inspections. Without these estimates maintenance costs cannot be accurately predicted. Also, estimation of failure probabilities is now a regulatory requirement for some aircraft. The set of methods concerned with the probabilistic formulation of this problem are collectively referred to as Probabilistic Damage Tolerance Analysis (PDTA). The goal of PDTA is to control the failure probability while holding maintenance costs to a reasonable level. This work focuses specifically on PDTA for fatigue cracking of metallic aircraft structures. The growth of a crack (or cracks) must be modeled using all available data and engineering knowledge. The length of a crack can be assessed only indirectly through evidence such as non-destructive inspection results, failures or lack of failures, and the observed severity of usage of the structure. The current set of industry PDTA tools are lacking in several ways: they may in some cases yield poor estimates of failure probabilities, they cannot realistically represent the variety of possible failure and maintenance scenarios, and they do not allow for model updates which incorporate observed evidence. A PDTA modeling methodology must be flexible enough to estimate accurately the failure and repair probabilities under a variety of maintenance scenarios, and be capable of incorporating observed evidence as it becomes available. This dissertation describes and develops new PDTA methodologies that directly address the deficiencies of the currently used tools. The new methods are implemented as a free, publicly licensed and open source R software package that can be downloaded from the Comprehensive R Archive Network. The tools consist of two main components. First, an explicit (and expensive) Monte Carlo approach is presented which simulates the life of an aircraft structural component flight-by-flight. This straightforward MC routine can be used to provide defensible estimates of the failure probabilities for future flights and repair probabilities for future inspections under a variety of failure and maintenance scenarios. This routine is intended to provide baseline estimates against which to compare the results of other, more efficient approaches. Second, an original approach is described which models the fatigue process and future scheduled inspections as a hidden Markov model. This model is solved using a particle-based approximation and the sequential importance sampling algorithm, which provides an efficient solution to the PDTA problem. Sequential importance sampling is an extension of importance sampling to a Markov process, allowing for efficient Bayesian updating of model parameters. This model updating capability, the benefit of which is demonstrated, is lacking in other PDTA approaches. The results of this approach are shown to agree with the results of the explicit Monte Carlo routine for a number of PDTA problems. Extensions to the typical PDTA problem, which cannot be solved using currently available tools, are presented and solved in this work. These extensions include incorporating observed evidence (such as non-destructive inspection results), more realistic treatment of possible future repairs, and the modeling of failure involving more than one crack (the so-called continuing damage problem). The described hidden Markov model / sequential importance sampling approach to PDTA has the potential to improve aerospace structural safety and reduce maintenance costs by providing a more accurate assessment of the risk of failure and the likelihood of repairs throughout the life of an aircraft.
Clare, John; McKinney, Shawn T; DePue, John E; Loftin, Cynthia S
2017-10-01
It is common to use multiple field sampling methods when implementing wildlife surveys to compare method efficacy or cost efficiency, integrate distinct pieces of information provided by separate methods, or evaluate method-specific biases and misclassification error. Existing models that combine information from multiple field methods or sampling devices permit rigorous comparison of method-specific detection parameters, enable estimation of additional parameters such as false-positive detection probability, and improve occurrence or abundance estimates, but with the assumption that the separate sampling methods produce detections independently of one another. This assumption is tenuous if methods are paired or deployed in close proximity simultaneously, a common practice that reduces the additional effort required to implement multiple methods and reduces the risk that differences between method-specific detection parameters are confounded by other environmental factors. We develop occupancy and spatial capture-recapture models that permit covariance between the detections produced by different methods, use simulation to compare estimator performance of the new models to models assuming independence, and provide an empirical application based on American marten (Martes americana) surveys using paired remote cameras, hair catches, and snow tracking. Simulation results indicate existing models that assume that methods independently detect organisms produce biased parameter estimates and substantially understate estimate uncertainty when this assumption is violated, while our reformulated models are robust to either methodological independence or covariance. Empirical results suggested that remote cameras and snow tracking had comparable probability of detecting present martens, but that snow tracking also produced false-positive marten detections that could potentially substantially bias distribution estimates if not corrected for. Remote cameras detected marten individuals more readily than passive hair catches. Inability to photographically distinguish individual sex did not appear to induce negative bias in camera density estimates; instead, hair catches appeared to produce detection competition between individuals that may have been a source of negative bias. Our model reformulations broaden the range of circumstances in which analyses incorporating multiple sources of information can be robustly used, and our empirical results demonstrate that using multiple field-methods can enhance inferences regarding ecological parameters of interest and improve understanding of how reliably survey methods sample these parameters. © 2017 by the Ecological Society of America.
Entropy-Based Search Algorithm for Experimental Design
NASA Astrophysics Data System (ADS)
Malakar, N. K.; Knuth, K. H.
2011-03-01
The scientific method relies on the iterated processes of inference and inquiry. The inference phase consists of selecting the most probable models based on the available data; whereas the inquiry phase consists of using what is known about the models to select the most relevant experiment. Optimizing inquiry involves searching the parameterized space of experiments to select the experiment that promises, on average, to be maximally informative. In the case where it is important to learn about each of the model parameters, the relevance of an experiment is quantified by Shannon entropy of the distribution of experimental outcomes predicted by a probable set of models. If the set of potential experiments is described by many parameters, we must search this high-dimensional entropy space. Brute force search methods will be slow and computationally expensive. We present an entropy-based search algorithm, called nested entropy sampling, to select the most informative experiment for efficient experimental design. This algorithm is inspired by Skilling's nested sampling algorithm used in inference and borrows the concept of a rising threshold while a set of experiment samples are maintained. We demonstrate that this algorithm not only selects highly relevant experiments, but also is more efficient than brute force search. Such entropic search techniques promise to greatly benefit autonomous experimental design.
Occupancy Modeling Species-Environment Relationships with Non-ignorable Survey Designs.
Irvine, Kathryn M; Rodhouse, Thomas J; Wright, Wilson J; Olsen, Anthony R
2018-05-26
Statistical models supporting inferences about species occurrence patterns in relation to environmental gradients are fundamental to ecology and conservation biology. A common implicit assumption is that the sampling design is ignorable and does not need to be formally accounted for in analyses. The analyst assumes data are representative of the desired population and statistical modeling proceeds. However, if datasets from probability and non-probability surveys are combined or unequal selection probabilities are used, the design may be non ignorable. We outline the use of pseudo-maximum likelihood estimation for site-occupancy models to account for such non-ignorable survey designs. This estimation method accounts for the survey design by properly weighting the pseudo-likelihood equation. In our empirical example, legacy and newer randomly selected locations were surveyed for bats to bridge a historic statewide effort with an ongoing nationwide program. We provide a worked example using bat acoustic detection/non-detection data and show how analysts can diagnose whether their design is ignorable. Using simulations we assessed whether our approach is viable for modeling datasets composed of sites contributed outside of a probability design Pseudo-maximum likelihood estimates differed from the usual maximum likelihood occu31 pancy estimates for some bat species. Using simulations we show the maximum likelihood estimator of species-environment relationships with non-ignorable sampling designs was biased, whereas the pseudo-likelihood estimator was design-unbiased. However, in our simulation study the designs composed of a large proportion of legacy or non-probability sites resulted in estimation issues for standard errors. These issues were likely a result of highly variable weights confounded by small sample sizes (5% or 10% sampling intensity and 4 revisits). Aggregating datasets from multiple sources logically supports larger sample sizes and potentially increases spatial extents for statistical inferences. Our results suggest that ignoring the mechanism for how locations were selected for data collection (e.g., the sampling design) could result in erroneous model-based conclusions. Therefore, in order to ensure robust and defensible recommendations for evidence-based conservation decision-making, the survey design information in addition to the data themselves must be available for analysts. Details for constructing the weights used in estimation and code for implementation are provided. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
Methods for sampling geographically mobile female traders in an East African market setting
Achiro, Lillian; Kwena, Zachary A.; McFarland, Willi; Neilands, Torsten B.; Cohen, Craig R.; Bukusi, Elizabeth A.; Camlin, Carol S.
2018-01-01
Background The role of migration in the spread of HIV in sub-Saharan Africa is well-documented. Yet migration and HIV research have often focused on HIV risks to male migrants and their partners, or migrants overall, often failing to measure the risks to women via their direct involvement in migration. Inconsistent measures of mobility, gender biases in those measures, and limited data sources for sex-specific population-based estimates of mobility have contributed to a paucity of research on the HIV prevention and care needs of migrant and highly mobile women. This study addresses an urgent need for novel methods for developing probability-based, systematic samples of highly mobile women, focusing on a population of female traders operating out of one of the largest open air markets in East Africa. Our method involves three stages: 1.) identification and mapping of all market stall locations using Global Positioning System (GPS) coordinates; 2.) using female market vendor stall GPS coordinates to build the sampling frame using replicates; and 3.) using maps and GPS data for recruitment of study participants. Results The location of 6,390 vendor stalls were mapped using GPS. Of these, 4,064 stalls occupied by women (63.6%) were used to draw four replicates of 128 stalls each, and a fifth replicate of 15 pre-selected random alternates for a total of 527 stalls assigned to one of five replicates. Staff visited 323 stalls from the first three replicates and from these successfully recruited 306 female vendors into the study for a participation rate of 94.7%. Mobilization strategies and involving traders association representatives in participant recruitment were critical to the study’s success. Conclusion The study’s high participation rate suggests that this geospatial sampling method holds promise for development of probability-based samples in other settings that serve as transport hubs for highly mobile populations. PMID:29324780
Impact of signal scattering and parametric uncertainties on receiver operating characteristics
NASA Astrophysics Data System (ADS)
Wilson, D. Keith; Breton, Daniel J.; Hart, Carl R.; Pettit, Chris L.
2017-05-01
The receiver operating characteristic (ROC curve), which is a plot of the probability of detection as a function of the probability of false alarm, plays a key role in the classical analysis of detector performance. However, meaningful characterization of the ROC curve is challenging when practically important complications such as variations in source emissions, environmental impacts on the signal propagation, uncertainties in the sensor response, and multiple sources of interference are considered. In this paper, a relatively simple but realistic model for scattered signals is employed to explore how parametric uncertainties impact the ROC curve. In particular, we show that parametric uncertainties in the mean signal and noise power substantially raise the tails of the distributions; since receiver operation with a very low probability of false alarm and a high probability of detection is normally desired, these tails lead to severely degraded performance. Because full a priori knowledge of such parametric uncertainties is rarely available in practice, analyses must typically be based on a finite sample of environmental states, which only partially characterize the range of parameter variations. We show how this effect can lead to misleading assessments of system performance. For the cases considered, approximately 64 or more statistically independent samples of the uncertain parameters are needed to accurately predict the probabilities of detection and false alarm. A connection is also described between selection of suitable distributions for the uncertain parameters, and Bayesian adaptive methods for inferring the parameters.
NASA Technical Reports Server (NTRS)
Generazio, Edward R.
2011-01-01
The capability of an inspection system is established by applications of various methodologies to determine the probability of detection (POD). One accepted metric of an adequate inspection system is that for a minimum flaw size and all greater flaw sizes, there is 0.90 probability of detection with 95% confidence (90/95 POD). Directed design of experiments for probability of detection (DOEPOD) has been developed to provide an efficient and accurate methodology that yields estimates of POD and confidence bounds for both Hit-Miss or signal amplitude testing, where signal amplitudes are reduced to Hit-Miss by using a signal threshold Directed DOEPOD uses a nonparametric approach for the analysis or inspection data that does require any assumptions about the particular functional form of a POD function. The DOEPOD procedure identifies, for a given sample set whether or not the minimum requirement of 0.90 probability of detection with 95% confidence is demonstrated for a minimum flaw size and for all greater flaw sizes (90/95 POD). The DOEPOD procedures are sequentially executed in order to minimize the number of samples needed to demonstrate that there is a 90/95 POD lower confidence bound at a given flaw size and that the POD is monotonic for flaw sizes exceeding that 90/95 POD flaw size. The conservativeness of the DOEPOD methodology results is discussed. Validated guidelines for binomial estimation of POD for fracture critical inspection are established.
Wolf, R; Orsel, K; De Buck, J; Kanevets, U; Barkema, H W
2016-04-01
Mycobacterium avium ssp. paratuberculosis (MAP) causes Johne's disease, a production-limiting disease in cattle. Detection of infected herds is often done using environmental samples (ES) of manure, which are collected in cattle pens and manure storage areas. Disadvantages of the method are that sample accuracy is affected by cattle housing and type of manure storage area. Furthermore, some sampling locations (e.g., manure lagoons) are frequently not readily accessible. However, sampling socks (SO), as used for Salmonella spp. testing in chicken flocks, might be an easy to use and accurate alternative to ES. The objective of the study was to assess accuracy of SO for detection of MAP in dairy herds. At each of 102 participating herds, 6 ES and 2 SO were collected. In total, 45 herds had only negative samples in both methods and 29 herds had ≥1 positive ES and ≥1 positive SO. Furthermore, 27 herds with ≥1 positive ES had no positive SO, and 1 herd with no positive ES had 1 positive SO. Bayesian simulation with informative priors on sensitivity of ES and MAP herd prevalence provided a posterior sensitivity for SO of 43.5% (95% probability interval=33-58), and 78.5% (95% probability interval=62-93) for ES. Although SO were easy to use, accuracy was lower than for ES. Therefore, with improvements in the sampling protocol (e.g., more SO per farm and more frequent herd visits), as well as improvements in the laboratory protocol, perhaps SO would be a useful alternative for ES. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Brůžek, Jaroslav; Santos, Frédéric; Dutailly, Bruno; Murail, Pascal; Cunha, Eugenia
2017-10-01
A new tool for skeletal sex estimation based on measurements of the human os coxae is presented using skeletons from a metapopulation of identified adult individuals from twelve independent population samples. For reliable sex estimation, a posterior probability greater than 0.95 was considered to be the classification threshold: below this value, estimates are considered indeterminate. By providing free software, we aim to develop an even more disseminated method for sex estimation. Ten metric variables collected from 2,040 ossa coxa of adult subjects of known sex were recorded between 1986 and 2002 (reference sample). To test both the validity and reliability, a target sample consisting of two series of adult ossa coxa of known sex (n = 623) was used. The DSP2 software (Diagnose Sexuelle Probabiliste v2) is based on Linear Discriminant Analysis, and the posterior probabilities are calculated using an R script. For the reference sample, any combination of four dimensions provides a correct sex estimate in at least 99% of cases. The percentage of individuals for whom sex can be estimated depends on the number of dimensions; for all ten variables it is higher than 90%. Those results are confirmed in the target sample. Our posterior probability threshold of 0.95 for sex estimate corresponds to the traditional sectioning point used in osteological studies. DSP2 software is replacing the former version that should not be used anymore. DSP2 is a robust and reliable technique for sexing adult os coxae, and is also user friendly. © 2017 Wiley Periodicals, Inc.
Mürmann, Lisandra; dos Santos, Maria Cecília; Longaray, Solange Mendes; Both, Jane Mari Corrêa; Cardoso, Marisa
2008-01-01
Data concerning the prevalence and populations of Salmonella in foods implicated in outbreaks may be important to the development of quantitative microbial risk assessments of individual food products. In this sense, the objective of the present study was to assess the amount of Salmonella sp. in different foods implicated in foodborne outbreaks in Rio Grande do Sul occurred in 2005 and to characterize the isolated strains using phenotypic and genotypic methods. Nineteen food samples involved in ten foodborne outbreaks occurred in 2005, and positive on Salmonella isolation at the Central Laboratory of the Health Department of the State of Rio Grande do Sul, were included in this study. Food samples were submitted to estimation of Salmonella using the Most Probable Number (MPN) technique. Moreover, one confirmed Salmonella colony of each food sample was serotyped, characterized by its XbaI-macrorestriction profile, and submitted to antimicrobial resistance testing. Foods containing eggs, mayonnaise or chicken were contaminated with Salmonella in eight outbreaks. Higher counts (>107 MPN.g-1) of Salmonella were detected mostly in foods containing mayonnaise. The isolation of Salmonella from multiple food items in five outbreaks probably resulted from the cross-contamination, and the high Salmonella counts detected in almost all analyzed samples probably resulted from storing in inadequate temperature. All strains were identified as S. Enteritidis, and presented a unique macrorestriction profile, demonstrating the predominance of one clonal group in foods involved in the salmonellosis outbreaks. A low frequency of antimicrobial resistant S. Enteritidis strains was observed and nalidixic acid was the only resistance marker detected. PMID:24031261
A modified Wald interval for the area under the ROC curve (AUC) in diagnostic case-control studies
2014-01-01
Background The area under the receiver operating characteristic (ROC) curve, referred to as the AUC, is an appropriate measure for describing the overall accuracy of a diagnostic test or a biomarker in early phase trials without having to choose a threshold. There are many approaches for estimating the confidence interval for the AUC. However, all are relatively complicated to implement. Furthermore, many approaches perform poorly for large AUC values or small sample sizes. Methods The AUC is actually a probability. So we propose a modified Wald interval for a single proportion, which can be calculated on a pocket calculator. We performed a simulation study to compare this modified Wald interval (without and with continuity correction) with other intervals regarding coverage probability and statistical power. Results The main result is that the proposed modified Wald intervals maintain and exploit the type I error much better than the intervals of Agresti-Coull, Wilson, and Clopper-Pearson. The interval suggested by Bamber, the Mann-Whitney interval without transformation and also the interval of the binormal AUC are very liberal. For small sample sizes the Wald interval with continuity has a comparable coverage probability as the LT interval and higher power. For large sample sizes the results of the LT interval and of the Wald interval without continuity correction are comparable. Conclusions If individual patient data is not available, but only the estimated AUC and the total sample size, the modified Wald intervals can be recommended as confidence intervals for the AUC. For small sample sizes the continuity correction should be used. PMID:24552686
Multilevel sequential Monte Carlo samplers
Beskos, Alexandros; Jasra, Ajay; Law, Kody; ...
2016-08-24
Here, we study the approximation of expectations w.r.t. probability distributions associated to the solution of partial differential equations (PDEs); this scenario appears routinely in Bayesian inverse problems. In practice, one often has to solve the associated PDE numerically, using, for instance finite element methods and leading to a discretisation bias, with the step-size level h L. In addition, the expectation cannot be computed analytically and one often resorts to Monte Carlo methods. In the context of this problem, it is known that the introduction of the multilevel Monte Carlo (MLMC) method can reduce the amount of computational effort to estimate expectations, for a given level of error. This is achieved via a telescoping identity associated to a Monte Carlo approximation of a sequence of probability distributions with discretisation levelsmore » $${\\infty}$$ >h 0>h 1 ...>h L. In many practical problems of interest, one cannot achieve an i.i.d. sampling of the associated sequence of probability distributions. A sequential Monte Carlo (SMC) version of the MLMC method is introduced to deal with this problem. In conclusion, it is shown that under appropriate assumptions, the attractive property of a reduction of the amount of computational effort to estimate expectations, for a given level of error, can be maintained within the SMC context.« less
Pometti, Carolina L; Bessega, Cecilia F; Saidman, Beatriz O; Vilardi, Juan C
2014-03-01
Bayesian clustering as implemented in STRUCTURE or GENELAND software is widely used to form genetic groups of populations or individuals. On the other hand, in order to satisfy the need for less computer-intensive approaches, multivariate analyses are specifically devoted to extracting information from large datasets. In this paper, we report the use of a dataset of AFLP markers belonging to 15 sampling sites of Acacia caven for studying the genetic structure and comparing the consistency of three methods: STRUCTURE, GENELAND and DAPC. Of these methods, DAPC was the fastest one and showed accuracy in inferring the K number of populations (K = 12 using the find.clusters option and K = 15 with a priori information of populations). GENELAND in turn, provides information on the area of membership probabilities for individuals or populations in the space, when coordinates are specified (K = 12). STRUCTURE also inferred the number of K populations and the membership probabilities of individuals based on ancestry, presenting the result K = 11 without prior information of populations and K = 15 using the LOCPRIOR option. Finally, in this work all three methods showed high consistency in estimating the population structure, inferring similar numbers of populations and the membership probabilities of individuals to each group, with a high correlation between each other.
Rosenblum, Michael A; Laan, Mark J van der
2009-01-07
The validity of standard confidence intervals constructed in survey sampling is based on the central limit theorem. For small sample sizes, the central limit theorem may give a poor approximation, resulting in confidence intervals that are misleading. We discuss this issue and propose methods for constructing confidence intervals for the population mean tailored to small sample sizes. We present a simple approach for constructing confidence intervals for the population mean based on tail bounds for the sample mean that are correct for all sample sizes. Bernstein's inequality provides one such tail bound. The resulting confidence intervals have guaranteed coverage probability under much weaker assumptions than are required for standard methods. A drawback of this approach, as we show, is that these confidence intervals are often quite wide. In response to this, we present a method for constructing much narrower confidence intervals, which are better suited for practical applications, and that are still more robust than confidence intervals based on standard methods, when dealing with small sample sizes. We show how to extend our approaches to much more general estimation problems than estimating the sample mean. We describe how these methods can be used to obtain more reliable confidence intervals in survey sampling. As a concrete example, we construct confidence intervals using our methods for the number of violent deaths between March 2003 and July 2006 in Iraq, based on data from the study "Mortality after the 2003 invasion of Iraq: A cross sectional cluster sample survey," by Burnham et al. (2006).
Metocean design parameter estimation for fixed platform based on copula functions
NASA Astrophysics Data System (ADS)
Zhai, Jinjin; Yin, Qilin; Dong, Sheng
2017-08-01
Considering the dependent relationship among wave height, wind speed, and current velocity, we construct novel trivariate joint probability distributions via Archimedean copula functions. Total 30-year data of wave height, wind speed, and current velocity in the Bohai Sea are hindcast and sampled for case study. Four kinds of distributions, namely, Gumbel distribution, lognormal distribution, Weibull distribution, and Pearson Type III distribution, are candidate models for marginal distributions of wave height, wind speed, and current velocity. The Pearson Type III distribution is selected as the optimal model. Bivariate and trivariate probability distributions of these environmental conditions are established based on four bivariate and trivariate Archimedean copulas, namely, Clayton, Frank, Gumbel-Hougaard, and Ali-Mikhail-Haq copulas. These joint probability models can maximize marginal information and the dependence among the three variables. The design return values of these three variables can be obtained by three methods: univariate probability, conditional probability, and joint probability. The joint return periods of different load combinations are estimated by the proposed models. Platform responses (including base shear, overturning moment, and deck displacement) are further calculated. For the same return period, the design values of wave height, wind speed, and current velocity obtained by the conditional and joint probability models are much smaller than those by univariate probability. Considering the dependence among variables, the multivariate probability distributions provide close design parameters to actual sea state for ocean platform design.
Cetacean population density estimation from single fixed sensors using passive acoustics.
Küsel, Elizabeth T; Mellinger, David K; Thomas, Len; Marques, Tiago A; Moretti, David; Ward, Jessica
2011-06-01
Passive acoustic methods are increasingly being used to estimate animal population density. Most density estimation methods are based on estimates of the probability of detecting calls as functions of distance. Typically these are obtained using receivers capable of localizing calls or from studies of tagged animals. However, both approaches are expensive to implement. The approach described here uses a MonteCarlo model to estimate the probability of detecting calls from single sensors. The passive sonar equation is used to predict signal-to-noise ratios (SNRs) of received clicks, which are then combined with a detector characterization that predicts probability of detection as a function of SNR. Input distributions for source level, beam pattern, and whale depth are obtained from the literature. Acoustic propagation modeling is used to estimate transmission loss. Other inputs for density estimation are call rate, obtained from the literature, and false positive rate, obtained from manual analysis of a data sample. The method is applied to estimate density of Blainville's beaked whales over a 6-day period around a single hydrophone located in the Tongue of the Ocean, Bahamas. Results are consistent with those from previous analyses, which use additional tag data. © 2011 Acoustical Society of America
The 2-10 keV unabsorbed luminosity function of AGN from the LSS, CDFS, and COSMOS surveys
NASA Astrophysics Data System (ADS)
Ranalli, P.; Koulouridis, E.; Georgantopoulos, I.; Fotopoulou, S.; Hsu, L.-T.; Salvato, M.; Comastri, A.; Pierre, M.; Cappelluti, N.; Carrera, F. J.; Chiappetti, L.; Clerc, N.; Gilli, R.; Iwasawa, K.; Pacaud, F.; Paltani, S.; Plionis, E.; Vignali, C.
2016-05-01
The XMM-Large scale structure (XMM-LSS), XMM-Cosmological evolution survey (XMM-COSMOS), and XMM-Chandra deep field south (XMM-CDFS) surveys are complementary in terms of sky coverage and depth. Together, they form a clean sample with the least possible variance in instrument effective areas and point spread function. Therefore this is one of the best samples available to determine the 2-10 keV luminosity function of active galactic nuclei (AGN) and their evolution. The samples and the relevant corrections for incompleteness are described. A total of 2887 AGN is used to build the LF in the luminosity interval 1042-1046 erg s-1 and in the redshift interval 0.001-4. A new method to correct for absorption by considering the probability distribution for the column density conditioned on the hardness ratio is presented. The binned luminosity function and its evolution is determined with a variant of the Page-Carrera method, which is improved to include corrections for absorption and to account for the full probability distribution of photometric redshifts. Parametric models, namely a double power law with luminosity and density evolution (LADE) or luminosity-dependent density evolution (LDDE), are explored using Bayesian inference. We introduce the Watanabe-Akaike information criterion (WAIC) to compare the models and estimate their predictive power. Our data are best described by the LADE model, as hinted by the WAIC indicator. We also explore the recently proposed 15-parameter extended LDDE model and find that this extension is not supported by our data. The strength of our method is that it provides unabsorbed, non-parametric estimates, credible intervals for luminosity function parameters, and a model choice based on predictive power for future data. Based on observations obtained with XMM-Newton, an ESA science mission with instruments and contributions directly funded by ESA member states and NASA.Tables with the samples of the posterior probability distributions are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/590/A80
Estimating the probability of rare events: addressing zero failure data.
Quigley, John; Revie, Matthew
2011-07-01
Traditional statistical procedures for estimating the probability of an event result in an estimate of zero when no events are realized. Alternative inferential procedures have been proposed for the situation where zero events have been realized but often these are ad hoc, relying on selecting methods dependent on the data that have been realized. Such data-dependent inference decisions violate fundamental statistical principles, resulting in estimation procedures whose benefits are difficult to assess. In this article, we propose estimating the probability of an event occurring through minimax inference on the probability that future samples of equal size realize no more events than that in the data on which the inference is based. Although motivated by inference on rare events, the method is not restricted to zero event data and closely approximates the maximum likelihood estimate (MLE) for nonzero data. The use of the minimax procedure provides a risk adverse inferential procedure where there are no events realized. A comparison is made with the MLE and regions of the underlying probability are identified where this approach is superior. Moreover, a comparison is made with three standard approaches to supporting inference where no event data are realized, which we argue are unduly pessimistic. We show that for situations of zero events the estimator can be simply approximated with 1/2.5n, where n is the number of trials. © 2011 Society for Risk Analysis.
A computer program for uncertainty analysis integrating regression and Bayesian methods
Lu, Dan; Ye, Ming; Hill, Mary C.; Poeter, Eileen P.; Curtis, Gary
2014-01-01
This work develops a new functionality in UCODE_2014 to evaluate Bayesian credible intervals using the Markov Chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm of Vrugt et al. (2009), which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. The UCODE MCMC capability provides eleven prior probability distributions and three ways to initialize the sampling process. It evaluates parametric and predictive uncertainties and it has parallel computing capability based on multiple chains to accelerate the sampling process. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible by adopting the JUPITER API protocol. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals, which all can account for prior information: (1) linear confidence intervals which require linearity and Gaussian error assumptions and typically 10s–100s of highly parallelizable model runs after optimization, (2) nonlinear confidence intervals which require a smooth objective function surface and Gaussian observation error assumptions and typically 100s–1,000s of partially parallelizable model runs after optimization, and (3) MCMC Bayesian credible intervals which require few assumptions and commonly 10,000s–100,000s or more partially parallelizable model runs. Ready access allows users to select methods best suited to their work, and to compare methods in many circumstances.
Evaluating gull diets: A comparison of conventional methods and stable isotope analysis
Weiser, Emily L.; Powell, Abby N.
2011-01-01
Samples such as regurgitated pellets and food remains have traditionally been used in studies of bird diets, but these can produce biased estimates depending on the digestibility of different foods. Stable isotope analysis has been developed as a method for assessing bird diets that is not biased by digestibility. These two methods may provide complementary or conflicting information on diets of birds, but are rarely compared directly. We analyzed carbon and nitrogen stable isotope ratios of feathers of Glaucous Gull (Larus hyperboreus) chicks from eight breeding colonies in northern Alaska, and used a Bayesian mixing model to generate a probability distribution for the contribution of each food group to diets. We compared these model results with probability distributions from conventional diet samples (pellets and food remains) from the same colonies and time periods. Relative to the stable isotope estimates, conventional analysis often overestimated the contributions of birds and small mammals to gull diets and often underestimated the contributions of fish and zooplankton. Both methods gave similar estimates for the contributions of scavenged caribou, miscellaneous marine foods, and garbage to diets. Pellets and food remains therefore may be useful for assessing the importance of garbage relative to certain other foods in diets of gulls and similar birds, but are clearly inappropriate for estimating the potential impact of gulls on birds, small mammals, or fish. However, conventional samples provide more species-level information than stable isotope analysis, so a combined approach would be most useful for diet analysis and assessing a predator's impact on particular prey groups.
Incorporating availability for detection in estimates of bird abundance
Diefenbach, D.R.; Marshall, M.R.; Mattice, J.A.; Brauning, D.W.
2007-01-01
Several bird-survey methods have been proposed that provide an estimated detection probability so that bird-count statistics can be used to estimate bird abundance. However, some of these estimators adjust counts of birds observed by the probability that a bird is detected and assume that all birds are available to be detected at the time of the survey. We marked male Henslow's Sparrows (Ammodramus henslowii) and Grasshopper Sparrows (A. savannarum) and monitored their behavior during May-July 2002 and 2003 to estimate the proportion of time they were available for detection. We found that the availability of Henslow's Sparrows declined in late June to <10% for 5- or 10-min point counts when a male had to sing and be visible to the observer; but during 20 May-19 June, males were available for detection 39.1% (SD = 27.3) of the time for 5-min point counts and 43.9% (SD = 28.9) of the time for 10-min point counts (n = 54). We detected no temporal changes in availability for Grasshopper Sparrows, but estimated availability to be much lower for 5-min point counts (10.3%, SD = 12.2) than for 10-min point counts (19.2%, SD = 22.3) when males had to be visible and sing during the sampling period (n = 80). For distance sampling, we estimated the availability of Henslow's Sparrows to be 44.2% (SD = 29.0) and the availability of Grasshopper Sparrows to be 20.6% (SD = 23.5). We show how our estimates of availability can be incorporated in the abundance and variance estimators for distance sampling and modify the abundance and variance estimators for the double-observer method. Methods that directly estimate availability from bird counts but also incorporate detection probabilities need further development and will be important for obtaining unbiased estimates of abundance for these species.
Methods for threshold determination in multiplexed assays
Tammero, Lance F. Bentley; Dzenitis, John M; Hindson, Benjamin J
2014-06-24
Methods for determination of threshold values of signatures comprised in an assay are described. Each signature enables detection of a target. The methods determine a probability density function of negative samples and a corresponding false positive rate curve. A false positive criterion is established and a threshold for that signature is determined as a point at which the false positive rate curve intersects the false positive criterion. A method for quantitative analysis and interpretation of assay results together with a method for determination of a desired limit of detection of a signature in an assay are also described.
NASA Astrophysics Data System (ADS)
Sakata, T.; Suzuki, M.; Yamamoto, T.; Nakanishi, S.; Funahashi, M.; Tsurumachi, N.
2017-10-01
We investigated the optical transmission properties of one-dimensional photonic crystal (1D-PC) microcavity structures containing the liquid-crystalline (LC) perylene tetracarboxylic bisimide (PTCBI) derivative. We fabricated the microcavity structures for this study by two different methods and observed the cavity polaritons successfully in both samples. For one sample, since the PTCBI molecules were aligned in the cavity layer of the 1D-PC by utilizing a friction transfer method, vacuum Rabi splitting energy was strongly dependent on the polarization of the incident light produced by the peculiar optical features of the LC organic semiconductor. For the other sample, we did not utilize the friction transfer method and did not observe such polarization dependence. However, we did observe a relatively large Rabi splitting energy of 187 meV, probably due to the improvement of optical confinement effect.
NASA Astrophysics Data System (ADS)
Heinemeier, Jan; Jungner, Högne; Lindroos, Alf; Ringbom, Åsa; von Konow, Thorborg; Rud, Niels
1997-03-01
A method for refining lime mortar samples for 14C dating has been developed. It includes mechanical and chemical separation of mortar carbonate with optical control of the purity of the samples. The method has been applied to a large series of AMS datings on lime mortar from three medieval churches on the Åland Islands, Finland. The datings show convincing internal consistency and confine the construction time of the churches to AD 1280-1380 with a most probable date just before AD 1300. We have also applied the method to the controversial Newport Tower, Rhode Island, USA. Our mortar datings confine the building to colonial time in the 17th century and thus refute claims of Viking origin of the tower. For the churches, a parallel series of datings of organic (charcoal) inclusions in the mortar show less reliable results than the mortar samples, which is ascribed to poor association with the construction time.
A method to estimate spatiotemporal air quality in an urban traffic corridor.
Singh, Nongthombam Premananda; Gokhale, Sharad
2015-12-15
Air quality exposure assessment using personal exposure sampling or direct measurement of spatiotemporal air pollutant concentrations has difficulty and limitations. Most statistical methods used for estimating spatiotemporal air quality do not account for the source characteristics (e.g. emissions). In this study, a prediction method, based on the lognormal probability distribution of hourly-average-spatial concentrations of carbon monoxide (CO) obtained by a CALINE4 model, has been developed and validated in an urban traffic corridor. The data on CO concentrations were collected at three locations and traffic and meteorology within the urban traffic corridor.(1) The method has been developed with the data of one location and validated at other two locations. The method estimated the CO concentrations reasonably well (correlation coefficient, r≥0.96). Later, the method has been applied to estimate the probability of occurrence [P(C≥Cstd] of the spatial CO concentrations in the corridor. The results have been promising and, therefore, may be useful to quantifying spatiotemporal air quality within an urban area. Copyright © 2015 Elsevier B.V. All rights reserved.
Normal uniform mixture differential gene expression detection for cDNA microarrays
Dean, Nema; Raftery, Adrian E
2005-01-01
Background One of the primary tasks in analysing gene expression data is finding genes that are differentially expressed in different samples. Multiple testing issues due to the thousands of tests run make some of the more popular methods for doing this problematic. Results We propose a simple method, Normal Uniform Differential Gene Expression (NUDGE) detection for finding differentially expressed genes in cDNA microarrays. The method uses a simple univariate normal-uniform mixture model, in combination with new normalization methods for spread as well as mean that extend the lowess normalization of Dudoit, Yang, Callow and Speed (2002) [1]. It takes account of multiple testing, and gives probabilities of differential expression as part of its output. It can be applied to either single-slide or replicated experiments, and it is very fast. Three datasets are analyzed using NUDGE, and the results are compared to those given by other popular methods: unadjusted and Bonferroni-adjusted t tests, Significance Analysis of Microarrays (SAM), and Empirical Bayes for microarrays (EBarrays) with both Gamma-Gamma and Lognormal-Normal models. Conclusion The method gives a high probability of differential expression to genes known/suspected a priori to be differentially expressed and a low probability to the others. In terms of known false positives and false negatives, the method outperforms all multiple-replicate methods except for the Gamma-Gamma EBarrays method to which it offers comparable results with the added advantages of greater simplicity, speed, fewer assumptions and applicability to the single replicate case. An R package called nudge to implement the methods in this paper will be made available soon at . PMID:16011807
Improved high-dimensional prediction with Random Forests by the use of co-data.
Te Beest, Dennis E; Mes, Steven W; Wilting, Saskia M; Brakenhoff, Ruud H; van de Wiel, Mark A
2017-12-28
Prediction in high dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary 'co-data' can be used to improve the performance of a Random Forest in such a setting. Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities that are used to draw candidate variables by co-data moderated sampling probabilities. Co-data here are defined as any type information that is available on the variables of the primary data, but does not use its response labels. These moderated sampling probabilities are, inspired by empirical Bayes, learned from the data at hand. We demonstrate the co-data moderated Random Forest (CoRF) with two examples. In the first example we aim to predict the presence of a lymph node metastasis with gene expression data. We demonstrate how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve the predictive performance. In the second example we demonstrate how the prediction of cervical (pre-)cancer with methylation data can be improved by including the location of the probe relative to the known CpG islands, the number of CpG sites targeted by a probe, and a set of p-values from a related study. The proposed method is able to utilize auxiliary co-data to improve the performance of a Random Forest.
Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection
Thatte, Gautam; Li, Ming; Lee, Sangwon; Emken, B. Adar; Annavaram, Murali; Narayanan, Shrikanth; Spruijt-Metz, Donna; Mitra, Urbashi
2011-01-01
The optimal allocation of samples for physical activity detection in a wireless body area network for health-monitoring is considered. The number of biometric samples collected at the mobile device fusion center, from both device-internal and external Bluetooth heterogeneous sensors, is optimized to minimize the transmission power for a fixed number of samples, and to meet a performance requirement defined using the probability of misclassification between multiple hypotheses. A filter-based feature selection method determines an optimal feature set for classification, and a correlated Gaussian model is considered. Using experimental data from overweight adolescent subjects, it is found that allocating a greater proportion of samples to sensors which better discriminate between certain activity levels can result in either a lower probability of error or energy-savings ranging from 18% to 22%, in comparison to equal allocation of samples. The current activity of the subjects and the performance requirements do not significantly affect the optimal allocation, but employing personalized models results in improved energy-efficiency. As the number of samples is an integer, an exhaustive search to determine the optimal allocation is typical, but computationally expensive. To this end, an alternate, continuous-valued vector optimization is derived which yields approximately optimal allocations and can be implemented on the mobile fusion center due to its significantly lower complexity. PMID:21796237
Nuclear Ensemble Approach with Importance Sampling.
Kossoski, Fábris; Barbatti, Mario
2018-06-12
We show that the importance sampling technique can effectively augment the range of problems where the nuclear ensemble approach can be applied. A sampling probability distribution function initially determines the collection of initial conditions for which calculations are performed, as usual. Then, results for a distinct target distribution are computed by introducing compensating importance sampling weights for each sampled point. This mapping between the two probability distributions can be performed whenever they are both explicitly constructed. Perhaps most notably, this procedure allows for the computation of temperature dependent observables. As a test case, we investigated the UV absorption spectra of phenol, which has been shown to have a marked temperature dependence. Application of the proposed technique to a range that covers 500 K provides results that converge to those obtained with conventional sampling. We further show that an overall improved rate of convergence is obtained when sampling is performed at intermediate temperatures. The comparison between calculated and the available measured cross sections is very satisfactory, as the main features of the spectra are correctly reproduced. As a second test case, one of Tully's classical models was revisited, and we show that the computation of dynamical observables also profits from the importance sampling technique. In summary, the strategy developed here can be employed to assess the role of temperature for any property calculated within the nuclear ensemble method, with the same computational cost as doing so for a single temperature.
Meixell, Brandt W.; Arnold, Todd W.; Lindberg, Mark S.; Smith, Matthew M.; Runstadler, Jonathan A.; Ramey, Andy M.
2016-01-01
Methods: We used molecular methods to screen blood samples and cloacal/oropharyngeal swabs collected from 1347 ducks of five species during May-August 2010, in interior Alaska, for the presence of hematozoa, Influenza A Virus (IAV), and IAV antibodies. Using models to account for imperfect detection of parasites, we estimated seasonal variation in prevalence of three parasite genera (Haemoproteus, Plasmodium, Leucocytozoon) and investigated how co-infection with parasites and viruses were related to the probability of infection. Results: We detected parasites from each hematozoan genus in adult and juvenile ducks of all species sampled. Seasonal patterns in detection and prevalence varied by parasite genus and species, age, and sex of duck hosts. The probabilities of infection for Haemoproteus and Leucocytozoon parasites were strongly positively correlated, but hematozoa infection was not correlated with IAV infection or serostatus. The probability of Haemoproteus infection was negatively related to body condition in juvenile ducks; relationships between Leucocytozoon infection and body condition varied among host species. Conclusions: We present prevalence estimates for Haemoproteus, Leucocytozoon, and Plasmodium infections in waterfowl at the interface of the sub-Arctic and Arctic and provide evidence for local transmission of all three parasite genera. Variation in prevalence and molecular detection of hematozoa parasites in wild ducks is influenced by seasonal timing and a number of host traits. A positive correlation in co-infection of Leucocytozoon and Haemoproteus suggests that infection probability by parasites in one or both genera is enhanced by infection with the other, or that encounter rates of hosts and genus-specific vectors are correlated. Using size-adjusted mass as an index of host condition, we did not find evidence for strong deleterious consequences of hematozoa infection in wild ducks.
Finding SDSS Galaxy Clusters in 4-dimensional Color Space Using the False Discovery Rate
NASA Astrophysics Data System (ADS)
Nichol, R. C.; Miller, C. J.; Reichart, D.; Wasserman, L.; Genovese, C.; SDSS Collaboration
2000-12-01
We describe a recently developed statistical technique that provides a meaningful cut-off in probability-based decision making. We are concerned with multiple testing, where each test produces a well-defined probability (or p-value). By well-known, we mean that the null hypothesis used to determine the p-value is fully understood and appropriate. The method is entitled False Discovery Rate (FDR) and its largest advantage over other measures is that it allows one to specify a maximal amount of acceptable error. As an example of this tool, we apply FDR to a four-dimensional clustering algorithm using SDSS data. For each galaxy (or test galaxy), we count the number of neighbors that fit within one standard deviation of a four dimensional Gaussian centered on that test galaxy. The mean and standard deviation of that Gaussian are determined from the colors and errors of the test galaxy. We then take that same Gaussian and place it on a random selection of n galaxies and make a similar count. In the limit of large n, we expect the median count around these random galaxies to represent a typical field galaxy. For every test galaxy we determine the probability (or p-value) that it is a field galaxy based on these counts. A low p-value implies that the test galaxy is in a cluster environment. Once we have a p-value for every galaxy, we use FDR to determine at what level we should make our probability cut-off. Once this cut-off is made, we have a final sample of galaxies that are cluster-like galaxies. Using FDR, we also know the maximum amount of field contamination in our cluster galaxy sample. We present our preliminary galaxy clustering results using these methods.
Lum, Kirsten J.; Sundaram, Rajeshwari; Louis, Thomas A.
2015-01-01
Prospective pregnancy studies are a valuable source of longitudinal data on menstrual cycle length. However, care is needed when making inferences of such renewal processes. For example, accounting for the sampling plan is necessary for unbiased estimation of the menstrual cycle length distribution for the study population. If couples can enroll when they learn of the study as opposed to waiting for the start of a new menstrual cycle, then due to length-bias, the enrollment cycle will be stochastically larger than the general run of cycles, a typical property of prevalent cohort studies. Furthermore, the probability of enrollment can depend on the length of time since a woman’s last menstrual period (a backward recurrence time), resulting in selection effects. We focus on accounting for length-bias and selection effects in the likelihood for enrollment menstrual cycle length, using a recursive two-stage approach wherein we first estimate the probability of enrollment as a function of the backward recurrence time and then use it in a likelihood with sampling weights that account for length-bias and selection effects. To broaden the applicability of our methods, we augment our model to incorporate a couple-specific random effect and time-independent covariate. A simulation study quantifies performance for two scenarios of enrollment probability when proper account is taken of sampling plan features. In addition, we estimate the probability of enrollment and the distribution of menstrual cycle length for the study population of the Longitudinal Investigation of Fertility and the Environment Study. PMID:25027273
O'Reilly, Joseph E; Donoghue, Philip C J
2018-03-01
Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise of the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data.
O’Reilly, Joseph E; Donoghue, Philip C J
2018-01-01
Abstract Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise of the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data. PMID:29106675
A New Automated Method and Sample Data Flow for Analysis of Volatile Nitrosamines in Human Urine*
Hodgson, James A.; Seyler, Tiffany H.; McGahee, Ernest; Arnstein, Stephen; Wang, Lanqing
2016-01-01
Volatile nitrosamines (VNAs) are a group of compounds classified as probable (group 2A) and possible (group 2B) carcinogens in humans. Along with certain foods and contaminated drinking water, VNAs are detected at high levels in tobacco products and in both mainstream and sidestream smoke. Our laboratory monitors six urinary VNAs—N-nitrosodimethylamine (NDMA), N-nitrosomethylethylamine (NMEA), N-nitrosodiethylamine (NDEA), N-nitrosopiperidine (NPIP), N-nitrosopyrrolidine (NPYR), and N-nitrosomorpholine (NMOR)—using isotope dilution GC-MS/MS (QQQ) for large population studies such as the National Health and Nutrition Examination Survey (NHANES). In this paper, we report for the first time a new automated sample preparation method to more efficiently quantitate these VNAs. Automation is done using Hamilton STAR™ and Caliper Staccato™ workstations. This new automated method reduces sample preparation time from 4 hours to 2.5 hours while maintaining precision (inter-run CV < 10%) and accuracy (85% - 111%). More importantly this method increases sample throughput while maintaining a low limit of detection (<10 pg/mL) for all analytes. A streamlined sample data flow was created in parallel to the automated method, in which samples can be tracked from receiving to final LIMs output with minimal human intervention, further minimizing human error in the sample preparation process. This new automated method and the sample data flow are currently applied in bio-monitoring of VNAs in the US non-institutionalized population NHANES 2013-2014 cycle. PMID:26949569
Normal probability plots with confidence.
Chantarangsi, Wanpen; Liu, Wei; Bretz, Frank; Kiatsupaibul, Seksan; Hayter, Anthony J; Wan, Fang
2015-01-01
Normal probability plots are widely used as a statistical tool for assessing whether an observed simple random sample is drawn from a normally distributed population. The users, however, have to judge subjectively, if no objective rule is provided, whether the plotted points fall close to a straight line. In this paper, we focus on how a normal probability plot can be augmented by intervals for all the points so that, if the population distribution is normal, then all the points should fall into the corresponding intervals simultaneously with probability 1-α. These simultaneous 1-α probability intervals provide therefore an objective mean to judge whether the plotted points fall close to the straight line: the plotted points fall close to the straight line if and only if all the points fall into the corresponding intervals. The powers of several normal probability plot based (graphical) tests and the most popular nongraphical Anderson-Darling and Shapiro-Wilk tests are compared by simulation. Based on this comparison, recommendations are given in Section 3 on which graphical tests should be used in what circumstances. An example is provided to illustrate the methods. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Phonotactic Probability Effects in Children Who Stutter
Anderson, Julie D.; Byrd, Courtney T.
2008-01-01
Purpose The purpose of this study was to examine the influence of phonotactic probability, the frequency of different sound segments and segment sequences, on the overall fluency with which words are produced by preschool children who stutter (CWS), as well as to determine whether it has an effect on the type of stuttered disfluency produced. Method A 500+ word language sample was obtained from 19 CWS. Each stuttered word was randomly paired with a fluently produced word that closely matched it in grammatical class, word length, familiarity, word and neighborhood frequency, and neighborhood density. Phonotactic probability values were obtained for the stuttered and fluent words from an online database. Results Phonotactic probability did not have a significant influence on the overall susceptibility of words to stuttering, but it did impact the type of stuttered disfluency produced. In specific, single-syllable word repetitions were significantly lower in phonotactic probability than fluently produced words, as well as part-word repetitions and sound prolongations. Conclusions In general, the differential impact of phonotactic probability on the type of stuttering-like disfluency produced by young CWS provides some support for the notion that different disfluency types may originate in the disruption of different levels of processing. PMID:18658056
The efficacy of respondent-driven sampling for the health assessment of minority populations.
Badowski, Grazyna; Somera, Lilnabeth P; Simsiman, Brayan; Lee, Hye-Ryeon; Cassel, Kevin; Yamanaka, Alisha; Ren, JunHao
2017-10-01
Respondent driven sampling (RDS) is a relatively new network sampling technique typically employed for hard-to-reach populations. Like snowball sampling, initial respondents or "seeds" recruit additional respondents from their network of friends. Under certain assumptions, the method promises to produce a sample independent from the biases that may have been introduced by the non-random choice of "seeds." We conducted a survey on health communication in Guam's general population using the RDS method, the first survey that has utilized this methodology in Guam. It was conducted in hopes of identifying a cost-efficient non-probability sampling strategy that could generate reasonable population estimates for both minority and general populations. RDS data was collected in Guam in 2013 (n=511) and population estimates were compared with 2012 BRFSS data (n=2031) and the 2010 census data. The estimates were calculated using the unweighted RDS sample and the weighted sample using RDS inference methods and compared with known population characteristics. The sample size was reached in 23days, providing evidence that the RDS method is a viable, cost-effective data collection method, which can provide reasonable population estimates. However, the results also suggest that the RDS inference methods used to reduce bias, based on self-reported estimates of network sizes, may not always work. Caution is needed when interpreting RDS study findings. For a more diverse sample, data collection should not be conducted in just one location. Fewer questions about network estimates should be asked, and more careful consideration should be given to the kind of incentives offered to participants. Copyright © 2017. Published by Elsevier Ltd.
Spatial patch occupancy patterns of the Lower Keys marsh rabbit
Eaton, Mitchell J.; Hughes, Phillip T.; Nichols, James D.; Morkill, Anne; Anderson, Chad
2011-01-01
Reliable estimates of presence or absence of a species can provide substantial information on management questions related to distribution and habitat use but should incorporate the probability of detection to reduce bias. We surveyed for the endangered Lower Keys marsh rabbit (Sylvilagus palustris hefneri) in habitat patches on 5 Florida Key islands, USA, to estimate occupancy and detection probabilities. We derived detection probabilities using spatial replication of plots and evaluated hypotheses that patch location (coastal or interior) and patch size influence occupancy and detection. Results demonstrate that detection probability, given rabbits were present, was <0.5 and suggest that naïve estimates (i.e., estimates without consideration of imperfect detection) of patch occupancy are negatively biased. We found that patch size and location influenced probability of occupancy but not detection. Our findings will be used by Refuge managers to evaluate population trends of Lower Keys marsh rabbits from historical data and to guide management decisions for species recovery. The sampling and analytical methods we used may be useful for researchers and managers of other endangered lagomorphs and cryptic or fossorial animals occupying diverse habitats.
Masculinity-femininity predicts sexual orientation in men but not in women.
Udry, J Richard; Chantala, Kim
2006-11-01
Using the nationally representative sample of about 15,000 Add Health respondents in Wave III, the hypothesis is tested that masculinity-femininity in adolescence is correlated with sexual orientation 5 years later and 6 years later: that is, that for adolescent males in 1995 and again in 1996, more feminine males have a higher probability of self-identifying as homosexuals in 2001-02. It is predicted that for adolescent females in 1995 and 1996, more masculine females have a higher probability of self-identifying as homosexuals in 2001-02. Masculinity-femininity is measured by the classical method used by Terman & Miles. For both time periods, the hypothesis was strongly confirmed for males: the more feminine males had several times the probability of being attracted to same-sex partners, several times the probability of having same-sex partners, and several times the probability of self-identifying as homosexuals, compared with more masculine males. For females, no relationship was found at either time period between masculinity and sex of preference. The biological mechanism underlying homosexuality may be different for males and females.
Rare behavior of growth processes via umbrella sampling of trajectories
NASA Astrophysics Data System (ADS)
Klymko, Katherine; Geissler, Phillip L.; Garrahan, Juan P.; Whitelam, Stephen
2018-03-01
We compute probability distributions of trajectory observables for reversible and irreversible growth processes. These results reveal a correspondence between reversible and irreversible processes, at particular points in parameter space, in terms of their typical and atypical trajectories. Thus key features of growth processes can be insensitive to the precise form of the rate constants used to generate them, recalling the insensitivity to microscopic details of certain equilibrium behavior. We obtained these results using a sampling method, inspired by the "s -ensemble" large-deviation formalism, that amounts to umbrella sampling in trajectory space. The method is a simple variant of existing approaches, and applies to ensembles of trajectories controlled by the total number of events. It can be used to determine large-deviation rate functions for trajectory observables in or out of equilibrium.
Adsorption of NO on alumina-supported oxides and oxide-hydroxides of manganese.
Spasova, I; Nikolov, P; Mehandjiev, D
2005-10-15
The adsorption capacity for NO of alumina-supported oxides and oxide-hydroxides of manganese have been studied. Two series of samples have been prepared by precipitation on gamma-alumina and appropriate thermal treatment. The samples have been characterized by adsorption methods, magnetic methods, electronic paramagnetic resonance (EPR), transient response technique, and temperature-programmed desorption (TPD). The influence of the concentration of the initial manganese-containing solution has been investigated. The sample, prepared with a solution with Mn concentration of 4 g/100 ml, has been shown to be the best adsorbent for NO under the conditions of the experiment. It has been found that the presence mainly of Mn3+ ions on the surface of the support is probably responsible for the enhanced adsorption capacity.
Commeau, Judith A.; Valentine, Page C.
1991-01-01
Most of the sample analyzed by the method described were marine muds collected from the Gulf of Maine (Valentine and Commeau, 1990). The silt and clay fraction (up to 99 wt% of the sediment) is composed of clay minerals (chiefly illite-mica and chlorite), silt-size quartz and feldspar, and small crystals (2-12 um) of rutile and hematite. The bulk sediment samples contained an average of 2 to 3 wt percent CaCO3. Tiher samples analyzed include red and gray Carboniferous and Triassic sandstones and siltstones exposed around the Bay of Fundy region and Paleozoic sandstones, siltstones, and shales from northern Maine and New Brunswick. These rocks are probable sources for the fine-grained rutile found in the Gulf of Maine.
NASA Astrophysics Data System (ADS)
Koshinchanov, Georgy; Dimitrov, Dobri
2008-11-01
The characteristics of rainfall intensity are important for many purposes, including design of sewage and drainage systems, tuning flood warning procedures, etc. Those estimates are usually statistical estimates of the intensity of precipitation realized for certain period of time (e.g. 5, 10 min., etc) with different return period (e.g. 20, 100 years, etc). The traditional approach in evaluating the mentioned precipitation intensities is to process the pluviometer's records and fit probability distribution to samples of intensities valid for certain locations ore regions. Those estimates further become part of the state regulations to be used for various economic activities. Two problems occur using the mentioned approach: 1. Due to various factors the climate conditions are changed and the precipitation intensity estimates need regular update; 2. As far as the extremes of the probability distribution are of particular importance for the practice, the methodology of the distribution fitting needs specific attention to those parts of the distribution. The aim of this paper is to make review of the existing methodologies for processing the intensive rainfalls and to refresh some of the statistical estimates for the studied areas. The methodologies used in Bulgaria for analyzing the intensive rainfalls and produce relevant statistical estimates: The method of the maximum intensity, used in the National Institute of Meteorology and Hydrology to process and decode the pluviometer's records, followed by distribution fitting for each precipitation duration period; As the above, but with separate modeling of probability distribution for the middle and high probability quantiles. Method is similar to the first one, but with a threshold of 0,36 mm/min of intensity; Another method proposed by the Russian hydrologist G. A. Aleksiev for regionalization of estimates over some territory, improved and adapted by S. Gerasimov for Bulgaria; Next method is considering only the intensive rainfalls (if any) during the day with the maximal annual daily precipitation total for a given year; Conclusions are drown on the relevance and adequacy of the applied methods.
Guan, Li; Hao, Bibo; Cheng, Qijin; Yip, Paul SF
2015-01-01
Background Traditional offline assessment of suicide probability is time consuming and difficult in convincing at-risk individuals to participate. Identifying individuals with high suicide probability through online social media has an advantage in its efficiency and potential to reach out to hidden individuals, yet little research has been focused on this specific field. Objective The objective of this study was to apply two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying high suicide possibility microblog users in China through profile and linguistic features extracted from Internet-based data. Methods There were nine hundred and nine Chinese microblog users that completed an Internet survey, and those scoring one SD above the mean of the total Suicide Probability Scale (SPS) score, as well as one SD above the mean in each of the four subscale scores in the participant sample were labeled as high-risk individuals, respectively. Profile and linguistic features were fed into two machine learning algorithms (SLR and RF) to train the model that aims to identify high-risk individuals in general suicide probability and in its four dimensions. Models were trained and then tested by 5-fold cross validation; in which both training set and test set were generated under the stratified random sampling rule from the whole sample. There were three classic performance metrics (Precision, Recall, F1 measure) and a specifically defined metric “Screening Efficiency” that were adopted to evaluate model effectiveness. Results Classification performance was generally matched between SLR and RF. Given the best performance of the classification models, we were able to retrieve over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. Screening Efficiency of most models varied from 1/4 to 1/2. Precision of the models was generally below 30%. Conclusions Individuals in China with high suicide probability are recognizable by profile and text-based information from microblogs. Although there is still much space to improve the performance of classification models in the future, this study may shed light on preliminary screening of risky individuals via machine learning algorithms, which can work side-by-side with expert scrutiny to increase efficiency in large-scale-surveillance of suicide probability from online social media. PMID:26543921
Making great leaps forward: Accounting for detectability in herpetological field studies
Mazerolle, Marc J.; Bailey, Larissa L.; Kendall, William L.; Royle, J. Andrew; Converse, Sarah J.; Nichols, James D.
2007-01-01
Detecting individuals of amphibian and reptile species can be a daunting task. Detection can be hindered by various factors such as cryptic behavior, color patterns, or observer experience. These factors complicate the estimation of state variables of interest (e.g., abundance, occupancy, species richness) as well as the vital rates that induce changes in these state variables (e.g., survival probabilities for abundance; extinction probabilities for occupancy). Although ad hoc methods (e.g., counts uncorrected for detection, return rates) typically perform poorly in the face of no detection, they continue to be used extensively in various fields, including herpetology. However, formal approaches that estimate and account for the probability of detection, such as capture-mark-recapture (CMR) methods and distance sampling, are available. In this paper, we present classical approaches and recent advances in methods accounting for detectability that are particularly pertinent for herpetological data sets. Through examples, we illustrate the use of several methods, discuss their performance compared to that of ad hoc methods, and we suggest available software to perform these analyses. The methods we discuss control for imperfect detection and reduce bias in estimates of demographic parameters such as population size, survival, or, at other levels of biological organization, species occurrence. Among these methods, recently developed approaches that no longer require marked or resighted individuals should be particularly of interest to field herpetologists. We hope that our effort will encourage practitioners to implement some of the estimation methods presented herein instead of relying on ad hoc methods that make more limiting assumptions.
Calibrating SALT: a sampling scheme to improve estimates of suspended sediment yield
Robert B. Thomas
1986-01-01
Abstract - SALT (Selection At List Time) is a variable probability sampling scheme that provides unbiased estimates of suspended sediment yield and its variance. SALT performs better than standard schemes which are estimate variance. Sampling probabilities are based on a sediment rating function which promotes greater sampling intensity during periods of high...
Faster computation of exact RNA shape probabilities.
Janssen, Stefan; Giegerich, Robert
2010-03-01
Abstract shape analysis allows efficient computation of a representative sample of low-energy foldings of an RNA molecule. More comprehensive information is obtained by computing shape probabilities, accumulating the Boltzmann probabilities of all structures within each abstract shape. Such information is superior to free energies because it is independent of sequence length and base composition. However, up to this point, computation of shape probabilities evaluates all shapes simultaneously and comes with a computation cost which is exponential in the length of the sequence. We device an approach called RapidShapes that computes the shapes above a specified probability threshold T by generating a list of promising shapes and constructing specialized folding programs for each shape to compute its share of Boltzmann probability. This aims at a heuristic improvement of runtime, while still computing exact probability values. Evaluating this approach and several substrategies, we find that only a small proportion of shapes have to be actually computed. For an RNA sequence of length 400, this leads, depending on the threshold, to a 10-138 fold speed-up compared with the previous complete method. Thus, probabilistic shape analysis has become feasible in medium-scale applications, such as the screening of RNA transcripts in a bacterial genome. RapidShapes is available via http://bibiserv.cebitec.uni-bielefeld.de/rnashapes
Higo, Junichi; Ikebe, Jinzen; Kamiya, Narutoshi; Nakamura, Haruki
2012-03-01
Protein folding and protein-ligand docking have long persisted as important subjects in biophysics. Using multicanonical molecular dynamics (McMD) simulations with realistic expressions, i.e., all-atom protein models and an explicit solvent, free-energy landscapes have been computed for several systems, such as the folding of peptides/proteins composed of a few amino acids up to nearly 60 amino-acid residues, protein-ligand interactions, and coupled folding and binding of intrinsically disordered proteins. Recent progress in conformational sampling and its applications to biophysical systems are reviewed in this report, including descriptions of several outstanding studies. In addition, an algorithm and detailed procedures used for multicanonical sampling are presented along with the methodology of adaptive umbrella sampling. Both methods control the simulation so that low-probability regions along a reaction coordinate are sampled frequently. The reaction coordinate is the potential energy for multicanonical sampling and is a structural identifier for adaptive umbrella sampling. One might imagine that this probability control invariably enhances conformational transitions among distinct stable states, but this study examines the enhanced conformational sampling of a simple system and shows that reasonably well-controlled sampling slows the transitions. This slowing is induced by a rapid change of entropy along the reaction coordinate. We then provide a recipe to speed up the sampling by loosening the rapid change of entropy. Finally, we report all-atom McMD simulation results of various biophysical systems in an explicit solvent.
Longitudinal study of Escherichia coli O157 shedding and super shedding in dairy heifers.
Williams, K J; Ward, M P; Dhungyel, O P
2015-04-01
A longitudinal study was conducted to assess the methods available for detection of Escherichia coli O157 and to investigate the prevalence and occurrence of long-term shedding and super shedding in a cohort of Australian dairy heifers. Samples were obtained at approximately weekly intervals from heifers at pasture under normal management systems. Selective sampling techniques were used with the aim of identifying heifers with a higher probability of shedding or super shedding. Rectoanal mucosal swabs (RAMS) and fecal samples were obtained from each heifer. Direct culture of feces was used for detection and enumeration. Feces and RAMS were tested by enrichment culture. Selected samples were further tested retrospectively by immunomagnetic separation of enriched samples. Of 784 samples obtained, 154 (19.6%) were detected as positive using culture methods. Adjusting for selective sampling, the prevalence was 71 (15.6%) of 454. In total, 66 samples were detected as positive at >10(2) CFU/g of which 8 were >10(4) CFU/g and classed as super shedding. A significant difference was observed in detection by enriched culture of RAMS and feces. Dairy heifers within this cohort exhibited variable E. coli O157 shedding, consistent with previous estimates of shedding. Super shedding was detected at a low frequency and inconsistently from individual heifers. All detection methods identified some samples as positive that were not detected by any other method, indicating that the testing methods used will influence survey results.
Objective estimates based on experimental data and initial and final knowledge
NASA Technical Reports Server (NTRS)
Rosenbaum, B. M.
1972-01-01
An extension of the method of Jaynes, whereby least biased probability estimates are obtained, permits such estimates to be made which account for experimental data on hand as well as prior and posterior knowledge. These estimates can be made for both discrete and continuous sample spaces. The method allows a simple interpretation of Laplace's two rules: the principle of insufficient reason and the rule of succession. Several examples are analyzed by way of illustration.
Williams, Mary R; Sigman, Michael E; Lewis, Jennifer; Pitan, Kelly McHugh
2012-10-10
A bayesian soft classification method combined with target factor analysis (TFA) is described and tested for the analysis of fire debris data. The method relies on analysis of the average mass spectrum across the chromatographic profile (i.e., the total ion spectrum, TIS) from multiple samples taken from a single fire scene. A library of TIS from reference ignitable liquids with assigned ASTM classification is used as the target factors in TFA. The class-conditional distributions of correlations between the target and predicted factors for each ASTM class are represented by kernel functions and analyzed by bayesian decision theory. The soft classification approach assists in assessing the probability that ignitable liquid residue from a specific ASTM E1618 class, is present in a set of samples from a single fire scene, even in the presence of unspecified background contributions from pyrolysis products. The method is demonstrated with sample data sets and then tested on laboratory-scale burn data and large-scale field test burns. The overall performance achieved in laboratory and field test of the method is approximately 80% correct classification of fire debris samples. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Ghosh, S.; Parker, W.; Odom, L.
2003-04-01
The detrimental influence which airborne contaminants has on vegetation in many parts of the world has become of increasing interest and concern in recent years. The use of suitable plants such as epiphytes (vegetation which grows on another plant) for measuring concentrations of airborne materials provides the advantages of (a) an integration of the periodic fluctuations in amounts of these materials that occur over relatively long periods of time and (b) economy in sampling. This class of plants, which are mosses and lichens, are somewhat less dependent on their substrates and may act more purely as air indicators. The epiphytes do not derive nutrients from soil, but depend on airborne moisture and particulates for elemental sources. The way with which they absorb nutrients from these external sources gives rise to an uncommon sensitivity to the harmful effects of air pollution. Also in addition, plants of this class absorb constituents of airborne particulates which may not be directly toxic to the plant but of environmental concern to humans. In particular, trace element accumulation in epiphytic Tillandsia usneoides L. (Spanish Moss) common in Atlantic and Gulf Coastal plains has been used in air pollution studies. Recent studies have also evaluated Spanish moss as an indicator of contamination of pesticides and other organic aromatic compounds. Two hundred and six samples of Spanish moss (Tillandsia usneoides L.) were collected from over its geographic range in Florida for this study. The samples were analyzed for a variety of major and minor elements, and the resulting data were statistically analyzed for pertinent geochemical associations. Three statistical methods have been used on the geochemical data of Spanish moss to evaluate the nature of probable sources for each of the elements. This kind of work is being done because the exact nature and location of each specimen is unknown. So, the three different statistical methods have been used to classify or determine where the elements came in from based on the following study done by HT Shacklette and JJ Connor in 1973. The first method used, R mode Cluster Analysis (CA) in this report has resulted in some specific group of elements that tend to be coming from the same kind of sources. The Rare Earth Elements (REEs) show an excellent grouping. Their probable source maybe from samples, which had lots of intake of soil dust and rock dust. The grouping of elements Co-Pb-V-Ni-W-Ba probably is because they are all from samples collected near highway where there is a lot of automotive exhaust. Again, clustering of Zn-Sn-Mo-In-Sb probably show that they are from samples, which came from some industrial sites. Samples probably collected from and around sea beaches will have the following elements clustering together: Na-Mg-Li-B. The second method, Principal Component Analysis (PCA) in this project has resulted in a specific descriptive model of chemical variation in Spanish moss. The model appears to be mathematically adequate, in that 4 components describe nearly 64% of the total observed variation and also informative in identifying some major probable sources for the different elements analyzed. Two kinds of sources have been identified: one is natural particulates from soil and rock dust and agricultural sources &the other being technological metals from automotive exhaust and industrial output. The first two components have been labeled as "natural particulates" and the remaining as "technological metals" after Connor and Shacklette. The elements with highest loadings on first and second component are lithium, boron, sodium, magnesium, aluminum, calcium, scandium, titanium, iron, selenium, strontium, yttrium, molybdenum, indium, antimony, lanthanum, uranium and bismuth and Rare earth elements (REE) which in general have mainly agricultural and natural sources. The elements with highest loadings on third and fourth component are potassium, copper, arsenic, barium, vanadium, manganese, cobalt, nickel, copper, zinc, arsenic, rubidium, cadmium, tin, cesium, tungsten, barium, mercury and lead, mostly having industrial and automotive sources. Discriminant Function Analysis (DFA) has been used as verification to the results obtained from CA and PCA. There is some factor, which has a strong effect on some of the elements, and this factor is unidentified yet in the project. Overall, most of the elements are behaving as expected based on the earlier work of Shacklette and Connor (1973) and results from CA and PCA.
Trends in Serious Emotional Disturbance among Youths Exposed to Hurricane Katrina
ERIC Educational Resources Information Center
McLaughlin, Katie A.; Fairbank, John A.; Gruber, Michael J.; Jones, Russell T.; Osofsky, Joy D.; Pfefferbaum, Betty; Sampson, Nancy A.; Kessler, Ronald C.
2010-01-01
Objective: To examine patterns and predictors of trends in "DSM-IV" serious emotional disturbance (SED) among youths exposed to Hurricane Katrina. Method: A probability sample of adult pre-hurricane residents of the areas affected by Katrina completed baseline and follow-up telephone surveys 18 to 27 months post-hurricane and 12 to 18…
Constituent loads in small streams: the process and problems of estimating sediment flux
R. B. Thomas
1989-01-01
Constituent loads in small streams are often estimated poorly. This is especially true for discharge-related constituents like sediment, since their flux is highly variable and mainly occurs during infrequent high-flow events. One reason for low-quality estimates is that most prevailing data collection methods ignore sampling probabilities and only partly account for...
Estimation and applications of size-based distributions in forestry
Jeffrey H. Gove
2003-01-01
Size-based distributions arise in several contexts in forestry and ecology. Simple power relationships (e.g., basal area and diameter at breast height) between variables are one such area of interest arising from a modeling perspective. Another, probability proportional to size sampline (PPS), is found in the most widely used methods for sampling standing or dead and...
Attentional Regulation in Young Twins with Probable Stuttering, High Nonfluency, and Typical Fluency
ERIC Educational Resources Information Center
Felsenfeld, Susan; van Beljsterveldt, Catharina Eugenie Maria; Boomsma, Dorret Irene
2010-01-01
Purpose: Using a sample of 20,445 Dutch twins, this study examined the relationship between speech fluency and attentional regulation in children. A secondary objective was to identify etiological overlap between nonfluency and poor attention using fluency-discordant twin pairs. Method: Three fluency groups were created at age 5 using a parent…
Aspergillus DNA contamination in blood collection tubes.
Harrison, Elizabeth; Stalhberger, Thomas; Whelan, Ruth; Sugrue, Michele; Wingard, John R; Alexander, Barbara D; Follett, Sarah A; Bowyer, Paul; Denning, David W
2010-08-01
Fungal polymerase chain reaction (PCR)-based diagnostic methods are at risk for contamination. Sample collection containers were investigated for fungal DNA contamination using real-time PCR assays. Up to 18% of blood collection tubes were contaminated with fungal DNA, probably Aspergillus fumigatus. Lower proportions of contamination in other vessels were observed. Copyright 2010 Elsevier Inc. All rights reserved.
Propagating probability distributions of stand variables using sequential Monte Carlo methods
Jeffrey H. Gove
2009-01-01
A general probabilistic approach to stand yield estimation is developed based on sequential Monte Carlo filters, also known as particle filters. The essential steps in the development of the sampling importance resampling (SIR) particle filter are presented. The SIR filter is then applied to simulated and observed data showing how the 'predictor - corrector'...
Importance and Effectiveness of Student Health Services at a South Texas University
ERIC Educational Resources Information Center
McCaig, Marilyn M.
2013-01-01
The study examined the health needs of students at a south Texas university and documented the utility of the student health center. The descriptive study employed a mixed methods explanatory sequential design (ESD). The non-probability sample consisted of 140 students who utilized the university's health center during the period of March 23-30,…
Application of Influence Diagrams in Identifying Soviet Satellite Missions
1990-12-01
Probabilities Comparison ......................... 58 35. Continuous Model Variables ............................ 59 36. Sample Inclination Data...diagramming is a method which allows the simple construction of a model to illustrate the interrelationships which exist among variables by capturing an...environmental monitoring systems. The module also contained an array of instruments for geophysical and astrophysical experimentation . 4.3.14.3 Soyuz. The Soyuz
Age of Alcohol-Dependence Onset: Associations with Severity of Dependence and Seeking Treatment
ERIC Educational Resources Information Center
Hingson, Ralph W.; Heeren, Timothy; Winter, Michael R.
2007-01-01
Objective: We explored whether people who become alcohol dependent at younger ages are more likely to seek alcohol-related help or treatment or experience chronic relapsing dependence. Methods: In 2001-2002 the National Institute on Alcohol Abuse and Alcoholism completed a face-to-face interview survey with a multistage probability sample of 43…
Are Ataques de Nerviosa in Puerto Rican Children Associated with Psychiatric Disorder?
ERIC Educational Resources Information Center
Guarnaccia, Peter J.; Martinez, Igda; Ramirez, Rafael; Canino, Glorisa
2005-01-01
Objective: To provide the first empirical analysis of a cultural syndrome in children by examining the prevalence and psychiatric correlates of ataques de nervios in an epidemiological study of the mental health of children in Puerto Rico. Method: Probability samples of caretakers of children 4-17 years old in the community (N = 1,892; response…
Sample Training Based Wildfire Segmentation by 2D Histogram θ-Division with Minimum Error
Dong, Erqian; Sun, Mingui; Jia, Wenyan; Zhang, Dengyi; Yuan, Zhiyong
2013-01-01
A novel wildfire segmentation algorithm is proposed with the help of sample training based 2D histogram θ-division and minimum error. Based on minimum error principle and 2D color histogram, the θ-division methods were presented recently, but application of prior knowledge on them has not been explored. For the specific problem of wildfire segmentation, we collect sample images with manually labeled fire pixels. Then we define the probability function of error division to evaluate θ-division segmentations, and the optimal angle θ is determined by sample training. Performances in different color channels are compared, and the suitable channel is selected. To further improve the accuracy, the combination approach is presented with both θ-division and other segmentation methods such as GMM. Our approach is tested on real images, and the experiments prove its efficiency for wildfire segmentation. PMID:23878526
New color-based tracking algorithm for joints of the upper extremities
NASA Astrophysics Data System (ADS)
Wu, Xiangping; Chow, Daniel H. K.; Zheng, Xiaoxiang
2007-11-01
To track the joints of the upper limb of stroke sufferers for rehabilitation assessment, a new tracking algorithm which utilizes a developed color-based particle filter and a novel strategy for handling occlusions is proposed in this paper. Objects are represented by their color histogram models and particle filter is introduced to track the objects within a probability framework. Kalman filter, as a local optimizer, is integrated into the sampling stage of the particle filter that steers samples to a region with high likelihood and therefore fewer samples is required. A color clustering method and anatomic constraints are used in dealing with occlusion problem. Compared with the general basic particle filtering method, the experimental results show that the new algorithm has reduced the number of samples and hence the computational consumption, and has achieved better abilities of handling complete occlusion over a few frames.
Nested Sampling for Bayesian Model Comparison in the Context of Salmonella Disease Dynamics
Dybowski, Richard; McKinley, Trevelyan J.; Mastroeni, Pietro; Restif, Olivier
2013-01-01
Understanding the mechanisms underlying the observed dynamics of complex biological systems requires the statistical assessment and comparison of multiple alternative models. Although this has traditionally been done using maximum likelihood-based methods such as Akaike's Information Criterion (AIC), Bayesian methods have gained in popularity because they provide more informative output in the form of posterior probability distributions. However, comparison between multiple models in a Bayesian framework is made difficult by the computational cost of numerical integration over large parameter spaces. A new, efficient method for the computation of posterior probabilities has recently been proposed and applied to complex problems from the physical sciences. Here we demonstrate how nested sampling can be used for inference and model comparison in biological sciences. We present a reanalysis of data from experimental infection of mice with Salmonella enterica showing the distribution of bacteria in liver cells. In addition to confirming the main finding of the original analysis, which relied on AIC, our approach provides: (a) integration across the parameter space, (b) estimation of the posterior parameter distributions (with visualisations of parameter correlations), and (c) estimation of the posterior predictive distributions for goodness-of-fit assessments of the models. The goodness-of-fit results suggest that alternative mechanistic models and a relaxation of the quasi-stationary assumption should be considered. PMID:24376528
Estimating Kinship in Admixed Populations
Thornton, Timothy; Tang, Hua; Hoffmann, Thomas J.; Ochs-Balcom, Heather M.; Caan, Bette J.; Risch, Neil
2012-01-01
Genome-wide association studies (GWASs) are commonly used for the mapping of genetic loci that influence complex traits. A problem that is often encountered in both population-based and family-based GWASs is that of identifying cryptic relatedness and population stratification because it is well known that failure to appropriately account for both pedigree and population structure can lead to spurious association. A number of methods have been proposed for identifying relatives in samples from homogeneous populations. A strong assumption of population homogeneity, however, is often untenable, and many GWASs include samples from structured populations. Here, we consider the problem of estimating relatedness in structured populations with admixed ancestry. We propose a method, REAP (relatedness estimation in admixed populations), for robust estimation of identity by descent (IBD)-sharing probabilities and kinship coefficients in admixed populations. REAP appropriately accounts for population structure and ancestry-related assortative mating by using individual-specific allele frequencies at SNPs that are calculated on the basis of ancestry derived from whole-genome analysis. In simulation studies with related individuals and admixture from highly divergent populations, we demonstrate that REAP gives accurate IBD-sharing probabilities and kinship coefficients. We apply REAP to the Mexican Americans in Los Angeles, California (MXL) population sample of release 3 of phase III of the International Haplotype Map Project; in this sample, we identify third- and fourth-degree relatives who have not previously been reported. We also apply REAP to the African American and Hispanic samples from the Women's Health Initiative SNP Health Association Resource (WHI-SHARe) study, in which hundreds of pairs of cryptically related individuals have been identified. PMID:22748210
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martin, Peter R., E-mail: pmarti46@uwo.ca; Cool, Derek W.; Romagnoli, Cesare
2014-07-15
Purpose: Magnetic resonance imaging (MRI)-targeted, 3D transrectal ultrasound (TRUS)-guided “fusion” prostate biopsy intends to reduce the ∼23% false negative rate of clinical two-dimensional TRUS-guided sextant biopsy. Although it has been reported to double the positive yield, MRI-targeted biopsies continue to yield false negatives. Therefore, the authors propose to investigate how biopsy system needle delivery error affects the probability of sampling each tumor, by accounting for uncertainties due to guidance system error, image registration error, and irregular tumor shapes. Methods: T2-weighted, dynamic contrast-enhanced T1-weighted, and diffusion-weighted prostate MRI and 3D TRUS images were obtained from 49 patients. A radiologist and radiologymore » resident contoured 81 suspicious regions, yielding 3D tumor surfaces that were registered to the 3D TRUS images using an iterative closest point prostate surface-based method to yield 3D binary images of the suspicious regions in the TRUS context. The probabilityP of obtaining a sample of tumor tissue in one biopsy core was calculated by integrating a 3D Gaussian distribution over each suspicious region domain. Next, the authors performed an exhaustive search to determine the maximum root mean squared error (RMSE, in mm) of a biopsy system that gives P ≥ 95% for each tumor sample, and then repeated this procedure for equal-volume spheres corresponding to each tumor sample. Finally, the authors investigated the effect of probe-axis-direction error on measured tumor burden by studying the relationship between the error and estimated percentage of core involvement. Results: Given a 3.5 mm RMSE for contemporary fusion biopsy systems,P ≥ 95% for 21 out of 81 tumors. The authors determined that for a biopsy system with 3.5 mm RMSE, one cannot expect to sample tumors of approximately 1 cm{sup 3} or smaller with 95% probability with only one biopsy core. The predicted maximum RMSE giving P ≥ 95% for each tumor was consistently greater when using spherical tumor shapes as opposed to no shape assumption. However, an assumption of spherical tumor shape for RMSE = 3.5 mm led to a mean overestimation of tumor sampling probabilities of 3%, implying that assuming spherical tumor shape may be reasonable for many prostate tumors. The authors also determined that a biopsy system would need to have a RMS needle delivery error of no more than 1.6 mm in order to sample 95% of tumors with one core. The authors’ experiments also indicated that the effect of axial-direction error on the measured tumor burden was mitigated by the 18 mm core length at 3.5 mm RMSE. Conclusions: For biopsy systems with RMSE ≥ 3.5 mm, more than one biopsy core must be taken from the majority of tumors to achieveP ≥ 95%. These observations support the authors’ perspective that some tumors of clinically significant sizes may require more than one biopsy attempt in order to be sampled during the first biopsy session. This motivates the authors’ ongoing development of an approach to optimize biopsy plans with the aim of achieving a desired probability of obtaining a sample from each tumor, while minimizing the number of biopsies. Optimized planning of within-tumor targets for MRI-3D TRUS fusion biopsy could support earlier diagnosis of prostate cancer while it remains localized to the gland and curable.« less
Estimating occupancy and abundance using aerial images with imperfect detection
Williams, Perry J.; Hooten, Mevin B.; Womble, Jamie N.; Bower, Michael R.
2017-01-01
Species distribution and abundance are critical population characteristics for efficient management, conservation, and ecological insight. Point process models are a powerful tool for modelling distribution and abundance, and can incorporate many data types, including count data, presence-absence data, and presence-only data. Aerial photographic images are a natural tool for collecting data to fit point process models, but aerial images do not always capture all animals that are present at a site. Methods for estimating detection probability for aerial surveys usually include collecting auxiliary data to estimate the proportion of time animals are available to be detected.We developed an approach for fitting point process models using an N-mixture model framework to estimate detection probability for aerial occupancy and abundance surveys. Our method uses multiple aerial images taken of animals at the same spatial location to provide temporal replication of sample sites. The intersection of the images provide multiple counts of individuals at different times. We examined this approach using both simulated and real data of sea otters (Enhydra lutris kenyoni) in Glacier Bay National Park, southeastern Alaska.Using our proposed methods, we estimated detection probability of sea otters to be 0.76, the same as visual aerial surveys that have been used in the past. Further, simulations demonstrated that our approach is a promising tool for estimating occupancy, abundance, and detection probability from aerial photographic surveys.Our methods can be readily extended to data collected using unmanned aerial vehicles, as technology and regulations permit. The generality of our methods for other aerial surveys depends on how well surveys can be designed to meet the assumptions of N-mixture models.
A Bayesian predictive two-stage design for phase II clinical trials.
Sambucini, Valeria
2008-04-15
In this paper, we propose a Bayesian two-stage design for phase II clinical trials, which represents a predictive version of the single threshold design (STD) recently introduced by Tan and Machin. The STD two-stage sample sizes are determined specifying a minimum threshold for the posterior probability that the true response rate exceeds a pre-specified target value and assuming that the observed response rate is slightly higher than the target. Unlike the STD, we do not refer to a fixed experimental outcome, but take into account the uncertainty about future data. In both stages, the design aims to control the probability of getting a large posterior probability that the true response rate exceeds the target value. Such a probability is expressed in terms of prior predictive distributions of the data. The performance of the design is based on the distinction between analysis and design priors, recently introduced in the literature. The properties of the method are studied when all the design parameters vary.
Auxiliary Parameter MCMC for Exponential Random Graph Models
NASA Astrophysics Data System (ADS)
Byshkin, Maksym; Stivala, Alex; Mira, Antonietta; Krause, Rolf; Robins, Garry; Lomi, Alessandro
2016-11-01
Exponential random graph models (ERGMs) are a well-established family of statistical models for analyzing social networks. Computational complexity has so far limited the appeal of ERGMs for the analysis of large social networks. Efficient computational methods are highly desirable in order to extend the empirical scope of ERGMs. In this paper we report results of a research project on the development of snowball sampling methods for ERGMs. We propose an auxiliary parameter Markov chain Monte Carlo (MCMC) algorithm for sampling from the relevant probability distributions. The method is designed to decrease the number of allowed network states without worsening the mixing of the Markov chains, and suggests a new approach for the developments of MCMC samplers for ERGMs. We demonstrate the method on both simulated and actual (empirical) network data and show that it reduces CPU time for parameter estimation by an order of magnitude compared to current MCMC methods.
van den Bosch, Frank; Gottwald, Timothy R.; Alonso Chavez, Vasthi
2017-01-01
The spread of pathogens into new environments poses a considerable threat to human, animal, and plant health, and by extension, human and animal wellbeing, ecosystem function, and agricultural productivity, worldwide. Early detection through effective surveillance is a key strategy to reduce the risk of their establishment. Whilst it is well established that statistical and economic considerations are of vital importance when planning surveillance efforts, it is also important to consider epidemiological characteristics of the pathogen in question—including heterogeneities within the epidemiological system itself. One of the most pronounced realisations of this heterogeneity is seen in the case of vector-borne pathogens, which spread between ‘hosts’ and ‘vectors’—with each group possessing distinct epidemiological characteristics. As a result, an important question when planning surveillance for emerging vector-borne pathogens is where to place sampling resources in order to detect the pathogen as early as possible. We answer this question by developing a statistical function which describes the probability distributions of the prevalences of infection at first detection in both hosts and vectors. We also show how this method can be adapted in order to maximise the probability of early detection of an emerging pathogen within imposed sample size and/or cost constraints, and demonstrate its application using two simple models of vector-borne citrus pathogens. Under the assumption of a linear cost function, we find that sampling costs are generally minimised when either hosts or vectors, but not both, are sampled. PMID:28846676
Archaeal β diversity patterns under the seafloor along geochemical gradients
NASA Astrophysics Data System (ADS)
Koyano, Hitoshi; Tsubouchi, Taishi; Kishino, Hirohisa; Akutsu, Tatsuya
2014-09-01
Recently, deep drilling into the seafloor has revealed that there are vast sedimentary ecosystems of diverse microorganisms, particularly archaea, in subsurface areas. We investigated the β diversity patterns of archaeal communities in sediment layers under the seafloor and their determinants. This study was accomplished by analyzing large environmental samples of 16S ribosomal RNA gene sequences and various geochemical data collected from a sediment core of 365.3 m, obtained by drilling into the seafloor off the east coast of the Shimokita Peninsula. To extract the maximum amount of information from these environmental samples, we first developed a method for measuring β diversity using sequence data by applying probability theory on a set of strings developed by two of the authors in a previous publication. We introduced an index of β diversity between sequence populations from which the sequence data were sampled. We then constructed an estimator of the β diversity index based on the sequence data and demonstrated that it converges to the β diversity index between sequence populations with probability of 1 as the number of sampled sequences increases. Next, we applied this new method to quantify β diversities between archaeal sequence populations under the seafloor and constructed a quantitative model of the estimated β diversity patterns. Nearly 90% of the variation in the archaeal β diversity was explained by a model that included as variables the differences in the abundances of chlorine, iodine, and carbon between the sediment layers.
Wu, F Z; Ma, J; Hu, X N; Zeng, L
2015-02-01
The mealybug species Phenacoccus solenopsis (P. solenopsis) has caused much agricultural damage since its recent invasion in China. However, the source of this invasion remains unclear. This study uses molecular methods to clarify the relationships among different population of P. solenopsis from China, USA, Pakistan, India, and Vietnam to determine the geographic origin of the introduction of this species into China. P. solenopsis samples were collected from 25 different locations in three provinces of Southern China. Samples from the USA, Pakistan, and Vietnam were also obtained. Parts of the mitochondrial genes for cytochrome oxidase I (COI) were sequenced for each sample. Homologous DNA sequences of the samples from the USA and India were downloaded from Gen Bank. Two haplotypes were found in China. The first was from most samples from the Guangdong, Guangxi, and Hainan populations in the China and Pakistan groups, and the second from a few samples from the Guangdong, Guangxi, Hainan populations in the China, Pakistan, India, and Vietnam groups. As shown in the maximum likelihood of trees constructed using the COI sequences, these samples belonged to two clades. Phylogenetic analysis suggested that most P. solenopsis mealybugs in Southern China are probably closely related to populations in Pakistan. The variation, relationship, expansion, and probable geographic origin of P. solenopsis mealybugs in Southern China are also discussed.
Eskiyurt, Reyhan; Ozkan, Birgul
2017-01-01
This study was carried out to determine the reasons of the suicide probability and reasons for living of the inpatients hospitalized at the psychiatry clinic and to analyze the relationship between them. The sample of the study consisted of 192 patients who were hospitalized in psychiatric clinics between February and May 2016 and who agreed to participate in the study. In collecting data, personal information form, suicide probability scale (SPS), reasons for living inventory (RFL), and Beck's depression inventory (BDI) were used. Stepwise regression method was used to determine the factors that predict suicide probability. In the study, as a result of analyses made, the median score on the SPS was found 76.0, the median score on the RFL was found 137.0, the median score on the BDI of the patients was found 13.5, and it was found that patients with a high probability of suicide had less reasons for living and that their depression levels were very high. As a result of stepwise regression analysis, it was determined that suicidal ideation, reasons for living, maltreatment, education level, age, and income status were the predictors of suicide probability ( F = 61.125; P < 0.001). It was found that the patients who hospitalized in the psychiatric clinic have high suicide probability and the reasons of living are strong predictors of suicide probability in accordance with the literature.
Unification of field theory and maximum entropy methods for learning probability densities
NASA Astrophysics Data System (ADS)
Kinney, Justin B.
2015-09-01
The need to estimate smooth probability distributions (a.k.a. probability densities) from finite sampled data is ubiquitous in science. Many approaches to this problem have been described, but none is yet regarded as providing a definitive solution. Maximum entropy estimation and Bayesian field theory are two such approaches. Both have origins in statistical physics, but the relationship between them has remained unclear. Here I unify these two methods by showing that every maximum entropy density estimate can be recovered in the infinite smoothness limit of an appropriate Bayesian field theory. I also show that Bayesian field theory estimation can be performed without imposing any boundary conditions on candidate densities, and that the infinite smoothness limit of these theories recovers the most common types of maximum entropy estimates. Bayesian field theory thus provides a natural test of the maximum entropy null hypothesis and, furthermore, returns an alternative (lower entropy) density estimate when the maximum entropy hypothesis is falsified. The computations necessary for this approach can be performed rapidly for one-dimensional data, and software for doing this is provided.
Unification of field theory and maximum entropy methods for learning probability densities.
Kinney, Justin B
2015-09-01
The need to estimate smooth probability distributions (a.k.a. probability densities) from finite sampled data is ubiquitous in science. Many approaches to this problem have been described, but none is yet regarded as providing a definitive solution. Maximum entropy estimation and Bayesian field theory are two such approaches. Both have origins in statistical physics, but the relationship between them has remained unclear. Here I unify these two methods by showing that every maximum entropy density estimate can be recovered in the infinite smoothness limit of an appropriate Bayesian field theory. I also show that Bayesian field theory estimation can be performed without imposing any boundary conditions on candidate densities, and that the infinite smoothness limit of these theories recovers the most common types of maximum entropy estimates. Bayesian field theory thus provides a natural test of the maximum entropy null hypothesis and, furthermore, returns an alternative (lower entropy) density estimate when the maximum entropy hypothesis is falsified. The computations necessary for this approach can be performed rapidly for one-dimensional data, and software for doing this is provided.
NASA Astrophysics Data System (ADS)
Feng, X.; Sheng, Y.; Condon, A. J.; Paramygin, V. A.; Hall, T.
2012-12-01
A cost effective method, JPM-OS (Joint Probability Method with Optimal Sampling), for determining storm response and inundation return frequencies was developed and applied to quantify the hazard of hurricane storm surges and inundation along the Southwest FL,US coast (Condon and Sheng 2012). The JPM-OS uses piecewise multivariate regression splines coupled with dimension adaptive sparse grids to enable the generation of a base flood elevation (BFE) map. Storms are characterized by their landfall characteristics (pressure deficit, radius to maximum winds, forward speed, heading, and landfall location) and a sparse grid algorithm determines the optimal set of storm parameter combinations so that the inundation from any other storm parameter combination can be determined. The end result is a sample of a few hundred (197 for SW FL) optimal storms which are simulated using a dynamically coupled storm surge / wave modeling system CH3D-SSMS (Sheng et al. 2010). The limited historical climatology (1940 - 2009) is explored to develop probabilistic characterizations of the five storm parameters. The probability distributions are discretized and the inundation response of all parameter combinations is determined by the interpolation in five-dimensional space of the optimal storms. The surge response and the associated joint probability of the parameter combination is used to determine the flood elevation with a 1% annual probability of occurrence. The limited historical data constrains the accuracy of the PDFs of the hurricane characteristics, which in turn affect the accuracy of the BFE maps calculated. To offset the deficiency of limited historical dataset, this study presents a different method for producing coastal inundation maps. Instead of using the historical storm data, here we adopt 33,731 tracks that can represent the storm climatology in North Atlantic basin and SW Florida coasts. This large quantity of hurricane tracks is generated from a new statistical model which had been used for Western North Pacific (WNP) tropical cyclone (TC) genesis (Hall 2011) as well as North Atlantic tropical cyclone genesis (Hall and Jewson 2007). The introduction of these tracks complements the shortage of the historical samples and allows for more reliable PDFs required for implementation of JPM-OS. Using the 33,731 tracks and JPM-OS, an optimal storm ensemble is determined. This approach results in different storms/winds for storm surge and inundation modeling, and produces different Base Flood Elevation maps for coastal regions. Coastal inundation maps produced by the two different methods will be discussed in detail in the poster paper.
Liu, Datong; Peng, Yu; Peng, Xiyuan
2018-01-01
Effective anomaly detection of sensing data is essential for identifying potential system failures. Because they require no prior knowledge or accumulated labels, and provide uncertainty presentation, the probability prediction methods (e.g., Gaussian process regression (GPR) and relevance vector machine (RVM)) are especially adaptable to perform anomaly detection for sensing series. Generally, one key parameter of prediction models is coverage probability (CP), which controls the judging threshold of the testing sample and is generally set to a default value (e.g., 90% or 95%). There are few criteria to determine the optimal CP for anomaly detection. Therefore, this paper designs a graphic indicator of the receiver operating characteristic curve of prediction interval (ROC-PI) based on the definition of the ROC curve which can depict the trade-off between the PI width and PI coverage probability across a series of cut-off points. Furthermore, the Youden index is modified to assess the performance of different CPs, by the minimization of which the optimal CP is derived by the simulated annealing (SA) algorithm. Experiments conducted on two simulation datasets demonstrate the validity of the proposed method. Especially, an actual case study on sensing series from an on-orbit satellite illustrates its significant performance in practical application. PMID:29587372
Doidge, James C
2018-02-01
Population-based cohort studies are invaluable to health research because of the breadth of data collection over time, and the representativeness of their samples. However, they are especially prone to missing data, which can compromise the validity of analyses when data are not missing at random. Having many waves of data collection presents opportunity for participants' responsiveness to be observed over time, which may be informative about missing data mechanisms and thus useful as an auxiliary variable. Modern approaches to handling missing data such as multiple imputation and maximum likelihood can be difficult to implement with the large numbers of auxiliary variables and large amounts of non-monotone missing data that occur in cohort studies. Inverse probability-weighting can be easier to implement but conventional wisdom has stated that it cannot be applied to non-monotone missing data. This paper describes two methods of applying inverse probability-weighting to non-monotone missing data, and explores the potential value of including measures of responsiveness in either inverse probability-weighting or multiple imputation. Simulation studies are used to compare methods and demonstrate that responsiveness in longitudinal studies can be used to mitigate bias induced by missing data, even when data are not missing at random.
Improved Compressive Sensing of Natural Scenes Using Localized Random Sampling
Barranca, Victor J.; Kovačič, Gregor; Zhou, Douglas; Cai, David
2016-01-01
Compressive sensing (CS) theory demonstrates that by using uniformly-random sampling, rather than uniformly-spaced sampling, higher quality image reconstructions are often achievable. Considering that the structure of sampling protocols has such a profound impact on the quality of image reconstructions, we formulate a new sampling scheme motivated by physiological receptive field structure, localized random sampling, which yields significantly improved CS image reconstructions. For each set of localized image measurements, our sampling method first randomly selects an image pixel and then measures its nearby pixels with probability depending on their distance from the initially selected pixel. We compare the uniformly-random and localized random sampling methods over a large space of sampling parameters, and show that, for the optimal parameter choices, higher quality image reconstructions can be consistently obtained by using localized random sampling. In addition, we argue that the localized random CS optimal parameter choice is stable with respect to diverse natural images, and scales with the number of samples used for reconstruction. We expect that the localized random sampling protocol helps to explain the evolutionarily advantageous nature of receptive field structure in visual systems and suggests several future research areas in CS theory and its application to brain imaging. PMID:27555464
Analytical Algorithms to Quantify the Uncertainty in Remaining Useful Life Prediction
NASA Technical Reports Server (NTRS)
Sankararaman, Shankar; Saxena, Abhinav; Daigle, Matthew; Goebel, Kai
2013-01-01
This paper investigates the use of analytical algorithms to quantify the uncertainty in the remaining useful life (RUL) estimate of components used in aerospace applications. The prediction of RUL is affected by several sources of uncertainty and it is important to systematically quantify their combined effect by computing the uncertainty in the RUL prediction in order to aid risk assessment, risk mitigation, and decisionmaking. While sampling-based algorithms have been conventionally used for quantifying the uncertainty in RUL, analytical algorithms are computationally cheaper and sometimes, are better suited for online decision-making. While exact analytical algorithms are available only for certain special cases (for e.g., linear models with Gaussian variables), effective approximations can be made using the the first-order second moment method (FOSM), the first-order reliability method (FORM), and the inverse first-order reliability method (Inverse FORM). These methods can be used not only to calculate the entire probability distribution of RUL but also to obtain probability bounds on RUL. This paper explains these three methods in detail and illustrates them using the state-space model of a lithium-ion battery.
Multifractal diffusion entropy analysis: Optimal bin width of probability histograms
NASA Astrophysics Data System (ADS)
Jizba, Petr; Korbel, Jan
2014-11-01
In the framework of Multifractal Diffusion Entropy Analysis we propose a method for choosing an optimal bin-width in histograms generated from underlying probability distributions of interest. The method presented uses techniques of Rényi’s entropy and the mean squared error analysis to discuss the conditions under which the error in the multifractal spectrum estimation is minimal. We illustrate the utility of our approach by focusing on a scaling behavior of financial time series. In particular, we analyze the S&P500 stock index as sampled at a daily rate in the time period 1950-2013. In order to demonstrate a strength of the method proposed we compare the multifractal δ-spectrum for various bin-widths and show the robustness of the method, especially for large values of q. For such values, other methods in use, e.g., those based on moment estimation, tend to fail for heavy-tailed data or data with long correlations. Connection between the δ-spectrum and Rényi’s q parameter is also discussed and elucidated on a simple example of multiscale time series.
Martens, Brian K; DiGennaro, Florence D; Reed, Derek D; Szczech, Frances M; Rosenthal, Blair D
2008-01-01
Descriptive assessment methods have been used in applied settings to identify consequences for problem behavior, thereby aiding in the design of effective treatment programs. Consensus has not been reached, however, regarding the types of data or analytic strategies that are most useful for describing behavior–consequence relations. One promising approach involves the analysis of conditional probabilities from sequential recordings of behavior and events that follow its occurrence. In this paper we review several strategies for identifying contingent relations from conditional probabilities, and propose an alternative strategy known as a contingency space analysis (CSA). Step-by-step procedures for conducting and interpreting a CSA using sample data are presented, followed by discussion of the potential use of a CSA for conducting descriptive assessments, informing intervention design, and evaluating changes in reinforcement contingencies following treatment. PMID:18468280
Shao, Jing-Yuan; Qu, Hai-Bin; Gong, Xing-Chu
2018-05-01
In this work, two algorithms (overlapping method and the probability-based method) for design space calculation were compared by using the data collected from extraction process of Codonopsis Radix as an example. In the probability-based method, experimental error was simulated to calculate the probability of reaching the standard. The effects of several parameters on the calculated design space were studied, including simulation number, step length, and the acceptable probability threshold. For the extraction process of Codonopsis Radix, 10 000 times of simulation and 0.02 for the calculation step length can lead to a satisfactory design space. In general, the overlapping method is easy to understand, and can be realized by several kinds of commercial software without coding programs, but the reliability of the process evaluation indexes when operating in the design space is not indicated. Probability-based method is complex in calculation, but can provide the reliability to ensure that the process indexes can reach the standard within the acceptable probability threshold. In addition, there is no probability mutation in the edge of design space by probability-based method. Therefore, probability-based method is recommended for design space calculation. Copyright© by the Chinese Pharmaceutical Association.
Dunham, Jason B.; Chelgren, Nathan D.; Heck, Michael P.; Clark, Steven M.
2013-01-01
We evaluated the probability of detecting larval lampreys using different methods of backpack electrofishing in wadeable streams in the U.S. Pacific Northwest. Our primary objective was to compare capture of lampreys using electrofishing with standard settings for salmon and trout to settings specifically adapted for capture of lampreys. Field work consisted of removal sampling by means of backpack electrofishing in 19 sites in streams representing a broad range of conditions in the region. Captures of lampreys at these sites were analyzed with a modified removal-sampling model and Bayesian estimation to measure the relative odds of capture using the lamprey-specific settings compared with the standard salmonid settings. We found that the odds of capture were 2.66 (95% credible interval, 0.87–78.18) times greater for the lamprey-specific settings relative to standard salmonid settings. When estimates of capture probability were applied to estimating the probabilities of detection, we found high (>0.80) detectability when the actual number of lampreys in a site was greater than 10 individuals and effort was at least two passes of electrofishing, regardless of the settings used. Further work is needed to evaluate key assumptions in our approach, including the evaluation of individual-specific capture probabilities and population closure. For now our results suggest comparable results are possible for detection of lampreys by using backpack electrofishing with salmonid- or lamprey-specific settings.
Parsons, Tom
2008-01-01
Paleoearthquake observations often lack enough events at a given site to directly define a probability density function (PDF) for earthquake recurrence. Sites with fewer than 10-15 intervals do not provide enough information to reliably determine the shape of the PDF using standard maximum-likelihood techniques [e.g., Ellsworth et al., 1999]. In this paper I present a method that attempts to fit wide ranges of distribution parameters to short paleoseismic series. From repeated Monte Carlo draws, it becomes possible to quantitatively estimate most likely recurrence PDF parameters, and a ranked distribution of parameters is returned that can be used to assess uncertainties in hazard calculations. In tests on short synthetic earthquake series, the method gives results that cluster around the mean of the input distribution, whereas maximum likelihood methods return the sample means [e.g., NIST/SEMATECH, 2006]. For short series (fewer than 10 intervals), sample means tend to reflect the median of an asymmetric recurrence distribution, possibly leading to an overestimate of the hazard should they be used in probability calculations. Therefore a Monte Carlo approach may be useful for assessing recurrence from limited paleoearthquake records. Further, the degree of functional dependence among parameters like mean recurrence interval and coefficient of variation can be established. The method is described for use with time-independent and time-dependent PDF?s, and results from 19 paleoseismic sequences on strike-slip faults throughout the state of California are given.
Parsons, T.
2008-01-01
Paleoearthquake observations often lack enough events at a given site to directly define a probability density function (PDF) for earthquake recurrence. Sites with fewer than 10-15 intervals do not provide enough information to reliably determine the shape of the PDF using standard maximum-likelihood techniques (e.g., Ellsworth et al., 1999). In this paper I present a method that attempts to fit wide ranges of distribution parameters to short paleoseismic series. From repeated Monte Carlo draws, it becomes possible to quantitatively estimate most likely recurrence PDF parameters, and a ranked distribution of parameters is returned that can be used to assess uncertainties in hazard calculations. In tests on short synthetic earthquake series, the method gives results that cluster around the mean of the input distribution, whereas maximum likelihood methods return the sample means (e.g., NIST/SEMATECH, 2006). For short series (fewer than 10 intervals), sample means tend to reflect the median of an asymmetric recurrence distribution, possibly leading to an overestimate of the hazard should they be used in probability calculations. Therefore a Monte Carlo approach may be useful for assessing recurrence from limited paleoearthquake records. Further, the degree of functional dependence among parameters like mean recurrence interval and coefficient of variation can be established. The method is described for use with time-independent and time-dependent PDFs, and results from 19 paleoseismic sequences on strike-slip faults throughout the state of California are given.
Work Measurement as a Generalized Quantum Measurement
NASA Astrophysics Data System (ADS)
Roncaglia, Augusto J.; Cerisola, Federico; Paz, Juan Pablo
2014-12-01
We present a new method to measure the work w performed on a driven quantum system and to sample its probability distribution P (w ). The method is based on a simple fact that remained unnoticed until now: Work on a quantum system can be measured by performing a generalized quantum measurement at a single time. Such measurement, which technically speaking is denoted as a positive operator valued measure reduces to an ordinary projective measurement on an enlarged system. This observation not only demystifies work measurement but also suggests a new quantum algorithm to efficiently sample the distribution P (w ). This can be used, in combination with fluctuation theorems, to estimate free energies of quantum states on a quantum computer.
Quantifying recent erosion and sediment delivery using probability sampling: A case study
Jack Lewis
2002-01-01
Abstract - Estimates of erosion and sediment delivery have often relied on measurements from locations that were selected to be representative of particular terrain types. Such judgement samples are likely to overestimate or underestimate the mean of the quantity of interest. Probability sampling can eliminate the bias due to sample selection, and it permits the...