Sample records for statistical sampling methods

  1. Calculating Confidence, Uncertainty, and Numbers of Samples When Using Statistical Sampling Approaches to Characterize and Clear Contaminated Areas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.

    2013-04-27

    This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, the report includes the methods and formulas for calculating the • number of samples required to achieve a specified confidence in characterization and clearance decisions • confidence in making characterization and clearance decisions for a specified number of samples for two common statistically based environmental sampling approaches. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that amore » decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are 1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and 2) combined judgment and random (CJR) sampling during the clearance phase. Typically if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phrase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: 1. qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0 2. qualitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0 3. quantitative data (e.g., contaminant concentrations expressed as CFU/cm2) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0 4. quantitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0. For Situation 2, the hotspot sampling approach provides for stating with Z% confidence that a hotspot of specified shape and size with detectable contamination will be found. Also for Situation 2, the CJR approach provides for stating with X% confidence that at least Y% of the decision area does not contain detectable contamination. Forms of these statements for the other three situations are discussed in Section 2.2. Statistical methods that account for FNR > 0 currently only exist for the hotspot sampling approach with qualitative data (or quantitative data converted to qualitative data). This report documents the current status of methods and formulas for the hotspot and CJR sampling approaches. Limitations of these methods are identified. Extensions of the methods that are applicable when FNR = 0 to account for FNR > 0, or to address other limitations, will be documented in future revisions of this report if future funding supports the development of such extensions. For quantitative data, this report also presents statistical methods and formulas for 1. quantifying the uncertainty in measured sample results 2. estimating the true surface concentration corresponding to a surface sample 3. quantifying the uncertainty of the estimate of the true surface concentration. All of the methods and formulas discussed in the report were applied to example situations to illustrate application of the methods and interpretation of the results.« less

  2. Time Series Analysis Based on Running Mann Whitney Z Statistics

    USDA-ARS?s Scientific Manuscript database

    A sensitive and objective time series analysis method based on the calculation of Mann Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...

  3. The Utility of Robust Means in Statistics

    ERIC Educational Resources Information Center

    Goodwyn, Fara

    2012-01-01

    Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…

  4. A review of empirical research related to the use of small quantitative samples in clinical outcome scale development.

    PubMed

    Houts, Carrie R; Edwards, Michael C; Wirth, R J; Deal, Linda S

    2016-11-01

    There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating the developing COAs using a small sample. We review the benefits such methods are purported to offer from both a practical and statistical standpoint and detail several problematic areas, including both practical and statistical theory concerns, with respect to the use of quantitative methods, including Rasch-consistent methods, with small samples. The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.

  5. Introducing 3D U-statistic method for separating anomaly from background in exploration geochemical data with associated software development

    NASA Astrophysics Data System (ADS)

    Ghannadpour, Seyyed Saeed; Hezarkhani, Ardeshir

    2016-03-01

    The U-statistic method is one of the most important structural methods to separate the anomaly from the background. It considers the location of samples and carries out the statistical analysis of the data without judging from a geochemical point of view and tries to separate subpopulations and determine anomalous areas. In the present study, to use U-statistic method in three-dimensional (3D) condition, U-statistic is applied on the grade of two ideal test examples, by considering sample Z values (elevation). So far, this is the first time that this method has been applied on a 3D condition. To evaluate the performance of 3D U-statistic method and in order to compare U-statistic with one non-structural method, the method of threshold assessment based on median and standard deviation (MSD method) is applied on the two example tests. Results show that the samples indicated by U-statistic method as anomalous are more regular and involve less dispersion than those indicated by the MSD method. So that, according to the location of anomalous samples, denser areas of them can be determined as promising zones. Moreover, results show that at a threshold of U = 0, the total error of misclassification for U-statistic method is much smaller than the total error of criteria of bar {x}+n× s. Finally, 3D model of two test examples for separating anomaly from background using 3D U-statistic method is provided. The source code for a software program, which was developed in the MATLAB programming language in order to perform the calculations of the 3D U-spatial statistic method, is additionally provided. This software is compatible with all the geochemical varieties and can be used in similar exploration projects.

  6. [Statistical prediction methods in violence risk assessment and its application].

    PubMed

    Liu, Yuan-Yuan; Hu, Jun-Mei; Yang, Min; Li, Xiao-Song

    2013-06-01

    It is an urgent global problem how to improve the violence risk assessment. As a necessary part of risk assessment, statistical methods have remarkable impacts and effects. In this study, the predicted methods in violence risk assessment from the point of statistics are reviewed. The application of Logistic regression as the sample of multivariate statistical model, decision tree model as the sample of data mining technique, and neural networks model as the sample of artificial intelligence technology are all reviewed. This study provides data in order to contribute the further research of violence risk assessment.

  7. Some connections between importance sampling and enhanced sampling methods in molecular dynamics.

    PubMed

    Lie, H C; Quer, J

    2017-11-21

    In molecular dynamics, enhanced sampling methods enable the collection of better statistics of rare events from a reference or target distribution. We show that a large class of these methods is based on the idea of importance sampling from mathematical statistics. We illustrate this connection by comparing the Hartmann-Schütte method for rare event simulation (J. Stat. Mech. Theor. Exp. 2012, P11004) and the Valsson-Parrinello method of variationally enhanced sampling [Phys. Rev. Lett. 113, 090601 (2014)]. We use this connection in order to discuss how recent results from the Monte Carlo methods literature can guide the development of enhanced sampling methods.

  8. Some connections between importance sampling and enhanced sampling methods in molecular dynamics

    NASA Astrophysics Data System (ADS)

    Lie, H. C.; Quer, J.

    2017-11-01

    In molecular dynamics, enhanced sampling methods enable the collection of better statistics of rare events from a reference or target distribution. We show that a large class of these methods is based on the idea of importance sampling from mathematical statistics. We illustrate this connection by comparing the Hartmann-Schütte method for rare event simulation (J. Stat. Mech. Theor. Exp. 2012, P11004) and the Valsson-Parrinello method of variationally enhanced sampling [Phys. Rev. Lett. 113, 090601 (2014)]. We use this connection in order to discuss how recent results from the Monte Carlo methods literature can guide the development of enhanced sampling methods.

  9. Quantitative Methods for Analysing Joint Questionnaire Data: Exploring the Role of Joint in Force Design

    DTIC Science & Technology

    2015-08-01

    the nine questions. The Statistical Package for the Social Sciences ( SPSS ) [11] was used to conduct statistical analysis on the sample. Two types...constructs. SPSS was again used to conduct statistical analysis on the sample. This time factor analysis was conducted. Factor analysis attempts to...Business Research Methods and Statistics using SPSS . P432. 11 IBM SPSS Statistics . (2012) 12 Burns, R.B., Burns, R.A. (2008) ‘Business Research

  10. Establishing Statistical Equivalence of Data from Different Sampling Approaches for Assessment of Bacterial Phenotypic Antimicrobial Resistance

    PubMed Central

    2018-01-01

    ABSTRACT To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli. These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether that present difference is practically relevant. PMID:29475868

  11. Establishing Statistical Equivalence of Data from Different Sampling Approaches for Assessment of Bacterial Phenotypic Antimicrobial Resistance.

    PubMed

    Shakeri, Heman; Volkova, Victoriya; Wen, Xuesong; Deters, Andrea; Cull, Charley; Drouillard, James; Müller, Christian; Moradijamei, Behnaz; Jaberi-Douraki, Majid

    2018-05-01

    To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether that present difference is practically relevant. Copyright © 2018 Shakeri et al.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Fangyan; Zhang, Song; Chung Wong, Pak

    Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, themore » size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.« less

  13. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    PubMed

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  14. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.

  15. Valid statistical inference methods for a case-control study with missing data.

    PubMed

    Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun

    2018-04-01

    The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.

  16. STATISTICAL SAMPLING AND DATA ANALYSIS

    EPA Science Inventory

    Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...

  17. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

    PubMed Central

    Lin, Johnny; Bentler, Peter M.

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511

  18. Multi-Reader ROC studies with Split-Plot Designs: A Comparison of Statistical Methods

    PubMed Central

    Obuchowski, Nancy A.; Gallas, Brandon D.; Hillis, Stephen L.

    2012-01-01

    Rationale and Objectives Multi-reader imaging trials often use a factorial design, where study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of the design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper we compare three methods of analysis for the split-plot design. Materials and Methods Three statistical methods are presented: Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean ANOVA approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power and confidence interval coverage of the three test statistics. Results The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% CIs fall close to the nominal coverage for small and large sample sizes. Conclusions The split-plot MRMC study design can be statistically efficient compared with the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rate, similar power, and nominal CI coverage, are available for this study design. PMID:23122570

  19. A Comparison of EPI Sampling, Probability Sampling, and Compact Segment Sampling Methods for Micro and Small Enterprises

    PubMed Central

    Chao, Li-Wei; Szrek, Helena; Peltzer, Karl; Ramlagan, Shandir; Fleming, Peter; Leite, Rui; Magerman, Jesswill; Ngwenya, Godfrey B.; Pereira, Nuno Sousa; Behrman, Jere

    2011-01-01

    Finding an efficient method for sampling micro- and small-enterprises (MSEs) for research and statistical reporting purposes is a challenge in developing countries, where registries of MSEs are often nonexistent or outdated. This lack of a sampling frame creates an obstacle in finding a representative sample of MSEs. This study uses computer simulations to draw samples from a census of businesses and non-businesses in the Tshwane Municipality of South Africa, using three different sampling methods: the traditional probability sampling method, the compact segment sampling method, and the World Health Organization’s Expanded Programme on Immunization (EPI) sampling method. Three mechanisms by which the methods could differ are tested, the proximity selection of respondents, the at-home selection of respondents, and the use of inaccurate probability weights. The results highlight the importance of revisits and accurate probability weights, but the lesser effect of proximity selection on the samples’ statistical properties. PMID:22582004

  20. Comparison of the statistics of salmonella testing of chilled broiler chicken carcasses by whole carcass rinse and neck skin excision

    USDA-ARS?s Scientific Manuscript database

    Whether a required Salmonella test series is passed or failed depends not only on the presence of the bacteria, but also on the methods for taking samples, the methods for culturing samples, and the statistics associated with the sampling plan. The pass-fail probabilities of the two-class attribute...

  1. A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts

    PubMed Central

    2013-01-01

    Background The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver a poor statistical power for detecting genuine associations and also a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for non-random nature of case–control samples in regard of genotypic distribution at the loci in populations under study and confers flexibility to test for genetic association in presence of different confounding factors such as population structure, non-randomness of samples etc. Results We implemented this novel method together with several popular methods in the literature of GWAS, to re-analyze recently published Parkinson’s disease (PD) case–control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemmed from non-randomly sampling and genetic structures when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters by using case and control samples, the ease in integration of these samples from multiple genetically divergent populations and thus confers statistically robust and powerful analyses of GWAS. On basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most popularly implemented in the literature of GWAS. PMID:23394771

  2. Statistical scaling of geometric characteristics in stochastically generated pore microstructures

    DOE PAGES

    Hyman, Jeffrey D.; Guadagnini, Alberto; Winter, C. Larrabee

    2015-05-21

    In this study, we analyze the statistical scaling of structural attributes of virtual porous microstructures that are stochastically generated by thresholding Gaussian random fields. Characterization of the extent at which randomly generated pore spaces can be considered as representative of a particular rock sample depends on the metrics employed to compare the virtual sample against its physical counterpart. Typically, comparisons against features and/patterns of geometric observables, e.g., porosity and specific surface area, flow-related macroscopic parameters, e.g., permeability, or autocorrelation functions are used to assess the representativeness of a virtual sample, and thereby the quality of the generation method. Here, wemore » rely on manifestations of statistical scaling of geometric observables which were recently observed in real millimeter scale rock samples [13] as additional relevant metrics by which to characterize a virtual sample. We explore the statistical scaling of two geometric observables, namely porosity (Φ) and specific surface area (SSA), of porous microstructures generated using the method of Smolarkiewicz and Winter [42] and Hyman and Winter [22]. Our results suggest that the method can produce virtual pore space samples displaying the symptoms of statistical scaling observed in real rock samples. Order q sample structure functions (statistical moments of absolute increments) of Φ and SSA scale as a power of the separation distance (lag) over a range of lags, and extended self-similarity (linear relationship between log structure functions of successive orders) appears to be an intrinsic property of the generated media. The width of the range of lags where power-law scaling is observed and the Hurst coefficient associated with the variables we consider can be controlled by the generation parameters of the method.« less

  3. Managing Clustered Data Using Hierarchical Linear Modeling

    ERIC Educational Resources Information Center

    Warne, Russell T.; Li, Yan; McKyer, E. Lisako J.; Condie, Rachel; Diep, Cassandra S.; Murano, Peter S.

    2012-01-01

    Researchers in nutrition research often use cluster or multistage sampling to gather participants for their studies. These sampling methods often produce violations of the assumption of data independence that most traditional statistics share. Hierarchical linear modeling is a statistical method that can overcome violations of the independence…

  4. Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.

    PubMed

    Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L

    2012-12-01

    Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.

  5. Methods for flexible sample-size design in clinical trials: Likelihood, weighted, dual test, and promising zone approaches.

    PubMed

    Shih, Weichung Joe; Li, Gang; Wang, Yining

    2016-03-01

    Sample size plays a crucial role in clinical trials. Flexible sample-size designs, as part of the more general category of adaptive designs that utilize interim data, have been a popular topic in recent years. In this paper, we give a comparative review of four related methods for such a design. The likelihood method uses the likelihood ratio test with an adjusted critical value. The weighted method adjusts the test statistic with given weights rather than the critical value. The dual test method requires both the likelihood ratio statistic and the weighted statistic to be greater than the unadjusted critical value. The promising zone approach uses the likelihood ratio statistic with the unadjusted value and other constraints. All four methods preserve the type-I error rate. In this paper we explore their properties and compare their relationships and merits. We show that the sample size rules for the dual test are in conflict with the rules of the promising zone approach. We delineate what is necessary to specify in the study protocol to ensure the validity of the statistical procedure and what can be kept implicit in the protocol so that more flexibility can be attained for confirmatory phase III trials in meeting regulatory requirements. We also prove that under mild conditions, the likelihood ratio test still preserves the type-I error rate when the actual sample size is larger than the re-calculated one. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. APA's Learning Objectives for Research Methods and Statistics in Practice: A Multimethod Analysis

    ERIC Educational Resources Information Center

    Tomcho, Thomas J.; Rice, Diana; Foels, Rob; Folmsbee, Leah; Vladescu, Jason; Lissman, Rachel; Matulewicz, Ryan; Bopp, Kara

    2009-01-01

    Research methods and statistics courses constitute a core undergraduate psychology requirement. We analyzed course syllabi and faculty self-reported coverage of both research methods and statistics course learning objectives to assess the concordance with APA's learning objectives (American Psychological Association, 2007). We obtained a sample of…

  7. Statistical Methods in Assembly Quality Management of Multi-Element Products on Automatic Rotor Lines

    NASA Astrophysics Data System (ADS)

    Pries, V. V.; Proskuriakov, N. E.

    2018-04-01

    To control the assembly quality of multi-element mass-produced products on automatic rotor lines, control methods with operational feedback are required. However, due to possible failures in the operation of the devices and systems of automatic rotor line, there is always a real probability of getting defective (incomplete) products into the output process stream. Therefore, a continuous sampling control of the products completeness, based on the use of statistical methods, remains an important element in managing the quality of assembly of multi-element mass products on automatic rotor lines. The feature of continuous sampling control of the multi-element products completeness in the assembly process is its breaking sort, which excludes the possibility of returning component parts after sampling control to the process stream and leads to a decrease in the actual productivity of the assembly equipment. Therefore, the use of statistical procedures for continuous sampling control of the multi-element products completeness when assembled on automatic rotor lines requires the use of such sampling plans that ensure a minimum size of control samples. Comparison of the values of the limit of the average output defect level for the continuous sampling plan (CSP) and for the automated continuous sampling plan (ACSP) shows the possibility of providing lower limit values for the average output defects level using the ACSP-1. Also, the average sample size when using the ACSP-1 plan is less than when using the CSP-1 plan. Thus, the application of statistical methods in the assembly quality management of multi-element products on automatic rotor lines, involving the use of proposed plans and methods for continuous selective control, will allow to automating sampling control procedures and the required level of quality of assembled products while minimizing sample size.

  8. Determination of polarimetric parameters of honey by near-infrared transflectance spectroscopy.

    PubMed

    García-Alvarez, M; Ceresuela, S; Huidobro, J F; Hermida, M; Rodríguez-Otero, J L

    2002-01-30

    NIR transflectance spectroscopy was used to determine polarimetric parameters (direct polarization, polarization after inversion, specific rotation in dry matter, and polarization due to nonmonosaccharides) and sucrose in honey. In total, 156 honey samples were collected during 1992 (45 samples), 1995 (56 samples), and 1996 (55 samples). Samples were analyzed by NIR spectroscopy and polarimetric methods. Calibration (118 samples) and validation (38 samples) sets were made up; honeys from the three years were included in both sets. Calibrations were performed by modified partial least-squares regression and scatter correction by standard normal variation and detrend methods. For direct polarization, polarization after inversion, specific rotation in dry matter, and polarization due to nonmonosaccharides, good statistics (bias, SEV, and R(2)) were obtained for the validation set, and no statistically (p = 0.05) significant differences were found between instrumental and polarimetric methods for these parameters. Statistical data for sucrose were not as good as those of the other parameters. Therefore, NIR spectroscopy is not an effective method for quantitative analysis of sucrose in these honey samples. However, NIR spectroscopy may be an acceptable method for semiquantitative evaluation of sucrose for honeys, such as those in our study, containing up to 3% of sucrose. Further work is necessary to validate the uncertainty at higher levels.

  9. The study of combining Latin Hypercube Sampling method and LU decomposition method (LULHS method) for constructing spatial random field

    NASA Astrophysics Data System (ADS)

    WANG, P. T.

    2015-12-01

    Groundwater modeling requires to assign hydrogeological properties to every numerical grid. Due to the lack of detailed information and the inherent spatial heterogeneity, geological properties can be treated as random variables. Hydrogeological property is assumed to be a multivariate distribution with spatial correlations. By sampling random numbers from a given statistical distribution and assigning a value to each grid, a random field for modeling can be completed. Therefore, statistics sampling plays an important role in the efficiency of modeling procedure. Latin Hypercube Sampling (LHS) is a stratified random sampling procedure that provides an efficient way to sample variables from their multivariate distributions. This study combines the the stratified random procedure from LHS and the simulation by using LU decomposition to form LULHS. Both conditional and unconditional simulations of LULHS were develpoed. The simulation efficiency and spatial correlation of LULHS are compared to the other three different simulation methods. The results show that for the conditional simulation and unconditional simulation, LULHS method is more efficient in terms of computational effort. Less realizations are required to achieve the required statistical accuracy and spatial correlation.

  10. Enhancing the Equating of Item Difficulty Metrics: Estimation of Reference Distribution. Research Report. ETS RR-14-07

    ERIC Educational Resources Information Center

    Ali, Usama S.; Walker, Michael E.

    2014-01-01

    Two methods are currently in use at Educational Testing Service (ETS) for equating observed item difficulty statistics. The first method involves the linear equating of item statistics in an observed sample to reference statistics on the same items. The second method, or the item response curve (IRC) method, involves the summation of conditional…

  11. Analysis of Statistical Methods Currently used in Toxicology Journals

    PubMed Central

    Na, Jihye; Yang, Hyeri

    2014-01-01

    Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had sample size less than 10, with the median and the mode being 6 and 3 & 6, respectively. Mean (105/113, 93%) was dominantly used to measure central tendency, and standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provide justifications regarding why the methods being selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either normality or equal variance test. These results suggest that more consistent and appropriate use of statistical method is necessary which may enhance the role of toxicology in public health. PMID:25343012

  12. Analysis of Statistical Methods Currently used in Toxicology Journals.

    PubMed

    Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min

    2014-09-01

    Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had sample size less than 10, with the median and the mode being 6 and 3 & 6, respectively. Mean (105/113, 93%) was dominantly used to measure central tendency, and standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provide justifications regarding why the methods being selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either normality or equal variance test. These results suggest that more consistent and appropriate use of statistical method is necessary which may enhance the role of toxicology in public health.

  13. A method for determining the weak statistical stationarity of a random process

    NASA Technical Reports Server (NTRS)

    Sadeh, W. Z.; Koper, C. A., Jr.

    1978-01-01

    A method for determining the weak statistical stationarity of a random process is presented. The core of this testing procedure consists of generating an equivalent ensemble which approximates a true ensemble. Formation of an equivalent ensemble is accomplished through segmenting a sufficiently long time history of a random process into equal, finite, and statistically independent sample records. The weak statistical stationarity is ascertained based on the time invariance of the equivalent-ensemble averages. Comparison of these averages with their corresponding time averages over a single sample record leads to a heuristic estimate of the ergodicity of a random process. Specific variance tests are introduced for evaluating the statistical independence of the sample records, the time invariance of the equivalent-ensemble autocorrelations, and the ergodicity. Examination and substantiation of these procedures were conducted utilizing turbulent velocity signals.

  14. Performance of Bootstrapping Approaches To Model Test Statistics and Parameter Standard Error Estimation in Structural Equation Modeling.

    ERIC Educational Resources Information Center

    Nevitt, Jonathan; Hancock, Gregory R.

    2001-01-01

    Evaluated the bootstrap method under varying conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Results for the bootstrap suggest the resampling-based method may be conservative in its control over model rejections, thus having an impact on the statistical power associated…

  15. Appropriate Statistical Analysis for Two Independent Groups of Likert-Type Data

    ERIC Educational Resources Information Center

    Warachan, Boonyasit

    2011-01-01

    The objective of this research was to determine the robustness and statistical power of three different methods for testing the hypothesis that ordinal samples of five and seven Likert categories come from equal populations. The three methods are the two sample t-test with equal variances, the Mann-Whitney test, and the Kolmogorov-Smirnov test. In…

  16. Small sample estimation of the reliability function for technical products

    NASA Astrophysics Data System (ADS)

    Lyamets, L. L.; Yakimenko, I. V.; Kanishchev, O. A.; Bliznyuk, O. A.

    2017-12-01

    It is demonstrated that, in the absence of big statistic samples obtained as a result of testing complex technical products for failure, statistic estimation of the reliability function of initial elements can be made by the moments method. A formal description of the moments method is given and its advantages in the analysis of small censored samples are discussed. A modified algorithm is proposed for the implementation of the moments method with the use of only the moments at which the failures of initial elements occur.

  17. Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement.

    PubMed

    Austin, Peter C

    2007-11-01

    I conducted a systematic review of the use of propensity score matching in the cardiovascular surgery literature. I examined the adequacy of reporting and whether appropriate statistical methods were used. I examined 60 articles published in the Annals of Thoracic Surgery, European Journal of Cardio-thoracic Surgery, Journal of Cardiovascular Surgery, and the Journal of Thoracic and Cardiovascular Surgery between January 1, 2004, and December 31, 2006. Thirty-one of the 60 studies did not provide adequate information on how the propensity score-matched pairs were formed. Eleven (18%) of studies did not report on whether matching on the propensity score balanced baseline characteristics between treated and untreated subjects in the matched sample. No studies used appropriate methods to compare baseline characteristics between treated and untreated subjects in the propensity score-matched sample. Eight (13%) of the 60 studies explicitly used statistical methods appropriate for the analysis of matched data when estimating the effect of treatment on the outcomes. Two studies used appropriate methods for some outcomes, but not for all outcomes. Thirty-nine (65%) studies explicitly used statistical methods that were inappropriate for matched-pairs data when estimating the effect of treatment on outcomes. Eleven studies did not report the statistical tests that were used to assess the statistical significance of the treatment effect. Analysis of propensity score-matched samples tended to be poor in the cardiovascular surgery literature. Most statistical analyses ignored the matched nature of the sample. I provide suggestions for improving the reporting and analysis of studies that use propensity score matching.

  18. The Effect of Cluster Sampling Design in Survey Research on the Standard Error Statistic.

    ERIC Educational Resources Information Center

    Wang, Lin; Fan, Xitao

    Standard statistical methods are used to analyze data that is assumed to be collected using a simple random sampling scheme. These methods, however, tend to underestimate variance when the data is collected with a cluster design, which is often found in educational survey research. The purposes of this paper are to demonstrate how a cluster design…

  19. Quantification of integrated HIV DNA by repetitive-sampling Alu-HIV PCR on the basis of poisson statistics.

    PubMed

    De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos

    2014-06-01

    Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of the CI estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.

  20. Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.

    PubMed

    Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun

    2016-01-01

    We sought to determine the influence of missing data on the statistical results, and to determine which statistical method is most appropriate for the analysis of longitudinal outcome data of TKA with missing values among repeated measures ANOVA, generalized estimating equation (GEE) and mixed effects model repeated measures (MMRM). Data sets with missing values were generated with different proportion of missing data, sample size and missing-data generation mechanism. Each data set was analyzed with three statistical methods. The influence of missing data was greater with higher proportion of missing data and smaller sample size. MMRM tended to show least changes in the statistics. When missing values were generated by 'missing not at random' mechanism, no statistical methods could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Methods for collection and analysis of aquatic biological and microbiological samples

    USGS Publications Warehouse

    Greeson, Phillip E.; Ehlke, T.A.; Irwin, G.A.; Lium, B.W.; Slack, K.V.

    1977-01-01

    Chapter A4 contains methods used by the U.S. Geological Survey to collect, preserve, and analyze waters to determine their biological and microbiological properties. Part 1 discusses biological sampling and sampling statistics. The statistical procedures are accompanied by examples. Part 2 consists of detailed descriptions of more than 45 individual methods, including those for bacteria, phytoplankton, zooplankton, seston, periphyton, macrophytes, benthic invertebrates, fish and other vertebrates, cellular contents, productivity, and bioassays. Each method is summarized, and the application, interferences, apparatus, reagents, collection, analysis, calculations, reporting of results, precision and references are given. Part 3 consists of a glossary. Part 4 is a list of taxonomic references.

  2. Density-based empirical likelihood procedures for testing symmetry of data distributions and K-sample comparisons.

    PubMed

    Vexler, Albert; Tanajian, Hovig; Hutson, Alan D

    In practice, parametric likelihood-ratio techniques are powerful statistical tools. In this article, we propose and examine novel and simple distribution-free test statistics that efficiently approximate parametric likelihood ratios to analyze and compare distributions of K groups of observations. Using the density-based empirical likelihood methodology, we develop a Stata package that applies to a test for symmetry of data distributions and compares K -sample distributions. Recognizing that recent statistical software packages do not sufficiently address K -sample nonparametric comparisons of data distributions, we propose a new Stata command, vxdbel, to execute exact density-based empirical likelihood-ratio tests using K samples. To calculate p -values of the proposed tests, we use the following methods: 1) a classical technique based on Monte Carlo p -value evaluations; 2) an interpolation technique based on tabulated critical values; and 3) a new hybrid technique that combines methods 1 and 2. The third, cutting-edge method is shown to be very efficient in the context of exact-test p -value computations. This Bayesian-type method considers tabulated critical values as prior information and Monte Carlo generations of test statistic values as data used to depict the likelihood function. In this case, a nonparametric Bayesian method is proposed to compute critical values of exact tests.

  3. Statistical differences between relative quantitative molecular fingerprints from microbial communities.

    PubMed

    Portillo, M C; Gonzalez, J M

    2008-08-01

    Molecular fingerprints of microbial communities are a common method for the analysis and comparison of environmental samples. The significance of differences between microbial community fingerprints was analyzed considering the presence of different phylotypes and their relative abundance. A method is proposed by simulating coverage of the analyzed communities as a function of sampling size applying a Cramér-von Mises statistic. Comparisons were performed by a Monte Carlo testing procedure. As an example, this procedure was used to compare several sediment samples from freshwater ponds using a relative quantitative PCR-DGGE profiling technique. The method was able to discriminate among different samples based on their molecular fingerprints, and confirmed the lack of differences between aliquots from a single sample.

  4. [Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].

    PubMed

    Suzukawa, Yumi; Toyoda, Hideki

    2012-04-01

    This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in the fields like perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In the other fields, because of the large sample sizes, meaningless effects could be detected. This implies that researchers who could not get large enough effect sizes would use larger samples to obtain significant results.

  5. Rasch fit statistics and sample size considerations for polytomous data

    PubMed Central

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-01-01

    Background Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID:18510722

  6. Genome-wide association analysis of secondary imaging phenotypes from the Alzheimer's disease neuroimaging initiative study.

    PubMed

    Zhu, Wensheng; Yuan, Ying; Zhang, Jingwen; Zhou, Fan; Knickmeyer, Rebecca C; Zhu, Hongtu

    2017-02-01

    The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore such sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing it with several advanced statistical methods that appropriately adjust for the case-control sampling scheme. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Chi-squared and C statistic minimization for low count per bin data. [sampling in X ray astronomy

    NASA Technical Reports Server (NTRS)

    Nousek, John A.; Shue, David R.

    1989-01-01

    Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.

  8. Reflexion on linear regression trip production modelling method for ensuring good model quality

    NASA Astrophysics Data System (ADS)

    Suprayitno, Hitapriya; Ratnasari, Vita

    2017-11-01

    Transport Modelling is important. For certain cases, the conventional model still has to be used, in which having a good trip production model is capital. A good model can only be obtained from a good sample. Two of the basic principles of a good sampling is having a sample capable to represent the population characteristics and capable to produce an acceptable error at a certain confidence level. It seems that this principle is not yet quite understood and used in trip production modeling. Therefore, investigating the Trip Production Modelling practice in Indonesia and try to formulate a better modeling method for ensuring the Model Quality is necessary. This research result is presented as follows. Statistics knows a method to calculate span of prediction value at a certain confidence level for linear regression, which is called Confidence Interval of Predicted Value. The common modeling practice uses R2 as the principal quality measure, the sampling practice varies and not always conform to the sampling principles. An experiment indicates that small sample is already capable to give excellent R2 value and sample composition can significantly change the model. Hence, good R2 value, in fact, does not always mean good model quality. These lead to three basic ideas for ensuring good model quality, i.e. reformulating quality measure, calculation procedure, and sampling method. A quality measure is defined as having a good R2 value and a good Confidence Interval of Predicted Value. Calculation procedure must incorporate statistical calculation method and appropriate statistical tests needed. A good sampling method must incorporate random well distributed stratified sampling with a certain minimum number of samples. These three ideas need to be more developed and tested.

  9. Statistical analysis of radioimmunoassay. In comparison with bioassay (in Japanese)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nakano, R.

    1973-01-01

    Using the data of RIA (radioimmunoassay), statistical procedures for dealing with two problems of the linearization of dose response curve and calculation of relative potency were described. There were three methods for linearization of dose response curve of RIA. In each method, the following parameters were shown on the horizontal and vertical axis: dose x, (B/T)/sup -1/; c/x + c, B/T (C: dose which makes B/T 50%); log x, logit B/T. Among them, the last method seems to be most practical. The statistical procedures for bioassay were employed for calculating the relative potency of unknown samples compared to the standardmore » samples from dose response curves of standand and unknown samples using regression coefficient. It is desirable that relative potency is calculated by plotting more than 5 points in the standard curve and plotting more than 2 points in unknow samples. For examining the statistical limit of precision of measuremert, LH activity of gonadotropin in urine was measured and relative potency, precision coefficient and the upper and lower limits of relative potency at 95% confidence limit were calculated. On the other hand, bioassay (by the ovarian ascorbic acid reduction method and anteriol lobe of prostate weighing method) was done in the same samples, and the precision was compared with that of RIA. In these examinations, the upper and lower limits of the relative potency at 95% confidence limit were near each other, while in bioassay, a considerable difference was observed between the upper and lower limits. The necessity of standardization and systematization of the statistical procedures for increasing the precision of RIA was pointed out. (JA)« less

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woodroffe, J. R.; Brito, T. V.; Jordanova, V. K.

    In the standard practice of neutron multiplicity counting , the first three sampled factorial moments of the event triggered neutron count distribution were used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. Our study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra,more » Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  11. Data-optimized source modeling with the Backwards Liouville Test–Kinetic method

    DOE PAGES

    Woodroffe, J. R.; Brito, T. V.; Jordanova, V. K.; ...

    2017-09-14

    In the standard practice of neutron multiplicity counting , the first three sampled factorial moments of the event triggered neutron count distribution were used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. Our study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra,more » Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  12. Application of binomial and multinomial probability statistics to the sampling design process of a global grain tracing and recall system

    USDA-ARS?s Scientific Manuscript database

    Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...

  13. Sampling and counting genome rearrangement scenarios

    PubMed Central

    2015-01-01

    Background Even for moderate size inputs, there are a tremendous number of optimal rearrangement scenarios, regardless what the model is and which specific question is to be answered. Therefore giving one optimal solution might be misleading and cannot be used for statistical inferring. Statistically well funded methods are necessary to sample uniformly from the solution space and then a small number of samples are sufficient for statistical inferring. Contribution In this paper, we give a mini-review about the state-of-the-art of sampling and counting rearrangement scenarios, focusing on the reversal, DCJ and SCJ models. Above that, we also give a Gibbs sampler for sampling most parsimonious labeling of evolutionary trees under the SCJ model. The method has been implemented and tested on real life data. The software package together with example data can be downloaded from http://www.renyi.hu/~miklosi/SCJ-Gibbs/ PMID:26452124

  14. Non-Immunogenic Structurally and Biologically Intact Tissue Matrix Grafts for the Immediate Repair of Ballistic-Induced Vascular and Nerve Tissue Injury in Combat Casualty Care

    DTIC Science & Technology

    2005-07-01

    as an access graft is addressed using statistical methods below. Graft consistency can be defined statistically as the variance associated with the...addressed using statistical methods below. Graft consistency can be defined statistically as the variance associated with the sample of grafts tested in...measured using a refractometer (Brix % method). The equilibration data are shown in Graph 1. The results suggest the following equilibration scheme: 40% v/v

  15. Comparison of Sample Size by Bootstrap and by Formulas Based on Normal Distribution Assumption.

    PubMed

    Wang, Zuozhen

    2018-01-01

    Bootstrapping technique is distribution-independent, which provides an indirect way to estimate the sample size for a clinical trial based on a relatively smaller sample. In this paper, sample size estimation to compare two parallel-design arms for continuous data by bootstrap procedure are presented for various test types (inequality, non-inferiority, superiority, and equivalence), respectively. Meanwhile, sample size calculation by mathematical formulas (normal distribution assumption) for the identical data are also carried out. Consequently, power difference between the two calculation methods is acceptably small for all the test types. It shows that the bootstrap procedure is a credible technique for sample size estimation. After that, we compared the powers determined using the two methods based on data that violate the normal distribution assumption. To accommodate the feature of the data, the nonparametric statistical method of Wilcoxon test was applied to compare the two groups in the data during the process of bootstrap power estimation. As a result, the power estimated by normal distribution-based formula is far larger than that by bootstrap for each specific sample size per group. Hence, for this type of data, it is preferable that the bootstrap method be applied for sample size calculation at the beginning, and that the same statistical method as used in the subsequent statistical analysis is employed for each bootstrap sample during the course of bootstrap sample size estimation, provided there is historical true data available that can be well representative of the population to which the proposed trial is planning to extrapolate.

  16. Individualized statistical learning from medical image databases: application to identification of brain lesions.

    PubMed

    Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos

    2014-04-01

    This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Individualized Statistical Learning from Medical Image Databases: Application to Identification of Brain Lesions

    PubMed Central

    Erus, Guray; Zacharaki, Evangelia I.; Davatzikos, Christos

    2014-01-01

    This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a “target-specific” feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject’s images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an “estimability” criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. PMID:24607564

  18. On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.

    PubMed

    Westgate, Philip M; Burchett, Woodrow W

    2017-03-15

    The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  19. An evaluation of flow-stratified sampling for estimating suspended sediment loads

    Treesearch

    Robert B. Thomas; Jack Lewis

    1995-01-01

    Abstract - Flow-stratified sampling is a new method for sampling water quality constituents such as suspended sediment to estimate loads. As with selection-at-list-time (SALT) and time-stratified sampling, flow-stratified sampling is a statistical method requiring random sampling, and yielding unbiased estimates of load and variance. It can be used to estimate event...

  20. The space of ultrametric phylogenetic trees.

    PubMed

    Gavryushkin, Alex; Drummond, Alexei J

    2016-08-21

    The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. Particularly, that the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  1. ProUCL version 4.1.00 Documentation Downloads

    EPA Pesticide Factsheets

    ProUCL version 4.1.00 represents a comprehensive statistical software package equipped with statistical methods and graphical tools needed to address many environmental sampling and statistical issues as described in various these guidance documents.

  2. Across-cohort QC analyses of GWAS summary statistics from complex traits.

    PubMed

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2016-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics F st statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.

  3. Across-cohort QC analyses of GWAS summary statistics from complex traits

    PubMed Central

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2017-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy. PMID:27552965

  4. Validation of a modification to Performance-Tested Method 070601: Reveal Listeria Test for detection of Listeria spp. in selected foods and selected environmental samples.

    PubMed

    Alles, Susan; Peng, Linda X; Mozola, Mark A

    2009-01-01

    A modification to Performance-Tested Method (PTM) 070601, Reveal Listeria Test (Reveal), is described. The modified method uses a new media formulation, LESS enrichment broth, in single-step enrichment protocols for both foods and environmental sponge and swab samples. Food samples are enriched for 27-30 h at 30 degrees C and environmental samples for 24-48 h at 30 degrees C. Implementation of these abbreviated enrichment procedures allows test results to be obtained on a next-day basis. In testing of 14 food types in internal comparative studies with inoculated samples, there was a statistically significant difference in performance between the Reveal and reference culture [U.S. Food and Drug Administration's Bacteriological Analytical Manual (FDA/BAM) or U.S. Department of Agriculture-Food Safety and Inspection Service (USDA-FSIS)] methods for only a single food in one trial (pasteurized crab meat) at the 27 h enrichment time point, with more positive results obtained with the FDA/BAM reference method. No foods showed statistically significant differences in method performance at the 30 h time point. Independent laboratory testing of 3 foods again produced a statistically significant difference in results for crab meat at the 27 h time point; otherwise results of the Reveal and reference methods were statistically equivalent. Overall, considering both internal and independent laboratory trials, sensitivity of the Reveal method relative to the reference culture procedures in testing of foods was 85.9% at 27 h and 97.1% at 30 h. Results from 5 environmental surfaces inoculated with various strains of Listeria spp. showed that the Reveal method was more productive than the reference USDA-FSIS culture procedure for 3 surfaces (stainless steel, plastic, and cast iron), whereas results were statistically equivalent to the reference method for the other 2 surfaces (ceramic tile and sealed concrete). An independent laboratory trial with ceramic tile inoculated with L. monocytogenes confirmed the effectiveness of the Reveal method at the 24 h time point. Overall, sensitivity of the Reveal method at 24 h relative to that of the USDA-FSIS method was 153%. The Reveal method exhibited extremely high specificity, with only a single false-positive result in all trials combined for overall specificity of 99.5%.

  5. Transmission overhaul and replacement predictions using Weibull and renewel theory

    NASA Technical Reports Server (NTRS)

    Savage, M.; Lewicki, D. G.

    1989-01-01

    A method to estimate the frequency of transmission overhauls is presented. This method is based on the two-parameter Weibull statistical distribution for component life. A second method is presented to estimate the number of replacement components needed to support the transmission overhaul pattern. The second method is based on renewal theory. Confidence statistics are applied with both methods to improve the statistical estimate of sample behavior. A transmission example is also presented to illustrate the use of the methods. Transmission overhaul frequency and component replacement calculations are included in the example.

  6. On using summary statistics from an external calibration sample to correct for covariate measurement error.

    PubMed

    Guo, Ying; Little, Roderick J; McConnell, Daniel S

    2012-01-01

    Covariate measurement error is common in epidemiologic studies. Current methods for correcting measurement error with information from external calibration samples are insufficient to provide valid adjusted inferences. We consider the problem of estimating the regression of an outcome Y on covariates X and Z, where Y and Z are observed, X is unobserved, but a variable W that measures X with error is observed. Information about measurement error is provided in an external calibration sample where data on X and W (but not Y and Z) are recorded. We describe a method that uses summary statistics from the calibration sample to create multiple imputations of the missing values of X in the regression sample, so that the regression coefficients of Y on X and Z and associated standard errors can be estimated using simple multiple imputation combining rules, yielding valid statistical inferences under the assumption of a multivariate normal distribution. The proposed method is shown by simulation to provide better inferences than existing methods, namely the naive method, classical calibration, and regression calibration, particularly for correction for bias and achieving nominal confidence levels. We also illustrate our method with an example using linear regression to examine the relation between serum reproductive hormone concentrations and bone mineral density loss in midlife women in the Michigan Bone Health and Metabolism Study. Existing methods fail to adjust appropriately for bias due to measurement error in the regression setting, particularly when measurement error is substantial. The proposed method corrects this deficiency.

  7. Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics

    NASA Technical Reports Server (NTRS)

    Pohorille, Andrew

    2006-01-01

    The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerab!e progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. The third class of methods deals with transitions between states described by rate constants. These problems are isomorphic with chemical kinetics problems. Recently, several efficient techniques for this purpose have been developed based on the approach originally proposed by Gillespie. Although the utility of the techniques mentioned above for Bayesian problems has not been determined, further research along these lines is warranted

  8. Automated grain mapping using wide angle convergent beam electron diffraction in transmission electron microscope for nanomaterials.

    PubMed

    Kumar, Vineet

    2011-12-01

    The grain size statistics, commonly derived from the grain map of a material sample, are important microstructure characteristics that greatly influence its properties. The grain map for nanomaterials is usually obtained manually by visual inspection of the transmission electron microscope (TEM) micrographs because automated methods do not perform satisfactorily. While the visual inspection method provides reliable results, it is a labor intensive process and is often prone to human errors. In this article, an automated grain mapping method is developed using TEM diffraction patterns. The presented method uses wide angle convergent beam diffraction in the TEM. The automated technique was applied on a platinum thin film sample to obtain the grain map and subsequently derive grain size statistics from it. The grain size statistics obtained with the automated method were found in good agreement with the visual inspection method.

  9. [Respondent-Driven Sampling: a new sampling method to study visible and hidden populations].

    PubMed

    Mantecón, Alejandro; Juan, Montse; Calafat, Amador; Becoña, Elisardo; Román, Encarna

    2008-01-01

    The paper introduces a variant of chain-referral sampling: respondent-driven sampling (RDS). This sampling method shows that methods based on network analysis can be combined with the statistical validity of standard probability sampling methods. In this sense, RDS appears to be a mathematical improvement of snowball sampling oriented to the study of hidden populations. However, we try to prove its validity with populations that are not within a sampling frame but can nonetheless be contacted without difficulty. The basics of RDS are explained through our research on young people (aged 14 to 25) who go clubbing, consume alcohol and other drugs, and have sex. Fieldwork was carried out between May and July 2007 in three Spanish regions: Baleares, Galicia and Comunidad Valenciana. The presentation of the study shows the utility of this type of sampling when the population is accessible but there is a difficulty deriving from the lack of a sampling frame. However, the sample obtained is not a random representative one in statistical terms of the target population. It must be acknowledged that the final sample is representative of a 'pseudo-population' that approximates to the target population but is not identical to it.

  10. Estimating the mass variance in neutron multiplicity counting-A comparison of approaches

    NASA Astrophysics Data System (ADS)

    Dubi, C.; Croft, S.; Favalli, A.; Ocherashvili, A.; Pedersen, B.

    2017-12-01

    In the standard practice of neutron multiplicity counting , the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α , n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy, sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.

  11. Estimating the mass variance in neutron multiplicity counting $-$ A comparison of approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dubi, C.; Croft, S.; Favalli, A.

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy,more » sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  12. Estimating the mass variance in neutron multiplicity counting $-$ A comparison of approaches

    DOE PAGES

    Dubi, C.; Croft, S.; Favalli, A.; ...

    2017-09-14

    In the standard practice of neutron multiplicity counting, the first three sampled factorial moments of the event triggered neutron count distribution are used to quantify the three main neutron source terms: the spontaneous fissile material effective mass, the relative (α,n) production and the induced fission source responsible for multiplication. This study compares three methods to quantify the statistical uncertainty of the estimated mass: the bootstrap method, propagation of variance through moments, and statistical analysis of cycle data method. Each of the three methods was implemented on a set of four different NMC measurements, held at the JRC-laboratory in Ispra, Italy,more » sampling four different Pu samples in a standard Plutonium Scrap Multiplicity Counter (PSMC) well counter.« less

  13. Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

    PubMed

    Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

    2017-11-01

    Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. To introduce statistical approaches such as propensity score matching and mixed models into representative real-world analysis and to conduct the implementation in statistical software R to reproduce the results. Additionally, the implementation in R is presented to allow the results to be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples, the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighted. Copyright © 2017 The Author(s). Published by Elsevier Ltd.. All rights reserved.

  14. Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure

    PubMed Central

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu

    2015-01-01

    Abstract Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL‐distance in distinguishing equivalent from nonequivalent cell populations. FlowMap‐FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F‐measure of 0.88 was obtained, indicating high precision and recall of the FR‐based population matching results. FlowMap‐FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry PMID:26274018

  15. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

    PubMed

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H

    2016-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © The Authors. Published by Wiley Periodicals, Inc. on behalf of ISAC.

  16. Quantification and Statistical Analysis Methods for Vessel Wall Components from Stained Images with Masson's Trichrome

    PubMed Central

    Hernández-Morera, Pablo; Castaño-González, Irene; Travieso-González, Carlos M.; Mompeó-Corredera, Blanca; Ortega-Santana, Francisco

    2016-01-01

    Purpose To develop a digital image processing method to quantify structural components (smooth muscle fibers and extracellular matrix) in the vessel wall stained with Masson’s trichrome, and a statistical method suitable for small sample sizes to analyze the results previously obtained. Methods The quantification method comprises two stages. The pre-processing stage improves tissue image appearance and the vessel wall area is delimited. In the feature extraction stage, the vessel wall components are segmented by grouping pixels with a similar color. The area of each component is calculated by normalizing the number of pixels of each group by the vessel wall area. Statistical analyses are implemented by permutation tests, based on resampling without replacement from the set of the observed data to obtain a sampling distribution of an estimator. The implementation can be parallelized on a multicore machine to reduce execution time. Results The methods have been tested on 48 vessel wall samples of the internal saphenous vein stained with Masson’s trichrome. The results show that the segmented areas are consistent with the perception of a team of doctors and demonstrate good correlation between the expert judgments and the measured parameters for evaluating vessel wall changes. Conclusion The proposed methodology offers a powerful tool to quantify some components of the vessel wall. It is more objective, sensitive and accurate than the biochemical and qualitative methods traditionally used. The permutation tests are suitable statistical techniques to analyze the numerical measurements obtained when the underlying assumptions of the other statistical techniques are not met. PMID:26761643

  17. A nonparametric method to generate synthetic populations to adjust for complex sampling design features.

    PubMed

    Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E

    2014-06-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.

  18. A nonparametric method to generate synthetic populations to adjust for complex sampling design features

    PubMed Central

    Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608

  19. Evaluation of respondent-driven sampling.

    PubMed

    McCreesh, Nicky; Frost, Simon D W; Seeley, Janet; Katongole, Joseph; Tarsh, Matilda N; Ndunguse, Richard; Jichi, Fatima; Lunel, Natasha L; Maher, Dermot; Johnston, Lisa G; Sonnenberg, Pam; Copas, Andrew J; Hayes, Richard J; White, Richard G

    2012-01-01

    Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total population data. Total population data on age, tribe, religion, socioeconomic status, sexual activity, and HIV status were available on a population of 2402 male household heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, using current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). We recruited 927 household heads. Full and small RDS samples were largely representative of the total population, but both samples underrepresented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven sampling statistical inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven sampling bootstrap 95% confidence intervals included the population proportion. Respondent-driven sampling produced a generally representative sample of this well-connected nonhidden population. However, current respondent-driven sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience sampling method, and caution is required when interpreting findings based on the sampling method.

  20. Analyzing Kernel Matrices for the Identification of Differentially Expressed Genes

    PubMed Central

    Xia, Xiao-Lei; Xing, Huanlai; Liu, Xueqin

    2013-01-01

    One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify the differentially expressed genes (DEGs), followed by the employment of the state-of-the-art learning machines including the Support Vector Machines (SVM) in particular. The SVM is a typical sample-based classifier whose performance comes down to how discriminant samples are. However, DEGs identified by statistical tests are not guaranteed to result in a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method namely the Kernel Matrix Gene Selection (KMGS) is proposed. The rationale of the method, which roots in the fundamental ideas of the SVM algorithm, is described. The notion of ''the separability of a sample'' which is estimated by performing -like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is a method of Kernel Matrix Sequential Forward Selection (KMSFS) which shares the KMGS method's essential ideas but proceeds in a greedy manner. On three public microarray datasets, our proposed algorithms achieved noticeably competitive performance in terms of the B.632+ error rate. PMID:24349110

  1. [Evaluation of using statistical methods in selected national medical journals].

    PubMed

    Sych, Z

    1996-01-01

    The paper covers the performed evaluation of frequency with which the statistical methods were applied in analyzed works having been published in six selected, national medical journals in the years 1988-1992. For analysis the following journals were chosen, namely: Klinika Oczna, Medycyna Pracy, Pediatria Polska, Polski Tygodnik Lekarski, Roczniki Państwowego Zakładu Higieny, Zdrowie Publiczne. Appropriate number of works up to the average in the remaining medical journals was randomly selected from respective volumes of Pol. Tyg. Lek. The studies did not include works wherein the statistical analysis was not implemented, which referred both to national and international publications. That exemption was also extended to review papers, casuistic ones, reviews of books, handbooks, monographies, reports from scientific congresses, as well as papers on historical topics. The number of works was defined in each volume. Next, analysis was performed to establish the mode of finding out a suitable sample in respective studies, differentiating two categories: random and target selections. Attention was also paid to the presence of control sample in the individual works. In the analysis attention was also focussed on the existence of sample characteristics, setting up three categories: complete, partial and lacking. In evaluating the analyzed works an effort was made to present the results of studies in tables and figures (Tab. 1, 3). Analysis was accomplished with regard to the rate of employing statistical methods in analyzed works in relevant volumes of six selected, national medical journals for the years 1988-1992, simultaneously determining the number of works, in which no statistical methods were used. Concurrently the frequency of applying the individual statistical methods was analyzed in the scrutinized works. Prominence was given to fundamental statistical methods in the field of descriptive statistics (measures of position, measures of dispersion) as well as most important methods of mathematical statistics such as parametric tests of significance, analysis of variance (in single and dual classifications). non-parametric tests of significance, correlation and regression. The works, in which use was made of either multiple correlation or multiple regression or else more complex methods of studying the relationship for two or more numbers of variables, were incorporated into the works whose statistical methods were constituted by correlation and regression as well as other methods, e.g. statistical methods being used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables) factor analysis conducted by Jacobi-Hotellng's method, taxonomic methods and others. On the basis of the performed studies it has been established that the frequency of employing statistical methods in the six selected national, medical journals in the years 1988-1992 was 61.1-66.0% of the analyzed works (Tab. 3), and they generally were almost similar to the frequency provided in English language medical journals. On a whole, no significant differences were disclosed in the frequency of applied statistical methods (Tab. 4) as well as in frequency of random tests (Tab. 3) in the analyzed works, appearing in the medical journals in respective years 1988-1992. The most frequently used statistical methods in analyzed works for 1988-1992 were the measures of position 44.2-55.6% and measures of dispersion 32.5-38.5% as well as parametric tests of significance 26.3-33.1% of the works analyzed (Tab. 4). For the purpose of increasing the frequency and reliability of the used statistical methods, the didactics should be widened in the field of biostatistics at medical studies and postgraduation training designed for physicians and scientific-didactic workers.

  2. Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation.

    PubMed

    Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi

    2015-07-01

    Doubly truncated data consist of samples whose observed values fall between the right- and left- truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence interval, goodness-of-fit tests, and confidence bands to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.

  3. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

    PubMed Central

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.

    2014-01-01

    Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607

  4. Sampling methods for amphibians in streams in the Pacific Northwest.

    Treesearch

    R. Bruce Bury; Paul Stephen Corn

    1991-01-01

    Methods describing how to sample aquatic and semiaquatic amphibians in small streams and headwater habitats in the Pacific Northwest are presented. We developed a technique that samples 10-meter stretches of selected streams, which was adequate to detect presence or absence of amphibian species and provided sample sizes statistically sufficient to compare abundance of...

  5. Statistical Inference on Memory Structure of Processes and Its Applications to Information Theory

    DTIC Science & Technology

    2016-05-12

    valued times series from a sample. (A practical algorithm to compute the estimator is a work in progress.) Third, finitely-valued spatial processes...ES) U.S. Army Research Office P.O. Box 12211 Research Triangle Park, NC 27709-2211 mathematical statistics; time series ; Markov chains; random...proved. Second, a statistical method is developed to estimate the memory depth of discrete- time and continuously-valued times series from a sample. (A

  6. Application of Digital Image Analysis to Determine Pancreatic Islet Mass and Purity in Clinical Islet Isolation and Transplantation

    PubMed Central

    Wang, Ling-jia; Kissler, Hermann J; Wang, Xiaojun; Cochet, Olivia; Krzystyniak, Adam; Misawa, Ryosuke; Golab, Karolina; Tibudan, Martin; Grzanka, Jakub; Savari, Omid; Grose, Randall; Kaufman, Dixon B; Millis, Michael; Witkowski, Piotr

    2015-01-01

    Pancreatic islet mass, represented by islet equivalent (IEQ), is the most important parameter in decision making for clinical islet transplantation. To obtain IEQ, the sample of islets is routinely counted manually under a microscope and discarded thereafter. Islet purity, another parameter in islet processing, is routinely acquired by estimation only. In this study, we validated our digital image analysis (DIA) system developed using the software of Image Pro Plus for islet mass and purity assessment. Application of the DIA allows to better comply with current good manufacturing practice (cGMP) standards. Human islet samples were captured as calibrated digital images for the permanent record. Five trained technicians participated in determination of IEQ and purity by manual counting method and DIA. IEQ count showed statistically significant correlations between the manual method and DIA in all sample comparisons (r >0.819 and p < 0.0001). Statistically significant difference in IEQ between both methods was found only in High purity 100μL sample group (p = 0.029). As far as purity determination, statistically significant differences between manual assessment and DIA measurement was found in High and Low purity 100μL samples (p<0.005), In addition, islet particle number (IPN) and the IEQ/IPN ratio did not differ statistically between manual counting method and DIA. In conclusion, the DIA used in this study is a reliable technique in determination of IEQ and purity. Islet sample preserved as a digital image and results produced by DIA can be permanently stored for verification, technical training and islet information exchange between different islet centers. Therefore, DIA complies better with cGMP requirements than the manual counting method. We propose DIA as a quality control tool to supplement the established standard manual method for islets counting and purity estimation. PMID:24806436

  7. Efficient statistical tests to compare Youden index: accounting for contingency correlation.

    PubMed

    Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan

    2015-04-30

    Youden index is widely utilized in studies evaluating accuracy of diagnostic tests and performance of predictive, prognostic, or risk models. However, both one and two independent sample tests on Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. Besides, paired sample test on Youden index is currently unavailable. This article develops efficient statistical inference procedures for one sample, independent, and paired sample tests on Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one and two independent sample tests, the variances are estimated by Delta method, and the statistical inference is based on the central limit theory, which are then verified by bootstrap estimates. For paired samples test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.

  8. Towards efficient multi-scale methods for monitoring sugarcane aphid infestations in sorghum

    USDA-ARS?s Scientific Manuscript database

    We discuss approaches and issues involved with developing optimal monitoring methods for sugarcane aphid infestations (SCA) in grain sorghum. We discuss development of sequential sampling methods that allow for estimation of the number of aphids per sample unit, and statistical decision making rela...

  9. Confidence Interval Coverage for Cohen's Effect Size Statistic

    ERIC Educational Resources Information Center

    Algina, James; Keselman, H. J.; Penfield, Randall D.

    2006-01-01

    Kelley compared three methods for setting a confidence interval (CI) around Cohen's standardized mean difference statistic: the noncentral-"t"-based, percentile (PERC) bootstrap, and biased-corrected and accelerated (BCA) bootstrap methods under three conditions of nonnormality, eight cases of sample size, and six cases of population…

  10. Sampling methods to the statistical control of the production of blood components.

    PubMed

    Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo

    2017-12-01

    The control of blood components specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The control reliability is dependent of the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended to comply uniquely with the 1% specification to the produced blood components. Nevertheless, on a purely statistical viewpoint, this model could be argued not to be related to a consistent sampling technique. This could be a severe limitation to detect abnormal patterns and to assure that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments contributing to the robustness of sampling and related statistical process control decisions for the purpose they are suggested for. Copyright © 2017 Elsevier Ltd. All rights reserved.

  11. Estimating statistical uncertainty of Monte Carlo efficiency-gain in the context of a correlated sampling Monte Carlo code for brachytherapy treatment planning with non-normal dose distribution.

    PubMed

    Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr

    2012-01-01

    Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task specially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap based algorithm was used to simulate the probability distribution of the efficiency gain estimates and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.

  12. [Application of statistics on chronic-diseases-relating observational research papers].

    PubMed

    Hong, Zhi-heng; Wang, Ping; Cao, Wei-hua

    2012-09-01

    To study the application of statistics on Chronic-diseases-relating observational research papers which were recently published in the Chinese Medical Association Magazines, with influential index above 0.5. Using a self-developed criterion, two investigators individually participated in assessing the application of statistics on Chinese Medical Association Magazines, with influential index above 0.5. Different opinions reached an agreement through discussion. A total number of 352 papers from 6 magazines, including the Chinese Journal of Epidemiology, Chinese Journal of Oncology, Chinese Journal of Preventive Medicine, Chinese Journal of Cardiology, Chinese Journal of Internal Medicine and Chinese Journal of Endocrinology and Metabolism, were reviewed. The rate of clear statement on the following contents as: research objectives, t target audience, sample issues, objective inclusion criteria and variable definitions were 99.43%, 98.57%, 95.43%, 92.86% and 96.87%. The correct rates of description on quantitative and qualitative data were 90.94% and 91.46%, respectively. The rates on correctly expressing the results, on statistical inference methods related to quantitative, qualitative data and modeling were 100%, 95.32% and 87.19%, respectively. 89.49% of the conclusions could directly response to the research objectives. However, 69.60% of the papers did not mention the exact names of the study design, statistically, that the papers were using. 11.14% of the papers were in lack of further statement on the exclusion criteria. Percentage of the papers that could clearly explain the sample size estimation only taking up as 5.16%. Only 24.21% of the papers clearly described the variable value assignment. Regarding the introduction on statistical conduction and on database methods, the rate was only 24.15%. 18.75% of the papers did not express the statistical inference methods sufficiently. A quarter of the papers did not use 'standardization' appropriately. As for the aspect of statistical inference, the rate of description on statistical testing prerequisite was only 24.12% while 9.94% papers did not even employ the statistical inferential method that should be used. The main deficiencies on the application of Statistics used in papers related to Chronic-diseases-related observational research were as follows: lack of sample-size determination, variable value assignment description not sufficient, methods on statistics were not introduced clearly or properly, lack of consideration for pre-requisition regarding the use of statistical inferences.

  13. How conservative is Fisher's exact test? A quantitative evaluation of the two-sample comparative binomial trial.

    PubMed

    Crans, Gerald G; Shuster, Jonathan J

    2008-08-15

    The debate as to which statistical methodology is most appropriate for the analysis of the two-sample comparative binomial trial has persisted for decades. Practitioners who favor the conditional methods of Fisher, Fisher's exact test (FET), claim that only experimental outcomes containing the same amount of information should be considered when performing analyses. Hence, the total number of successes should be fixed at its observed level in hypothetical repetitions of the experiment. Using conditional methods in clinical settings can pose interpretation difficulties, since results are derived using conditional sample spaces rather than the set of all possible outcomes. Perhaps more importantly from a clinical trial design perspective, this test can be too conservative, resulting in greater resource requirements and more subjects exposed to an experimental treatment. The actual significance level attained by FET (the size of the test) has not been reported in the statistical literature. Berger (J. R. Statist. Soc. D (The Statistician) 2001; 50:79-85) proposed assessing the conservativeness of conditional methods using p-value confidence intervals. In this paper we develop a numerical algorithm that calculates the size of FET for sample sizes, n, up to 125 per group at the two-sided significance level, alpha = 0.05. Additionally, this numerical method is used to define new significance levels alpha(*) = alpha+epsilon, where epsilon is a small positive number, for each n, such that the size of the test is as close as possible to the pre-specified alpha (0.05 for the current work) without exceeding it. Lastly, a sample size and power calculation example are presented, which demonstrates the statistical advantages of implementing the adjustment to FET (using alpha(*) instead of alpha) in the two-sample comparative binomial trial. 2008 John Wiley & Sons, Ltd

  14. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures.

    PubMed

    Scheid, Anika; Nebel, Markus E

    2012-07-09

    Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case - without sacrificing much of the accuracy of the results. Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms.

  15. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    PubMed Central

    2012-01-01

    Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case – without sacrificing much of the accuracy of the results. Conclusions Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms. PMID:22776037

  16. An investigative comparison of purging and non-purging groundwater sampling methods in Karoo aquifer monitoring wells

    NASA Astrophysics Data System (ADS)

    Gomo, M.; Vermeulen, D.

    2015-03-01

    An investigation was conducted to statistically compare the influence of non-purging and purging groundwater sampling methods on analysed inorganic chemistry parameters and calculated saturation indices. Groundwater samples were collected from 15 monitoring wells drilled in Karoo aquifers before and after purging for the comparative study. For the non-purging method, samples were collected from groundwater flow zones located in the wells using electrical conductivity (EC) profiling. The two data sets of non-purged and purged groundwater samples were analysed for inorganic chemistry parameters at the Institute of Groundwater Studies (IGS) laboratory of the Free University in South Africa. Saturation indices for mineral phases that were found in the data base of PHREEQC hydrogeochemical model were calculated for each data set. Four one-way ANOVA tests were conducted using Microsoft excel 2007 to investigate if there is any statistically significant difference between: (1) all inorganic chemistry parameters measured in the non-purged and purged groundwater samples per each specific well, (2) all mineral saturation indices calculated for the non-purged and purged groundwater samples per each specific well, (3) individual inorganic chemistry parameters measured in the non-purged and purged groundwater samples across all wells and (4) Individual mineral saturation indices calculated for non-purged and purged groundwater samples across all wells. For all the ANOVA tests conducted, the calculated alpha values (p) are greater than 0.05 (significance level) and test statistic (F) is less than the critical value (Fcrit) (F < Fcrit). The results imply that there was no statistically significant difference between the two data sets. With a 95% confidence, it was therefore concluded that the variance between groups was rather due to random chance and not to the influence of the sampling methods (tested factor). It is therefore be possible that in some hydrogeologic conditions, non-purged groundwater samples might be just as representative as the purged ones. The findings of this study can provide an important platform for future evidence oriented research investigations to establish the necessity of purging prior to groundwater sampling in different aquifer systems.

  17. Radar prediction of absolute rain fade distributions for earth-satellite paths and general methods for extrapolation of fade statistics to other locations

    NASA Technical Reports Server (NTRS)

    Goldhirsh, J.

    1982-01-01

    The first absolute rain fade distribution method described establishes absolute fade statistics at a given site by means of a sampled radar data base. The second method extrapolates absolute fade statistics from one location to another, given simultaneously measured fade and rain rate statistics at the former. Both methods employ similar conditional fade statistic concepts and long term rain rate distributions. Probability deviations in the 2-19% range, with an 11% average, were obtained upon comparison of measured and predicted levels at given attenuations. The extrapolation of fade distributions to other locations at 28 GHz showed very good agreement with measured data at three sites located in the continental temperate region.

  18. Standard deviation and standard error of the mean.

    PubMed

    Lee, Dong Kyu; In, Junyong; Lee, Sangseok

    2015-06-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.

  19. Standard deviation and standard error of the mean

    PubMed Central

    In, Junyong; Lee, Sangseok

    2015-01-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results. PMID:26045923

  20. The Analysis of Organizational Diagnosis on Based Six Box Model in Universities

    ERIC Educational Resources Information Center

    Hamid, Rahimi; Siadat, Sayyed Ali; Reza, Hoveida; Arash, Shahin; Ali, Nasrabadi Hasan; Azizollah, Arbabisarjou

    2011-01-01

    Purpose: The analysis of organizational diagnosis on based six box model at universities. Research method: Research method was descriptive-survey. Statistical population consisted of 1544 faculty members of universities which through random strafed sampling method 218 persons were chosen as the sample. Research Instrument were organizational…

  1. What influences the choice of assessment methods in health technology assessments? Statistical analysis of international health technology assessments from 1989 to 2002.

    PubMed

    Draborg, Eva; Andersen, Christian Kronborg

    2006-01-01

    Health technology assessment (HTA) has been used as input in decision making worldwide for more than 25 years. However, no uniform definition of HTA or agreement on assessment methods exists, leaving open the question of what influences the choice of assessment methods in HTAs. The objective of this study is to analyze statistically a possible relationship between methods of assessment used in practical HTAs, type of assessed technology, type of assessors, and year of publication. A sample of 433 HTAs published by eleven leading institutions or agencies in nine countries was reviewed and analyzed by multiple logistic regression. The study shows that outsourcing of HTA reports to external partners is associated with a higher likelihood of using assessment methods, such as meta-analysis, surveys, economic evaluations, and randomized controlled trials; and with a lower likelihood of using assessment methods, such as literature reviews and "other methods". The year of publication was statistically related to the inclusion of economic evaluations and shows a decreasing likelihood during the year span. The type of assessed technology was related to economic evaluations with a decreasing likelihood, to surveys, and to "other methods" with a decreasing likelihood when pharmaceuticals were the assessed type of technology. During the period from 1989 to 2002, no major developments in assessment methods used in practical HTAs were shown statistically in a sample of 433 HTAs worldwide. Outsourcing to external assessors has a statistically significant influence on choice of assessment methods.

  2. THE SCREENING AND RANKING ALGORITHM FOR CHANGE-POINTS DETECTION IN MULTIPLE SAMPLES

    PubMed Central

    Song, Chi; Min, Xiaoyi; Zhang, Heping

    2016-01-01

    The chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may associate with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNV and diseases. CNVs can be called by detecting the change-points in mean for sequences of array-based intensity measurements. Although multiple samples are of interest, the majority of the available CNV calling methods are single sample based. Only a few multiple sample methods have been proposed using scan statistics that are computationally intensive and designed toward either common or rare change-points detection. In this paper, we propose a novel multiple sample method by adaptively combining the scan statistic of the screening and ranking algorithm (SaRa), which is computationally efficient and is able to detect both common and rare change-points. We prove that asymptotically this method can find the true change-points with almost certainty and show in theory that multiple sample methods are superior to single sample methods when shared change-points are of interest. Additionally, we report extensive simulation studies to examine the performance of our proposed method. Finally, using our proposed method as well as two competing approaches, we attempt to detect CNVs in the data from the Primary Open-Angle Glaucoma Genes and Environment study, and conclude that our method is faster and requires less information while our ability to detect the CNVs is comparable or better. PMID:28090239

  3. Point Intercept (PO)

    Treesearch

    John F. Caratti

    2006-01-01

    The FIREMON Point Intercept (PO) method is used to assess changes in plant species cover or ground cover for a macroplot. This method uses a narrow diameter sampling pole or sampling pins, placed at systematic intervals along line transects to sample within plot variation and quantify statistically valid changes in plant species cover and height over time. Plant...

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, Richard O.

    The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Somemore » statistical techniques given here are not commonly seen in statistics book. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.« less

  5. The Statistical Power of Planned Comparisons.

    ERIC Educational Resources Information Center

    Benton, Roberta L.

    Basic principles underlying statistical power are examined; and issues pertaining to effect size, sample size, error variance, and significance level are highlighted via the use of specific hypothetical examples. Analysis of variance (ANOVA) and related methods remain popular, although other procedures sometimes have more statistical power against…

  6. Statistical inference for tumor growth inhibition T/C ratio.

    PubMed

    Wu, Jianrong

    2010-09-01

    The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.

  7. Statistical methods used in the public health literature and implications for training of public health professionals

    PubMed Central

    Hayat, Matthew J.; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L.

    2017-01-01

    Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals. PMID:28591190

  8. Statistical methods used in the public health literature and implications for training of public health professionals.

    PubMed

    Hayat, Matthew J; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L

    2017-01-01

    Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals.

  9. An Improved LC-ESI-MS/MS Method to Quantify Pregabalin in Human Plasma and Dry Plasma Spot for Therapeutic Monitoring and Pharmacokinetic Applications.

    PubMed

    Dwivedi, Jaya; Namdev, Kuldeep K; Chilkoti, Deepak C; Verma, Surajpal; Sharma, Swapnil

    2018-06-06

    Therapeutic drug monitoring (TDM) of anti-epileptic drugs provides a valid clinical tool in optimization of overall therapy. However, TDM is challenging due to the high biological samples (plasma/blood) storage/shipment costs and the limited availability of laboratories providing TDM services. Sampling in the form of dry plasma spot (DPS) or dry blood spot (DBS) is a suitable alternative to overcome these issues. An improved, simple, rapid, and stability indicating method for quantification of pregabalin in human plasma and DPS has been developed and validated. Analyses were performed on liquid chromatography tandem mass spectrometer under positive ionization mode of electrospray interface. Pregabain-d4 was used as internal standard, and the chromatographic separations were performed on Poroshell 120 EC-C18 column using an isocratic mobile phase flow rate of 1 mL/min. Stability of pregabalin in DPS was evaluated under simulated real-time conditions. Extraction procedures from plasma and DPS samples were compared using statistical tests. The method was validated considering the FDA method validation guideline. The method was linear over the concentration range of 20-16000 ng/mL and 100-10000 ng/mL in plasma and DPS, respectively. DPS samples were found stable for only one week upon storage at room temperature and for at least four weeks at freezing temperature (-20 ± 5 °C). Method was applied for quantification of pregabalin in over 600 samples of a clinical study. Statistical analyses revealed that two extraction procedures in plasma and DPS samples showed statistically insignificant difference and can be used interchangeably without any bias. Proposed method involves simple and rapid steps of sample processing that do not require a pre- or post-column derivatization procedure. The method is suitable for routine pharmacokinetic analysis and therapeutic monitoring of pregabalin.

  10. Some challenges with statistical inference in adaptive designs.

    PubMed

    Hung, H M James; Wang, Sue-Jane; Yang, Peiling

    2014-01-01

    Adaptive designs have generated a great deal of attention to clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.

  11. Evaluation of Respondent-Driven Sampling

    PubMed Central

    McCreesh, Nicky; Frost, Simon; Seeley, Janet; Katongole, Joseph; Tarsh, Matilda Ndagire; Ndunguse, Richard; Jichi, Fatima; Lunel, Natasha L; Maher, Dermot; Johnston, Lisa G; Sonnenberg, Pam; Copas, Andrew J; Hayes, Richard J; White, Richard G

    2012-01-01

    Background Respondent-driven sampling is a novel variant of link-tracing sampling for estimating the characteristics of hard-to-reach groups, such as HIV prevalence in sex-workers. Despite its use by leading health organizations, the performance of this method in realistic situations is still largely unknown. We evaluated respondent-driven sampling by comparing estimates from a respondent-driven sampling survey with total-population data. Methods Total-population data on age, tribe, religion, socioeconomic status, sexual activity and HIV status were available on a population of 2402 male household-heads from an open cohort in rural Uganda. A respondent-driven sampling (RDS) survey was carried out in this population, employing current methods of sampling (RDS sample) and statistical inference (RDS estimates). Analyses were carried out for the full RDS sample and then repeated for the first 250 recruits (small sample). Results We recruited 927 household-heads. Full and small RDS samples were largely representative of the total population, but both samples under-represented men who were younger, of higher socioeconomic status, and with unknown sexual activity and HIV status. Respondent-driven-sampling statistical-inference methods failed to reduce these biases. Only 31%-37% (depending on method and sample size) of RDS estimates were closer to the true population proportions than the RDS sample proportions. Only 50%-74% of respondent-driven-sampling bootstrap 95% confidence intervals included the population proportion. Conclusions Respondent-driven sampling produced a generally representative sample of this well-connected non-hidden population. However, current respondent-driven-sampling inference methods failed to reduce bias when it occurred. Whether the data required to remove bias and measure precision can be collected in a respondent-driven sampling survey is unresolved. Respondent-driven sampling should be regarded as a (potentially superior) form of convenience-sampling method, and caution is required when interpreting findings based on the sampling method. PMID:22157309

  12. Inference with viral quasispecies diversity indices: clonal and NGS approaches.

    PubMed

    Gregori, Josep; Salicrú, Miquel; Domingo, Esteban; Sanchez, Alex; Esteban, Juan I; Rodríguez-Frías, Francisco; Quer, Josep

    2014-04-15

    Given the inherent dynamics of a viral quasispecies, we are often interested in the comparison of diversity indices of sequential samples of a patient, or in the comparison of diversity indices of virus in groups of patients in a treated versus control design. It is then important to make sure that the diversity measures from each sample may be compared with no bias and within a consistent statistical framework. In the present report, we review some indices often used as measures for viral quasispecies complexity and provide means for statistical inference, applying procedures taken from the ecology field. In particular, we examine the Shannon entropy and the mutation frequency, and we discuss the appropriateness of different normalization methods of the Shannon entropy found in the literature. By taking amplicons ultra-deep pyrosequencing (UDPS) raw data as a surrogate of a real hepatitis C virus viral population, we study through in-silico sampling the statistical properties of these indices under two methods of viral quasispecies sampling, classical cloning followed by Sanger sequencing (CCSS) and next-generation sequencing (NGS) such as UDPS. We propose solutions specific to each of the two sampling methods-CCSS and NGS-to guarantee statistically conforming conclusions as free of bias as possible. josep.gregori@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  13. Implementation of unsteady sampling procedures for the parallel direct simulation Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Cave, H. M.; Tseng, K.-C.; Wu, J.-S.; Jermy, M. C.; Huang, J.-C.; Krumdieck, S. P.

    2008-06-01

    An unsteady sampling routine for a general parallel direct simulation Monte Carlo method called PDSC is introduced, allowing the simulation of time-dependent flow problems in the near continuum range. A post-processing procedure called DSMC rapid ensemble averaging method (DREAM) is developed to improve the statistical scatter in the results while minimising both memory and simulation time. This method builds an ensemble average of repeated runs over small number of sampling intervals prior to the sampling point of interest by restarting the flow using either a Maxwellian distribution based on macroscopic properties for near equilibrium flows (DREAM-I) or output instantaneous particle data obtained by the original unsteady sampling of PDSC for strongly non-equilibrium flows (DREAM-II). The method is validated by simulating shock tube flow and the development of simple Couette flow. Unsteady PDSC is found to accurately predict the flow field in both cases with significantly reduced run-times over single processor code and DREAM greatly reduces the statistical scatter in the results while maintaining accurate particle velocity distributions. Simulations are then conducted of two applications involving the interaction of shocks over wedges. The results of these simulations are compared to experimental data and simulations from the literature where there these are available. In general, it was found that 10 ensembled runs of DREAM processing could reduce the statistical uncertainty in the raw PDSC data by 2.5-3.3 times, based on the limited number of cases in the present study.

  14. Development and Validation of an Instrument to Measure Indonesian Pre-Service Teachers' Conceptions of Statistics

    ERIC Educational Resources Information Center

    Idris, Khairiani; Yang, Kai-Lin

    2017-01-01

    This article reports the results of a mixed-methods approach to develop and validate an instrument to measure Indonesian pre-service teachers' conceptions of statistics. First, a phenomenographic study involving a sample of 44 participants uncovered six categories of conceptions of statistics. Second, an instrument of conceptions of statistics was…

  15. Sample size and power considerations in network meta-analysis

    PubMed Central

    2012-01-01

    Background Network meta-analysis is becoming increasingly popular for establishing comparative effectiveness among multiple interventions for the same disease. Network meta-analysis inherits all methodological challenges of standard pairwise meta-analysis, but with increased complexity due to the multitude of intervention comparisons. One issue that is now widely recognized in pairwise meta-analysis is the issue of sample size and statistical power. This issue, however, has so far only received little attention in network meta-analysis. To date, no approaches have been proposed for evaluating the adequacy of the sample size, and thus power, in a treatment network. Findings In this article, we develop easy-to-use flexible methods for estimating the ‘effective sample size’ in indirect comparison meta-analysis and network meta-analysis. The effective sample size for a particular treatment comparison can be interpreted as the number of patients in a pairwise meta-analysis that would provide the same degree and strength of evidence as that which is provided in the indirect comparison or network meta-analysis. We further develop methods for retrospectively estimating the statistical power for each comparison in a network meta-analysis. We illustrate the performance of the proposed methods for estimating effective sample size and statistical power using data from a network meta-analysis on interventions for smoking cessation including over 100 trials. Conclusion The proposed methods are easy to use and will be of high value to regulatory agencies and decision makers who must assess the strength of the evidence supporting comparative effectiveness estimates. PMID:22992327

  16. Line Intercept (LI)

    Treesearch

    John F. Caratti

    2006-01-01

    The FIREMON Line Intercept (LI) method is used to assess changes in plant species cover for a macroplot. This method uses multiple line transects to sample within plot variation and quantify statistically valid changes in plant species cover and height over time. This method is suited for most forest and rangeland communities, but is especially useful for sampling...

  17. Interpretation of correlations in clinical research.

    PubMed

    Hung, Man; Bounsanga, Jerry; Voss, Maren Wright

    2017-11-01

    Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.

  18. Equivalent statistics and data interpretation.

    PubMed

    Francis, Gregory

    2017-08-01

    Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.

  19. “Magnitude-based Inference”: A Statistical Review

    PubMed Central

    Welsh, Alan H.; Knight, Emma J.

    2015-01-01

    ABSTRACT Purpose We consider “magnitude-based inference” and its interpretation by examining in detail its use in the problem of comparing two means. Methods We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how “magnitude-based inference” is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. Results and Conclusions We show that “magnitude-based inference” is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with “magnitude-based inference” and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using “magnitude-based inference,” a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis. PMID:25051387

  20. Trends in study design and the statistical methods employed in a leading general medicine journal.

    PubMed

    Gosho, M; Sato, Y; Nagashima, K; Takahashi, S

    2018-02-01

    Study design and statistical methods have become core components of medical research, and the methodology has become more multifaceted and complicated over time. The study of the comprehensive details and current trends of study design and statistical methods is required to support the future implementation of well-planned clinical studies providing information about evidence-based medicine. Our purpose was to illustrate study design and statistical methods employed in recent medical literature. This was an extension study of Sato et al. (N Engl J Med 2017; 376: 1086-1087), which reviewed 238 articles published in 2015 in the New England Journal of Medicine (NEJM) and briefly summarized the statistical methods employed in NEJM. Using the same database, we performed a new investigation of the detailed trends in study design and individual statistical methods that were not reported in the Sato study. Due to the CONSORT statement, prespecification and justification of sample size are obligatory in planning intervention studies. Although standard survival methods (eg Kaplan-Meier estimator and Cox regression model) were most frequently applied, the Gray test and Fine-Gray proportional hazard model for considering competing risks were sometimes used for a more valid statistical inference. With respect to handling missing data, model-based methods, which are valid for missing-at-random data, were more frequently used than single imputation methods. These methods are not recommended as a primary analysis, but they have been applied in many clinical trials. Group sequential design with interim analyses was one of the standard designs, and novel design, such as adaptive dose selection and sample size re-estimation, was sometimes employed in NEJM. Model-based approaches for handling missing data should replace single imputation methods for primary analysis in the light of the information found in some publications. Use of adaptive design with interim analyses is increasing after the presentation of the FDA guidance for adaptive design. © 2017 John Wiley & Sons Ltd.

  1. A note on the kappa statistic for clustered dichotomous data.

    PubMed

    Zhou, Ming; Yang, Zhao

    2014-06-30

    The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data and two simulated clustered physician-patients dichotomous data are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.

  2. Statistical Inference for Data Adaptive Target Parameters.

    PubMed

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.

  3. Resampling-based Methods in Single and Multiple Testing for Equality of Covariance/Correlation Matrices

    PubMed Central

    Yang, Yang; DeGruttola, Victor

    2016-01-01

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584

  4. Resampling-based methods in single and multiple testing for equality of covariance/correlation matrices.

    PubMed

    Yang, Yang; DeGruttola, Victor

    2012-06-22

    Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients.

  5. Robustness of S1 statistic with Hodges-Lehmann for skewed distributions

    NASA Astrophysics Data System (ADS)

    Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping

    2016-10-01

    Analysis of variance (ANOVA) is a common use parametric method to test the differences in means for more than two groups when the populations are normally distributed. ANOVA is highly inefficient under the influence of non- normal and heteroscedastic settings. When the assumptions are violated, researchers are looking for alternative such as Kruskal-Wallis under nonparametric or robust method. This study focused on flexible method, S1 statistic for comparing groups using median as the location estimator. S1 statistic was modified by substituting the median with Hodges-Lehmann and the default scale estimator with the variance of Hodges-Lehmann and MADn to produce two different test statistics for comparing groups. Bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistic in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The propose procedures show improvement compared to the original statistic especially under extremely skewed distribution.

  6. Monitoring larval populations of the Douglas-fir tussock moth and the western spruce budworm on permanent plots: sampling methods and statistical properties of data

    Treesearch

    A.R. Mason; H.G. Paul

    1994-01-01

    Procedures for monitoring larval populations of the Douglas-fir tussock moth and the western spruce budworm are recommended based on many years experience in sampling these species in eastern Oregon and Washington. It is shown that statistically reliable estimates of larval density can be made for a population by sampling host trees in a series of permanent plots in a...

  7. Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli.

    PubMed

    Westfall, Jacob; Kenny, David A; Judd, Charles M

    2014-10-01

    Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.

  8. A Science and Risk-Based Pragmatic Methodology for Blend and Content Uniformity Assessment.

    PubMed

    Sayeed-Desta, Naheed; Pazhayattil, Ajay Babu; Collins, Jordan; Doshi, Chetan

    2018-04-01

    This paper describes a pragmatic approach that can be applied in assessing powder blend and unit dosage uniformity of solid dose products at Process Design, Process Performance Qualification, and Continued/Ongoing Process Verification stages of the Process Validation lifecycle. The statistically based sampling, testing, and assessment plan was developed due to the withdrawal of the FDA draft guidance for industry "Powder Blends and Finished Dosage Units-Stratified In-Process Dosage Unit Sampling and Assessment." This paper compares the proposed Grouped Area Variance Estimate (GAVE) method with an alternate approach outlining the practicality and statistical rationalization using traditional sampling and analytical methods. The approach is designed to fit solid dose processes assuring high statistical confidence in both powder blend uniformity and dosage unit uniformity during all three stages of the lifecycle complying with ASTM standards as recommended by the US FDA.

  9. Systematic sampling for suspended sediment

    Treesearch

    Robert B. Thomas

    1991-01-01

    Abstract - Because of high costs or complex logistics, scientific populations cannot be measured entirely and must be sampled. Accepted scientific practice holds that sample selection be based on statistical principles to assure objectivity when estimating totals and variances. Probability sampling--obtaining samples with known probabilities--is the only method that...

  10. Distribution of the two-sample t-test statistic following blinded sample size re-estimation.

    PubMed

    Lu, Kaifeng

    2016-05-01

    We consider the blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for the evaluation of the probability of rejecting the null hypothesis at given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margin for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  11. Balanced mechanical resonator for powder handling device

    NASA Technical Reports Server (NTRS)

    Sarrazin, Philippe C. (Inventor); Brunner, Will M. (Inventor)

    2012-01-01

    A system incorporating a balanced mechanical resonator and a method for vibration of a sample composed of granular material to generate motion of a powder sample inside the sample holder for obtaining improved analysis statistics, without imparting vibration to the sample holder support.

  12. "Magnitude-based inference": a statistical review.

    PubMed

    Welsh, Alan H; Knight, Emma J

    2015-04-01

    We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.

  13. Automated sampling assessment for molecular simulations using the effective sample size

    PubMed Central

    Zhang, Xin; Bhatt, Divesh; Zuckerman, Daniel M.

    2010-01-01

    To quantify the progress in the development of algorithms and forcefields used in molecular simulations, a general method for the assessment of the sampling quality is needed. Statistical mechanics principles suggest the populations of physical states characterize equilibrium sampling in a fundamental way. We therefore develop an approach for analyzing the variances in state populations, which quantifies the degree of sampling in terms of the effective sample size (ESS). The ESS estimates the number of statistically independent configurations contained in a simulated ensemble. The method is applicable to both traditional dynamics simulations as well as more modern (e.g., multi–canonical) approaches. Our procedure is tested in a variety of systems from toy models to atomistic protein simulations. We also introduce a simple automated procedure to obtain approximate physical states from dynamic trajectories: this allows sample–size estimation in systems for which physical states are not known in advance. PMID:21221418

  14. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.

    PubMed

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L

    2014-10-15

    Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary materials are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.

  16. Small sample mediation testing: misplaced confidence in bootstrapped confidence intervals.

    PubMed

    Koopman, Joel; Howe, Michael; Hollenbeck, John R; Sin, Hock-Peng

    2015-01-01

    Bootstrapping is an analytical tool commonly used in psychology to test the statistical significance of the indirect effect in mediation models. Bootstrapping proponents have particularly advocated for its use for samples of 20-80 cases. This advocacy has been heeded, especially in the Journal of Applied Psychology, as researchers are increasingly utilizing bootstrapping to test mediation with samples in this range. We discuss reasons to be concerned with this escalation, and in a simulation study focused specifically on this range of sample sizes, we demonstrate not only that bootstrapping has insufficient statistical power to provide a rigorous hypothesis test in most conditions but also that bootstrapping has a tendency to exhibit an inflated Type I error rate. We then extend our simulations to investigate an alternative empirical resampling method as well as a Bayesian approach and demonstrate that they exhibit comparable statistical power to bootstrapping in small samples without the associated inflated Type I error. Implications for researchers testing mediation hypotheses in small samples are presented. For researchers wishing to use these methods in their own research, we have provided R syntax in the online supplemental materials. (c) 2015 APA, all rights reserved.

  17. Explorations in statistics: the log transformation.

    PubMed

    Curran-Everett, Douglas

    2018-06-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability-the standard deviation-varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.

  18. The Beginner's Guide to the Bootstrap Method of Resampling.

    ERIC Educational Resources Information Center

    Lane, Ginny G.

    The bootstrap method of resampling can be useful in estimating the replicability of study results. The bootstrap procedure creates a mock population from a given sample of data from which multiple samples are then drawn. The method extends the usefulness of the jackknife procedure as it allows for computation of a given statistic across a maximal…

  19. A New Sample Size Formula for Regression.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.; Barcikowski, Robert S.

    The focus of this research was to determine the efficacy of a new method of selecting sample sizes for multiple linear regression. A Monte Carlo simulation was used to study both empirical predictive power rates and empirical statistical power rates of the new method and seven other methods: those of C. N. Park and A. L. Dudycha (1974); J. Cohen…

  20. Relations between Brain Structure and Attentional Function in Spina Bifida: Utilization of Robust Statistical Approaches

    PubMed Central

    Kulesz, Paulina A.; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M.; Francis, David J.

    2015-01-01

    Objective Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Method Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product moment correlation was compared with four robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator Results All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Conclusions Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PMID:25495830

  1. Experimental and environmental factors affect spurious detection of ecological thresholds

    USGS Publications Warehouse

    Daily, Jonathan P.; Hitt, Nathaniel P.; Smith, David; Snyder, Craig D.

    2012-01-01

    Threshold detection methods are increasingly popular for assessing nonlinear responses to environmental change, but their statistical performance remains poorly understood. We simulated linear change in stream benthic macroinvertebrate communities and evaluated the performance of commonly used threshold detection methods based on model fitting (piecewise quantile regression [PQR]), data partitioning (nonparametric change point analysis [NCPA]), and a hybrid approach (significant zero crossings [SiZer]). We demonstrated that false detection of ecological thresholds (type I errors) and inferences on threshold locations are influenced by sample size, rate of linear change, and frequency of observations across the environmental gradient (i.e., sample-environment distribution, SED). However, the relative importance of these factors varied among statistical methods and between inference types. False detection rates were influenced primarily by user-selected parameters for PQR (τ) and SiZer (bandwidth) and secondarily by sample size (for PQR) and SED (for SiZer). In contrast, the location of reported thresholds was influenced primarily by SED. Bootstrapped confidence intervals for NCPA threshold locations revealed strong correspondence to SED. We conclude that the choice of statistical methods for threshold detection should be matched to experimental and environmental constraints to minimize false detection rates and avoid spurious inferences regarding threshold location.

  2. Volatility measurement with directional change in Chinese stock market: Statistical property and investment strategy

    NASA Astrophysics Data System (ADS)

    Ma, Junjun; Xiong, Xiong; He, Feng; Zhang, Wei

    2017-04-01

    The stock price fluctuation is studied in this paper with intrinsic time perspective. The event, directional change (DC) or overshoot, are considered as time scale of price time series. With this directional change law, its corresponding statistical properties and parameter estimation is tested in Chinese stock market. Furthermore, a directional change trading strategy is proposed for invest in the market portfolio in Chinese stock market, and both in-sample and out-of-sample performance are compared among the different method of model parameter estimation. We conclude that DC method can capture important fluctuations in Chinese stock market and gain profit due to the statistical property that average upturn overshoot size is bigger than average downturn directional change size. The optimal parameter of DC method is not fixed and we obtained 1.8% annual excess return with this DC-based trading strategy.

  3. A Comparison of Analytical and Data Preprocessing Methods for Spectral Fingerprinting

    PubMed Central

    LUTHRIA, DEVANAND L.; MUKHOPADHYAY, SUDARSAN; LIN, LONG-ZE; HARNLY, JAMES M.

    2013-01-01

    Spectral fingerprinting, as a method of discriminating between plant cultivars and growing treatments for a common set of broccoli samples, was compared for six analytical instruments. Spectra were acquired for finely powdered solid samples using Fourier transform infrared (FT-IR) and Fourier transform near-infrared (NIR) spectrometry. Spectra were also acquired for unfractionated aqueous methanol extracts of the powders using molecular absorption in the ultraviolet (UV) and visible (VIS) regions and mass spectrometry with negative (MS−) and positive (MS+) ionization. The spectra were analyzed using nested one-way analysis of variance (ANOVA) and principal component analysis (PCA) to statistically evaluate the quality of discrimination. All six methods showed statistically significant differences between the cultivars and treatments. The significance of the statistical tests was improved by the judicious selection of spectral regions (IR and NIR), masses (MS+ and MS−), and derivatives (IR, NIR, UV, and VIS). PMID:21352644

  4. Density (DE)

    Treesearch

    John F. Caratti

    2006-01-01

    The FIREMON Density (DE) method is used to assess changes in plant species density and height for a macroplot. This method uses multiple quadrats and belt transects (transects having a width) to sample within plot variation and quantify statistically valid changes in plant species density and height over time. Herbaceous plant species are sampled with quadrats while...

  5. A statistical evaluation of spectral fingerprinting methods using analysis of variance and principal component analysis

    USDA-ARS?s Scientific Manuscript database

    Six methods were compared with respect to spectral fingerprinting of a well-characterized series of broccoli samples. Spectral fingerprints were acquired for finely-powdered solid samples using Fourier transform-infrared (IR) and Fourier transform-near infrared (NIR) spectrometry and for aqueous met...

  6. Statistical Methods and Tools for Uxo Characterization (SERDP Final Technical Report)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pulsipher, Brent A.; Gilbert, Richard O.; Wilson, John E.

    2004-11-15

    The Strategic Environmental Research and Development Program (SERDP) issued a statement of need for FY01 titled Statistical Sampling for Unexploded Ordnance (UXO) Site Characterization that solicited proposals to develop statistically valid sampling protocols for cost-effective, practical, and reliable investigation of sites contaminated with UXO; protocols that could be validated through subsequent field demonstrations. The SERDP goal was the development of a sampling strategy for which a fraction of the site is initially surveyed by geophysical detectors to confidently identify clean areas and subsections (target areas, TAs) that had elevated densities of anomalous geophysical detector readings that could indicate the presencemore » of UXO. More detailed surveys could then be conducted to search the identified TAs for UXO. SERDP funded three projects: those proposed by the Pacific Northwest National Laboratory (PNNL) (SERDP Project No. UXO 1199), Sandia National Laboratory (SNL), and Oak Ridge National Laboratory (ORNL). The projects were closely coordinated to minimize duplication of effort and facilitate use of shared algorithms where feasible. This final report for PNNL Project 1199 describes the methods developed by PNNL to address SERDP's statement-of-need for the development of statistically-based geophysical survey methods for sites where 100% surveys are unattainable or cost prohibitive.« less

  7. Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples

    PubMed Central

    White, James Robert; Nagarajan, Niranjan; Pop, Mihai

    2009-01-01

    Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries are computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software can also be applied to digital gene expression studies (e.g. SAGE). A web server implementation of our methods and freely available source code can be found at http://metastats.cbcb.umd.edu/. PMID:19360128

  8. Version 2.0 Visual Sample Plan (VSP): UXO Module Code Description and Verification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, Richard O.; Wilson, John E.; O'Brien, Robert F.

    2003-05-06

    The Pacific Northwest National Laboratory (PNNL) is developing statistical methods for determining the amount of geophysical surveys conducted along transects (swaths) that are needed to achieve specified levels of confidence of finding target areas (TAs) of anomalous readings and possibly unexploded ordnance (UXO) at closed, transferring and transferred (CTT) Department of Defense (DoD) ranges and other sites. The statistical methods developed by PNNL have been coded into the UXO module of the Visual Sample Plan (VSP) software code that is being developed by PNNL with support from the DoD, the U.S. Department of Energy (DOE, and the U.S. Environmental Protectionmore » Agency (EPA). (The VSP software and VSP Users Guide (Hassig et al, 2002) may be downloaded from http://dqo.pnl.gov/vsp.) This report describes and documents the statistical methods developed and the calculations and verification testing that have been conducted to verify that VSPs implementation of these methods is correct and accurate.« less

  9. Interpreting “statistical hypothesis testing” results in clinical research

    PubMed Central

    Sarmukaddam, Sanjeev B.

    2012-01-01

    Difference between “Clinical Significance and Statistical Significance” should be kept in mind while interpreting “statistical hypothesis testing” results in clinical research. This fact is already known to many but again pointed out here as philosophy of “statistical hypothesis testing” is sometimes unnecessarily criticized mainly due to failure in considering such distinction. Randomized controlled trials are also wrongly criticized similarly. Some scientific method may not be applicable in some peculiar/particular situation does not mean that the method is useless. Also remember that “statistical hypothesis testing” is not for decision making and the field of “decision analysis” is very much an integral part of science of statistics. It is not correct to say that “confidence intervals have nothing to do with confidence” unless one understands meaning of the word “confidence” as used in context of confidence interval. Interpretation of the results of every study should always consider all possible alternative explanations like chance, bias, and confounding. Statistical tests in inferential statistics are, in general, designed to answer the question “How likely is the difference found in random sample(s) is due to chance” and therefore limitation of relying only on statistical significance in making clinical decisions should be avoided. PMID:22707861

  10. Surveying Europe's Only Cave-Dwelling Chordate Species (Proteus anguinus) Using Environmental DNA.

    PubMed

    Vörös, Judit; Márton, Orsolya; Schmidt, Benedikt R; Gál, Júlia Tünde; Jelić, Dušan

    2017-01-01

    In surveillance of subterranean fauna, especially in the case of rare or elusive aquatic species, traditional techniques used for epigean species are often not feasible. We developed a non-invasive survey method based on environmental DNA (eDNA) to detect the presence of the red-listed cave-dwelling amphibian, Proteus anguinus, in the caves of the Dinaric Karst. We tested the method in fifteen caves in Croatia, from which the species was previously recorded or expected to occur. We successfully confirmed the presence of P. anguinus from ten caves and detected the species for the first time in five others. Using a hierarchical occupancy model we compared the availability and detection probability of eDNA of two water sampling methods, filtration and precipitation. The statistical analysis showed that both availability and detection probability depended on the method and estimates for both probabilities were higher using filter samples than for precipitation samples. Combining reliable field and laboratory methods with robust statistical modeling will give the best estimates of species occurrence.

  11. A simulative comparison of respondent driven sampling with incentivized snowball sampling – the “strudel effect”

    PubMed Central

    Gyarmathy, V. Anna; Johnston, Lisa G.; Caplinskiene, Irma; Caplinskas, Saulius; Latkin, Carl A.

    2014-01-01

    Background Respondent driven sampling (RDS) and Incentivized Snowball Sampling (ISS) are two sampling methods that are commonly used to reach people who inject drugs (PWID). Methods We generated a set of simulated RDS samples on an actual sociometric ISS sample of PWID in Vilnius, Lithuania (“original sample”) to assess if the simulated RDS estimates were statistically significantly different from the original ISS sample prevalences for HIV (9.8%), Hepatitis A (43.6%), Hepatitis B (Anti-HBc 43.9% and HBsAg 3.4%), Hepatitis C (87.5%), syphilis (6.8%) and Chlamydia (8.8%) infections and for selected behavioral risk characteristics. Results The original sample consisted of a large component of 249 people (83% of the sample) and 13 smaller components with 1 to 12 individuals. Generally, as long as all seeds were recruited from the large component of the original sample, the simulation samples simply recreated the large component. There were no significant differences between the large component and the entire original sample for the characteristics of interest. Altogether 99.2% of 360 simulation sample point estimates were within the confidence interval of the original prevalence values for the characteristics of interest. Conclusions When population characteristics are reflected in large network components that dominate the population, RDS and ISS may produce samples that have statistically non-different prevalence values, even though some isolated network components may be under-sampled and/or statistically significantly different from the main groups. This so-called “strudel effect” is discussed in the paper. PMID:24360650

  12. Statistical methods for astronomical data with upper limits. I - Univariate distributions

    NASA Technical Reports Server (NTRS)

    Feigelson, E. D.; Nelson, P. I.

    1985-01-01

    The statistical treatment of univariate censored data is discussed. A heuristic derivation of the Kaplan-Meier maximum-likelihood estimator from first principles is presented which results in an expression amenable to analytic error analysis. Methods for comparing two or more censored samples are given along with simple computational examples, stressing the fact that most astronomical problems involve upper limits while the standard mathematical methods require lower limits. The application of univariate survival analysis to six data sets in the recent astrophysical literature is described, and various aspects of the use of survival analysis in astronomy, such as the limitations of various two-sample tests and the role of parametric modelling, are discussed.

  13. Statistical Methods for Passive Vehicle Classification in Urban Traffic Surveillance and Control

    DOT National Transportation Integrated Search

    1980-01-01

    A statistical approach to passive vehicle classification using the phase-shift signature from electromagnetic presence-type vehicle detectors is developed with digitized samples of the analog phase-shift signature, the problem of classifying vehicle ...

  14. Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.

    PubMed

    Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping

    2015-06-07

    Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.

  15. Sticky trap and stem-tap sampling protocols for the Asian citrus psyllid (Hemiptera: Psyllidae)

    USDA-ARS?s Scientific Manuscript database

    Sampling statistics were obtained to develop a sampling protocol for estimating numbers of adult Diaphorina citri in citrus using two different sampling methods: yellow sticky traps and stem–tap samples. A 4.0 ha block of mature orange trees was stratified into ten 0.4 ha strata and sampled using...

  16. A new statistical distance scale for planetary nebulae

    NASA Astrophysics Data System (ADS)

    Ali, Alaa; Ismail, H. A.; Alsolami, Z.

    2015-05-01

    In the first part of the present article we discuss the consistency among different individual distance methods of Galactic planetary nebulae, while in the second part we develop a new statistical distance scale based on a calibrating sample of well determined distances. A set composed of 315 planetary nebulae with individual distances are extracted from the literature. Inspecting the data set indicates that the accuracy of distances is varying among different individual methods and also among different sources where the same individual method was applied. Therefore, we derive a reliable weighted mean distance for each object by considering the influence of the distance error and the weight of each individual method. The results reveal that the discussed individual methods are consistent with each other, except the gravity method that produces higher distances compared to other individual methods. From the initial data set, we construct a standard calibrating sample consists of 82 objects. This sample is restricted only to the objects with distances determined from at least two different individual methods, except few objects with trusted distances determined from the trigonometric, spectroscopic, and cluster membership methods. In addition to the well determined distances for this sample, it shows a lot of advantages over that used in the prior distance scales. This sample is used to recalibrate the mass-radius and radio surface brightness temperature-radius relationships. An average error of ˜30 % is estimated for the new distance scale. The newly distance scale is compared with the most widely used statistical scales in literature, where the results show that it is roughly similar to the majority of them within ˜±20 % difference. Furthermore, the new scale yields a weighted mean distance to the Galactic center of 7.6±1.35 kpc, which in good agreement with the very recent measure of Malkin 2013.

  17. Sample Size Estimation: The Easy Way

    ERIC Educational Resources Information Center

    Weller, Susan C.

    2015-01-01

    This article presents a simple approach to making quick sample size estimates for basic hypothesis tests. Although there are many sources available for estimating sample sizes, methods are not often integrated across statistical tests, levels of measurement of variables, or effect sizes. A few parameters are required to estimate sample sizes and…

  18. [A comparison of convenience sampling and purposive sampling].

    PubMed

    Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien

    2014-06-01

    Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation not by statistical power analysis.

  19. Evaluating Composite Sampling Methods of Bacillus Spores at Low Concentrations

    PubMed Central

    Hess, Becky M.; Amidan, Brett G.; Anderson, Kevin K.; Hutchison, Janine R.

    2016-01-01

    Restoring all facility operations after the 2001 Amerithrax attacks took years to complete, highlighting the need to reduce remediation time. Some of the most time intensive tasks were environmental sampling and sample analyses. Composite sampling allows disparate samples to be combined, with only a single analysis needed, making it a promising method to reduce response times. We developed a statistical experimental design to test three different composite sampling methods: 1) single medium single pass composite (SM-SPC): a single cellulose sponge samples multiple coupons with a single pass across each coupon; 2) single medium multi-pass composite: a single cellulose sponge samples multiple coupons with multiple passes across each coupon (SM-MPC); and 3) multi-medium post-sample composite (MM-MPC): a single cellulose sponge samples a single surface, and then multiple sponges are combined during sample extraction. Five spore concentrations of Bacillus atrophaeus Nakamura spores were tested; concentrations ranged from 5 to 100 CFU/coupon (0.00775 to 0.155 CFU/cm2). Study variables included four clean surface materials (stainless steel, vinyl tile, ceramic tile, and painted dry wallboard) and three grime coated/dirty materials (stainless steel, vinyl tile, and ceramic tile). Analysis of variance for the clean study showed two significant factors: composite method (p< 0.0001) and coupon material (p = 0.0006). Recovery efficiency (RE) was higher overall using the MM-MPC method compared to the SM-SPC and SM-MPC methods. RE with the MM-MPC method for concentrations tested (10 to 100 CFU/coupon) was similar for ceramic tile, dry wall, and stainless steel for clean materials. RE was lowest for vinyl tile with both composite methods. Statistical tests for the dirty study showed RE was significantly higher for vinyl and stainless steel materials, but lower for ceramic tile. These results suggest post-sample compositing can be used to reduce sample analysis time when responding to a Bacillus anthracis contamination event of clean or dirty surfaces. PMID:27736999

  20. Evaluating Composite Sampling Methods of Bacillus Spores at Low Concentrations.

    PubMed

    Hess, Becky M; Amidan, Brett G; Anderson, Kevin K; Hutchison, Janine R

    2016-01-01

    Restoring all facility operations after the 2001 Amerithrax attacks took years to complete, highlighting the need to reduce remediation time. Some of the most time intensive tasks were environmental sampling and sample analyses. Composite sampling allows disparate samples to be combined, with only a single analysis needed, making it a promising method to reduce response times. We developed a statistical experimental design to test three different composite sampling methods: 1) single medium single pass composite (SM-SPC): a single cellulose sponge samples multiple coupons with a single pass across each coupon; 2) single medium multi-pass composite: a single cellulose sponge samples multiple coupons with multiple passes across each coupon (SM-MPC); and 3) multi-medium post-sample composite (MM-MPC): a single cellulose sponge samples a single surface, and then multiple sponges are combined during sample extraction. Five spore concentrations of Bacillus atrophaeus Nakamura spores were tested; concentrations ranged from 5 to 100 CFU/coupon (0.00775 to 0.155 CFU/cm2). Study variables included four clean surface materials (stainless steel, vinyl tile, ceramic tile, and painted dry wallboard) and three grime coated/dirty materials (stainless steel, vinyl tile, and ceramic tile). Analysis of variance for the clean study showed two significant factors: composite method (p< 0.0001) and coupon material (p = 0.0006). Recovery efficiency (RE) was higher overall using the MM-MPC method compared to the SM-SPC and SM-MPC methods. RE with the MM-MPC method for concentrations tested (10 to 100 CFU/coupon) was similar for ceramic tile, dry wall, and stainless steel for clean materials. RE was lowest for vinyl tile with both composite methods. Statistical tests for the dirty study showed RE was significantly higher for vinyl and stainless steel materials, but lower for ceramic tile. These results suggest post-sample compositing can be used to reduce sample analysis time when responding to a Bacillus anthracis contamination event of clean or dirty surfaces.

  1. Testing the non-unity of rate ratio under inverse sampling.

    PubMed

    Tang, Man-Lai; Liao, Yi Jie; Ng, Hong Keung Tony; Chan, Ping Shing

    2007-08-01

    Inverse sampling is considered to be a more appropriate sampling scheme than the usual binomial sampling scheme when subjects arrive sequentially, when the underlying response of interest is acute, and when maximum likelihood estimators of some epidemiologic indices are undefined. In this article, we study various statistics for testing non-unity rate ratios in case-control studies under inverse sampling. These include the Wald, unconditional score, likelihood ratio and conditional score statistics. Three methods (the asymptotic, conditional exact, and Mid-P methods) are adopted for P-value calculation. We evaluate the performance of different combinations of test statistics and P-value calculation methods in terms of their empirical sizes and powers via Monte Carlo simulation. In general, asymptotic score and conditional score tests are preferable for their actual type I error rates are well controlled around the pre-chosen nominal level, and their powers are comparatively the largest. The exact version of Wald test is recommended if one wants to control the actual type I error rate at or below the pre-chosen nominal level. If larger power is expected and fluctuation of sizes around the pre-chosen nominal level are allowed, then the Mid-P version of Wald test is a desirable alternative. We illustrate the methodologies with a real example from a heart disease study. (c) 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

  2. [The research protocol III. Study population].

    PubMed

    Arias-Gómez, Jesús; Villasís-Keever, Miguel Ángel; Miranda-Novales, María Guadalupe

    2016-01-01

    The study population is defined as a set of cases, determined, limited, and accessible, that will constitute the subjects for the selection of the sample, and must fulfill several characteristics and distinct criteria. The objectives of this manuscript are focused on specifying each one of the elements required to make the selection of the participants of a research project, during the elaboration of the protocol, including the concepts of study population, sample, selection criteria and sampling methods. After delineating the study population, the researcher must specify the criteria that each participant has to comply. The criteria that include the specific characteristics are denominated selection or eligibility criteria. These criteria are inclusion, exclusion and elimination, and will delineate the eligible population. The sampling methods are divided in two large groups: 1) probabilistic or random sampling and 2) non-probabilistic sampling. The difference lies in the employment of statistical methods to select the subjects. In every research, it is necessary to establish at the beginning the specific number of participants to be included to achieve the objectives of the study. This number is the sample size, and can be calculated or estimated with mathematical formulas and statistic software.

  3. Validation of a modification to Performance-Tested Method 010403: microwell DNA hybridization assay for detection of Listeria spp. in selected foods and selected environmental surfaces.

    PubMed

    Alles, Susan; Peng, Linda X; Mozola, Mark A

    2009-01-01

    A modification to Performance-Tested Method 010403, GeneQuence Listeria Test (DNAH method), is described. The modified method uses a new media formulation, LESS enrichment broth, in single-step enrichment protocols for both foods and environmental sponge and swab samples. Food samples are enriched for 27-30 h at 30 degrees C, and environmental samples for 24-48 h at 30 degrees C. Implementation of these abbreviated enrichment procedures allows test results to be obtained on a next-day basis. In testing of 14 food types in internal comparative studies with inoculated samples, there were statistically significant differences in method performance between the DNAH method and reference culture procedures for only 2 foods (pasteurized crab meat and lettuce) at the 27 h enrichment time point and for only a single food (pasteurized crab meat) in one trial at the 30 h enrichment time point. Independent laboratory testing with 3 foods showed statistical equivalence between the methods for all foods, and results support the findings of the internal trials. Overall, considering both internal and independent laboratory trials, sensitivity of the DNAH method relative to the reference culture procedures was 90.5%. Results of testing 5 environmental surfaces inoculated with various strains of Listeria spp. showed that the DNAH method was more productive than the reference U.S. Department of Agriculture-Food Safety and Inspection Service (USDA-FSIS) culture procedure for 3 surfaces (stainless steel, plastic, and cast iron), whereas results were statistically equivalent to the reference method for the other 2 surfaces (ceramic tile and sealed concrete). An independent laboratory trial with ceramic tile inoculated with L. monocytogenes confirmed the effectiveness of the DNAH method at the 24 h time point. Overall, sensitivity of the DNAH method at 24 h relative to that of the USDA-FSIS method was 152%. The DNAH method exhibited extremely high specificity, with only 1% false-positive reactions overall.

  4. Comparison of three sampling and analytical methods for the determination of airborne hexavalent chromium.

    PubMed

    Boiano, J M; Wallace, M E; Sieber, W K; Groff, J H; Wang, J; Ashley, K

    2000-08-01

    A field study was conducted with the goal of comparing the performance of three recently developed or modified sampling and analytical methods for the determination of airborne hexavalent chromium (Cr(VI)). The study was carried out in a hard chrome electroplating facility and in a jet engine manufacturing facility where airborne Cr(VI) was expected to be present. The analytical methods evaluated included two laboratory-based procedures (OSHA Method ID-215 and NIOSH Method 7605) and a field-portable method (NIOSH Method 7703). These three methods employ an identical sampling methodology: collection of Cr(VI)-containing aerosol on a polyvinyl chloride (PVC) filter housed in a sampling cassette, which is connected to a personal sampling pump calibrated at an appropriate flow rate. The basis of the analytical methods for all three methods involves extraction of the PVC filter in alkaline buffer solution, chemical isolation of the Cr(VI) ion, complexation of the Cr(VI) ion with 1,5-diphenylcarbazide, and spectrometric measurement of the violet chromium diphenylcarbazone complex at 540 nm. However, there are notable specific differences within the sample preparation procedures used in three methods. To assess the comparability of the three measurement protocols, a total of 20 side-by-side air samples were collected, equally divided between a chromic acid electroplating operation and a spray paint operation where water soluble forms of Cr(VI) were used. A range of Cr(VI) concentrations from 0.6 to 960 microg m(-3), with Cr(VI) mass loadings ranging from 0.4 to 32 microg, was measured at the two operations. The equivalence of the means of the log-transformed Cr(VI) concentrations obtained from the different analytical methods was compared. Based on analysis of variance (ANOVA) results, no statistically significant differences were observed between mean values measured using each of the three methods. Small but statistically significant differences were observed between results obtained from performance evaluation samples for the NIOSH field method and the OSHA laboratory method.

  5. A General Linear Method for Equating with Small Samples

    ERIC Educational Resources Information Center

    Albano, Anthony D.

    2015-01-01

    Research on equating with small samples has shown that methods with stronger assumptions and fewer statistical estimates can lead to decreased error in the estimated equating function. This article introduces a new approach to linear observed-score equating, one which provides flexible control over how form difficulty is assumed versus estimated…

  6. Triacylglycerol Analysis in Human Milk and Other Mammalian Species: Small-Scale Sample Preparation, Characterization, and Statistical Classification Using HPLC-ELSD Profiles.

    PubMed

    Ten-Doménech, Isabel; Beltrán-Iturat, Eduardo; Herrero-Martínez, José Manuel; Sancho-Llopis, Juan Vicente; Simó-Alfonso, Ernesto Francisco

    2015-06-24

    In this work, a method for the separation of triacylglycerols (TAGs) present in human milk and from other mammalian species by reversed-phase high-performance liquid chromatography using a core-shell particle packed column with UV and evaporative light-scattering detectors is described. Under optimal conditions, a mobile phase containing acetonitrile/n-pentanol at 10 °C gave an excellent resolution among more than 50 TAG peaks. A small-scale method for fat extraction in these milks (particularly of interest for human milk samples) using minimal amounts of sample and reagents was also developed. The proposed extraction protocol and the traditional method were compared, giving similar results, with respect to the total fat and relative TAG contents. Finally, a statistical study based on linear discriminant analysis on the TAG composition of different types of milks (human, cow, sheep, and goat) was carried out to differentiate the samples according to their mammalian origin.

  7. Sampling of temporal networks: Methods and biases

    NASA Astrophysics Data System (ADS)

    Rocha, Luis E. C.; Masuda, Naoki; Holme, Petter

    2017-11-01

    Temporal networks have been increasingly used to model a diversity of systems that evolve in time; for example, human contact structures over which dynamic processes such as epidemics take place. A fundamental aspect of real-life networks is that they are sampled within temporal and spatial frames. Furthermore, one might wish to subsample networks to reduce their size for better visualization or to perform computationally intensive simulations. The sampling method may affect the network structure and thus caution is necessary to generalize results based on samples. In this paper, we study four sampling strategies applied to a variety of real-life temporal networks. We quantify the biases generated by each sampling strategy on a number of relevant statistics such as link activity, temporal paths and epidemic spread. We find that some biases are common in a variety of networks and statistics, but one strategy, uniform sampling of nodes, shows improved performance in most scenarios. Given the particularities of temporal network data and the variety of network structures, we recommend that the choice of sampling methods be problem oriented to minimize the potential biases for the specific research questions on hand. Our results help researchers to better design network data collection protocols and to understand the limitations of sampled temporal network data.

  8. Identifying natural flow regimes using fish communities

    NASA Astrophysics Data System (ADS)

    Chang, Fi-John; Tsai, Wen-Ping; Wu, Tzu-Ching; Chen, Hung-kwai; Herricks, Edwin E.

    2011-10-01

    SummaryModern water resources management has adopted natural flow regimes as reasonable targets for river restoration and conservation. The characterization of a natural flow regime begins with the development of hydrologic statistics from flow records. However, little guidance exists for defining the period of record needed for regime determination. In Taiwan, the Taiwan Eco-hydrological Indicator System (TEIS), a group of hydrologic statistics selected for fisheries relevance, is being used to evaluate ecological flows. The TEIS consists of a group of hydrologic statistics selected to characterize the relationships between flow and the life history of indigenous species. Using the TEIS and biosurvey data for Taiwan, this paper identifies the length of hydrologic record sufficient for natural flow regime characterization. To define the ecological hydrology of fish communities, this study connected hydrologic statistics to fish communities by using methods to define antecedent conditions that influence existing community composition. A moving average method was applied to TEIS statistics to reflect the effects of antecedent flow condition and a point-biserial correlation method was used to relate fisheries collections with TEIS statistics. The resulting fish species-TEIS (FISH-TEIS) hydrologic statistics matrix takes full advantage of historical flows and fisheries data. The analysis indicates that, in the watersheds analyzed, averaging TEIS statistics for the present year and 3 years prior to the sampling date, termed MA(4), is sufficient to develop a natural flow regime. This result suggests that flow regimes based on hydrologic statistics for the period of record can be replaced by regimes developed for sampled fish communities.

  9. Standard methods for sampling North American freshwater fishes

    USGS Publications Warehouse

    Bonar, Scott A.; Hubert, Wayne A.; Willis, David W.

    2009-01-01

    This important reference book provides standard sampling methods recommended by the American Fisheries Society for assessing and monitoring freshwater fish populations in North America. Methods apply to ponds, reservoirs, natural lakes, and streams and rivers containing cold and warmwater fishes. Range-wide and eco-regional averages for indices of abundance, population structure, and condition for individual species are supplied to facilitate comparisons of standard data among populations. Provides information on converting nonstandard to standard data, statistical and database procedures for analyzing and storing standard data, and methods to prevent transfer of invasive species while sampling.

  10. Approximate sample size formulas for the two-sample trimmed mean test with unequal variances.

    PubMed

    Luh, Wei-Ming; Guo, Jiin-Huarng

    2007-05-01

    Yuen's two-sample trimmed mean test statistic is one of the most robust methods to apply when variances are heterogeneous. The present study develops formulas for the sample size required for the test. The formulas are applicable for the cases of unequal variances, non-normality and unequal sample sizes. Given the specified alpha and the power (1-beta), the minimum sample size needed by the proposed formulas under various conditions is less than is given by the conventional formulas. Moreover, given a specified size of sample calculated by the proposed formulas, simulation results show that Yuen's test can achieve statistical power which is generally superior to that of the approximate t test. A numerical example is provided.

  11. A statistical approach to selecting and confirming validation targets in -omics experiments

    PubMed Central

    2012-01-01

    Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145

  12. Toward cost-efficient sampling methods

    NASA Astrophysics Data System (ADS)

    Luo, Peng; Li, Yongli; Wu, Chong; Zhang, Guijie

    2015-09-01

    The sampling method has been paid much attention in the field of complex network in general and statistical physics in particular. This paper proposes two new sampling methods based on the idea that a small part of vertices with high node degree could possess the most structure information of a complex network. The two proposed sampling methods are efficient in sampling high degree nodes so that they would be useful even if the sampling rate is low, which means cost-efficient. The first new sampling method is developed on the basis of the widely used stratified random sampling (SRS) method and the second one improves the famous snowball sampling (SBS) method. In order to demonstrate the validity and accuracy of two new sampling methods, we compare them with the existing sampling methods in three commonly used simulation networks that are scale-free network, random network, small-world network, and also in two real networks. The experimental results illustrate that the two proposed sampling methods perform much better than the existing sampling methods in terms of achieving the true network structure characteristics reflected by clustering coefficient, Bonacich centrality and average path length, especially when the sampling rate is low.

  13. Statistical Algorithms Accounting for Background Density in the Detection of UXO Target Areas at DoD Munitions Sites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzke, Brett D.; Wilson, John E.; Hathaway, J.

    2008-02-12

    Statistically defensible methods are presented for developing geophysical detector sampling plans and analyzing data for munitions response sites where unexploded ordnance (UXO) may exist. Detection methods for identifying areas of elevated anomaly density from background density are shown. Additionally, methods are described which aid in the choice of transect pattern and spacing to assure with degree of confidence that a target area (TA) of specific size, shape, and anomaly density will be identified using the detection methods. Methods for evaluating the sensitivity of designs to variation in certain parameters are also discussed. Methods presented have been incorporated into the Visualmore » Sample Plan (VSP) software (free at http://dqo.pnl.gov/vsp) and demonstrated at multiple sites in the United States. Application examples from actual transect designs and surveys from the previous two years are demonstrated.« less

  14. MAFsnp: A Multi-Sample Accurate and Flexible SNP Caller Using Next-Generation Sequencing Data

    PubMed Central

    Hu, Jiyuan; Li, Tengfei; Xiu, Zidi; Zhang, Hong

    2015-01-01

    Most existing statistical methods developed for calling single nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS) data are based on Bayesian frameworks, and there does not exist any SNP caller that produces p-values for calling SNPs in a frequentist framework. To fill in this gap, we develop a new method MAFsnp, a Multiple-sample based Accurate and Flexible algorithm for calling SNPs with NGS data. MAFsnp is based on an estimated likelihood ratio test (eLRT) statistic. In practical situation, the involved parameter is very close to the boundary of the parametric space, so the standard large sample property is not suitable to evaluate the finite-sample distribution of the eLRT statistic. Observing that the distribution of the test statistic is a mixture of zero and a continuous part, we propose to model the test statistic with a novel two-parameter mixture distribution. Once the parameters in the mixture distribution are estimated, p-values can be easily calculated for detecting SNPs, and the multiple-testing corrected p-values can be used to control false discovery rate (FDR) at any pre-specified level. With simulated data, MAFsnp is shown to have much better control of FDR than the existing SNP callers. Through the application to two real datasets, MAFsnp is also shown to outperform the existing SNP callers in terms of calling accuracy. An R package “MAFsnp” implementing the new SNP caller is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/. PMID:26309201

  15. Monitoring and identification of spatiotemporal landscape changes in multiple remote sensing images by using a stratified conditional Latin hypercube sampling approach and geostatistical simulation.

    PubMed

    Lin, Yu-Pin; Chu, Hone-Jay; Huang, Yu-Long; Tang, Chia-Hsi; Rouhani, Shahrokh

    2011-06-01

    This study develops a stratified conditional Latin hypercube sampling (scLHS) approach for multiple, remotely sensed, normalized difference vegetation index (NDVI) images. The objective is to sample, monitor, and delineate spatiotemporal landscape changes, including spatial heterogeneity and variability, in a given area. The scLHS approach, which is based on the variance quadtree technique (VQT) and the conditional Latin hypercube sampling (cLHS) method, selects samples in order to delineate landscape changes from multiple NDVI images. The images are then mapped for calibration and validation by using sequential Gaussian simulation (SGS) with the scLHS selected samples. Spatial statistical results indicate that in terms of their statistical distribution, spatial distribution, and spatial variation, the statistics and variograms of the scLHS samples resemble those of multiple NDVI images more closely than those of cLHS and VQT samples. Moreover, the accuracy of simulated NDVI images based on SGS with scLHS samples is significantly better than that of simulated NDVI images based on SGS with cLHS samples and VQT samples, respectively. However, the proposed approach efficiently monitors the spatial characteristics of landscape changes, including the statistics, spatial variability, and heterogeneity of NDVI images. In addition, SGS with the scLHS samples effectively reproduces spatial patterns and landscape changes in multiple NDVI images.

  16. Likert scales, levels of measurement and the "laws" of statistics.

    PubMed

    Norman, Geoff

    2010-12-01

    Reviewers of research reports frequently criticize the choice of statistical methods. While some of these criticisms are well-founded, frequently the use of various parametric methods such as analysis of variance, regression, correlation are faulted because: (a) the sample size is too small, (b) the data may not be normally distributed, or (c) The data are from Likert scales, which are ordinal, so parametric statistics cannot be used. In this paper, I dissect these arguments, and show that many studies, dating back to the 1930s consistently show that parametric statistics are robust with respect to violations of these assumptions. Hence, challenges like those above are unfounded, and parametric methods can be utilized without concern for "getting the wrong answer".

  17. Improving the analysis of composite endpoints in rare disease trials.

    PubMed

    McMenamin, Martina; Berglind, Anna; Wason, James M S

    2018-05-22

    Composite endpoints are recommended in rare diseases to increase power and/or to sufficiently capture complexity. Often, they are in the form of responder indices which contain a mixture of continuous and binary components. Analyses of these outcomes typically treat them as binary, thus only using the dichotomisations of continuous components. The augmented binary method offers a more efficient alternative and is therefore especially useful for rare diseases. Previous work has indicated the method may have poorer statistical properties when the sample size is small. Here we investigate small sample properties and implement small sample corrections. We re-sample from a previous trial with sample sizes varying from 30 to 80. We apply the standard binary and augmented binary methods and determine the power, type I error rate, coverage and average confidence interval width for each of the estimators. We implement Firth's adjustment for the binary component models and a small sample variance correction for the generalized estimating equations, applying the small sample adjusted methods to each sub-sample as before for comparison. For the log-odds treatment effect the power of the augmented binary method is 20-55% compared to 12-20% for the standard binary method. Both methods have approximately nominal type I error rates. The difference in response probabilities exhibit similar power but both unadjusted methods demonstrate type I error rates of 6-8%. The small sample corrected methods have approximately nominal type I error rates. On both scales, the reduction in average confidence interval width when using the adjusted augmented binary method is 17-18%. This is equivalent to requiring a 32% smaller sample size to achieve the same statistical power. The augmented binary method with small sample corrections provides a substantial improvement for rare disease trials using composite endpoints. We recommend the use of the method for the primary analysis in relevant rare disease trials. We emphasise that the method should be used alongside other efforts in improving the quality of evidence generated from rare disease trials rather than replace them.

  18. Suitability and setup of next-generation sequencing-based method for taxonomic characterization of aquatic microbial biofilm.

    PubMed

    Bakal, Tomas; Janata, Jiri; Sabova, Lenka; Grabic, Roman; Zlabek, Vladimir; Najmanova, Lucie

    2018-06-16

    A robust and widely applicable method for sampling of aquatic microbial biofilm and further sample processing is presented. The method is based on next-generation sequencing of V4-V5 variable regions of 16S rRNA gene and further statistical analysis of sequencing data, which could be useful not only to investigate taxonomic composition of biofilm bacterial consortia but also to assess aquatic ecosystem health. Five artificial materials commonly used for biofilm growth (glass, stainless steel, aluminum, polypropylene, polyethylene) were tested to determine the one giving most robust and reproducible results. The effect of used sampler material on total microbial composition was not statistically significant; however, the non-plastic materials (glass, metal) gave more stable outputs without irregularities among sample parallels. The bias of the method is assessed with respect to the employment of a non-quantitative step (PCR amplification) to obtain quantitative results (relative abundance of identified taxa). This aspect is often overlooked in ecological and medical studies. We document that sequencing of a mixture of three merged primary PCR reactions for each sample and further evaluation of median values from three technical replicates for each sample enables to overcome this bias and gives robust and repeatable results well distinguishing among sampling localities and seasons.

  19. Evaluating Composite Sampling Methods of Bacillus spores at Low Concentrations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hess, Becky M.; Amidan, Brett G.; Anderson, Kevin K.

    Restoring facility operations after the 2001 Amerithrax attacks took over three months to complete, highlighting the need to reduce remediation time. The most time intensive tasks were environmental sampling and sample analyses. Composite sampling allows disparate samples to be combined, with only a single analysis needed, making it a promising method to reduce response times. We developed a statistical experimental design to test three different composite sampling methods: 1) single medium single pass composite: a single cellulose sponge samples multiple coupons; 2) single medium multi-pass composite: a single cellulose sponge is used to sample multiple coupons; and 3) multi-medium post-samplemore » composite: a single cellulose sponge samples a single surface, and then multiple sponges are combined during sample extraction. Five spore concentrations of Bacillus atrophaeus Nakamura spores were tested; concentrations ranged from 5 to 100 CFU/coupon (0.00775 to 0.155CFU/cm2, respectively). Study variables included four clean surface materials (stainless steel, vinyl tile, ceramic tile, and painted wallboard) and three grime coated/dirty materials (stainless steel, vinyl tile, and ceramic tile). Analysis of variance for the clean study showed two significant factors: composite method (p-value < 0.0001) and coupon material (p-value = 0.0008). Recovery efficiency (RE) was higher overall using the post-sample composite (PSC) method compared to single medium composite from both clean and grime coated materials. RE with the PSC method for concentrations tested (10 to 100 CFU/coupon) was similar for ceramic tile, painted wall board, and stainless steel for clean materials. RE was lowest for vinyl tile with both composite methods. Statistical tests for the dirty study showed RE was significantly higher for vinyl and stainless steel materials, but significantly lower for ceramic tile. These results suggest post-sample compositing can be used to reduce sample analysis time when responding to a Bacillus anthracis contamination event of clean or dirty surfaces.« less

  20. Data processing of qualitative results from an interlaboratory comparison for the detection of “Flavescence dorée” phytoplasma: How the use of statistics can improve the reliability of the method validation process in plant pathology

    PubMed Central

    Renaudin, Isabelle; Poliakoff, Françoise

    2017-01-01

    A working group established in the framework of the EUPHRESCO European collaborative project aimed to compare and validate diagnostic protocols for the detection of “Flavescence dorée” (FD) phytoplasma in grapevines. Seven molecular protocols were compared in an interlaboratory test performance study where each laboratory had to analyze the same panel of samples consisting of DNA extracts prepared by the organizing laboratory. The tested molecular methods consisted of universal and group-specific real-time and end-point nested PCR tests. Different statistical approaches were applied to this collaborative study. Firstly, there was the standard statistical approach consisting in analyzing samples which are known to be positive and samples which are known to be negative and reporting the proportion of false-positive and false-negative results to respectively calculate diagnostic specificity and sensitivity. This approach was supplemented by the calculation of repeatability and reproducibility for qualitative methods based on the notions of accordance and concordance. Other new approaches were also implemented, based, on the one hand, on the probability of detection model, and, on the other hand, on Bayes’ theorem. These various statistical approaches are complementary and give consistent results. Their combination, and in particular, the introduction of new statistical approaches give overall information on the performance and limitations of the different methods, and are particularly useful for selecting the most appropriate detection scheme with regards to the prevalence of the pathogen. Three real-time PCR protocols (methods M4, M5 and M6 respectively developed by Hren (2007), Pelletier (2009) and under patent oligonucleotides) achieved the highest levels of performance for FD phytoplasma detection. This paper also addresses the issue of indeterminate results and the identification of outlier results. The statistical tools presented in this paper and their combination can be applied to many other studies concerning plant pathogens and other disciplines that use qualitative detection methods. PMID:28384335

  1. Data processing of qualitative results from an interlaboratory comparison for the detection of "Flavescence dorée" phytoplasma: How the use of statistics can improve the reliability of the method validation process in plant pathology.

    PubMed

    Chabirand, Aude; Loiseau, Marianne; Renaudin, Isabelle; Poliakoff, Françoise

    2017-01-01

    A working group established in the framework of the EUPHRESCO European collaborative project aimed to compare and validate diagnostic protocols for the detection of "Flavescence dorée" (FD) phytoplasma in grapevines. Seven molecular protocols were compared in an interlaboratory test performance study where each laboratory had to analyze the same panel of samples consisting of DNA extracts prepared by the organizing laboratory. The tested molecular methods consisted of universal and group-specific real-time and end-point nested PCR tests. Different statistical approaches were applied to this collaborative study. Firstly, there was the standard statistical approach consisting in analyzing samples which are known to be positive and samples which are known to be negative and reporting the proportion of false-positive and false-negative results to respectively calculate diagnostic specificity and sensitivity. This approach was supplemented by the calculation of repeatability and reproducibility for qualitative methods based on the notions of accordance and concordance. Other new approaches were also implemented, based, on the one hand, on the probability of detection model, and, on the other hand, on Bayes' theorem. These various statistical approaches are complementary and give consistent results. Their combination, and in particular, the introduction of new statistical approaches give overall information on the performance and limitations of the different methods, and are particularly useful for selecting the most appropriate detection scheme with regards to the prevalence of the pathogen. Three real-time PCR protocols (methods M4, M5 and M6 respectively developed by Hren (2007), Pelletier (2009) and under patent oligonucleotides) achieved the highest levels of performance for FD phytoplasma detection. This paper also addresses the issue of indeterminate results and the identification of outlier results. The statistical tools presented in this paper and their combination can be applied to many other studies concerning plant pathogens and other disciplines that use qualitative detection methods.

  2. Implementation of novel statistical procedures and other advanced approaches to improve analysis of CASA data.

    PubMed

    Ramón, M; Martínez-Pastor, F

    2018-04-23

    Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former method has allowed exploration of subpopulation patterns in many species, whereas the latter offering further possibilities, especially considering functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although the data provided by CASA systems provides valuable information on sperm samples by applying clustering analyses, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.

  3. Interlaboratory study of free cyanide methods compared to total cyanide measurements and the effect of preservation with sodium hydroxide for secondary- and tertiary-treated waste water samples.

    PubMed

    Stanley, Brett J; Antonio, Karen

    2012-11-01

    Several methods exist for the measurement of cyanide levels in treated wastewater,typically requiring preservation of the sample with sodium hydroxide to minimize loss of hydrogen cyanide gas (HCN). Recent reports have shown that cyanide levels may increase with chlorination or preservation. In this study, three flow injection analysis methods involving colorimetric and amperometric detection were compared within one laboratory, as well as across separate laboratories and equipment. Split wastewater samples from eight facilities and three different sampling periods were tested. An interlaboratory confidence interval of 3.5 ppb was calculated compared with the intralaboratory reporting limit of 2 ppb. The results show that free cyanide measurements are not statistically different than total cyanide levels. An artificial increase in cyanide level is observed with all methods for preserved samples relative to nonpreserved samples, with an average increase of 2.3 ppb. The possible loss of cyanide without preservation is shown to be statistically insignificant if properly stored up to 48 hours. The cyanide increase with preservation is further substantiated with the method of standard additions and is not a matrix interference. The increase appears to be correlated with the amount of cyanide observed without preservation, which appears to be greater in those facilities that disinfect their wastewater with chlorine, followed by dechlorination with sodium bisulfite.

  4. Evaluating the efficiency of environmental monitoring programs

    USGS Publications Warehouse

    Levine, Carrie R.; Yanai, Ruth D.; Lampman, Gregory G.; Burns, Douglas A.; Driscoll, Charles T.; Lawrence, Gregory B.; Lynch, Jason; Schoch, Nina

    2014-01-01

    Statistical uncertainty analyses can be used to improve the efficiency of environmental monitoring, allowing sampling designs to maximize information gained relative to resources required for data collection and analysis. In this paper, we illustrate four methods of data analysis appropriate to four types of environmental monitoring designs. To analyze a long-term record from a single site, we applied a general linear model to weekly stream chemistry data at Biscuit Brook, NY, to simulate the effects of reducing sampling effort and to evaluate statistical confidence in the detection of change over time. To illustrate a detectable difference analysis, we analyzed a one-time survey of mercury concentrations in loon tissues in lakes in the Adirondack Park, NY, demonstrating the effects of sampling intensity on statistical power and the selection of a resampling interval. To illustrate a bootstrapping method, we analyzed the plot-level sampling intensity of forest inventory at the Hubbard Brook Experimental Forest, NH, to quantify the sampling regime needed to achieve a desired confidence interval. Finally, to analyze time-series data from multiple sites, we assessed the number of lakes and the number of samples per year needed to monitor change over time in Adirondack lake chemistry using a repeated-measures mixed-effects model. Evaluations of time series and synoptic long-term monitoring data can help determine whether sampling should be re-allocated in space or time to optimize the use of financial and human resources.

  5. A Bayesian nonparametric method for prediction in EST analysis

    PubMed Central

    Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor

    2007-01-01

    Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445

  6. Chi-squared and C statistic minimization for low count per bin data

    NASA Astrophysics Data System (ADS)

    Nousek, John A.; Shue, David R.

    1989-07-01

    Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.

  7. Sawmill simulation: concepts and computer use

    Treesearch

    Hugh W. Reynolds; Charles J. Gatchell

    1969-01-01

    Product specifications were fed into a computer so that the yield of products from the same sample of logs could be determined for simulated sawing methods. Since different sawing patterns were tested on the same sample, variation among log samples was eliminated; hence, the statistical conclusions are very precise.

  8. Statistical methods for efficient design of community surveys of response to noise: Random coefficients regression models

    NASA Technical Reports Server (NTRS)

    Tomberlin, T. J.

    1985-01-01

    Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.

  9. Data-Based Detection of Potential Terrorist Attacks: Statistical and Graphical Methods

    DTIC Science & Technology

    2010-06-01

    Naren; Vasquez-Robinet, Cecilia; Watkinson, Jonathan: "A General Probabilistic Model of the PCR Process," Applied Mathematics and Computation 182(1...September 2006. Seminar, Measuring the effect of Length biased sampling, Mathematical Sciences Section, National Security Agency, 19 September 2006...Committee on National Statistics, 9 February 2007. Invited seminar, Statistical Tests for Bullet Lead Comparisons, Department of Mathematics , Butler

  10. Radiation detection method and system using the sequential probability ratio test

    DOEpatents

    Nelson, Karl E [Livermore, CA; Valentine, John D [Redwood City, CA; Beauchamp, Brock R [San Ramon, CA

    2007-07-17

    A method and system using the Sequential Probability Ratio Test to enhance the detection of an elevated level of radiation, by determining whether a set of observations are consistent with a specified model within a given bounds of statistical significance. In particular, the SPRT is used in the present invention to maximize the range of detection, by providing processing mechanisms for estimating the dynamic background radiation, adjusting the models to reflect the amount of background knowledge at the current point in time, analyzing the current sample using the models to determine statistical significance, and determining when the sample has returned to the expected background conditions.

  11. Relations between volumetric measures of brain structure and attentional function in spina bifida: utilization of robust statistical approaches.

    PubMed

    Kulesz, Paulina A; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M; Francis, David J

    2015-03-01

    Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with 4 robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PsycINFO Database Record (c) 2015 APA, all rights reserved.

  12. Advanced statistical methods for improved data analysis of NASA astrophysics missions

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.

    1992-01-01

    The investigators under this grant studied ways to improve the statistical analysis of astronomical data. They looked at existing techniques, the development of new techniques, and the production and distribution of specialized software to the astronomical community. Abstracts of nine papers that were produced are included, as well as brief descriptions of four software packages. The articles that are abstracted discuss analytical and Monte Carlo comparisons of six different linear least squares fits, a (second) paper on linear regression in astronomy, two reviews of public domain software for the astronomer, subsample and half-sample methods for estimating sampling distributions, a nonparametric estimation of survival functions under dependent competing risks, censoring in astronomical data due to nondetections, an astronomy survival analysis computer package called ASURV, and improving the statistical methodology of astronomical data analysis.

  13. Geographic and temporal validity of prediction models: Different approaches were useful to examine model performance

    PubMed Central

    Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.

    2017-01-01

    Objective Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics. Conclusion This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237

  14. Some Tests of Randomness with Applications

    DTIC Science & Technology

    1981-02-01

    freedom. For further details, the reader is referred to Gnanadesikan (1977, p. 169) wherein other relevant tests are also given, Graphical tests, as...sample from a gamma distri- bution. J. Am. Statist. Assoc. 71, 480-7. Gnanadesikan , R. (1977). Methods for Statistical Data Analysis of Multivariate

  15. Regression modeling of particle size distributions in urban storm water: advancements through improved sample collection methods

    USGS Publications Warehouse

    Fienen, Michael N.; Selbig, William R.

    2012-01-01

    A new sample collection system was developed to improve the representation of sediment entrained in urban storm water by integrating water quality samples from the entire water column. The depth-integrated sampler arm (DISA) was able to mitigate sediment stratification bias in storm water, thereby improving the characterization of suspended-sediment concentration and particle size distribution at three independent study locations. Use of the DISA decreased variability, which improved statistical regression to predict particle size distribution using surrogate environmental parameters, such as precipitation depth and intensity. The performance of this statistical modeling technique was compared to results using traditional fixed-point sampling methods and was found to perform better. When environmental parameters can be used to predict particle size distributions, environmental managers have more options when characterizing concentrations, loads, and particle size distributions in urban runoff.

  16. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

    PubMed

    Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John

    2018-03-07

    DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrate a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.

  17. Validation of Statistical Sampling Algorithms in Visual Sample Plan (VSP): Summary Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nuffer, Lisa L; Sego, Landon H.; Wilson, John E.

    2009-02-18

    The U.S. Department of Homeland Security, Office of Technology Development (OTD) contracted with a set of U.S. Department of Energy national laboratories, including the Pacific Northwest National Laboratory (PNNL), to write a Remediation Guidance for Major Airports After a Chemical Attack. The report identifies key activities and issues that should be considered by a typical major airport following an incident involving release of a toxic chemical agent. Four experimental tasks were identified that would require further research in order to supplement the Remediation Guidance. One of the tasks, Task 4, OTD Chemical Remediation Statistical Sampling Design Validation, dealt with statisticalmore » sampling algorithm validation. This report documents the results of the sampling design validation conducted for Task 4. In 2005, the Government Accountability Office (GAO) performed a review of the past U.S. responses to Anthrax terrorist cases. Part of the motivation for this PNNL report was a major GAO finding that there was a lack of validated sampling strategies in the U.S. response to Anthrax cases. The report (GAO 2005) recommended that probability-based methods be used for sampling design in order to address confidence in the results, particularly when all sample results showed no remaining contamination. The GAO also expressed a desire that the methods be validated, which is the main purpose of this PNNL report. The objective of this study was to validate probability-based statistical sampling designs and the algorithms pertinent to within-building sampling that allow the user to prescribe or evaluate confidence levels of conclusions based on data collected as guided by the statistical sampling designs. Specifically, the designs found in the Visual Sample Plan (VSP) software were evaluated. VSP was used to calculate the number of samples and the sample location for a variety of sampling plans applied to an actual release site. Most of the sampling designs validated are probability based, meaning samples are located randomly (or on a randomly placed grid) so no bias enters into the placement of samples, and the number of samples is calculated such that IF the amount and spatial extent of contamination exceeds levels of concern, at least one of the samples would be taken from a contaminated area, at least X% of the time. Hence, "validation" of the statistical sampling algorithms is defined herein to mean ensuring that the "X%" (confidence) is actually met.« less

  18. Smooth quantile normalization.

    PubMed

    Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N; Quackenbush, John; Irizarry, Rafael A; Bravo, Héctor Corrada

    2018-04-01

    Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and are unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.

  19. A UNIFIED FRAMEWORK FOR VARIANCE COMPONENT ESTIMATION WITH SUMMARY STATISTICS IN GENOME-WIDE ASSOCIATION STUDIES.

    PubMed

    Zhou, Xiang

    2017-12-01

    Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs-the restricted maximum likelihood estimation method (REML)-suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods-the renowned Haseman-Elston (HE) regression and the recent LD score regression (LDSC)-into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We provide an exact estimation form of LDSC to yield unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal z -scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while capable of producing estimates that can be almost as accurate as if both quantities are computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, and makes use of summary statistics, while it is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.

  20. Study on Measuring the Viscosity of Lubricating Oil by Viscometer Based on Hele - Shaw Principle

    NASA Astrophysics Data System (ADS)

    Li, Longfei

    2017-12-01

    In order to explore the method of accurately measuring the viscosity value of oil samples using the viscometer based on Hele-Shaw principle, three different measurement methods are designed in the laboratory, and the statistical characteristics of the measured values are compared, in order to get the best measurement method. The results show that the oil sample to be measured is placed in the magnetic field formed by the magnet, and the oil sample can be sucked from the same distance from the magnet. The viscosity value of the sample can be measured accurately.

  1. Learning process mapping heuristics under stochastic sampling overheads

    NASA Technical Reports Server (NTRS)

    Ieumwananonthachai, Arthur; Wah, Benjamin W.

    1991-01-01

    A statistical method was developed previously for improving process mapping heuristics. The method systematically explores the space of possible heuristics under a specified time constraint. Its goal is to get the best possible heuristics while trading between the solution quality of the process mapping heuristics and their execution time. The statistical selection method is extended to take into consideration the variations in the amount of time used to evaluate heuristics on a problem instance. The improvement in performance is presented using the more realistic assumption along with some methods that alleviate the additional complexity.

  2. Surveying Europe’s Only Cave-Dwelling Chordate Species (Proteus anguinus) Using Environmental DNA

    PubMed Central

    Márton, Orsolya; Schmidt, Benedikt R.; Gál, Júlia Tünde; Jelić, Dušan

    2017-01-01

    In surveillance of subterranean fauna, especially in the case of rare or elusive aquatic species, traditional techniques used for epigean species are often not feasible. We developed a non-invasive survey method based on environmental DNA (eDNA) to detect the presence of the red-listed cave-dwelling amphibian, Proteus anguinus, in the caves of the Dinaric Karst. We tested the method in fifteen caves in Croatia, from which the species was previously recorded or expected to occur. We successfully confirmed the presence of P. anguinus from ten caves and detected the species for the first time in five others. Using a hierarchical occupancy model we compared the availability and detection probability of eDNA of two water sampling methods, filtration and precipitation. The statistical analysis showed that both availability and detection probability depended on the method and estimates for both probabilities were higher using filter samples than for precipitation samples. Combining reliable field and laboratory methods with robust statistical modeling will give the best estimates of species occurrence. PMID:28129383

  3. On the Use of Biomineral Oxygen Isotope Data to Identify Human Migrants in the Archaeological Record: Intra-Sample Variation, Statistical Methods and Geographical Considerations

    PubMed Central

    Lightfoot, Emma; O’Connell, Tamsin C.

    2016-01-01

    Oxygen isotope analysis of archaeological skeletal remains is an increasingly popular tool to study past human migrations. It is based on the assumption that human body chemistry preserves the δ18O of precipitation in such a way as to be a useful technique for identifying migrants and, potentially, their homelands. In this study, the first such global survey, we draw on published human tooth enamel and bone bioapatite data to explore the validity of using oxygen isotope analyses to identify migrants in the archaeological record. We use human δ18O results to show that there are large variations in human oxygen isotope values within a population sample. This may relate to physiological factors influencing the preservation of the primary isotope signal, or due to human activities (such as brewing, boiling, stewing, differential access to water sources and so on) causing variation in ingested water and food isotope values. We compare the number of outliers identified using various statistical methods. We determine that the most appropriate method for identifying migrants is dependent on the data but is likely to be the IQR or median absolute deviation from the median under most archaeological circumstances. Finally, through a spatial assessment of the dataset, we show that the degree of overlap in human isotope values from different locations across Europe is such that identifying individuals’ homelands on the basis of oxygen isotope analysis alone is not possible for the regions analysed to date. Oxygen isotope analysis is a valid method for identifying first-generation migrants from an archaeological site when used appropriately, however it is difficult to identify migrants using statistical methods for a sample size of less than c. 25 individuals. In the absence of local previous analyses, each sample should be treated as an individual dataset and statistical techniques can be used to identify migrants, but in most cases pinpointing a specific homeland should not be attempted. PMID:27124001

  4. Statistical description of non-Gaussian samples in the F2 layer of the ionosphere during heliogeophysical disturbances

    NASA Astrophysics Data System (ADS)

    Sergeenko, N. P.

    2017-11-01

    An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the layer F2- foF2( t) were subjected to statistical processing. For the obtained samples {δ foF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during the disturbances. At levels of sufficiently small probability distributions, there are arbitrarily large deviations from the model of the normal process. Therefore, it is attempted to describe statistical samples {δ foF2} based on the Poisson model. For the studied samples, the exponential characteristic function is selected under the assumption that time series are a superposition of some deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic excessive-asymmetric probability-density function. The statistical distributions of the samples {δ foF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to the Kolmogorov's criterion, the probabilities of the coincidence of a posteriori distributions with the theoretical ones are P 0.7-0.9. The conducted analysis makes it possible to draw a conclusion about the applicability of a model based on the Poisson random process for the statistical description and probabilistic variation estimates during heliogeophysical disturbances of the variations {δ foF2}.

  5. Evidence and Clinical Trials.

    NASA Astrophysics Data System (ADS)

    Goodman, Steven N.

    1989-11-01

    This dissertation explores the use of a mathematical measure of statistical evidence, the log likelihood ratio, in clinical trials. The methods and thinking behind the use of an evidential measure are contrasted with traditional methods of analyzing data, which depend primarily on a p-value as an estimate of the statistical strength of an observed data pattern. It is contended that neither the behavioral dictates of Neyman-Pearson hypothesis testing methods, nor the coherency dictates of Bayesian methods are realistic models on which to base inference. The use of the likelihood alone is applied to four aspects of trial design or conduct: the calculation of sample size, the monitoring of data, testing for the equivalence of two treatments, and meta-analysis--the combining of results from different trials. Finally, a more general model of statistical inference, using belief functions, is used to see if it is possible to separate the assessment of evidence from our background knowledge. It is shown that traditional and Bayesian methods can be modeled as two ends of a continuum of structured background knowledge, methods which summarize evidence at the point of maximum likelihood assuming no structure, and Bayesian methods assuming complete knowledge. Both schools are seen to be missing a concept of ignorance- -uncommitted belief. This concept provides the key to understanding the problem of sampling to a foregone conclusion and the role of frequency properties in statistical inference. The conclusion is that statistical evidence cannot be defined independently of background knowledge, and that frequency properties of an estimator are an indirect measure of uncommitted belief. Several likelihood summaries need to be used in clinical trials, with the quantitative disparity between summaries being an indirect measure of our ignorance. This conclusion is linked with parallel ideas in the philosophy of science and cognitive psychology.

  6. Testing non-inferiority of a new treatment in three-arm clinical trials with binary endpoints.

    PubMed

    Tang, Nian-Sheng; Yu, Bin; Tang, Man-Lai

    2014-12-18

    A two-arm non-inferiority trial without a placebo is usually adopted to demonstrate that an experimental treatment is not worse than a reference treatment by a small pre-specified non-inferiority margin due to ethical concerns. Selection of the non-inferiority margin and establishment of assay sensitivity are two major issues in the design, analysis and interpretation for two-arm non-inferiority trials. Alternatively, a three-arm non-inferiority clinical trial including a placebo is usually conducted to assess the assay sensitivity and internal validity of a trial. Recently, some large-sample approaches have been developed to assess the non-inferiority of a new treatment based on the three-arm trial design. However, these methods behave badly with small sample sizes in the three arms. This manuscript aims to develop some reliable small-sample methods to test three-arm non-inferiority. Saddlepoint approximation, exact and approximate unconditional, and bootstrap-resampling methods are developed to calculate p-values of the Wald-type, score and likelihood ratio tests. Simulation studies are conducted to evaluate their performance in terms of type I error rate and power. Our empirical results show that the saddlepoint approximation method generally behaves better than the asymptotic method based on the Wald-type test statistic. For small sample sizes, approximate unconditional and bootstrap-resampling methods based on the score test statistic perform better in the sense that their corresponding type I error rates are generally closer to the prespecified nominal level than those of other test procedures. Both approximate unconditional and bootstrap-resampling test procedures based on the score test statistic are generally recommended for three-arm non-inferiority trials with binary outcomes.

  7. Misrepresenting random sampling? A systematic review of research papers in the Journal of Advanced Nursing.

    PubMed

    Williamson, Graham R

    2003-11-01

    This paper discusses the theoretical limitations of the use of random sampling and probability theory in the production of a significance level (or P-value) in nursing research. Potential alternatives, in the form of randomization tests, are proposed. Research papers in nursing, medicine and psychology frequently misrepresent their statistical findings, as the P-values reported assume random sampling. In this systematic review of studies published between January 1995 and June 2002 in the Journal of Advanced Nursing, 89 (68%) studies broke this assumption because they used convenience samples or entire populations. As a result, some of the findings may be questionable. The key ideas of random sampling and probability theory for statistical testing (for generating a P-value) are outlined. The result of a systematic review of research papers published in the Journal of Advanced Nursing is then presented, showing how frequently random sampling appears to have been misrepresented. Useful alternative techniques that might overcome these limitations are then discussed. REVIEW LIMITATIONS: This review is limited in scope because it is applied to one journal, and so the findings cannot be generalized to other nursing journals or to nursing research in general. However, it is possible that other nursing journals are also publishing research articles based on the misrepresentation of random sampling. The review is also limited because in several of the articles the sampling method was not completely clearly stated, and in this circumstance a judgment has been made as to the sampling method employed, based on the indications given by author(s). Quantitative researchers in nursing should be very careful that the statistical techniques they use are appropriate for the design and sampling methods of their studies. If the techniques they employ are not appropriate, they run the risk of misinterpreting findings by using inappropriate, unrepresentative and biased samples.

  8. Improved radiation dose efficiency in solution SAXS using a sheath flow sample environment

    PubMed Central

    Kirby, Nigel; Cowieson, Nathan; Hawley, Adrian M.; Mudie, Stephen T.; McGillivray, Duncan J.; Kusel, Michael; Samardzic-Boban, Vesna; Ryan, Timothy M.

    2016-01-01

    Radiation damage is a major limitation to synchrotron small-angle X-ray scattering analysis of biomacromolecules. Flowing the sample during exposure helps to reduce the problem, but its effectiveness in the laminar-flow regime is limited by slow flow velocity at the walls of sample cells. To overcome this limitation, the coflow method was developed, where the sample flows through the centre of its cell surrounded by a flow of matched buffer. The method permits an order-of-magnitude increase of X-ray incident flux before sample damage, improves measurement statistics and maintains low sample concentration limits. The method also efficiently handles sample volumes of a few microlitres, can increase sample throughput, is intrinsically resistant to capillary fouling by sample and is suited to static samples and size-exclusion chromatography applications. The method unlocks further potential of third-generation synchrotron beamlines to facilitate new and challenging applications in solution scattering. PMID:27917826

  9. What Good Are Statistics that Don't Generalize?

    ERIC Educational Resources Information Center

    Shaffer, David Williamson; Serlin, Ronald C.

    2004-01-01

    Quantitative and qualitative inquiry are sometimes portrayed as distinct and incompatible paradigms for research in education. Approaches to combining qualitative and quantitative research typically "integrate" the two methods by letting them co-exist independently within a single research study. Here we describe intra-sample statistical analysis…

  10. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Plemons, R.E.; Hopwood, W.H. Jr.; Hamilton, J.H.

    For a number of years the Oak Ridge Y-12 Plant Laboratory has been analyzing coal predominately for the utilities department of the Y-12 Plant. All laboratory procedures, except a Leco sulfur method which used the Leco Instruction Manual as a reference, were written based on the ASTM coal analyses. Sulfur is analyzed at the present time by two methods, gravimetric and Leco. The laboratory has two major endeavors for monitoring the quality of its coal analyses. (1) A control program by the Plant Statistical Quality Control Department. Quality Control submits one sample for every nine samples submitted by the utilitiesmore » departments and the laboratory analyzes a control sample along with the utilities samples. (2) An exchange program with the DOE Coal Analysis Laboratory in Bruceton, Pennsylvania. The Y-12 Laboratory submits to the DOE Coal Laboratory, on even numbered months, a sample that Y-12 has analyzed. The DOE Coal Laboratory submits, on odd numbered months, one of their analyzed samples to the Y-12 Plant Laboratory to be analyzed. The results of these control and exchange programs are monitored not only by laboratory personnel, but also by Statistical Quality Control personnel who provide statistical evaluations. After analysis and reporting of results, all utilities samples are retained by the laboratory until the coal contracts have been settled. The utilities departments have responsibility for the initiation and preparation of the coal samples. The samples normally received by the laboratory have been ground to 4-mesh, reduced to 0.5-gallon quantities, and sealed in air-tight containers. Sample identification numbers and a Request for Analysis are generated by the utilities departments.« less

  11. A modified weighted function method for parameter estimation of Pearson type three distribution

    NASA Astrophysics Data System (ADS)

    Liang, Zhongmin; Hu, Yiming; Li, Binquan; Yu, Zhongbo

    2014-04-01

    In this paper, an unconventional method called Modified Weighted Function (MWF) is presented for the conventional moment estimation of a probability distribution function. The aim of MWF is to estimate the coefficient of variation (CV) and coefficient of skewness (CS) from the original higher moment computations to the first-order moment calculations. The estimators for CV and CS of Pearson type three distribution function (PE3) were derived by weighting the moments of the distribution with two weight functions, which were constructed by combining two negative exponential-type functions. The selection of these weight functions was based on two considerations: (1) to relate weight functions to sample size in order to reflect the relationship between the quantity of sample information and the role of weight function and (2) to allocate more weights to data close to medium-tail positions in a sample series ranked in an ascending order. A Monte-Carlo experiment was conducted to simulate a large number of samples upon which statistical properties of MWF were investigated. For the PE3 parent distribution, results of MWF were compared to those of the original Weighted Function (WF) and Linear Moments (L-M). The results indicate that MWF was superior to WF and slightly better than L-M, in terms of statistical unbiasness and effectiveness. In addition, the robustness of MWF, WF, and L-M were compared by designing the Monte-Carlo experiment that samples are obtained from Log-Pearson type three distribution (LPE3), three parameter Log-Normal distribution (LN3), and Generalized Extreme Value distribution (GEV), respectively, but all used as samples from the PE3 distribution. The results show that in terms of statistical unbiasness, no one method possesses the absolutely overwhelming advantage among MWF, WF, and L-M, while in terms of statistical effectiveness, the MWF is superior to WF and L-M.

  12. A STATISTICAL SURVEY OF DIOXIN-LIKE COMPOUNDS IN ...

    EPA Pesticide Factsheets

    The USEPA and the USDA completed the first statistically designed survey of the occurrence and concentration of dibenzo-p-dioxins (CDDs), dibenzofurans (CDFs), and coplanar polychlorinated biphenyls (PCBs) in the fat of beef animals raised for human consumption in the United States. Back fat was sampled from 63 carcasses at federally inspected slaughter establishments nationwide. The sample design called for sampling beef animal classes in proportion to national annual slaughter statistics. All samples were analyzed using a modification of EPA method 1613, using isotope dilution, High Resolution GC/MS to determine the rate of occurrence of 2,3,7,8-substituted CDDs/CDFs/PCBs. The method detection limits ranged from 0.05 ng/kg for TCDD to 3 ng/kg for OCDD. The results of this survey showed a mean concentration (reported as I-TEQ, lipid adjusted) in U.S. beef animals of 0.35 ng/kg and 0.89 ng/kg for CDD/CDF TEQs when either non-detects are treated as 0 value or assigned a value of 1/2 the detection limit, respectively, and 0.51 ng/kg for coplanar PCB TEQs at both non-detect equal 0 and 1/2 detection limit. journal article

  13. ARK: Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition.

    PubMed

    Koslicki, David; Chatterjee, Saikat; Shahrivar, Damon; Walker, Alan W; Francis, Suzanna C; Fraser, Louise J; Vehkaperä, Mikko; Lan, Yueheng; Corander, Jukka

    2015-01-01

    Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. There has been a recent surge of interest in using compressed sensing inspired and convex-optimization based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach where we use a standard K-means clustering algorithm that partitions a large set of reads into subsets with reasonable computational cost to provide several vectors of first order statistics instead of only single statistical summarization in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity. An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.

  14. Irradiation-hyperthermia in canine hemangiopericytomas: large-animal model for therapeutic response.

    PubMed

    Richardson, R C; Anderson, V L; Voorhees, W D; Blevins, W E; Inskeep, T K; Janas, W; Shupe, R E; Babbs, C F

    1984-11-01

    Results of irradiation-hyperthermia treatment in 11 dogs with naturally occurring hemangiopericytoma were reported. Similarities of canine and human hemangiopericytomas were described. Orthovoltage X-irradiation followed by microwave-induced hyperthermia resulted in a 91% objective response rate. A statistical procedure was given to evaluate quantitatively the clinical behavior of locally invasive, nonmetastatic tumors in dogs that were undergoing therapy for control of local disease. The procedure used a small sample size and demonstrated distribution of the data on a scaled response as well as transformation of the data through classical parametric and nonparametric statistical methods. These statistical methods set confidence limits on the population mean and placed tolerance limits on a population percentage. Application of the statistical methods to human and animal clinical trials was apparent.

  15. Statistical approaches used to assess and redesign surface water-quality-monitoring networks.

    PubMed

    Khalil, B; Ouarda, T B M J

    2009-11-01

    An up-to-date review of the statistical approaches utilized for the assessment and redesign of surface water quality monitoring (WQM) networks is presented. The main technical aspects of network design are covered in four sections, addressing monitoring objectives, water quality variables, sampling frequency and spatial distribution of sampling locations. This paper discusses various monitoring objectives and related procedures used for the assessment and redesign of long-term surface WQM networks. The appropriateness of each approach for the design, contraction or expansion of monitoring networks is also discussed. For each statistical approach, its advantages and disadvantages are examined from a network design perspective. Possible methods to overcome disadvantages and deficiencies in the statistical approaches that are currently in use are recommended.

  16. Analysis of censored data.

    PubMed

    Lucijanic, Marko; Petrovecki, Mladen

    2012-01-01

    Analyzing events over time is often complicated by incomplete, or censored, observations. Special non-parametric statistical methods were developed to overcome difficulties in summarizing and comparing censored data. Life-table (actuarial) method and Kaplan-Meier method are described with an explanation of survival curves. For the didactic purpose authors prepared a workbook based on most widely used Kaplan-Meier method. It should help the reader understand how Kaplan-Meier method is conceptualized and how it can be used to obtain statistics and survival curves needed to completely describe a sample of patients. Log-rank test and hazard ratio are also discussed.

  17. A Review and Comparison of Methods for Recreating Individual Patient Data from Published Kaplan-Meier Survival Curves for Economic Evaluations: A Simulation Study

    PubMed Central

    Wan, Xiaomin; Peng, Liubao; Li, Yuanjian

    2015-01-01

    Background In general, the individual patient-level data (IPD) collected in clinical trials are not available to independent researchers to conduct economic evaluations; researchers only have access to published survival curves and summary statistics. Thus, methods that use published survival curves and summary statistics to reproduce statistics for economic evaluations are essential. Four methods have been identified: two traditional methods 1) least squares method, 2) graphical method; and two recently proposed methods by 3) Hoyle and Henley, 4) Guyot et al. The four methods were first individually reviewed and subsequently assessed regarding their abilities to estimate mean survival through a simulation study. Methods A number of different scenarios were developed that comprised combinations of various sample sizes, censoring rates and parametric survival distributions. One thousand simulated survival datasets were generated for each scenario, and all methods were applied to actual IPD. The uncertainty in the estimate of mean survival time was also captured. Results All methods provided accurate estimates of the mean survival time when the sample size was 500 and a Weibull distribution was used. When the sample size was 100 and the Weibull distribution was used, the Guyot et al. method was almost as accurate as the Hoyle and Henley method; however, more biases were identified in the traditional methods. When a lognormal distribution was used, the Guyot et al. method generated noticeably less bias and a more accurate uncertainty compared with the Hoyle and Henley method. Conclusions The traditional methods should not be preferred because of their remarkable overestimation. When the Weibull distribution was used for a fitted model, the Guyot et al. method was almost as accurate as the Hoyle and Henley method. However, if the lognormal distribution was used, the Guyot et al. method was less biased compared with the Hoyle and Henley method. PMID:25803659

  18. Errors in patient specimen collection: application of statistical process control.

    PubMed

    Dzik, Walter Sunny; Beckman, Neil; Selleng, Kathleen; Heddle, Nancy; Szczepiorkowski, Zbigniew; Wendel, Silvano; Murphy, Michael

    2008-10-01

    Errors in the collection and labeling of blood samples for pretransfusion testing increase the risk of transfusion-associated patient morbidity and mortality. Statistical process control (SPC) is a recognized method to monitor the performance of a critical process. An easy-to-use SPC method was tested to determine its feasibility as a tool for monitoring quality in transfusion medicine. SPC control charts were adapted to a spreadsheet presentation. Data tabulating the frequency of mislabeled and miscollected blood samples from 10 hospitals in five countries from 2004 to 2006 were used to demonstrate the method. Control charts were produced to monitor process stability. The participating hospitals found the SPC spreadsheet very suitable to monitor the performance of the sample labeling and collection and applied SPC charts to suit their specific needs. One hospital monitored subcategories of sample error in detail. A large hospital monitored the number of wrong-blood-in-tube (WBIT) events. Four smaller-sized facilities, each following the same policy for sample collection, combined their data on WBIT samples into a single control chart. One hospital used the control chart to monitor the effect of an educational intervention. A simple SPC method is described that can monitor the process of sample collection and labeling in any hospital. SPC could be applied to other critical steps in the transfusion processes as a tool for biovigilance and could be used to develop regional or national performance standards for pretransfusion sample collection. A link is provided to download the spreadsheet for free.

  19. Textural Analysis and Substrate Classification in the Nearshore Region of Lake Superior Using High-Resolution Multibeam Bathymetry

    NASA Astrophysics Data System (ADS)

    Dennison, Andrew G.

    Classification of the seafloor substrate can be done with a variety of methods. These methods include Visual (dives, drop cameras); mechanical (cores, grab samples); acoustic (statistical analysis of echosounder returns). Acoustic methods offer a more powerful and efficient means of collecting useful information about the bottom type. Due to the nature of an acoustic survey, larger areas can be sampled, and by combining the collected data with visual and mechanical survey methods provide greater confidence in the classification of a mapped region. During a multibeam sonar survey, both bathymetric and backscatter data is collected. It is well documented that the statistical characteristic of a sonar backscatter mosaic is dependent on bottom type. While classifying the bottom-type on the basis on backscatter alone can accurately predict and map bottom-type, i.e a muddy area from a rocky area, it lacks the ability to resolve and capture fine textural details, an important factor in many habitat mapping studies. Statistical processing of high-resolution multibeam data can capture the pertinent details about the bottom-type that are rich in textural information. Further multivariate statistical processing can then isolate characteristic features, and provide the basis for an accurate classification scheme. The development of a new classification method is described here. It is based upon the analysis of textural features in conjunction with ground truth sampling. The processing and classification result of two geologically distinct areas in nearshore regions of Lake Superior; off the Lester River,MN and Amnicon River, WI are presented here, using the Minnesota Supercomputer Institute's Mesabi computing cluster for initial processing. Processed data is then calibrated using ground truth samples to conduct an accuracy assessment of the surveyed areas. From analysis of high-resolution bathymetry data collected at both survey sites is was possible to successfully calculate a series of measures that describe textural information about the lake floor. Further processing suggests that the features calculated capture a significant amount of statistical information about the lake floor terrain as well. Two sources of error, an anomalous heave and refraction error significantly deteriorated the quality of the processed data and resulting validate results. Ground truth samples used to validate the classification methods utilized for both survey sites, however, resulted in accuracy values ranging from 5 -30 percent at the Amnicon River, and between 60-70 percent for the Lester River. The final results suggest that this new processing methodology does adequately capture textural information about the lake floor and does provide an acceptable classification in the absence of significant data quality issues.

  20. Monitoring Method of Cow Anthrax Based on Gis and Spatial Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Li, Lin; Yang, Yong; Wang, Hongbin; Dong, Jing; Zhao, Yujun; He, Jianbin; Fan, Honggang

    Geographic information system (GIS) is a computer application system, which possesses the ability of manipulating spatial information and has been used in many fields related with the spatial information management. Many methods and models have been established for analyzing animal diseases distribution models and temporal-spatial transmission models. Great benefits have been gained from the application of GIS in animal disease epidemiology. GIS is now a very important tool in animal disease epidemiological research. Spatial analysis function of GIS can be widened and strengthened by using spatial statistical analysis, allowing for the deeper exploration, analysis, manipulation and interpretation of spatial pattern and spatial correlation of the animal disease. In this paper, we analyzed the cow anthrax spatial distribution characteristics in the target district A (due to the secret of epidemic data we call it district A) based on the established GIS of the cow anthrax in this district in combination of spatial statistical analysis and GIS. The Cow anthrax is biogeochemical disease, and its geographical distribution is related closely to the environmental factors of habitats and has some spatial characteristics, and therefore the correct analysis of the spatial distribution of anthrax cow for monitoring and the prevention and control of anthrax has a very important role. However, the application of classic statistical methods in some areas is very difficult because of the pastoral nomadic context. The high mobility of livestock and the lack of enough suitable sampling for the some of the difficulties in monitoring currently make it nearly impossible to apply rigorous random sampling methods. It is thus necessary to develop an alternative sampling method, which could overcome the lack of sampling and meet the requirements for randomness. The GIS computer application software ArcGIS9.1 was used to overcome the lack of data of sampling sites.Using ArcGIS 9.1 and GEODA to analyze the cow anthrax spatial distribution of district A. we gained some conclusions about cow anthrax' density: (1) there is a spatial clustering model. (2) there is an intensely spatial autocorrelation. We established a prediction model to estimate the anthrax distribution based on the spatial characteristic of the density of cow anthrax. Comparing with the true distribution, the prediction model has a well coincidence and is feasible to the application. The method using a GIS tool facilitates can be implemented significantly in the cow anthrax monitoring and investigation, and the space statistics - related prediction model provides a fundamental use for other study on space-related animal diseases.

  1. State-dependent biasing method for importance sampling in the weighted stochastic simulation algorithm.

    PubMed

    Roh, Min K; Gillespie, Dan T; Petzold, Linda R

    2010-11-07

    The weighted stochastic simulation algorithm (wSSA) was developed by Kuwahara and Mura [J. Chem. Phys. 129, 165101 (2008)] to efficiently estimate the probabilities of rare events in discrete stochastic systems. The wSSA uses importance sampling to enhance the statistical accuracy in the estimation of the probability of the rare event. The original algorithm biases the reaction selection step with a fixed importance sampling parameter. In this paper, we introduce a novel method where the biasing parameter is state-dependent. The new method features improved accuracy, efficiency, and robustness.

  2. Atlas(®) Listeria monocytogenes LmG2 Detection Assay Using Transcription Mediated Amplification to Detect Listeria monocytogenes in Selected Foods and Stainless Steel Surface.

    PubMed

    Bres, Vanessa; Yang, Hua; Hsu, Ernie; Ren, Yan; Cheng, Ying; Wisniewski, Michele; Hanhan, Maesa; Zaslavsky, Polina; Noll, Nathan; Weaver, Brett; Campbell, Paul; Reshatoff, Michael; Becker, Michael

    2014-01-01

    The Atlas Listeria monocytogenes LmG2 Detection Assay, developed by Roka Bioscience Inc., was compared to a reference culture method for seven food types (hot dogs, cured ham, deli turkey, chicken salad, vanilla ice cream, frozen chocolate cream pie, and frozen cheese pizza) and one surface (stainless steel, grade 316). A 125 g portion of deli turkey was tested using a 1:4 food:media dilution ratio, and a 25 g portion for all other foods was tested using 1:9 food:media dilution ratio. The enrichment time and media for Roka's method was 24 to 28 h for 25 g food samples and environmental surfaces, and 44 to 48 h for 125 g at 35 ± 2°C in PALCAM broth containing 0.02 g/L nalidixic acid. Comparison of the Atlas Listeria monocytogenes LmG2 Detection Assay to the reference method required an unpaired approach. For each matrix, 20 samples inoculated at a fractional level and five samples inoculated at a high level with a different strain of Listeria monocytogenes were tested by each method. The Atlas Listeria monocytogenes LmG2 Detection Assay was compared to the Official Methods of Analysis of AOAC INTERNATIONAL 993.12 method for dairy products, the U.S. Department of Agriculture, Food Safety and Inspection Service, Microbiology Laboratory Guidebook 8.08 method for ready-to-eat meat and environmental samples, and the U.S. Food and Drug Administration Bacteriological Analytical Manual, Chapter 10 method for frozen foods. In the method developer studies, Roka's method, at 24 h (or 44 h for 125 g food samples), had 126 positives out of 200 total inoculated samples, compared to 102 positives for the reference methods at 48 h. In the independent laboratory studies, vanilla ice cream, deli turkey and stainless steel grade 316 were evaluated. Roka's method, at 24 h (or 44 h for 125 g food samples), had 64 positives out of 75 total inoculated samples compared to 54 positives for the reference methods at 48 h. The Atlas Listeria monocytogenes LmG2 Detection Assay detected all 50 L. monocytogenes strains that encompassed 13 serotypes across the various lineages and none of the 30 exclusive organisms, including seven other Listeria species. The product consistency and kit stability studies revealed no statistical differences between the three lots tested or to the term of the shelf life. Finally, the robustness study demonstrated no statistical differences when samples were incubated at 33 ± 2°C or 37 ± 2°C, when enrichment aliquots were 1.3 mL or 1.7 mL, or when the samples were analyzed the same day or five days later. Overall the Atlas Listeria monocytogenes LmG2 Detection Assay is statistically equivalent to or better than the reference methods and is robust to the tested variations.

  3. Methyl-CpG island-associated genome signature tags

    DOEpatents

    Dunn, John J

    2014-05-20

    Disclosed is a method for analyzing the organismic complexity of a sample through analysis of the nucleic acid in the sample. In the disclosed method, through a series of steps, including digestion with a type II restriction enzyme, ligation of capture adapters and linkers and digestion with a type IIS restriction enzyme, genome signature tags are produced. The sequences of a statistically significant number of the signature tags are determined and the sequences are used to identify and quantify the organisms in the sample. Various embodiments of the invention described herein include methods for using single point genome signature tags to analyze the related families present in a sample, methods for analyzing sequences associated with hyper- and hypo-methylated CpG islands, methods for visualizing organismic complexity change in a sampling location over time and methods for generating the genome signature tag profile of a sample of fragmented DNA.

  4. Robust functional statistics applied to Probability Density Function shape screening of sEMG data.

    PubMed

    Boudaoud, S; Rix, H; Al Harrach, M; Marin, F

    2014-01-01

    Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample size in the estimation process. This small sample size induces errors in the estimated HOS parameters restraining real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in presence of small sample size. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and Log-normal PDFs that mimic observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to small sample size effect and more accurate in sEMG PDF shape screening applications.

  5. Testing for independence in J×K contingency tables with complex sample survey data.

    PubMed

    Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti; Hevelone, Nathanael; Giovannucci, Edward; Hu, Jim C

    2015-09-01

    The test of independence of row and column variables in a (J×K) contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi-squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao-Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, post surgical complications data from the United States' Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008. © 2015, The International Biometric Society.

  6. A review and comparison of methods for recreating individual patient data from published Kaplan-Meier survival curves for economic evaluations: a simulation study.

    PubMed

    Wan, Xiaomin; Peng, Liubao; Li, Yuanjian

    2015-01-01

    In general, the individual patient-level data (IPD) collected in clinical trials are not available to independent researchers to conduct economic evaluations; researchers only have access to published survival curves and summary statistics. Thus, methods that use published survival curves and summary statistics to reproduce statistics for economic evaluations are essential. Four methods have been identified: two traditional methods 1) least squares method, 2) graphical method; and two recently proposed methods by 3) Hoyle and Henley, 4) Guyot et al. The four methods were first individually reviewed and subsequently assessed regarding their abilities to estimate mean survival through a simulation study. A number of different scenarios were developed that comprised combinations of various sample sizes, censoring rates and parametric survival distributions. One thousand simulated survival datasets were generated for each scenario, and all methods were applied to actual IPD. The uncertainty in the estimate of mean survival time was also captured. All methods provided accurate estimates of the mean survival time when the sample size was 500 and a Weibull distribution was used. When the sample size was 100 and the Weibull distribution was used, the Guyot et al. method was almost as accurate as the Hoyle and Henley method; however, more biases were identified in the traditional methods. When a lognormal distribution was used, the Guyot et al. method generated noticeably less bias and a more accurate uncertainty compared with the Hoyle and Henley method. The traditional methods should not be preferred because of their remarkable overestimation. When the Weibull distribution was used for a fitted model, the Guyot et al. method was almost as accurate as the Hoyle and Henley method. However, if the lognormal distribution was used, the Guyot et al. method was less biased compared with the Hoyle and Henley method.

  7. A Comparison of Single Sample and Bootstrap Methods to Assess Mediation in Cluster Randomized Trials

    ERIC Educational Resources Information Center

    Pituch, Keenan A.; Stapleton, Laura M.; Kang, Joo Youn

    2006-01-01

    A Monte Carlo study examined the statistical performance of single sample and bootstrap methods that can be used to test and form confidence interval estimates of indirect effects in two cluster randomized experimental designs. The designs were similar in that they featured random assignment of clusters to one of two treatment conditions and…

  8. Statistical methods and errors in family medicine articles between 2010 and 2014-Suez Canal University, Egypt: A cross-sectional study

    PubMed Central

    Nour-Eldein, Hebatallah

    2016-01-01

    Background: With limited statistical knowledge of most physicians it is not uncommon to find statistical errors in research articles. Objectives: To determine the statistical methods and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. Methods: This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Results: Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-test 17/66 (25.8%) were the most commonly used inferential tests. Within 60 FM articles with identified inferential statistics, no prior sample size 19/60 (31.7%), application of wrong statistical tests 17/60 (28.3%), incomplete documentation of statistics 59/60 (98.3%), reporting P value without test statistics 32/60 (53.3%), no reporting confidence interval with effect size measures 12/60 (20.0%), use of mean (standard deviation) to describe ordinal/nonnormal data 8/60 (13.3%), and errors related to interpretation were mainly for conclusions without support by the study data 5/60 (8.3%). Conclusion: Inferential statistics were used in the majority of FM articles. Data analysis and reporting statistics are areas for improvement in FM research articles. PMID:27453839

  9. Bayesian demography 250 years after Bayes

    PubMed Central

    Bijak, Jakub; Bryant, John

    2016-01-01

    Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms. PMID:26902889

  10. Sampling in health geography: reconciling geographical objectives and probabilistic methods. An example of a health survey in Vientiane (Lao PDR)

    PubMed Central

    Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard

    2007-01-01

    Background Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. Methods We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. Application We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. Conclusion This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy. PMID:17543100

  11. Surveying immigrants without sampling frames - evaluating the success of alternative field methods.

    PubMed

    Reichel, David; Morales, Laura

    2017-01-01

    This paper evaluates the sampling methods of an international survey, the Immigrant Citizens Survey, which aimed at surveying immigrants from outside the European Union (EU) in 15 cities in seven EU countries. In five countries, no sample frame was available for the target population. Consequently, alternative ways to obtain a representative sample had to be found. In three countries 'location sampling' was employed, while in two countries traditional methods were used with adaptations to reach the target population. The paper assesses the main methodological challenges of carrying out a survey among a group of immigrants for whom no sampling frame exists. The samples of the survey in these five countries are compared to results of official statistics in order to assess the accuracy of the samples obtained through the different sampling methods. It can be shown that alternative sampling methods can provide meaningful results in terms of core demographic characteristics although some estimates differ to some extent from the census results.

  12. Correcting for Optimistic Prediction in Small Data Sets

    PubMed Central

    Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.

    2014-01-01

    The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219

  13. Methods of analysis by the U.S. Geological Survey National Water Quality Laboratory : evaluation of alkaline persulfate digestion as an alternative to Kjeldahl digestion for determination of total and dissolved nitrogen and phosphorus in water

    USGS Publications Warehouse

    Patton, Charles J.; Kryskalla, Jennifer R.

    2003-01-01

    Alkaline persulfate digestion was evaluated and validated as a more sensitive, accurate, and less toxic alternative to Kjeldahl digestion for routine determination of nitrogen and phosphorus in surface- and ground-water samples in a large-scale and geographically diverse study conducted by U.S. Geological Survey (USGS) between October 1, 2001, and September 30, 2002. Data for this study were obtained from about 2,100 surface- and ground-water samples that were analyzed for Kjeldahl nitrogen and Kjeldahl phosphorus in the course of routine operations at the USGS National Water Quality Laboratory (NWQL). These samples were analyzed independently for total nitrogen and total phosphorus using an alkaline persulfate digestion method developed by the NWQL Methods Research and Development Program. About half of these samples were collected during nominally high-flow (April-June) conditions and the other half were collected during nominally low-flow (August-September) conditions. The number of filtered and whole-water samples analyzed from each flow regime was about equal.By operational definition, Kjeldahl nitrogen (ammonium + organic nitrogen) and alkaline persulfate digestion total nitrogen (ammonium + nitrite + nitrate + organic nitrogen) are not equivalent. It was necessary, therefore, to reconcile this operational difference by subtracting nitrate + nitrite concentra-tions from alkaline persulfate dissolved and total nitrogen concentrations prior to graphical and statistical comparisons with dissolved and total Kjeldahl nitrogen concentrations. On the basis of two-population paired t-test statistics, the means of all nitrate-corrected alkaline persulfate nitrogen and Kjeldahl nitrogen concentrations (2,066 paired results) were significantly different from zero at the p = 0.05 level. Statistically, the means of Kjeldahl nitrogen concentrations were greater than those of nitrate-corrected alkaline persulfate nitrogen concentrations. Experimental evidence strongly suggests, however, that this apparent low bias resulted from nitrate interference in the Kjeldahl digestion method rather than low nitrogen recovery by the alkaline persulfate digestion method. Typically, differences between means of Kjeldahl nitrogen and nitrate-corrected alkaline persulfate nitrogen in low-nitrate concentration (< 0.1 milligram nitrate nitrogen per liter) subsets of filtered surface- and ground-water samples were statistically equivalent to zero at the p =level.Paired analytical results for dissolved and total phosphorus in Kjeldahl and alkaline persulfate digests were directly comparable because both digestion methods convert all forms of phosphorus in water samples to orthophosphate. On the basis of two-population paired t-test statistics, the means of all Kjeldahl phosphorus and alkaline persulfate phosphorus concentrations (2,093 paired results) were not significantly different from zero at the p = 0.05 level. For some subsets of these data, which were grouped according to water type and flow conditions at the time of sample collection, differences between means of Kjeldahl phosphorus and alkaline persulfate phosphorus concentrations were not equivalent to zero at the p = 0.05 level. Differences between means of these subsets, however, were less than the method detection limit for phosphorus (0.007 milligram phosphorus per liter) by the alkaline persulfate digestion method, and were therefore analytically insignificant.This report provides details of the alkaline persulfate digestion procedure, interference studies, recovery of various nitrogen- and phosphorus-containing compounds, and other analytical figures of merit. The automated air-segmented continuous flow methods developed to determine nitrate and orthophosphate in the alkaline persulfate digests also are described. About 125 microliters of digested sample are required to determine nitrogen and phosphorus in parallel at a rate of about 100 samples per hour with less than 1-percent sample in

  14. Monte Carlo Simulation for Perusal and Practice.

    ERIC Educational Resources Information Center

    Brooks, Gordon P.; Barcikowski, Robert S.; Robey, Randall R.

    The meaningful investigation of many problems in statistics can be solved through Monte Carlo methods. Monte Carlo studies can help solve problems that are mathematically intractable through the analysis of random samples from populations whose characteristics are known to the researcher. Using Monte Carlo simulation, the values of a statistic are…

  15. Constructing Sample Space with Combinatorial Reasoning: A Mixed Methods Study

    ERIC Educational Resources Information Center

    McGalliard, William A., III.

    2012-01-01

    Recent curricular developments suggest that students at all levels need to be statistically literate and able to efficiently and accurately make probabilistic decisions. Furthermore, statistical literacy is a requirement to being a well-informed citizen of society. Research also recognizes that the ability to reason probabilistically is supported…

  16. Review of Statistical Methods for Analysing Healthcare Resources and Costs

    PubMed Central

    Mihaylova, Borislava; Briggs, Andrew; O'Hagan, Anthony; Thompson, Simon G

    2011-01-01

    We review statistical methods for analysing healthcare resource use and costs, their ability to address skewness, excess zeros, multimodality and heavy right tails, and their ease for general use. We aim to provide guidance on analysing resource use and costs focusing on randomised trials, although methods often have wider applicability. Twelve broad categories of methods were identified: (I) methods based on the normal distribution, (II) methods following transformation of data, (III) single-distribution generalized linear models (GLMs), (IV) parametric models based on skewed distributions outside the GLM family, (V) models based on mixtures of parametric distributions, (VI) two (or multi)-part and Tobit models, (VII) survival methods, (VIII) non-parametric methods, (IX) methods based on truncation or trimming of data, (X) data components models, (XI) methods based on averaging across models, and (XII) Markov chain methods. Based on this review, our recommendations are that, first, simple methods are preferred in large samples where the near-normality of sample means is assured. Second, in somewhat smaller samples, relatively simple methods, able to deal with one or two of above data characteristics, may be preferable but checking sensitivity to assumptions is necessary. Finally, some more complex methods hold promise, but are relatively untried; their implementation requires substantial expertise and they are not currently recommended for wider applied work. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20799344

  17. A method to estimate the effect of deformable image registration uncertainties on daily dose mapping

    PubMed Central

    Murphy, Martin J.; Salguero, Francisco J.; Siebers, Jeffrey V.; Staub, David; Vaman, Constantin

    2012-01-01

    Purpose: To develop a statistical sampling procedure for spatially-correlated uncertainties in deformable image registration and then use it to demonstrate their effect on daily dose mapping. Methods: Sequential daily CT studies are acquired to map anatomical variations prior to fractionated external beam radiotherapy. The CTs are deformably registered to the planning CT to obtain displacement vector fields (DVFs). The DVFs are used to accumulate the dose delivered each day onto the planning CT. Each DVF has spatially-correlated uncertainties associated with it. Principal components analysis (PCA) is applied to measured DVF error maps to produce decorrelated principal component modes of the errors. The modes are sampled independently and reconstructed to produce synthetic registration error maps. The synthetic error maps are convolved with dose mapped via deformable registration to model the resulting uncertainty in the dose mapping. The results are compared to the dose mapping uncertainty that would result from uncorrelated DVF errors that vary randomly from voxel to voxel. Results: The error sampling method is shown to produce synthetic DVF error maps that are statistically indistinguishable from the observed error maps. Spatially-correlated DVF uncertainties modeled by our procedure produce patterns of dose mapping error that are different from that due to randomly distributed uncertainties. Conclusions: Deformable image registration uncertainties have complex spatial distributions. The authors have developed and tested a method to decorrelate the spatial uncertainties and make statistical samples of highly correlated error maps. The sample error maps can be used to investigate the effect of DVF uncertainties on daily dose mapping via deformable image registration. An initial demonstration of this methodology shows that dose mapping uncertainties can be sensitive to spatial patterns in the DVF uncertainties. PMID:22320766

  18. Estimation of distributional parameters for censored trace level water quality data: 2. Verification and applications

    USGS Publications Warehouse

    Helsel, Dennis R.; Gilliom, Robert J.

    1986-01-01

    Estimates of distributional parameters (mean, standard deviation, median, interquartile range) are often desired for data sets containing censored observations. Eight methods for estimating these parameters have been evaluated by R. J. Gilliom and D. R. Helsel (this issue) using Monte Carlo simulations. To verify those findings, the same methods are now applied to actual water quality data. The best method (lowest root-mean-squared error (rmse)) over all parameters, sample sizes, and censoring levels is log probability regression (LR), the method found best in the Monte Carlo simulations. Best methods for estimating moment or percentile parameters separately are also identical to the simulations. Reliability of these estimates can be expressed as confidence intervals using rmse and bias values taken from the simulation results. Finally, a new simulation study shows that best methods for estimating uncensored sample statistics from censored data sets are identical to those for estimating population parameters. Thus this study and the companion study by Gilliom and Helsel form the basis for making the best possible estimates of either population parameters or sample statistics from censored water quality data, and for assessments of their reliability.

  19. Methodological reporting of randomized trials in five leading Chinese nursing journals.

    PubMed

    Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu

    2014-01-01

    Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34 ± 0.97 (Mean ± SD). No RCT reported descriptions and changes in "trial design," changes in "outcomes" and "implementation," or descriptions of the similarity of interventions for "blinding." Poor reporting was found in detailing the "settings of participants" (13.1%), "type of randomization sequence generation" (1.8%), calculation methods of "sample size" (0.4%), explanation of any interim analyses and stopping guidelines for "sample size" (0.3%), "allocation concealment mechanism" (0.3%), additional analyses in "statistical methods" (2.1%), and targeted subjects and methods of "blinding" (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of "participants," "interventions," and definitions of the "outcomes" and "statistical methods." The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods.

  20. Transport Coefficients from Large Deviation Functions

    NASA Astrophysics Data System (ADS)

    Gao, Chloe; Limmer, David

    2017-10-01

    We describe a method for computing transport coefficients from the direct evaluation of large deviation function. This method is general, relying on only equilibrium fluctuations, and is statistically efficient, employing trajectory based importance sampling. Equilibrium fluctuations of molecular currents are characterized by their large deviation functions, which is a scaled cumulant generating function analogous to the free energy. A diffusion Monte Carlo algorithm is used to evaluate the large deviation functions, from which arbitrary transport coefficients are derivable. We find significant statistical improvement over traditional Green-Kubo based calculations. The systematic and statistical errors of this method are analyzed in the context of specific transport coefficient calculations, including the shear viscosity, interfacial friction coefficient, and thermal conductivity.

  1. A Non-Intrusive Algorithm for Sensitivity Analysis of Chaotic Flow Simulations

    NASA Technical Reports Server (NTRS)

    Blonigan, Patrick J.; Wang, Qiqi; Nielsen, Eric J.; Diskin, Boris

    2017-01-01

    We demonstrate a novel algorithm for computing the sensitivity of statistics in chaotic flow simulations to parameter perturbations. The algorithm is non-intrusive but requires exposing an interface. Based on the principle of shadowing in dynamical systems, this algorithm is designed to reduce the effect of the sampling error in computing sensitivity of statistics in chaotic simulations. We compare the effectiveness of this method to that of the conventional finite difference method.

  2. Estimating and comparing microbial diversity in the presence of sequencing errors

    PubMed Central

    Chiu, Chun-Huo

    2016-01-01

    Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities. We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable. To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures’ emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution. Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size tends to be sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0. In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes. PMID:26855872

  3. Statistical and Machine Learning forecasting methods: Concerns and ways forward

    PubMed Central

    Makridakis, Spyros; Assimakopoulos, Vassilios

    2018-01-01

    Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions. PMID:29584784

  4. Robust matching for voice recognition

    NASA Astrophysics Data System (ADS)

    Higgins, Alan; Bahler, L.; Porter, J.; Blais, P.

    1994-10-01

    This paper describes an automated method of comparing a voice sample of an unknown individual with samples from known speakers in order to establish or verify the individual's identity. The method is based on a statistical pattern matching approach that employs a simple training procedure, requires no human intervention (transcription, work or phonetic marketing, etc.), and makes no assumptions regarding the expected form of the statistical distributions of the observations. The content of the speech material (vocabulary, grammar, etc.) is not assumed to be constrained in any way. An algorithm is described which incorporates frame pruning and channel equalization processes designed to achieve robust performance with reasonable computational resources. An experimental implementation demonstrating the feasibility of the concept is described.

  5. Forest statistics for Southwest Arkansas counties

    Treesearch

    T. Richard Quick; Mary S. Hedlund

    1979-01-01

    These tables were derived from data obtained during a 1978 inventory of 20 counties comprising the Southwest Unit of Arkansas (fig. 1). The data on forest acreage and timber volume were secured by a systematic sampling method involving a forest-nonforest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample...

  6. Forest statistics for plateau Tennessee counties

    Treesearch

    Renewable Resources Evaluation Research Work Unit

    1982-01-01

    These tables were derived from data obtained during a 1980 inventory of 16 counties comprising the Plateau Unit of Tennessee (fib. 1). The data on forest acreage and timber volume were secured by a systematic sampling method involving a forest-nonforest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample...

  7. Forest statistics for Northwest Louisiana Parishes

    Treesearch

    James F. Rosson; Daniel F. Bertelson

    1985-01-01

    These tables were derived from data obtained during a 1984 inventory of 13 parishes comprising the Northwest unit of Louisiana (fig. 1). The data on forest acreage and timber volume were secured by a systematic sampling method involving a forest-nonforest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample...

  8. Forest statistics for Southwest Louisiana parishes

    Treesearch

    James F. Rosson; Daniel F. Bertelson

    1985-01-01

    These tables were derived from data obtained during a 1984 inventory of 11 parishes comprising the Southwest Unit of Louisiana (fig. 1). The data on forest acreage and timber volume were secured by a systematic sampling method involving a forest-nonforest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample...

  9. Performance of digital RGB reflectance color extraction for plaque lesion

    NASA Astrophysics Data System (ADS)

    Hashim, Hadzli; Taib, Mohd Nasir; Jailani, Rozita; Sulaiman, Saadiah; Baba, Roshidah

    2005-01-01

    Several clinical psoriasis lesion groups are been studied for digital RGB color features extraction. Previous works have used samples size that included all the outliers lying beyond the standard deviation factors from the peak histograms. This paper described the statistical performances of the RGB model with and without removing these outliers. Plaque lesion is experimented with other types of psoriasis. The statistical tests are compared with respect to three samples size; the original 90 samples, the first size reduction by removing outliers from 2 standard deviation distances (2SD) and the second size reduction by removing outliers from 1 standard deviation distance (1SD). Quantification of data images through the normal/direct and differential of the conventional reflectance method is considered. Results performances are concluded by observing the error plots with 95% confidence interval and findings of the inference T-tests applied. The statistical tests outcomes have shown that B component for conventional differential method can be used to distinctively classify plaque from the other psoriasis groups in consistent with the error plots finding with an improvement in p-value greater than 0.5.

  10. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data

    PubMed Central

    Kim, Sung-Min; Choi, Yosoon

    2017-01-01

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168

  11. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data.

    PubMed

    Kim, Sung-Min; Choi, Yosoon

    2017-06-18

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z -score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z -scores: high content with a high z -score (HH), high content with a low z -score (HL), low content with a high z -score (LH), and low content with a low z -score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.

  12. Evaluation of methods to sample fecal indicator bacteria in foreshore sand and pore water at freshwater beaches.

    PubMed

    Vogel, Laura J; Edge, Thomas A; O'Carroll, Denis M; Solo-Gabriele, Helena M; Kushnir, Caitlin S E; Robinson, Clare E

    2017-09-15

    Fecal indicator bacteria (FIB) are known to accumulate in foreshore beach sand and pore water (referred to as foreshore reservoir) where they act as a non-point source for contaminating adjacent surface waters. While guidelines exist for sampling surface waters at recreational beaches, there is no widely-accepted method to collect sand/sediment or pore water samples for FIB enumeration. The effect of different sampling strategies in quantifying the abundance of FIB in the foreshore reservoir is unclear. Sampling was conducted at six freshwater beaches with different sand types to evaluate sampling methods for characterizing the abundance of E. coli in the foreshore reservoir as well as the partitioning of E. coli between different components in the foreshore reservoir (pore water, saturated sand, unsaturated sand). Methods were evaluated for collection of pore water (drive point, shovel, and careful excavation), unsaturated sand (top 1 cm, top 5 cm), and saturated sand (sediment core, shovel, and careful excavation). Ankle-depth surface water samples were also collected for comparison. Pore water sampled with a shovel resulted in the highest observed E. coli concentrations (only statistically significant at fine sand beaches) and lowest variability compared to other sampling methods. Collection of the top 1 cm of unsaturated sand resulted in higher and more variable concentrations than the top 5 cm of sand. There were no statistical differences in E. coli concentrations when using different methods to sample the saturated sand. Overall, the unsaturated sand had the highest amount of E. coli when compared to saturated sand and pore water (considered on a bulk volumetric basis). The findings presented will help determine the appropriate sampling strategy for characterizing FIB abundance in the foreshore reservoir as a means of predicting its potential impact on nearshore surface water quality and public health risk. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Investigating the Stability of Four Methods for Estimating Item Bias.

    ERIC Educational Resources Information Center

    Perlman, Carole L.; And Others

    The reliability of item bias estimates was studied for four methods: (1) the transformed delta method; (2) Shepard's modified delta method; (3) Rasch's one-parameter residual analysis; and (4) the Mantel-Haenszel procedure. Bias statistics were computed for each sample using all methods. Data were from administration of multiple-choice items from…

  14. Reduced detection by Ziehl-Neelsen method of acid-fast bacilli in sputum samples preserved in cetylpyridinium chloride solution.

    PubMed

    Selvakumar, N; Sudhamathi, S; Duraipandian, M; Frieden, T R; Narayanan, P R

    2004-02-01

    Twelve health facilities implementing the DOTS strategy, and the Tuberculosis Research Centre (TRC), Chennai, India. To determine the detection rates using Ziehl-Neelsen (ZN) and auramine-phenol to stain acid-fast bacilli (AFB) in sputum samples stored in cetylpyridinium chloride (CPC) solution. Two smears were prepared from each of 988 sputum samples collected in CPC and randomly allocated, one to ZN and the other to auramine-phenol staining. All samples were processed for culture of Mycobacterium tuberculosis. A significantly higher proportion of samples were negative using the ZN method compared to the auramine-phenol method (74.5% vs. 61.8%, McNamara's paired chi2 test; P < 0.001). Among 377 samples that were positive using auramine-phenol, 44% were negative using ZN. There were more culture-positive, smear-negative samples in ZN (52.7%) than in auramine-phenol (30%); the difference attained statistical significance (McNemar's paired chi2 test; P < 0.00004). Using ZN, of the 104 smears made immediately after collection, 52 were positive for AFB, of which only 35 (67.3%) were positive after storage in CPC; the reduction in the number of positive smears attained statistical significance (McNemar's paired chi2 test; P = 0.004). Detection of AFB in sputum samples preserved in CPC is significantly reduced using ZN staining.

  15. A re-evaluation of a case-control model with contaminated controls for resource selection studies

    Treesearch

    Christopher T. Rota; Joshua J. Millspaugh; Dylan C. Kesler; Chad P. Lehman; Mark A. Rumble; Catherine M. B. Jachowski

    2013-01-01

    A common sampling design in resource selection studies involves measuring resource attributes at sample units used by an animal and at sample units considered available for use. Few models can estimate the absolute probability of using a sample unit from such data, but such approaches are generally preferred over statistical methods that estimate a relative probability...

  16. FabricS: A user-friendly, complete and robust software for particle shape-fabric analysis

    NASA Astrophysics Data System (ADS)

    Moreno Chávez, G.; Castillo Rivera, F.; Sarocchi, D.; Borselli, L.; Rodríguez-Sedano, L. A.

    2018-06-01

    Shape-fabric is a textural parameter related to the spatial arrangement of elongated particles in geological samples. Its usefulness spans a range from sedimentary petrology to igneous and metamorphic petrology. Independently of the process being studied, when a material flows, the elongated particles are oriented with the major axis in the direction of flow. In sedimentary petrology this information has been used for studies of paleo-flow direction of turbidites, the origin of quartz sediments, and locating ignimbrite vents, among others. In addition to flow direction and its polarity, the method enables flow rheology to be inferred. The use of shape-fabric has been limited due to the difficulties of automatically measuring particles and analyzing them with reliable circular statistics programs. This has dampened interest in the method for a long time. Shape-fabric measurement has increased in popularity since the 1980s thanks to the development of new image analysis techniques and circular statistics software. However, the programs currently available are unreliable, old and are incompatible with newer operating systems, or require programming skills. The goal of our work is to develop a user-friendly program, in the MATLAB environment, with a graphical user interface, that can process images and includes editing functions, and thresholds (elongation and size) for selecting a particle population and analyzing it with reliable circular statistics algorithms. Moreover, the method also has to produce rose diagrams, orientation vectors, and a complete series of statistical parameters. All these requirements are met by our new software. In this paper, we briefly explain the methodology from collection of oriented samples in the field to the minimum number of particles needed to obtain reliable fabric data. We obtained the data using specific statistical tests and taking into account the degree of iso-orientation of the samples and the required degree of reliability. The program has been verified by means of several simulations performed using appropriately designed features and by analyzing real samples.

  17. The Content of Statistical Requirements for Authors in Biomedical Research Journals

    PubMed Central

    Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang

    2016-01-01

    Background: Robust statistical designing, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above so that researchers are able to give statistical issues’ serious considerations not only at the stage of data analysis but also at the stage of methodological design. Methods: Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guideline as well as particular requirements were summarized in frequency, which were grouped into design, method of analysis, and presentation, respectively. Finally, updated statistical guidelines and particular requirements for improvement were summed up. Results: Totally, 21 of 23 press groups introduced at least one statistical guideline. More than half of press groups can update their statistical instruction for authors gradually relative to issues of new statistical reporting guidelines. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. The most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including “address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation,” and “statistical methods and the reasons.” Conclusions: Statistical requirements for authors are becoming increasingly perfected. Statistical requirements for authors remind researchers that they should make sufficient consideration not only in regards to statistical methods during the research design, but also standardized statistical reporting, which would be beneficial in providing stronger evidence and making a greater critical appraisal of evidence more accessible. PMID:27748343

  18. Sampling and Data Analysis for Environmental Microbiology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murray, Christopher J.

    2001-06-01

    A brief review of the literature indicates the importance of statistical analysis in applied and environmental microbiology. Sampling designs are particularly important for successful studies, and it is highly recommended that researchers review their sampling design before heading to the laboratory or the field. Most statisticians have numerous stories of scientists who approached them after their study was complete only to have to tell them that the data they gathered could not be used to test the hypothesis they wanted to address. Once the data are gathered, a large and complex body of statistical techniques are available for analysis ofmore » the data. Those methods include both numerical and graphical techniques for exploratory characterization of the data. Hypothesis testing and analysis of variance (ANOVA) are techniques that can be used to compare the mean and variance of two or more groups of samples. Regression can be used to examine the relationships between sets of variables and is often used to examine the dependence of microbiological populations on microbiological parameters. Multivariate statistics provides several methods that can be used for interpretation of datasets with a large number of variables and to partition samples into similar groups, a task that is very common in taxonomy, but also has applications in other fields of microbiology. Geostatistics and other techniques have been used to examine the spatial distribution of microorganisms. The objectives of this chapter are to provide a brief survey of some of the statistical techniques that can be used for sample design and data analysis of microbiological data in environmental studies, and to provide some examples of their use from the literature.« less

  19. Power calculation for overall hypothesis testing with high-dimensional commensurate outcomes.

    PubMed

    Chi, Yueh-Yun; Gribbin, Matthew J; Johnson, Jacqueline L; Muller, Keith E

    2014-02-28

    The complexity of system biology means that any metabolic, genetic, or proteomic pathway typically includes so many components (e.g., molecules) that statistical methods specialized for overall testing of high-dimensional and commensurate outcomes are required. While many overall tests have been proposed, very few have power and sample size methods. We develop accurate power and sample size methods and software to facilitate study planning for high-dimensional pathway analysis. With an account of any complex correlation structure between high-dimensional outcomes, the new methods allow power calculation even when the sample size is less than the number of variables. We derive the exact (finite-sample) and approximate non-null distributions of the 'univariate' approach to repeated measures test statistic, as well as power-equivalent scenarios useful to generalize our numerical evaluations. Extensive simulations of group comparisons support the accuracy of the approximations even when the ratio of number of variables to sample size is large. We derive a minimum set of constants and parameters sufficient and practical for power calculation. Using the new methods and specifying the minimum set to determine power for a study of metabolic consequences of vitamin B6 deficiency helps illustrate the practical value of the new results. Free software implementing the power and sample size methods applies to a wide range of designs, including one group pre-intervention and post-intervention comparisons, multiple parallel group comparisons with one-way or factorial designs, and the adjustment and evaluation of covariate effects. Copyright © 2013 John Wiley & Sons, Ltd.

  20. Disparities in Cervical Cancer Characteristics and Survival Between White Hispanics and White Non-Hispanic Women.

    PubMed

    Khan, Hafiz M R; Gabbidon, Kemesha; Saxena, Anshul; Abdool-Ghany, Faheema; Dodge, John M; Lenzmeier, Taylor

    2016-10-01

    Cervical cancer is the second most common cancer among women resulting in nearly 500,000 cases annually. Screening leads to better treatment and survival time. However, human papillomavirus (HPV) exposure, screening, and treatment vary among races and ethnicities in the United States. The purpose of this study is to examine disparities in characteristics of cervical cancer and survival of cases between White Hispanic (WH) and White non-Hispanic (WNH) women in the United States. We used a stratified random sampling method to select cervical cancer patient records from nine states; a simple random sampling method to extract the demographic and disease characteristics data within states from the Surveillance Epidemiology and End Results (SEER) database. We used statistical probability distribution methods for discrete and continuous data. The chi-square test and independent samples t-test were used to evaluate statistically significant differences. Furthermore, the Cox Proportional Regression and the Kaplan-Meier survival estimators were used to compare WH and WNH population survival times in the United States. The samples of WNH and WH women included 4,000 cervical cancer cases from 1973-2009. There were statistically significant differences between ethnicities: marital status (p < 0.001); primary site of cancer (p < 0.001); lymph node involvement (p < 0.001); grading and differentiation (p < 0.0001); and tumor behavior (p < 0.001). The mean age of diagnosis for both groups showed no statistical differences. However, the mean survival time for WNH was 221.7 (standard deviation [SD] = 118.1) months and for WH was 190.3 (SD = 120.3), which differed significantly (p < 0.001). Clear disparities exist in risk factors, cervical cancer characteristics, and survival time between WH and WNH women.

  1. Statistics provide guidance for indigenous organic carbon detection on Mars missions.

    PubMed

    Sephton, Mark A; Carter, Jonathan N

    2014-08-01

    Data from the Viking and Mars Science Laboratory missions indicate the presence of organic compounds that are not definitively martian in origin. Both contamination and confounding mineralogies have been suggested as alternatives to indigenous organic carbon. Intuitive thought suggests that we are repeatedly obtaining data that confirms the same level of uncertainty. Bayesian statistics may suggest otherwise. If an organic detection method has a true positive to false positive ratio greater than one, then repeated organic matter detection progressively increases the probability of indigeneity. Bayesian statistics also reveal that methods with higher ratios of true positives to false positives give higher overall probabilities and that detection of organic matter in a sample with a higher prior probability of indigenous organic carbon produces greater confidence. Bayesian statistics, therefore, provide guidance for the planning and operation of organic carbon detection activities on Mars. Suggestions for future organic carbon detection missions and instruments are as follows: (i) On Earth, instruments should be tested with analog samples of known organic content to determine their true positive to false positive ratios. (ii) On the mission, for an instrument with a true positive to false positive ratio above one, it should be recognized that each positive detection of organic carbon will result in a progressive increase in the probability of indigenous organic carbon being present; repeated measurements, therefore, can overcome some of the deficiencies of a less-than-definitive test. (iii) For a fixed number of analyses, the highest true positive to false positive ratio method or instrument will provide the greatest probability that indigenous organic carbon is present. (iv) On Mars, analyses should concentrate on samples with highest prior probability of indigenous organic carbon; intuitive desires to contrast samples of high prior probability and low prior probability of indigenous organic carbon should be resisted.

  2. A Simple and Robust Method for Partially Matched Samples Using the P-Values Pooling Approach

    PubMed Central

    Kuan, Pei Fen; Huang, Bo

    2013-01-01

    This paper focuses on statistical analyses in scenarios where some samples from the matched pairs design are missing, resulting in partially matched samples. Motivated by the idea of meta-analysis, we recast the partially matched samples as coming from two experimental designs, and propose a simple yet robust approach based on the weighted Z-test to integrate the p-values computed from these two designs. We show that the proposed approach achieves better operating characteristics in simulations and a case study, compared to existing methods for partially matched samples. PMID:23417968

  3. Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

    NASA Astrophysics Data System (ADS)

    Martinez, Gregory D.; McKay, James; Farmer, Ben; Scott, Pat; Roebber, Elinore; Putze, Antje; Conrad, Jan

    2017-11-01

    We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics.

  4. Reliability of a Measure of Institutional Discrimination against Minorities

    DTIC Science & Technology

    1979-12-01

    samples are presented. The first is based upon classical statistical theory and the second derives from a series of computer-generated Monte Carlo...Institutional racism and sexism . Englewood Cliffs, N. J.: Prentice-Hall, Inc., 1978. Hays, W. L. and Winkler, R. L. Statistics : probability, inference... statistical measure of the e of institutional discrimination are discussed. Two methods of dealing with the problem of reliability of the measure in small

  5. The Seismic risk perception in Italy deduced by a statistical sample

    NASA Astrophysics Data System (ADS)

    Crescimbene, Massimo; La Longa, Federica; Camassi, Romano; Pino, Nicola Alessandro; Pessina, Vera; Peruzza, Laura; Cerbara, Loredana; Crescimbene, Cristiana

    2015-04-01

    In 2014 EGU Assembly we presented the results of a web a survey on the perception of seismic risk in Italy. The data were derived from over 8,500 questionnaires coming from all Italian regions. Our questionnaire was built by using the semantic differential method (Osgood et al. 1957) with a seven points Likert scale. The questionnaire is inspired the main theoretical approaches of risk perception (psychometric paradigm, cultural theory, etc.) .The results were promising and seem to clearly indicate an underestimation of seismic risk by the italian population. Based on these promising results, the DPC has funded our research for the second year. In 2015 EGU Assembly we present the results of a new survey deduced by an italian statistical sample. The importance of statistical significance at national scale was also suggested by ISTAT (Italian Statistic Institute), considering the study as of national interest, accepted the "project on the perception of seismic risk" as a pilot study inside the National Statistical System (SISTAN), encouraging our RU to proceed in this direction. The survey was conducted by a company specialised in population surveys using the CATI method (computer assisted telephone interview). Preliminary results will be discussed. The statistical support was provided by the research partner CNR-IRPPS. This research is funded by Italian Civil Protection Department (DPC).

  6. Differences in results of analyses of concurrent and split stream-water samples collected and analyzed by the US Geological Survey and the Illinois Environmental Protection Agency, 1985-91

    USGS Publications Warehouse

    Melching, C.S.; Coupe, R.H.

    1995-01-01

    During water years 1985-91, the U.S. Geological Survey (USGS) and the Illinois Environmental Protection Agency (IEPA) cooperated in the collection and analysis of concurrent and split stream-water samples from selected sites in Illinois. Concurrent samples were collected independently by field personnel from each agency at the same time and sent to the IEPA laboratory, whereas the split samples were collected by USGS field personnel and divided into aliquots that were sent to each agency's laboratory for analysis. The water-quality data from these programs were examined by means of the Wilcoxon signed ranks test to identify statistically significant differences between results of the USGS and IEPA analyses. The data sets for constituents and properties identified by the Wilcoxon test as having significant differences were further examined by use of the paired t-test, mean relative percentage difference, and scattergrams to determine if the differences were important. Of the 63 constituents and properties in the concurrent-sample analysis, differences in only 2 (pH and ammonia) were statistically significant and large enough to concern water-quality engineers and planners. Of the 27 constituents and properties in the split-sample analysis, differences in 9 (turbidity, dissolved potassium, ammonia, total phosphorus, dissolved aluminum, dissolved barium, dissolved iron, dissolved manganese, and dissolved nickel) were statistically significant and large enough to con- cern water-quality engineers and planners. The differences in concentration between pairs of the concurrent samples were compared to the precision of the laboratory or field method used. The differences in concentration between pairs of the concurrent samples were compared to the precision of the laboratory or field method used. The differences in concentration between paris of split samples were compared to the precision of the laboratory method used and the interlaboratory precision of measuring a given concentration or property. Consideration of method precision indicated that differences between concurrent samples were insignificant for all concentrations and properties except pH, and that differences between split samples were significant for all concentrations and properties. Consideration of interlaboratory precision indicated that the differences between the split samples were not unusually large. The results for the split samples illustrate the difficulty in obtaining comparable and accurate water-quality data.

  7. Use of missing data methods in longitudinal studies: the persistence of bad practices in developmental psychology.

    PubMed

    Jelicić, Helena; Phelps, Erin; Lerner, Richard M

    2009-07-01

    Developmental science rests on describing, explaining, and optimizing intraindividual changes and, hence, empirically requires longitudinal research. Problems of missing data arise in most longitudinal studies, thus creating challenges for interpreting the substance and structure of intraindividual change. Using a sample of reports of longitudinal studies obtained from three flagship developmental journals-Child Development, Developmental Psychology, and Journal of Research on Adolescence-we examined the number of longitudinal studies reporting missing data and the missing data techniques used. Of the 100 longitudinal studies sampled, 57 either reported having missing data or had discrepancies in sample sizes reported for different analyses. The majority of these studies (82%) used missing data techniques that are statistically problematic (either listwise deletion or pairwise deletion) and not among the methods recommended by statisticians (i.e., the direct maximum likelihood method and the multiple imputation method). Implications of these results for developmental theory and application, and the need for understanding the consequences of using statistically inappropriate missing data techniques with actual longitudinal data sets, are discussed.

  8. Probability Issues in without Replacement Sampling

    ERIC Educational Resources Information Center

    Joarder, A. H.; Al-Sabah, W. S.

    2007-01-01

    Sampling without replacement is an important aspect in teaching conditional probabilities in elementary statistics courses. Different methods proposed in different texts for calculating probabilities of events in this context are reviewed and their relative merits and limitations in applications are pinpointed. An alternative representation of…

  9. Pitfalls in statistical landslide susceptibility modelling

    NASA Astrophysics Data System (ADS)

    Schröder, Boris; Vorpahl, Peter; Märker, Michael; Elsenbeer, Helmut

    2010-05-01

    The use of statistical methods is a well-established approach to predict landslide occurrence probabilities and to assess landslide susceptibility. This is achieved by applying statistical methods relating historical landslide inventories to topographic indices as predictor variables. In our contribution, we compare several new and powerful methods developed in machine learning and well-established in landscape ecology and macroecology for predicting the distribution of shallow landslides in tropical mountain rainforests in southern Ecuador (among others: boosted regression trees, multivariate adaptive regression splines, maximum entropy). Although these methods are powerful, we think it is necessary to follow a basic set of guidelines to avoid some pitfalls regarding data sampling, predictor selection, and model quality assessment, especially if a comparison of different models is contemplated. We therefore suggest to apply a novel toolbox to evaluate approaches to the statistical modelling of landslide susceptibility. Additionally, we propose some methods to open the "black box" as an inherent part of machine learning methods in order to achieve further explanatory insights into preparatory factors that control landslides. Sampling of training data should be guided by hypotheses regarding processes that lead to slope failure taking into account their respective spatial scales. This approach leads to the selection of a set of candidate predictor variables considered on adequate spatial scales. This set should be checked for multicollinearity in order to facilitate model response curve interpretation. Model quality assesses how well a model is able to reproduce independent observations of its response variable. This includes criteria to evaluate different aspects of model performance, i.e. model discrimination, model calibration, and model refinement. In order to assess a possible violation of the assumption of independency in the training samples or a possible lack of explanatory information in the chosen set of predictor variables, the model residuals need to be checked for spatial auto¬correlation. Therefore, we calculate spline correlograms. In addition to this, we investigate partial dependency plots and bivariate interactions plots considering possible interactions between predictors to improve model interpretation. Aiming at presenting this toolbox for model quality assessment, we investigate the influence of strategies in the construction of training datasets for statistical models on model quality.

  10. Use of Unlabeled Samples for Mitigating the Hughes Phenomenon

    NASA Technical Reports Server (NTRS)

    Landgrebe, David A.; Shahshahani, Behzad M.

    1993-01-01

    The use of unlabeled samples in improving the performance of classifiers is studied. When the number of training samples is fixed and small, additional feature measurements may reduce the performance of a statistical classifier. It is shown that by using unlabeled samples, estimates of the parameters can be improved and therefore this phenomenon may be mitigated. Various methods for using unlabeled samples are reviewed and experimental results are provided.

  11. Weighting Statistical Inputs for Data Used to Support Effective Decision Making During Severe Emergency Weather and Environmental Events

    NASA Technical Reports Server (NTRS)

    Gardner, Adrian

    2010-01-01

    National Aeronautical and Space Administration (NASA) weather and atmospheric environmental organizations are insatiable consumers of geophysical, hydrometeorological and solar weather statistics. The expanding array of internet-worked sensors producing targeted physical measurements has generated an almost factorial explosion of near real-time inputs to topical statistical datasets. Normalizing and value-based parsing of such statistical datasets in support of time-constrained weather and environmental alerts and warnings is essential, even with dedicated high-performance computational capabilities. What are the optimal indicators for advanced decision making? How do we recognize the line between sufficient statistical sampling and excessive, mission destructive sampling ? How do we assure that the normalization and parsing process, when interpolated through numerical models, yields accurate and actionable alerts and warnings? This presentation will address the integrated means and methods to achieve desired outputs for NASA and consumers of its data.

  12. Evaluation of three sample preparation methods for the direct identification of bacteria in positive blood cultures by MALDI-TOF.

    PubMed

    Tanner, Hannah; Evans, Jason T; Gossain, Savita; Hussain, Abid

    2017-01-18

    Patient mortality is significantly reduced by rapid identification of bacteria from sterile sites. MALDI-TOF can identify bacteria directly from positive blood cultures and multiple sample preparation methods are available. We evaluated three sample preparation methods and two MALDI-TOF score cut-off values. Positive blood culture bottles with organisms present in Gram stains were prospectively analysed by MALDI-TOF. Three lysis reagents (Saponin, SDS, and SepsiTyper lysis bufer) were applied to each positive culture followed by centrifugation, washing and protein extraction steps. Methods were compared using the McNemar test and 16S rDNA sequencing was used to assess discordant results. In 144 monomicrobial cultures, using ≥2.000 as the cut-off value, species level identifications were obtained from 69/144 (48%) samples using Saponin, 86/144 (60%) using SDS, and 91/144 (63%) using SepsiTyper. The difference between SDS and SepsiTyper was not statistically significant (P = 0.228). Differences between Saponin and the other two reagents were significant (P < 0.01). Using ≥1.700 plus top three results matching as the cut-off value, species level identifications were obtained from 100/144 (69%) samples using Saponin, 103/144 (72%) using SDS, and 106/144 (74%) using SepsiTyper and there was no statistical difference between the methods. No true discordances between culture and direct MALDI-TOF identification were observed in monomicrobial cultures. In 32 polymicrobial cultures, MALDI-TOF identified one organism in 34-75% of samples depending on the method. This study demonstrates two inexpensive in-house detergent lysis methods are non-inferior to a commercial kit for analysis of positive blood cultures by direct MALDI-TOF in a clinical diagnostic microbiology laboratory.

  13. High throughput nonparametric probability density estimation.

    PubMed

    Farmer, Jenny; Jacobs, Donald

    2018-01-01

    In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.

  14. High throughput nonparametric probability density estimation

    PubMed Central

    Farmer, Jenny

    2018-01-01

    In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference. PMID:29750803

  15. Rank-based testing of equal survivorship based on cross-sectional survival data with or without prospective follow-up.

    PubMed

    Chan, Kwun Chuen Gary; Qin, Jing

    2015-10-01

    Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information are available from backward recurrence times and are frequently collected from health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposal statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer's disease with prospective follow-up. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  16. Water Quality Sensing and Spatio-Temporal Monitoring Structure with Autocorrelation Kernel Methods.

    PubMed

    Vizcaíno, Iván P; Carrera, Enrique V; Muñoz-Romero, Sergio; Cumbal, Luis H; Rojo-Álvarez, José Luis

    2017-10-16

    Pollution on water resources is usually analyzed with monitoring campaigns, which consist of programmed sampling, measurement, and recording of the most representative water quality parameters. These campaign measurements yields a non-uniform spatio-temporal sampled data structure to characterize complex dynamics phenomena. In this work, we propose an enhanced statistical interpolation method to provide water quality managers with statistically interpolated representations of spatial-temporal dynamics. Specifically, our proposal makes efficient use of the a priori available information of the quality parameter measurements through Support Vector Regression (SVR) based on Mercer's kernels. The methods are benchmarked against previously proposed methods in three segments of the Machángara River and one segment of the San Pedro River in Ecuador, and their different dynamics are shown by statistically interpolated spatial-temporal maps. The best interpolation performance in terms of mean absolute error was the SVR with Mercer's kernel given by either the Mahalanobis spatial-temporal covariance matrix or by the bivariate estimated autocorrelation function. In particular, the autocorrelation kernel provides with significant improvement of the estimation quality, consistently for all the six water quality variables, which points out the relevance of including a priori knowledge of the problem.

  17. Water Quality Sensing and Spatio-Temporal Monitoring Structure with Autocorrelation Kernel Methods

    PubMed Central

    Vizcaíno, Iván P.; Muñoz-Romero, Sergio; Cumbal, Luis H.

    2017-01-01

    Pollution on water resources is usually analyzed with monitoring campaigns, which consist of programmed sampling, measurement, and recording of the most representative water quality parameters. These campaign measurements yields a non-uniform spatio-temporal sampled data structure to characterize complex dynamics phenomena. In this work, we propose an enhanced statistical interpolation method to provide water quality managers with statistically interpolated representations of spatial-temporal dynamics. Specifically, our proposal makes efficient use of the a priori available information of the quality parameter measurements through Support Vector Regression (SVR) based on Mercer’s kernels. The methods are benchmarked against previously proposed methods in three segments of the Machángara River and one segment of the San Pedro River in Ecuador, and their different dynamics are shown by statistically interpolated spatial-temporal maps. The best interpolation performance in terms of mean absolute error was the SVR with Mercer’s kernel given by either the Mahalanobis spatial-temporal covariance matrix or by the bivariate estimated autocorrelation function. In particular, the autocorrelation kernel provides with significant improvement of the estimation quality, consistently for all the six water quality variables, which points out the relevance of including a priori knowledge of the problem. PMID:29035333

  18. Statistical inference from multiple iTRAQ experiments without using common reference standards.

    PubMed

    Herbrich, Shelley M; Cole, Robert N; West, Keith P; Schulze, Kerry; Yager, James D; Groopman, John D; Christian, Parul; Wu, Lee; O'Meally, Robert N; May, Damon H; McIntosh, Martin W; Ruczinski, Ingo

    2013-02-01

    Isobaric tags for relative and absolute quantitation (iTRAQ) is a prominent mass spectrometry technology for protein identification and quantification that is capable of analyzing multiple samples in a single experiment. Frequently, iTRAQ experiments are carried out using an aliquot from a pool of all samples, or "masterpool", in one of the channels as a reference sample standard to estimate protein relative abundances in the biological samples and to combine abundance estimates from multiple experiments. In this manuscript, we show that using a masterpool is counterproductive. We obtain more precise estimates of protein relative abundance by using the available biological data instead of the masterpool and do not need to occupy a channel that could otherwise be used for another biological sample. In addition, we introduce a simple statistical method to associate proteomic data from multiple iTRAQ experiments with a numeric response and show that this approach is more powerful than the conventionally employed masterpool-based approach. We illustrate our methods using data from four replicate iTRAQ experiments on aliquots of the same pool of plasma samples and from a 406-sample project designed to identify plasma proteins that covary with nutrient concentrations in chronically undernourished children from South Asia.

  19. Maximum entropy PDF projection: A review

    NASA Astrophysics Data System (ADS)

    Baggenstoss, Paul M.

    2017-06-01

    We review maximum entropy (MaxEnt) PDF projection, a method with wide potential applications in statistical inference. The method constructs a sampling distribution for a high-dimensional vector x based on knowing the sampling distribution p(z) of a lower-dimensional feature z = T (x). Under mild conditions, the distribution p(x) having highest possible entropy among all distributions consistent with p(z) may be readily found. Furthermore, the MaxEnt p(x) may be sampled, making the approach useful in Monte Carlo methods. We review the theorem and present a case study in model order selection and classification for handwritten character recognition.

  20. Accurate low-cost methods for performance evaluation of cache memory systems

    NASA Technical Reports Server (NTRS)

    Laha, Subhasis; Patel, Janak H.; Iyer, Ravishankar K.

    1988-01-01

    Methods of simulation based on statistical techniques are proposed to decrease the need for large trace measurements and for predicting true program behavior. Sampling techniques are applied while the address trace is collected from a workload. This drastically reduces the space and time needed to collect the trace. Simulation techniques are developed to use the sampled data not only to predict the mean miss rate of the cache, but also to provide an empirical estimate of its actual distribution. Finally, a concept of primed cache is introduced to simulate large caches by the sampling-based method.

  1. Sampling in health geography: reconciling geographical objectives and probabilistic methods. An example of a health survey in Vientiane (Lao PDR).

    PubMed

    Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard

    2007-06-01

    Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy.

  2. Analysis of chemical warfare agents. II. Use of thiols and statistical experimental design for the trace level determination of vesicant compounds in air samples.

    PubMed

    Muir, Bob; Quick, Suzanne; Slater, Ben J; Cooper, David B; Moran, Mary C; Timperley, Christopher M; Carrick, Wendy A; Burnell, Christopher K

    2005-03-18

    Thermal desorption with gas chromatography-mass spectrometry (TD-GC-MS) remains the technique of choice for analysis of trace concentrations of analytes in air samples. This paper describes the development and application of a method for analysing the vesicant compounds sulfur mustard and Lewisites I-III. 3,4-Dimercaptotoluene and butanethiol were used to spike sorbent tubes and vesicant vapours sampled; Lewisite I and II reacted with the thiols while sulfur mustard and Lewisite III did not. Statistical experimental design was used to optimise thermal desorption parameters and the optimum method used to determine vesicant compounds in headspace samples taken from a decontamination trial. 3,4-Dimercaptotoluene reacted with Lewisites I and II to give a common derivative with a limit of detection (LOD) of 260 microg m(-3), while the butanethiol gave distinct derivatives with limits of detection around 30 microg m(-3).

  3. Forest statistics for West-Central Tennessee counties

    Treesearch

    Renewable Resource Evaluation Research Work Unit

    1982-01-01

    These tables were derived from data obtained during a 1980 inventory of 11 counties comprising the West Central Unit of Tennessee (fig. 1). The data on forest acreage and timber volume were secured by systematic sampling method involving a forest-non-forest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample...

  4. Forest statistics for Southwest-South Alabama counties

    Treesearch

    Forest Inventory and Analysis Research Work Unit

    1983-01-01

    These tables were derived from data obtained during a 1982 inventory of 21 counties comprising the Southeast Unit of Alabama (fig. 1). The data on forest acreage and timber volume were secured by a systematic sampling method of involving a forest-nonforest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample...

  5. Forest statistics for North-Central Alabama counties

    Treesearch

    Forest Inventory and Analysis Research Work Unit

    1983-01-01

    These tables were derived from data obtained during a 1982 inventory of 15 counties comprising the North Central Unit of Alabama (fig. 1). The data on forest acreage and timber volume were secured by systematic sampling method involving a forest-nonforest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample...

  6. Forest statistics for West Tennessee counties

    Treesearch

    No Author Named

    1982-01-01

    These tables were derived from data obtained during a 1980 inventory of 18 counties comprising the West Unit of Tennessee (fig. 1). The data on forest acreage and timber volume were secured by a systematic sampling method involving a forest-nonforest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample locations...

  7. Forest statistics for North Alabama counties

    Treesearch

    Forest Inventory and Analysis Work Unit

    1983-01-01

    These tables were derived from data obtained during a 1982 inventory of 10 counties comprising the North Unit of Alabama (fig. 1). The data on forest acreage and timber volume were secured by a systematic sampling method involving a forest-nonforest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample locations...

  8. Forest statistics for Central Tennessee counties

    Treesearch

    Renewable Resources Evaluation Research Work Unit

    1981-01-01

    These tables were derived from data obtained during a 1980 inventory of 23 counties comprising the Central Unit of Tennessee (fig. 1). The data on forest acreage and timber volume were secured by a systematic sampling method involving a forest-non-forest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample...

  9. Forest statistics for Southwest Alabama counties

    Treesearch

    SO Southern Experiment Sta

    1983-01-01

    These tables were derived from data obtained during a 1982 inventory of 21 counties comprising the Southeast Unit of Alabama (fig. 1). The data on forest acreage and timber volume were secured by a systematic sampling method of involving a forest-nonforest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample...

  10. Forest statistics for West Central Alabama counties

    Treesearch

    SO Southern Experiment Sta

    1983-01-01

    These tables were derived from data obtained during a 1982 inventory of 21 counties comprising the Southeast Unit of Alabama (fig. 1). The data on forest acreage and timber volume were secured by a systematic sampling method of involving a forest-nonforest classification on aerial photographs and on-the-ground measurements of trees at sample locations. The sample...

  11. Statistical sampling methods for soils monitoring

    Treesearch

    Ann M. Abbott

    2010-01-01

    Development of the best sampling design to answer a research question should be an interactive venture between the land manager or researcher and statisticians, and is the result of answering various questions. A series of questions that can be asked to guide the researcher in making decisions that will arrive at an effective sampling plan are described, and a case...

  12. A Monte Carlo study of Weibull reliability analysis for space shuttle main engine components

    NASA Technical Reports Server (NTRS)

    Abernethy, K.

    1986-01-01

    The incorporation of a number of additional capabilities into an existing Weibull analysis computer program and the results of Monte Carlo computer simulation study to evaluate the usefulness of the Weibull methods using samples with a very small number of failures and extensive censoring are discussed. Since the censoring mechanism inherent in the Space Shuttle Main Engine (SSME) data is hard to analyze, it was decided to use a random censoring model, generating censoring times from a uniform probability distribution. Some of the statistical techniques and computer programs that are used in the SSME Weibull analysis are described. The methods documented in were supplemented by adding computer calculations of approximate (using iteractive methods) confidence intervals for several parameters of interest. These calculations are based on a likelihood ratio statistic which is asymptotically a chisquared statistic with one degree of freedom. The assumptions built into the computer simulations are described. The simulation program and the techniques used in it are described there also. Simulation results are tabulated for various combinations of Weibull shape parameters and the numbers of failures in the samples.

  13. Evaluation of a segment-based LANDSAT full-frame approach to corp area estimation

    NASA Technical Reports Server (NTRS)

    Bauer, M. E. (Principal Investigator); Hixson, M. M.; Davis, S. M.

    1981-01-01

    As the registration of LANDSAT full frames enters the realm of current technology, sampling methods should be examined which utilize other than the segment data used for LACIE. The effect of separating the functions of sampling for training and sampling for area estimation. The frame selected for analysis was acquired over north central Iowa on August 9, 1978. A stratification of he full-frame was defined. Training data came from segments within the frame. Two classification and estimation procedures were compared: statistics developed on one segment were used to classify that segment, and pooled statistics from the segments were used to classify a systematic sample of pixels. Comparisons to USDA/ESCS estimates illustrate that the full-frame sampling approach can provide accurate and precise area estimates.

  14. Assessment of statistical methods used in library-based approaches to microbial source tracking.

    PubMed

    Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D

    2003-12-01

    Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.

  15. Statistical methods for conducting agreement (comparison of clinical tests) and precision (repeatability or reproducibility) studies in optometry and ophthalmology.

    PubMed

    McAlinden, Colm; Khadka, Jyoti; Pesudovs, Konrad

    2011-07-01

    The ever-expanding choice of ocular metrology and imaging equipment has driven research into the validity of their measurements. Consequently, studies of the agreement between two instruments or clinical tests have proliferated in the ophthalmic literature. It is important that researchers apply the appropriate statistical tests in agreement studies. Correlation coefficients are hazardous and should be avoided. The 'limits of agreement' method originally proposed by Altman and Bland in 1983 is the statistical procedure of choice. Its step-by-step use and practical considerations in relation to optometry and ophthalmology are detailed in addition to sample size considerations and statistical approaches to precision (repeatability or reproducibility) estimates. Ophthalmic & Physiological Optics © 2011 The College of Optometrists.

  16. Analysis of select Dalbergia and trade timber using direct analysis in real time and time-of-flight mass spectrometry for CITES enforcement.

    PubMed

    Lancaster, Cady; Espinoza, Edgard

    2012-05-15

    International trade of several Dalbergia wood species is regulated by The Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). In order to supplement morphological identification of these species, a rapid chemical method of analysis was developed. Using Direct Analysis in Real Time (DART) ionization coupled with Time-of-Flight (TOF) Mass Spectrometry (MS), selected Dalbergia and common trade species were analyzed. Each of the 13 wood species was classified using principal component analysis and linear discriminant analysis (LDA). These statistical data clusters served as reliable anchors for species identification of unknowns. Analysis of 20 or more samples from the 13 species studied in this research indicates that the DART-TOFMS results are reproducible. Statistical analysis of the most abundant ions gave good classifications that were useful for identifying unknown wood samples. DART-TOFMS and LDA analysis of 13 species of selected timber samples and the statistical classification allowed for the correct assignment of unknown wood samples. This method is rapid and can be useful when anatomical identification is difficult but needed in order to support CITES enforcement. Published 2012. This article is a US Government work and is in the public domain in the USA.

  17. Wang-Landau Reaction Ensemble Method: Simulation of Weak Polyelectrolytes and General Acid-Base Reactions.

    PubMed

    Landsgesell, Jonas; Holm, Christian; Smiatek, Jens

    2017-02-14

    We present a novel method for the study of weak polyelectrolytes and general acid-base reactions in molecular dynamics and Monte Carlo simulations. The approach combines the advantages of the reaction ensemble and the Wang-Landau sampling method. Deprotonation and protonation reactions are simulated explicitly with the help of the reaction ensemble method, while the accurate sampling of the corresponding phase space is achieved by the Wang-Landau approach. The combination of both techniques provides a sufficient statistical accuracy such that meaningful estimates for the density of states and the partition sum can be obtained. With regard to these estimates, several thermodynamic observables like the heat capacity or reaction free energies can be calculated. We demonstrate that the computation times for the calculation of titration curves with a high statistical accuracy can be significantly decreased when compared to the original reaction ensemble method. The applicability of our approach is validated by the study of weak polyelectrolytes and their thermodynamic properties.

  18. Statistical Approach To Extraction Of Texture In SAR

    NASA Technical Reports Server (NTRS)

    Rignot, Eric J.; Kwok, Ronald

    1992-01-01

    Improved statistical method of extraction of textural features in synthetic-aperture-radar (SAR) images takes account of effects of scheme used to sample raw SAR data, system noise, resolution of radar equipment, and speckle. Treatment of speckle incorporated into overall statistical treatment of speckle, system noise, and natural variations in texture. One computes speckle auto-correlation function from system transfer function that expresses effect of radar aperature and incorporates range and azimuth resolutions.

  19. Field Penetration in a Rectangular Box Using Numerical Techniques: An Effort to Obtain Statistical Shielding Effectiveness

    NASA Technical Reports Server (NTRS)

    Bunting, Charles F.; Yu, Shih-Pin

    2006-01-01

    This paper emphasizes the application of numerical methods to explore the ideas related to shielding effectiveness from a statistical view. An empty rectangular box is examined using a hybrid modal/moment method. The basic computational method is presented followed by the results for single- and multiple observation points within the over-moded empty structure. The statistics of the field are obtained by using frequency stirring, borrowed from the ideas connected with reverberation chamber techniques, and extends the ideas of shielding effectiveness well into the multiple resonance regions. The study presented in this paper will address the average shielding effectiveness over a broad spatial sample within the enclosure as the frequency is varied.

  20. Calculating the free energy of transfer of small solutes into a model lipid membrane: Comparison between metadynamics and umbrella sampling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bochicchio, Davide; Panizon, Emanuele; Ferrando, Riccardo

    2015-10-14

    We compare the performance of two well-established computational algorithms for the calculation of free-energy landscapes of biomolecular systems, umbrella sampling and metadynamics. We look at benchmark systems composed of polyethylene and polypropylene oligomers interacting with lipid (phosphatidylcholine) membranes, aiming at the calculation of the oligomer water-membrane free energy of transfer. We model our test systems at two different levels of description, united-atom and coarse-grained. We provide optimized parameters for the two methods at both resolutions. We devote special attention to the analysis of statistical errors in the two different methods and propose a general procedure for the error estimation inmore » metadynamics simulations. Metadynamics and umbrella sampling yield the same estimates for the water-membrane free energy profile, but metadynamics can be more efficient, providing lower statistical uncertainties within the same simulation time.« less

  1. Multiplex Real-Time PCR for Detection of Staphylococcus aureus, mecA and Panton-Valentine Leukocidin (PVL) Genes from Selective Enrichments from Animals and Retail Meat

    PubMed Central

    Velasco, Valeria; Sherwood, Julie S.; Rojas-García, Pedro P.; Logue, Catherine M.

    2014-01-01

    The aim of this study was to compare a real-time PCR assay, with a conventional culture/PCR method, to detect S. aureus, mecA and Panton-Valentine Leukocidin (PVL) genes in animals and retail meat, using a two-step selective enrichment protocol. A total of 234 samples were examined (77 animal nasal swabs, 112 retail raw meat, and 45 deli meat). The multiplex real-time PCR targeted the genes: nuc (identification of S. aureus), mecA (associated with methicillin resistance) and PVL (virulence factor), and the primary and secondary enrichment samples were assessed. The conventional culture/PCR method included the two-step selective enrichment, selective plating, biochemical testing, and multiplex PCR for confirmation. The conventional culture/PCR method recovered 95/234 positive S. aureus samples. Application of real-time PCR on samples following primary and secondary enrichment detected S. aureus in 111/234 and 120/234 samples respectively. For detection of S. aureus, the kappa statistic was 0.68–0.88 (from substantial to almost perfect agreement) and 0.29–0.77 (from fair to substantial agreement) for primary and secondary enrichments, using real-time PCR. For detection of mecA gene, the kappa statistic was 0–0.49 (from no agreement beyond that expected by chance to moderate agreement) for primary and secondary enrichment samples. Two pork samples were mecA gene positive by all methods. The real-time PCR assay detected the mecA gene in samples that were negative for S. aureus, but positive for Staphylococcus spp. The PVL gene was not detected in any sample by the conventional culture/PCR method or the real-time PCR assay. Among S. aureus isolated by conventional culture/PCR method, the sequence type ST398, and multi-drug resistant strains were found in animals and raw meat samples. The real-time PCR assay may be recommended as a rapid method for detection of S. aureus and the mecA gene, with further confirmation of methicillin-resistant S. aureus (MRSA) using the standard culture method. PMID:24849624

  2. Multiplex real-time PCR for detection of Staphylococcus aureus, mecA and Panton-Valentine Leukocidin (PVL) genes from selective enrichments from animals and retail meat.

    PubMed

    Velasco, Valeria; Sherwood, Julie S; Rojas-García, Pedro P; Logue, Catherine M

    2014-01-01

    The aim of this study was to compare a real-time PCR assay, with a conventional culture/PCR method, to detect S. aureus, mecA and Panton-Valentine Leukocidin (PVL) genes in animals and retail meat, using a two-step selective enrichment protocol. A total of 234 samples were examined (77 animal nasal swabs, 112 retail raw meat, and 45 deli meat). The multiplex real-time PCR targeted the genes: nuc (identification of S. aureus), mecA (associated with methicillin resistance) and PVL (virulence factor), and the primary and secondary enrichment samples were assessed. The conventional culture/PCR method included the two-step selective enrichment, selective plating, biochemical testing, and multiplex PCR for confirmation. The conventional culture/PCR method recovered 95/234 positive S. aureus samples. Application of real-time PCR on samples following primary and secondary enrichment detected S. aureus in 111/234 and 120/234 samples respectively. For detection of S. aureus, the kappa statistic was 0.68-0.88 (from substantial to almost perfect agreement) and 0.29-0.77 (from fair to substantial agreement) for primary and secondary enrichments, using real-time PCR. For detection of mecA gene, the kappa statistic was 0-0.49 (from no agreement beyond that expected by chance to moderate agreement) for primary and secondary enrichment samples. Two pork samples were mecA gene positive by all methods. The real-time PCR assay detected the mecA gene in samples that were negative for S. aureus, but positive for Staphylococcus spp. The PVL gene was not detected in any sample by the conventional culture/PCR method or the real-time PCR assay. Among S. aureus isolated by conventional culture/PCR method, the sequence type ST398, and multi-drug resistant strains were found in animals and raw meat samples. The real-time PCR assay may be recommended as a rapid method for detection of S. aureus and the mecA gene, with further confirmation of methicillin-resistant S. aureus (MRSA) using the standard culture method.

  3. Using Elderly Educators to Increase Colorectal Cancer Screening.

    ERIC Educational Resources Information Center

    Weinrich, Sally P.; And Others

    1993-01-01

    Used elderly educator method for increasing rate of return of fecal occult blood sampling in colorectal screening among 171 socioeconomically disadvantaged older persons. Two methods using elderly educators had overall response rate of more than 60%. Found statistically significant difference between two methods that used elderly educators and two…

  4. Directions for new developments on statistical design and analysis of small population group trials.

    PubMed

    Hilgers, Ralf-Dieter; Roes, Kit; Stallard, Nigel

    2016-06-14

    Most statistical design and analysis methods for clinical trials have been developed and evaluated where at least several hundreds of patients could be recruited. These methods may not be suitable to evaluate therapies if the sample size is unavoidably small, which is usually termed by small populations. The specific sample size cut off, where the standard methods fail, needs to be investigated. In this paper, the authors present their view on new developments for design and analysis of clinical trials in small population groups, where conventional statistical methods may be inappropriate, e.g., because of lack of power or poor adherence to asymptotic approximations due to sample size restrictions. Following the EMA/CHMP guideline on clinical trials in small populations, we consider directions for new developments in the area of statistical methodology for design and analysis of small population clinical trials. We relate the findings to the research activities of three projects, Asterix, IDeAl, and InSPiRe, which have received funding since 2013 within the FP7-HEALTH-2013-INNOVATION-1 framework of the EU. As not all aspects of the wide research area of small population clinical trials can be addressed, we focus on areas where we feel advances are needed and feasible. The general framework of the EMA/CHMP guideline on small population clinical trials stimulates a number of research areas. These serve as the basis for the three projects, Asterix, IDeAl, and InSPiRe, which use various approaches to develop new statistical methodology for design and analysis of small population clinical trials. Small population clinical trials refer to trials with a limited number of patients. Small populations may result form rare diseases or specific subtypes of more common diseases. New statistical methodology needs to be tailored to these specific situations. The main results from the three projects will constitute a useful toolbox for improved design and analysis of small population clinical trials. They address various challenges presented by the EMA/CHMP guideline as well as recent discussions about extrapolation. There is a need for involvement of the patients' perspective in the planning and conduct of small population clinical trials for a successful therapy evaluation.

  5. Statistical methods and errors in family medicine articles between 2010 and 2014-Suez Canal University, Egypt: A cross-sectional study.

    PubMed

    Nour-Eldein, Hebatallah

    2016-01-01

    With limited statistical knowledge of most physicians it is not uncommon to find statistical errors in research articles. To determine the statistical methods and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-test 17/66 (25.8%) were the most commonly used inferential tests. Within 60 FM articles with identified inferential statistics, no prior sample size 19/60 (31.7%), application of wrong statistical tests 17/60 (28.3%), incomplete documentation of statistics 59/60 (98.3%), reporting P value without test statistics 32/60 (53.3%), no reporting confidence interval with effect size measures 12/60 (20.0%), use of mean (standard deviation) to describe ordinal/nonnormal data 8/60 (13.3%), and errors related to interpretation were mainly for conclusions without support by the study data 5/60 (8.3%). Inferential statistics were used in the majority of FM articles. Data analysis and reporting statistics are areas for improvement in FM research articles.

  6. Use of the Analysis of the Volatile Faecal Metabolome in Screening for Colorectal Cancer

    PubMed Central

    2015-01-01

    Diagnosis of colorectal cancer is an invasive and expensive colonoscopy, which is usually carried out after a positive screening test. Unfortunately, existing screening tests lack specificity and sensitivity, hence many unnecessary colonoscopies are performed. Here we report on a potential new screening test for colorectal cancer based on the analysis of volatile organic compounds (VOCs) in the headspace of faecal samples. Faecal samples were obtained from subjects who had a positive faecal occult blood sample (FOBT). Subjects subsequently had colonoscopies performed to classify them into low risk (non-cancer) and high risk (colorectal cancer) groups. Volatile organic compounds were analysed by selected ion flow tube mass spectrometry (SIFT-MS) and then data were analysed using both univariate and multivariate statistical methods. Ions most likely from hydrogen sulphide, dimethyl sulphide and dimethyl disulphide are statistically significantly higher in samples from high risk rather than low risk subjects. Results using multivariate methods show that the test gives a correct classification of 75% with 78% specificity and 72% sensitivity on FOBT positive samples, offering a potentially effective alternative to FOBT. PMID:26086914

  7. Multivariate statistical approach to estimate mixing proportions for unknown end members

    USGS Publications Warehouse

    Valder, Joshua F.; Long, Andrew J.; Davis, Arden D.; Kenner, Scott J.

    2012-01-01

    A multivariate statistical method is presented, which includes principal components analysis (PCA) and an end-member mixing model to estimate unknown end-member hydrochemical compositions and the relative mixing proportions of those end members in mixed waters. PCA, together with the Hotelling T2 statistic and a conceptual model of groundwater flow and mixing, was used in selecting samples that best approximate end members, which then were used as initial values in optimization of the end-member mixing model. This method was tested on controlled datasets (i.e., true values of estimates were known a priori) and found effective in estimating these end members and mixing proportions. The controlled datasets included synthetically generated hydrochemical data, synthetically generated mixing proportions, and laboratory analyses of sample mixtures, which were used in an evaluation of the effectiveness of this method for potential use in actual hydrological settings. For three different scenarios tested, correlation coefficients (R2) for linear regression between the estimated and known values ranged from 0.968 to 0.993 for mixing proportions and from 0.839 to 0.998 for end-member compositions. The method also was applied to field data from a study of end-member mixing in groundwater as a field example and partial method validation.

  8. Texture Classification by Texton: Statistical versus Binary

    PubMed Central

    Guo, Zhenhua; Zhang, Zhongcheng; Li, Xiu; Li, Qin; You, Jane

    2014-01-01

    Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, there are two limitations when using these methods. First, it needs a training stage to build a texton library, thus the recognition accuracy will be highly depended on the training samples; second, during feature extraction, local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library size is big and the dimension of feature is high. To address the above two issues, in this paper, three binary texton counterpart methods were proposed, Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode local feature into binary representation directly. The experimental results on the CUReT, UIUC and KTH-TIPS databases show that binary texton could get sound results with fast feature extraction, especially when the image size is not big and the quality of image is not poor. PMID:24520346

  9. Detecting subtle hydrochemical anomalies with multivariate statistics: an example from homogeneous groundwaters in the Great Artesian Basin, Australia

    NASA Astrophysics Data System (ADS)

    O'Shea, Bethany; Jankowski, Jerzy

    2006-12-01

    The major ion composition of Great Artesian Basin groundwater in the lower Namoi River valley is relatively homogeneous in chemical composition. Traditional graphical techniques have been combined with multivariate statistical methods to determine whether subtle differences in the chemical composition of these waters can be delineated. Hierarchical cluster analysis and principal components analysis were successful in delineating minor variations within the groundwaters of the study area that were not visually identified in the graphical techniques applied. Hydrochemical interpretation allowed geochemical processes to be identified in each statistically defined water type and illustrated how these groundwaters differ from one another. Three main geochemical processes were identified in the groundwaters: ion exchange, precipitation, and mixing between waters from different sources. Both statistical methods delineated an anomalous sample suspected of being influenced by magmatic CO2 input. The use of statistical methods to complement traditional graphical techniques for waters appearing homogeneous is emphasized for all investigations of this type. Copyright

  10. An Analysis of Methods Used to Examine Gender Differences in Computer-Related Behavior.

    ERIC Educational Resources Information Center

    Kay, Robin

    1992-01-01

    Review of research investigating gender differences in computer-related behavior examines statistical and methodological flaws. Issues addressed include sample selection, sample size, scale development, scale quality, the use of univariate and multivariate analyses, regressional analysis, construct definition, construct testing, and the…

  11. A model and variance reduction method for computing statistical outputs of stochastic elliptic partial differential equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vidal-Codina, F., E-mail: fvidal@mit.edu; Nguyen, N.C., E-mail: cuongng@mit.edu; Giles, M.B., E-mail: mike.giles@maths.ox.ac.uk

    We present a model and variance reduction method for the fast and reliable computation of statistical outputs of stochastic elliptic partial differential equations. Our method consists of three main ingredients: (1) the hybridizable discontinuous Galerkin (HDG) discretization of elliptic partial differential equations (PDEs), which allows us to obtain high-order accurate solutions of the governing PDE; (2) the reduced basis method for a new HDG discretization of the underlying PDE to enable real-time solution of the parameterized PDE in the presence of stochastic parameters; and (3) a multilevel variance reduction method that exploits the statistical correlation among the different reduced basismore » approximations and the high-fidelity HDG discretization to accelerate the convergence of the Monte Carlo simulations. The multilevel variance reduction method provides efficient computation of the statistical outputs by shifting most of the computational burden from the high-fidelity HDG approximation to the reduced basis approximations. Furthermore, we develop a posteriori error estimates for our approximations of the statistical outputs. Based on these error estimates, we propose an algorithm for optimally choosing both the dimensions of the reduced basis approximations and the sizes of Monte Carlo samples to achieve a given error tolerance. We provide numerical examples to demonstrate the performance of the proposed method.« less

  12. Inference on network statistics by restricting to the network space: applications to sexual history data.

    PubMed

    Goyal, Ravi; De Gruttola, Victor

    2018-01-30

    Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among networks statistics; and knowledge of these relationships can aid in addressing concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  13. Identification of discriminant proteins through antibody profiling, methods and apparatus for identifying an individual

    DOEpatents

    Apel, William A.; Thompson, Vicki S; Lacey, Jeffrey A.; Gentillon, Cynthia A.

    2016-08-09

    A method for determining a plurality of proteins for discriminating and positively identifying an individual based from a biological sample. The method may include profiling a biological sample from a plurality of individuals against a protein array including a plurality of proteins. The protein array may include proteins attached to a support in a preselected pattern such that locations of the proteins are known. The biological sample may be contacted with the protein array such that a portion of antibodies in the biological sample reacts with and binds to the proteins forming immune complexes. A statistical analysis method, such as discriminant analysis, may be performed to determine discriminating proteins for distinguishing individuals. Proteins of interest may be used to form a protein array. Such a protein array may be used, for example, to compare a forensic sample from an unknown source with a sample from a known source.

  14. Identification of discriminant proteins through antibody profiling, methods and apparatus for identifying an individual

    DOEpatents

    Thompson, Vicki S; Lacey, Jeffrey A; Gentillon, Cynthia A; Apel, William A

    2015-03-03

    A method for determining a plurality of proteins for discriminating and positively identifying an individual based from a biological sample. The method may include profiling a biological sample from a plurality of individuals against a protein array including a plurality of proteins. The protein array may include proteins attached to a support in a preselected pattern such that locations of the proteins are known. The biological sample may be contacted with the protein array such that a portion of antibodies in the biological sample reacts with and binds to the proteins forming immune complexes. A statistical analysis method, such as discriminant analysis, may be performed to determine discriminating proteins for distinguishing individuals. Proteins of interest may be used to form a protein array. Such a protein array may be used, for example, to compare a forensic sample from an unknown source with a sample from a known source.

  15. Sampling Errors in Monthly Rainfall Totals for TRMM and SSM/I, Based on Statistics of Retrieved Rain Rates and Simple Models

    NASA Technical Reports Server (NTRS)

    Bell, Thomas L.; Kundu, Prasun K.; Einaudi, Franco (Technical Monitor)

    2000-01-01

    Estimates from TRMM satellite data of monthly total rainfall over an area are subject to substantial sampling errors due to the limited number of visits to the area by the satellite during the month. Quantitative comparisons of TRMM averages with data collected by other satellites and by ground-based systems require some estimate of the size of this sampling error. A method of estimating this sampling error based on the actual statistics of the TRMM observations and on some modeling work has been developed. "Sampling error" in TRMM monthly averages is defined here relative to the monthly total a hypothetical satellite permanently stationed above the area would have reported. "Sampling error" therefore includes contributions from the random and systematic errors introduced by the satellite remote sensing system. As part of our long-term goal of providing error estimates for each grid point accessible to the TRMM instruments, sampling error estimates for TRMM based on rain retrievals from TRMM microwave (TMI) data are compared for different times of the year and different oceanic areas (to minimize changes in the statistics due to algorithmic differences over land and ocean). Changes in sampling error estimates due to changes in rain statistics due 1) to evolution of the official algorithms used to process the data, and 2) differences from other remote sensing systems such as the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave/Imager (SSM/I), are analyzed.

  16. Protein Multiplexed Immunoassay Analysis with R.

    PubMed

    Breen, Edmond J

    2017-01-01

    Plasma samples from 177 control and type 2 diabetes patients collected at three Australian hospitals are screened for 14 analytes using six custom-made multiplex kits across 60 96-well plates. In total 354 samples were collected from the patients, representing one baseline and one end point sample from each patient. R methods and source code for analyzing the analyte fluorescence response obtained from these samples by Luminex Bio-Plex ® xMap multiplexed immunoassay technology are disclosed. Techniques and R procedures for reading Bio-Plex ® result files for statistical analysis and data visualization are also presented. The need for technical replicates and the number of technical replicates are addressed as well as plate layout design strategies. Multinomial regression is used to determine plate to sample covariate balance. Methods for matching clinical covariate information to Bio-Plex ® results and vice versa are given. As well as methods for measuring and inspecting the quality of the fluorescence responses are presented. Both fixed and mixed-effect approaches for immunoassay statistical differential analysis are presented and discussed. A random effect approach to outlier analysis and detection is also shown. The bioinformatics R methodology present here provides a foundation for rigorous and reproducible analysis of the fluorescence response obtained from multiplexed immunoassays.

  17. Assigning African elephant DNA to geographic region of origin: Applications to the ivory trade

    PubMed Central

    Wasser, Samuel K.; Shedlock, Andrew M.; Comstock, Kenine; Ostrander, Elaine A.; Mutayoba, Benezeth; Stephens, Matthew

    2004-01-01

    Resurgence of illicit trade in African elephant ivory is placing the elephant at renewed risk. Regulation of this trade could be vastly improved by the ability to verify the geographic origin of tusks. We address this need by developing a combined genetic and statistical method to determine the origin of poached ivory. Our statistical approach exploits a smoothing method to estimate geographic-specific allele frequencies over the entire African elephants' range for 16 microsatellite loci, using 315 tissue and 84 scat samples from forest (Loxodonta africana cyclotis) and savannah (Loxodonta africana africana) elephants at 28 locations. These geographic-specific allele frequency estimates are used to infer the geographic origin of DNA samples, such as could be obtained from tusks of unknown origin. We demonstrate that our method alleviates several problems associated with standard assignment methods in this context, and the absolute accuracy of our method is high. Continent-wide, 50% of samples were located within 500 km, and 80% within 932 km of their actual place of origin. Accuracy varied by region (median accuracies: West Africa, 135 km; Central Savannah, 286 km; Central Forest, 411 km; South, 535 km; and East, 697 km). In some cases, allele frequencies vary considerably over small geographic regions, making much finer discriminations possible and suggesting that resolution could be further improved by collection of samples from locations not represented in our study. PMID:15459317

  18. An enzyme-linked immunosorbent assay for the determination of dioxins in contaminated sediment and soil samples

    PubMed Central

    Van Emon, Jeanette M.; Chuang, Jane C.; Lordo, Robert A.; Schrock, Mary E.; Nichkova, Mikaela; Gee, Shirley J.; Hammock, Bruce D.

    2010-01-01

    A 96-microwell enzyme-linked immunosorbent assay (ELISA) method was evaluated to determine PCDDs/PCDFs in sediment and soil samples from an EPA Superfund site. Samples were prepared and analyzed by both the ELISA and a gas chromatography/high resolution mass spectrometry (GC/HRMS) method. Comparable method precision, accuracy, and detection level (8 ng kg−1) were achieved by the ELISA method with respect to GC/HRMS. However, the extraction and cleanup method developed for the ELISA requires refinement for the soil type that yielded a waxy residue after sample processing. Four types of statistical analyses (Pearson correlation coefficient, paired t-test, nonparametric tests, and McNemar’s test of association) were performed to determine whether the two methods produced statistically different results. The log-transformed ELISA-derived 2,3,7,8-tetrachlorodibenzo-p-dioxin values and logtransformed GC/HRMS-derived TEQ values were significantly correlated (r = 0.79) at the 0.05 level. The median difference in values between ELISA and GC/HRMS was not significant at the 0.05 level. Low false negative and false positive rates (<10%) were observed for the ELISA when compared to the GC/HRMS at 1000 ng TEQ kg−1. The findings suggest that immunochemical technology could be a complementary monitoring tool for determining concentrations at the 1000 ng TEQ kg−1 action level for contaminated sediment and soil. The ELISA could also be used in an analytical triage approach to screen and rank samples prior to instrumental analysis. PMID:18313102

  19. Statistical power in parallel group point exposure studies with time-to-event outcomes: an empirical comparison of the performance of randomized controlled trials and the inverse probability of treatment weighting (IPTW) approach.

    PubMed

    Austin, Peter C; Schuster, Tibor; Platt, Robert W

    2015-10-15

    Estimating statistical power is an important component of the design of both randomized controlled trials (RCTs) and observational studies. Methods for estimating statistical power in RCTs have been well described and can be implemented simply. In observational studies, statistical methods must be used to remove the effects of confounding that can occur due to non-random treatment assignment. Inverse probability of treatment weighting (IPTW) using the propensity score is an attractive method for estimating the effects of treatment using observational data. However, sample size and power calculations have not been adequately described for these methods. We used an extensive series of Monte Carlo simulations to compare the statistical power of an IPTW analysis of an observational study with time-to-event outcomes with that of an analysis of a similarly-structured RCT. We examined the impact of four factors on the statistical power function: number of observed events, prevalence of treatment, the marginal hazard ratio, and the strength of the treatment-selection process. We found that, on average, an IPTW analysis had lower statistical power compared to an analysis of a similarly-structured RCT. The difference in statistical power increased as the magnitude of the treatment-selection model increased. The statistical power of an IPTW analysis tended to be lower than the statistical power of a similarly-structured RCT.

  20. KECSA-Movable Type Implicit Solvation Model (KMTISM)

    PubMed Central

    2015-01-01

    Computation of the solvation free energy for chemical and biological processes has long been of significant interest. The key challenges to effective solvation modeling center on the choice of potential function and configurational sampling. Herein, an energy sampling approach termed the “Movable Type” (MT) method, and a statistical energy function for solvation modeling, “Knowledge-based and Empirical Combined Scoring Algorithm” (KECSA) are developed and utilized to create an implicit solvation model: KECSA-Movable Type Implicit Solvation Model (KMTISM) suitable for the study of chemical and biological systems. KMTISM is an implicit solvation model, but the MT method performs energy sampling at the atom pairwise level. For a specific molecular system, the MT method collects energies from prebuilt databases for the requisite atom pairs at all relevant distance ranges, which by its very construction encodes all possible molecular configurations simultaneously. Unlike traditional statistical energy functions, KECSA converts structural statistical information into categorized atom pairwise interaction energies as a function of the radial distance instead of a mean force energy function. Within the implicit solvent model approximation, aqueous solvation free energies are then obtained from the NVT ensemble partition function generated by the MT method. Validation is performed against several subsets selected from the Minnesota Solvation Database v2012. Results are compared with several solvation free energy calculation methods, including a one-to-one comparison against two commonly used classical implicit solvation models: MM-GBSA and MM-PBSA. Comparison against a quantum mechanics based polarizable continuum model is also discussed (Cramer and Truhlar’s Solvation Model 12). PMID:25691832

  1. Statistical techniques for sampling and monitoring natural resources

    Treesearch

    Hans T. Schreuder; Richard Ernst; Hugo Ramirez-Maldonado

    2004-01-01

    We present the statistical theory of inventory and monitoring from a probabilistic point of view. We start with the basics and show the interrelationships between designs and estimators illustrating the methods with a small artificial population as well as with a mapped realistic population. For such applications, useful open source software is given in Appendix 4....

  2. The Adequacy of Different Robust Statistical Tests in Comparing Two Independent Groups

    ERIC Educational Resources Information Center

    Pero-Cebollero, Maribel; Guardia-Olmos, Joan

    2013-01-01

    In the current study, we evaluated various robust statistical methods for comparing two independent groups. Two scenarios for simulation were generated: one of equality and another of population mean differences. In each of the scenarios, 33 experimental conditions were used as a function of sample size, standard deviation and asymmetry. For each…

  3. Reporting quality of statistical methods in surgical observational studies: protocol for systematic review.

    PubMed

    Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume

    2014-06-28

    Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting.This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be of lower quality in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.

  4. [Optimized application of nested PCR method for detection of malaria].

    PubMed

    Yao-Guang, Z; Li, J; Zhen-Yu, W; Li, C

    2017-04-28

    Objective To optimize the application of the nested PCR method for the detection of malaria according to the working practice, so as to improve the efficiency of malaria detection. Methods Premixing solution of PCR, internal primers for further amplification and new designed primers that aimed at two Plasmodium ovale subspecies were employed to optimize the reaction system, reaction condition and specific primers of P . ovale on basis of routine nested PCR. Then the specificity and the sensitivity of the optimized method were analyzed. The positive blood samples and examination samples of malaria were detected by the routine nested PCR and the optimized method simultaneously, and the detection results were compared and analyzed. Results The optimized method showed good specificity, and its sensitivity could reach the pg to fg level. The two methods were used to detect the same positive malarial blood samples simultaneously, the results indicated that the PCR products of the two methods had no significant difference, but the non-specific amplification reduced obviously and the detection rates of P . ovale subspecies improved, as well as the total specificity also increased through the use of the optimized method. The actual detection results of 111 cases of malarial blood samples showed that the sensitivity and specificity of the routine nested PCR were 94.57% and 86.96%, respectively, and those of the optimized method were both 93.48%, and there was no statistically significant difference between the two methods in the sensitivity ( P > 0.05), but there was a statistically significant difference between the two methods in the specificity ( P < 0.05). Conclusion The optimized PCR can improve the specificity without reducing the sensitivity on the basis of the routine nested PCR, it also can save the cost and increase the efficiency of malaria detection as less experiment links.

  5. Combined Uncertainty and A-Posteriori Error Bound Estimates for General CFD Calculations: Theory and Software Implementation

    NASA Technical Reports Server (NTRS)

    Barth, Timothy J.

    2014-01-01

    This workshop presentation discusses the design and implementation of numerical methods for the quantification of statistical uncertainty, including a-posteriori error bounds, for output quantities computed using CFD methods. Hydrodynamic realizations often contain numerical error arising from finite-dimensional approximation (e.g. numerical methods using grids, basis functions, particles) and statistical uncertainty arising from incomplete information and/or statistical characterization of model parameters and random fields. The first task at hand is to derive formal error bounds for statistics given realizations containing finite-dimensional numerical error [1]. The error in computed output statistics contains contributions from both realization error and the error resulting from the calculation of statistics integrals using a numerical method. A second task is to devise computable a-posteriori error bounds by numerically approximating all terms arising in the error bound estimates. For the same reason that CFD calculations including error bounds but omitting uncertainty modeling are only of limited value, CFD calculations including uncertainty modeling but omitting error bounds are only of limited value. To gain maximum value from CFD calculations, a general software package for uncertainty quantification with quantified error bounds has been developed at NASA. The package provides implementations for a suite of numerical methods used in uncertainty quantification: Dense tensorization basis methods [3] and a subscale recovery variant [1] for non-smooth data, Sparse tensorization methods[2] utilizing node-nested hierarchies, Sampling methods[4] for high-dimensional random variable spaces.

  6. SparRec: An effective matrix completion framework of missing data imputation for GWAS

    NASA Astrophysics Data System (ADS)

    Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen

    2016-10-01

    Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistics methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, as other matrix completion methods, is flexible to be applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistics methods including Beagle and fastPhase.

  7. A statistical assessment of pesticide pollution in surface waters using environmental monitoring data: Chlorpyrifos in Central Valley, California.

    PubMed

    Wang, Dan; Singhasemanon, Nan; Goh, Kean S

    2016-11-15

    Pesticides are routinely monitored in surface waters and resultant data are analyzed to assess whether their uses will damage aquatic eco-systems. However, the utility of the monitoring data is limited because of the insufficiency in the temporal and spatial sampling coverage and the inability to detect and quantify trace concentrations. This study developed a novel assessment procedure that addresses those limitations by combining 1) statistical methods capable of extracting information from concentrations below changing detection limits, 2) statistical resampling techniques that account for uncertainties rooted in the non-detects and insufficient/irregular sampling coverage, and 3) multiple lines of evidence that improve confidence in the final conclusion. This procedure was demonstrated by an assessment on chlorpyrifos monitoring data in surface waters of California's Central Valley (2005-2013). We detected a significant downward trend in the concentrations, which cannot be observed by commonly-used statistical approaches. We assessed that the aquatic risk was low using a probabilistic method that works with non-detects and has the ability to differentiate indicator groups with varying sensitivity. In addition, we showed that the frequency of exceedance over ambient aquatic life water quality criteria was affected by pesticide use, precipitation and irrigation demand in certain periods anteceding the water sampling events. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Method matters: Experimental evidence for shorter avian sperm in faecal compared to abdominal massage samples.

    PubMed

    Girndt, Antje; Cockburn, Glenn; Sánchez-Tójar, Alfredo; Løvlie, Hanne; Schroeder, Julia

    2017-01-01

    Birds are model organisms in sperm biology. Previous work in zebra finches, suggested that sperm sampled from males' faeces and ejaculates do not differ in size. Here, we tested this assumption in a captive population of house sparrows, Passer domesticus. We compared sperm length in samples from three collection techniques: female dummy, faecal and abdominal massage samples. We found that sperm were significantly shorter in faecal than abdominal massage samples, which was explained by shorter heads and midpieces, but not flagella. This result might indicate that faecal sampled sperm could be less mature than sperm collected by abdominal massage. The female dummy method resulted in an insufficient number of experimental ejaculates because most males ignored it. In light of these results, we recommend using abdominal massage as a preferred method for avian sperm sampling. Where avian sperm cannot be collected by abdominal massage alone, we advise controlling for sperm sampling protocol statistically.

  9. Entrepreneurial Intentions of Agricultural Students: Levels and Determinants

    ERIC Educational Resources Information Center

    Pouratashi, Mahtab

    2015-01-01

    Purpose: This paper examined levels and determinants of entrepreneurial intentions amongst agricultural students. Methodology: The statistical population comprised students in colleges of agriculture at University of Tehran. By use of a random sampling method, a sample of 120 students participated in the study. The instrument for data collection…

  10. "Adultspan" Publication Patterns: Author and Article Characteristics from 1999 to 2009

    ERIC Educational Resources Information Center

    Erford, Bradley T.; Clark, Kelly H.; Erford, Breann M.

    2011-01-01

    Publication patterns of articles in "Adultspan" from 1999 to 2009 were reviewed. Author characteristics and article content were analyzed to determine trends over time. Research articles were analyzed specifically for type of research design, classification, sampling method, types of participants, sample size, types of statistics used, and…

  11. Monitoring changes in exotic vegetation

    Treesearch

    Robert D. Sutter

    1998-01-01

    Ecological monitoring provides critical information for management decisions by measuring changes in managed and unmanaged populations, communities and ecological systems. It integrates ecology, goal and objective setting, sampling design, sampling methods, and statistical analysis. It is a topic that I, with a team of Nature Conservancy ecologists, teach in a six day...

  12. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis.

    PubMed

    Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E

    2013-11-15

    Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (Available via CRAN) provides functionality and data to perform the methods in this article. reesese@vcu.edu

  13. Comparison of the performance of the CMS Hierarchical Condition Category (CMS-HCC) risk adjuster with the Charlson and Elixhauser comorbidity measures in predicting mortality.

    PubMed

    Li, Pengxiang; Kim, Michelle M; Doshi, Jalpa A

    2010-08-20

    The Centers for Medicare and Medicaid Services (CMS) has implemented the CMS-Hierarchical Condition Category (CMS-HCC) model to risk adjust Medicare capitation payments. This study intends to assess the performance of the CMS-HCC risk adjustment method and to compare it to the Charlson and Elixhauser comorbidity measures in predicting in-hospital and six-month mortality in Medicare beneficiaries. The study used the 2005-2006 Chronic Condition Data Warehouse (CCW) 5% Medicare files. The primary study sample included all community-dwelling fee-for-service Medicare beneficiaries with a hospital admission between January 1st, 2006 and June 30th, 2006. Additionally, four disease-specific samples consisting of subgroups of patients with principal diagnoses of congestive heart failure (CHF), stroke, diabetes mellitus (DM), and acute myocardial infarction (AMI) were also selected. Four analytic files were generated for each sample by extracting inpatient and/or outpatient claims for each patient. Logistic regressions were used to compare the methods. Model performance was assessed using the c-statistic, the Akaike's information criterion (AIC), the Bayesian information criterion (BIC) and their 95% confidence intervals estimated using bootstrapping. The CMS-HCC had statistically significant higher c-statistic and lower AIC and BIC values than the Charlson and Elixhauser methods in predicting in-hospital and six-month mortality across all samples in analytic files that included claims from the index hospitalization. Exclusion of claims for the index hospitalization generally led to drops in model performance across all methods with the highest drops for the CMS-HCC method. However, the CMS-HCC still performed as well or better than the other two methods. The CMS-HCC method demonstrated better performance relative to the Charlson and Elixhauser methods in predicting in-hospital and six-month mortality. The CMS-HCC model is preferred over the Charlson and Elixhauser methods if information about the patient's diagnoses prior to the index hospitalization is available and used to code the risk adjusters. However, caution should be exercised in studies evaluating inpatient processes of care and where data on pre-index admission diagnoses are unavailable.

  14. Fragment size distribution statistics in dynamic fragmentation of laser shock-loaded tin

    NASA Astrophysics Data System (ADS)

    He, Weihua; Xin, Jianting; Zhao, Yongqiang; Chu, Genbai; Xi, Tao; Shui, Min; Lu, Feng; Gu, Yuqiu

    2017-06-01

    This work investigates the geometric statistics method to characterize the size distribution of tin fragments produced in the laser shock-loaded dynamic fragmentation process. In the shock experiments, the ejection of the tin sample with etched V-shape groove in the free surface are collected by the soft recovery technique. Subsequently, the produced fragments are automatically detected with the fine post-shot analysis techniques including the X-ray micro-tomography and the improved watershed method. To characterize the size distributions of the fragments, a theoretical random geometric statistics model based on Poisson mixtures is derived for dynamic heterogeneous fragmentation problem, which reveals linear combinational exponential distribution. The experimental data related to fragment size distributions of the laser shock-loaded tin sample are examined with the proposed theoretical model, and its fitting performance is compared with that of other state-of-the-art fragment size distribution models. The comparison results prove that our proposed model can provide far more reasonable fitting result for the laser shock-loaded tin.

  15. Data-adaptive test statistics for microarray data.

    PubMed

    Mukherjee, Sach; Roberts, Stephen J; van der Laan, Mark J

    2005-09-01

    An important task in microarray data analysis is the selection of genes that are differentially expressed between different tissue samples, such as healthy and diseased. However, microarray data contain an enormous number of dimensions (genes) and very few samples (arrays), a mismatch which poses fundamental statistical problems for the selection process that have defied easy resolution. In this paper, we present a novel approach to the selection of differentially expressed genes in which test statistics are learned from data using a simple notion of reproducibility in selection results as the learning criterion. Reproducibility, as we define it, can be computed without any knowledge of the 'ground-truth', but takes advantage of certain properties of microarray data to provide an asymptotically valid guide to expected loss under the true data-generating distribution. We are therefore able to indirectly minimize expected loss, and obtain results substantially more robust than conventional methods. We apply our method to simulated and oligonucleotide array data. By request to the corresponding author.

  16. Cover/Frequency (CF)

    Treesearch

    John F. Caratti

    2006-01-01

    The FIREMON Cover/Frequency (CF) method is used to assess changes in plant species cover and frequency for a macroplot. This method uses multiple quadrats to sample within-plot variation and quantify statistically valid changes in plant species cover, height, and frequency over time. Because it is difficult to estimate cover in quadrats for larger plants, this method...

  17. [Statistical analysis of German radiologic periodicals: developmental trends in the last 10 years].

    PubMed

    Golder, W

    1999-09-01

    To identify which statistical tests are applied in German radiological publications, to what extent their use has changed during the last decade, and which factors might be responsible for this development. The major articles published in "ROFO" and "DER RADIOLOGE" during 1988, 1993 and 1998 were reviewed for statistical content. The contributions were classified by principal focus and radiological subspecialty. The methods used were assigned to descriptive, basal and advanced statistics. Sample size, significance level and power were established. The use of experts' assistance was monitored. Finally, we calculated the so-called cumulative accessibility of the publications. 525 contributions were found to be eligible. In 1988, 87% used descriptive statistics only, 12.5% basal, and 0.5% advanced statistics. The corresponding figures in 1993 and 1998 are 62 and 49%, 32 and 41%, and 6 and 10%, respectively. Statistical techniques were most likely to be used in research on musculoskeletal imaging and articles dedicated to MRI. Six basic categories of statistical methods account for the complete statistical analysis appearing in 90% of the articles. ROC analysis is the single most common advanced technique. Authors make increasingly use of statistical experts' opinion and programs. During the last decade, the use of statistical methods in German radiological journals has fundamentally improved, both quantitatively and qualitatively. Presently, advanced techniques account for 20% of the pertinent statistical tests. This development seems to be promoted by the increasing availability of statistical analysis software.

  18. The Bootstrap, the Jackknife, and the Randomization Test: A Sampling Taxonomy.

    PubMed

    Rodgers, J L

    1999-10-01

    A simple sampling taxonomy is defined that shows the differences between and relationships among the bootstrap, the jackknife, and the randomization test. Each method has as its goal the creation of an empirical sampling distribution that can be used to test statistical hypotheses, estimate standard errors, and/or create confidence intervals. Distinctions between the methods can be made based on the sampling approach (with replacement versus without replacement) and the sample size (replacing the whole original sample versus replacing a subset of the original sample). The taxonomy is useful for teaching the goals and purposes of resampling schemes. An extension of the taxonomy implies other possible resampling approaches that have not previously been considered. Univariate and multivariate examples are presented.

  19. Point process statistics in atom probe tomography.

    PubMed

    Philippe, T; Duguay, S; Grancher, G; Blavette, D

    2013-09-01

    We present a review of spatial point processes as statistical models that we have designed for the analysis and treatment of atom probe tomography (APT) data. As a major advantage, these methods do not require sampling. The mean distance to nearest neighbour is an attractive approach to exhibit a non-random atomic distribution. A χ(2) test based on distance distributions to nearest neighbour has been developed to detect deviation from randomness. Best-fit methods based on first nearest neighbour distance (1 NN method) and pair correlation function are presented and compared to assess the chemical composition of tiny clusters. Delaunay tessellation for cluster selection has been also illustrated. These statistical tools have been applied to APT experiments on microelectronics materials. Copyright © 2012 Elsevier B.V. All rights reserved.

  20. Reliability approach to rotating-component design. [fatigue life and stress concentration

    NASA Technical Reports Server (NTRS)

    Kececioglu, D. B.; Lalli, V. R.

    1975-01-01

    A probabilistic methodology for designing rotating mechanical components using reliability to relate stress to strength is explained. The experimental test machines and data obtained for steel to verify this methodology are described. A sample mechanical rotating component design problem is solved by comparing a deterministic design method with the new design-by reliability approach. The new method shows that a smaller size and weight can be obtained for specified rotating shaft life and reliability, and uses the statistical distortion-energy theory with statistical fatigue diagrams for optimum shaft design. Statistical methods are presented for (1) determining strength distributions for steel experimentally, (2) determining a failure theory for stress variations in a rotating shaft subjected to reversed bending and steady torque, and (3) relating strength to stress by reliability.

  1. Methodological Reporting of Randomized Trials in Five Leading Chinese Nursing Journals

    PubMed Central

    Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu

    2014-01-01

    Background Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. Methods In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. Results In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34±0.97 (Mean ± SD). No RCT reported descriptions and changes in “trial design,” changes in “outcomes” and “implementation,” or descriptions of the similarity of interventions for “blinding.” Poor reporting was found in detailing the “settings of participants” (13.1%), “type of randomization sequence generation” (1.8%), calculation methods of “sample size” (0.4%), explanation of any interim analyses and stopping guidelines for “sample size” (0.3%), “allocation concealment mechanism” (0.3%), additional analyses in “statistical methods” (2.1%), and targeted subjects and methods of “blinding” (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of “participants,” “interventions,” and definitions of the “outcomes” and “statistical methods.” The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. Conclusions The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods. PMID:25415382

  2. Beyond the floor effect on the WISC-IV in individuals with Down syndrome: are there cognitive strengths and weaknesses?

    PubMed

    Pezzuti, L; Nacinovich, R; Oggiano, S; Bomba, M; Ferri, R; La Stella, A; Rossetti, S; Orsini, A

    2018-07-01

    Individuals with Down syndrome generally show a floor effect on Wechsler Scales that is manifested by flat profiles and with many or all of the weighted scores on the subtests equal to 1. The main aim of the present paper is to use the statistical Hessl method and the extended statistical method of Orsini, Pezzuti and Hulbert with a sample of individuals with Down syndrome (n = 128; 72 boys and 56 girls), to underline the variability of performance on Wechsler Intelligence Scale for Children-Fourth Edition subtests and indices, highlighting any strengths and weaknesses of this population that otherwise appear to be flattened. Based on results using traditional transformation of raw scores into weighted scores, a very high percentage of subtests with weighted score of 1 occurred in the Down syndrome sample, with a floor effect and without any statistically significant difference between four core Wechsler Intelligence Scale for Children-Fourth Edition indices. The results, using traditional transformation, confirm a deep cognitive impairment of those with Down syndrome. Conversely, using the new statistical method, it is immediately apparent that the variability of the scores, both on subtests and indices, is wider with respect to the traditional method. Children with Down syndrome show a greater ability in the Verbal Comprehension Index than in the Working Memory Index. © 2018 MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.

  3. Statistical Learning Theory for High Dimensional Prediction: Application to Criterion-Keyed Scale Development

    PubMed Central

    Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul

    2016-01-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257

  4. The Content of Statistical Requirements for Authors in Biomedical Research Journals.

    PubMed

    Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang

    2016-10-20

    Robust statistical designing, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above so that researchers are able to give statistical issues' serious considerations not only at the stage of data analysis but also at the stage of methodological design. Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guideline as well as particular requirements were summarized in frequency, which were grouped into design, method of analysis, and presentation, respectively. Finally, updated statistical guidelines and particular requirements for improvement were summed up. Totally, 21 of 23 press groups introduced at least one statistical guideline. More than half of press groups can update their statistical instruction for authors gradually relative to issues of new statistical reporting guidelines. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. The most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including "address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation," and "statistical methods and the reasons." Statistical requirements for authors are becoming increasingly perfected. Statistical requirements for authors remind researchers that they should make sufficient consideration not only in regards to statistical methods during the research design, but also standardized statistical reporting, which would be beneficial in providing stronger evidence and making a greater critical appraisal of evidence more accessible.

  5. Performance testing of NIOSH Method 5524/ASTM Method D-7049-04, for determination of metalworking fluids.

    PubMed

    Glaser, Robert; Kurimo, Robert; Shulman, Stanley

    2007-08-01

    A performance test of NIOSH Method 5524/ASTM Method D-7049-04 for analysis of metalworking fluids (MWF) was conducted. These methods involve determination of the total and extractable weights of MWF samples; extractions are performed using a ternary blend of toluene:dichloromethane:methanol and a binary blend of methanol:water. Six laboratories participated in this study. A preliminary analysis of 20 blank samples was made to familiarize the laboratories with the procedure(s) and to estimate the methods' limits of detection/quantitation (LODs/LOQs). Synthetically generated samples of a semisynthetic MWF aerosol were then collected on tared polytetrafluoroethylene (PTFE) filters and analyzed according to the methods by all participants. Sample masses deposited (approximately 400-500 micro g) corresponded to amounts expected in an 8-hr shift at the NIOSH recommended exposure levels (REL) of 0.4 mg/m(3) (thoracic) and 0.5 mg/m(3) (total particulate). The generator output was monitored with a calibrated laser particle counter. One laboratory significantly underreported the sampled masses relative to the other five labs. A follow-up study compared only gravimetric results of this laboratory with those of two other labs. In the preliminary analysis of blanks; the average LOQs were 0.094 mg for the total weight analysis and 0.136 mg for the extracted weight analyses. For the six-lab study, the average LOQs were 0.064 mg for the total weight analyses and 0.067 mg for the extracted weight analyses. Using ASTM conventions, h and k statistics were computed to determine the degree of consistency of each laboratory with the others. One laboratory experienced problems with precision but not bias. The precision estimates for the remaining five labs were not different statistically (alpha = 0.005) for either the total or extractable weights. For all six labs, the average fraction extracted was > or =0.94 (CV = 0.025). Pooled estimates of the total coefficients of variation of analysis were 0.13 for the total weight samples and 0.13 for the extracted weight samples. An overall method bias of -5% was determined by comparing the overall mean concentration reported by the participants to that determined by the particle counter. In the three-lab follow-up study, the nonconsistent lab reported results that were unbiased but statistically less precise than the others; the average LOQ was 0.133 mg for the total weight analyses. It is concluded that aerosolized MWF sampled at concentrations corresponding to either of the NIOSH RELs can generally be shipped unrefrigerated, stored refrigerated up to 7 days, and then analyzed quantitatively and precisely for MWF using the NIOSH/ASTM procedures.

  6. Neutron/Gamma-ray discrimination through measures of fit

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Amiri, Moslem; Prenosil, Vaclav; Cvachovec, Frantisek

    2015-07-01

    Statistical tests and their underlying measures of fit can be utilized to separate neutron/gamma-ray pulses in a mixed radiation field. In this article, first the application of a sample statistical test is explained. Fit measurement-based methods require true pulse shapes to be used as reference for discrimination. This requirement makes practical implementation of these methods difficult; typically another discrimination approach should be employed to capture samples of neutrons and gamma-rays before running the fit-based technique. In this article, we also propose a technique to eliminate this requirement. These approaches are applied to several sets of mixed neutron and gamma-ray pulsesmore » obtained through different digitizers using stilbene scintillator in order to analyze them and measure their discrimination quality. (authors)« less

  7. Solving large-scale PDE-constrained Bayesian inverse problems with Riemann manifold Hamiltonian Monte Carlo

    NASA Astrophysics Data System (ADS)

    Bui-Thanh, T.; Girolami, M.

    2014-11-01

    We consider the Riemann manifold Hamiltonian Monte Carlo (RMHMC) method for solving statistical inverse problems governed by partial differential equations (PDEs). The Bayesian framework is employed to cast the inverse problem into the task of statistical inference whose solution is the posterior distribution in infinite dimensional parameter space conditional upon observation data and Gaussian prior measure. We discretize both the likelihood and the prior using the H1-conforming finite element method together with a matrix transfer technique. The power of the RMHMC method is that it exploits the geometric structure induced by the PDE constraints of the underlying inverse problem. Consequently, each RMHMC posterior sample is almost uncorrelated/independent from the others providing statistically efficient Markov chain simulation. However this statistical efficiency comes at a computational cost. This motivates us to consider computationally more efficient strategies for RMHMC. At the heart of our construction is the fact that for Gaussian error structures the Fisher information matrix coincides with the Gauss-Newton Hessian. We exploit this fact in considering a computationally simplified RMHMC method combining state-of-the-art adjoint techniques and the superiority of the RMHMC method. Specifically, we first form the Gauss-Newton Hessian at the maximum a posteriori point and then use it as a fixed constant metric tensor throughout RMHMC simulation. This eliminates the need for the computationally costly differential geometric Christoffel symbols, which in turn greatly reduces computational effort at a corresponding loss of sampling efficiency. We further reduce the cost of forming the Fisher information matrix by using a low rank approximation via a randomized singular value decomposition technique. This is efficient since a small number of Hessian-vector products are required. The Hessian-vector product in turn requires only two extra PDE solves using the adjoint technique. Various numerical results up to 1025 parameters are presented to demonstrate the ability of the RMHMC method in exploring the geometric structure of the problem to propose (almost) uncorrelated/independent samples that are far away from each other, and yet the acceptance rate is almost unity. The results also suggest that for the PDE models considered the proposed fixed metric RMHMC can attain almost as high a quality performance as the original RMHMC, i.e. generating (almost) uncorrelated/independent samples, while being two orders of magnitude less computationally expensive.

  8. Measuring, Understanding, and Responding to Covert Social Networks: Passive and Active Tomography

    DTIC Science & Technology

    2017-11-29

    Methods for generating a random sample of networks with desired properties are important tools for the analysis of social , biological, and information...on Theoretical Foundations for Statistical Network Analysis at the Isaac Newton Institute for Mathematical Sciences at Cambridge U. (organized by...Approach SOCIAL SCIENCES STATISTICS EECS Problems span three disciplines Scientific focus is needed at the interfaces

  9. Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies

    PubMed Central

    Huang, Jim C.; Meek, Christopher; Kadie, Carl; Heckerman, David

    2011-01-01

    Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consist of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs to phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. Our method scales in runtime quadratic in the number of individuals being studied with only a modest loss in statistical power as compared to LMM-based and PCA-based methods when testing on synthetic data that was generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime relative to LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science. PMID:21765897

  10. An introduction to Bayesian statistics in health psychology.

    PubMed

    Depaoli, Sarah; Rus, Holly M; Clifton, James P; van de Schoot, Rens; Tiemensma, Jitske

    2017-09-01

    The aim of the current article is to provide a brief introduction to Bayesian statistics within the field of health psychology. Bayesian methods are increasing in prevalence in applied fields, and they have been shown in simulation research to improve the estimation accuracy of structural equation models, latent growth curve (and mixture) models, and hierarchical linear models. Likewise, Bayesian methods can be used with small sample sizes since they do not rely on large sample theory. In this article, we discuss several important components of Bayesian statistics as they relate to health-based inquiries. We discuss the incorporation and impact of prior knowledge into the estimation process and the different components of the analysis that should be reported in an article. We present an example implementing Bayesian estimation in the context of blood pressure changes after participants experienced an acute stressor. We conclude with final thoughts on the implementation of Bayesian statistics in health psychology, including suggestions for reviewing Bayesian manuscripts and grant proposals. We have also included an extensive amount of online supplementary material to complement the content presented here, including Bayesian examples using many different software programmes and an extensive sensitivity analysis examining the impact of priors.

  11. Statistical dependency in visual scanning

    NASA Technical Reports Server (NTRS)

    Ellis, Stephen R.; Stark, Lawrence

    1986-01-01

    A method to identify statistical dependencies in the positions of eye fixations is developed and applied to eye movement data from subjects who viewed dynamic displays of air traffic and judged future relative position of aircraft. Analysis of approximately 23,000 fixations on points of interest on the display identified statistical dependencies in scanning that were independent of the physical placement of the points of interest. Identification of these dependencies is inconsistent with random-sampling-based theories used to model visual search and information seeking.

  12. Statistical considerations for plot design, sampling procedures, analysis, and quality assurance of ozone injury studies

    Treesearch

    Michael Arbaugh; Larry Bednar

    1996-01-01

    The sampling methods used to monitor ozone injury to ponderosa and Jeffrey pines depend on the objectives of the study, geographic and genetic composition of the forest, and the source and composition of air pollutant emissions. By using a standardized sampling methodology, it may be possible to compare conditions within local areas more accurately, and to apply the...

  13. Influence of the processing route of porcelain/Ti-6Al-4V interfaces on shear bond strength.

    PubMed

    Toptan, Fatih; Alves, Alexandra C; Henriques, Bruno; Souza, Júlio C M; Coelho, Rui; Silva, Filipe S; Rocha, Luís A; Ariza, Edith

    2013-04-01

    This study aims at evaluating the two-fold effect of initial surface conditions and dental porcelain-to-Ti-6Al-4V alloy joining processing route on the shear bond strength. Porcelain-to-Ti-6Al-4V samples were processed by conventional furnace firing (porcelain-fused-to-metal) and hot pressing. Prior to the processing, Ti-6Al-4V cylinders were prepared by three different surface treatments: polishing, alumina or silica blasting. Within the firing process, polished and alumina blasted samples were subjected to two different cooling rates: air cooling and a slower cooling rate (65°C/min). Metal/porcelain bond strength was evaluated by shear bond test. The data were analyzed using one-way ANOVA followed by Tuckey's test (p<0.05). Before and after shear bond tests, metallic surfaces and metal/ceramic interfaces were examined by Field Emission Gun Scanning Electron Microscope (FEG-SEM) equipped with Energy Dispersive X-Ray Spectroscopy (EDS). Shear bond strength values of the porcelain-to-Ti-6Al-4V alloy interfaces ranged from 27.1±8.9MPa for porcelain fused to polished samples up to 134.0±43.4MPa for porcelain fused to alumina blasted samples. According to the statistical analysis, no significant difference were found on the shear bond strength values for different cooling rates. Processing method was statistically significant only for the polished samples, and airborne particle abrasion was statistically significant only for the fired samples. The type of the blasting material did not cause a statistically significant difference on the shear bond strength values. Shear bond strength of dental porcelain to Ti-6Al-4V alloys can be significantly improved from controlled conditions of surface treatments and processing methods. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Detecting cell death with optical coherence tomography and envelope statistics

    NASA Astrophysics Data System (ADS)

    Farhat, Golnaz; Yang, Victor X. D.; Czarnota, Gregory J.; Kolios, Michael C.

    2011-02-01

    Currently no standard clinical or preclinical noninvasive method exists to monitor cell death based on morphological changes at the cellular level. In our past work we have demonstrated that quantitative high frequency ultrasound imaging can detect cell death in vitro and in vivo. In this study we apply quantitative methods previously used with high frequency ultrasound to optical coherence tomography (OCT) to detect cell death. The ultimate goal of this work is to use these methods for optically-based clinical and preclinical cancer treatment monitoring. Optical coherence tomography data were acquired from acute myeloid leukemia cells undergoing three modes of cell death. Significant increases in integrated backscatter were observed for cells undergoing apoptosis and mitotic arrest, while necrotic cells induced a decrease. These changes appear to be linked to structural changes observed in histology obtained from the cell samples. Signal envelope statistics were analyzed from fittings of the generalized gamma distribution to histograms of envelope intensities. The parameters from this distribution demonstrated sensitivities to morphological changes in the cell samples. These results indicate that OCT integrated backscatter and first order envelope statistics can be used to detect and potentially differentiate between modes of cell death in vitro.

  15. Pick a Sample.

    ERIC Educational Resources Information Center

    Peterson, Ivars

    1991-01-01

    A method that enables people to obtain the benefits of statistics and probability theory without the shortcomings of conventional methods because it is free of mathematical formulas and is easy to understand and use is described. A resampling technique called the "bootstrap" is discussed in terms of application and development. (KR)

  16. Climate Verification Using Running Mann Whitney Z Statistics

    USDA-ARS?s Scientific Manuscript database

    A robust method previously used to detect observed intra- to multi-decadal (IMD) climate regimes was adapted to test whether climate models could reproduce IMD variations in U.S. surface temperatures during 1919-2008. This procedure, called the running Mann Whitney Z (MWZ) method, samples data ranki...

  17. Delta13C and delta18O isotopic composition of CaCO3 measured by continuous flow isotope ratio mass spectrometry: statistical evaluation and verification by application to Devils Hole core DH-11 calcite.

    PubMed

    Révész, Kinga M; Landwehr, Jurate M

    2002-01-01

    A new method was developed to analyze the stable carbon and oxygen isotope ratios of small samples (400 +/- 20 micro g) of calcium carbonate. This new method streamlines the classical phosphoric acid/calcium carbonate (H(3)PO(4)/CaCO(3)) reaction method by making use of a recently available Thermoquest-Finnigan GasBench II preparation device and a Delta Plus XL continuous flow isotope ratio mass spectrometer. Conditions for which the H(3)PO(4)/CaCO(3) reaction produced reproducible and accurate results with minimal error had to be determined. When the acid/carbonate reaction temperature was kept at 26 degrees C and the reaction time was between 24 and 54 h, the precision of the carbon and oxygen isotope ratios for pooled samples from three reference standard materials was

  18. A Tool for Estimating Variability in Wood Preservative Treatment Retention

    Treesearch

    Patricia K. Lebow; Adam M. Taylor; Timothy M. Young

    2015-01-01

    Composite sampling is standard practice for evaluation of preservative retention levels in preservative-treated wood. Current protocols provide an average retention value but no estimate of uncertainty. Here we describe a statistical method for calculating uncertainty estimates using the standard sampling regime with minimal additional chemical analysis. This tool can...

  19. An Investigation of Sample Size Splitting on ATFIND and DIMTEST

    ERIC Educational Resources Information Center

    Socha, Alan; DeMars, Christine E.

    2013-01-01

    Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…

  20. State Estimates of Disability in America. Disability Statistics Report 3.

    ERIC Educational Resources Information Center

    LaPlante, Mitchell P.

    This study presents and discusses existing data on disability by state, from the 1980 and 1990 censuses, the Current Population Survey (CPS), and the National Health Interview Survey (NHIS). The study used direct methods for states with large sample sizes and synthetic estimates for states with low sample sizes. The study's highlighted findings…

  1. Effect of different mixing methods on the bacterial microleakage of calcium-enriched mixture cement.

    PubMed

    Shahi, Shahriar; Jeddi Khajeh, Soniya; Rahimi, Saeed; Yavari, Hamid R; Jafari, Farnaz; Samiei, Mohammad; Ghasemi, Negin; Milani, Amin S

    2016-10-01

    Calcium-enriched mixture (CEM) cement is used in the field of endodontics. It is similar to mineral trioxide aggregate in its main ingredients. The present study investigated the effect of different mixing methods on the bacterial microleakage of CEM cement. A total of 55 human single-rooted human permanent teeth were decoronated so that 14-mm-long samples were obtained and obturated with AH26 sealer and gutta-percha using lateral condensation technique. Three millimeters of the root end were cut off and randomly divided into 3 groups of 15 each (3 mixing methods of amalgamator, ultrasonic and conventional) and 2 negative and positive control groups (each containing 5 samples). BHI (brain-heart infusion agar) suspension containing Enterococcus faecalis was used for bacterial leakage assessment. Statistical analysis was carried out using descriptive statistics, Kaplan-Meier survival analysis with censored data and log rank test. Statistical significance was set at P<0.05. The survival means for conventional, amalgamator and ultrasonic methods were 62.13±12.44, 68.87±12.79 and 77.53±12.52 days, respectively. The log rank test showed no significant differences between the groups. Based on the results of the present study it can be concluded that different mixing methods had no significant effect on the bacterial microleakage of CEM cement.

  2. Integrating quantitative PCR and Bayesian statistics in quantifying human adenoviruses in small volumes of source water.

    PubMed

    Wu, Jianyong; Gronewold, Andrew D; Rodriguez, Roberto A; Stewart, Jill R; Sobsey, Mark D

    2014-02-01

    Rapid quantification of viral pathogens in drinking and recreational water can help reduce waterborne disease risks. For this purpose, samples in small volume (e.g. 1L) are favored because of the convenience of collection, transportation and processing. However, the results of viral analysis are often subject to uncertainty. To overcome this limitation, we propose an approach that integrates Bayesian statistics, efficient concentration methods, and quantitative PCR (qPCR) to quantify viral pathogens in water. Using this approach, we quantified human adenoviruses (HAdVs) in eighteen samples of source water collected from six drinking water treatment plants. HAdVs were found in seven samples. In the other eleven samples, HAdVs were not detected by qPCR, but might have existed based on Bayesian inference. Our integrated approach that quantifies uncertainty provides a better understanding than conventional assessments of potential risks to public health, particularly in cases when pathogens may present a threat but cannot be detected by traditional methods. © 2013 Elsevier B.V. All rights reserved.

  3. A Ricin Forensic Profiling Approach Based on a Complex Set of Biomarkers

    DOE PAGES

    Fredriksson, Sten-Ake; Wunschel, David S.; Lindstrom, Susanne Wiklund; ...

    2018-03-28

    A forensic method for the retrospective determination of preparation methods used for illicit ricin toxin production was developed. The method was based on a complex set of biomarkers, including carbohydrates, fatty acids, seed storage proteins, in combination with data on ricin and Ricinus communis agglutinin. The analyses were performed on samples prepared from four castor bean plant (R. communis) cultivars by four different sample preparation methods (PM1 – PM4) ranging from simple disintegration of the castor beans to multi-step preparation methods including different protein precipitation methods. Comprehensive analytical data was collected by use of a range of analytical methods andmore » robust orthogonal partial least squares-discriminant analysis- models (OPLS-DA) were constructed based on the calibration set. By the use of a decision tree and two OPLS-DA models, the sample preparation methods of test set samples were determined. The model statistics of the two models were good and a 100% rate of correct predictions of the test set was achieved.« less

  4. A Ricin Forensic Profiling Approach Based on a Complex Set of Biomarkers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fredriksson, Sten-Ake; Wunschel, David S.; Lindstrom, Susanne Wiklund

    A forensic method for the retrospective determination of preparation methods used for illicit ricin toxin production was developed. The method was based on a complex set of biomarkers, including carbohydrates, fatty acids, seed storage proteins, in combination with data on ricin and Ricinus communis agglutinin. The analyses were performed on samples prepared from four castor bean plant (R. communis) cultivars by four different sample preparation methods (PM1 – PM4) ranging from simple disintegration of the castor beans to multi-step preparation methods including different protein precipitation methods. Comprehensive analytical data was collected by use of a range of analytical methods andmore » robust orthogonal partial least squares-discriminant analysis- models (OPLS-DA) were constructed based on the calibration set. By the use of a decision tree and two OPLS-DA models, the sample preparation methods of test set samples were determined. The model statistics of the two models were good and a 100% rate of correct predictions of the test set was achieved.« less

  5. What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum

    PubMed Central

    Hesterberg, Tim C.

    2015-01-01

    Bootstrapping has enormous potential in statistics education and practice, but there are subtle issues and ways to go wrong. For example, the common combination of nonparametric bootstrapping and bootstrap percentile confidence intervals is less accurate than using t-intervals for small samples, though more accurate for larger samples. My goals in this article are to provide a deeper understanding of bootstrap methods—how they work, when they work or not, and which methods work better—and to highlight pedagogical issues. Supplementary materials for this article are available online. [Received December 2014. Revised August 2015] PMID:27019512

  6. Mutation Clusters from Cancer Exome.

    PubMed

    Kakushadze, Zura; Yu, Willie

    2017-08-15

    We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally-costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development.

  7. Mutation Clusters from Cancer Exome

    PubMed Central

    Kakushadze, Zura; Yu, Willie

    2017-01-01

    We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally-costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development. PMID:28809811

  8. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  9. A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic

    PubMed Central

    Qi, Jin-Peng; Qi, Jie; Zhang, Qing

    2016-01-01

    Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals. PMID:27413364

  10. A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic.

    PubMed

    Qi, Jin-Peng; Qi, Jie; Zhang, Qing

    2016-01-01

    Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals.

  11. Scoring Methods for Building Genotypic Scores: An Application to Didanosine Resistance in a Large Derivation Set

    PubMed Central

    Houssaini, Allal; Assoumou, Lambert; Miller, Veronica; Calvez, Vincent; Marcelin, Anne-Geneviève; Flandre, Philippe

    2013-01-01

    Background Several attempts have been made to determine HIV-1 resistance from genotype resistance testing. We compare scoring methods for building weighted genotyping scores and commonly used systems to determine whether the virus of a HIV-infected patient is resistant. Methods and Principal Findings Three statistical methods (linear discriminant analysis, support vector machine and logistic regression) are used to determine the weight of mutations involved in HIV resistance. We compared these weighted scores with known interpretation systems (ANRS, REGA and Stanford HIV-db) to classify patients as resistant or not. Our methodology is illustrated on the Forum for Collaborative HIV Research didanosine database (N = 1453). The database was divided into four samples according to the country of enrolment (France, USA/Canada, Italy and Spain/UK/Switzerland). The total sample and the four country-based samples allow external validation (one sample is used to estimate a score and the other samples are used to validate it). We used the observed precision to compare the performance of newly derived scores with other interpretation systems. Our results show that newly derived scores performed better than or similar to existing interpretation systems, even with external validation sets. No difference was found between the three methods investigated. Our analysis identified four new mutations associated with didanosine resistance: D123S, Q207K, H208Y and K223Q. Conclusions We explored the potential of three statistical methods to construct weighted scores for didanosine resistance. Our proposed scores performed at least as well as already existing interpretation systems and previously unrecognized didanosine-resistance associated mutations were identified. This approach could be used for building scores of genotypic resistance to other antiretroviral drugs. PMID:23555613

  12. Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters.

    PubMed

    Hensman, James; Lawrence, Neil D; Rattray, Magnus

    2013-08-20

    Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering.The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.

  13. Determination of variables in the prediction of strontium distribution coefficients for selected sediments

    USGS Publications Warehouse

    Pace, M.N.; Rosentreter, J.J.; Bartholomay, R.C.

    2001-01-01

    Idaho State University and the US Geological Survey, in cooperation with the US Department of Energy, conducted a study to determine and evaluate strontium distribution coefficients (Kds) of subsurface materials at the Idaho National Engineering and Environmental Laboratory (INEEL). The Kds were determined to aid in assessing the variability of strontium Kds and their effects on chemical transport of strontium-90 in the Snake River Plain aquifer system. Data from batch experiments done to determine strontium Kds of five sediment-infill samples and six standard reference material samples were analyzed by using multiple linear regression analysis and the stepwise variable-selection method in the statistical program, Statistical Product and Service Solutions, to derive an equation of variables that can be used to predict strontium Kds of sediment-infill samples. The sediment-infill samples were from basalt vesicles and fractures from a selected core at the INEEL; strontium Kds ranged from ???201 to 356 ml g-1. The standard material samples consisted of clay minerals and calcite. The statistical analyses of the batch-experiment results showed that the amount of strontium in the initial solution, the amount of manganese oxide in the sample material, and the amount of potassium in the initial solution are the most important variables in predicting strontium Kds of sediment-infill samples.

  14. A simulative comparison of respondent driven sampling with incentivized snowball sampling--the "strudel effect".

    PubMed

    Gyarmathy, V Anna; Johnston, Lisa G; Caplinskiene, Irma; Caplinskas, Saulius; Latkin, Carl A

    2014-02-01

    Respondent driven sampling (RDS) and incentivized snowball sampling (ISS) are two sampling methods that are commonly used to reach people who inject drugs (PWID). We generated a set of simulated RDS samples on an actual sociometric ISS sample of PWID in Vilnius, Lithuania ("original sample") to assess if the simulated RDS estimates were statistically significantly different from the original ISS sample prevalences for HIV (9.8%), Hepatitis A (43.6%), Hepatitis B (Anti-HBc 43.9% and HBsAg 3.4%), Hepatitis C (87.5%), syphilis (6.8%) and Chlamydia (8.8%) infections and for selected behavioral risk characteristics. The original sample consisted of a large component of 249 people (83% of the sample) and 13 smaller components with 1-12 individuals. Generally, as long as all seeds were recruited from the large component of the original sample, the simulation samples simply recreated the large component. There were no significant differences between the large component and the entire original sample for the characteristics of interest. Altogether 99.2% of 360 simulation sample point estimates were within the confidence interval of the original prevalence values for the characteristics of interest. When population characteristics are reflected in large network components that dominate the population, RDS and ISS may produce samples that have statistically non-different prevalence values, even though some isolated network components may be under-sampled and/or statistically significantly different from the main groups. This so-called "strudel effect" is discussed in the paper. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  15. Computing physical properties with quantum Monte Carlo methods with statistical fluctuations independent of system size.

    PubMed

    Assaraf, Roland

    2014-12-01

    We show that the recently proposed correlated sampling without reweighting procedure extends the locality (asymptotic independence of the system size) of a physical property to the statistical fluctuations of its estimator. This makes the approach potentially vastly more efficient for computing space-localized properties in large systems compared with standard correlated methods. A proof is given for a large collection of noninteracting fragments. Calculations on hydrogen chains suggest that this behavior holds not only for systems displaying short-range correlations, but also for systems with long-range correlations.

  16. Extreme Quantile Estimation in Binary Response Models

    DTIC Science & Technology

    1990-03-01

    in Cancer Research," Biometria , VoL 66, pp. 307-316. Hsi, B.P. [1969], ’The Multiple Sample Up-and-Down Method in Bioassay," Journal of the American...New Method of Estimation," Biometria , VoL 53, pp. 439-454. Wetherill, G.B. [1976], Sequential Methods in Statistics, London: Chapman and Hall. Wu, C.FJ

  17. ADHD and Method Variance: A Latent Variable Approach Applied to a Nationally Representative Sample of College Freshmen

    ERIC Educational Resources Information Center

    Konold, Timothy R.; Glutting, Joseph J.

    2008-01-01

    This study employed a correlated trait-correlated method application of confirmatory factor analysis to disentangle trait and method variance from measures of attention-deficit/hyperactivity disorder obtained at the college level. The two trait factors were "Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition" ("DSM-IV")…

  18. Sawing SHOLO logs: three methods

    Treesearch

    Ronald E. Coleman; Hugh W. Reynolds

    1973-01-01

    Three different methods of sawing the SHOLO log were compared on a board-foot yield basis. Using sawmill simulation, all three methods of sawing were performed on the same sample of logs, eliminating differences due to sapling. A statistical test was made to determine whether or not there were any real differences between the board-foot yields. Two of the sawing...

  19. Influence of concentration, time and method of application of citric acid and sodium citrate in root conditioning

    PubMed Central

    CAVASSIM, Rodrigo; LEITE, Fábio Renato Manzolli; ZANDIM, Daniela Leal; DANTAS, Andrea Abi Rached; RACHED, Ricardo Samih Georges Abi; SAMPAIO, José Eduardo Cezar

    2012-01-01

    Objective The aim of this study was to establish the parameters of concentration, time and mode of application of citric acid and sodium citrate in relation to root conditioning. Material and Methods A total of 495 samples were obtained and equally distributed among 11 groups (5 for testing different concentrations of citric acid, 5 for testing different concentrations of sodium citrate and 1 control group). After laboratorial processing, the samples were analyzed under scanning electron microscopy. A previously calibrated and blind examiner evaluated micrographs of the samples. Non-parametric statistical analysis was performed to analyze the data obtained. Results Brushing 25% citric acid for 3 min, promoted greater exposure of collagen fibers in comparison with the brushing of 1% citric acid for 1 minute and its topical application at 1% for 3 min. Sodium citrate exposed collagen fibers in a few number of samples. Conclusion Despite the lack of statistical significance, better results for collagen exposure were obtained with brushing application of 25% citric acid for 3 min than with other application parameter. Sodium citrate produced a few number of samples with collagen exposure, so it is not indicated for root conditioning. PMID:22858707

  20. [Flavouring estimation of quality of grape wines with use of methods of mathematical statistics].

    PubMed

    Yakuba, Yu F; Khalaphyan, A A; Temerdashev, Z A; Bessonov, V V; Malinkin, A D

    2016-01-01

    The questions of forming of wine's flavour integral estimation during the tasting are discussed, the advantages and disadvantages of the procedures are declared. As investigating materials we used the natural white and red wines of Russian manufactures, which were made with the traditional technologies from Vitis Vinifera, straight hybrids, blending and experimental wines (more than 300 different samples). The aim of the research was to set the correlation between the content of wine's nonvolatile matter and wine's tasting quality rating by mathematical statistics methods. The content of organic acids, amino acids and cations in wines were considered as the main factors influencing on the flavor. Basically, they define the beverage's quality. The determination of those components in wine's samples was done by the electrophoretic method «CAPEL». Together with the analytical checking of wine's samples quality the representative group of specialists simultaneously carried out wine's tasting estimation using 100 scores system. The possibility of statistical modelling of correlation of wine's tasting estimation based on analytical data of amino acids and cations determination reasonably describing the wine's flavour was examined. The statistical modelling of correlation between the wine's tasting estimation and the content of major cations (ammonium, potassium, sodium, magnesium, calcium), free amino acids (proline, threonine, arginine) and the taking into account the level of influence on flavour and analytical valuation within fixed limits of quality accordance were done with Statistica. Adequate statistical models which are able to predict tasting estimation that is to determine the wine's quality using the content of components forming the flavour properties have been constructed. It is emphasized that along with aromatic (volatile) substances the nonvolatile matter - mineral substances and organic substances - amino acids such as proline, threonine, arginine influence on wine's flavour properties. It has been shown the nonvolatile components contribute in organoleptic and flavour quality estimation of wines as aromatic volatile substances but they take part in forming the expert's evaluation.

  1. Estimating the mean and standard deviation of environmental data with below detection limit observations: Considering highly skewed data and model misspecification.

    PubMed

    Shoari, Niloofar; Dubé, Jean-Sébastien; Chenouri, Shoja'eddin

    2015-11-01

    In environmental studies, concentration measurements frequently fall below detection limits of measuring instruments, resulting in left-censored data. Some studies employ parametric methods such as the maximum likelihood estimator (MLE), robust regression on order statistic (rROS), and gamma regression on order statistic (GROS), while others suggest a non-parametric approach, the Kaplan-Meier method (KM). Using examples of real data from a soil characterization study in Montreal, we highlight the need for additional investigations that aim at unifying the existing literature. A number of studies have examined this issue; however, those considering data skewness and model misspecification are rare. These aspects are investigated in this paper through simulations. Among other findings, results show that for low skewed data, the performance of different statistical methods is comparable, regardless of the censoring percentage and sample size. For highly skewed data, the performance of the MLE method under lognormal and Weibull distributions is questionable; particularly, when the sample size is small or censoring percentage is high. In such conditions, MLE under gamma distribution, rROS, GROS, and KM are less sensitive to skewness. Related to model misspecification, MLE based on lognormal and Weibull distributions provides poor estimates when the true distribution of data is misspecified. However, the methods of rROS, GROS, and MLE under gamma distribution are generally robust to model misspecifications regardless of skewness, sample size, and censoring percentage. Since the characteristics of environmental data (e.g., type of distribution and skewness) are unknown a priori, we suggest using MLE based on gamma distribution, rROS and GROS. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. COMPARISON OF TWO METHODS FOR THE ISOLATION OF SALMONELLAE FROM IMPORTED FOODS.

    PubMed

    TAYLOR, W I; HOBBS, B C; SMITH, M E

    1964-01-01

    Two methods for the detection of salmonellae in foods were compared in 179 imported meat and egg samples. The number of positive samples and replications, and the number of strains and kinds of serotypes were statistically comparable by both the direct enrichment method of the Food Hygiene Laboratory in England, and the pre-enrichment method devised for processed foods in the United States. Boneless frozen beef, veal, and horsemeat imported from five countries for consumption in England were found to have salmonellae present in 48 of 116 (41%) samples. Dried egg products imported from three countries were observed to have salmonellae in 10 of 63 (16%) samples. The high incidence of salmonellae isolated from imported foods illustrated the existence of an international health hazard resulting from the continuous introduction of exogenous strains of pathogenic microorganisms on a large scale.

  3. Bounds on the minimum number of recombination events in a sample history.

    PubMed Central

    Myers, Simon R; Griffiths, Robert C

    2003-01-01

    Recombination is an important evolutionary factor in many organisms, including humans, and understanding its effects is an important task facing geneticists. Detecting past recombination events is thus important; this article introduces statistics that give a lower bound on the number of recombination events in the history of a sample, on the basis of the patterns of variation in the sample DNA. Such lower bounds are appropriate, since many recombination events in the history are typically undetectable, so the true number of historical recombinations is unobtainable. The statistics can be calculated quickly by computer and improve upon the earlier bound of Hudson and Kaplan 1985. A method is developed to combine bounds on local regions in the data to produce more powerful improved bounds. The method is flexible to different models of recombination occurrence. The approach gives recombination event bounds between all pairs of sites, to help identify regions with more detectable recombinations, and these bounds can be viewed graphically. Under coalescent simulations, there is a substantial improvement over the earlier method (of up to a factor of 2) in the expected number of recombination events detected by one of the new minima, across a wide range of parameter values. The method is applied to data from a region within the lipoprotein lipase gene and the amount of detected recombination is substantially increased. Further, there is strong clustering of detected recombination events in an area near the center of the region. A program implementing these statistics, which was used for this article, is available from http://www.stats.ox.ac.uk/mathgen/programs.html. PMID:12586723

  4. Method and system of Jones-matrix mapping of blood plasma films with "fuzzy" analysis in differentiation of breast pathology changes

    NASA Astrophysics Data System (ADS)

    Zabolotna, Natalia I.; Radchenko, Kostiantyn O.; Karas, Oleksandr V.

    2018-01-01

    A fibroadenoma diagnosing of breast using statistical analysis (determination and analysis of statistical moments of the 1st-4th order) of the obtained polarization images of Jones matrix imaginary elements of the optically thin (attenuation coefficient τ <= 0,1 ) blood plasma films with further intellectual differentiation based on the method of "fuzzy" logic and discriminant analysis were proposed. The accuracy of the intellectual differentiation of blood plasma samples to the "norm" and "fibroadenoma" of breast was 82.7% by the method of linear discriminant analysis, and by the "fuzzy" logic method is 95.3%. The obtained results allow to confirm the potentially high level of reliability of the method of differentiation by "fuzzy" analysis.

  5. Mechanical properties of silicate glasses exposed to a low-Earth orbit

    NASA Technical Reports Server (NTRS)

    Wiedlocher, David E.; Tucker, Dennis S.; Nichols, Ron; Kinser, Donald L.

    1992-01-01

    The effects of a 5.8 year exposure to low earth orbit environment upon the mechanical properties of commercial optical fused silica, low iron soda-lime-silica, Pyrex 7740, Vycor 7913, BK-7, and the glass ceramic Zerodur were examined. Mechanical testing employed the ASTM-F-394 piston on 3-ball method in a liquid nitrogen environment. Samples were exposed on the Long Duration Exposure Facility (LDEF) in two locations. Impacts were observed on all specimens except Vycor. Weibull analysis as well as a standard statistical evaluation were conducted. The Weibull analysis revealed no differences between control samples and the two exposed samples. We thus concluded that radiation components of the Earth orbital environment did not degrade the mechanical strength of the samples examined within the limits of experimental error. The upper bound of strength degradation for meteorite impacted samples based upon statistical analysis and observation was 50 percent.

  6. Differentiation of commercial vermiculite based on statistical analysis of bulk chemical data: Fingerprinting vermiculite from Libby, Montana U.S.A

    USGS Publications Warehouse

    Gunter, M.E.; Singleton, E.; Bandli, B.R.; Lowers, H.A.; Meeker, G.P.

    2005-01-01

    Major-, minor-, and trace-element compositions, as determined by X-ray fluorescence (XRF) analysis, were obtained on 34 samples of vermiculite to ascertain whether chemical differences exist to the extent of determining the source of commercial products. The sample set included ores from four deposits, seven commercially available garden products, and insulation from four attics. The trace-element distributions of Ba, Cr, and V can be used to distinguish the Libby vermiculite samples from the garden products. In general, the overall composition of the Libby and South Carolina deposits appeared similar, but differed from the South Africa and China deposits based on simple statistical methods. Cluster analysis provided a good distinction of the four ore types, grouped the four attic samples with the Libby ore, and, with less certainty, grouped the garden samples with the South Africa ore.

  7. The multiple imputation method: a case study involving secondary data analysis.

    PubMed

    Walani, Salimah R; Cleland, Charles M

    2015-05-01

    To illustrate with the example of a secondary data analysis study the use of the multiple imputation method to replace missing data. Most large public datasets have missing data, which need to be handled by researchers conducting secondary data analysis studies. Multiple imputation is a technique widely used to replace missing values while preserving the sample size and sampling variability of the data. The 2004 National Sample Survey of Registered Nurses. The authors created a model to impute missing values using the chained equation method. They used imputation diagnostics procedures and conducted regression analysis of imputed data to determine the differences between the log hourly wages of internationally educated and US-educated registered nurses. The authors used multiple imputation procedures to replace missing values in a large dataset with 29,059 observations. Five multiple imputed datasets were created. Imputation diagnostics using time series and density plots showed that imputation was successful. The authors also present an example of the use of multiple imputed datasets to conduct regression analysis to answer a substantive research question. Multiple imputation is a powerful technique for imputing missing values in large datasets while preserving the sample size and variance of the data. Even though the chained equation method involves complex statistical computations, recent innovations in software and computation have made it possible for researchers to conduct this technique on large datasets. The authors recommend nurse researchers use multiple imputation methods for handling missing data to improve the statistical power and external validity of their studies.

  8. Kolmogorov-Smirnov test for spatially correlated data

    USGS Publications Warehouse

    Olea, R.A.; Pawlowsky-Glahn, V.

    2009-01-01

    The Kolmogorov-Smirnov test is a convenient method for investigating whether two underlying univariate probability distributions can be regarded as undistinguishable from each other or whether an underlying probability distribution differs from a hypothesized distribution. Application of the test requires that the sample be unbiased and the outcomes be independent and identically distributed, conditions that are violated in several degrees by spatially continuous attributes, such as topographical elevation. A generalized form of the bootstrap method is used here for the purpose of modeling the distribution of the statistic D of the Kolmogorov-Smirnov test. The innovation is in the resampling, which in the traditional formulation of bootstrap is done by drawing from the empirical sample with replacement presuming independence. The generalization consists of preparing resamplings with the same spatial correlation as the empirical sample. This is accomplished by reading the value of unconditional stochastic realizations at the sampling locations, realizations that are generated by simulated annealing. The new approach was tested by two empirical samples taken from an exhaustive sample closely following a lognormal distribution. One sample was a regular, unbiased sample while the other one was a clustered, preferential sample that had to be preprocessed. Our results show that the p-value for the spatially correlated case is always larger that the p-value of the statistic in the absence of spatial correlation, which is in agreement with the fact that the information content of an uncorrelated sample is larger than the one for a spatially correlated sample of the same size. ?? Springer-Verlag 2008.

  9. HPLC determination of caffeine in coffee beverage

    NASA Astrophysics Data System (ADS)

    Fajara, B. E. P.; Susanti, H.

    2017-11-01

    Coffee is the second largest beverage which is consumed by people in the world, besides the water. One of the compounds which contained in coffee is caffeine. Caffeine has the pharmacological effect such as stimulating the central nervous system. The purpose of this study is to determine the level of caffeine in coffee beverages with HPLC method. Three branded coffee beverages which include in 3 of Top Brand Index 2016 Phase 2 were used as samples. Qualitative analysis was performed by Parry method, Dragendorff reagent, and comparing the retention time between sample and caffeine standard. Quantitative analysis was done by HPLC method with methanol-water (95:5v/v) as mobile phase and ODS as stationary phasewith flow rate 1 mL/min and UV 272 nm as the detector. The level of caffeine data was statistically analyzed using Anova at 95% confidence level. The Qualitative analysis showed that the three samples contained caffeine. The average of caffeine level in coffee bottles of X, Y, and Z were 138.048 mg/bottle, 109.699 mg/bottle, and 147.669 mg/bottle, respectively. The caffeine content of the three coffee beverage samples are statistically different (p<0.05). The levels of caffeine contained in X, Y, and Z coffee beverage samples were not meet the requirements set by the Indonesian Standard Agency of 50 mg/serving.

  10. Order parameter free enhanced sampling of the vapor-liquid transition using the generalized replica exchange method.

    PubMed

    Lu, Qing; Kim, Jaegil; Straub, John E

    2013-03-14

    The generalized Replica Exchange Method (gREM) is extended into the isobaric-isothermal ensemble, and applied to simulate a vapor-liquid phase transition in Lennard-Jones fluids. Merging an optimally designed generalized ensemble sampling with replica exchange, gREM is particularly well suited for the effective simulation of first-order phase transitions characterized by "backbending" in the statistical temperature. While the metastable and unstable states in the vicinity of the first-order phase transition are masked by the enthalpy gap in temperature replica exchange method simulations, they are transformed into stable states through the parameterized effective sampling weights in gREM simulations, and join vapor and liquid phases with a succession of unimodal enthalpy distributions. The enhanced sampling across metastable and unstable states is achieved without the need to identify a "good" order parameter for biased sampling. We performed gREM simulations at various pressures below and near the critical pressure to examine the change in behavior of the vapor-liquid phase transition at different pressures. We observed a crossover from the first-order phase transition at low pressure, characterized by the backbending in the statistical temperature and the "kink" in the Gibbs free energy, to a continuous second-order phase transition near the critical pressure. The controlling mechanisms of nucleation and continuous phase transition are evident and the coexistence properties and phase diagram are found in agreement with literature results.

  11. A Comparison of Normal and Elliptical Estimation Methods in Structural Equation Models.

    ERIC Educational Resources Information Center

    Schumacker, Randall E.; Cheevatanarak, Suchittra

    Monte Carlo simulation compared chi-square statistics, parameter estimates, and root mean square error of approximation values using normal and elliptical estimation methods. Three research conditions were imposed on the simulated data: sample size, population contamination percent, and kurtosis. A Bentler-Weeks structural model established the…

  12. 42 CFR 402.109 - Statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 2 2011-10-01 2011-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...

  13. 42 CFR 402.109 - Statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 2 2010-10-01 2010-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...

  14. Association between oral lichen planus and Epstein–Barr virus in Iranian patients

    PubMed Central

    Shariati, Matin; Mokhtari, Mojgan; Masoudifar, Aria

    2018-01-01

    Background: Oral lichen planus (OLP) is a common mucocutaneous disease with malignant transformation potential. Several etiologies such as humoral, autoimmunity, and viral infections might play a role, but still there is no definite etiology for this disease. The aim of this study was to investigate the presence of Epstein–Barr virus (EBV) genome in Iranian patients with OLP as compared to people with normal mucosa. Materials and Methods: The study was carried out on a case group including 38 tissue specimens of patients with histopathological confirmation of OLP and a control group including 38 samples of healthy mucosa. All samples were examined by nested polymerase chain reaction (PCR) method to determine the DNA of EBV. Results: Twenty-two (57.9%) female samples and 16 (42.1%) male samples with OLP were randomly selected as the case group, and 20 (52.6%) female samples and 18 (47.4%) male samples with healthy mucosa as the control group. There was a statistically significant difference in the percentage of EBV positivity between the case (15.8%) and the control groups (P < 0.05); in the case group, three female samples (13.6%) and three male samples (18.8%) were infected with EBV; the difference between the genders was not statistically significant (P = 0.50). Conclusion: Results emphasized that EBV genome was significantly higher among Iranian patients with OLP so antiviral therapy might be helpful. PMID:29692821

  15. How Good Are Statistical Models at Approximating Complex Fitness Landscapes?

    PubMed Central

    du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian

    2016-01-01

    Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564

  16. Effect size measures in a two-independent-samples case with nonnormal and nonhomogeneous data.

    PubMed

    Li, Johnson Ching-Hong

    2016-12-01

    In psychological science, the "new statistics" refer to the new statistical practices that focus on effect size (ES) evaluation instead of conventional null-hypothesis significance testing (Cumming, Psychological Science, 25, 7-29, 2014). In a two-independent-samples scenario, Cohen's (1988) standardized mean difference (d) is the most popular ES, but its accuracy relies on two assumptions: normality and homogeneity of variances. Five other ESs-the unscaled robust d (d r * ; Hogarty & Kromrey, 2001), scaled robust d (d r ; Algina, Keselman, & Penfield, Psychological Methods, 10, 317-328, 2005), point-biserial correlation (r pb ; McGrath & Meyer, Psychological Methods, 11, 386-401, 2006), common-language ES (CL; Cliff, Psychological Bulletin, 114, 494-509, 1993), and nonparametric estimator for CL (A w ; Ruscio, Psychological Methods, 13, 19-30, 2008)-may be robust to violations of these assumptions, but no study has systematically evaluated their performance. Thus, in this simulation study the performance of these six ESs was examined across five factors: data distribution, sample, base rate, variance ratio, and sample size. The results showed that A w and d r were generally robust to these violations, and A w slightly outperformed d r . Implications for the use of A w and d r in real-world research are discussed.

  17. Creamatocrit analysis of human milk overestimates fat and energy content when compared to a human milk analyzer using mid-infrared spectroscopy.

    PubMed

    O'Neill, Edward F; Radmacher, Paula G; Sparks, Blake; Adamkin, David H

    2013-05-01

    Human milk (HM) is the preferred feeding for human infants but may be inadequate to support the rapid growth of the very-low-birth-weight infant. The creamatocrit (CMCT) has been widely used to guide health care professionals as they analyze HM fortification; however, the CMCT method is based on an equation using assumptions for protein and carbohydrate with fat as the only measured variable. The aim of the present study was to test the hypothesis that a human milk analyzer (HMA) would provide more accurate data for fat and energy content than analysis by CMCT. Fifty-one well-mixed samples of previously frozen expressed HM were obtained after thawing. Previously assayed "control" milk samples were thawed and also run with unknowns. All milk samples were prewarmed at 40°C and then analyzed by both CMCT and HMA. CMCT fat results were substituted in the CMCT equation to reach a value for energy (kcal/oz). Fat results from HMA were entered into a computer model to reach a value for energy (kcal/oz). Fat and energy results were compared by paired t test with statistical significance set at P < 0.05. An additional 10 samples were analyzed locally by both methods and then sent to a certified laboratory for quantitative analysis. Results for fat and energy were analyzed by 1-way analysis of variance with statistical significance set at P < 0.05. Mean fat content by CMCT (5.8 ± 1.9 g/dL) was significantly higher than by HMA (3.2 ± 1.1 g/dL, P < 0.001). Mean energy by CMCT (21.8 ± 3.4 kcal/oz) was also significantly higher than by HMA (17.1 ± 2.9, P < 0.001). Comparison of biochemical analysis with HMA of the subset of milk samples showed no statistical difference for fat and energy, whereas CMCT was significantly higher than for both fat (P < 0.001) and energy (P = 0.002). The CMCT method appears to overestimate fat and energy content of HM samples when compared with HMA and biochemical methods.

  18. Comparison of the Cellient(™) automated cell block system and agar cell block method.

    PubMed

    Kruger, A M; Stevens, M W; Kerley, K J; Carter, C D

    2014-12-01

    To compare the Cellient(TM) automated cell block system with the agar cell block method in terms of quantity and quality of diagnostic material and morphological, histochemical and immunocytochemical features. Cell blocks were prepared from 100 effusion samples using the agar method and Cellient system, and routinely sectioned and stained for haematoxylin and eosin and periodic acid-Schiff with diastase (PASD). A preliminary immunocytochemical study was performed on selected cases (27/100 cases). Sections were evaluated using a three-point grading system to compare a set of morphological parameters. Statistical analysis was performed using Fisher's exact test. Parameters assessing cellularity, presence of single cells and definition of nuclear membrane, nucleoli, chromatin and cytoplasm showed a statistically significant improvement on Cellient cell blocks compared with agar cell blocks (P < 0.05). No significant difference was seen for definition of cell groups, PASD staining or the intensity or clarity of immunocytochemical staining. A discrepant immunocytochemistry (ICC) result was seen in 21% (13/63) of immunostains. The Cellient technique is comparable with the agar method, with statistically significant results achieved for important morphological features. It demonstrates potential as an alternative cell block preparation method which is relevant for the rapid processing of fine needle aspiration samples, malignant effusions and low-cellularity specimens, where optimal cell morphology and architecture are essential. Further investigation is required to optimize immunocytochemical staining using the Cellient method. © 2014 John Wiley & Sons Ltd.

  19. The allele combinations of three loci based on, liver, stomach cancers, hematencephalon, COPD and normal population: A preliminary study.

    PubMed

    Gai, Liping; Liu, Hui; Cui, Jing-Hui; Yu, Weijian; Ding, Xiao-Dong

    2017-03-20

    The purpose of this study was to examine the specific allele combinations of three loci connected with the liver cancers, stomach cancers, hematencephalon and patients with chronic obstructive pulmonary disease (COPD) and to explore the feasibility of the research methods. We explored different mathematical methods for statistical analyses to assess the association between the genotype and phenotype. At the same time we still analyses the statistical results of allele combinations of three loci by difference value method and ratio method. All the DNA blood samples were collected from patients with 50 liver cancers, 75 stomach cancers, 50 hematencephalon, 72 COPD and 200 normal populations. All the samples were from Chinese. Alleles from short tandem repeat (STR) loci were determined using the STR Profiler plus PCR amplification kit (15 STR loci). Previous research was based on combinations of single-locus alleles, and combinations of cross-loci (two loci) alleles. Allele combinations of three loci were obtained by computer counting and stronger genetic signal was obtained. The methods of allele combinations of three loci can help to identify the statistically significant differences of allele combinations between liver cancers, stomach cancers, patients with hematencephalon, COPD and the normal population. The probability of illness followed different rules and had apparent specificity. This method can be extended to other diseases and provide reference for early clinical diagnosis. Copyright © 2016. Published by Elsevier B.V.

  20. Evaluation of an alternate method for sampling benthic macroinvertebrates in low-gradient streams sampled as part of the National Rivers and Streams Assessment.

    PubMed

    Flotemersch, Joseph E; North, Sheila; Blocksom, Karen A

    2014-02-01

    Benthic macroinvertebrates are sampled in streams and rivers as one of the assessment elements of the US Environmental Protection Agency's National Rivers and Streams Assessment. In a 2006 report, the recommendation was made that different yet comparable methods be evaluated for different types of streams (e.g., low gradient vs. high gradient). Consequently, a research element was added to the 2008-2009 National Rivers and Streams Assessment to conduct a side-by-side comparison of the standard macroinvertebrate sampling method with an alternate method specifically designed for low-gradient wadeable streams and rivers that focused more on stream edge habitat. Samples were collected using each method at 525 sites in five of nine aggregate ecoregions located in the conterminous USA. Methods were compared using the benthic macroinvertebrate multimetric index developed for the 2006 Wadeable Streams Assessment. Statistical analysis did not reveal any trends that would suggest the overall assessment of low-gradient streams on a regional or national scale would change if the alternate method was used rather than the standard sampling method, regardless of the gradient cutoff used to define low-gradient streams. Based on these results, the National Rivers and Streams Survey should continue to use the standard field method for sampling all streams.

  1. Statistical methods for investigating quiescence and other temporal seismicity patterns

    USGS Publications Warehouse

    Matthews, M.V.; Reasenberg, P.A.

    1988-01-01

    We propose a statistical model and a technique for objective recognition of one of the most commonly cited seismicity patterns:microearthquake quiescence. We use a Poisson process model for seismicity and define a process with quiescence as one with a particular type of piece-wise constant intensity function. From this model, we derive a statistic for testing stationarity against a 'quiescence' alternative. The large-sample null distribution of this statistic is approximated from simulated distributions of appropriate functionals applied to Brownian bridge processes. We point out the restrictiveness of the particular model we propose and of the quiescence idea in general. The fact that there are many point processes which have neither constant nor quiescent rate functions underscores the need to test for and describe nonuniformity thoroughly. We advocate the use of the quiescence test in conjunction with various other tests for nonuniformity and with graphical methods such as density estimation. ideally these methods may promote accurate description of temporal seismicity distributions and useful characterizations of interesting patterns. ?? 1988 Birkha??user Verlag.

  2. Applying Bayesian statistics to the study of psychological trauma: A suggestion for future research.

    PubMed

    Yalch, Matthew M

    2016-03-01

    Several contemporary researchers have noted the virtues of Bayesian methods of data analysis. Although debates continue about whether conventional or Bayesian statistics is the "better" approach for researchers in general, there are reasons why Bayesian methods may be well suited to the study of psychological trauma in particular. This article describes how Bayesian statistics offers practical solutions to the problems of data non-normality, small sample size, and missing data common in research on psychological trauma. After a discussion of these problems and the effects they have on trauma research, this article explains the basic philosophical and statistical foundations of Bayesian statistics and how it provides solutions to these problems using an applied example. Results of the literature review and the accompanying example indicates the utility of Bayesian statistics in addressing problems common in trauma research. Bayesian statistics provides a set of methodological tools and a broader philosophical framework that is useful for trauma researchers. Methodological resources are also provided so that interested readers can learn more. (c) 2016 APA, all rights reserved).

  3. The theoretical and experimental study of a material structure evolution in gigacyclic fatigue regime

    NASA Astrophysics Data System (ADS)

    Plekhov, Oleg; Naimark, Oleg; Narykova, Maria; Kadomtsev, Andrey; Betekhtin, Vladimir

    2015-10-01

    The work is devoted to the study of the metal structure evolution under gigacyclic fatigue (VHCF) regime. The study of the mechanical properties of the samples (Armco iron) with different state of life time existing was carried out on the base of the acoustic resonance method. The damage accumulation (porosity of the samples) was studied by the hydrostatic weighing method. A statistical model of damage accumulation was proposed in order to describe the damage accumulation process. The model describes the influence of the sample surface on the location of fatigue crack initiation.

  4. Statistical variability and confidence intervals for planar dose QA pass rates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bailey, Daniel W.; Nelms, Benjamin E.; Attwood, Kristopher

    Purpose: The most common metric for comparing measured to calculated dose, such as for pretreatment quality assurance of intensity-modulated photon fields, is a pass rate (%) generated using percent difference (%Diff), distance-to-agreement (DTA), or some combination of the two (e.g., gamma evaluation). For many dosimeters, the grid of analyzed points corresponds to an array with a low areal density of point detectors. In these cases, the pass rates for any given comparison criteria are not absolute but exhibit statistical variability that is a function, in part, on the detector sampling geometry. In this work, the authors analyze the statistics ofmore » various methods commonly used to calculate pass rates and propose methods for establishing confidence intervals for pass rates obtained with low-density arrays. Methods: Dose planes were acquired for 25 prostate and 79 head and neck intensity-modulated fields via diode array and electronic portal imaging device (EPID), and matching calculated dose planes were created via a commercial treatment planning system. Pass rates for each dose plane pair (both centered to the beam central axis) were calculated with several common comparison methods: %Diff/DTA composite analysis and gamma evaluation, using absolute dose comparison with both local and global normalization. Specialized software was designed to selectively sample the measured EPID response (very high data density) down to discrete points to simulate low-density measurements. The software was used to realign the simulated detector grid at many simulated positions with respect to the beam central axis, thereby altering the low-density sampled grid. Simulations were repeated with 100 positional iterations using a 1 detector/cm{sup 2} uniform grid, a 2 detector/cm{sup 2} uniform grid, and similar random detector grids. For each simulation, %/DTA composite pass rates were calculated with various %Diff/DTA criteria and for both local and global %Diff normalization techniques. Results: For the prostate and head/neck cases studied, the pass rates obtained with gamma analysis of high density dose planes were 2%-5% higher than respective %/DTA composite analysis on average (ranging as high as 11%), depending on tolerances and normalization. Meanwhile, the pass rates obtained via local normalization were 2%-12% lower than with global maximum normalization on average (ranging as high as 27%), depending on tolerances and calculation method. Repositioning of simulated low-density sampled grids leads to a distribution of possible pass rates for each measured/calculated dose plane pair. These distributions can be predicted using a binomial distribution in order to establish confidence intervals that depend largely on the sampling density and the observed pass rate (i.e., the degree of difference between measured and calculated dose). These results can be extended to apply to 3D arrays of detectors, as well. Conclusions: Dose plane QA analysis can be greatly affected by choice of calculation metric and user-defined parameters, and so all pass rates should be reported with a complete description of calculation method. Pass rates for low-density arrays are subject to statistical uncertainty (vs. the high-density pass rate), but these sampling errors can be modeled using statistical confidence intervals derived from the sampled pass rate and detector density. Thus, pass rates for low-density array measurements should be accompanied by a confidence interval indicating the uncertainty of each pass rate.« less

  5. Semiquantitative determination of mesophilic, aerobic microorganisms in cocoa products using the Soleris NF-TVC method.

    PubMed

    Montei, Carolyn; McDougal, Susan; Mozola, Mark; Rice, Jennifer

    2014-01-01

    The Soleris Non-fermenting Total Viable Count method was previously validated for a wide variety of food products, including cocoa powder. A matrix extension study was conducted to validate the method for use with cocoa butter and cocoa liquor. Test samples included naturally contaminated cocoa liquor and cocoa butter inoculated with natural microbial flora derived from cocoa liquor. A probability of detection statistical model was used to compare Soleris results at multiple test thresholds (dilutions) with aerobic plate counts determined using the AOAC Official Method 966.23 dilution plating method. Results of the two methods were not statistically different at any dilution level in any of the three trials conducted. The Soleris method offers the advantage of results within 24 h, compared to the 48 h required by standard dilution plating methods.

  6. Comparative study of dental cephalometric patterns of Japanese-Brazilian, Caucasian and Mongoloid patients

    PubMed Central

    Sathler, Renata; Pinzan, Arnaldo; Fernandes, Thais Maria Freire; de Almeida, Renato Rodrigues; Henriques, José Fernando Castanha

    2014-01-01

    Introduction The objective of this study was to identify the patterns of dental variables of adolescent Japanese-Brazilian descents with normal occlusion, and also to compare them with a similar Caucasian and Mongoloid sample. Methods Lateral cephalometric radiographs were used to compare the groups: Caucasian (n = 40), Japanese-Brazilian (n = 32) and Mongoloid (n = 33). The statistical tests used were one-way ANOVA and ANCOVA. The cephalometric measurements used followed the analyses of Steiner, Tweed and McNamara Jr. Results Statistical differences (P < 0.05) indicated a smaller interincisal angle and overbite for the Japanese-Brazilian sample, when compared to the Caucasian sample, although with similar values to the Mongoloid group. Conclusion The dental patterns found for the Japanese-Brazilian descents were, in general, more similar to those of the Mongoloid sample. PMID:25279521

  7. Evaluation of asbestos levels in two schools before and after asbestos removal. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Karaffa, M.A.; Chesson, J.; Russell, J.

    This report presents a statistical evaluation of airborne asbestos data collected at two schools before and after removal of asbestos-containing material (ACM). Although the monitoring data are not totally consistent with new Asbestos Hazard Emergency Response Act (AHERA) requirements and recent EPA guidelines, the study evaluates these historical data by standard statistical methods to determine if abated work areas meet proposed clearance criteria. The objectives of this statistical analysis were to compare (1) airborne asbestos levels indoors after removal with levels outdoors, (2) airborne asbestos levels before and after removal of asbestos, and (3) static sampling and aggressive sampling ofmore » airborne asbestos. The results of this evaluation indicated the following: the effect of asbestos removal on indoor air quality is unpredictable; the variability in fiber concentrations among different sampling sites within the same building indicates the need to treat different sites as separate areas for the purpose of clearance; and aggressive sampling is appropriate for clearance testing because it captures more entrainable asbestos structures. Aggressive sampling lowers the chance of declaring a worksite clean when entrainable asbestos is still present.« less

  8. Robust Covariate-Adjusted Log-Rank Statistics and Corresponding Sample Size Formula for Recurrent Events Data

    PubMed Central

    Song, Rui; Kosorok, Michael R.; Cai, Jianwen

    2009-01-01

    Summary Recurrent events data are frequently encountered in clinical trials. This article develops robust covariate-adjusted log-rank statistics applied to recurrent events data with arbitrary numbers of events under independent censoring and the corresponding sample size formula. The proposed log-rank tests are robust with respect to different data-generating processes and are adjusted for predictive covariates. It reduces to the Kong and Slud (1997, Biometrika 84, 847–862) setting in the case of a single event. The sample size formula is derived based on the asymptotic normality of the covariate-adjusted log-rank statistics under certain local alternatives and a working model for baseline covariates in the recurrent event data context. When the effect size is small and the baseline covariates do not contain significant information about event times, it reduces to the same form as that of Schoenfeld (1983, Biometrics 39, 499–503) for cases of a single event or independent event times within a subject. We carry out simulations to study the control of type I error and the comparison of powers between several methods in finite samples. The proposed sample size formula is illustrated using data from an rhDNase study. PMID:18162107

  9. Dermatoglyphic features in patients with multiple sclerosis

    PubMed Central

    Sabanciogullari, Vedat; Cevik, Seyda; Karacan, Kezban; Bolayir, Ertugrul; Cimen, Mehmet

    2014-01-01

    Objective: To examine dermatoglyphic features to clarify implicated genetic predisposition in the etiology of multiple sclerosis (MS). Methods: The study was conducted between January and December 2013 in the Departments of Anatomy, and Neurology, Cumhuriyet University School of Medicine, Sivas, Turkey. The dermatoglyphic data of 61 patients, and a control group consisting of 62 healthy adults obtained with a digital scanner were transferred to a computer environment. The ImageJ program was used, and atd, dat, adt angles, a-b ridge count, sample types of all fingers, and ridge counts were calculated. Results: In both hands of the patients with MS, the a-b ridge count and ridge counts in all fingers increased, and the differences in these values were statistically significant. There was also a statistically significant increase in the dat angle in both hands of the MS patients. On the contrary, there was no statistically significant difference between the groups in terms of dermal ridge samples, and the most frequent sample in both groups was the ulnar loop. Conclusions: Aberrations in the distribution of dermatoglyphic samples support the genetic predisposition in MS etiology. Multiple sclerosis susceptible individuals may be determined by analyzing dermatoglyphic samples. PMID:25274586

  10. A comparison of fitness-case sampling methods for genetic programming

    NASA Astrophysics Data System (ADS)

    Martínez, Yuliana; Naredo, Enrique; Trujillo, Leonardo; Legrand, Pierrick; López, Uriel

    2017-11-01

    Genetic programming (GP) is an evolutionary computation paradigm for automatic program induction. GP has produced impressive results but it still needs to overcome some practical limitations, particularly its high computational cost, overfitting and excessive code growth. Recently, many researchers have proposed fitness-case sampling methods to overcome some of these problems, with mixed results in several limited tests. This paper presents an extensive comparative study of four fitness-case sampling methods, namely: Interleaved Sampling, Random Interleaved Sampling, Lexicase Selection and Keep-Worst Interleaved Sampling. The algorithms are compared on 11 symbolic regression problems and 11 supervised classification problems, using 10 synthetic benchmarks and 12 real-world data-sets. They are evaluated based on test performance, overfitting and average program size, comparing them with a standard GP search. Comparisons are carried out using non-parametric multigroup tests and post hoc pairwise statistical tests. The experimental results suggest that fitness-case sampling methods are particularly useful for difficult real-world symbolic regression problems, improving performance, reducing overfitting and limiting code growth. On the other hand, it seems that fitness-case sampling cannot improve upon GP performance when considering supervised binary classification.

  11. Comparing a single case to a control group - Applying linear mixed effects models to repeated measures data.

    PubMed

    Huber, Stefan; Klein, Elise; Moeller, Korbinian; Willmes, Klaus

    2015-10-01

    In neuropsychological research, single-cases are often compared with a small control sample. Crawford and colleagues developed inferential methods (i.e., the modified t-test) for such a research design. In the present article, we suggest an extension of the methods of Crawford and colleagues employing linear mixed models (LMM). We first show that a t-test for the significance of a dummy coded predictor variable in a linear regression is equivalent to the modified t-test of Crawford and colleagues. As an extension to this idea, we then generalized the modified t-test to repeated measures data by using LMMs to compare the performance difference in two conditions observed in a single participant to that of a small control group. The performance of LMMs regarding Type I error rates and statistical power were tested based on Monte-Carlo simulations. We found that starting with about 15-20 participants in the control sample Type I error rates were close to the nominal Type I error rate using the Satterthwaite approximation for the degrees of freedom. Moreover, statistical power was acceptable. Therefore, we conclude that LMMs can be applied successfully to statistically evaluate performance differences between a single-case and a control sample. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. A statistical method for estimating rates of soil development and ages of geologic deposits: A design for soil-chronosequence studies

    USGS Publications Warehouse

    Switzer, P.; Harden, J.W.; Mark, R.K.

    1988-01-01

    A statistical method for estimating rates of soil development in a given region based on calibration from a series of dated soils is used to estimate ages of soils in the same region that are not dated directly. The method is designed specifically to account for sampling procedures and uncertainties that are inherent in soil studies. Soil variation and measurement error, uncertainties in calibration dates and their relation to the age of the soil, and the limited number of dated soils are all considered. Maximum likelihood (ML) is employed to estimate a parametric linear calibration curve, relating soil development to time or age on suitably transformed scales. Soil variation on a geomorphic surface of a certain age is characterized by replicate sampling of soils on each surface; such variation is assumed to have a Gaussian distribution. The age of a geomorphic surface is described by older and younger bounds. This technique allows age uncertainty to be characterized by either a Gaussian distribution or by a triangular distribution using minimum, best-estimate, and maximum ages. The calibration curve is taken to be linear after suitable (in certain cases logarithmic) transformations, if required, of the soil parameter and age variables. Soil variability, measurement error, and departures from linearity are described in a combined fashion using Gaussian distributions with variances particular to each sampled geomorphic surface and the number of sample replicates. Uncertainty in age of a geomorphic surface used for calibration is described using three parameters by one of two methods. In the first method, upper and lower ages are specified together with a coverage probability; this specification is converted to a Gaussian distribution with the appropriate mean and variance. In the second method, "absolute" older and younger ages are specified together with a most probable age; this specification is converted to an asymmetric triangular distribution with mode at the most probable age. The statistical variability of the ML-estimated calibration curve is assessed by a Monte Carlo method in which simulated data sets repeatedly are drawn from the distributional specification; calibration parameters are reestimated for each such simulation in order to assess their statistical variability. Several examples are used for illustration. The age of undated soils in a related setting may be estimated from the soil data using the fitted calibration curve. A second simulation to assess age estimate variability is described and applied to the examples. ?? 1988 International Association for Mathematical Geology.

  13. Evaluation of statistical methods for quantifying fractal scaling in water-quality time series with irregular sampling

    NASA Astrophysics Data System (ADS)

    Zhang, Qian; Harman, Ciaran J.; Kirchner, James W.

    2018-02-01

    River water-quality time series often exhibit fractal scaling, which here refers to autocorrelation that decays as a power law over some range of scales. Fractal scaling presents challenges to the identification of deterministic trends because (1) fractal scaling has the potential to lead to false inference about the statistical significance of trends and (2) the abundance of irregularly spaced data in water-quality monitoring networks complicates efforts to quantify fractal scaling. Traditional methods for estimating fractal scaling - in the form of spectral slope (β) or other equivalent scaling parameters (e.g., Hurst exponent) - are generally inapplicable to irregularly sampled data. Here we consider two types of estimation approaches for irregularly sampled data and evaluate their performance using synthetic time series. These time series were generated such that (1) they exhibit a wide range of prescribed fractal scaling behaviors, ranging from white noise (β = 0) to Brown noise (β = 2) and (2) their sampling gap intervals mimic the sampling irregularity (as quantified by both the skewness and mean of gap-interval lengths) in real water-quality data. The results suggest that none of the existing methods fully account for the effects of sampling irregularity on β estimation. First, the results illustrate the danger of using interpolation for gap filling when examining autocorrelation, as the interpolation methods consistently underestimate or overestimate β under a wide range of prescribed β values and gap distributions. Second, the widely used Lomb-Scargle spectral method also consistently underestimates β. A previously published modified form, using only the lowest 5 % of the frequencies for spectral slope estimation, has very poor precision, although the overall bias is small. Third, a recent wavelet-based method, coupled with an aliasing filter, generally has the smallest bias and root-mean-squared error among all methods for a wide range of prescribed β values and gap distributions. The aliasing method, however, does not itself account for sampling irregularity, and this introduces some bias in the result. Nonetheless, the wavelet method is recommended for estimating β in irregular time series until improved methods are developed. Finally, all methods' performances depend strongly on the sampling irregularity, highlighting that the accuracy and precision of each method are data specific. Accurately quantifying the strength of fractal scaling in irregular water-quality time series remains an unresolved challenge for the hydrologic community and for other disciplines that must grapple with irregular sampling.

  14. Classification of bladder cancer cell lines using Raman spectroscopy: a comparison of excitation wavelength, sample substrate and statistical algorithms

    NASA Astrophysics Data System (ADS)

    Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.

    2014-05-01

    Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an optical biopsy" to provide results in real-time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact on the sensitivity and specificity of Raman cytology.

  15. Monitoring of an antigen manufacturing process.

    PubMed

    Zavatti, Vanessa; Budman, Hector; Legge, Raymond; Tamer, Melih

    2016-06-01

    Fluorescence spectroscopy in combination with multivariate statistical methods was employed as a tool for monitoring the manufacturing process of pertactin (PRN), one of the virulence factors of Bordetella pertussis utilized in whopping cough vaccines. Fluorophores such as amino acids and co-enzymes were detected throughout the process. The fluorescence data collected at different stages of the fermentation and purification process were treated employing principal component analysis (PCA). Through PCA, it was feasible to identify sources of variability in PRN production. Then, partial least square (PLS) was employed to correlate the fluorescence spectra obtained from pure PRN samples and the final protein content measured by a Kjeldahl test from these samples. In view that a statistically significant correlation was found between fluorescence and PRN levels, this approach could be further used as a method to predict the final protein content.

  16. Control chart pattern recognition using RBF neural network with new training algorithm and practical features.

    PubMed

    Addeh, Abdoljalil; Khormali, Aminollah; Golilarz, Noorbakhsh Amiri

    2018-05-04

    The control chart patterns are the most commonly used statistical process control (SPC) tools to monitor process changes. When a control chart produces an out-of-control signal, this means that the process has been changed. In this study, a new method based on optimized radial basis function neural network (RBFNN) is proposed for control chart patterns (CCPs) recognition. The proposed method consists of four main modules: feature extraction, feature selection, classification and learning algorithm. In the feature extraction module, shape and statistical features are used. Recently, various shape and statistical features have been presented for the CCPs recognition. In the feature selection module, the association rules (AR) method has been employed to select the best set of the shape and statistical features. In the classifier section, RBFNN is used and finally, in RBFNN, learning algorithm has a high impact on the network performance. Therefore, a new learning algorithm based on the bees algorithm has been used in the learning module. Most studies have considered only six patterns: Normal, Cyclic, Increasing Trend, Decreasing Trend, Upward Shift and Downward Shift. Since three patterns namely Normal, Stratification, and Systematic are very similar to each other and distinguishing them is very difficult, in most studies Stratification and Systematic have not been considered. Regarding to the continuous monitoring and control over the production process and the exact type detection of the problem encountered during the production process, eight patterns have been investigated in this study. The proposed method is tested on a dataset containing 1600 samples (200 samples from each pattern) and the results showed that the proposed method has a very good performance. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.

  17. Method matters: Experimental evidence for shorter avian sperm in faecal compared to abdominal massage samples

    PubMed Central

    Cockburn, Glenn; Sánchez-Tójar, Alfredo; Løvlie, Hanne; Schroeder, Julia

    2017-01-01

    Birds are model organisms in sperm biology. Previous work in zebra finches, suggested that sperm sampled from males' faeces and ejaculates do not differ in size. Here, we tested this assumption in a captive population of house sparrows, Passer domesticus. We compared sperm length in samples from three collection techniques: female dummy, faecal and abdominal massage samples. We found that sperm were significantly shorter in faecal than abdominal massage samples, which was explained by shorter heads and midpieces, but not flagella. This result might indicate that faecal sampled sperm could be less mature than sperm collected by abdominal massage. The female dummy method resulted in an insufficient number of experimental ejaculates because most males ignored it. In light of these results, we recommend using abdominal massage as a preferred method for avian sperm sampling. Where avian sperm cannot be collected by abdominal massage alone, we advise controlling for sperm sampling protocol statistically. PMID:28813481

  18. Intra-individual variation in urinary iodine concentration: effect of statistical correction on population distribution using seasonal three-consecutive-day spot urine in children

    PubMed Central

    Ji, Xiaohong; Liu, Peng; Sun, Zhenqi; Su, Xiaohui; Wang, Wei; Gao, Yanhui; Sun, Dianjun

    2016-01-01

    Objective To determine the effect of statistical correction for intra-individual variation on estimated urinary iodine concentration (UIC) by sampling on 3 consecutive days in four seasons in children. Setting School-aged children from urban and rural primary schools in Harbin, Heilongjiang, China. Participants 748 and 640 children aged 8–11 years were recruited from urban and rural schools, respectively, in Harbin. Primary and secondary outcome measures The spot urine samples were collected once a day for 3 consecutive days in each season over 1 year. The UIC of the first day was corrected by two statistical correction methods: the average correction method (average of days 1, 2; average of days 1, 2 and 3) and the variance correction method (UIC of day 1 corrected by two replicates and by three replicates). The variance correction method determined the SD between subjects (Sb) and within subjects (Sw), and calculated the correction coefficient (Fi), Fi=Sb/(Sb+Sw/di), where di was the number of observations. The UIC of day 1 was then corrected using the following equation: Results The variance correction methods showed the overall Fi was 0.742 for 2 days’ correction and 0.829 for 3 days’ correction; the values for the seasons spring, summer, autumn and winter were 0.730, 0.684, 0.706 and 0.703 for 2 days’ correction and 0.809, 0.742, 0.796 and 0.804 for 3 days’ correction, respectively. After removal of the individual effect, the correlation coefficient between consecutive days was 0.224, and between non-consecutive days 0.050. Conclusions The variance correction method is effective for correcting intra-individual variation in estimated UIC following sampling on 3 consecutive days in four seasons in children. The method varies little between ages, sexes and urban or rural setting, but does vary between seasons. PMID:26920442

  19. Sample size determination for mediation analysis of longitudinal data.

    PubMed

    Pan, Haitao; Liu, Suyu; Miao, Danmin; Yuan, Ying

    2018-03-27

    Sample size planning for longitudinal data is crucial when designing mediation studies because sufficient statistical power is not only required in grant applications and peer-reviewed publications, but is essential to reliable research results. However, sample size determination is not straightforward for mediation analysis of longitudinal design. To facilitate planning the sample size for longitudinal mediation studies with a multilevel mediation model, this article provides the sample size required to achieve 80% power by simulations under various sizes of the mediation effect, within-subject correlations and numbers of repeated measures. The sample size calculation is based on three commonly used mediation tests: Sobel's method, distribution of product method and the bootstrap method. Among the three methods of testing the mediation effects, Sobel's method required the largest sample size to achieve 80% power. Bootstrapping and the distribution of the product method performed similarly and were more powerful than Sobel's method, as reflected by the relatively smaller sample sizes. For all three methods, the sample size required to achieve 80% power depended on the value of the ICC (i.e., within-subject correlation). A larger value of ICC typically required a larger sample size to achieve 80% power. Simulation results also illustrated the advantage of the longitudinal study design. The sample size tables for most encountered scenarios in practice have also been published for convenient use. Extensive simulations study showed that the distribution of the product method and bootstrapping method have superior performance to the Sobel's method, but the product method was recommended to use in practice in terms of less computation time load compared to the bootstrapping method. A R package has been developed for the product method of sample size determination in mediation longitudinal study design.

  20. Evaluation of the direct and diffusion methods for the determination of fluoride content in table salt

    PubMed Central

    Martínez-Mier, E. Angeles; Soto-Rojas, Armando E.; Buckley, Christine M.; Margineda, Jorge; Zero, Domenick T.

    2010-01-01

    Objective The aim of this study was to assess methods currently used for analyzing fluoridated salt in order to identify the most useful method for this type of analysis. Basic research design Seventy-five fluoridated salt samples were obtained. Samples were analyzed for fluoride content, with and without pretreatment, using direct and diffusion methods. Element analysis was also conducted in selected samples. Fluoride was added to ultra pure NaCl and non-fluoridated commercial salt samples and Ca and Mg were added to fluoride samples in order to assess fluoride recoveries using modifications to the methods. Results Larger amounts of fluoride were found and recovered using diffusion than direct methods (96%–100% for diffusion vs. 67%–90% for direct). Statistically significant differences were obtained between direct and diffusion methods using different ion strength adjusters. Pretreatment methods reduced the amount of recovered fluoride. Determination of fluoride content was influenced both by the presence of NaCl and other ions in the salt. Conclusion Direct and diffusion techniques for analysis of fluoridated salt are suitable methods for fluoride analysis. The choice of method should depend on the purpose of the analysis. PMID:20088217

  1. Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development.

    PubMed

    Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R

    2016-12-01

    Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  2. DNA analysis in Disaster Victim Identification.

    PubMed

    Montelius, Kerstin; Lindblom, Bertil

    2012-06-01

    DNA profiling and matching is one of the primary methods to identify missing persons in a disaster, as defined by the Interpol Disaster Victim Identification Guide. The process to identify a victim by DNA includes: the collection of the best possible ante-mortem (AM) samples, the choice of post-mortem (PM) samples, DNA-analysis, matching and statistical weighting of the genetic relationship or match. Each disaster has its own scenario, and each scenario defines its own methods for identification of the deceased.

  3. A visual basic program to generate sediment grain-size statistics and to extrapolate particle distributions

    USGS Publications Warehouse

    Poppe, L.J.; Eliason, A.H.; Hastings, M.E.

    2004-01-01

    Measures that describe and summarize sediment grain-size distributions are important to geologists because of the large amount of information contained in textural data sets. Statistical methods are usually employed to simplify the necessary comparisons among samples and quantify the observed differences. The two statistical methods most commonly used by sedimentologists to describe particle distributions are mathematical moments (Krumbein and Pettijohn, 1938) and inclusive graphics (Folk, 1974). The choice of which of these statistical measures to use is typically governed by the amount of data available (Royse, 1970). If the entire distribution is known, the method of moments may be used; if the next to last accumulated percent is greater than 95, inclusive graphics statistics can be generated. Unfortunately, earlier programs designed to describe sediment grain-size distributions statistically do not run in a Windows environment, do not allow extrapolation of the distribution's tails, or do not generate both moment and graphic statistics (Kane and Hubert, 1963; Collias et al., 1963; Schlee and Webster, 1967; Poppe et al., 2000)1.Owing to analytical limitations, electro-resistance multichannel particle-size analyzers, such as Coulter Counters, commonly truncate the tails of the fine-fraction part of grain-size distributions. These devices do not detect fine clay in the 0.6–0.1 μm range (part of the 11-phi and all of the 12-phi and 13-phi fractions). Although size analyses performed down to 0.6 μm microns are adequate for most freshwater and near shore marine sediments, samples from many deeper water marine environments (e.g. rise and abyssal plain) may contain significant material in the fine clay fraction, and these analyses benefit from extrapolation.The program (GSSTAT) described herein generates statistics to characterize sediment grain-size distributions and can extrapolate the fine-grained end of the particle distribution. It is written in Microsoft Visual Basic 6.0 and provides a window to facilitate program execution. The input for the sediment fractions is weight percentages in whole-phi notation (Krumbein, 1934; Inman, 1952), and the program permits the user to select output in either method of moments or inclusive graphics statistics (Fig. 1). Users select options primarily with mouse-click events, or through interactive dialogue boxes.

  4. Radioactivity Registered With a Small Number of Events

    NASA Astrophysics Data System (ADS)

    Zlokazov, Victor; Utyonkov, Vladimir

    2018-02-01

    The synthesis of superheavy elements asks for the analysis of low statistics experimental data presumably obeying an unknown exponential distribution and to take the decision whether they originate from one source or have admixtures. Here we analyze predictions following from non-parametrical methods, employing only such fundamental sample properties as the sample mean, the median and the mode.

  5. Determining sample size for tree utilization surveys

    Treesearch

    Stanley J. Zarnoch; James W. Bentley; Tony G. Johnson

    2004-01-01

    The U.S. Department of Agriculture Forest Service has conducted many studies to determine what proportion of the timber harvested in the South is actually utilized. This paper describes the statistical methods used to determine required sample sizes for estimating utilization ratios for a required level of precision. The data used are those for 515 hardwood and 1,557...

  6. A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

    PubMed Central

    Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

    2016-01-01

    Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330

  7. On the assessment of the added value of new predictive biomarkers.

    PubMed

    Chen, Weijie; Samuelson, Frank W; Gallas, Brandon D; Kang, Le; Sahiner, Berkman; Petrick, Nicholas

    2013-07-29

    The surge in biomarker development calls for research on statistical evaluation methodology to rigorously assess emerging biomarkers and classification models. Recently, several authors reported the puzzling observation that, in assessing the added value of new biomarkers to existing ones in a logistic regression model, statistical significance of new predictor variables does not necessarily translate into a statistically significant increase in the area under the ROC curve (AUC). Vickers et al. concluded that this inconsistency is because AUC "has vastly inferior statistical properties," i.e., it is extremely conservative. This statement is based on simulations that misuse the DeLong et al. method. Our purpose is to provide a fair comparison of the likelihood ratio (LR) test and the Wald test versus diagnostic accuracy (AUC) tests. We present a test to compare ideal AUCs of nested linear discriminant functions via an F test. We compare it with the LR test and the Wald test for the logistic regression model. The null hypotheses of these three tests are equivalent; however, the F test is an exact test whereas the LR test and the Wald test are asymptotic tests. Our simulation shows that the F test has the nominal type I error even with a small sample size. Our results also indicate that the LR test and the Wald test have inflated type I errors when the sample size is small, while the type I error converges to the nominal value asymptotically with increasing sample size as expected. We further show that the DeLong et al. method tests a different hypothesis and has the nominal type I error when it is used within its designed scope. Finally, we summarize the pros and cons of all four methods we consider in this paper. We show that there is nothing inherently less powerful or disagreeable about ROC analysis for showing the usefulness of new biomarkers or characterizing the performance of classification models. Each statistical method for assessing biomarkers and classification models has its own strengths and weaknesses. Investigators need to choose methods based on the assessment purpose, the biomarker development phase at which the assessment is being performed, the available patient data, and the validity of assumptions behind the methodologies.

  8. Statistical Methods in Ai: Rare Event Learning Using Associative Rules and Higher-Order Statistics

    NASA Astrophysics Data System (ADS)

    Iyer, V.; Shetty, S.; Iyengar, S. S.

    2015-07-01

    Rare event learning has not been actively researched since lately due to the unavailability of algorithms which deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from a perspective of real-time algorithms. This computing framework is independent of the number of input samples, application domain, labelled or label-less streams. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using ensemble of trees using bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove that a temporal window based sampling from sensor data streams converges after n samples using Hoeffding bounds. Which can be used for fast prediction of new samples in real-time. The Data-cleaning tree model uses a nonparametric node splitting technique, which can be learned in an iterative way which scales linearly in memory consumption for any size input stream. The improved task based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets the explicit rule learning computation is linear in time and is only dependent on the number of leafs present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields minimum number (m) of leafs keeping pre-processing computation to n × t log m compared to N2 for Gram Matrix. We also show that the task based feature induction yields higher Qualify of Data (QoD) in the feature space compared to kernel methods using Gram Matrix.

  9. Application of Bayesian methods to habitat selection modeling of the northern spotted owl in California: new statistical methods for wildlife research

    Treesearch

    Howard B. Stauffer; Cynthia J. Zabel; Jeffrey R. Dunk

    2005-01-01

    We compared a set of competing logistic regression habitat selection models for Northern Spotted Owls (Strix occidentalis caurina) in California. The habitat selection models were estimated, compared, evaluated, and tested using multiple sample datasets collected on federal forestlands in northern California. We used Bayesian methods in interpreting...

  10. Ocimum basilicum L.: phenolic profile and antioxidant-related activity.

    PubMed

    Dorman, H J Damien; Hiltunen, Raimo

    2010-01-01

    Ocimum basilicum L. leaf material was extracted by maceration with (80:20:1 v/v/v) methanol: water: acetic acid to produce a crude extract (CE), which was further fractionated by liquid-liquid extraction to isolate light petroleum (PE), ethyl acetate (EtOAc), n-butanol (n-BuOH) and H2O-soluble sub-fractions. The total phenol and flavonoid contents of the resulting samples were estimated using colorimetric-based methods, and their iron(III) reductive and free radical scavenging activities were determined in a battery of in vitro assays. The CE and sub-fractions contained phenolic compounds and flavonoids. The samples, except for PE, gave a positive result for the presence of flavones and flavonols; however, flavanones only appeared to be present in the CE. In iron(III) reduction, CE and n-BuOH were the most potent followed by EtOAc and H2O (statistically indistinguishable, p > 0.05). However, in the ferric reducing antioxidant power assay, H2O was the most potent followed by CE and EtOAc (statistically indistinguishable, p > 0.05) and n-BuOH and PE. In 1,1-diphenyl-2-picrylhydrazyl scavenging, all the samples, except PE, were effective against this reactive nitrogen species, with CE, EtOAc and n-BuOH being the most potent (statistically indistinguishable, p > 0.05). In alkylperoxyl scavenging, all the samples, except for PE, were effective against this reactive oxygen species (ROS). In superoxide anion scavenging, all the samples were capable of scavenging this ROS with CE being the most effective, followed by n-BuOH and H2O (statistically indistinguishable, p > 0.05) and EtOAc and PE. Similarly, in hydroxyl scavenging, all the samples were capable of scavenging this ROS with CE and n-BuOH being the most effective (statistically indistinguishable, p > 0.05) followed by EtOAc and H2O (statistically indistinguishable, p > 0.05) and PE.

  11. Application of PCA and SIMCA statistical analysis of FT-IR spectra for the classification and identification of different slag types with environmental origin.

    PubMed

    Stumpe, B; Engel, T; Steinweg, B; Marschner, B

    2012-04-03

    In the past, different slag materials were often used for landscaping and construction purposes or simply dumped. Nowadays German environmental laws strictly control the use of slags, but there is still a remaining part of 35% which is uncontrolled dumped in landfills. Since some slags have high heavy metal contents and different slag types have typical chemical and physical properties that will influence the risk potential and other characteristics of the deposits, an identification of the slag types is needed. We developed a FT-IR-based statistical method to identify different slags classes. Slags samples were collected at different sites throughout various cities within the industrial Ruhr area. Then, spectra of 35 samples from four different slags classes, ladle furnace (LF), blast furnace (BF), oxygen furnace steel (OF), and zinc furnace slags (ZF), were determined in the mid-infrared region (4000-400 cm(-1)). The spectra data sets were subject to statistical classification methods for the separation of separate spectral data of different slag classes. Principal component analysis (PCA) models for each slag class were developed and further used for soft independent modeling of class analogy (SIMCA). Precise classification of slag samples into four different slag classes were achieved using two different SIMCA models stepwise. At first, SIMCA 1 was used for classification of ZF as well as OF slags over the total spectral range. If no correct classification was found, then the spectrum was analyzed with SIMCA 2 at reduced wavenumbers for the classification of LF as well as BF spectra. As a result, we provide a time- and cost-efficient method based on FT-IR spectroscopy for processing and identifying large numbers of environmental slag samples.

  12. Quantifying Cartilage Contact Modulus, Tension Modulus, and Permeability With Hertzian Biphasic Creep

    PubMed Central

    Moore, A. C.; DeLucca, J. F.; Elliott, D. M.; Burris, D. L.

    2016-01-01

    This paper describes a new method, based on a recent analytical model (Hertzian biphasic theory (HBT)), to simultaneously quantify cartilage contact modulus, tension modulus, and permeability. Standard Hertzian creep measurements were performed on 13 osteochondral samples from three mature bovine stifles. Each creep dataset was fit for material properties using HBT. A subset of the dataset (N = 4) was also fit using Oyen's method and FEBio, an open-source finite element package designed for soft tissue mechanics. The HBT method demonstrated statistically significant sensitivity to differences between cartilage from the tibial plateau and cartilage from the femoral condyle. Based on the four samples used for comparison, no statistically significant differences were detected between properties from the HBT and FEBio methods. While the finite element method is considered the gold standard for analyzing this type of contact, the expertise and time required to setup and solve can be prohibitive, especially for large datasets. The HBT method agreed quantitatively with FEBio but also offers ease of use by nonexperts, rapid solutions, and exceptional fit quality (R2 = 0.999 ± 0.001, N = 13). PMID:27536012

  13. Mapping of epistatic quantitative trait loci in four-way crosses.

    PubMed

    He, Xiao-Hong; Qin, Hongde; Hu, Zhongli; Zhang, Tianzhen; Zhang, Yuan-Ming

    2011-01-01

    Four-way crosses (4WC) involving four different inbred lines often appear in plant and animal commercial breeding programs. Direct mapping of quantitative trait loci (QTL) in these commercial populations is both economical and practical. However, the existing statistical methods for mapping QTL in a 4WC population are built on the single-QTL genetic model. This simple genetic model fails to take into account QTL interactions, which play an important role in the genetic architecture of complex traits. In this paper, therefore, we attempted to develop a statistical method to detect epistatic QTL in 4WC population. Conditional probabilities of QTL genotypes, computed by the multi-point single locus method, were used to sample the genotypes of all putative QTL in the entire genome. The sampled genotypes were used to construct the design matrix for QTL effects. All QTL effects, including main and epistatic effects, were simultaneously estimated by the penalized maximum likelihood method. The proposed method was confirmed by a series of Monte Carlo simulation studies and real data analysis of cotton. The new method will provide novel tools for the genetic dissection of complex traits, construction of QTL networks, and analysis of heterosis.

  14. Assessment of variations in thermal cycle life data of thermal barrier coated rods

    NASA Astrophysics Data System (ADS)

    Hendricks, R. C.; McDonald, G.

    An analysis of thermal cycle life data for 22 thermal barrier coated (TBC) specimens was conducted. The Zr02-8Y203/NiCrAlY plasma spray coated Rene 41 rods were tested in a Mach 0.3 Jet A/air burner flame. All specimens were subjected to the same coating and subsequent test procedures in an effort to control three parametric groups; material properties, geometry and heat flux. Statistically, the data sample space had a mean of 1330 cycles with a standard deviation of 520 cycles. The data were described by normal or log-normal distributions, but other models could also apply; the sample size must be increased to clearly delineate a statistical failure model. The statistical methods were also applied to adhesive/cohesive strength data for 20 TBC discs of the same composition, with similar results. The sample space had a mean of 9 MPa with a standard deviation of 4.2 MPa.

  15. Assessment of variations in thermal cycle life data of thermal barrier coated rods

    NASA Technical Reports Server (NTRS)

    Hendricks, R. C.; Mcdonald, G.

    1981-01-01

    An analysis of thermal cycle life data for 22 thermal barrier coated (TBC) specimens was conducted. The Zr02-8Y203/NiCrAlY plasma spray coated Rene 41 rods were tested in a Mach 0.3 Jet A/air burner flame. All specimens were subjected to the same coating and subsequent test procedures in an effort to control three parametric groups; material properties, geometry and heat flux. Statistically, the data sample space had a mean of 1330 cycles with a standard deviation of 520 cycles. The data were described by normal or log-normal distributions, but other models could also apply; the sample size must be increased to clearly delineate a statistical failure model. The statistical methods were also applied to adhesive/cohesive strength data for 20 TBC discs of the same composition, with similar results. The sample space had a mean of 9 MPa with a standard deviation of 4.2 MPa.

  16. Evaluation of Lysis Methods for the Extraction of Bacterial DNA for Analysis of the Vaginal Microbiota.

    PubMed

    Gill, Christina; van de Wijgert, Janneke H H M; Blow, Frances; Darby, Alistair C

    2016-01-01

    Recent studies on the vaginal microbiota have employed molecular techniques such as 16S rRNA gene sequencing to describe the bacterial community as a whole. These techniques require the lysis of bacterial cells to release DNA before purification and PCR amplification of the 16S rRNA gene. Currently, methods for the lysis of bacterial cells are not standardised and there is potential for introducing bias into the results if some bacterial species are lysed less efficiently than others. This study aimed to compare the results of vaginal microbiota profiling using four different pretreatment methods for the lysis of bacterial samples (30 min of lysis with lysozyme, 16 hours of lysis with lysozyme, 60 min of lysis with a mixture of lysozyme, mutanolysin and lysostaphin and 30 min of lysis with lysozyme followed by bead beating) prior to chemical and enzyme-based DNA extraction with a commercial kit. After extraction, DNA yield did not significantly differ between methods with the exception of lysis with lysozyme combined with bead beating which produced significantly lower yields when compared to lysis with the enzyme cocktail or 30 min lysis with lysozyme only. However, this did not result in a statistically significant difference in the observed alpha diversity of samples. The beta diversity (Bray-Curtis dissimilarity) between different lysis methods was statistically significantly different, but this difference was small compared to differences between samples, and did not affect the grouping of samples with similar vaginal bacterial community structure by hierarchical clustering. An understanding of how laboratory methods affect the results of microbiota studies is vital in order to accurately interpret the results and make valid comparisons between studies. Our results indicate that the choice of lysis method does not prevent the detection of effects relating to the type of vaginal bacterial community one of the main outcome measures of epidemiological studies. However, we recommend that the same method is used on all samples within a particular study.

  17. TNO/Centaurs grouping tested with asteroid data sets

    NASA Astrophysics Data System (ADS)

    Fulchignoni, M.; Birlan, M.; Barucci, M. A.

    2001-11-01

    Recently, we have discussed the possible subdivision in few groups of a sample of 22 TNO and Centaurs for which the BVRIJ photometry were available (Barucci et al., 2001, A&A, 371,1150). We obtained this results using the multivariate statistics adopted to define the current asteroid taxonomy, namely the Principal Components Analysis and the G-mode method (Tholen & Barucci, 1989, in ASTEROIDS II). How these methods work with a very small statistical sample as the TNO/Centaurs one? Theoretically, the number of degrees of freedom of the sample is correct. In fact it is 88 in our case and have to be larger then 50 to cope with the requirements of the G-mode. Does the random sampling of the small number of members of a large population contain enough information to reveal some structure in the population? We extracted several samples of 22 asteroids out of a data-base of 86 objects of known taxonomic type for which BVRIJ photometry is available from ECAS (Zellner et al. 1985, ICARUS 61, 355), SMASS II (S.W. Bus, 1999, PhD Thesis, MIT), and the Bell et al. Atlas of the asteroid infrared spectra. The objects constituting the first sample were selected in order to give a good representation of the major asteroid taxonomic classes (at least three samples each class): C,S,D,A, and G. Both methods were able to distinguish all these groups confirming the validity of the adopted methods. The S class is hard to individuate as a consequence of the choice of I and J variables, which imply a lack of information on the absorption band at 1 micron. The other samples were obtained by random choice of the objects. Not all the major groups were well represented (less than three samples per groups), but the general trend of the asteroid taxonomy has been always obtained. We conclude that the quoted grouping of TNO/Centaurs is representative of some physico-chemical structure of the outer solar system small body population.

  18. Differentiating gold nanorod samples using particle size and shape distributions from transmission electron microscope images

    NASA Astrophysics Data System (ADS)

    Grulke, Eric A.; Wu, Xiaochun; Ji, Yinglu; Buhr, Egbert; Yamamoto, Kazuhiro; Song, Nam Woong; Stefaniak, Aleksandr B.; Schwegler-Berry, Diane; Burchett, Woodrow W.; Lambert, Joshua; Stromberg, Arnold J.

    2018-04-01

    Size and shape distributions of gold nanorod samples are critical to their physico-chemical properties, especially their longitudinal surface plasmon resonance. This interlaboratory comparison study developed methods for measuring and evaluating size and shape distributions for gold nanorod samples using transmission electron microscopy (TEM) images. The objective was to determine whether two different samples, which had different performance attributes in their application, were different with respect to their size and/or shape descriptor distributions. Touching particles in the captured images were identified using a ruggedness shape descriptor. Nanorods could be distinguished from nanocubes using an elongational shape descriptor. A non-parametric statistical test showed that cumulative distributions of an elongational shape descriptor, that is, the aspect ratio, were statistically different between the two samples for all laboratories. While the scale parameters of size and shape distributions were similar for both samples, the width parameters of size and shape distributions were statistically different. This protocol fulfills an important need for a standardized approach to measure gold nanorod size and shape distributions for applications in which quantitative measurements and comparisons are important. Furthermore, the validated protocol workflow can be automated, thus providing consistent and rapid measurements of nanorod size and shape distributions for researchers, regulatory agencies, and industry.

  19. A Current Application of the Methods of Secular and Statistical Parallax

    NASA Astrophysics Data System (ADS)

    Turner, D. G.

    The methods of secular and statistical parallax for homogeneous groups of Galactic stars are applied in a practical (classroom) exercise to establish the luminosity of bright B3 V stars. The solar motion of 20 km s-1 relative to group stars exceeds their random velocities of ±10 km s-1, a condition adopted for preference of secular parallax to statistical parallax. The group parallax of πups = 5.81 ± 0.83 mas and derived luminosity MV = -0.98 ± 0.33 for B3 V stars from upsilon components of proper motion should be close to the true value. The weighted mean Hipparcos parallax of ±Hip = 5.75±0.27 mas for the same sample, and implied luminosity of MV = -1.00 ± 0.15, confirm the secular parallax solution. Both solutions are close to MV = -0.83 for ZAMS stars of the same type, implying that Malmquist bias in the selection of stars mainly accounts for the presence of unresolved binaries, slightly evolved objects, and rapidly rotating stars in the sample.

  20. Improved sample preparation of glyphosate and methylphosphonic acid by EPA method 6800A and time-of-flight mass spectrometry using novel solid-phase extraction.

    PubMed

    Wagner, Rebecca; Wetzel, Stephanie J; Kern, John; Kingston, H M Skip

    2012-02-01

    The employment of chemical weapons by rogue states and/or terrorist organizations is an ongoing concern in the United States. The quantitative analysis of nerve agents must be rapid and reliable for use in the private and public sectors. Current methods describe a tedious and time-consuming derivatization for gas chromatography-mass spectrometry and liquid chromatography in tandem with mass spectrometry. Two solid-phase extraction (SPE) techniques for the analysis of glyphosate and methylphosphonic acid are described with the utilization of isotopically enriched analytes for quantitation via atmospheric pressure chemical ionization-quadrupole time-of-flight mass spectrometry (APCI-Q-TOF-MS) that does not require derivatization. Solid-phase extraction-isotope dilution mass spectrometry (SPE-IDMS) involves pre-equilibration of a naturally occurring sample with an isotopically enriched standard. The second extraction method, i-Spike, involves loading an isotopically enriched standard onto the SPE column before the naturally occurring sample. The sample and the spike are then co-eluted from the column enabling precise and accurate quantitation via IDMS. The SPE methods in conjunction with IDMS eliminate concerns of incomplete elution, matrix and sorbent effects, and MS drift. For accurate quantitation with IDMS, the isotopic contribution of all atoms in the target molecule must be statistically taken into account. This paper describes two newly developed sample preparation techniques for the analysis of nerve agent surrogates in drinking water as well as statistical probability analysis for proper molecular IDMS. The methods described in this paper demonstrate accurate molecular IDMS using APCI-Q-TOF-MS with limits of quantitation as low as 0.400 mg/kg for glyphosate and 0.031 mg/kg for methylphosphonic acid. Copyright © 2012 John Wiley & Sons, Ltd.

  1. Probabilistic assessment method of the non-monotonic dose-responses-Part I: Methodological approach.

    PubMed

    Chevillotte, Grégoire; Bernard, Audrey; Varret, Clémence; Ballet, Pascal; Bodin, Laurent; Roudot, Alain-Claude

    2017-08-01

    More and more studies aim to characterize non-monotonic dose response curves (NMDRCs). The greatest difficulty is to assess the statistical plausibility of NMDRCs from previously conducted dose response studies. This difficulty is linked to the fact that these studies present (i) few doses tested, (ii) a low sample size per dose, and (iii) the absence of any raw data. In this study, we propose a new methodological approach to probabilistically characterize NMDRCs. The methodology is composed of three main steps: (i) sampling from summary data to cover all the possibilities that may be presented by the responses measured by dose and to obtain a new raw database, (ii) statistical analysis of each sampled dose-response curve to characterize the slopes and their signs, and (iii) characterization of these dose-response curves according to the variation of the sign in the slope. This method allows characterizing all types of dose-response curves and can be applied both to continuous data and to discrete data. The aim of this study is to present the general principle of this probabilistic method which allows to assess the non-monotonic dose responses curves, and to present some results. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. Chemometrics-based process analytical technology (PAT) tools: applications and adaptation in pharmaceutical and biopharmaceutical industries.

    PubMed

    Challa, Shruthi; Potumarthi, Ravichandra

    2013-01-01

    Process analytical technology (PAT) is used to monitor and control critical process parameters in raw materials and in-process products to maintain the critical quality attributes and build quality into the product. Process analytical technology can be successfully implemented in pharmaceutical and biopharmaceutical industries not only to impart quality into the products but also to prevent out-of-specifications and improve the productivity. PAT implementation eliminates the drawbacks of traditional methods which involves excessive sampling and facilitates rapid testing through direct sampling without any destruction of sample. However, to successfully adapt PAT tools into pharmaceutical and biopharmaceutical environment, thorough understanding of the process is needed along with mathematical and statistical tools to analyze large multidimensional spectral data generated by PAT tools. Chemometrics is a chemical discipline which incorporates both statistical and mathematical methods to obtain and analyze relevant information from PAT spectral tools. Applications of commonly used PAT tools in combination with appropriate chemometric method along with their advantages and working principle are discussed. Finally, systematic application of PAT tools in biopharmaceutical environment to control critical process parameters for achieving product quality is diagrammatically represented.

  3. Statistical approaches to account for false-positive errors in environmental DNA samples.

    PubMed

    Lahoz-Monfort, José J; Guillera-Arroita, Gurutzeta; Tingley, Reid

    2016-05-01

    Environmental DNA (eDNA) sampling is prone to both false-positive and false-negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false-positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false-positive rates. We advocate alternative approaches to account for false-positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false-negative and false-positive errors, the methods presented here should be more routinely adopted in eDNA studies. © 2015 John Wiley & Sons Ltd.

  4. Comparison of Two Methods for the Isolation of Salmonellae From Imported Foods

    PubMed Central

    Taylor, Welton I.; Hobbs, Betty C.; Smith, Muriel E.

    1964-01-01

    Two methods for the detection of salmonellae in foods were compared in 179 imported meat and egg samples. The number of positive samples and replications, and the number of strains and kinds of serotypes were statistically comparable by both the direct enrichment method of the Food Hygiene Laboratory in England, and the pre-enrichment method devised for processed foods in the United States. Boneless frozen beef, veal, and horsemeat imported from five countries for consumption in England were found to have salmonellae present in 48 of 116 (41%) samples. Dried egg products imported from three countries were observed to have salmonellae in 10 of 63 (16%) samples. The high incidence of salmonellae isolated from imported foods illustrated the existence of an international health hazard resulting from the continuous introduction of exogenous strains of pathogenic microorganisms on a large scale. PMID:14106941

  5. Statistical CT noise reduction with multiscale decomposition and penalized weighted least squares in the projection domain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang Shaojie; Tang Xiangyang; School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, Shaanxi 710121

    2012-09-15

    Purposes: The suppression of noise in x-ray computed tomography (CT) imaging is of clinical relevance for diagnostic image quality and the potential for radiation dose saving. Toward this purpose, statistical noise reduction methods in either the image or projection domain have been proposed, which employ a multiscale decomposition to enhance the performance of noise suppression while maintaining image sharpness. Recognizing the advantages of noise suppression in the projection domain, the authors propose a projection domain multiscale penalized weighted least squares (PWLS) method, in which the angular sampling rate is explicitly taken into consideration to account for the possible variation ofmore » interview sampling rate in advanced clinical or preclinical applications. Methods: The projection domain multiscale PWLS method is derived by converting an isotropic diffusion partial differential equation in the image domain into the projection domain, wherein a multiscale decomposition is carried out. With adoption of the Markov random field or soft thresholding objective function, the projection domain multiscale PWLS method deals with noise at each scale. To compensate for the degradation in image sharpness caused by the projection domain multiscale PWLS method, an edge enhancement is carried out following the noise reduction. The performance of the proposed method is experimentally evaluated and verified using the projection data simulated by computer and acquired by a CT scanner. Results: The preliminary results show that the proposed projection domain multiscale PWLS method outperforms the projection domain single-scale PWLS method and the image domain multiscale anisotropic diffusion method in noise reduction. In addition, the proposed method can preserve image sharpness very well while the occurrence of 'salt-and-pepper' noise and mosaic artifacts can be avoided. Conclusions: Since the interview sampling rate is taken into account in the projection domain multiscale decomposition, the proposed method is anticipated to be useful in advanced clinical and preclinical applications where the interview sampling rate varies.« less

  6. Design, analysis, and interpretation of field quality-control data for water-sampling projects

    USGS Publications Warehouse

    Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.

    2015-01-01

    The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.

  7. Is using multiple imputation better than complete case analysis for estimating a prevalence (risk) difference in randomized controlled trials when binary outcome observations are missing?

    PubMed

    Mukaka, Mavuto; White, Sarah A; Terlouw, Dianne J; Mwapasa, Victor; Kalilani-Phiri, Linda; Faragher, E Brian

    2016-07-22

    Missing outcomes can seriously impair the ability to make correct inferences from randomized controlled trials (RCTs). Complete case (CC) analysis is commonly used, but it reduces sample size and is perceived to lead to reduced statistical efficiency of estimates while increasing the potential for bias. As multiple imputation (MI) methods preserve sample size, they are generally viewed as the preferred analytical approach. We examined this assumption, comparing the performance of CC and MI methods to determine risk difference (RD) estimates in the presence of missing binary outcomes. We conducted simulation studies of 5000 simulated data sets with 50 imputations of RCTs with one primary follow-up endpoint at different underlying levels of RD (3-25 %) and missing outcomes (5-30 %). For missing at random (MAR) or missing completely at random (MCAR) outcomes, CC method estimates generally remained unbiased and achieved precision similar to or better than MI methods, and high statistical coverage. Missing not at random (MNAR) scenarios yielded invalid inferences with both methods. Effect size estimate bias was reduced in MI methods by always including group membership even if this was unrelated to missingness. Surprisingly, under MAR and MCAR conditions in the assessed scenarios, MI offered no statistical advantage over CC methods. While MI must inherently accompany CC methods for intention-to-treat analyses, these findings endorse CC methods for per protocol risk difference analyses in these conditions. These findings provide an argument for the use of the CC approach to always complement MI analyses, with the usual caveat that the validity of the mechanism for missingness be thoroughly discussed. More importantly, researchers should strive to collect as much data as possible.

  8. Inverse probability weighting and doubly robust methods in correcting the effects of non-response in the reimbursed medication and self-reported turnout estimates in the ATH survey.

    PubMed

    Härkänen, Tommi; Kaikkonen, Risto; Virtala, Esa; Koskinen, Seppo

    2014-11-06

    To assess the nonresponse rates in a questionnaire survey with respect to administrative register data, and to correct the bias statistically. The Finnish Regional Health and Well-being Study (ATH) in 2010 was based on a national sample and several regional samples. Missing data analysis was based on socio-demographic register data covering the whole sample. Inverse probability weighting (IPW) and doubly robust (DR) methods were estimated using the logistic regression model, which was selected using the Bayesian information criteria. The crude, weighted and true self-reported turnout in the 2008 municipal election and prevalences of entitlements to specially reimbursed medication, and the crude and weighted body mass index (BMI) means were compared. The IPW method appeared to remove a relatively large proportion of the bias compared to the crude prevalence estimates of the turnout and the entitlements to specially reimbursed medication. Several demographic factors were shown to be associated with missing data, but few interactions were found. Our results suggest that the IPW method can improve the accuracy of results of a population survey, and the model selection provides insight into the structure of missing data. However, health-related missing data mechanisms are beyond the scope of statistical methods, which mainly rely on socio-demographic information to correct the results.

  9. Increased recovery of touch DNA evidence using FTA paper compared to conventional collection methods.

    PubMed

    Kirgiz, Irina A; Calloway, Cassandra

    2017-04-01

    Tape lifting and FTA paper scraping methods were directly compared to traditional double swabbing for collecting touch DNA from car steering wheels (n = 70 cars). Touch DNA was collected from the left or right side of each steering wheel (randomized) using two sterile cotton swabs, while the other side was sampled using water-soluble tape or FTA paper cards. DNA was extracted and quantified in duplicate using qPCR. Quantifiable amounts of DNA were detected for 100% of the samples (n = 140) collected independent of the method. However, the DNA collection yield was dependent on the collection method. A statistically significant difference in DNA yield was observed between FTA scraping and double swabbing methods (p = 0.0051), with FTA paper collecting a two-fold higher amount. Statistical analysis showed no significant difference in DNA yields between the double swabbing and tape lifting techniques (p = 0.21). Based on the DNA concentration required for 1 ng input, 47% of the samples collected using FTA paper would be expected to yield a short tandem repeat (STR) profile compared to 30% and 23% using double swabbing or tape, respectively. Further, 55% and 77% of the samples collected using double swabbing or tape, respectively, did not yield a high enough DNA concentration for the 0.5 ng of DNA input recommended for conventional STR kits and would be expected to result in a partial or no profile compared to 35% of the samples collected using FTA paper. STR analysis was conducted for a subset of the higher concentrated samples to confirm that the DNA collected from the steering wheel was from the driver. 32 samples were selected with DNA amounts of at least 1 ng total DNA (100 pg/μl when concentrated if required). A mixed STR profile was observed for 26 samples (88%) and the last driver was the major DNA contributor for 29 samples (94%). For one sample, the last driver was the minor DNA contributor. A full STR profile of the last driver was observed for 21 samples (69%) and a partial profile was observed for nine samples (25%); STR analysis failed for two samples collected using tape (6%). In conclusion, we show that the FTA paper scraping method has the potential to collect higher DNA yields from touch DNA evidence deposited on non-porous surfaces often encountered in criminal cases compared to conventional methods. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  10. Identification of abnormal accident patterns at intersections

    DOT National Transportation Integrated Search

    1999-08-01

    This report presents the findings and recommendations based on the Identification of Abnormal Accident Patterns at Intersections. This project used a statistically valid sampling method to determine whether a specific intersection has an abnormally h...

  11. Individual snag detection using neighborhood attribute filtered airborne lidar data

    Treesearch

    Brian M. Wing; Martin W. Ritchie; Kevin Boston; Warren B. Cohen; Michael J. Olsen

    2015-01-01

    The ability to estimate and monitor standing dead trees (snags) has been difficult due to their irregular and sparse distribution, often requiring intensive sampling methods to obtain statistically significant estimates. This study presents a new method for estimating and monitoring snags using neighborhood attribute filtered airborne discrete-return lidar data. The...

  12. Random Numbers and Monte Carlo Methods

    NASA Astrophysics Data System (ADS)

    Scherer, Philipp O. J.

    Many-body problems often involve the calculation of integrals of very high dimension which cannot be treated by standard methods. For the calculation of thermodynamic averages Monte Carlo methods are very useful which sample the integration volume at randomly chosen points. After summarizing some basic statistics, we discuss algorithms for the generation of pseudo-random numbers with given probability distribution which are essential for all Monte Carlo methods. We show how the efficiency of Monte Carlo integration can be improved by sampling preferentially the important configurations. Finally the famous Metropolis algorithm is applied to classical many-particle systems. Computer experiments visualize the central limit theorem and apply the Metropolis method to the traveling salesman problem.

  13. Statistical Analysis of Adaptive Beam-Forming Methods

    DTIC Science & Technology

    1988-05-01

    minimum amount of computing resources? * What are the tradeoffs being made when a system design selects block averaging over exponential averaging? Will...understood by many signal processing practitioners, however, is how system parameters and the number of sensors effect the distribution of the... system performance improve and if so by how much? b " It is well known that the noise sampled at adjacent sensors is not statistically independent

  14. Challenges of Big Data Analysis.

    PubMed

    Fan, Jianqing; Han, Fang; Liu, Han

    2014-06-01

    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottleneck, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinguished and require new computational and statistical paradigm. This article gives overviews on the salient features of Big Data and how these features impact on paradigm change on statistical and computational methods as well as computing architectures. We also provide various new perspectives on the Big Data analysis and computation. In particular, we emphasize on the viability of the sparsest solution in high-confidence set and point out that exogeneous assumptions in most statistical methods for Big Data can not be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.

  15. Significance levels for studies with correlated test statistics.

    PubMed

    Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S

    2008-07-01

    When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.

  16. Challenges of Big Data Analysis

    PubMed Central

    Fan, Jianqing; Han, Fang; Liu, Han

    2014-01-01

    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottleneck, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinguished and require new computational and statistical paradigm. This article gives overviews on the salient features of Big Data and how these features impact on paradigm change on statistical and computational methods as well as computing architectures. We also provide various new perspectives on the Big Data analysis and computation. In particular, we emphasize on the viability of the sparsest solution in high-confidence set and point out that exogeneous assumptions in most statistical methods for Big Data can not be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions. PMID:25419469

  17. A proposed method to minimize waste from institutional radiation safety surveillance programs through the application of expected value statistics.

    PubMed

    Emery, R J

    1997-03-01

    Institutional radiation safety programs routinely use wipe test sampling and liquid scintillation counting analysis to indicate the presence of removable radioactive contamination. Significant volumes of liquid waste can be generated by such surveillance activities, and the subsequent disposal of these materials can sometimes be difficult and costly. In settings where large numbers of negative results are regularly obtained, the limited grouping of samples for analysis based on expected value statistical techniques is possible. To demonstrate the plausibility of the approach, single wipe samples exposed to varying amounts of contamination were analyzed concurrently with nine non-contaminated samples. Although the sample grouping inevitably leads to increased quenching with liquid scintillation counting systems, the effect did not impact the ability to detect removable contamination in amounts well below recommended action levels. Opportunities to further improve this cost effective semi-quantitative screening procedure are described, including improvements in sample collection procedures, enhancing sample-counting media contact through mixing and extending elution periods, increasing sample counting times, and adjusting institutional action levels.

  18. Comparison of the effects of filtration and preservation methods on analyses for strontium-90 in ground water

    USGS Publications Warehouse

    Knobel, L.L.; DeWayne, Cecil L.; Wegner, S.J.; Moore, L.L.

    1992-01-01

    From 1952 to 1988, about 140 curies of strontium-90 were discharged in liquid waste to disposal ponds and wells at the INEL (Idaho National Engineering Laboratory). Water from four wells was sampled as part of the U.S. Geological Survey's quality-assurance program to evaluate the effects of filtration and preservation methods on strontium-90 concentrations in ground water at the INEL. Water from each well was filtered through eithera 0.45- or a 0.1-micrometer membrane filter; unfiltered samples also were collected. Two sets of filtered and two sets of unfiltered water samples were collected at each well. One of the two sets of water samples was field acidified. Strontium-90 concentrations ranged from below the reporting level to 52 ?? 4 picocuries per liter. Descriptive statistics were used to determine reproducibility of the analytical results for strontium-90 concentrations in water from each well. Comparisons were made with unfiltered, acidified samples at each well. Analytical results for strontium-90 concentrations in water from well 88 were not in statistical agreement between the unfiltered, acidified sample and the filtered (0.45 micrometer), acidified sample. The strontium-90 concentration for water from well 88 was less than the reporting level. For water from wells with strontium-90 concentrations at or above the reporting level, 94 percent or more of the strontium-90 is in true solution or in colloidal particles smaller than 0.1 micrometer. These results suggest that changes in filtration and preservation methods used for sample collection do not significantly affect reproducibility of strontium-90 analyses in ground water at the INEL.

  19. Comparison of the effects of filtration and preservation methods on analyses for strontium-90 in ground water.

    PubMed

    Knobel, L L; Cecil, L D; Wegner, S J; Moore, L L

    1992-01-01

    From 1952 to 1988, about 140 curies of strontium-90 were discharged in liquid waste to disposal ponds and wells at the INEL (Idaho National Engineering Laboratory). Water from four wells was sampled as part of the U.S. Geological Survey's quality-assurance program to evaluate the effects of filtration and preservation methods on strontium-90 concentrations in ground water at the INEL. Water from each well was filtered through either a 0.45- or a 0.1-micrometer membrane filter; unfiltered samples also were collected. Two sets of filtered and two sets of unfiltered water samples were collected at each well. One of the two sets of water samples was field acidified.Strontium-90 concentrations ranged from below the reporting level to 52±4 picocuries per liter. Descriptive statistics were used to determine reproducibility of the analytical results for strontium-90 concentrations in water from each well. Comparisons were made with unfiltered, acidified samples at each well. Analytical results for strontium-90 concentrations in water from well 88 were not in statistical agreement between the unfiltered, acidified sample and the filtered (0.45 micrometer), acidified sample. The strontium-90 concentration for water from well 88 was less than the reporting level.For water from wells with strontium-90 concentrations at or above the reporting level, 94 percent or more of the strontium-90 is in true solution or in colloidal particles smaller than 0.1 micrometer. These results suggest that changes in filtration and preservation methods used for sample collection do not significantly affect reproducibility of strontium-90 analyses in ground water at the INEL.

  20. Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.

    PubMed

    Harrington, Peter de Boves

    2018-01-02

    Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes an efficient use of the data because each object is used once for validation. It was reviewed a decade earlier but primarily for the optimization of chemometric models this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported and powerful, matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.

  1. Testing biological liquid samples using modified m-line spectroscopy method

    NASA Astrophysics Data System (ADS)

    Augusciuk, Elzbieta; Rybiński, Grzegorz

    2005-09-01

    Non-chemical method of detection of sugar concentration in biological (animal and plant source) liquids has been investigated. Simplified set was build to show the easy way of carrying out the survey and to make easy to gather multiple measurements for error detecting and statistics. Method is suggested as easy and cheap alternative for chemical methods of measuring sugar concentration, but needing a lot effort to be made precise.

  2. Characterizing microstructural features of biomedical samples by statistical analysis of Mueller matrix images

    NASA Astrophysics Data System (ADS)

    He, Honghui; Dong, Yang; Zhou, Jialing; Ma, Hui

    2017-03-01

    As one of the salient features of light, polarization contains abundant structural and optical information of media. Recently, as a comprehensive description of polarization property, the Mueller matrix polarimetry has been applied to various biomedical studies such as cancerous tissues detections. In previous works, it has been found that the structural information encoded in the 2D Mueller matrix images can be presented by other transformed parameters with more explicit relationship to certain microstructural features. In this paper, we present a statistical analyzing method to transform the 2D Mueller matrix images into frequency distribution histograms (FDHs) and their central moments to reveal the dominant structural features of samples quantitatively. The experimental results of porcine heart, intestine, stomach, and liver tissues demonstrate that the transformation parameters and central moments based on the statistical analysis of Mueller matrix elements have simple relationships to the dominant microstructural properties of biomedical samples, including the density and orientation of fibrous structures, the depolarization power, diattenuation and absorption abilities. It is shown in this paper that the statistical analysis of 2D images of Mueller matrix elements may provide quantitative or semi-quantitative criteria for biomedical diagnosis.

  3. A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data

    PubMed Central

    Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J.; Yanes, Oscar

    2012-01-01

    Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can be then unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical test rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homocedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples. PMID:24957762

  4. A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data.

    PubMed

    Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J; Yanes, Oscar

    2012-10-18

    Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can be then unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical test rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homocedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.

  5. Cocaine profiling for strategic intelligence, a cross-border project between France and Switzerland: part II. Validation of the statistical methodology for the profiling of cocaine.

    PubMed

    Lociciro, S; Esseiva, P; Hayoz, P; Dujourdy, L; Besacier, F; Margot, P

    2008-05-20

    Harmonisation and optimization of analytical and statistical methodologies were carried out between two forensic laboratories (Lausanne, Switzerland and Lyon, France) in order to provide drug intelligence for cross-border cocaine seizures. Part I dealt with the optimization of the analytical method and its robustness. This second part investigates statistical methodologies that will provide reliable comparison of cocaine seizures analysed on two different gas chromatographs interfaced with a flame ionisation detectors (GC-FIDs) in two distinct laboratories. Sixty-six statistical combinations (ten data pre-treatments followed by six different distance measurements and correlation coefficients) were applied. One pre-treatment (N+S: area of each peak is divided by its standard deviation calculated from the whole data set) followed by the Cosine or Pearson correlation coefficients were found to be the best statistical compromise for optimal discrimination of linked and non-linked samples. The centralisation of the analyses in one single laboratory is not a required condition anymore to compare samples seized in different countries. This allows collaboration, but also, jurisdictional control over data.

  6. Application of the experimental design of experiments (DoE) for the determination of organotin compounds in water samples using HS-SPME and GC-MS/MS.

    PubMed

    Coscollà, Clara; Navarro-Olivares, Santiago; Martí, Pedro; Yusà, Vicent

    2014-02-01

    When attempting to discover the important factors and then optimise a response by tuning these factors, experimental design (design of experiments, DoE) gives a powerful suite of statistical methodology. DoE identify significant factors and then optimise a response with respect to them in method development. In this work, a headspace-solid-phase micro-extraction (HS-SPME) combined with gas chromatography tandem mass spectrometry (GC-MS/MS) methodology for the simultaneous determination of six important organotin compounds namely monobutyltin (MBT), dibutyltin (DBT), tributyltin (TBT), monophenyltin (MPhT), diphenyltin (DPhT), triphenyltin (TPhT) has been optimized using a statistical design of experiments (DOE). The analytical method is based on the ethylation with NaBEt4 and simultaneous headspace-solid-phase micro-extraction of the derivative compounds followed by GC-MS/MS analysis. The main experimental parameters influencing the extraction efficiency selected for optimization were pre-incubation time, incubation temperature, agitator speed, extraction time, desorption temperature, buffer (pH, concentration and volume), headspace volume, sample salinity, preparation of standards, ultrasonic time and desorption time in the injector. The main factors (excitation voltage, excitation time, ion source temperature, isolation time and electron energy) affecting the GC-IT-MS/MS response were also optimized using the same statistical design of experiments. The proposed method presented good linearity (coefficient of determination R(2)>0.99) and repeatibilty (1-25%) for all the compounds under study. The accuracy of the method measured as the average percentage recovery of the compounds in spiked surface and marine waters was higher than 70% for all compounds studied. Finally, the optimized methodology was applied to real aqueous samples enabled the simultaneous determination of all compounds under study in surface and marine water samples obtained from Valencia region (Spain). © 2013 Elsevier B.V. All rights reserved.

  7. Representation of Probability Density Functions from Orbit Determination using the Particle Filter

    NASA Technical Reports Server (NTRS)

    Mashiku, Alinda K.; Garrison, James; Carpenter, J. Russell

    2012-01-01

    Statistical orbit determination enables us to obtain estimates of the state and the statistical information of its region of uncertainty. In order to obtain an accurate representation of the probability density function (PDF) that incorporates higher order statistical information, we propose the use of nonlinear estimation methods such as the Particle Filter. The Particle Filter (PF) is capable of providing a PDF representation of the state estimates whose accuracy is dependent on the number of particles or samples used. For this method to be applicable to real case scenarios, we need a way of accurately representing the PDF in a compressed manner with little information loss. Hence we propose using the Independent Component Analysis (ICA) as a non-Gaussian dimensional reduction method that is capable of maintaining higher order statistical information obtained using the PF. Methods such as the Principal Component Analysis (PCA) are based on utilizing up to second order statistics, hence will not suffice in maintaining maximum information content. Both the PCA and the ICA are applied to two scenarios that involve a highly eccentric orbit with a lower apriori uncertainty covariance and a less eccentric orbit with a higher a priori uncertainty covariance, to illustrate the capability of the ICA in relation to the PCA.

  8. QCD Precision Measurements and Structure Function Extraction at a High Statistics, High Energy Neutrino Scattering Experiment:. NuSOnG

    NASA Astrophysics Data System (ADS)

    Adams, T.; Batra, P.; Bugel, L.; Camilleri, L.; Conrad, J. M.; de Gouvêa, A.; Fisher, P. H.; Formaggio, J. A.; Jenkins, J.; Karagiorgi, G.; Kobilarcik, T. R.; Kopp, S.; Kyle, G.; Loinaz, W. A.; Mason, D. A.; Milner, R.; Moore, R.; Morfín, J. G.; Nakamura, M.; Naples, D.; Nienaber, P.; Olness, F. I.; Owens, J. F.; Pate, S. F.; Pronin, A.; Seligman, W. G.; Shaevitz, M. H.; Schellman, H.; Schienbein, I.; Syphers, M. J.; Tait, T. M. P.; Takeuchi, T.; Tan, C. Y.; van de Water, R. G.; Yamamoto, R. K.; Yu, J. Y.

    We extend the physics case for a new high-energy, ultra-high statistics neutrino scattering experiment, NuSOnG (Neutrino Scattering On Glass) to address a variety of issues including precision QCD measurements, extraction of structure functions, and the derived Parton Distribution Functions (PDF's). This experiment uses a Tevatron-based neutrino beam to obtain a sample of Deep Inelastic Scattering (DIS) events which is over two orders of magnitude larger than past samples. We outline an innovative method for fitting the structure functions using a parametrized energy shift which yields reduced systematic uncertainties. High statistics measurements, in combination with improved systematics, will enable NuSOnG to perform discerning tests of fundamental Standard Model parameters as we search for deviations which may hint of "Beyond the Standard Model" physics.

  9. Calculating p-values and their significances with the Energy Test for large datasets

    NASA Astrophysics Data System (ADS)

    Barter, W.; Burr, C.; Parkes, C.

    2018-04-01

    The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the T-value). The method has recently been used in particle physics to search for samples that differ due to CP violation. The generalised extreme value function has previously been used to describe the distribution of T-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of T-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same population. This method can then be used to quickly calculate the p-values associated with the results of the test.

  10. Unbiased, scalable sampling of protein loop conformations from probabilistic priors.

    PubMed

    Zhang, Yajia; Hauser, Kris

    2013-01-01

    Protein loops are flexible structures that are intimately tied to function, but understanding loop motion and generating loop conformation ensembles remain significant computational challenges. Discrete search techniques scale poorly to large loops, optimization and molecular dynamics techniques are prone to local minima, and inverse kinematics techniques can only incorporate structural preferences in adhoc fashion. This paper presents Sub-Loop Inverse Kinematics Monte Carlo (SLIKMC), a new Markov chain Monte Carlo algorithm for generating conformations of closed loops according to experimentally available, heterogeneous structural preferences. Our simulation experiments demonstrate that the method computes high-scoring conformations of large loops (>10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques. Two new developments contribute to the scalability of the new method. First, structural preferences are specified via a probabilistic graphical model (PGM) that links conformation variables, spatial variables (e.g., atom positions), constraints and prior information in a unified framework. The method uses a sparse PGM that exploits locality of interactions between atoms and residues. Second, a novel method for sampling sub-loops is developed to generate statistically unbiased samples of probability densities restricted by loop-closure constraints. Numerical experiments confirm that SLIKMC generates conformation ensembles that are statistically consistent with specified structural preferences. Protein conformations with 100+ residues are sampled on standard PC hardware in seconds. Application to proteins involved in ion-binding demonstrate its potential as a tool for loop ensemble generation and missing structure completion.

  11. Unbiased, scalable sampling of protein loop conformations from probabilistic priors

    PubMed Central

    2013-01-01

    Background Protein loops are flexible structures that are intimately tied to function, but understanding loop motion and generating loop conformation ensembles remain significant computational challenges. Discrete search techniques scale poorly to large loops, optimization and molecular dynamics techniques are prone to local minima, and inverse kinematics techniques can only incorporate structural preferences in adhoc fashion. This paper presents Sub-Loop Inverse Kinematics Monte Carlo (SLIKMC), a new Markov chain Monte Carlo algorithm for generating conformations of closed loops according to experimentally available, heterogeneous structural preferences. Results Our simulation experiments demonstrate that the method computes high-scoring conformations of large loops (>10 residues) orders of magnitude faster than standard Monte Carlo and discrete search techniques. Two new developments contribute to the scalability of the new method. First, structural preferences are specified via a probabilistic graphical model (PGM) that links conformation variables, spatial variables (e.g., atom positions), constraints and prior information in a unified framework. The method uses a sparse PGM that exploits locality of interactions between atoms and residues. Second, a novel method for sampling sub-loops is developed to generate statistically unbiased samples of probability densities restricted by loop-closure constraints. Conclusion Numerical experiments confirm that SLIKMC generates conformation ensembles that are statistically consistent with specified structural preferences. Protein conformations with 100+ residues are sampled on standard PC hardware in seconds. Application to proteins involved in ion-binding demonstrate its potential as a tool for loop ensemble generation and missing structure completion. PMID:24565175

  12. Multivariate Analysis and Prediction of Dioxin-Furan ...

    EPA Pesticide Factsheets

    Peer Review Draft of Regional Methods Initiative Final Report Dioxins, which are bioaccumulative and environmentally persistent, pose an ongoing risk to human and ecosystem health. Fish constitute a significant source of dioxin exposure for humans and fish-eating wildlife. Current dioxin analytical methods are costly, time-consuming, and produce hazardous by-products. A Danish team developed a novel, multivariate statistical methodology based on the covariance of dioxin-furan congener Toxic Equivalences (TEQs) and fatty acid methyl esters (FAMEs) and applied it to North Atlantic Ocean fishmeal samples. The goal of the current study was to attempt to extend this Danish methodology to 77 whole and composite fish samples from three trophic groups: predator (whole largemouth bass), benthic (whole flathead and channel catfish) and forage fish (composite bluegill, pumpkinseed and green sunfish) from two dioxin contaminated rivers (Pocatalico R. and Kanawha R.) in West Virginia, USA. Multivariate statistical analyses, including, Principal Components Analysis (PCA), Hierarchical Clustering, and Partial Least Squares Regression (PLS), were used to assess the relationship between the FAMEs and TEQs in these dioxin contaminated freshwater fish from the Kanawha and Pocatalico Rivers. These three multivariate statistical methods all confirm that the pattern of Fatty Acid Methyl Esters (FAMEs) in these freshwater fish covaries with and is predictive of the WHO TE

  13. Validation of 31 of the most commonly used immunohistochemical antibodies in cytology prepared using the Cellient(®) automated cell block system.

    PubMed

    Montgomery, Eric; Gao, Chen; de Luca, Julie; Bower, Jessie; Attwood, Kristropher; Ylagan, Lourdes

    2014-12-01

    The Cellient(®) cell block system has become available as an alternative, partially automated method to create cell blocks in cytology. We sought to show a validation method for immunohistochemical (IHC) staining on the Cellient cell block system (CCB) in comparison with the formalin fixed paraffin embedded traditional cell block (TCB). Immunohistochemical staining was performed using 31 antibodies on 38 patient samples for a total of 326 slides. Split samples were processed using both methods by following the Cellient(®) manufacturer's recommendations for the Cellient cell block (CCB) and the Histogel method for preparing the traditional cell block (TCB). Interpretation was performed by three pathologists and two cytotechnologists. Immunohistochemical stains were scored as: 0/1+ (negative) and 2/3+ (positive). Inter-rater agreement for each antibody was evaluated for CCB and TCB, as well as the intra-rater agreement between TCB and CCB between observers. Interobserver staining concordance for the TCB was obtained with statistical significance (P < 0.05) in 24 of 31 antibodies. Interobserver staining concordance for the CCB was obtained with statistical significance in 27 of 31 antibodies. Intra-observer staining concordance between TCB and CCB was obtained with statistical significance in 24 of 31 antibodies tested. In conclusions, immunohistochemical stains on cytologic specimens processed by the Cellient system are reliable and concordant with stains performed on the same split samples processed via a formalin fixed-paraffin embedded (FFPE) block. The Cellient system is a welcome adjunct to cytology work-flow by producing cell block material of sufficient quality to allow the use of routine IHC. © 2014 Wiley Periodicals, Inc.

  14. Sampling for mercury at subnanogram per litre concentrations for load estimation in rivers

    USGS Publications Warehouse

    Colman, J.A.; Breault, R.F.

    2000-01-01

    Estimation of constituent loads in streams requires collection of stream samples that are representative of constituent concentrations, that is, composites of isokinetic multiple verticals collected along a stream transect. An all-Teflon isokinetic sampler (DH-81) cleaned in 75??C, 4 N HCl was tested using blank, split, and replicate samples to assess systematic and random sample contamination by mercury species. Mean mercury concentrations in field-equipment blanks were low: 0.135 ng??L-1 for total mercury (??Hg) and 0.0086 ng??L-1 for monomethyl mercury (MeHg). Mean square errors (MSE) for ??Hg and MeHg duplicate samples collected at eight sampling stations were not statistically different from MSE of samples split in the laboratory, which represent the analytical and splitting error. Low fieldblank concentrations and statistically equal duplicate- and split-sample MSE values indicate that no measurable contamination was occurring during sampling. Standard deviations associated with example mercury load estimations were four to five times larger, on a relative basis, than standard deviations calculated from duplicate samples, indicating that error of the load determination was primarily a function of the loading model used, not of sampling or analytical methods.

  15. A ricin forensic profiling approach based on a complex set of biomarkers.

    PubMed

    Fredriksson, Sten-Åke; Wunschel, David S; Lindström, Susanne Wiklund; Nilsson, Calle; Wahl, Karen; Åstot, Crister

    2018-08-15

    A forensic method for the retrospective determination of preparation methods used for illicit ricin toxin production was developed. The method was based on a complex set of biomarkers, including carbohydrates, fatty acids, seed storage proteins, in combination with data on ricin and Ricinus communis agglutinin. The analyses were performed on samples prepared from four castor bean plant (R. communis) cultivars by four different sample preparation methods (PM1-PM4) ranging from simple disintegration of the castor beans to multi-step preparation methods including different protein precipitation methods. Comprehensive analytical data was collected by use of a range of analytical methods and robust orthogonal partial least squares-discriminant analysis- models (OPLS-DA) were constructed based on the calibration set. By the use of a decision tree and two OPLS-DA models, the sample preparation methods of test set samples were determined. The model statistics of the two models were good and a 100% rate of correct predictions of the test set was achieved. Copyright © 2018 Elsevier B.V. All rights reserved.

  16. Classical Statistics and Statistical Learning in Imaging Neuroscience

    PubMed Central

    Bzdok, Danilo

    2017-01-01

    Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-test and ANOVA. Throughout recent years, statistical learning methods enjoy increasing popularity especially for applications in rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It is retraced how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896

  17. Provenance establishment of coffee using solution ICP-MS and ICP-AES.

    PubMed

    Valentin, Jenna L; Watling, R John

    2013-11-01

    Statistical interpretation of the concentrations of 59 elements, determined using solution based inductively coupled plasma mass spectrometry (ICP-MS) and inductively coupled plasma emission spectroscopy (ICP-AES), was used to establish the provenance of coffee samples from 15 countries across five continents. Data confirmed that the harvest year, degree of ripeness and whether the coffees were green or roasted had little effect on the elemental composition of the coffees. The application of linear discriminant analysis and principal component analysis of the elemental concentrations permitted up to 96.9% correct classification of the coffee samples according to their continent of origin. When samples from each continent were considered separately, up to 100% correct classification of coffee samples into their countries, and plantations of origin was achieved. This research demonstrates the potential of using elemental composition, in combination with statistical classification methods, for accurate provenance establishment of coffee. Copyright © 2013 Elsevier Ltd. All rights reserved.

  18. Optimizing Integrated Terminal Airspace Operations Under Uncertainty

    NASA Technical Reports Server (NTRS)

    Bosson, Christabelle; Xue, Min; Zelinski, Shannon

    2014-01-01

    In the terminal airspace, integrated departures and arrivals have the potential to increase operations efficiency. Recent research has developed geneticalgorithm- based schedulers for integrated arrival and departure operations under uncertainty. This paper presents an alternate method using a machine jobshop scheduling formulation to model the integrated airspace operations. A multistage stochastic programming approach is chosen to formulate the problem and candidate solutions are obtained by solving sample average approximation problems with finite sample size. Because approximate solutions are computed, the proposed algorithm incorporates the computation of statistical bounds to estimate the optimality of the candidate solutions. A proof-ofconcept study is conducted on a baseline implementation of a simple problem considering a fleet mix of 14 aircraft evolving in a model of the Los Angeles terminal airspace. A more thorough statistical analysis is also performed to evaluate the impact of the number of scenarios considered in the sampled problem. To handle extensive sampling computations, a multithreading technique is introduced.

  19. Potential, velocity, and density fields from sparse and noisy redshift-distance samples - Method

    NASA Technical Reports Server (NTRS)

    Dekel, Avishai; Bertschinger, Edmund; Faber, Sandra M.

    1990-01-01

    A method for recovering the three-dimensional potential, velocity, and density fields from large-scale redshift-distance samples is described. Galaxies are taken as tracers of the velocity field, not of the mass. The density field and the initial conditions are calculated using an iterative procedure that applies the no-vorticity assumption at an initial time and uses the Zel'dovich approximation to relate initial and final positions of particles on a grid. The method is tested using a cosmological N-body simulation 'observed' at the positions of real galaxies in a redshift-distance sample, taking into account their distance measurement errors. Malmquist bias and other systematic and statistical errors are extensively explored using both analytical techniques and Monte Carlo simulations.

  20. [A Review on the Use of Effect Size in Nursing Research].

    PubMed

    Kang, Hyuncheol; Yeon, Kyupil; Han, Sang Tae

    2015-10-01

    The purpose of this study was to introduce the main concepts of statistical testing and effect size and to provide researchers in nursing science with guidance on how to calculate the effect size for the statistical analysis methods mainly used in nursing. For t-test, analysis of variance, correlation analysis, regression analysis which are used frequently in nursing research, the generally accepted definitions of the effect size were explained. Some formulae for calculating the effect size are described with several examples in nursing research. Furthermore, the authors present the required minimum sample size for each example utilizing G*Power 3 software that is the most widely used program for calculating sample size. It is noted that statistical significance testing and effect size measurement serve different purposes, and the reliance on only one side may be misleading. Some practical guidelines are recommended for combining statistical significance testing and effect size measure in order to make more balanced decisions in quantitative analyses.

  1. Chance-corrected classification for use in discriminant analysis: Ecological applications

    USGS Publications Warehouse

    Titus, K.; Mosher, J.A.; Williams, B.K.

    1984-01-01

    A method for evaluating the classification table from a discriminant analysis is described. The statistic, kappa, is useful to ecologists in that it removes the effects of chance. It is useful even with equal group sample sizes although the need for a chance-corrected measure of prediction becomes greater with more dissimilar group sample sizes. Examples are presented.

  2. The Mediatory Role of Exercise Self-Regulation in the Relationship between Personality Traits and Anger Management of Athletes

    ERIC Educational Resources Information Center

    Shahbazzadeh, Somayeh; Beliad, Mohammad Reza

    2017-01-01

    This study investigates the mediatory role of exercise self-regulation role in the relationship between personality traits and anger management among athletes. The statistical population of this study includes all athlete students of Shar-e Ghods College, among which 260 people were selected as sample using random sampling method. In addition, the…

  3. Adjusted Wald Confidence Interval for a Difference of Binomial Proportions Based on Paired Data

    ERIC Educational Resources Information Center

    Bonett, Douglas G.; Price, Robert M.

    2012-01-01

    Adjusted Wald intervals for binomial proportions in one-sample and two-sample designs have been shown to perform about as well as the best available methods. The adjusted Wald intervals are easy to compute and have been incorporated into introductory statistics courses. An adjusted Wald interval for paired binomial proportions is proposed here and…

  4. Framework for making better predictions by directly estimating variables' predictivity.

    PubMed

    Lo, Adeline; Chernoff, Herman; Zheng, Tian; Lo, Shaw-Hwa

    2016-12-13

    We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the [Formula: see text]-score of the PR method of VS yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the [Formula: see text]-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the [Formula: see text]-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using the partition retention and [Formula: see text]-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired.

  5. Sampling studies to estimate the HIV prevalence rate in female commercial sex workers.

    PubMed

    Pascom, Ana Roberta Pati; Szwarcwald, Célia Landmann; Barbosa Júnior, Aristides

    2010-01-01

    We investigated sampling methods being used to estimate the HIV prevalence rate among female commercial sex workers. The studies were classified according to the adequacy or not of the sample size to estimate HIV prevalence rate and according to the sampling method (probabilistic or convenience). We identified 75 studies that estimated the HIV prevalence rate among female sex workers. Most of the studies employed convenience samples. The sample size was not adequate to estimate HIV prevalence rate in 35 studies. The use of convenience sample limits statistical inference for the whole group. It was observed that there was an increase in the number of published studies since 2005, as well as in the number of studies that used probabilistic samples. This represents a large advance in the monitoring of risk behavior practices and HIV prevalence rate in this group.

  6. Improving the power of clinical trials of rheumatoid arthritis by using data on continuous scales when analysing response rates: an application of the augmented binary method

    PubMed Central

    Jenkins, Martin

    2016-01-01

    Objective. In clinical trials of RA, it is common to assess effectiveness using end points based upon dichotomized continuous measures of disease activity, which classify patients as responders or non-responders. Although dichotomization generally loses statistical power, there are good clinical reasons to use these end points; for example, to allow for patients receiving rescue therapy to be assigned as non-responders. We adopt a statistical technique called the augmented binary method to make better use of the information provided by these continuous measures and account for how close patients were to being responders. Methods. We adapted the augmented binary method for use in RA clinical trials. We used a previously published randomized controlled trial (Oral SyK Inhibition in Rheumatoid Arthritis-1) to assess its performance in comparison to a standard method treating patients purely as responders or non-responders. The power and error rate were investigated by sampling from this study. Results. The augmented binary method reached similar conclusions to standard analysis methods but was able to estimate the difference in response rates to a higher degree of precision. Results suggested that CI widths for ACR responder end points could be reduced by at least 15%, which could equate to reducing the sample size of a study by 29% to achieve the same statistical power. For other end points, the gain was even higher. Type I error rates were not inflated. Conclusion. The augmented binary method shows considerable promise for RA trials, making more efficient use of patient data whilst still reporting outcomes in terms of recognized response end points. PMID:27338084

  7. Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic.

    PubMed

    Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B; Chen, Li; Wang, Yue; Clarke, Robert

    2012-08-01

    Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive 'noise' in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. xuan@vt.edu Supplementary data are available at Bioinformatics online.

  8. Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic

    PubMed Central

    Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B.; Chen, Li; Wang, Yue; Clarke, Robert

    2012-01-01

    Motivation: Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive ‘noise’ in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. Results: In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. Availability and implementation: The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. Contact: xuan@vt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22595208

  9. Plasma creatinine in dogs: intra- and inter-laboratory variation in 10 European veterinary laboratories

    PubMed Central

    2011-01-01

    Background There is substantial variation in reported reference intervals for canine plasma creatinine among veterinary laboratories, thereby influencing the clinical assessment of analytical results. The aims of the study was to determine the inter- and intra-laboratory variation in plasma creatinine among 10 veterinary laboratories, and to compare results from each laboratory with the upper limit of its reference interval. Methods Samples were collected from 10 healthy dogs, 10 dogs with expected intermediate plasma creatinine concentrations, and 10 dogs with azotemia. Overlap was observed for the first two groups. The 30 samples were divided into 3 batches and shipped in random order by postal delivery for plasma creatinine determination. Statistical testing was performed in accordance with ISO standard methodology. Results Inter- and intra-laboratory variation was clinically acceptable as plasma creatinine values for most samples were usually of the same magnitude. A few extreme outliers caused three laboratories to fail statistical testing for consistency. Laboratory sample means above or below the overall sample mean, did not unequivocally reflect high or low reference intervals in that laboratory. Conclusions In spite of close analytical results, further standardization among laboratories is warranted. The discrepant reference intervals seem to largely reflect different populations used in establishing the reference intervals, rather than analytical variation due to different laboratory methods. PMID:21477356

  10. Promotional methods used by representatives of drug companies: A prospective survey in general practice

    PubMed Central

    Schramm, Jesper; Andersen, Morten; Vach, Kirstin; Kragstrup, Jakob; Peter Kampmann, Jens; Søndergaard, Jens

    2007-01-01

    Objective To examine the extent and composition of pharmaceutical industry representatives’ marketing techniques with a particular focus on drug sampling in relation to drug age. Design A group of 47 GPs prospectively collected data on drug promotional activities during a six-month period, and a sub-sample of 10 GPs furthermore recorded the representatives’ marketing techniques in detail. Setting Primary healthcare. Subjects General practitioners in the County of Funen, Denmark. Main outcome measures. Promotional visits and corresponding marketing techniques. Results The 47 GPs recorded 1050 visits corresponding to a median of 19 (range 3 to 63) per GP in the six months. The majority of drugs promoted (52%) were marketed more than five years ago. There was a statistically significant decline in the proportion of visits where drug samples were offered with drug age, but the decline was small OR 0.97 (95% CI 0.95;0.98) per year. Leaflets (68%), suggestions on how to improve therapy for a specific patient registered with the practice (53%), drug samples (48%), and gifts (36%) were the most frequently used marketing techniques. Conclusion Drug-industry representatives use a variety of promotional methods. The tendency to hand out drug samples was statistically significantly associated with drug age, but the decline was small. PMID:17497486

  11. Raman spectroscopy-based screening of IgM positive and negative sera for dengue virus infection

    NASA Astrophysics Data System (ADS)

    Bilal, M.; Saleem, M.; Bilal, Maria; Ijaz, T.; Khan, Saranjam; Ullah, Rahat; Raza, A.; Khurram, M.; Akram, W.; Ahmed, M.

    2016-11-01

    A statistical method based on Raman spectroscopy for the screening of immunoglobulin M (IgM) in dengue virus (DENV) infected human sera is presented. In total, 108 sera samples were collected and their antibody indexes (AI) for IgM were determined through enzyme-linked immunosorbent assay (ELISA). Raman spectra of these samples were acquired using a 785 nm wavelength excitation laser. Seventy-eight Raman spectra were selected randomly and unbiasedly for the development of a statistical model using partial least square (PLS) regression, while the remaining 30 were used for testing the developed model. An R-square (r 2) value of 0.929 was determined using the leave-one-sample-out (LOO) cross validation method, showing the validity of this model. It considers all molecular changes related to IgM concentration, and describes their role in infection. A graphical user interface (GUI) platform has been developed to run a developed multivariate model for the prediction of AI of IgM for blindly tested samples, and an excellent agreement has been found between model predicted and clinically determined values. Parameters like sensitivity, specificity, accuracy, and area under receiver operator characteristic (ROC) curve for these tested samples are also reported to visualize model performance.

  12. Study design and statistical analysis of data in human population studies with the micronucleus assay.

    PubMed

    Ceppi, Marcello; Gallo, Fabio; Bonassi, Stefano

    2011-01-01

    The most common study design performed in population studies based on the micronucleus (MN) assay, is the cross-sectional study, which is largely performed to evaluate the DNA damaging effects of exposure to genotoxic agents in the workplace, in the environment, as well as from diet or lifestyle factors. Sample size is still a critical issue in the design of MN studies since most recent studies considering gene-environment interaction, often require a sample size of several hundred subjects, which is in many cases difficult to achieve. The control of confounding is another major threat to the validity of causal inference. The most popular confounders considered in population studies using MN are age, gender and smoking habit. Extensive attention is given to the assessment of effect modification, given the increasing inclusion of biomarkers of genetic susceptibility in the study design. Selected issues concerning the statistical treatment of data have been addressed in this mini-review, starting from data description, which is a critical step of statistical analysis, since it allows to detect possible errors in the dataset to be analysed and to check the validity of assumptions required for more complex analyses. Basic issues dealing with statistical analysis of biomarkers are extensively evaluated, including methods to explore the dose-response relationship among two continuous variables and inferential analysis. A critical approach to the use of parametric and non-parametric methods is presented, before addressing the issue of most suitable multivariate models to fit MN data. In the last decade, the quality of statistical analysis of MN data has certainly evolved, although even nowadays only a small number of studies apply the Poisson model, which is the most suitable method for the analysis of MN data.

  13. Survey methods for assessing land cover map accuracy

    USGS Publications Warehouse

    Nusser, S.M.; Klaas, E.E.

    2003-01-01

    The increasing availability of digital photographic materials has fueled efforts by agencies and organizations to generate land cover maps for states, regions, and the United States as a whole. Regardless of the information sources and classification methods used, land cover maps are subject to numerous sources of error. In order to understand the quality of the information contained in these maps, it is desirable to generate statistically valid estimates of accuracy rates describing misclassification errors. We explored a full sample survey framework for creating accuracy assessment study designs that balance statistical and operational considerations in relation to study objectives for a regional assessment of GAP land cover maps. We focused not only on appropriate sample designs and estimation approaches, but on aspects of the data collection process, such as gaining cooperation of land owners and using pixel clusters as an observation unit. The approach was tested in a pilot study to assess the accuracy of Iowa GAP land cover maps. A stratified two-stage cluster sampling design addressed sample size requirements for land covers and the need for geographic spread while minimizing operational effort. Recruitment methods used for private land owners yielded high response rates, minimizing a source of nonresponse error. Collecting data for a 9-pixel cluster centered on the sampled pixel was simple to implement, and provided better information on rarer vegetation classes as well as substantial gains in precision relative to observing data at a single-pixel.

  14. Statistical analysis and digital processing of the Mössbauer spectra

    NASA Astrophysics Data System (ADS)

    Prochazka, Roman; Tucek, Pavel; Tucek, Jiri; Marek, Jaroslav; Mashlan, Miroslav; Pechousek, Jiri

    2010-02-01

    This work is focused on using the statistical methods and development of the filtration procedures for signal processing in Mössbauer spectroscopy. Statistical tools for noise filtering in the measured spectra are used in many scientific areas. The use of a pure statistical approach in accumulated Mössbauer spectra filtration is described. In Mössbauer spectroscopy, the noise can be considered as a Poisson statistical process with a Gaussian distribution for high numbers of observations. This noise is a superposition of the non-resonant photons counting with electronic noise (from γ-ray detection and discrimination units), and the velocity system quality that can be characterized by the velocity nonlinearities. The possibility of a noise-reducing process using a new design of statistical filter procedure is described. This mathematical procedure improves the signal-to-noise ratio and thus makes it easier to determine the hyperfine parameters of the given Mössbauer spectra. The filter procedure is based on a periodogram method that makes it possible to assign the statistically important components in the spectral domain. The significance level for these components is then feedback-controlled using the correlation coefficient test results. The estimation of the theoretical correlation coefficient level which corresponds to the spectrum resolution is performed. Correlation coefficient test is based on comparison of the theoretical and the experimental correlation coefficients given by the Spearman method. The correctness of this solution was analyzed by a series of statistical tests and confirmed by many spectra measured with increasing statistical quality for a given sample (absorber). The effect of this filter procedure depends on the signal-to-noise ratio and the applicability of this method has binding conditions.

  15. Reporting guidance considerations from a statistical perspective: overview of tools to enhance the rigour of reporting of randomised trials and systematic reviews.

    PubMed

    Hutton, Brian; Wolfe, Dianna; Moher, David; Shamseer, Larissa

    2017-05-01

    Research waste has received considerable attention from the biomedical community. One noteworthy contributor is incomplete reporting in research publications. When detailing statistical methods and results, ensuring analytic methods and findings are completely documented improves transparency. For publications describing randomised trials and systematic reviews, guidelines have been developed to facilitate complete reporting. This overview summarises aspects of statistical reporting in trials and systematic reviews of health interventions. A narrative approach to summarise features regarding statistical methods and findings from reporting guidelines for trials and reviews was taken. We aim to enhance familiarity of statistical details that should be reported in biomedical research among statisticians and their collaborators. We summarise statistical reporting considerations for trials and systematic reviews from guidance documents including the Consolidated Standards of Reporting Trials (CONSORT) Statement for reporting of trials, the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) Statement for trial protocols, the Statistical Analyses and Methods in the Published Literature (SAMPL) Guidelines for statistical reporting principles, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) Statement for systematic reviews and PRISMA for Protocols (PRISMA-P). Considerations regarding sharing of study data and statistical code are also addressed. Reporting guidelines provide researchers with minimum criteria for reporting. If followed, they can enhance research transparency and contribute improve quality of biomedical publications. Authors should employ these tools for planning and reporting of their research. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  16. Compressive strength of human openwedges: a selection method

    NASA Astrophysics Data System (ADS)

    Follet, H.; Gotteland, M.; Bardonnet, R.; Sfarghiu, A. M.; Peyrot, J.; Rumelhart, C.

    2004-02-01

    A series of 44 samples of bone wedges of human origin, intended for allograft openwedge osteotomy and obtained without particular precautions during hip arthroplasty were re-examined. After viral inactivity chemical treatment, lyophilisation and radio-sterilisation (intended to produce optimal health safety), the compressive strength, independent of age, sex and the height of the sample (or angle of cut), proved to be too widely dispersed [ 10{-}158 MPa] in the first study. We propose a method for selecting samples which takes into account their geometry (width, length, thicknesses, cortical surface area). Statistical methods (Principal Components Analysis PCA, Hierarchical Cluster Analysis, Multilinear regression) allowed final selection of 29 samples having a mean compressive strength σ_{max} =103 MPa ± 26 and with variation [ 61{-}158 MPa] . These results are equivalent or greater than average materials currently used in openwedge osteotomy.

  17. Decision Process to Identify Lessons for Transition to a Distributed (or Blended) Learning Instructional Format

    DTIC Science & Technology

    2009-09-01

    instructional format. Using a mixed- method coding and analysis approach, the sample of POIs were categorized, coded, statistically analyzed, and a... Method SECURITY CLASSIFICATION OF 19. LIMITATION OF 20. NUMBER 21. RESPONSIBLE PERSON 16. REPORT Unclassified 17. ABSTRACT...transition to a distributed (or blended) learning format. Procedure: A mixed- methods approach, combining qualitative coding procedures with basic

  18. 20 CFR 609.14 - Payments to States.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ..., either in advance or by way of reimbursement, as may be determined by the Department, the sum that the.... An estimate may be made on the basis of a statistical, sampling, or other method agreed on by the...

  19. Comparison of methods for determination of total oil sands-derived naphthenic acids in water samples.

    PubMed

    Hughes, Sarah A; Huang, Rongfu; Mahaffey, Ashley; Chelme-Ayala, Pamela; Klamerth, Nikolaus; Meshref, Mohamed N A; Ibrahim, Mohamed D; Brown, Christine; Peru, Kerry M; Headley, John V; Gamal El-Din, Mohamed

    2017-11-01

    There are several established methods for the determination of naphthenic acids (NAs) in waters associated with oil sands mining operations. Due to their highly complex nature, measured concentration and composition of NAs vary depending on the method used. This study compared different common sample preparation techniques, analytical instrument methods, and analytical standards to measure NAs in groundwater and process water samples collected from an active oil sands operation. In general, the high- and ultrahigh-resolution methods, namely high performance liquid chromatography time-of-flight mass spectrometry (UPLC-TOF-MS) and Orbitrap mass spectrometry (Orbitrap-MS), were within an order of magnitude of the Fourier transform infrared spectroscopy (FTIR) methods. The gas chromatography mass spectrometry (GC-MS) methods consistently had the highest NA concentrations and greatest standard error. Total NAs concentration was not statistically different between sample preparation of solid phase extraction and liquid-liquid extraction. Calibration standards influenced quantitation results. This work provided a comprehensive understanding of the inherent differences in the various techniques available to measure NAs and hence the potential differences in measured amounts of NAs in samples. Results from this study will contribute to the analytical method standardization for NA analysis in oil sands related water samples. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. The kappa statistic in rehabilitation research: an examination.

    PubMed

    Tooth, Leigh R; Ottenbacher, Kenneth J

    2004-08-01

    The number and sophistication of statistical procedures reported in medical rehabilitation research is increasing. Application of the principles and methods associated with evidence-based practice has contributed to the need for rehabilitation practitioners to understand quantitative methods in published articles. Outcomes measurement and determination of reliability are areas that have experienced rapid change during the past decade. In this study, distinctions between reliability and agreement are examined. Information is presented on analytical approaches for addressing reliability and agreement with the focus on the application of the kappa statistic. The following assumptions are discussed: (1) kappa should be used with data measured on a categorical scale, (2) the patients or objects categorized should be independent, and (3) the observers or raters must make their measurement decisions and judgments independently. Several issues related to using kappa in measurement studies are described, including use of weighted kappa, methods of reporting kappa, the effect of bias and prevalence on kappa, and sample size and power requirements for kappa. The kappa statistic is useful for assessing agreement among raters, and it is being used more frequently in rehabilitation research. Correct interpretation of the kappa statistic depends on meeting the required assumptions and accurate reporting.

Top