Sample records for statistical sampling design

  1. Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli.

    PubMed

    Westfall, Jacob; Kenny, David A; Judd, Charles M

    2014-10-01

    Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.

  2. Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size.

    PubMed

    Heidel, R Eric

    2016-01-01

    Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.

  3. DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTIC ASSESSMENT

    EPA Science Inventory

    Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...

  4. Multi-Reader ROC studies with Split-Plot Designs: A Comparison of Statistical Methods

    PubMed Central

    Obuchowski, Nancy A.; Gallas, Brandon D.; Hillis, Stephen L.

    2012-01-01

    Rationale and Objectives Multi-reader imaging trials often use a factorial design, where study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of the design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper we compare three methods of analysis for the split-plot design. Materials and Methods Three statistical methods are presented: Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean ANOVA approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power and confidence interval coverage of the three test statistics. Results The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% CIs fall close to the nominal coverage for small and large sample sizes. Conclusions The split-plot MRMC study design can be statistically efficient compared with the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rate, similar power, and nominal CI coverage, are available for this study design. PMID:23122570

  5. Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.

    PubMed

    Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L

    2012-12-01

    Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.

  6. Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.

    ERIC Educational Resources Information Center

    Ojeda, Mario Miguel; Sahai, Hardeo

    2002-01-01

    Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…

  7. Application of binomial and multinomial probability statistics to the sampling design process of a global grain tracing and recall system

    USDA-ARS?s Scientific Manuscript database

    Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...

  8. STATISTICAL SAMPLING AND DATA ANALYSIS

    EPA Science Inventory

    Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...

  9. Validation of Statistical Sampling Algorithms in Visual Sample Plan (VSP): Summary Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nuffer, Lisa L; Sego, Landon H.; Wilson, John E.

    2009-02-18

    The U.S. Department of Homeland Security, Office of Technology Development (OTD) contracted with a set of U.S. Department of Energy national laboratories, including the Pacific Northwest National Laboratory (PNNL), to write a Remediation Guidance for Major Airports After a Chemical Attack. The report identifies key activities and issues that should be considered by a typical major airport following an incident involving release of a toxic chemical agent. Four experimental tasks were identified that would require further research in order to supplement the Remediation Guidance. One of the tasks, Task 4, OTD Chemical Remediation Statistical Sampling Design Validation, dealt with statisticalmore » sampling algorithm validation. This report documents the results of the sampling design validation conducted for Task 4. In 2005, the Government Accountability Office (GAO) performed a review of the past U.S. responses to Anthrax terrorist cases. Part of the motivation for this PNNL report was a major GAO finding that there was a lack of validated sampling strategies in the U.S. response to Anthrax cases. The report (GAO 2005) recommended that probability-based methods be used for sampling design in order to address confidence in the results, particularly when all sample results showed no remaining contamination. The GAO also expressed a desire that the methods be validated, which is the main purpose of this PNNL report. The objective of this study was to validate probability-based statistical sampling designs and the algorithms pertinent to within-building sampling that allow the user to prescribe or evaluate confidence levels of conclusions based on data collected as guided by the statistical sampling designs. Specifically, the designs found in the Visual Sample Plan (VSP) software were evaluated. VSP was used to calculate the number of samples and the sample location for a variety of sampling plans applied to an actual release site. Most of the sampling designs validated are probability based, meaning samples are located randomly (or on a randomly placed grid) so no bias enters into the placement of samples, and the number of samples is calculated such that IF the amount and spatial extent of contamination exceeds levels of concern, at least one of the samples would be taken from a contaminated area, at least X% of the time. Hence, "validation" of the statistical sampling algorithms is defined herein to mean ensuring that the "X%" (confidence) is actually met.« less

  10. Using GIS to generate spatially balanced random survey designs for natural resource applications.

    PubMed

    Theobald, David M; Stevens, Don L; White, Denis; Urquhart, N Scott; Olsen, Anthony R; Norman, John B

    2007-07-01

    Sampling of a population is frequently required to understand trends and patterns in natural resource management because financial and time constraints preclude a complete census. A rigorous probability-based survey design specifies where to sample so that inferences from the sample apply to the entire population. Probability survey designs should be used in natural resource and environmental management situations because they provide the mathematical foundation for statistical inference. Development of long-term monitoring designs demand survey designs that achieve statistical rigor and are efficient but remain flexible to inevitable logistical or practical constraints during field data collection. Here we describe an approach to probability-based survey design, called the Reversed Randomized Quadrant-Recursive Raster, based on the concept of spatially balanced sampling and implemented in a geographic information system. This provides environmental managers a practical tool to generate flexible and efficient survey designs for natural resource applications. Factors commonly used to modify sampling intensity, such as categories, gradients, or accessibility, can be readily incorporated into the spatially balanced sample design.

  11. Multiple category-lot quality assurance sampling: a new classification system with application to schistosomiasis control.

    PubMed

    Olives, Casey; Valadez, Joseph J; Brooker, Simon J; Pagano, Marcello

    2012-01-01

    Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalance of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.

  12. Statistical methods for efficient design of community surveys of response to noise: Random coefficients regression models

    NASA Technical Reports Server (NTRS)

    Tomberlin, T. J.

    1985-01-01

    Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.

  13. Community Health Centers: Providers, Patients, and Content of Care

    MedlinePlus

    ... Statistics (NCHS). NAMCS uses a multistage probability sample design involving geographic primary sampling units (PSUs), physician practices ... 05 level. To account for the complex sample design during variance estimation, all analyses were performed using ...

  14. Statistical approaches used to assess and redesign surface water-quality-monitoring networks.

    PubMed

    Khalil, B; Ouarda, T B M J

    2009-11-01

    An up-to-date review of the statistical approaches utilized for the assessment and redesign of surface water quality monitoring (WQM) networks is presented. The main technical aspects of network design are covered in four sections, addressing monitoring objectives, water quality variables, sampling frequency and spatial distribution of sampling locations. This paper discusses various monitoring objectives and related procedures used for the assessment and redesign of long-term surface WQM networks. The appropriateness of each approach for the design, contraction or expansion of monitoring networks is also discussed. For each statistical approach, its advantages and disadvantages are examined from a network design perspective. Possible methods to overcome disadvantages and deficiencies in the statistical approaches that are currently in use are recommended.

  15. Some challenges with statistical inference in adaptive designs.

    PubMed

    Hung, H M James; Wang, Sue-Jane; Yang, Peiling

    2014-01-01

    Adaptive designs have generated a great deal of attention to clinical trial communities. The literature contains many statistical methods to deal with added statistical uncertainties concerning the adaptations. Increasingly encountered in regulatory applications are adaptive statistical information designs that allow modification of sample size or related statistical information and adaptive selection designs that allow selection of doses or patient populations during the course of a clinical trial. For adaptive statistical information designs, a few statistical testing methods are mathematically equivalent, as a number of articles have stipulated, but arguably there are large differences in their practical ramifications. We pinpoint some undesirable features of these methods in this work. For adaptive selection designs, the selection based on biomarker data for testing the correlated clinical endpoints may increase statistical uncertainty in terms of type I error probability, and most importantly the increased statistical uncertainty may be impossible to assess.

  16. 7 CFR 3600.3 - Functions.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    .... agricultural and rural economy. (2) Administering a methodological research program to improve agricultural... design and data collection methodologies to the agricultural statistics program. Major functions include...) Designing, testing, and establishing survey techniques and standards, including sample design, sample...

  17. 7 CFR 3600.3 - Functions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    .... agricultural and rural economy. (2) Administering a methodological research program to improve agricultural... design and data collection methodologies to the agricultural statistics program. Major functions include...) Designing, testing, and establishing survey techniques and standards, including sample design, sample...

  18. 7 CFR 3600.3 - Functions.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    .... agricultural and rural economy. (2) Administering a methodological research program to improve agricultural... design and data collection methodologies to the agricultural statistics program. Major functions include...) Designing, testing, and establishing survey techniques and standards, including sample design, sample...

  19. 7 CFR 3600.3 - Functions.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    .... agricultural and rural economy. (2) Administering a methodological research program to improve agricultural... design and data collection methodologies to the agricultural statistics program. Major functions include...) Designing, testing, and establishing survey techniques and standards, including sample design, sample...

  20. 7 CFR 3600.3 - Functions.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    .... agricultural and rural economy. (2) Administering a methodological research program to improve agricultural... design and data collection methodologies to the agricultural statistics program. Major functions include...) Designing, testing, and establishing survey techniques and standards, including sample design, sample...

  1. The Effect of Cluster Sampling Design in Survey Research on the Standard Error Statistic.

    ERIC Educational Resources Information Center

    Wang, Lin; Fan, Xitao

    Standard statistical methods are used to analyze data that is assumed to be collected using a simple random sampling scheme. These methods, however, tend to underestimate variance when the data is collected with a cluster design, which is often found in educational survey research. The purposes of this paper are to demonstrate how a cluster design…

  2. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    PubMed

    Pavlacky, David C; Lukacs, Paul M; Blakesley, Jennifer A; Skorkowsky, Robert C; Klute, David S; Hahn, Beth A; Dreitz, Victoria J; George, T Luke; Hanni, David J

    2017-01-01

    Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous statistical design and analyses ensures reliable knowledge about bird populations that is relevant and integral to bird conservation at multiple scales.

  3. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions

    PubMed Central

    Hahn, Beth A.; Dreitz, Victoria J.; George, T. Luke

    2017-01-01

    Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer’s sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer’s sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous statistical design and analyses ensures reliable knowledge about bird populations that is relevant and integral to bird conservation at multiple scales. PMID:29065128

  4. Visual Sample Plan Version 7.0 User's Guide

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzke, Brett D.; Newburn, Lisa LN; Hathaway, John E.

    2014-03-01

    User's guide for VSP 7.0 This user's guide describes Visual Sample Plan (VSP) Version 7.0 and provides instructions for using the software. VSP selects the appropriate number and location of environmental samples to ensure that the results of statistical tests performed to provide input to risk decisions have the required confidence and performance. VSP Version 7.0 provides sample-size equations or algorithms needed by specific statistical tests appropriate for specific environmental sampling objectives. It also provides data quality assessment and statistical analysis functions to support evaluation of the data and determine whether the data support decisions regarding sites suspected of contamination.more » The easy-to-use program is highly visual and graphic. VSP runs on personal computers with Microsoft Windows operating systems (XP, Vista, Windows 7, and Windows 8). Designed primarily for project managers and users without expertise in statistics, VSP is applicable to two- and three-dimensional populations to be sampled (e.g., rooms and buildings, surface soil, a defined layer of subsurface soil, water bodies, and other similar applications) for studies of environmental quality. VSP is also applicable for designing sampling plans for assessing chem/rad/bio threat and hazard identification within rooms and buildings, and for designing geophysical surveys for unexploded ordnance (UXO) identification.« less

  5. Methods for flexible sample-size design in clinical trials: Likelihood, weighted, dual test, and promising zone approaches.

    PubMed

    Shih, Weichung Joe; Li, Gang; Wang, Yining

    2016-03-01

    Sample size plays a crucial role in clinical trials. Flexible sample-size designs, as part of the more general category of adaptive designs that utilize interim data, have been a popular topic in recent years. In this paper, we give a comparative review of four related methods for such a design. The likelihood method uses the likelihood ratio test with an adjusted critical value. The weighted method adjusts the test statistic with given weights rather than the critical value. The dual test method requires both the likelihood ratio statistic and the weighted statistic to be greater than the unadjusted critical value. The promising zone approach uses the likelihood ratio statistic with the unadjusted value and other constraints. All four methods preserve the type-I error rate. In this paper we explore their properties and compare their relationships and merits. We show that the sample size rules for the dual test are in conflict with the rules of the promising zone approach. We delineate what is necessary to specify in the study protocol to ensure the validity of the statistical procedure and what can be kept implicit in the protocol so that more flexibility can be attained for confirmatory phase III trials in meeting regulatory requirements. We also prove that under mild conditions, the likelihood ratio test still preserves the type-I error rate when the actual sample size is larger than the re-calculated one. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Multiple Category-Lot Quality Assurance Sampling: A New Classification System with Application to Schistosomiasis Control

    PubMed Central

    Olives, Casey; Valadez, Joseph J.; Brooker, Simon J.; Pagano, Marcello

    2012-01-01

    Background Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. Methodology We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n = 15 and n = 25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Principle Findings Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n = 15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. Conclusion/Significance This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalance of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools. PMID:22970333

  7. Introductory Statistics Students' Conceptual Understanding of Study Design and Conclusions

    NASA Astrophysics Data System (ADS)

    Fry, Elizabeth Brondos

    Recommended learning goals for students in introductory statistics courses include the ability to recognize and explain the key role of randomness in designing studies and in drawing conclusions from those studies involving generalizations to a population or causal claims (GAISE College Report ASA Revision Committee, 2016). The purpose of this study was to explore introductory statistics students' understanding of the distinct roles that random sampling and random assignment play in study design and the conclusions that can be made from each. A study design unit lasting two and a half weeks was designed and implemented in four sections of an undergraduate introductory statistics course based on modeling and simulation. The research question that this study attempted to answer is: How does introductory statistics students' conceptual understanding of study design and conclusions (in particular, unbiased estimation and establishing causation) change after participating in a learning intervention designed to promote conceptual change in these areas? In order to answer this research question, a forced-choice assessment called the Inferences from Design Assessment (IDEA) was developed as a pretest and posttest, along with two open-ended assignments, a group quiz and a lab assignment. Quantitative analysis of IDEA results and qualitative analysis of the group quiz and lab assignment revealed that overall, students' mastery of study design concepts significantly increased after the unit, and the great majority of students successfully made the appropriate connections between random sampling and generalization, and between random assignment and causal claims. However, a small, but noticeable portion of students continued to demonstrate misunderstandings, such as confusion between random sampling and random assignment.

  8. Long-Term Bioeffects of 435-MHz Radiofrequency Radiation on Selected Blood-Borne Endpoints in Cannulated Rats. Volume 4. Plasma Catecholamines.

    DTIC Science & Technology

    1987-08-01

    out. To use each animal as its own control , arterial blood was sampled by means of chronically implanted aortic cannulas 112,13,14]. This simple...APPENDIX B STATISTICAL METHODOLOGY 37 APPENDIX B STATISTICAL METHODOLOGY The balanced design of this experiment (requiring that 25 animals from each...protoccl in that, in numerous cases, samples were collected at odd intervals (invalidating the orthogonality of the design ) and the number of samples’taken

  9. Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power

    PubMed Central

    Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M.; Vaughn, Sharon

    2016-01-01

    An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated measures. This can result in attenuated pretest-posttest correlations, reducing the variance explained by the pretest covariate. We investigated the implications of two potential range restriction scenarios: direct truncation on a selection measure and indirect range restriction on correlated measures. Empirical and simulated data indicated direct range restriction on the pretest covariate greatly reduced statistical power and necessitated sample size increases of 82%–155% (dependent on selection criteria) to achieve equivalent statistical power to parameters with unrestricted samples. However, measures demonstrating indirect range restriction required much smaller sample size increases (32%–71%) under equivalent scenarios. Additional analyses manipulated the correlations between measures and pretest-posttest correlations to guide planning experiments. Results highlight the need to differentiate between selection measures and potential covariates and to investigate range restriction as a factor impacting statistical power. PMID:28479943

  10. Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power.

    PubMed

    Miciak, Jeremy; Taylor, W Pat; Stuebing, Karla K; Fletcher, Jack M; Vaughn, Sharon

    2016-01-01

    An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated measures. This can result in attenuated pretest-posttest correlations, reducing the variance explained by the pretest covariate. We investigated the implications of two potential range restriction scenarios: direct truncation on a selection measure and indirect range restriction on correlated measures. Empirical and simulated data indicated direct range restriction on the pretest covariate greatly reduced statistical power and necessitated sample size increases of 82%-155% (dependent on selection criteria) to achieve equivalent statistical power to parameters with unrestricted samples. However, measures demonstrating indirect range restriction required much smaller sample size increases (32%-71%) under equivalent scenarios. Additional analyses manipulated the correlations between measures and pretest-posttest correlations to guide planning experiments. Results highlight the need to differentiate between selection measures and potential covariates and to investigate range restriction as a factor impacting statistical power.

  11. GLIMMPSE Lite: Calculating Power and Sample Size on Smartphone Devices

    PubMed Central

    Munjal, Aarti; Sakhadeo, Uttara R.; Muller, Keith E.; Glueck, Deborah H.; Kreidler, Sarah M.

    2014-01-01

    Researchers seeking to develop complex statistical applications for mobile devices face a common set of difficult implementation issues. In this work, we discuss general solutions to the design challenges. We demonstrate the utility of the solutions for a free mobile application designed to provide power and sample size calculations for univariate, one-way analysis of variance (ANOVA), GLIMMPSE Lite. Our design decisions provide a guide for other scientists seeking to produce statistical software for mobile platforms. PMID:25541688

  12. Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power

    ERIC Educational Resources Information Center

    Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M.; Vaughn, Sharon

    2016-01-01

    An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated…

  13. Quantitative Methods for Analysing Joint Questionnaire Data: Exploring the Role of Joint in Force Design

    DTIC Science & Technology

    2015-08-01

    the nine questions. The Statistical Package for the Social Sciences ( SPSS ) [11] was used to conduct statistical analysis on the sample. Two types...constructs. SPSS was again used to conduct statistical analysis on the sample. This time factor analysis was conducted. Factor analysis attempts to...Business Research Methods and Statistics using SPSS . P432. 11 IBM SPSS Statistics . (2012) 12 Burns, R.B., Burns, R.A. (2008) ‘Business Research

  14. "Using Power Tables to Compute Statistical Power in Multilevel Experimental Designs"

    ERIC Educational Resources Information Center

    Konstantopoulos, Spyros

    2009-01-01

    Power computations for one-level experimental designs that assume simple random samples are greatly facilitated by power tables such as those presented in Cohen's book about statistical power analysis. However, in education and the social sciences experimental designs have naturally nested structures and multilevel models are needed to compute the…

  15. A nonparametric method to generate synthetic populations to adjust for complex sampling design features.

    PubMed

    Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E

    2014-06-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.

  16. A nonparametric method to generate synthetic populations to adjust for complex sampling design features

    PubMed Central

    Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to develop the statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, making adjustment on the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS), and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608

  17. Latent spatial models and sampling design for landscape genetics

    USGS Publications Warehouse

    Hanks, Ephraim M.; Hooten, Mevin B.; Knick, Steven T.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Cross, Todd B.; Schwartz, Michael K.

    2016-01-01

    We propose a spatially-explicit approach for modeling genetic variation across space and illustrate how this approach can be used to optimize spatial prediction and sampling design for landscape genetic data. We propose a multinomial data model for categorical microsatellite allele data commonly used in landscape genetic studies, and introduce a latent spatial random effect to allow for spatial correlation between genetic observations. We illustrate how modern dimension reduction approaches to spatial statistics can allow for efficient computation in landscape genetic statistical models covering large spatial domains. We apply our approach to propose a retrospective spatial sampling design for greater sage-grouse (Centrocercus urophasianus) population genetics in the western United States.

  18. Experimental toxicology: Issues of statistics, experimental design, and replication.

    PubMed

    Briner, Wayne; Kirwan, Jeral

    2017-01-01

    The difficulty of replicating experiments has drawn considerable attention. Issues with replication occur for a variety of reasons ranging from experimental design to laboratory errors to inappropriate statistical analysis. Here we review a variety of guidelines for statistical analysis, design, and execution of experiments in toxicology. In general, replication can be improved by using hypothesis driven experiments with adequate sample sizes, randomization, and blind data collection techniques. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Sampling benthic macroinvertebrates in a large flood-plain river: Considerations of study design, sample size, and cost

    USGS Publications Warehouse

    Bartsch, L.A.; Richardson, W.B.; Naimo, T.J.

    1998-01-01

    Estimation of benthic macroinvertebrate populations over large spatial scales is difficult due to the high variability in abundance and the cost of sample processing and taxonomic analysis. To determine a cost-effective, statistically powerful sample design, we conducted an exploratory study of the spatial variation of benthic macroinvertebrates in a 37 km reach of the Upper Mississippi River. We sampled benthos at 36 sites within each of two strata, contiguous backwater and channel border. Three standard ponar (525 cm(2)) grab samples were obtained at each site ('Original Design'). Analysis of variance and sampling cost of strata-wide estimates for abundance of Oligochaeta, Chironomidae, and total invertebrates showed that only one ponar sample per site ('Reduced Design') yielded essentially the same abundance estimates as the Original Design, while reducing the overall cost by 63%. A posteriori statistical power analysis (alpha = 0.05, beta = 0.20) on the Reduced Design estimated that at least 18 sites per stratum were needed to detect differences in mean abundance between contiguous backwater and channel border areas for Oligochaeta, Chironomidae, and total invertebrates. Statistical power was nearly identical for the three taxonomic groups. The abundances of several taxa of concern (e.g., Hexagenia mayflies and Musculium fingernail clams) were too spatially variable to estimate power with our method. Resampling simulations indicated that to achieve adequate sampling precision for Oligochaeta, at least 36 sample sites per stratum would be required, whereas a sampling precision of 0.2 would not be attained with any sample size for Hexagenia in channel border areas, or Chironomidae and Musculium in both strata given the variance structure of the original samples. Community-wide diversity indices (Brillouin and 1-Simpsons) increased as sample area per site increased. The backwater area had higher diversity than the channel border area. The number of sampling sites required to sample benthic macroinvertebrates during our sampling period depended on the study objective and ranged from 18 to more than 40 sites per stratum. No single sampling regime would efficiently and adequately sample all components of the macroinvertebrate community.

  20. [New design of the Health Survey of Catalonia (Spain, 2010-2014): a step forward in health planning and evaluation].

    PubMed

    Alcañiz-Zanón, Manuela; Mompart-Penina, Anna; Guillén-Estany, Montserrat; Medina-Bustos, Antonia; Aragay-Barbany, Josep M; Brugulat-Guiteras, Pilar; Tresserras-Gaju, Ricard

    2014-01-01

    This article presents the genesis of the Health Survey of Catalonia (Spain, 2010-2014) with its semiannual subsamples and explains the basic characteristics of its multistage sampling design. In comparison with previous surveys, the organizational advantages of this new statistical operation include rapid data availability and the ability to continuously monitor the population. The main benefits are timeliness in the production of indicators and the possibility of introducing new topics through the supplemental questionnaire as a function of needs. Limitations consist of the complexity of the sample design and the lack of longitudinal follow-up of the sample. Suitable sampling weights for each specific subsample are necessary for any statistical analysis of micro-data. Accuracy in the analysis of territorial disaggregation or population subgroups increases if annual samples are accumulated. Copyright © 2013 SESPAS. Published by Elsevier Espana. All rights reserved.

  1. Testing for independence in J×K contingency tables with complex sample survey data.

    PubMed

    Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti; Hevelone, Nathanael; Giovannucci, Edward; Hu, Jim C

    2015-09-01

    The test of independence of row and column variables in a (J×K) contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi-squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao-Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, post surgical complications data from the United States' Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008. © 2015, The International Biometric Society.

  2. Sampling stratospheric aerosols with impactors

    NASA Technical Reports Server (NTRS)

    Oberbeck, Verne R.

    1989-01-01

    Derivation of statistically significant size distributions from impactor samples of rarefield stratospheric aerosols imposes difficult sampling constraints on collector design. It is shown that it is necessary to design impactors of different size for each range of aerosol size collected so as to obtain acceptable levels of uncertainty with a reasonable amount of data reduction.

  3. Integrated Sampling Strategy (ISS) Guide

    Treesearch

    Robert E. Keane; Duncan C. Lutes

    2006-01-01

    What is an Integrated Sampling Strategy? Simply put, it is the strategy that guides how plots are put on the landscape. FIREMON’s Integrated Sampling Strategy assists fire managers as they design their fire monitoring project by answering questions such as: What statistical approach is appropriate for my sample design? How many plots can I afford? How many plots do I...

  4. Statistical Power and Optimum Sample Allocation Ratio for Treatment and Control Having Unequal Costs Per Unit of Randomization

    ERIC Educational Resources Information Center

    Liu, Xiaofeng

    2003-01-01

    This article considers optimal sample allocation between the treatment and control condition in multilevel designs when the costs per sampling unit vary due to treatment assignment. Optimal unequal allocation may reduce the cost from that of a balanced design without sacrificing any power. The optimum sample allocation ratio depends only on the…

  5. Applied statistics in ecology: common pitfalls and simple solutions

    Treesearch

    E. Ashley Steel; Maureen C. Kennedy; Patrick G. Cunningham; John S. Stanovick

    2013-01-01

    The most common statistical pitfalls in ecological research are those associated with data exploration, the logic of sampling and design, and the interpretation of statistical results. Although one can find published errors in calculations, the majority of statistical pitfalls result from incorrect logic or interpretation despite correct numerical calculations. There...

  6. Research Designs for Intervention Research with Small Samples II: Stepped Wedge and Interrupted Time-Series Designs.

    PubMed

    Fok, Carlotta Ching Ting; Henry, David; Allen, James

    2015-10-01

    The stepped wedge design (SWD) and the interrupted time-series design (ITSD) are two alternative research designs that maximize efficiency and statistical power with small samples when contrasted to the operating characteristics of conventional randomized controlled trials (RCT). This paper provides an overview and introduction to previous work with these designs and compares and contrasts them with the dynamic wait-list design (DWLD) and the regression point displacement design (RPDD), which were presented in a previous article (Wyman, Henry, Knoblauch, and Brown, Prevention Science. 2015) in this special section. The SWD and the DWLD are similar in that both are intervention implementation roll-out designs. We discuss similarities and differences between the SWD and DWLD in their historical origin and application, along with differences in the statistical modeling of each design. Next, we describe the main design characteristics of the ITSD, along with some of its strengths and limitations. We provide a critical comparative review of strengths and weaknesses in application of the ITSD, SWD, DWLD, and RPDD as small sample alternatives to application of the RCT, concluding with a discussion of the types of contextual factors that influence selection of an optimal research design by prevention researchers working with small samples.

  7. Research Designs for Intervention Research with Small Samples II: Stepped Wedge and Interrupted Time-Series Designs

    PubMed Central

    Ting Fok, Carlotta Ching; Henry, David; Allen, James

    2015-01-01

    The stepped wedge design (SWD) and the interrupted time-series design (ITSD) are two alternative research designs that maximize efficiency and statistical power with small samples when contrasted to the operating characteristics of conventional randomized controlled trials (RCT). This paper provides an overview and introduction to previous work with these designs, and compares and contrasts them with the dynamic wait-list design (DWLD) and the regression point displacement design (RPDD), which were presented in a previous article (Wyman, Henry, Knoblauch, and Brown, 2015) in this Special Section. The SWD and the DWLD are similar in that both are intervention implementation roll-out designs. We discuss similarities and differences between the SWD and DWLD in their historical origin and application, along with differences in the statistical modeling of each design. Next, we describe the main design characteristics of the ITSD, along with some of its strengths and limitations. We provide a critical comparative review of strengths and weaknesses in application of the ITSD, SWD, DWLD, and RPDD as small samples alternatives to application of the RCT, concluding with a discussion of the types of contextual factors that influence selection of an optimal research design by prevention researchers working with small samples. PMID:26017633

  8. Impact of Design Effects in Large-Scale District and State Assessments

    ERIC Educational Resources Information Center

    Phillips, Gary W.

    2015-01-01

    This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

  9. Simulation of Wind Profile Perturbations for Launch Vehicle Design

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.

    2004-01-01

    Ideally, a statistically representative sample of measured high-resolution wind profiles with wavelengths as small as tens of meters is required in design studies to establish aerodynamic load indicator dispersions and vehicle control system capability. At most potential launch sites, high- resolution wind profiles may not exist. Representative samples of Rawinsonde wind profiles to altitudes of 30 km are more likely to be available from the extensive network of measurement sites established for routine sampling in support of weather observing and forecasting activity. Such a sample, large enough to be statistically representative of relatively large wavelength perturbations, would be inadequate for launch vehicle design assessments because the Rawinsonde system accurately measures wind perturbations with wavelengths no smaller than 2000 m (1000 m altitude increment). The Kennedy Space Center (KSC) Jimsphere wind profiles (150/month and seasonal 2 and 3.5-hr pairs) are the only adequate samples of high resolution profiles approx. 150 to 300 m effective resolution, but over-sampled at 25 m intervals) that have been used extensively for launch vehicle design assessments. Therefore, a simulation process has been developed for enhancement of measured low-resolution Rawinsonde profiles that would be applicable in preliminary launch vehicle design studies at launch sites other than KSC.

  10. A review of empirical research related to the use of small quantitative samples in clinical outcome scale development.

    PubMed

    Houts, Carrie R; Edwards, Michael C; Wirth, R J; Deal, Linda S

    2016-11-01

    There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating the developing COAs using a small sample. We review the benefits such methods are purported to offer from both a practical and statistical standpoint and detail several problematic areas, including both practical and statistical theory concerns, with respect to the use of quantitative methods, including Rasch-consistent methods, with small samples. The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.

  11. Area estimation using multiyear designs and partial crop identification

    NASA Technical Reports Server (NTRS)

    Sielken, R. L., Jr.

    1983-01-01

    Progress is reported for the following areas: (1) estimating the stratum's crop acreage proportion using the multiyear area estimation model; (2) assessment of multiyear sampling designs; and (3) development of statistical methodology for incorporating partially identified sample segments into crop area estimation.

  12. Trends in study design and the statistical methods employed in a leading general medicine journal.

    PubMed

    Gosho, M; Sato, Y; Nagashima, K; Takahashi, S

    2018-02-01

    Study design and statistical methods have become core components of medical research, and the methodology has become more multifaceted and complicated over time. The study of the comprehensive details and current trends of study design and statistical methods is required to support the future implementation of well-planned clinical studies providing information about evidence-based medicine. Our purpose was to illustrate study design and statistical methods employed in recent medical literature. This was an extension study of Sato et al. (N Engl J Med 2017; 376: 1086-1087), which reviewed 238 articles published in 2015 in the New England Journal of Medicine (NEJM) and briefly summarized the statistical methods employed in NEJM. Using the same database, we performed a new investigation of the detailed trends in study design and individual statistical methods that were not reported in the Sato study. Due to the CONSORT statement, prespecification and justification of sample size are obligatory in planning intervention studies. Although standard survival methods (eg Kaplan-Meier estimator and Cox regression model) were most frequently applied, the Gray test and Fine-Gray proportional hazard model for considering competing risks were sometimes used for a more valid statistical inference. With respect to handling missing data, model-based methods, which are valid for missing-at-random data, were more frequently used than single imputation methods. These methods are not recommended as a primary analysis, but they have been applied in many clinical trials. Group sequential design with interim analyses was one of the standard designs, and novel design, such as adaptive dose selection and sample size re-estimation, was sometimes employed in NEJM. Model-based approaches for handling missing data should replace single imputation methods for primary analysis in the light of the information found in some publications. Use of adaptive design with interim analyses is increasing after the presentation of the FDA guidance for adaptive design. © 2017 John Wiley & Sons Ltd.

  13. Statistical Strategy for Inventorying and Monitoring the Ecosystem Resources of the State of Jalisco at Multiple Scales and Resolution Levels

    Treesearch

    Robin M. Reich; Hans T. Schreuder

    2006-01-01

    The sampling strategy involving both statistical and in-place inventory information is presented for the natural resources project of the Green Belt area (Centuron Verde) in the Mexican state of Jalisco. The sampling designs used were a grid based ground sample of a 90x90 m plot and a two-stage stratified sample of 30 x 30 m plots. The data collected were used to...

  14. DoD Met Most Requirements of the Improper Payments Elimination and Recovery Act in FY 2014, but Improper Payment Estimates Were Unreliable

    DTIC Science & Technology

    2015-05-12

    Deficiencies That Affect the Reliability of Estimates ________________________________________6 Statistical Precision Could Be Improved... statistical precision of improper payments estimates in seven of the DoD payment programs through the use of stratified sample designs. DoD improper...payments not subject to sampling, which made the results statistically invalid. We made a recommendation to correct this problem in a previous report;4

  15. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

    PubMed Central

    Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo

    2016-01-01

    Outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. The well-known such design is the case-control design for binary response, the case-cohort design for the failure time data and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking the advantage of the available responses for subjects in a cluster, we propose a multivariate outcome dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and developed the asymptotically normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of PCB exposure to hearing loss in children born to the Collaborative Perinatal Study. PMID:27966260

  16. 47 CFR 1.363 - Introduction of statistical data.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... case of sample surveys, there shall be a clear description of the survey design, including the... evidence in common carrier hearing proceedings, including but not limited to sample surveys, econometric... description of the experimental design shall be set forth, including a specification of the controlled...

  17. 47 CFR 1.363 - Introduction of statistical data.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... case of sample surveys, there shall be a clear description of the survey design, including the... evidence in common carrier hearing proceedings, including but not limited to sample surveys, econometric... description of the experimental design shall be set forth, including a specification of the controlled...

  18. 47 CFR 1.363 - Introduction of statistical data.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... case of sample surveys, there shall be a clear description of the survey design, including the... evidence in common carrier hearing proceedings, including but not limited to sample surveys, econometric... description of the experimental design shall be set forth, including a specification of the controlled...

  19. 47 CFR 1.363 - Introduction of statistical data.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... case of sample surveys, there shall be a clear description of the survey design, including the... evidence in common carrier hearing proceedings, including but not limited to sample surveys, econometric... description of the experimental design shall be set forth, including a specification of the controlled...

  20. 47 CFR 1.363 - Introduction of statistical data.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... case of sample surveys, there shall be a clear description of the survey design, including the... evidence in common carrier hearing proceedings, including but not limited to sample surveys, econometric... description of the experimental design shall be set forth, including a specification of the controlled...

  1. Experimental design, power and sample size for animal reproduction experiments.

    PubMed

    Chapman, Phillip L; Seidel, George E

    2008-01-01

    The present paper concerns statistical issues in the design of animal reproduction experiments, with emphasis on the problems of sample size determination and power calculations. We include examples and non-technical discussions aimed at helping researchers avoid serious errors that may invalidate or seriously impair the validity of conclusions from experiments. Screen shots from interactive power calculation programs and basic SAS power calculation programs are presented to aid in understanding statistical power and computing power in some common experimental situations. Practical issues that are common to most statistical design problems are briefly discussed. These include one-sided hypothesis tests, power level criteria, equality of within-group variances, transformations of response variables to achieve variance equality, optimal specification of treatment group sizes, 'post hoc' power analysis and arguments for the increased use of confidence intervals in place of hypothesis tests.

  2. Adaptive web sampling.

    PubMed

    Thompson, Steven K

    2006-12-01

    A flexible class of adaptive sampling designs is introduced for sampling in network and spatial settings. In the designs, selections are made sequentially with a mixture distribution based on an active set that changes as the sampling progresses, using network or spatial relationships as well as sample values. The new designs have certain advantages compared with previously existing adaptive and link-tracing designs, including control over sample sizes and of the proportion of effort allocated to adaptive selections. Efficient inference involves averaging over sample paths consistent with the minimal sufficient statistic. A Markov chain resampling method makes the inference computationally feasible. The designs are evaluated in network and spatial settings using two empirical populations: a hidden human population at high risk for HIV/AIDS and an unevenly distributed bird population.

  3. Statistical inference for extended or shortened phase II studies based on Simon's two-stage designs.

    PubMed

    Zhao, Junjun; Yu, Menggang; Feng, Xi-Ping

    2015-06-07

    Simon's two-stage designs are popular choices for conducting phase II clinical trials, especially in the oncology trials to reduce the number of patients placed on ineffective experimental therapies. Recently Koyama and Chen (2008) discussed how to conduct proper inference for such studies because they found that inference procedures used with Simon's designs almost always ignore the actual sampling plan used. In particular, they proposed an inference method for studies when the actual second stage sample sizes differ from planned ones. We consider an alternative inference method based on likelihood ratio. In particular, we order permissible sample paths under Simon's two-stage designs using their corresponding conditional likelihood. In this way, we can calculate p-values using the common definition: the probability of obtaining a test statistic value at least as extreme as that observed under the null hypothesis. In addition to providing inference for a couple of scenarios where Koyama and Chen's method can be difficult to apply, the resulting estimate based on our method appears to have certain advantage in terms of inference properties in many numerical simulations. It generally led to smaller biases and narrower confidence intervals while maintaining similar coverages. We also illustrated the two methods in a real data setting. Inference procedures used with Simon's designs almost always ignore the actual sampling plan. Reported P-values, point estimates and confidence intervals for the response rate are not usually adjusted for the design's adaptiveness. Proper statistical inference procedures should be used.

  4. Experimental Design in Clinical 'Omics Biomarker Discovery.

    PubMed

    Forshed, Jenny

    2017-11-03

    This tutorial highlights some issues in the experimental design of clinical 'omics biomarker discovery, how to avoid bias and get as true quantities as possible from biochemical analyses, and how to select samples to improve the chance of answering the clinical question at issue. This includes the importance of defining clinical aim and end point, knowing the variability in the results, randomization of samples, sample size, statistical power, and how to avoid confounding factors by including clinical data in the sample selection, that is, how to avoid unpleasant surprises at the point of statistical analysis. The aim of this Tutorial is to help translational clinical and preclinical biomarker candidate research and to improve the validity and potential of future biomarker candidate findings.

  5. Statistical issues in quality control of proteomic analyses: good experimental design and planning.

    PubMed

    Cairns, David A

    2011-03-01

    Quality control is becoming increasingly important in proteomic investigations as experiments become more multivariate and quantitative. Quality control applies to all stages of an investigation and statistics can play a key role. In this review, the role of statistical ideas in the design and planning of an investigation is described. This involves the design of unbiased experiments using key concepts from statistical experimental design, the understanding of the biological and analytical variation in a system using variance components analysis and the determination of a required sample size to perform a statistically powerful investigation. These concepts are described through simple examples and an example data set from a 2-D DIGE pilot experiment. Each of these concepts can prove useful in producing better and more reproducible data. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Methodological quality of behavioural weight loss studies: a systematic review

    PubMed Central

    Lemon, S. C.; Wang, M. L.; Haughton, C. F.; Estabrook, D. P.; Frisard, C. F.; Pagoto, S. L.

    2018-01-01

    Summary This systematic review assessed the methodological quality of behavioural weight loss intervention studies conducted among adults and associations between quality and statistically significant weight loss outcome, strength of intervention effectiveness and sample size. Searches for trials published between January, 2009 and December, 2014 were conducted using PUBMED, MEDLINE and PSYCINFO and identified ninety studies. Methodological quality indicators included study design, anthropometric measurement approach, sample size calculations, intent-to-treat (ITT) analysis, loss to follow-up rate, missing data strategy, sampling strategy, report of treatment receipt and report of intervention fidelity (mean = 6.3). Indicators most commonly utilized included randomized design (100%), objectively measured anthropometrics (96.7%), ITT analysis (86.7%) and reporting treatment adherence (76.7%). Most studies (62.2%) had a follow-up rate >75% and reported a loss to follow-up analytic strategy or minimal missing data (69.9%). Describing intervention fidelity (34.4%) and sampling from a known population (41.1%) were least common. Methodological quality was not associated with reporting a statistically significant result, effect size or sample size. This review found the published literature of behavioural weight loss trials to be of high quality for specific indicators, including study design and measurement. Identified for improvement include utilization of more rigorous statistical approaches to loss to follow up and better fidelity reporting. PMID:27071775

  7. Evaluation of Solid Rocket Motor Component Data Using a Commercially Available Statistical Software Package

    NASA Technical Reports Server (NTRS)

    Stefanski, Philip L.

    2015-01-01

    Commercially available software packages today allow users to quickly perform the routine evaluations of (1) descriptive statistics to numerically and graphically summarize both sample and population data, (2) inferential statistics that draws conclusions about a given population from samples taken of it, (3) probability determinations that can be used to generate estimates of reliability allowables, and finally (4) the setup of designed experiments and analysis of their data to identify significant material and process characteristics for application in both product manufacturing and performance enhancement. This paper presents examples of analysis and experimental design work that has been conducted using Statgraphics®(Registered Trademark) statistical software to obtain useful information with regard to solid rocket motor propellants and internal insulation material. Data were obtained from a number of programs (Shuttle, Constellation, and Space Launch System) and sources that include solid propellant burn rate strands, tensile specimens, sub-scale test motors, full-scale operational motors, rubber insulation specimens, and sub-scale rubber insulation analog samples. Besides facilitating the experimental design process to yield meaningful results, statistical software has demonstrated its ability to quickly perform complex data analyses and yield significant findings that might otherwise have gone unnoticed. One caveat to these successes is that useful results not only derive from the inherent power of the software package, but also from the skill and understanding of the data analyst.

  8. 76 FR 13436 - Agency Information Collection Activities: Proposed Collection; Comment Request; Generic Clearance...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-11

    ... target population to which generalizations will be made, the sampling frame, the sample design (including... for submission for other generic mechanisms that are designed to yield quantitative results. The MSPB... insights on perceptions and opinions, but are not statistical surveys that yield quantitative results that...

  9. A Science and Risk-Based Pragmatic Methodology for Blend and Content Uniformity Assessment.

    PubMed

    Sayeed-Desta, Naheed; Pazhayattil, Ajay Babu; Collins, Jordan; Doshi, Chetan

    2018-04-01

    This paper describes a pragmatic approach that can be applied in assessing powder blend and unit dosage uniformity of solid dose products at Process Design, Process Performance Qualification, and Continued/Ongoing Process Verification stages of the Process Validation lifecycle. The statistically based sampling, testing, and assessment plan was developed due to the withdrawal of the FDA draft guidance for industry "Powder Blends and Finished Dosage Units-Stratified In-Process Dosage Unit Sampling and Assessment." This paper compares the proposed Grouped Area Variance Estimate (GAVE) method with an alternate approach outlining the practicality and statistical rationalization using traditional sampling and analytical methods. The approach is designed to fit solid dose processes assuring high statistical confidence in both powder blend uniformity and dosage unit uniformity during all three stages of the lifecycle complying with ASTM standards as recommended by the US FDA.

  10. Statistical inference for tumor growth inhibition T/C ratio.

    PubMed

    Wu, Jianrong

    2010-09-01

    The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.

  11. Applied Survey Sampling

    ERIC Educational Resources Information Center

    Blair, Edward; Blair, Johnny

    2015-01-01

    Written for students and researchers who wish to understand the conceptual and practical aspects of sampling, this book is designed to be accessible without requiring advanced statistical training. It covers a wide range of topics, from the basics of sampling to special topics such as sampling rare populations, sampling organizational populations,…

  12. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design.

    PubMed

    Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo

    2017-03-15

    Outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. The well-known such design is the case-control design for binary response, the case-cohort design for the failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking the advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and developed the asymptotically normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of polychlorinated biphenyl exposure to hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  13. Learning to Reason from Samples: Commentary from the Perspectives of Task Design and the Emergence of "Big Data"

    ERIC Educational Resources Information Center

    Ainley, Janet; Gould, Robert; Pratt, Dave

    2015-01-01

    This paper is in the form of a reflective discussion of the collection of papers in this Special Issue on "Statistical reasoning: learning to reason from samples" drawing on deliberations arising at the Seventh International Collaboration for Research on Statistical Reasoning, Thinking, and Literacy (SRTL7). It is an important part of…

  14. Statistical considerations in monitoring birds over large areas

    USGS Publications Warehouse

    Johnson, D.H.

    2000-01-01

    The proper design of a monitoring effort depends primarily on the objectives desired, constrained by the resources available to conduct the work. Typically, managers have numerous objectives, such as determining abundance of the species, detecting changes in population size, evaluating responses to management activities, and assessing habitat associations. A design that is optimal for one objective will likely not be optimal for others. Careful consideration of the importance of the competing objectives may lead to a design that adequately addresses the priority concerns, although it may not be optimal for any individual objective. Poor design or inadequate sample sizes may result in such weak conclusions that the effort is wasted. Statistical expertise can be used at several stages, such as estimating power of certain hypothesis tests, but is perhaps most useful in fundamental considerations of describing objectives and designing sampling plans.

  15. A Simple and Robust Method for Partially Matched Samples Using the P-Values Pooling Approach

    PubMed Central

    Kuan, Pei Fen; Huang, Bo

    2013-01-01

    This paper focuses on statistical analyses in scenarios where some samples from the matched pairs design are missing, resulting in partially matched samples. Motivated by the idea of meta-analysis, we recast the partially matched samples as coming from two experimental designs, and propose a simple yet robust approach based on the weighted Z-test to integrate the p-values computed from these two designs. We show that the proposed approach achieves better operating characteristics in simulations and a case study, compared to existing methods for partially matched samples. PMID:23417968

  16. Statistical power calculations for mixed pharmacokinetic study designs using a population approach.

    PubMed

    Kloprogge, Frank; Simpson, Julie A; Day, Nicholas P J; White, Nicholas J; Tarning, Joel

    2014-09-01

    Simultaneous modelling of dense and sparse pharmacokinetic data is possible with a population approach. To determine the number of individuals required to detect the effect of a covariate, simulation-based power calculation methodologies can be employed. The Monte Carlo Mapped Power method (a simulation-based power calculation methodology using the likelihood ratio test) was extended in the current study to perform sample size calculations for mixed pharmacokinetic studies (i.e. both sparse and dense data collection). A workflow guiding an easy and straightforward pharmacokinetic study design, considering also the cost-effectiveness of alternative study designs, was used in this analysis. Initially, data were simulated for a hypothetical drug and then for the anti-malarial drug, dihydroartemisinin. Two datasets (sampling design A: dense; sampling design B: sparse) were simulated using a pharmacokinetic model that included a binary covariate effect and subsequently re-estimated using (1) the same model and (2) a model not including the covariate effect in NONMEM 7.2. Power calculations were performed for varying numbers of patients with sampling designs A and B. Study designs with statistical power >80% were selected and further evaluated for cost-effectiveness. The simulation studies of the hypothetical drug and the anti-malarial drug dihydroartemisinin demonstrated that the simulation-based power calculation methodology, based on the Monte Carlo Mapped Power method, can be utilised to evaluate and determine the sample size of mixed (part sparsely and part densely sampled) study designs. The developed method can contribute to the design of robust and efficient pharmacokinetic studies.

  17. Statistical methodology for the analysis of dye-switch microarray experiments

    PubMed Central

    Mary-Huard, Tristan; Aubert, Julie; Mansouri-Attia, Nadera; Sandra, Olivier; Daudin, Jean-Jacques

    2008-01-01

    Background In individually dye-balanced microarray designs, each biological sample is hybridized on two different slides, once with Cy3 and once with Cy5. While this strategy ensures an automatic correction of the gene-specific labelling bias, it also induces dependencies between log-ratio measurements that must be taken into account in the statistical analysis. Results We present two original statistical procedures for the statistical analysis of individually balanced designs. These procedures are compared with the usual ML and REML mixed model procedures proposed in most statistical toolboxes, on both simulated and real data. Conclusion The UP procedure we propose as an alternative to usual mixed model procedures is more efficient and significantly faster to compute. This result provides some useful guidelines for the analysis of complex designs. PMID:18271965

  18. Optimal sampling design for estimating spatial distribution and abundance of a freshwater mussel population

    USGS Publications Warehouse

    Pooler, P.S.; Smith, D.R.

    2005-01-01

    We compared the ability of simple random sampling (SRS) and a variety of systematic sampling (SYS) designs to estimate abundance, quantify spatial clustering, and predict spatial distribution of freshwater mussels. Sampling simulations were conducted using data obtained from a census of freshwater mussels in a 40 X 33 m section of the Cacapon River near Capon Bridge, West Virginia, and from a simulated spatially random population generated to have the same abundance as the real population. Sampling units that were 0.25 m 2 gave more accurate and precise abundance estimates and generally better spatial predictions than 1-m2 sampling units. Systematic sampling with ???2 random starts was more efficient than SRS. Estimates of abundance based on SYS were more accurate when the distance between sampling units across the stream was less than or equal to the distance between sampling units along the stream. Three measures for quantifying spatial clustering were examined: Hopkins Statistic, the Clumping Index, and Morisita's Index. Morisita's Index was the most reliable, and the Hopkins Statistic was prone to false rejection of complete spatial randomness. SYS designs with units spaced equally across and up stream provided the most accurate predictions when estimating the spatial distribution by kriging. Our research indicates that SYS designs with sampling units equally spaced both across and along the stream would be appropriate for sampling freshwater mussels even if no information about the true underlying spatial distribution of the population were available to guide the design choice. ?? 2005 by The North American Benthological Society.

  19. Sampling and Data Analysis for Environmental Microbiology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murray, Christopher J.

    2001-06-01

    A brief review of the literature indicates the importance of statistical analysis in applied and environmental microbiology. Sampling designs are particularly important for successful studies, and it is highly recommended that researchers review their sampling design before heading to the laboratory or the field. Most statisticians have numerous stories of scientists who approached them after their study was complete only to have to tell them that the data they gathered could not be used to test the hypothesis they wanted to address. Once the data are gathered, a large and complex body of statistical techniques are available for analysis ofmore » the data. Those methods include both numerical and graphical techniques for exploratory characterization of the data. Hypothesis testing and analysis of variance (ANOVA) are techniques that can be used to compare the mean and variance of two or more groups of samples. Regression can be used to examine the relationships between sets of variables and is often used to examine the dependence of microbiological populations on microbiological parameters. Multivariate statistics provides several methods that can be used for interpretation of datasets with a large number of variables and to partition samples into similar groups, a task that is very common in taxonomy, but also has applications in other fields of microbiology. Geostatistics and other techniques have been used to examine the spatial distribution of microorganisms. The objectives of this chapter are to provide a brief survey of some of the statistical techniques that can be used for sample design and data analysis of microbiological data in environmental studies, and to provide some examples of their use from the literature.« less

  20. Prospective power calculations for the Four Lab study of a multigenerational reproductive/developmental toxicity rodent bioassay using a complex mixture of disinfection by-products in the low-response region.

    PubMed

    Dingus, Cheryl A; Teuschler, Linda K; Rice, Glenn E; Simmons, Jane Ellen; Narotsky, Michael G

    2011-10-01

    In complex mixture toxicology, there is growing emphasis on testing environmentally representative doses that improve the relevance of results for health risk assessment, but are typically much lower than those used in traditional toxicology studies. Traditional experimental designs with typical sample sizes may have insufficient statistical power to detect effects caused by environmentally relevant doses. Proper study design, with adequate statistical power, is critical to ensuring that experimental results are useful for environmental health risk assessment. Studies with environmentally realistic complex mixtures have practical constraints on sample concentration factor and sample volume as well as the number of animals that can be accommodated. This article describes methodology for calculation of statistical power for non-independent observations for a multigenerational rodent reproductive/developmental bioassay. The use of the methodology is illustrated using the U.S. EPA's Four Lab study in which rodents were exposed to chlorinated water concentrates containing complex mixtures of drinking water disinfection by-products. Possible experimental designs included two single-block designs and a two-block design. Considering the possible study designs and constraints, a design of two blocks of 100 females with a 40:60 ratio of control:treated animals and a significance level of 0.05 yielded maximum prospective power (~90%) to detect pup weight decreases, while providing the most power to detect increased prenatal loss.

  1. Prospective Power Calculations for the Four Lab Study of A Multigenerational Reproductive/Developmental Toxicity Rodent Bioassay Using A Complex Mixture of Disinfection By-Products in the Low-Response Region

    PubMed Central

    Dingus, Cheryl A.; Teuschler, Linda K.; Rice, Glenn E.; Simmons, Jane Ellen; Narotsky, Michael G.

    2011-01-01

    In complex mixture toxicology, there is growing emphasis on testing environmentally representative doses that improve the relevance of results for health risk assessment, but are typically much lower than those used in traditional toxicology studies. Traditional experimental designs with typical sample sizes may have insufficient statistical power to detect effects caused by environmentally relevant doses. Proper study design, with adequate statistical power, is critical to ensuring that experimental results are useful for environmental health risk assessment. Studies with environmentally realistic complex mixtures have practical constraints on sample concentration factor and sample volume as well as the number of animals that can be accommodated. This article describes methodology for calculation of statistical power for non-independent observations for a multigenerational rodent reproductive/developmental bioassay. The use of the methodology is illustrated using the U.S. EPA’s Four Lab study in which rodents were exposed to chlorinated water concentrates containing complex mixtures of drinking water disinfection by-products. Possible experimental designs included two single-block designs and a two-block design. Considering the possible study designs and constraints, a design of two blocks of 100 females with a 40:60 ratio of control:treated animals and a significance level of 0.05 yielded maximum prospective power (~90%) to detect pup weight decreases, while providing the most power to detect increased prenatal loss. PMID:22073030

  2. 2016 Workplace and Gender Relations Survey of Active Duty Members: Frequently Asked Questions

    DTIC Science & Technology

    2017-05-01

    active duty population both at the sample design stage as well as during the statistical weighting process to account for survey non-response and post...used the OPA sampling design , won the 2011 Policy Impact Award from The American Association for Public Opinion Research (AAPOR), which “recognizes

  3. Reliability approach to rotating-component design. [fatigue life and stress concentration

    NASA Technical Reports Server (NTRS)

    Kececioglu, D. B.; Lalli, V. R.

    1975-01-01

    A probabilistic methodology for designing rotating mechanical components using reliability to relate stress to strength is explained. The experimental test machines and data obtained for steel to verify this methodology are described. A sample mechanical rotating component design problem is solved by comparing a deterministic design method with the new design-by reliability approach. The new method shows that a smaller size and weight can be obtained for specified rotating shaft life and reliability, and uses the statistical distortion-energy theory with statistical fatigue diagrams for optimum shaft design. Statistical methods are presented for (1) determining strength distributions for steel experimentally, (2) determining a failure theory for stress variations in a rotating shaft subjected to reversed bending and steady torque, and (3) relating strength to stress by reliability.

  4. Outcome-Dependent Sampling Design and Inference for Cox's Proportional Hazards Model.

    PubMed

    Yu, Jichang; Liu, Yanyan; Cai, Jianwen; Sandler, Dale P; Zhou, Haibo

    2016-11-01

    We propose a cost-effective outcome-dependent sampling design for the failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criteria that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure is shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with an existing real data from the Cancer Incidence and Mortality of Uranium Miners Study.

  5. Using public control genotype data to increase power and decrease cost of case-control genetic association studies.

    PubMed

    Ho, Lindsey A; Lange, Ethan M

    2010-12-01

    Genome-wide association (GWA) studies are a powerful approach for identifying novel genetic risk factors associated with human disease. A GWA study typically requires the inclusion of thousands of samples to have sufficient statistical power to detect single nucleotide polymorphisms that are associated with only modest increases in risk of disease given the heavy burden of a multiple test correction that is necessary to maintain valid statistical tests. Low statistical power and the high financial cost of performing a GWA study remains prohibitive for many scientific investigators anxious to perform such a study using their own samples. A number of remedies have been suggested to increase statistical power and decrease cost, including the utilization of free publicly available genotype data and multi-stage genotyping designs. Herein, we compare the statistical power and relative costs of alternative association study designs that use cases and screened controls to study designs that are based only on, or additionally include, free public control genotype data. We describe a novel replication-based two-stage study design, which uses free public control genotype data in the first stage and follow-up genotype data on case-matched controls in the second stage that preserves many of the advantages inherent when using only an epidemiologically matched set of controls. Specifically, we show that our proposed two-stage design can substantially increase statistical power and decrease cost of performing a GWA study while controlling the type-I error rate that can be inflated when using public controls due to differences in ancestry and batch genotype effects.

  6. Design-based and model-based inference in surveys of freshwater mollusks

    USGS Publications Warehouse

    Dorazio, R.M.

    1999-01-01

    Well-known concepts in statistical inference and sampling theory are used to develop recommendations for planning and analyzing the results of quantitative surveys of freshwater mollusks. Two methods of inference commonly used in survey sampling (design-based and model-based) are described and illustrated using examples relevant in surveys of freshwater mollusks. The particular objectives of a survey and the type of information observed in each unit of sampling can be used to help select the sampling design and the method of inference. For example, the mean density of a sparsely distributed population of mollusks can be estimated with higher precision by using model-based inference or by using design-based inference with adaptive cluster sampling than by using design-based inference with conventional sampling. More experience with quantitative surveys of natural assemblages of freshwater mollusks is needed to determine the actual benefits of different sampling designs and inferential procedures.

  7. Testing how voluntary participation requirements in an environmental study affect the planned random sample design outcomes: implications for the predictions of values and their uncertainty.

    NASA Astrophysics Data System (ADS)

    Ander, Louise; Lark, Murray; Smedley, Pauline; Watts, Michael; Hamilton, Elliott; Fletcher, Tony; Crabbe, Helen; Close, Rebecca; Studden, Mike; Leonardi, Giovanni

    2015-04-01

    Random sampling design is optimal in order to be able to assess outcomes, such as the mean of a given variable across an area. However, this optimal sampling design may be compromised to an unknown extent by unavoidable real-world factors: the extent to which the study design can still be considered random, and the influence this may have on the choice of appropriate statistical data analysis is examined in this work. We take a study which relied on voluntary participation for the sampling of private water tap chemical composition in England, UK. This study was designed and implemented as a categorical, randomised study. The local geological classes were grouped into 10 types, which were considered to be most important in likely effects on groundwater chemistry (the source of all the tap waters sampled). Locations of the users of private water supplies were made available to the study group from the Local Authority in the area. These were then assigned, based on location, to geological groups 1 to 10 and randomised within each group. However, the permission to collect samples then required active, voluntary participation by householders and thus, unlike many environmental studies, could not always follow the initial sample design. Impediments to participation ranged from 'willing but not available' during the designated sampling period, to a lack of response to requests to sample (assumed to be wholly unwilling or unable to participate). Additionally, a small number of unplanned samples were collected via new participants making themselves known to the sampling teams, during the sampling period. Here we examine the impact this has on the 'random' nature of the resulting data distribution, by comparison with the non-participating known supplies. We consider the implications this has on choice of statistical analysis methods to predict values and uncertainty at un-sampled locations.

  8. Long-term strategy for the statistical design of a forest health monitoring system

    Treesearch

    Hans T. Schreuder; Raymond L. Czaplewski

    1993-01-01

    A conceptual framework is given for a broad-scale survey of forest health that accomplishes three objectives: generate descriptive statistics; detect changes in such statistics; and simplify analytical inferences that identify, and possibly establish cause-effect relationships. Our paper discusses the development of sampling schemes to satisfy these three objectives,...

  9. Artificial Intelligence Approach to Support Statistical Quality Control Teaching

    ERIC Educational Resources Information Center

    Reis, Marcelo Menezes; Paladini, Edson Pacheco; Khator, Suresh; Sommer, Willy Arno

    2006-01-01

    Statistical quality control--SQC (consisting of Statistical Process Control, Process Capability Studies, Acceptance Sampling and Design of Experiments) is a very important tool to obtain, maintain and improve the Quality level of goods and services produced by an organization. Despite its importance, and the fact that it is taught in technical and…

  10. Occupancy Modeling Species-Environment Relationships with Non-ignorable Survey Designs.

    PubMed

    Irvine, Kathryn M; Rodhouse, Thomas J; Wright, Wilson J; Olsen, Anthony R

    2018-05-26

    Statistical models supporting inferences about species occurrence patterns in relation to environmental gradients are fundamental to ecology and conservation biology. A common implicit assumption is that the sampling design is ignorable and does not need to be formally accounted for in analyses. The analyst assumes data are representative of the desired population and statistical modeling proceeds. However, if datasets from probability and non-probability surveys are combined or unequal selection probabilities are used, the design may be non ignorable. We outline the use of pseudo-maximum likelihood estimation for site-occupancy models to account for such non-ignorable survey designs. This estimation method accounts for the survey design by properly weighting the pseudo-likelihood equation. In our empirical example, legacy and newer randomly selected locations were surveyed for bats to bridge a historic statewide effort with an ongoing nationwide program. We provide a worked example using bat acoustic detection/non-detection data and show how analysts can diagnose whether their design is ignorable. Using simulations we assessed whether our approach is viable for modeling datasets composed of sites contributed outside of a probability design Pseudo-maximum likelihood estimates differed from the usual maximum likelihood occu31 pancy estimates for some bat species. Using simulations we show the maximum likelihood estimator of species-environment relationships with non-ignorable sampling designs was biased, whereas the pseudo-likelihood estimator was design-unbiased. However, in our simulation study the designs composed of a large proportion of legacy or non-probability sites resulted in estimation issues for standard errors. These issues were likely a result of highly variable weights confounded by small sample sizes (5% or 10% sampling intensity and 4 revisits). Aggregating datasets from multiple sources logically supports larger sample sizes and potentially increases spatial extents for statistical inferences. Our results suggest that ignoring the mechanism for how locations were selected for data collection (e.g., the sampling design) could result in erroneous model-based conclusions. Therefore, in order to ensure robust and defensible recommendations for evidence-based conservation decision-making, the survey design information in addition to the data themselves must be available for analysts. Details for constructing the weights used in estimation and code for implementation are provided. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.

  11. Geospatial techniques for developing a sampling frame of watersheds across a region

    USGS Publications Warehouse

    Gresswell, Robert E.; Bateman, Douglas S.; Lienkaemper, George; Guy, T.J.

    2004-01-01

    Current land-management decisions that affect the persistence of native salmonids are often influenced by studies of individual sites that are selected based on judgment and convenience. Although this approach is useful for some purposes, extrapolating results to areas that were not sampled is statistically inappropriate because the sampling design is usually biased. Therefore, in recent investigations of coastal cutthroat trout (Oncorhynchus clarki clarki) located above natural barriers to anadromous salmonids, we used a methodology for extending the statistical scope of inference. The purpose of this paper is to apply geospatial tools to identify a population of watersheds and develop a probability-based sampling design for coastal cutthroat trout in western Oregon, USA. The population of mid-size watersheds (500-5800 ha) west of the Cascade Range divide was derived from watershed delineations based on digital elevation models. Because a database with locations of isolated populations of coastal cutthroat trout did not exist, a sampling frame of isolated watersheds containing cutthroat trout had to be developed. After the sampling frame of watersheds was established, isolated watersheds with coastal cutthroat trout were stratified by ecoregion and erosion potential based on dominant bedrock lithology (i.e., sedimentary and igneous). A stratified random sample of 60 watersheds was selected with proportional allocation in each stratum. By comparing watershed drainage areas of streams in the general population to those in the sampling frame and the resulting sample (n = 60), we were able to evaluate the how representative the subset of watersheds was in relation to the population of watersheds. Geospatial tools provided a relatively inexpensive means to generate the information necessary to develop a statistically robust, probability-based sampling design.

  12. Remote sensing-aided systems for snow qualification, evapotranspiration estimation, and their application in hydrologic models

    NASA Technical Reports Server (NTRS)

    Korram, S.

    1977-01-01

    The design of general remote sensing-aided methodologies was studied to provide the estimates of several important inputs to water yield forecast models. These input parameters are snow area extent, snow water content, and evapotranspiration. The study area is Feather River Watershed (780,000 hectares), Northern California. The general approach involved a stepwise sequence of identification of the required information, sample design, measurement/estimation, and evaluation of results. All the relevent and available information types needed in the estimation process are being defined. These include Landsat, meteorological satellite, and aircraft imagery, topographic and geologic data, ground truth data, and climatic data from ground stations. A cost-effective multistage sampling approach was employed in quantification of all the required parameters. The physical and statistical models for both snow quantification and evapotranspiration estimation was developed. These models use the information obtained by aerial and ground data through appropriate statistical sampling design.

  13. Directions for new developments on statistical design and analysis of small population group trials.

    PubMed

    Hilgers, Ralf-Dieter; Roes, Kit; Stallard, Nigel

    2016-06-14

    Most statistical design and analysis methods for clinical trials have been developed and evaluated where at least several hundreds of patients could be recruited. These methods may not be suitable to evaluate therapies if the sample size is unavoidably small, which is usually termed by small populations. The specific sample size cut off, where the standard methods fail, needs to be investigated. In this paper, the authors present their view on new developments for design and analysis of clinical trials in small population groups, where conventional statistical methods may be inappropriate, e.g., because of lack of power or poor adherence to asymptotic approximations due to sample size restrictions. Following the EMA/CHMP guideline on clinical trials in small populations, we consider directions for new developments in the area of statistical methodology for design and analysis of small population clinical trials. We relate the findings to the research activities of three projects, Asterix, IDeAl, and InSPiRe, which have received funding since 2013 within the FP7-HEALTH-2013-INNOVATION-1 framework of the EU. As not all aspects of the wide research area of small population clinical trials can be addressed, we focus on areas where we feel advances are needed and feasible. The general framework of the EMA/CHMP guideline on small population clinical trials stimulates a number of research areas. These serve as the basis for the three projects, Asterix, IDeAl, and InSPiRe, which use various approaches to develop new statistical methodology for design and analysis of small population clinical trials. Small population clinical trials refer to trials with a limited number of patients. Small populations may result form rare diseases or specific subtypes of more common diseases. New statistical methodology needs to be tailored to these specific situations. The main results from the three projects will constitute a useful toolbox for improved design and analysis of small population clinical trials. They address various challenges presented by the EMA/CHMP guideline as well as recent discussions about extrapolation. There is a need for involvement of the patients' perspective in the planning and conduct of small population clinical trials for a successful therapy evaluation.

  14. Reliability and statistical power analysis of cortical and subcortical FreeSurfer metrics in a large sample of healthy elderly.

    PubMed

    Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz

    2015-03-01

    FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.

  15. Experimental design matters for statistical analysis: how to handle blocking.

    PubMed

    Jensen, Signe M; Schaarschmidt, Frank; Onofri, Andrea; Ritz, Christian

    2018-03-01

    Nowadays, evaluation of the effects of pesticides often relies on experimental designs that involve multiple concentrations of the pesticide of interest or multiple pesticides at specific comparable concentrations and, possibly, secondary factors of interest. Unfortunately, the experimental design is often more or less neglected when analysing data. Two data examples were analysed using different modelling strategies. First, in a randomized complete block design, mean heights of maize treated with a herbicide and one of several adjuvants were compared. Second, translocation of an insecticide applied to maize as a seed treatment was evaluated using incomplete data from an unbalanced design with several layers of hierarchical sampling. Extensive simulations were carried out to further substantiate the effects of different modelling strategies. It was shown that results from suboptimal approaches (two-sample t-tests and ordinary ANOVA assuming independent observations) may be both quantitatively and qualitatively different from the results obtained using an appropriate linear mixed model. The simulations demonstrated that the different approaches may lead to differences in coverage percentages of confidence intervals and type 1 error rates, confirming that misleading conclusions can easily happen when an inappropriate statistical approach is chosen. To ensure that experimental data are summarized appropriately, avoiding misleading conclusions, the experimental design should duly be reflected in the choice of statistical approaches and models. We recommend that author guidelines should explicitly point out that authors need to indicate how the statistical analysis reflects the experimental design. © 2017 Society of Chemical Industry. © 2017 Society of Chemical Industry.

  16. Methodological issues with adaptation of clinical trial design.

    PubMed

    Hung, H M James; Wang, Sue-Jane; O'Neill, Robert T

    2006-01-01

    Adaptation of clinical trial design generates many issues that have not been resolved for practical applications, though statistical methodology has advanced greatly. This paper focuses on some methodological issues. In one type of adaptation such as sample size re-estimation, only the postulated value of a parameter for planning the trial size may be altered. In another type, the originally intended hypothesis for testing may be modified using the internal data accumulated at an interim time of the trial, such as changing the primary endpoint and dropping a treatment arm. For sample size re-estimation, we make a contrast between an adaptive test weighting the two-stage test statistics with the statistical information given by the original design and the original sample mean test with a properly corrected critical value. We point out the difficulty in planning a confirmatory trial based on the crude information generated by exploratory trials. In regards to selecting a primary endpoint, we argue that the selection process that allows switching from one endpoint to the other with the internal data of the trial is not very likely to gain a power advantage over the simple process of selecting one from the two endpoints by testing them with an equal split of alpha (Bonferroni adjustment). For dropping a treatment arm, distributing the remaining sample size of the discontinued arm to other treatment arms can substantially improve the statistical power of identifying a superior treatment arm in the design. A common difficult methodological issue is that of how to select an adaptation rule in the trial planning stage. Pre-specification of the adaptation rule is important for the practicality consideration. Changing the originally intended hypothesis for testing with the internal data generates great concerns to clinical trial researchers.

  17. Statistical Design for Biospecimen Cohort Size in Proteomics-based Biomarker Discovery and Verification Studies

    PubMed Central

    Skates, Steven J.; Gillette, Michael A.; LaBaer, Joshua; Carr, Steven A.; Anderson, N. Leigh; Liebler, Daniel C.; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L.; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R.; Rodriguez, Henry; Boja, Emily S.

    2014-01-01

    Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC), with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance, and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step towards building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research. PMID:24063748

  18. Statistical design for biospecimen cohort size in proteomics-based biomarker discovery and verification studies.

    PubMed

    Skates, Steven J; Gillette, Michael A; LaBaer, Joshua; Carr, Steven A; Anderson, Leigh; Liebler, Daniel C; Ransohoff, David; Rifai, Nader; Kondratovich, Marina; Težak, Živana; Mansfield, Elizabeth; Oberg, Ann L; Wright, Ian; Barnes, Grady; Gail, Mitchell; Mesri, Mehdi; Kinsinger, Christopher R; Rodriguez, Henry; Boja, Emily S

    2013-12-06

    Protein biomarkers are needed to deepen our understanding of cancer biology and to improve our ability to diagnose, monitor, and treat cancers. Important analytical and clinical hurdles must be overcome to allow the most promising protein biomarker candidates to advance into clinical validation studies. Although contemporary proteomics technologies support the measurement of large numbers of proteins in individual clinical specimens, sample throughput remains comparatively low. This problem is amplified in typical clinical proteomics research studies, which routinely suffer from a lack of proper experimental design, resulting in analysis of too few biospecimens to achieve adequate statistical power at each stage of a biomarker pipeline. To address this critical shortcoming, a joint workshop was held by the National Cancer Institute (NCI), National Heart, Lung, and Blood Institute (NHLBI), and American Association for Clinical Chemistry (AACC) with participation from the U.S. Food and Drug Administration (FDA). An important output from the workshop was a statistical framework for the design of biomarker discovery and verification studies. Herein, we describe the use of quantitative clinical judgments to set statistical criteria for clinical relevance and the development of an approach to calculate biospecimen sample size for proteomic studies in discovery and verification stages prior to clinical validation stage. This represents a first step toward building a consensus on quantitative criteria for statistical design of proteomics biomarker discovery and verification research.

  19. Seasonal rationalization of river water quality sampling locations: a comparative study of the modified Sanders and multivariate statistical approaches.

    PubMed

    Varekar, Vikas; Karmakar, Subhankar; Jha, Ramakar

    2016-02-01

    The design of surface water quality sampling location is a crucial decision-making process for rationalization of monitoring network. The quantity, quality, and types of available dataset (watershed characteristics and water quality data) may affect the selection of appropriate design methodology. The modified Sanders approach and multivariate statistical techniques [particularly factor analysis (FA)/principal component analysis (PCA)] are well-accepted and widely used techniques for design of sampling locations. However, their performance may vary significantly with quantity, quality, and types of available dataset. In this paper, an attempt has been made to evaluate performance of these techniques by accounting the effect of seasonal variation, under a situation of limited water quality data but extensive watershed characteristics information, as continuous and consistent river water quality data is usually difficult to obtain, whereas watershed information may be made available through application of geospatial techniques. A case study of Kali River, Western Uttar Pradesh, India, is selected for the analysis. The monitoring was carried out at 16 sampling locations. The discrete and diffuse pollution loads at different sampling sites were estimated and accounted using modified Sanders approach, whereas the monitored physical and chemical water quality parameters were utilized as inputs for FA/PCA. The designed optimum number of sampling locations for monsoon and non-monsoon seasons by modified Sanders approach are eight and seven while that for FA/PCA are eleven and nine, respectively. Less variation in the number and locations of designed sampling sites were obtained by both techniques, which shows stability of results. A geospatial analysis has also been carried out to check the significance of designed sampling location with respect to river basin characteristics and land use of the study area. Both methods are equally efficient; however, modified Sanders approach outperforms FA/PCA when limited water quality and extensive watershed information is available. The available water quality dataset is limited and FA/PCA-based approach fails to identify monitoring locations with higher variation, as these multivariate statistical approaches are data-driven. The priority/hierarchy and number of sampling sites designed by modified Sanders approach are well justified by the land use practices and observed river basin characteristics of the study area.

  20. Estimation of sample size and testing power (Part 4).

    PubMed

    Hu, Liang-ping; Bao, Xiao-lei; Guan, Xue; Zhou, Shi-guo

    2012-01-01

    Sample size estimation is necessary for any experimental or survey research. An appropriate estimation of sample size based on known information and statistical knowledge is of great significance. This article introduces methods of sample size estimation of difference test for data with the design of one factor with two levels, including sample size estimation formulas and realization based on the formulas and the POWER procedure of SAS software for quantitative data and qualitative data with the design of one factor with two levels. In addition, this article presents examples for analysis, which will play a leading role for researchers to implement the repetition principle during the research design phase.

  1. The Content of Statistical Requirements for Authors in Biomedical Research Journals

    PubMed Central

    Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang

    2016-01-01

    Background: Robust statistical designing, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above so that researchers are able to give statistical issues’ serious considerations not only at the stage of data analysis but also at the stage of methodological design. Methods: Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guideline as well as particular requirements were summarized in frequency, which were grouped into design, method of analysis, and presentation, respectively. Finally, updated statistical guidelines and particular requirements for improvement were summed up. Results: Totally, 21 of 23 press groups introduced at least one statistical guideline. More than half of press groups can update their statistical instruction for authors gradually relative to issues of new statistical reporting guidelines. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. The most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including “address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation,” and “statistical methods and the reasons.” Conclusions: Statistical requirements for authors are becoming increasingly perfected. Statistical requirements for authors remind researchers that they should make sufficient consideration not only in regards to statistical methods during the research design, but also standardized statistical reporting, which would be beneficial in providing stronger evidence and making a greater critical appraisal of evidence more accessible. PMID:27748343

  2. The Content of Statistical Requirements for Authors in Biomedical Research Journals.

    PubMed

    Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang

    2016-10-20

    Robust statistical designing, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above so that researchers are able to give statistical issues' serious considerations not only at the stage of data analysis but also at the stage of methodological design. Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guideline as well as particular requirements were summarized in frequency, which were grouped into design, method of analysis, and presentation, respectively. Finally, updated statistical guidelines and particular requirements for improvement were summed up. Totally, 21 of 23 press groups introduced at least one statistical guideline. More than half of press groups can update their statistical instruction for authors gradually relative to issues of new statistical reporting guidelines. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. The most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including "address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation," and "statistical methods and the reasons." Statistical requirements for authors are becoming increasingly perfected. Statistical requirements for authors remind researchers that they should make sufficient consideration not only in regards to statistical methods during the research design, but also standardized statistical reporting, which would be beneficial in providing stronger evidence and making a greater critical appraisal of evidence more accessible.

  3. A note on sample size calculation for mean comparisons based on noncentral t-statistics.

    PubMed

    Chow, Shein-Chung; Shao, Jun; Wang, Hansheng

    2002-11-01

    One-sample and two-sample t-tests are commonly used in analyzing data from clinical trials in comparing mean responses from two drug products. During the planning stage of a clinical study, a crucial step is the sample size calculation, i.e., the determination of the number of subjects (patients) needed to achieve a desired power (e.g., 80%) for detecting a clinically meaningful difference in the mean drug responses. Based on noncentral t-distributions, we derive some sample size calculation formulas for testing equality, testing therapeutic noninferiority/superiority, and testing therapeutic equivalence, under the popular one-sample design, two-sample parallel design, and two-sample crossover design. Useful tables are constructed and some examples are given for illustration.

  4. Two-sample statistics for testing the equality of survival functions against improper semi-parametric accelerated failure time alternatives: an application to the analysis of a breast cancer clinical trial.

    PubMed

    Broët, Philippe; Tsodikov, Alexander; De Rycke, Yann; Moreau, Thierry

    2004-06-01

    This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests.

  5. Outcome-Dependent Sampling Design and Inference for Cox’s Proportional Hazards Model

    PubMed Central

    Yu, Jichang; Liu, Yanyan; Cai, Jianwen; Sandler, Dale P.; Zhou, Haibo

    2016-01-01

    We propose a cost-effective outcome-dependent sampling design for the failure time data and develop an efficient inference procedure for data collected with this design. To account for the biased sampling scheme, we derive estimators from a weighted partial likelihood estimating equation. The proposed estimators for regression parameters are shown to be consistent and asymptotically normally distributed. A criteria that can be used to optimally implement the ODS design in practice is proposed and studied. The small sample performance of the proposed method is evaluated by simulation studies. The proposed design and inference procedure is shown to be statistically more powerful than existing alternative designs with the same sample sizes. We illustrate the proposed method with an existing real data from the Cancer Incidence and Mortality of Uranium Miners Study. PMID:28090134

  6. Developing the design of a continuous national health survey for New Zealand

    PubMed Central

    2013-01-01

    Background A continuously operating survey can yield advantages in survey management, field operations, and the provision of timely information for policymakers and researchers. We describe the key features of the sample design of the New Zealand (NZ) Health Survey, which has been conducted on a continuous basis since mid-2011, and compare to a number of other national population health surveys. Methods A number of strategies to improve the NZ Health Survey are described: implementation of a targeted dual-frame sample design for better Māori, Pacific, and Asian statistics; movement from periodic to continuous operation; use of core questions with rotating topic modules to improve flexibility in survey content; and opportunities for ongoing improvements and efficiencies, including linkage to administrative datasets. Results and discussion The use of disproportionate area sampling and a dual frame design resulted in reductions of approximately 19%, 26%, and 4% to variances of Māori, Pacific and Asian statistics respectively, but at the cost of a 17% increase to all-ethnicity variances. These were broadly in line with the survey’s priorities. Respondents provided a high degree of cooperation in the first year, with an adult response rate of 79% and consent rates for data linkage above 90%. Conclusions A combination of strategies tailored to local conditions gives the best results for national health surveys. In the NZ context, data from the NZ Census of Population and Dwellings and the Electoral Roll can be used to improve the sample design. A continuously operating survey provides both administrative and statistical advantages. PMID:24364838

  7. Design of portable ultraminiature flow cytometers for medical diagnostics

    NASA Astrophysics Data System (ADS)

    Leary, James F.

    2018-02-01

    Design of portable microfluidic flow/image cytometry devices for measurements in the field (e.g. initial medical diagnostics) requires careful design in terms of power requirements and weight to allow for realistic portability. True portability with high-throughput microfluidic systems also requires sampling systems without the need for sheath hydrodynamic focusing both to avoid the need for sheath fluid and to enable higher volumes of actual sample, rather than sheath/sample combinations. Weight/power requirements dictate use of super-bright LEDs with top-hat excitation beam architectures and very small silicon photodiodes or nanophotonic sensors that can both be powered by small batteries. Signal-to-noise characteristics can be greatly improved by appropriately pulsing the LED excitation sources and sampling and subtracting noise in between excitation pulses. Microfluidic cytometry also requires judicious use of small sample volumes and appropriate statistical sampling by microfluidic cytometry or imaging for adequate statistical significance to permit real-time (typically in less than 15 minutes) initial medical decisions for patients in the field. This is not something conventional cytometry traditionally worries about, but is very important for development of small, portable microfluidic devices with small-volume throughputs. It also provides a more reasonable alternative to conventional tubes of blood when sampling geriatric and newborn patients for whom a conventional peripheral blood draw can be problematical. Instead one or two drops of blood obtained by pin-prick should be able to provide statistically meaningful results for use in making real-time medical decisions without the need for blood fractionation, which is not realistic in the doctor's office or field.

  8. Taking a statistical approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wild, M.; Rouhani, S.

    1995-02-01

    A typical site investigation entails extensive sampling and monitoring. In the past, sampling plans have been designed on purely ad hoc bases, leading to significant expenditures and, in some cases, collection of redundant information. In many instances, sampling costs exceed the true worth of the collected data. The US Environmental Protection Agency (EPA) therefore has advocated the use of geostatistics to provide a logical framework for sampling and analysis of environmental data. Geostatistical methodology uses statistical techniques for the spatial analysis of a variety of earth-related data. The use of geostatistics was developed by the mining industry to estimate oremore » concentrations. The same procedure is effective in quantifying environmental contaminants in soils for risk assessments. Unlike classical statistical techniques, geostatistics offers procedures to incorporate the underlying spatial structure of the investigated field. Sample points spaced close together tend to be more similar than samples spaced further apart. This can guide sampling strategies and determine complex contaminant distributions. Geostatistic techniques can be used to evaluate site conditions on the basis of regular, irregular, random and even spatially biased samples. In most environmental investigations, it is desirable to concentrate sampling in areas of known or suspected contamination. The rigorous mathematical procedures of geostatistics allow for accurate estimates at unsampled locations, potentially reducing sampling requirements. The use of geostatistics serves as a decision-aiding and planning tool and can significantly reduce short-term site assessment costs, long-term sampling and monitoring needs, as well as lead to more accurate and realistic remedial design criteria.« less

  9. Spatial Statistical Models and Optimal Survey Design for Rapid Geophysical characterization of UXO Sites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    G. Ostrouchov; W.E.Doll; D.A.Wolf

    2003-07-01

    Unexploded ordnance(UXO)surveys encompass large areas, and the cost of surveying these areas can be high. Enactment of earlier protocols for sampling UXO sites have shown the shortcomings of these procedures and led to a call for development of scientifically defensible statistical procedures for survey design and analysis. This project is one of three funded by SERDP to address this need.

  10. Paradigms for adaptive statistical information designs: practical experiences and strategies.

    PubMed

    Wang, Sue-Jane; Hung, H M James; O'Neill, Robert

    2012-11-10

    In the last decade or so, interest in adaptive design clinical trials has gradually been directed towards their use in regulatory submissions by pharmaceutical drug sponsors to evaluate investigational new drugs. Methodological advances of adaptive designs are abundant in the statistical literature since the 1970s. The adaptive design paradigm has been enthusiastically perceived to increase the efficiency and to be more cost-effective than the fixed design paradigm for drug development. Much interest in adaptive designs is in those studies with two-stages, where stage 1 is exploratory and stage 2 depends upon stage 1 results, but where the data of both stages will be combined to yield statistical evidence for use as that of a pivotal registration trial. It was not until the recent release of the US Food and Drug Administration Draft Guidance for Industry on Adaptive Design Clinical Trials for Drugs and Biologics (2010) that the boundaries of flexibility for adaptive designs were specifically considered for regulatory purposes, including what are exploratory goals, and what are the goals of adequate and well-controlled (A&WC) trials (2002). The guidance carefully described these distinctions in an attempt to minimize the confusion between the goals of preliminary learning phases of drug development, which are inherently substantially uncertain, and the definitive inference-based phases of drug development. In this paper, in addition to discussing some aspects of adaptive designs in a confirmatory study setting, we underscore the value of adaptive designs when used in exploratory trials to improve planning of subsequent A&WC trials. One type of adaptation that is receiving attention is the re-estimation of the sample size during the course of the trial. We refer to this type of adaptation as an adaptive statistical information design. Specifically, a case example is used to illustrate how challenging it is to plan a confirmatory adaptive statistical information design. We highlight the substantial risk of planning the sample size for confirmatory trials when information is very uninformative and stipulate the advantages of adaptive statistical information designs for planning exploratory trials. Practical experiences and strategies as lessons learned from more recent adaptive design proposals will be discussed to pinpoint the improved utilities of adaptive design clinical trials and their potential to increase the chance of a successful drug development. Published 2012. This article is a US Government work and is in the public domain in the USA.

  11. Statistical assessment on a combined analysis of GRYN-ROMN-UCBN upland vegetation vital signs

    USGS Publications Warehouse

    Irvine, Kathryn M.; Rodhouse, Thomas J.

    2014-01-01

    As of 2013, Rocky Mountain and Upper Columbia Basin Inventory and Monitoring Networks have multiple years of vegetation data and Greater Yellowstone Network has three years of vegetation data and monitoring is ongoing in all three networks. Our primary objective is to assess whether a combined analysis of these data aimed at exploring correlations with climate and weather data is feasible. We summarize the core survey design elements across protocols and point out the major statistical challenges for a combined analysis at present. The dissimilarity in response designs between ROMN and UCBN-GRYN network protocols presents a statistical challenge that has not been resolved yet. However, the UCBN and GRYN data are compatible as they implement a similar response design; therefore, a combined analysis is feasible and will be pursued in future. When data collected by different networks are combined, the survey design describing the merged dataset is (likely) a complex survey design. A complex survey design is the result of combining datasets from different sampling designs. A complex survey design is characterized by unequal probability sampling, varying stratification, and clustering (see Lohr 2010 Chapter 7 for general overview). Statistical analysis of complex survey data requires modifications to standard methods, one of which is to include survey design weights within a statistical model. We focus on this issue for a combined analysis of upland vegetation from these networks, leaving other topics for future research. We conduct a simulation study on the possible effects of equal versus unequal probability selection of points on parameter estimates of temporal trend using available packages within the R statistical computing package. We find that, as written, using lmer or lm for trend detection in a continuous response and clm and clmm for visually estimated cover classes with “raw” GRTS design weights specified for the weight argument leads to substantially different results and/or computational instability. However, when only fixed effects are of interest, the survey package (svyglm and svyolr) may be suitable for a model-assisted analysis for trend. We provide possible directions for future research into combined analysis for ordinal and continuous vital sign indictors.

  12. [The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].

    PubMed

    Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel

    2017-01-01

    The statistical analysis can be divided in two main components: descriptive analysis and inferential analysis. An inference is to elaborate conclusions from the tests performed with the data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.

  13. An evaluation of the quality of statistical design and analysis of published medical research: results from a systematic survey of general orthopaedic journals.

    PubMed

    Parsons, Nick R; Price, Charlotte L; Hiskens, Richard; Achten, Juul; Costa, Matthew L

    2012-04-25

    The application of statistics in reported research in trauma and orthopaedic surgery has become ever more important and complex. Despite the extensive use of statistical analysis, it is still a subject which is often not conceptually well understood, resulting in clear methodological flaws and inadequate reporting in many papers. A detailed statistical survey sampled 100 representative orthopaedic papers using a validated questionnaire that assessed the quality of the trial design and statistical analysis methods. The survey found evidence of failings in study design, statistical methodology and presentation of the results. Overall, in 17% (95% confidence interval; 10-26%) of the studies investigated the conclusions were not clearly justified by the results, in 39% (30-49%) of studies a different analysis should have been undertaken and in 17% (10-26%) a different analysis could have made a difference to the overall conclusions. It is only by an improved dialogue between statistician, clinician, reviewer and journal editor that the failings in design methodology and analysis highlighted by this survey can be addressed.

  14. Map of National Aquatic Resource Surveys Sampling Locations

    EPA Pesticide Factsheets

    This map displays all of the lakes, rivers and streams, wetlands, and coastal waters sampled by the National Aquatic Resource Surveys, a collaborative EPA program that assesses the condition of the nation's waters using statistical designs.

  15. Statistical properties of mean stand biomass estimators in a LIDAR-based double sampling forest survey design.

    Treesearch

    H.E. Anderson; J. Breidenbach

    2007-01-01

    Airborne laser scanning (LIDAR) can be a valuable tool in double-sampling forest survey designs. LIDAR-derived forest structure metrics are often highly correlated with important forest inventory variables, such as mean stand biomass, and LIDAR-based synthetic regression estimators have the potential to be highly efficient compared to single-stage estimators, which...

  16. A Comparison of Single Sample and Bootstrap Methods to Assess Mediation in Cluster Randomized Trials

    ERIC Educational Resources Information Center

    Pituch, Keenan A.; Stapleton, Laura M.; Kang, Joo Youn

    2006-01-01

    A Monte Carlo study examined the statistical performance of single sample and bootstrap methods that can be used to test and form confidence interval estimates of indirect effects in two cluster randomized experimental designs. The designs were similar in that they featured random assignment of clusters to one of two treatment conditions and…

  17. Sampling design for the 1980 commercial and multifamily residential building survey

    NASA Astrophysics Data System (ADS)

    Bowen, W. M.; Olsen, A. R.; Nieves, A. L.

    1981-06-01

    The extent to which new building design practices comply with the proposed 1980 energy budget levels for commercial and multifamily residential building designs (DEB-80) can be assessed by: (1) identifying small number of building types which account for the majority of commercial buildings constructed in the U.S.A.; (2) conducting a separate survey for each building type; and (3) including only buildings designed during 1980. For each building, the design energy consumption (DEC-80) will be determined by the DOE2.1 computer program. The quantity X = (DEC-80 - DEB-80). These X quantities can then be used to compute sample statistics. Inferences about nationwide compliance with DEB-80 may then be made for each building type. Details of the population, sampling frame, stratification, sample size, and implementation of the sampling plan are provided.

  18. Survey of rural, private wells. Statistical design

    USGS Publications Warehouse

    Mehnert, Edward; Schock, Susan C.; ,

    1991-01-01

    Half of Illinois' 38 million acres were planted in corn and soybeans in 1988. On the 19 million acres planted in corn and soybeans, approximately 1 million tons of nitrogen fertilizer and 50 million pounds of pesticides were applied. Because groundwater is the water supply for over 90 percent of rural Illinois, the occurrence of agricultural chemicals in groundwater in Illinois is of interest to the agricultural community, the public, and regulatory agencies. The occurrence of agricultural chemicals in groundwater is well documented. However, the extent of this contamination still needs to be defined. This can be done by randomly sampling wells across a geographic area. Key elements of a random, water-well sampling program for regional groundwater quality include the overall statistical design of the program, definition of the sample population, selection of wells to be sampled, and analysis of survey results. These elements must be consistent with the purpose for conducting the program; otherwise, the program will not provide the desired information. The need to carefully design and conduct a sampling program becomes readily apparent when one considers the high cost of collecting and analyzing a sample. For a random sampling program conducted in Illinois, the key elements, as well as the limitations imposed by available information, are described.

  19. Combining censored and uncensored data in a U-statistic: design and sample size implications for cell therapy research.

    PubMed

    Moyé, Lemuel A; Lai, Dejian; Jing, Kaiyan; Baraniuk, Mary Sarah; Kwak, Minjung; Penn, Marc S; Wu, Colon O

    2011-01-01

    The assumptions that anchor large clinical trials are rooted in smaller, Phase II studies. In addition to specifying the target population, intervention delivery, and patient follow-up duration, physician-scientists who design these Phase II studies must select the appropriate response variables (endpoints). However, endpoint measures can be problematic. If the endpoint assesses the change in a continuous measure over time, then the occurrence of an intervening significant clinical event (SCE), such as death, can preclude the follow-up measurement. Finally, the ideal continuous endpoint measurement may be contraindicated in a fraction of the study patients, a change that requires a less precise substitution in this subset of participants.A score function that is based on the U-statistic can address these issues of 1) intercurrent SCE's and 2) response variable ascertainments that use different measurements of different precision. The scoring statistic is easy to apply, clinically relevant, and provides flexibility for the investigators' prospective design decisions. Sample size and power formulations for this statistic are provided as functions of clinical event rates and effect size estimates that are easy for investigators to identify and discuss. Examples are provided from current cardiovascular cell therapy research.

  20. Evaluating the efficiency of environmental monitoring programs

    USGS Publications Warehouse

    Levine, Carrie R.; Yanai, Ruth D.; Lampman, Gregory G.; Burns, Douglas A.; Driscoll, Charles T.; Lawrence, Gregory B.; Lynch, Jason; Schoch, Nina

    2014-01-01

    Statistical uncertainty analyses can be used to improve the efficiency of environmental monitoring, allowing sampling designs to maximize information gained relative to resources required for data collection and analysis. In this paper, we illustrate four methods of data analysis appropriate to four types of environmental monitoring designs. To analyze a long-term record from a single site, we applied a general linear model to weekly stream chemistry data at Biscuit Brook, NY, to simulate the effects of reducing sampling effort and to evaluate statistical confidence in the detection of change over time. To illustrate a detectable difference analysis, we analyzed a one-time survey of mercury concentrations in loon tissues in lakes in the Adirondack Park, NY, demonstrating the effects of sampling intensity on statistical power and the selection of a resampling interval. To illustrate a bootstrapping method, we analyzed the plot-level sampling intensity of forest inventory at the Hubbard Brook Experimental Forest, NH, to quantify the sampling regime needed to achieve a desired confidence interval. Finally, to analyze time-series data from multiple sites, we assessed the number of lakes and the number of samples per year needed to monitor change over time in Adirondack lake chemistry using a repeated-measures mixed-effects model. Evaluations of time series and synoptic long-term monitoring data can help determine whether sampling should be re-allocated in space or time to optimize the use of financial and human resources.

  1. Using variance components to estimate power in a hierarchically nested sampling design improving monitoring of larval Devils Hole pupfish

    USGS Publications Warehouse

    Dzul, Maria C.; Dixon, Philip M.; Quist, Michael C.; Dinsomore, Stephen J.; Bower, Michael R.; Wilson, Kevin P.; Gaines, D. Bailey

    2013-01-01

    We used variance components to assess allocation of sampling effort in a hierarchically nested sampling design for ongoing monitoring of early life history stages of the federally endangered Devils Hole pupfish (DHP) (Cyprinodon diabolis). Sampling design for larval DHP included surveys (5 days each spring 2007–2009), events, and plots. Each survey was comprised of three counting events, where DHP larvae on nine plots were counted plot by plot. Statistical analysis of larval abundance included three components: (1) evaluation of power from various sample size combinations, (2) comparison of power in fixed and random plot designs, and (3) assessment of yearly differences in the power of the survey. Results indicated that increasing the sample size at the lowest level of sampling represented the most realistic option to increase the survey's power, fixed plot designs had greater power than random plot designs, and the power of the larval survey varied by year. This study provides an example of how monitoring efforts may benefit from coupling variance components estimation with power analysis to assess sampling design.

  2. Two-Sample Statistics for Testing the Equality of Survival Functions Against Improper Semi-parametric Accelerated Failure Time Alternatives: An Application to the Analysis of a Breast Cancer Clinical Trial

    PubMed Central

    BROËT, PHILIPPE; TSODIKOV, ALEXANDER; DE RYCKE, YANN; MOREAU, THIERRY

    2010-01-01

    This paper presents two-sample statistics suited for testing equality of survival functions against improper semi-parametric accelerated failure time alternatives. These tests are designed for comparing either the short- or the long-term effect of a prognostic factor, or both. These statistics are obtained as partial likelihood score statistics from a time-dependent Cox model. As a consequence, the proposed tests can be very easily implemented using widely available software. A breast cancer clinical trial is presented as an example to demonstrate the utility of the proposed tests. PMID:15293627

  3. 45 CFR 308.1 - Self-assessment implementation methodology.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... selects statistically valid samples of cases from the IV-D program universe of cases; and (3) The State establishes a procedure for the design of samples and assures that no portions of the IV-D case universe are...

  4. 45 CFR 308.1 - Self-assessment implementation methodology.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... selects statistically valid samples of cases from the IV-D program universe of cases; and (3) The State establishes a procedure for the design of samples and assures that no portions of the IV-D case universe are...

  5. 45 CFR 308.1 - Self-assessment implementation methodology.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... selects statistically valid samples of cases from the IV-D program universe of cases; and (3) The State establishes a procedure for the design of samples and assures that no portions of the IV-D case universe are...

  6. 45 CFR 308.1 - Self-assessment implementation methodology.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... selects statistically valid samples of cases from the IV-D program universe of cases; and (3) The State establishes a procedure for the design of samples and assures that no portions of the IV-D case universe are...

  7. 45 CFR 308.1 - Self-assessment implementation methodology.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... selects statistically valid samples of cases from the IV-D program universe of cases; and (3) The State establishes a procedure for the design of samples and assures that no portions of the IV-D case universe are...

  8. What Are Probability Surveys used by the National Aquatic Resource Surveys?

    EPA Pesticide Factsheets

    The National Aquatic Resource Surveys (NARS) use probability-survey designs to assess the condition of the nation’s waters. In probability surveys (also known as sample-surveys or statistical surveys), sampling sites are selected randomly.

  9. Optimized Design and Analysis of Sparse-Sampling fMRI Experiments

    PubMed Central

    Perrachione, Tyler K.; Ghosh, Satrajit S.

    2013-01-01

    Sparse-sampling is an important methodological advance in functional magnetic resonance imaging (fMRI), in which silent delays are introduced between MR volume acquisitions, allowing for the presentation of auditory stimuli without contamination by acoustic scanner noise and for overt vocal responses without motion-induced artifacts in the functional time series. As such, the sparse-sampling technique has become a mainstay of principled fMRI research into the cognitive and systems neuroscience of speech, language, hearing, and music. Despite being in use for over a decade, there has been little systematic investigation of the acquisition parameters, experimental design considerations, and statistical analysis approaches that bear on the results and interpretation of sparse-sampling fMRI experiments. In this report, we examined how design and analysis choices related to the duration of repetition time (TR) delay (an acquisition parameter), stimulation rate (an experimental design parameter), and model basis function (an analysis parameter) act independently and interactively to affect the neural activation profiles observed in fMRI. First, we conducted a series of computational simulations to explore the parameter space of sparse design and analysis with respect to these variables; second, we validated the results of these simulations in a series of sparse-sampling fMRI experiments. Overall, these experiments suggest the employment of three methodological approaches that can, in many situations, substantially improve the detection of neurophysiological response in sparse fMRI: (1) Sparse analyses should utilize a physiologically informed model that incorporates hemodynamic response convolution to reduce model error. (2) The design of sparse fMRI experiments should maintain a high rate of stimulus presentation to maximize effect size. (3) TR delays of short to intermediate length can be used between acquisitions of sparse-sampled functional image volumes to increase the number of samples and improve statistical power. PMID:23616742

  10. Optimized design and analysis of sparse-sampling FMRI experiments.

    PubMed

    Perrachione, Tyler K; Ghosh, Satrajit S

    2013-01-01

    Sparse-sampling is an important methodological advance in functional magnetic resonance imaging (fMRI), in which silent delays are introduced between MR volume acquisitions, allowing for the presentation of auditory stimuli without contamination by acoustic scanner noise and for overt vocal responses without motion-induced artifacts in the functional time series. As such, the sparse-sampling technique has become a mainstay of principled fMRI research into the cognitive and systems neuroscience of speech, language, hearing, and music. Despite being in use for over a decade, there has been little systematic investigation of the acquisition parameters, experimental design considerations, and statistical analysis approaches that bear on the results and interpretation of sparse-sampling fMRI experiments. In this report, we examined how design and analysis choices related to the duration of repetition time (TR) delay (an acquisition parameter), stimulation rate (an experimental design parameter), and model basis function (an analysis parameter) act independently and interactively to affect the neural activation profiles observed in fMRI. First, we conducted a series of computational simulations to explore the parameter space of sparse design and analysis with respect to these variables; second, we validated the results of these simulations in a series of sparse-sampling fMRI experiments. Overall, these experiments suggest the employment of three methodological approaches that can, in many situations, substantially improve the detection of neurophysiological response in sparse fMRI: (1) Sparse analyses should utilize a physiologically informed model that incorporates hemodynamic response convolution to reduce model error. (2) The design of sparse fMRI experiments should maintain a high rate of stimulus presentation to maximize effect size. (3) TR delays of short to intermediate length can be used between acquisitions of sparse-sampled functional image volumes to increase the number of samples and improve statistical power.

  11. 75 FR 1415 - Submission for OMB Review: Comment Request

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-01-11

    ... Department of Labor--Bureau of Labor Statistics (BLS), Office of Management and Budget, Room 10235... Statistics. Type of Review: Revision of a currently approved collection. Title of Collection: The Consumer... sector. The data are collected from a national probability sample of households designed to represent the...

  12. Understanding Statistical Variation: A Response to Sharma

    ERIC Educational Resources Information Center

    Farmer, Jim

    2008-01-01

    In this article, the author responds to the paper "Exploring pre-service teachers' understanding of statistical variation: Implications for teaching and research" by Sashi Sharma (see EJ779107). In that paper, Sharma described a study "designed to investigate pre-service teachers' acknowledgment of variation in sampling and distribution…

  13. Statistical Analysis of Research Data | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview on the general principles of statistical analysis of research data.  The first day will feature univariate data analysis, including descriptive statistics, probability distributions, one- and two-sample inferential statistics.

  14. A STATISTICAL SURVEY OF DIOXIN-LIKE COMPOUNDS IN ...

    EPA Pesticide Factsheets

    The USEPA and the USDA completed the first statistically designed survey of the occurrence and concentration of dibenzo-p-dioxins (CDDs), dibenzofurans (CDFs), and coplanar polychlorinated biphenyls (PCBs) in the fat of beef animals raised for human consumption in the United States. Back fat was sampled from 63 carcasses at federally inspected slaughter establishments nationwide. The sample design called for sampling beef animal classes in proportion to national annual slaughter statistics. All samples were analyzed using a modification of EPA method 1613, using isotope dilution, High Resolution GC/MS to determine the rate of occurrence of 2,3,7,8-substituted CDDs/CDFs/PCBs. The method detection limits ranged from 0.05 ng/kg for TCDD to 3 ng/kg for OCDD. The results of this survey showed a mean concentration (reported as I-TEQ, lipid adjusted) in U.S. beef animals of 0.35 ng/kg and 0.89 ng/kg for CDD/CDF TEQs when either non-detects are treated as 0 value or assigned a value of 1/2 the detection limit, respectively, and 0.51 ng/kg for coplanar PCB TEQs at both non-detect equal 0 and 1/2 detection limit. journal article

  15. Design Effects and Generalized Variance Functions for the 1990-91 Schools and Staffing Survey (SASS). Volume II. Technical Report.

    ERIC Educational Resources Information Center

    Salvucci, Sameena; And Others

    This technical report provides the results of a study on the calculation and use of generalized variance functions (GVFs) and design effects for the 1990-91 Schools and Staffing Survey (SASS). The SASS is a periodic integrated system of sample surveys conducted by the National Center for Education Statistics (NCES) that produces sampling variances…

  16. [Practical aspects regarding sample size in clinical research].

    PubMed

    Vega Ramos, B; Peraza Yanes, O; Herrera Correa, G; Saldívar Toraya, S

    1996-01-01

    The knowledge of the right sample size let us to be sure if the published results in medical papers had a suitable design and a proper conclusion according to the statistics analysis. To estimate the sample size we must consider the type I error, type II error, variance, the size of the effect, significance and power of the test. To decide what kind of mathematics formula will be used, we must define what kind of study we have, it means if its a prevalence study, a means values one or a comparative one. In this paper we explain some basic topics of statistics and we describe four simple samples of estimation of sample size.

  17. On two-sample McNemar test.

    PubMed

    Xiang, Jim X

    2016-01-01

    Measuring a change in the existence of disease symptoms before and after a treatment is examined for statistical significance by means of the McNemar test. When comparing two treatments, Feuer and Kessler (1989) proposed a two-sample McNemar test. In this article, we show that this test usually inflates the type I error in the hypothesis testing, and propose a new two-sample McNemar test that is superior in terms of preserving type I error. We also make the connection between the two-sample McNemar test and the test statistic for the equal residual effects in a 2 × 2 crossover design. The limitations of the two-sample McNemar test are also discussed.

  18. Area estimation using multiyear designs and partial crop identification

    NASA Technical Reports Server (NTRS)

    Sielken, R. L., Jr.

    1984-01-01

    Statistical procedures were developed for large area assessments using both satellite and conventional data. Crop acreages, other ground cover indices, and measures of change were the principal characteristics of interest. These characteristics are capable of being estimated from samples collected possibly from several sources at varying times, with different levels of identification. Multiyear analysis techniques were extended to include partially identified samples; the best current year sampling design corresponding to a given sampling history was determined; weights reflecting the precision or confidence in each observation were identified and utilized, and the variation in estimates incorporating partially identified samples were quantified.

  19. Public and patient involvement in quantitative health research: A statistical perspective.

    PubMed

    Hannigan, Ailish

    2018-06-19

    The majority of studies included in recent reviews of impact for public and patient involvement (PPI) in health research had a qualitative design. PPI in solely quantitative designs is underexplored, particularly its impact on statistical analysis. Statisticians in practice have a long history of working in both consultative (indirect) and collaborative (direct) roles in health research, yet their perspective on PPI in quantitative health research has never been explicitly examined. To explore the potential and challenges of PPI from a statistical perspective at distinct stages of quantitative research, that is sampling, measurement and statistical analysis, distinguishing between indirect and direct PPI. Statistical analysis is underpinned by having a representative sample, and a collaborative or direct approach to PPI may help achieve that by supporting access to and increasing participation of under-represented groups in the population. Acknowledging and valuing the role of lay knowledge of the context in statistical analysis and in deciding what variables to measure may support collective learning and advance scientific understanding, as evidenced by the use of participatory modelling in other disciplines. A recurring issue for quantitative researchers, which reflects quantitative sampling methods, is the selection and required number of PPI contributors, and this requires further methodological development. Direct approaches to PPI in quantitative health research may potentially increase its impact, but the facilitation and partnership skills required may require further training for all stakeholders, including statisticians. © 2018 The Authors Health Expectations published by John Wiley & Sons Ltd.

  20. Random-effects linear modeling and sample size tables for two special crossover designs of average bioequivalence studies: the four-period, two-sequence, two-formulation and six-period, three-sequence, three-formulation designs.

    PubMed

    Diaz, Francisco J; Berg, Michel J; Krebill, Ron; Welty, Timothy; Gidal, Barry E; Alloway, Rita; Privitera, Michael

    2013-12-01

    Due to concern and debate in the epilepsy medical community and to the current interest of the US Food and Drug Administration (FDA) in revising approaches to the approval of generic drugs, the FDA is currently supporting ongoing bioequivalence studies of antiepileptic drugs, the EQUIGEN studies. During the design of these crossover studies, the researchers could not find commercial or non-commercial statistical software that quickly allowed computation of sample sizes for their designs, particularly software implementing the FDA requirement of using random-effects linear models for the analyses of bioequivalence studies. This article presents tables for sample-size evaluations of average bioequivalence studies based on the two crossover designs used in the EQUIGEN studies: the four-period, two-sequence, two-formulation design, and the six-period, three-sequence, three-formulation design. Sample-size computations assume that random-effects linear models are used in bioequivalence analyses with crossover designs. Random-effects linear models have been traditionally viewed by many pharmacologists and clinical researchers as just mathematical devices to analyze repeated-measures data. In contrast, a modern view of these models attributes an important mathematical role in theoretical formulations in personalized medicine to them, because these models not only have parameters that represent average patients, but also have parameters that represent individual patients. Moreover, the notation and language of random-effects linear models have evolved over the years. Thus, another goal of this article is to provide a presentation of the statistical modeling of data from bioequivalence studies that highlights the modern view of these models, with special emphasis on power analyses and sample-size computations.

  1. Trajectory Design for a Single-String Impactor Concept

    NASA Technical Reports Server (NTRS)

    Dono Perez, Andres; Burton, Roland; Stupl, Jan; Mauro, David

    2017-01-01

    This paper introduces a trajectory design for a secondary spacecraft concept to augment science return in interplanetary missions. The concept consist of a single-string probe with a kinetic impactor on board that generates an artificial plume to perform in-situ sampling. The trajectory design was applied to a particular case study that samples ejecta particles from the Jovian moon Europa. Results were validated using statistical analysis. Details regarding the navigation, targeting and disposal challenges related to this concept are presented herein.

  2. The international growth standard for preadolescent and adolescent children: statistical considerations.

    PubMed

    Cole, T J

    2006-12-01

    This article discusses statistical considerations for the design of a new study intended to provide an International Growth Standard for Preadolescent and Adolescent Children, including issues such as cross-sectional, longitudinal, and mixed designs; sample-size derivation for the number of populations and number of children per population; modeling of growth centiles of height, weight, and other measurements; and modeling of the adolescent growth spurt. The conclusions are that a mixed longitudinal design will provide information on both growth distance and velocity; samples of children from 5 to 10 sites should be suitable for an international standard (based on political rather than statistical arguments); the samples should be broadly uniform across age but oversampled during puberty, and should include data into adulthood. The LMS method is recommended for constructing measurement centiles, and parametric or semiparametric approaches are available to estimate the timing of the adolescent growth spurt in individuals. If the new standard is to be grafted onto the 2006 World Health Organization (WHO) reference, caution is needed at the join point of 5 years, where children from the new standard are likely to be appreciably more obese than those from the WHO reference, due to the rising trends in obesity and the time gap in data collection between the two surveys.

  3. [Evaluation of the quality of Anales Españoles de Pediatría versus Medicina Clínica].

    PubMed

    Bonillo Perales, A

    2002-08-01

    To compare the scientific methodology and quality of articles published in Anales Españoles de Pediatría and Medicina Clínica. A stratified and randomized selection of 40 original articles published in 2001 in Anales Españoles de Pediatría and Medicina Clínica was made. Methodological errors in the critical analysis of original articles (21 items), epidemiological design, sample size, statistical complexity and levels of scientific evidence in both journals were compared using the chi-squared and/or Student's t-test. No differences were found between Anales Españoles de Pediatría and Medicina Clínica in the critical evaluation of original articles (p > 0.2). In original articles published in Anales Españoles de Pediatría, the designs were of lower scientific evidence (a lower proportion of clinical trials, cohort and case-control studies) (17.5 vs 42.5 %, p 0.05), sample sizes were smaller (p 0.003) and there was less statistical complexity in the results section (p 0.03). To improve the scientific quality of Anales Españoles de Pediatría, improved study designs, larger sample sizes and greater statistical complexity are required in its articles.

  4. Assessment of statistical uncertainty in the quantitative analysis of solid samples in motion using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Cabalín, L. M.; González, A.; Ruiz, J.; Laserna, J. J.

    2010-08-01

    Statistical uncertainty in the quantitative analysis of solid samples in motion by laser-induced breakdown spectroscopy (LIBS) has been assessed. For this purpose, a LIBS demonstrator was designed and constructed in our laboratory. The LIBS system consisted of a laboratory-scale conveyor belt, a compact optical module and a Nd:YAG laser operating at 532 nm. The speed of the conveyor belt was variable and could be adjusted up to a maximum speed of 2 m s - 1 . Statistical uncertainty in the analytical measurements was estimated in terms of precision (reproducibility and repeatability) and accuracy. The results obtained by LIBS on shredded scrap samples under real conditions have demonstrated that the analytical precision and accuracy of LIBS is dependent on the sample geometry, position on the conveyor belt and surface cleanliness. Flat, relatively clean scrap samples exhibited acceptable reproducibility and repeatability; by contrast, samples with an irregular shape or a dirty surface exhibited a poor relative standard deviation.

  5. Merging National Forest and National Forest Health Inventories to Obtain an Integrated Forest Resource Inventory – Experiences from Bavaria, Slovenia and Sweden

    PubMed Central

    Kovač, Marko; Bauer, Arthur; Ståhl, Göran

    2014-01-01

    Backgrounds, Material and Methods To meet the demands of sustainable forest management and international commitments, European nations have designed a variety of forest-monitoring systems for specific needs. While the majority of countries are committed to independent, single-purpose inventorying, a minority of countries have merged their single-purpose forest inventory systems into integrated forest resource inventories. The statistical efficiencies of the Bavarian, Slovene and Swedish integrated forest resource inventory designs are investigated with the various statistical parameters of the variables of growing stock volume, shares of damaged trees, and deadwood volume. The parameters are derived by using the estimators for the given inventory designs. The required sample sizes are derived via the general formula for non-stratified independent samples and via statistical power analyses. The cost effectiveness of the designs is compared via two simple cost effectiveness ratios. Results In terms of precision, the most illustrative parameters of the variables are relative standard errors; their values range between 1% and 3% if the variables’ variations are low (s%<80%) and are higher in the case of higher variations. A comparison of the actual and required sample sizes shows that the actual sample sizes were deliberately set high to provide precise estimates for the majority of variables and strata. In turn, the successive inventories are statistically efficient, because they allow detecting the mean changes of variables with powers higher than 90%; the highest precision is attained for the changes of growing stock volume and the lowest for the changes of the shares of damaged trees. Two indicators of cost effectiveness also show that the time input spent for measuring one variable decreases with the complexity of inventories. Conclusion There is an increasing need for credible information on forest resources to be used for decision making and national and international policy making. Such information can be cost-efficiently provided through integrated forest resource inventories. PMID:24941120

  6. Statistical power as a function of Cronbach alpha of instrument questionnaire items.

    PubMed

    Heo, Moonseong; Kim, Namhee; Faith, Myles S

    2015-10-14

    In countless number of clinical trials, measurements of outcomes rely on instrument questionnaire items which however often suffer measurement error problems which in turn affect statistical power of study designs. The Cronbach alpha or coefficient alpha, here denoted by C(α), can be used as a measure of internal consistency of parallel instrument items that are developed to measure a target unidimensional outcome construct. Scale score for the target construct is often represented by the sum of the item scores. However, power functions based on C(α) have been lacking for various study designs. We formulate a statistical model for parallel items to derive power functions as a function of C(α) under several study designs. To this end, we assume fixed true score variance assumption as opposed to usual fixed total variance assumption. That assumption is critical and practically relevant to show that smaller measurement errors are inversely associated with higher inter-item correlations, and thus that greater C(α) is associated with greater statistical power. We compare the derived theoretical statistical power with empirical power obtained through Monte Carlo simulations for the following comparisons: one-sample comparison of pre- and post-treatment mean differences, two-sample comparison of pre-post mean differences between groups, and two-sample comparison of mean differences between groups. It is shown that C(α) is the same as a test-retest correlation of the scale scores of parallel items, which enables testing significance of C(α). Closed-form power functions and samples size determination formulas are derived in terms of C(α), for all of the aforementioned comparisons. Power functions are shown to be an increasing function of C(α), regardless of comparison of interest. The derived power functions are well validated by simulation studies that show that the magnitudes of theoretical power are virtually identical to those of the empirical power. Regardless of research designs or settings, in order to increase statistical power, development and use of instruments with greater C(α), or equivalently with greater inter-item correlations, is crucial for trials that intend to use questionnaire items for measuring research outcomes. Further development of the power functions for binary or ordinal item scores and under more general item correlation strutures reflecting more real world situations would be a valuable future study.

  7. The Skillings-Mack test (Friedman test when there are missing data).

    PubMed

    Chatfield, Mark; Mander, Adrian

    2009-04-01

    The Skillings-Mack statistic (Skillings and Mack, 1981, Technometrics 23: 171-177) is a general Friedman-type statistic that can be used in almost any block design with an arbitrary missing-data structure. The missing data can be either missing by design, for example, an incomplete block design, or missing completely at random. The Skillings-Mack test is equivalent to the Friedman test when there are no missing data in a balanced complete block design, and the Skillings-Mack test is equivalent to the test suggested in Durbin (1951, British Journal of Psychology, Statistical Section 4: 85-90) for a balanced incomplete block design. The Friedman test was implemented in Stata by Goldstein (1991, Stata Technical Bulletin 3: 26-27) and further developed in Goldstein (2005, Stata Journal 5: 285). This article introduces the skilmack command, which performs the Skillings-Mack test.The skilmack command is also useful when there are many ties or equal ranks (N.B. the Friedman statistic compared with the chi(2) distribution will give a conservative result), as well as for small samples; appropriate results can be obtained by simulating the distribution of the test statistic under the null hypothesis.

  8. Designing clinical trials to test disease-modifying agents: application to the treatment trials of Alzheimer's disease.

    PubMed

    Xiong, Chengjie; van Belle, Gerald; Miller, J Philip; Morris, John C

    2011-02-01

    Therapeutic trials of disease-modifying agents on Alzheimer's disease (AD) require novel designs and analyses involving switch of treatments for at least a portion of subjects enrolled. Randomized start and randomized withdrawal designs are two examples of such designs. Crucial design parameters such as sample size and the time of treatment switch are important to understand in designing such clinical trials. The purpose of this article is to provide methods to determine sample sizes and time of treatment switch as well as optimum statistical tests of treatment efficacy for clinical trials of disease-modifying agents on AD. A general linear mixed effects model is proposed to test the disease-modifying efficacy of novel therapeutic agents on AD. This model links the longitudinal growth from both the placebo arm and the treatment arm at the time of treatment switch for these in the delayed treatment arm or early withdrawal arm and incorporates the potential correlation on the rate of cognitive change before and after the treatment switch. Sample sizes and the optimum time for treatment switch of such trials as well as optimum test statistic for the treatment efficacy are determined according to the model. Assuming an evenly spaced longitudinal design over a fixed duration, the optimum treatment switching time in a randomized start or a randomized withdrawal trial is half way through the trial. With the optimum test statistic for the treatment efficacy and over a wide spectrum of model parameters, the optimum sample size allocations are fairly close to the simplest design with a sample size ratio of 1:1:1 among the treatment arm, the delayed treatment or early withdrawal arm, and the placebo arm. The application of the proposed methodology to AD provides evidence that much larger sample sizes are required to adequately power disease-modifying trials when compared with those for symptomatic agents, even when the treatment switch time and efficacy test are optimally chosen. The proposed method assumes that the only and immediate effect of treatment switch is on the rate of cognitive change. Crucial design parameters for the clinical trials of disease-modifying agents on AD can be optimally chosen. Government and industry officials as well as academia researchers should consider the optimum use of the clinical trials design for disease-modifying agents on AD in their effort to search for the treatments with the potential to modify the underlying pathophysiology of AD.

  9. A re-evaluation of a case-control model with contaminated controls for resource selection studies

    Treesearch

    Christopher T. Rota; Joshua J. Millspaugh; Dylan C. Kesler; Chad P. Lehman; Mark A. Rumble; Catherine M. B. Jachowski

    2013-01-01

    A common sampling design in resource selection studies involves measuring resource attributes at sample units used by an animal and at sample units considered available for use. Few models can estimate the absolute probability of using a sample unit from such data, but such approaches are generally preferred over statistical methods that estimate a relative probability...

  10. Sampling designs for HIV molecular epidemiology with application to Honduras.

    PubMed

    Shepherd, Bryan E; Rossini, Anthony J; Soto, Ramon Jeremias; De Rivera, Ivette Lorenzana; Mullins, James I

    2005-11-01

    Proper sampling is essential to characterize the molecular epidemiology of human immunodeficiency virus (HIV). HIV sampling frames are difficult to identify, so most studies use convenience samples. We discuss statistically valid and feasible sampling techniques that overcome some of the potential for bias due to convenience sampling and ensure better representation of the study population. We employ a sampling design called stratified cluster sampling. This first divides the population into geographical and/or social strata. Within each stratum, a population of clusters is chosen from groups, locations, or facilities where HIV-positive individuals might be found. Some clusters are randomly selected within strata and individuals are randomly selected within clusters. Variation and cost help determine the number of clusters and the number of individuals within clusters that are to be sampled. We illustrate the approach through a study designed to survey the heterogeneity of subtype B strains in Honduras.

  11. Using R to Simulate Permutation Distributions for Some Elementary Experimental Designs

    ERIC Educational Resources Information Center

    Eudey, T. Lynn; Kerr, Joshua D.; Trumbo, Bruce E.

    2010-01-01

    Null distributions of permutation tests for two-sample, paired, and block designs are simulated using the R statistical programming language. For each design and type of data, permutation tests are compared with standard normal-theory and nonparametric tests. These examples (often using real data) provide for classroom discussion use of metrics…

  12. Basic School Teachers' Perceptions about Curriculum Design in Ghana

    ERIC Educational Resources Information Center

    Abudu, Amadu Musah; Mensah, Mary Afi

    2016-01-01

    This study focused on teachers' perceptions about curriculum design and barriers to their participation. The sample size was 130 teachers who responded to a questionnaire. The analyses made use of descriptive statistics and descriptions. The study found that the level of teachers' participation in curriculum design is low. The results further…

  13. ADAPTIVE MATCHING IN RANDOMIZED TRIALS AND OBSERVATIONAL STUDIES

    PubMed Central

    van der Laan, Mark J.; Balzer, Laura B.; Petersen, Maya L.

    2014-01-01

    SUMMARY In many randomized and observational studies the allocation of treatment among a sample of n independent and identically distributed units is a function of the covariates of all sampled units. As a result, the treatment labels among the units are possibly dependent, complicating estimation and posing challenges for statistical inference. For example, cluster randomized trials frequently sample communities from some target population, construct matched pairs of communities from those included in the sample based on some metric of similarity in baseline community characteristics, and then randomly allocate a treatment and a control intervention within each matched pair. In this case, the observed data can neither be represented as the realization of n independent random variables, nor, contrary to current practice, as the realization of n/2 independent random variables (treating the matched pair as the independent sampling unit). In this paper we study estimation of the average causal effect of a treatment under experimental designs in which treatment allocation potentially depends on the pre-intervention covariates of all units included in the sample. We define efficient targeted minimum loss based estimators for this general design, present a theorem that establishes the desired asymptotic normality of these estimators and allows for asymptotically valid statistical inference, and discuss implementation of these estimators. We further investigate the relative asymptotic efficiency of this design compared with a design in which unit-specific treatment assignment depends only on the units’ covariates. Our findings have practical implications for the optimal design and analysis of pair matched cluster randomized trials, as well as for observational studies in which treatment decisions may depend on characteristics of the entire sample. PMID:25097298

  14. Informal Statistics Help Desk

    NASA Technical Reports Server (NTRS)

    Young, M.; Koslovsky, M.; Schaefer, Caroline M.; Feiveson, A. H.

    2017-01-01

    Back by popular demand, the JSC Biostatistics Laboratory and LSAH statisticians are offering an opportunity to discuss your statistical challenges and needs. Take the opportunity to meet the individuals offering expert statistical support to the JSC community. Join us for an informal conversation about any questions you may have encountered with issues of experimental design, analysis, or data visualization. Get answers to common questions about sample size, repeated measures, statistical assumptions, missing data, multiple testing, time-to-event data, and when to trust the results of your analyses.

  15. Selecting the optimum plot size for a California design-based stream and wetland mapping program.

    PubMed

    Lackey, Leila G; Stein, Eric D

    2014-04-01

    Accurate estimates of the extent and distribution of wetlands and streams are the foundation of wetland monitoring, management, restoration, and regulatory programs. Traditionally, these estimates have relied on comprehensive mapping. However, this approach is prohibitively resource-intensive over large areas, making it both impractical and statistically unreliable. Probabilistic (design-based) approaches to evaluating status and trends provide a more cost-effective alternative because, compared with comprehensive mapping, overall extent is inferred from mapping a statistically representative, randomly selected subset of the target area. In this type of design, the size of sample plots has a significant impact on program costs and on statistical precision and accuracy; however, no consensus exists on the appropriate plot size for remote monitoring of stream and wetland extent. This study utilized simulated sampling to assess the performance of four plot sizes (1, 4, 9, and 16 km(2)) for three geographic regions of California. Simulation results showed smaller plot sizes (1 and 4 km(2)) were most efficient for achieving desired levels of statistical accuracy and precision. However, larger plot sizes were more likely to contain rare and spatially limited wetland subtypes. Balancing these considerations led to selection of 4 km(2) for the California status and trends program.

  16. Sampling design considerations for demographic studies: a case of colonial seabirds

    USGS Publications Warehouse

    Kendall, William L.; Converse, Sarah J.; Doherty, Paul F.; Naughton, Maura B.; Anders, Angela; Hines, James E.; Flint, Elizabeth

    2009-01-01

    For the purposes of making many informed conservation decisions, the main goal for data collection is to assess population status and allow prediction of the consequences of candidate management actions. Reducing the bias and variance of estimates of population parameters reduces uncertainty in population status and projections, thereby reducing the overall uncertainty under which a population manager must make a decision. In capture-recapture studies, imperfect detection of individuals, unobservable life-history states, local movement outside study areas, and tag loss can cause bias or precision problems with estimates of population parameters. Furthermore, excessive disturbance to individuals during capture?recapture sampling may be of concern because disturbance may have demographic consequences. We address these problems using as an example a monitoring program for Black-footed Albatross (Phoebastria nigripes) and Laysan Albatross (Phoebastria immutabilis) nesting populations in the northwestern Hawaiian Islands. To mitigate these estimation problems, we describe a synergistic combination of sampling design and modeling approaches. Solutions include multiple capture periods per season and multistate, robust design statistical models, dead recoveries and incidental observations, telemetry and data loggers, buffer areas around study plots to neutralize the effect of local movements outside study plots, and double banding and statistical models that account for band loss. We also present a variation on the robust capture?recapture design and a corresponding statistical model that minimizes disturbance to individuals. For the albatross case study, this less invasive robust design was more time efficient and, when used in combination with a traditional robust design, reduced the standard error of detection probability by 14% with only two hours of additional effort in the field. These field techniques and associated modeling approaches are applicable to studies of most taxa being marked and in some cases have individually been applied to studies of birds, fish, herpetofauna, and mammals.

  17. Exploring Tree Age & Diameter to Illustrate Sample Design & Inference in Observational Ecology

    ERIC Educational Resources Information Center

    Casady, Grant M.

    2015-01-01

    Undergraduate biology labs often explore the techniques of data collection but neglect the statistical framework necessary to express findings. Students can be confused about how to use their statistical knowledge to address specific biological questions. Growth in the area of observational ecology requires that students gain experience in…

  18. Variance estimates and confidence intervals for the Kappa measure of classification accuracy

    Treesearch

    M. A. Kalkhan; R. M. Reich; R. L. Czaplewski

    1997-01-01

    The Kappa statistic is frequently used to characterize the results of an accuracy assessment used to evaluate land use and land cover classifications obtained by remotely sensed data. This statistic allows comparisons of alternative sampling designs, classification algorithms, photo-interpreters, and so forth. In order to make these comparisons, it is...

  19. An On-Line Virtual Environment for Teaching Statistical Sampling and Analysis

    ERIC Educational Resources Information Center

    Marsh, Michael T.

    2009-01-01

    Regardless of the related discipline, students in statistics courses invariably have difficulty understanding the connection between the numerical values calculated for end-of-the-chapter exercises and their usefulness in decision making. This disconnect is, in part, due to the lack of time and opportunity to actually design the experiments and…

  20. US Geological Survey nutrient preservation experiment : experimental design, statistical analysis, and interpretation of analytical results

    USGS Publications Warehouse

    Patton, Charles J.; Gilroy, Edward J.

    1999-01-01

    Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.

  1. Analysis of chemical warfare agents. II. Use of thiols and statistical experimental design for the trace level determination of vesicant compounds in air samples.

    PubMed

    Muir, Bob; Quick, Suzanne; Slater, Ben J; Cooper, David B; Moran, Mary C; Timperley, Christopher M; Carrick, Wendy A; Burnell, Christopher K

    2005-03-18

    Thermal desorption with gas chromatography-mass spectrometry (TD-GC-MS) remains the technique of choice for analysis of trace concentrations of analytes in air samples. This paper describes the development and application of a method for analysing the vesicant compounds sulfur mustard and Lewisites I-III. 3,4-Dimercaptotoluene and butanethiol were used to spike sorbent tubes and vesicant vapours sampled; Lewisite I and II reacted with the thiols while sulfur mustard and Lewisite III did not. Statistical experimental design was used to optimise thermal desorption parameters and the optimum method used to determine vesicant compounds in headspace samples taken from a decontamination trial. 3,4-Dimercaptotoluene reacted with Lewisites I and II to give a common derivative with a limit of detection (LOD) of 260 microg m(-3), while the butanethiol gave distinct derivatives with limits of detection around 30 microg m(-3).

  2. Proteomic Workflows for Biomarker Identification Using Mass Spectrometry — Technical and Statistical Considerations during Initial Discovery

    PubMed Central

    Orton, Dennis J.; Doucette, Alan A.

    2013-01-01

    Identification of biomarkers capable of differentiating between pathophysiological states of an individual is a laudable goal in the field of proteomics. Protein biomarker discovery generally employs high throughput sample characterization by mass spectrometry (MS), being capable of identifying and quantifying thousands of proteins per sample. While MS-based technologies have rapidly matured, the identification of truly informative biomarkers remains elusive, with only a handful of clinically applicable tests stemming from proteomic workflows. This underlying lack of progress is attributed in large part to erroneous experimental design, biased sample handling, as well as improper statistical analysis of the resulting data. This review will discuss in detail the importance of experimental design and provide some insight into the overall workflow required for biomarker identification experiments. Proper balance between the degree of biological vs. technical replication is required for confident biomarker identification. PMID:28250400

  3. Statistical inference for the additive hazards model under outcome-dependent sampling.

    PubMed

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P; Zhou, Haibo

    2015-09-01

    Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer.

  4. Statistical inference for the additive hazards model under outcome-dependent sampling

    PubMed Central

    Yu, Jichang; Liu, Yanyan; Sandler, Dale P.; Zhou, Haibo

    2015-01-01

    Cost-effective study design and proper inference procedures for data from such designs are always of particular interests to study investigators. In this article, we propose a biased sampling scheme, an outcome-dependent sampling (ODS) design for survival data with right censoring under the additive hazards model. We develop a weighted pseudo-score estimator for the regression parameters for the proposed design and derive the asymptotic properties of the proposed estimator. We also provide some suggestions for using the proposed method by evaluating the relative efficiency of the proposed method against simple random sampling design and derive the optimal allocation of the subsamples for the proposed design. Simulation studies show that the proposed ODS design is more powerful than other existing designs and the proposed estimator is more efficient than other estimators. We apply our method to analyze a cancer study conducted at NIEHS, the Cancer Incidence and Mortality of Uranium Miners Study, to study the risk of radon exposure to cancer. PMID:26379363

  5. What I See Is Not Quite the Way It Really Is: Students' Emergent Reasoning about Sampling Variability

    ERIC Educational Resources Information Center

    Pfannkuch, Maxine; Arnold, Pip; Wild, Chris J.

    2015-01-01

    Currently, instruction pays little attention to the development of students' sampling variability reasoning in relation to statistical inference. In this paper, we briefly discuss the especially designed sampling variability learning experiences students aged about 15 engaged in as part of a research project. We examine assessment and…

  6. Statistical sampling methods for soils monitoring

    Treesearch

    Ann M. Abbott

    2010-01-01

    Development of the best sampling design to answer a research question should be an interactive venture between the land manager or researcher and statisticians, and is the result of answering various questions. A series of questions that can be asked to guide the researcher in making decisions that will arrive at an effective sampling plan are described, and a case...

  7. National Sample Survey of Registered Nurses II. Status of Nurses: November 1980.

    ERIC Educational Resources Information Center

    Bentley, Barbara S.; And Others

    This report provides data describing the nursing population as determined by the second national sample survey of registered nurses. A brief introduction is followed by a chapter that presents an overview of the survey methodology, including details on the sampling design, the response rate, and the statistical reliability. Chapter 3 provides a…

  8. In vivo Comet assay--statistical analysis and power calculations of mice testicular cells.

    PubMed

    Hansen, Merete Kjær; Sharma, Anoop Kumar; Dybdahl, Marianne; Boberg, Julie; Kulahci, Murat

    2014-11-01

    The in vivo Comet assay is a sensitive method for evaluating DNA damage. A recurrent concern is how to analyze the data appropriately and efficiently. A popular approach is to summarize the raw data into a summary statistic prior to the statistical analysis. However, consensus on which summary statistic to use has yet to be reached. Another important consideration concerns the assessment of proper sample sizes in the design of Comet assay studies. This study aims to identify a statistic suitably summarizing the % tail DNA of mice testicular samples in Comet assay studies. A second aim is to provide curves for this statistic outlining the number of animals and gels to use. The current study was based on 11 compounds administered via oral gavage in three doses to male mice: CAS no. 110-26-9, CAS no. 512-56-1, CAS no. 111873-33-7, CAS no. 79-94-7, CAS no. 115-96-8, CAS no. 598-55-0, CAS no. 636-97-5, CAS no. 85-28-9, CAS no. 13674-87-8, CAS no. 43100-38-5 and CAS no. 60965-26-6. Testicular cells were examined using the alkaline version of the Comet assay and the DNA damage was quantified as % tail DNA using a fully automatic scoring system. From the raw data 23 summary statistics were examined. A linear mixed-effects model was fitted to the summarized data and the estimated variance components were used to generate power curves as a function of sample size. The statistic that most appropriately summarized the within-sample distributions was the median of the log-transformed data, as it most consistently conformed to the assumptions of the statistical model. Power curves for 1.5-, 2-, and 2.5-fold changes of the highest dose group compared to the control group when 50 and 100 cells were scored per gel are provided to aid in the design of future Comet assay studies on testicular cells. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Review of Well Operator Files for Hydraulically Fractured Oil and Gas Production Wells: Well Design and Construction Fact Sheet

    EPA Pesticide Factsheets

    EPA reviewed a statistically representative sample of oil and gas production wells reported by nine service companies to help understand the role of well design and construction practices preventing pathways for subsurface fluid movement.

  10. [Design of standard voice sample text for subjective auditory perceptual evaluation of voice disorders].

    PubMed

    Li, Jin-rang; Sun, Yan-yan; Xu, Wen

    2010-09-01

    To design a speech voice sample text with all phonemes in Mandarin for subjective auditory perceptual evaluation of voice disorders. The principles for design of a speech voice sample text are: The short text should include the 21 initials and 39 finals, this may cover all the phonemes in Mandarin. Also, the short text should have some meanings. A short text was made out. It had 155 Chinese words, and included 21 initials and 38 finals (the final, ê, was not included because it was rarely used in Mandarin). Also, the text covered 17 light tones and one "Erhua". The constituent ratios of the initials and finals presented in this short text were statistically similar as those in Mandarin according to the method of similarity of the sample and population (r = 0.742, P < 0.001 and r = 0.844, P < 0.001, respectively). The constituent ratios of the tones presented in this short text were statistically not similar as those in Mandarin (r = 0.731, P > 0.05). A speech voice sample text with all phonemes in Mandarin was made out. The constituent ratios of the initials and finals presented in this short text are similar as those in Mandarin. Its value for subjective auditory perceptual evaluation of voice disorders need further study.

  11. Two-stage sequential sampling: A neighborhood-free adaptive sampling procedure

    USGS Publications Warehouse

    Salehi, M.; Smith, D.R.

    2005-01-01

    Designing an efficient sampling scheme for a rare and clustered population is a challenging area of research. Adaptive cluster sampling, which has been shown to be viable for such a population, is based on sampling a neighborhood of units around a unit that meets a specified condition. However, the edge units produced by sampling neighborhoods have proven to limit the efficiency and applicability of adaptive cluster sampling. We propose a sampling design that is adaptive in the sense that the final sample depends on observed values, but it avoids the use of neighborhoods and the sampling of edge units. Unbiased estimators of population total and its variance are derived using Murthy's estimator. The modified two-stage sampling design is easy to implement and can be applied to a wider range of populations than adaptive cluster sampling. We evaluate the proposed sampling design by simulating sampling of two real biological populations and an artificial population for which the variable of interest took the value either 0 or 1 (e.g., indicating presence and absence of a rare event). We show that the proposed sampling design is more efficient than conventional sampling in nearly all cases. The approach used to derive estimators (Murthy's estimator) opens the door for unbiased estimators to be found for similar sequential sampling designs. ?? 2005 American Statistical Association and the International Biometric Society.

  12. Single-arm phase II trial design under parametric cure models.

    PubMed

    Wu, Jianrong

    2015-01-01

    The current practice of designing single-arm phase II survival trials is limited under the exponential model. Trial design under the exponential model may not be appropriate when a portion of patients are cured. There is no literature available for designing single-arm phase II trials under the parametric cure model. In this paper, a test statistic is proposed, and a sample size formula is derived for designing single-arm phase II trials under a class of parametric cure models. Extensive simulations showed that the proposed test and sample size formula perform very well under different scenarios. Copyright © 2015 John Wiley & Sons, Ltd.

  13. 1979 Reserve Force Studies Surveys: Survey Design, Sample Design and Administrative Procedures,

    DTIC Science & Technology

    1981-08-01

    three factors: the need for a statistically significant number of usable questionnaires from different groups within the random sampls and from...Because of the multipurpose nature of these surveys and the large number of questions needed to fully address some of the topics covered, we...varies. Collection of data at the unit level is needed to accurately estimate actual reserve compensation and benefits and their possible role in both

  14. A generalized concept for cost-effective structural design. [Statistical Decision Theory applied to aerospace systems

    NASA Technical Reports Server (NTRS)

    Thomas, J. M.; Hawk, J. D.

    1975-01-01

    A generalized concept for cost-effective structural design is introduced. It is assumed that decisions affecting the cost effectiveness of aerospace structures fall into three basic categories: design, verification, and operation. Within these basic categories, certain decisions concerning items such as design configuration, safety factors, testing methods, and operational constraints are to be made. All or some of the variables affecting these decisions may be treated probabilistically. Bayesian statistical decision theory is used as the tool for determining the cost optimum decisions. A special case of the general problem is derived herein, and some very useful parametric curves are developed and applied to several sample structures.

  15. Statistical techniques for sampling and monitoring natural resources

    Treesearch

    Hans T. Schreuder; Richard Ernst; Hugo Ramirez-Maldonado

    2004-01-01

    We present the statistical theory of inventory and monitoring from a probabilistic point of view. We start with the basics and show the interrelationships between designs and estimators illustrating the methods with a small artificial population as well as with a mapped realistic population. For such applications, useful open source software is given in Appendix 4....

  16. Student Achievement in Undergraduate Statistics: The Potential Value of Allowing Failure

    ERIC Educational Resources Information Center

    Ferrandino, Joseph A.

    2016-01-01

    This article details what resulted when I re-designed my undergraduate statistics course to allow failure as a learning strategy and focused on achievement rather than performance. A variety of within and between sample t-tests are utilized to determine the impact of unlimited test and quiz opportunities on student learning on both quizzes and…

  17. Animating Statistics: A New Kind of Applet for Exploring Probability Distributions

    ERIC Educational Resources Information Center

    Kahle, David

    2014-01-01

    In this article, I introduce a novel applet ("module") for exploring probability distributions, their samples, and various related statistical concepts. The module is primarily designed to be used by the instructor in the introductory course, but it can be used far beyond it as well. It is a free, cross-platform, stand-alone interactive…

  18. Bioassessment Tools for Stony Corals: Monitoring Approaches and Proposed Sampling Plan for the U.S. Virgin Islands

    EPA Science Inventory

    This document describes three general approaches to the design of a sampling plan for biological monitoring of coral reefs. Status assessment, trend detection and targeted monitoring each require a different approach to site selection and statistical analysis. For status assessm...

  19. MID-ATLANTIC COASTAL STREAMS STUDY: STATISTICAL DESIGN FOR REGIONAL ASSESSMENT AND LANDSCAPE MODEL DEVELOPMENT

    EPA Science Inventory

    A network of stream-sampling sites was developed for the Mid-Atlantic Coastal Plain (New Jersey through North Carolina) a collaborative study between the U.S. Environmental Protection Agency and the U.S. Geological Survey. A stratified random sampling with unequal weighting was u...

  20. "Adultspan" Publication Patterns: Author and Article Characteristics from 1999 to 2009

    ERIC Educational Resources Information Center

    Erford, Bradley T.; Clark, Kelly H.; Erford, Breann M.

    2011-01-01

    Publication patterns of articles in "Adultspan" from 1999 to 2009 were reviewed. Author characteristics and article content were analyzed to determine trends over time. Research articles were analyzed specifically for type of research design, classification, sampling method, types of participants, sample size, types of statistics used, and…

  1. MID-ATLANTIC COASTAL STREAMS STUDY: STATISTICAL DESIGN FOR REGIONAL ASSESSMENT AND LANDSCAPE MODEL DEVELOPMENT

    EPA Science Inventory

    A network of stream-sampling sites was developed for the Mid-Atlantic Coastal Plain (New Jersey through North Carolina) as part of collaborative research between the U.S. Environmental Protection Agency and the U.S. Geological Survey. A stratified random sampling with unequal wei...

  2. Monitoring changes in exotic vegetation

    Treesearch

    Robert D. Sutter

    1998-01-01

    Ecological monitoring provides critical information for management decisions by measuring changes in managed and unmanaged populations, communities and ecological systems. It integrates ecology, goal and objective setting, sampling design, sampling methods, and statistical analysis. It is a topic that I, with a team of Nature Conservancy ecologists, teach in a six day...

  3. Study design and statistical analysis of data in human population studies with the micronucleus assay.

    PubMed

    Ceppi, Marcello; Gallo, Fabio; Bonassi, Stefano

    2011-01-01

    The most common study design performed in population studies based on the micronucleus (MN) assay, is the cross-sectional study, which is largely performed to evaluate the DNA damaging effects of exposure to genotoxic agents in the workplace, in the environment, as well as from diet or lifestyle factors. Sample size is still a critical issue in the design of MN studies since most recent studies considering gene-environment interaction, often require a sample size of several hundred subjects, which is in many cases difficult to achieve. The control of confounding is another major threat to the validity of causal inference. The most popular confounders considered in population studies using MN are age, gender and smoking habit. Extensive attention is given to the assessment of effect modification, given the increasing inclusion of biomarkers of genetic susceptibility in the study design. Selected issues concerning the statistical treatment of data have been addressed in this mini-review, starting from data description, which is a critical step of statistical analysis, since it allows to detect possible errors in the dataset to be analysed and to check the validity of assumptions required for more complex analyses. Basic issues dealing with statistical analysis of biomarkers are extensively evaluated, including methods to explore the dose-response relationship among two continuous variables and inferential analysis. A critical approach to the use of parametric and non-parametric methods is presented, before addressing the issue of most suitable multivariate models to fit MN data. In the last decade, the quality of statistical analysis of MN data has certainly evolved, although even nowadays only a small number of studies apply the Poisson model, which is the most suitable method for the analysis of MN data.

  4. Statistical summaries of fatigue data for design purposes

    NASA Technical Reports Server (NTRS)

    Wirsching, P. H.

    1983-01-01

    Two methods are discussed for constructing a design curve on the safe side of fatigue data. Both the tolerance interval and equivalent prediction interval (EPI) concepts provide such a curve while accounting for both the distribution of the estimators in small samples and the data scatter. The EPI is also useful as a mechanism for providing necessary statistics on S-N data for a full reliability analysis which includes uncertainty in all fatigue design factors. Examples of statistical analyses of the general strain life relationship are presented. The tolerance limit and EPI techniques for defining a design curve are demonstrated. Examples usng WASPALOY B and RQC-100 data demonstrate that a reliability model could be constructed by considering the fatigue strength and fatigue ductility coefficients as two independent random variables. A technique given for establishing the fatigue strength for high cycle lives relies on an extrapolation technique and also accounts for "runners." A reliability model or design value can be specified.

  5. Application of the experimental design of experiments (DoE) for the determination of organotin compounds in water samples using HS-SPME and GC-MS/MS.

    PubMed

    Coscollà, Clara; Navarro-Olivares, Santiago; Martí, Pedro; Yusà, Vicent

    2014-02-01

    When attempting to discover the important factors and then optimise a response by tuning these factors, experimental design (design of experiments, DoE) gives a powerful suite of statistical methodology. DoE identify significant factors and then optimise a response with respect to them in method development. In this work, a headspace-solid-phase micro-extraction (HS-SPME) combined with gas chromatography tandem mass spectrometry (GC-MS/MS) methodology for the simultaneous determination of six important organotin compounds namely monobutyltin (MBT), dibutyltin (DBT), tributyltin (TBT), monophenyltin (MPhT), diphenyltin (DPhT), triphenyltin (TPhT) has been optimized using a statistical design of experiments (DOE). The analytical method is based on the ethylation with NaBEt4 and simultaneous headspace-solid-phase micro-extraction of the derivative compounds followed by GC-MS/MS analysis. The main experimental parameters influencing the extraction efficiency selected for optimization were pre-incubation time, incubation temperature, agitator speed, extraction time, desorption temperature, buffer (pH, concentration and volume), headspace volume, sample salinity, preparation of standards, ultrasonic time and desorption time in the injector. The main factors (excitation voltage, excitation time, ion source temperature, isolation time and electron energy) affecting the GC-IT-MS/MS response were also optimized using the same statistical design of experiments. The proposed method presented good linearity (coefficient of determination R(2)>0.99) and repeatibilty (1-25%) for all the compounds under study. The accuracy of the method measured as the average percentage recovery of the compounds in spiked surface and marine waters was higher than 70% for all compounds studied. Finally, the optimized methodology was applied to real aqueous samples enabled the simultaneous determination of all compounds under study in surface and marine water samples obtained from Valencia region (Spain). © 2013 Elsevier B.V. All rights reserved.

  6. Genome-wide association analysis of secondary imaging phenotypes from the Alzheimer's disease neuroimaging initiative study.

    PubMed

    Zhu, Wensheng; Yuan, Ying; Zhang, Jingwen; Zhou, Fan; Knickmeyer, Rebecca C; Zhu, Hongtu

    2017-02-01

    The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore such sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing it with several advanced statistical methods that appropriately adjust for the case-control sampling scheme. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Statistical Considerations of Food Allergy Prevention Studies.

    PubMed

    Bahnson, Henry T; du Toit, George; Lack, Gideon

    Clinical studies to prevent the development of food allergy have recently helped reshape public policy recommendations on the early introduction of allergenic foods. These trials are also prompting new research, and it is therefore important to address the unique design and analysis challenges of prevention trials. We highlight statistical concepts and give recommendations that clinical researchers may wish to adopt when designing future study protocols and analysis plans for prevention studies. Topics include selecting a study sample, addressing internal and external validity, improving statistical power, choosing alpha and beta, analysis innovations to address dilution effects, and analysis methods to deal with poor compliance, dropout, and missing data. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Selected Oral Health Indicators in the United States, 2005-2008

    MedlinePlus

    ... errors of the percentages were estimated using Taylor series linearization, to take into account the complex sampling design. The statistical significance of differences between estimates were ...

  9. An automated system for chromosome analysis. Volume 1: Goals, system design, and performance

    NASA Technical Reports Server (NTRS)

    Castleman, K. R.; Melnyk, J. H.

    1975-01-01

    The design, construction, and testing of a complete system to produce karyotypes and chromosome measurement data from human blood samples, and a basis for statistical analysis of quantitative chromosome measurement data is described. The prototype was assembled, tested, and evaluated on clinical material and thoroughly documented.

  10. 75 FR 69681 - Proposed Collection; Comment Request; California Health Interview Survey Cancer Control Module...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-15

    ... State. The CHIS sample is designed to provide statistically reliable estimates statewide, for California... activity, obesity, and human papillomavirus. Additionally, CHIS is designed to be comparable to the National Health Interview Survey (NHIS) data in order to conduct comparative analyses. CHIS provides...

  11. Optimal Experimental Design for Model Discrimination

    ERIC Educational Resources Information Center

    Myung, Jay I.; Pitt, Mark A.

    2009-01-01

    Models of a psychological process can be difficult to discriminate experimentally because it is not easy to determine the values of the critical design variables (e.g., presentation schedule, stimulus structure) that will be most informative in differentiating them. Recent developments in sampling-based search methods in statistics make it…

  12. Investigating the impact of design characteristics on statistical efficiency within discrete choice experiments: A systematic survey.

    PubMed

    Vanniyasingam, Thuva; Daly, Caitlin; Jin, Xuejing; Zhang, Yuan; Foster, Gary; Cunningham, Charles; Thabane, Lehana

    2018-06-01

    This study reviews simulation studies of discrete choice experiments to determine (i) how survey design features affect statistical efficiency, (ii) and to appraise their reporting quality. Statistical efficiency was measured using relative design (D-) efficiency, D-optimality, or D-error. For this systematic survey, we searched Journal Storage (JSTOR), Since Direct, PubMed, and OVID which included a search within EMBASE. Searches were conducted up to year 2016 for simulation studies investigating the impact of DCE design features on statistical efficiency. Studies were screened and data were extracted independently and in duplicate. Results for each included study were summarized by design characteristic. Previously developed criteria for reporting quality of simulation studies were also adapted and applied to each included study. Of 371 potentially relevant studies, 9 were found to be eligible, with several varying in study objectives. Statistical efficiency improved when increasing the number of choice tasks or alternatives; decreasing the number of attributes, attribute levels; using an unrestricted continuous "manipulator" attribute; using model-based approaches with covariates incorporating response behaviour; using sampling approaches that incorporate previous knowledge of response behaviour; incorporating heterogeneity in a model-based design; correctly specifying Bayesian priors; minimizing parameter prior variances; and using an appropriate method to create the DCE design for the research question. The simulation studies performed well in terms of reporting quality. Improvement is needed in regards to clearly specifying study objectives, number of failures, random number generators, starting seeds, and the software used. These results identify the best approaches to structure a DCE. An investigator can manipulate design characteristics to help reduce response burden and increase statistical efficiency. Since studies varied in their objectives, conclusions were made on several design characteristics, however, the validity of each conclusion was limited. Further research should be conducted to explore all conclusions in various design settings and scenarios. Additional reviews to explore other statistical efficiency outcomes and databases can also be performed to enhance the conclusions identified from this review.

  13. Considerations for the design, analysis and presentation of in vivo studies.

    PubMed

    Ranstam, J; Cook, J A

    2017-03-01

    To describe, explain and give practical suggestions regarding important principles and key methodological challenges in the study design, statistical analysis, and reporting of results from in vivo studies. Pre-specifying endpoints and analysis, recognizing the common underlying assumption of statistically independent observations, performing sample size calculations, and addressing multiplicity issues are important parts of an in vivo study. A clear reporting of results and informative graphical presentations of data are other important parts. Copyright © 2016 Osteoarthritis Research Society International. Published by Elsevier Ltd. All rights reserved.

  14. Type I error probabilities based on design-stage strategies with applications to noninferiority trials.

    PubMed

    Rothmann, Mark

    2005-01-01

    When testing the equality of means from two different populations, a t-test or large sample normal test tend to be performed. For these tests, when the sample size or design for the second sample is dependent on the results of the first sample, the type I error probability is altered for each specific possibility in the null hypothesis. We will examine the impact on the type I error probabilities for two confidence interval procedures and procedures using test statistics when the design for the second sample or experiment is dependent on the results from the first sample or experiment (or series of experiments). Ways for controlling a desired maximum type I error probability or a desired type I error rate will be discussed. Results are applied to the setting of noninferiority comparisons in active controlled trials where the use of a placebo is unethical.

  15. Economic Statistical Design of Integrated X-bar-S Control Chart with Preventive Maintenance and General Failure Distribution

    PubMed Central

    Caballero Morales, Santiago Omar

    2013-01-01

    The application of Preventive Maintenance (PM) and Statistical Process Control (SPC) are important practices to achieve high product quality, small frequency of failures, and cost reduction in a production process. However there are some points that have not been explored in depth about its joint application. First, most SPC is performed with the X-bar control chart which does not fully consider the variability of the production process. Second, many studies of design of control charts consider just the economic aspect while statistical restrictions must be considered to achieve charts with low probabilities of false detection of failures. Third, the effect of PM on processes with different failure probability distributions has not been studied. Hence, this paper covers these points, presenting the Economic Statistical Design (ESD) of joint X-bar-S control charts with a cost model that integrates PM with general failure distribution. Experiments showed statistically significant reductions in costs when PM is performed on processes with high failure rates and reductions in the sampling frequency of units for testing under SPC. PMID:23527082

  16. A weighted generalized score statistic for comparison of predictive values of diagnostic tests.

    PubMed

    Kosinski, Andrzej S

    2013-03-15

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.

  17. A weighted generalized score statistic for comparison of predictive values of diagnostic tests

    PubMed Central

    Kosinski, Andrzej S.

    2013-01-01

    Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343

  18. The Development of Statistics Textbook Supported with ICT and Portfolio-Based Assessment

    NASA Astrophysics Data System (ADS)

    Hendikawati, Putriaji; Yuni Arini, Florentina

    2016-02-01

    This research was development research that aimed to develop and produce a Statistics textbook model that supported with information and communication technology (ICT) and Portfolio-Based Assessment. This book was designed for students of mathematics at the college to improve students’ ability in mathematical connection and communication. There were three stages in this research i.e. define, design, and develop. The textbooks consisted of 10 chapters which each chapter contains introduction, core materials and include examples and exercises. The textbook developed phase begins with the early stages of designed the book (draft 1) which then validated by experts. Revision of draft 1 produced draft 2 which then limited test for readability test book. Furthermore, revision of draft 2 produced textbook draft 3 which simulated on a small sample to produce a valid model textbook. The data were analysed with descriptive statistics. The analysis showed that the Statistics textbook model that supported with ICT and Portfolio-Based Assessment valid and fill up the criteria of practicality.

  19. A Bayesian sequential design with adaptive randomization for 2-sided hypothesis test.

    PubMed

    Yu, Qingzhao; Zhu, Lin; Zhu, Han

    2017-11-01

    Bayesian sequential and adaptive randomization designs are gaining popularity in clinical trials thanks to their potentials to reduce the number of required participants and save resources. We propose a Bayesian sequential design with adaptive randomization rates so as to more efficiently attribute newly recruited patients to different treatment arms. In this paper, we consider 2-arm clinical trials. Patients are allocated to the 2 arms with a randomization rate to achieve minimum variance for the test statistic. Algorithms are presented to calculate the optimal randomization rate, critical values, and power for the proposed design. Sensitivity analysis is implemented to check the influence on design by changing the prior distributions. Simulation studies are applied to compare the proposed method and traditional methods in terms of power and actual sample sizes. Simulations show that, when total sample size is fixed, the proposed design can obtain greater power and/or cost smaller actual sample size than the traditional Bayesian sequential design. Finally, we apply the proposed method to a real data set and compare the results with the Bayesian sequential design without adaptive randomization in terms of sample sizes. The proposed method can further reduce required sample size. Copyright © 2017 John Wiley & Sons, Ltd.

  20. ICS-II USA research design and methodology.

    PubMed

    Rana, H; Andersen, R M; Nakazono, T T; Davidson, P L

    1997-05-01

    The purpose of the WHO-sponsored International Collaborative Study of Oral Health Outcomes (ICS-II) was to provide policy-markers and researchers with detailed, reliable, and valid data on the oral health situation in their countries or regions, together with comparative data from other dental care delivery systems. ICS-II used a cross-sectional design with no explicit control groups or experimental interventions. A standardized methodology was developed and tested for collecting and analyzing epidemiological, sociocultural, economic, and delivery system data. Respondent information was obtained by household interviews, and clinical examinations were conducted by calibrated oral epidemiologists. Discussed are the sampling design characteristics for the USA research locations, response rates, samples size for interview and oral examination data, weighting procedures, and statistical methods. SUDAAN was used to adjust variance calculations, since complex sampling designs were used.

  1. Assessment of wadeable stream resources in the driftless area ecoregion in Western Wisconsin using a probabilistic sampling design.

    PubMed

    Miller, Michael A; Colby, Alison C C; Kanehl, Paul D; Blocksom, Karen

    2009-03-01

    The Wisconsin Department of Natural Resources (WDNR), with support from the U.S. EPA, conducted an assessment of wadeable streams in the Driftless Area ecoregion in western Wisconsin using a probabilistic sampling design. This ecoregion encompasses 20% of Wisconsin's land area and contains 8,800 miles of perennial streams. Randomly-selected stream sites (n = 60) equally distributed among stream orders 1-4 were sampled. Watershed land use, riparian and in-stream habitat, water chemistry, macroinvertebrate, and fish assemblage data were collected at each true random site and an associated "modified-random" site on each stream that was accessed via a road crossing nearest to the true random site. Targeted least-disturbed reference sites (n = 22) were also sampled to develop reference conditions for various physical, chemical, and biological measures. Cumulative distribution function plots of various measures collected at the true random sites evaluated with reference condition thresholds, indicate that high proportions of the random sites (and by inference the entire Driftless Area wadeable stream population) show some level of degradation. Study results show no statistically significant differences between the true random and modified-random sample sites for any of the nine physical habitat, 11 water chemistry, seven macroinvertebrate, or eight fish metrics analyzed. In Wisconsin's Driftless Area, 79% of wadeable stream lengths were accessible via road crossings. While further evaluation of the statistical rigor of using a modified-random sampling design is warranted, sampling randomly-selected stream sites accessed via the nearest road crossing may provide a more economical way to apply probabilistic sampling in stream monitoring programs.

  2. [Methodological design of the National Health and Nutrition Survey 2016].

    PubMed

    Romero-Martínez, Martín; Shamah-Levy, Teresa; Cuevas-Nasu, Lucía; Gómez-Humarán, Ignacio Méndez; Gaona-Pineda, Elsa Berenice; Gómez-Acosta, Luz María; Rivera-Dommarco, Juan Ángel; Hernández-Ávila, Mauricio

    2017-01-01

    Describe the design methodology of the halfway health and nutrition national survey (Ensanut-MC) 2016. The Ensanut-MC is a national probabilistic survey whose objective population are the inhabitants of private households in Mexico. The sample size was determined to make inferences on the urban and rural areas in four regions. Describes main design elements: target population, topics of study, sampling procedure, measurement procedure and logistics organization. A final sample of 9 479 completed household interviews, and a sample of 16 591 individual interviews. The response rate for households was 77.9%, and the response rate for individuals was 91.9%. The Ensanut-MC probabilistic design allows valid statistical inferences about interest parameters for Mexico´s public health and nutrition, specifically on overweight, obesity and diabetes mellitus. Updated information also supports the monitoring, updating and formulation of new policies and priority programs.

  3. Statistical considerations for plot design, sampling procedures, analysis, and quality assurance of ozone injury studies

    Treesearch

    Michael Arbaugh; Larry Bednar

    1996-01-01

    The sampling methods used to monitor ozone injury to ponderosa and Jeffrey pines depend on the objectives of the study, geographic and genetic composition of the forest, and the source and composition of air pollutant emissions. By using a standardized sampling methodology, it may be possible to compare conditions within local areas more accurately, and to apply the...

  4. An Analysis of Variance Framework for Matrix Sampling.

    ERIC Educational Resources Information Center

    Sirotnik, Kenneth

    Significant cost savings can be achieved with the use of matrix sampling in estimating population parameters from psychometric data. The statistical design is intuitively simple, using the framework of the two-way classification analysis of variance technique. For example, the mean and variance are derived from the performance of a certain grade…

  5. THE ROLE OF SAMPLE SURVEYS: WHY SHOULD PRACTITIONERS CONSIDER USING A STATISTICAL SAMPLING DESIGN?

    EPA Science Inventory

    The principle focus of this book chapter is to address a question facing many monitoring organizations: Is it possible to find a cost-effective way to integrate multiple monitoring efforts in a region? The American Fisheries Society, along with the State of the Salmon (a joint ...

  6. Statistical and sampling issues when using multiple particle tracking

    NASA Astrophysics Data System (ADS)

    Savin, Thierry; Doyle, Patrick S.

    2007-08-01

    Video microscopy can be used to simultaneously track several microparticles embedded in a complex material. The trajectories are used to extract a sample of displacements at random locations in the material. From this sample, averaged quantities characterizing the dynamics of the probes are calculated to evaluate structural and/or mechanical properties of the assessed material. However, the sampling of measured displacements in heterogeneous systems is singular because the volume of observation with video microscopy is finite. By carefully characterizing the sampling design in the experimental output of the multiple particle tracking technique, we derive estimators for the mean and variance of the probes’ dynamics that are independent of the peculiar statistical characteristics. We expose stringent tests of these estimators using simulated and experimental complex systems with a known heterogeneous structure. Up to a certain fundamental limitation, which we characterize through a material degree of sampling by the embedded probe tracking, these estimators can be applied to quantify the heterogeneity of a material, providing an original and intelligible kind of information on complex fluid properties. More generally, we show that the precise assessment of the statistics in the multiple particle tracking output sample of observations is essential in order to provide accurate unbiased measurements.

  7. In vitro Fracture strength and hardness of different computer-aided design/computer-aided manufacturing inlays.

    PubMed

    Sagsoz, O; Yildiz, M; Hojjat Ghahramanzadeh, A S L; Alsaran, A

    2018-03-01

    The purpose of this study was to examine the fracture strength and surface microhardness of computer-aided design/computer-aided manufacturing (CAD/CAM) materials in vitro. Mesial-occlusal-distal inlays were made from five different CAD/CAM materials (feldspathic ceramic, CEREC blocs; leucite-reinforced ceramic, IPS Empress CAD; resin nano ceramic, 3M ESPE Lava Ultimate; hybrid ceramic, VITA Enamic; and lithium disilicate ceramic, IPS e.max CAD) using CEREC 4 CAD/CAM system. Samples were adhesively cemented to metal analogs with a resin cement (3M ESPE, U200). The fracture tests were carried out with a universal testing machine. Furthermore, five samples were prepared from each CAD/CAM material for micro-Vickers hardness test. Data were analyzed with statistics software SPSS 20 (IBM Corp., New York, USA). Fracture strength of lithium disilicate inlays (3949 N) was found to be higher than other ceramic inlays (P < 0.05). There was no difference between other inlays statistically (P > 0.05). The highest micro-Vickers hardness was measured in lithium disilicate samples, and the lowest was in resin nano ceramic samples. Fracture strength results demonstrate that inlays can withstand the forces in the mouth. Statistical results showed that fracture strength and micro-Vickers hardness of feldspathic ceramic, leucite-reinforced ceramic, and lithium disilicate ceramic materials had a positive correlation.

  8. A general approach for sample size calculation for the three-arm 'gold standard' non-inferiority design.

    PubMed

    Stucke, Kathrin; Kieser, Meinhard

    2012-12-10

    In the three-arm 'gold standard' non-inferiority design, an experimental treatment, an active reference, and a placebo are compared. This design is becoming increasingly popular, and it is, whenever feasible, recommended for use by regulatory guidelines. We provide a general method to calculate the required sample size for clinical trials performed in this design. As special cases, the situations of continuous, binary, and Poisson distributed outcomes are explored. Taking into account the correlation structure of the involved test statistics, the proposed approach leads to considerable savings in sample size as compared with application of ad hoc methods for all three scale levels. Furthermore, optimal sample size allocation ratios are determined that result in markedly smaller total sample sizes as compared with equal assignment. As optimal allocation makes the active treatment groups larger than the placebo group, implementation of the proposed approach is also desirable from an ethical viewpoint. Copyright © 2012 John Wiley & Sons, Ltd.

  9. An Automated System for Chromosome Analysis

    NASA Technical Reports Server (NTRS)

    Castleman, K. R.; Melnyk, J. H.

    1976-01-01

    The design, construction, and testing of a complete system to produce karyotypes and chromosome measurement data from human blood samples, and to provide a basis for statistical analysis of quantitative chromosome measurement data are described.

  10. Determining significant material properties: A discovery approach

    NASA Technical Reports Server (NTRS)

    Karplus, Alan K.

    1992-01-01

    The following is a laboratory experiment designed to further understanding of materials science. The experiment itself can be informative for persons of any age past elementary school, and even for some in elementary school. The preparation of the plastic samples is readily accomplished by persons with resonable dexterity in the cutting of paper designs. The completion of the statistical Design of Experiments, which uses Yates' Method, requires basic math (addition and subtraction). Interpretive work requires plotting of data and making observations. Knowledge of statistical methods would be helpful. The purpose of this experiment is to acquaint students with the seven classes of recyclable plastics, and provide hands-on learning about the response of these plastics to mechanical tensile loading.

  11. Addendum to Sampling and Analysis Plan (SAP) for Assessment of LANL-Derived Residual Radionuclides in Soils within Tract A-16-d for Land Conveyance and Transfer for Sewage Treatment Facility Area

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Whicker, Jeffrey Jay; Gillis, Jessica Mcdonnel; Ruedig, Elizabeth

    This report summarizes the sampling design used, associated statistical assumptions, as well as general guidelines for conducting post-sampling data analysis. Sampling plan components presented here include how many sampling locations to choose and where within the sampling area to collect those samples. The type of medium to sample (i.e., soil, groundwater, etc.) and how to analyze the samples (in-situ, fixed laboratory, etc.) are addressed in other sections of the sampling plan.

  12. Statistical models for the analysis and design of digital polymerase chain (dPCR) experiments

    USGS Publications Warehouse

    Dorazio, Robert; Hunter, Margaret

    2015-01-01

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log–log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model’s parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.

  13. Statistical Models for the Analysis and Design of Digital Polymerase Chain Reaction (dPCR) Experiments.

    PubMed

    Dorazio, Robert M; Hunter, Margaret E

    2015-11-03

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log-log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model's parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.

  14. Sampling designs for contaminant temporal trend analyses using sedentary species exemplified by the snails Bellamya aeruginosa and Viviparus viviparus.

    PubMed

    Yin, Ge; Danielsson, Sara; Dahlberg, Anna-Karin; Zhou, Yihui; Qiu, Yanling; Nyberg, Elisabeth; Bignert, Anders

    2017-10-01

    Environmental monitoring typically assumes samples and sampling activities to be representative of the population being studied. Given a limited budget, an appropriate sampling strategy is essential to support detecting temporal trends of contaminants. In the present study, based on real chemical analysis data on polybrominated diphenyl ethers in snails collected from five subsites in Tianmu Lake, computer simulation is performed to evaluate three sampling strategies by the estimation of required sample size, to reach a detection of an annual change of 5% with a statistical power of 80% and 90% with a significant level of 5%. The results showed that sampling from an arbitrarily selected sampling spot is the worst strategy, requiring much more individual analyses to achieve the above mentioned criteria compared with the other two approaches. A fixed sampling site requires the lowest sample size but may not be representative for the intended study object e.g. a lake and is also sensitive to changes of that particular sampling site. In contrast, sampling at multiple sites along the shore each year, and using pooled samples when the cost to collect and prepare individual specimens are much lower than the cost for chemical analysis, would be the most robust and cost efficient strategy in the long run. Using statistical power as criterion, the results demonstrated quantitatively the consequences of various sampling strategies, and could guide users with respect of required sample sizes depending on sampling design for long term monitoring programs. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. A survey sampling approach for pesticide monitoring of community water systems using groundwater as a drinking water source.

    PubMed

    Whitmore, Roy W; Chen, Wenlin

    2013-12-04

    The ability to infer human exposure to substances from drinking water using monitoring data helps determine and/or refine potential risks associated with drinking water consumption. We describe a survey sampling approach and its application to an atrazine groundwater monitoring study to adequately characterize upper exposure centiles and associated confidence intervals with predetermined precision. Study design and data analysis included sampling frame definition, sample stratification, sample size determination, allocation to strata, analysis weights, and weighted population estimates. Sampling frame encompassed 15 840 groundwater community water systems (CWS) in 21 states throughout the U. S. Median, and 95th percentile atrazine concentrations were 0.0022 and 0.024 ppb, respectively, for all CWS. Statistical estimates agreed with historical monitoring results, suggesting that the study design was adequate and robust. This methodology makes no assumptions regarding the occurrence distribution (e.g., lognormality); thus analyses based on the design-induced distribution provide the most robust basis for making inferences from the sample to target population.

  16. Statistical strategy for inventorying and monitoring the ecosystem resources of the Mexican States of Jalisco and Colima at multiple scales and resolution levels

    Treesearch

    H. T. Schreuder; M. S. Williams; C. Aguirre-Bravo; P. L. Patterson

    2003-01-01

    The sampling strategy is presented for the initial phase of the natural resources pilot project in the Mexican States of Jalisco and Colima. The sampling design used is ground-based cluster sampling with poststratification based on Landsat Thematic Mapper imagery. The data collected will serve as a basis for additional data collection, mapping, and spatial modeling...

  17. Professional School Counseling (PSC) Publication Pattern Review: A Meta-Study of Author and Article Characteristics from the First 15 Years

    ERIC Educational Resources Information Center

    Erford, Bradley T.; Giguere, Monica; Glenn, Kacie; Ciarlone, Hallie

    2015-01-01

    Patterns of articles published in "Professional School Counseling" (PSC) from the first 15 volumes were reviewed in this meta-study. Author characteristics (e.g., sex, employment setting, nation of domicile) and article characteristics (e.g., topic, type, design, sample, sample size, participant type, statistical procedures and…

  18. Predictor sort sampling and one-sided confidence bounds on quantiles

    Treesearch

    Steve Verrill; Victoria L. Herian; David W. Green

    2002-01-01

    Predictor sort experiments attempt to make use of the correlation between a predictor that can be measured prior to the start of an experiment and the response variable that we are investigating. Properly designed and analyzed, they can reduce necessary sample sizes, increase statistical power, and reduce the lengths of confidence intervals. However, if the non- random...

  19. Validity, Reliability and Difficulty Indices for Instructor-Built Exam Questions

    ERIC Educational Resources Information Center

    Jandaghi, Gholamreza; Shaterian, Fatemeh

    2008-01-01

    The purpose of the research is to determine college Instructor's skill rate in designing exam questions in chemistry subject. The statistical population was all of chemistry exam sheets for two semesters in one academic year from which a sample of 364 exam sheets was drawn using multistage cluster sampling. Two experts assessed the sheets and by…

  20. Sequential ensemble-based optimal design for parameter estimation: SEQUENTIAL ENSEMBLE-BASED OPTIMAL DESIGN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Man, Jun; Zhang, Jiangjiang; Li, Weixuan

    2016-10-01

    The ensemble Kalman filter (EnKF) has been widely used in parameter estimation for hydrological models. The focus of most previous studies was to develop more efficient analysis (estimation) algorithms. On the other hand, it is intuitively understandable that a well-designed sampling (data-collection) strategy should provide more informative measurements and subsequently improve the parameter estimation. In this work, a Sequential Ensemble-based Optimal Design (SEOD) method, coupled with EnKF, information theory and sequential optimal design, is proposed to improve the performance of parameter estimation. Based on the first-order and second-order statistics, different information metrics including the Shannon entropy difference (SD), degrees ofmore » freedom for signal (DFS) and relative entropy (RE) are used to design the optimal sampling strategy, respectively. The effectiveness of the proposed method is illustrated by synthetic one-dimensional and two-dimensional unsaturated flow case studies. It is shown that the designed sampling strategies can provide more accurate parameter estimation and state prediction compared with conventional sampling strategies. Optimal sampling designs based on various information metrics perform similarly in our cases. The effect of ensemble size on the optimal design is also investigated. Overall, larger ensemble size improves the parameter estimation and convergence of optimal sampling strategy. Although the proposed method is applied to unsaturated flow problems in this study, it can be equally applied in any other hydrological problems.« less

  1. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data.

    PubMed

    Shen, Shihao; Park, Juw Won; Lu, Zhi-xiang; Lin, Lan; Henry, Michael D; Wu, Ying Nian; Zhou, Qing; Xing, Yi

    2014-12-23

    Ultra-deep RNA sequencing (RNA-Seq) has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We previously developed multivariate analysis of transcript splicing (MATS), a statistical method for detecting differential alternative splicing between two RNA-Seq samples. Here we describe a new statistical model and computer program, replicate MATS (rMATS), designed for detection of differential alternative splicing from replicate RNA-Seq data. rMATS uses a hierarchical model to simultaneously account for sampling uncertainty in individual replicates and variability among replicates. In addition to the analysis of unpaired replicates, rMATS also includes a model specifically designed for paired replicates between sample groups. The hypothesis-testing framework of rMATS is flexible and can assess the statistical significance over any user-defined magnitude of splicing change. The performance of rMATS is evaluated by the analysis of simulated and real RNA-Seq data. rMATS outperformed two existing methods for replicate RNA-Seq data in all simulation settings, and RT-PCR yielded a high validation rate (94%) in an RNA-Seq dataset of prostate cancer cell lines. Our data also provide guiding principles for designing RNA-Seq studies of alternative splicing. We demonstrate that it is essential to incorporate biological replicates in the study design. Of note, pooling RNAs or merging RNA-Seq data from multiple replicates is not an effective approach to account for variability, and the result is particularly sensitive to outliers. The rMATS source code is freely available at rnaseq-mats.sourceforge.net/. As the popularity of RNA-Seq continues to grow, we expect rMATS will be useful for studies of alternative splicing in diverse RNA-Seq projects.

  2. Study Quality in SLA: An Assessment of Designs, Analyses, and Reporting Practices in Quantitative L2 Research

    ERIC Educational Resources Information Center

    Plonsky, Luke

    2013-01-01

    This study assesses research and reporting practices in quantitative second language (L2) research. A sample of 606 primary studies, published from 1990 to 2010 in "Language Learning and Studies in Second Language Acquisition," was collected and coded for designs, statistical analyses, reporting practices, and outcomes (i.e., effect…

  3. Sample size re-assessment leading to a raised sample size does not inflate type I error rate under mild conditions.

    PubMed

    Broberg, Per

    2013-07-19

    One major concern with adaptive designs, such as the sample size adjustable designs, has been the fear of inflating the type I error rate. In (Stat Med 23:1023-1038, 2004) it is however proven that when observations follow a normal distribution and the interim result show promise, meaning that the conditional power exceeds 50%, type I error rate is protected. This bound and the distributional assumptions may seem to impose undesirable restrictions on the use of these designs. In (Stat Med 30:3267-3284, 2011) the possibility of going below 50% is explored and a region that permits an increased sample size without inflation is defined in terms of the conditional power at the interim. A criterion which is implicit in (Stat Med 30:3267-3284, 2011) is derived by elementary methods and expressed in terms of the test statistic at the interim to simplify practical use. Mathematical and computational details concerning this criterion are exhibited. Under very general conditions the type I error rate is preserved under sample size adjustable schemes that permit a raise. The main result states that for normally distributed observations raising the sample size when the result looks promising, where the definition of promising depends on the amount of knowledge gathered so far, guarantees the protection of the type I error rate. Also, in the many situations where the test statistic approximately follows a normal law, the deviation from the main result remains negligible. This article provides details regarding the Weibull and binomial distributions and indicates how one may approach these distributions within the current setting. There is thus reason to consider such designs more often, since they offer a means of adjusting an important design feature at little or no cost in terms of error rate.

  4. Sample size re-estimation and other midcourse adjustments with sequential parallel comparison design.

    PubMed

    Silverman, Rachel K; Ivanova, Anastasia

    2017-01-01

    Sequential parallel comparison design (SPCD) was proposed to reduce placebo response in a randomized trial with placebo comparator. Subjects are randomized between placebo and drug in stage 1 of the trial, and then, placebo non-responders are re-randomized in stage 2. Efficacy analysis includes all data from stage 1 and all placebo non-responding subjects from stage 2. This article investigates the possibility to re-estimate the sample size and adjust the design parameters, allocation proportion to placebo in stage 1 of SPCD, and weight of stage 1 data in the overall efficacy test statistic during an interim analysis.

  5. Survey methods for assessing land cover map accuracy

    USGS Publications Warehouse

    Nusser, S.M.; Klaas, E.E.

    2003-01-01

    The increasing availability of digital photographic materials has fueled efforts by agencies and organizations to generate land cover maps for states, regions, and the United States as a whole. Regardless of the information sources and classification methods used, land cover maps are subject to numerous sources of error. In order to understand the quality of the information contained in these maps, it is desirable to generate statistically valid estimates of accuracy rates describing misclassification errors. We explored a full sample survey framework for creating accuracy assessment study designs that balance statistical and operational considerations in relation to study objectives for a regional assessment of GAP land cover maps. We focused not only on appropriate sample designs and estimation approaches, but on aspects of the data collection process, such as gaining cooperation of land owners and using pixel clusters as an observation unit. The approach was tested in a pilot study to assess the accuracy of Iowa GAP land cover maps. A stratified two-stage cluster sampling design addressed sample size requirements for land covers and the need for geographic spread while minimizing operational effort. Recruitment methods used for private land owners yielded high response rates, minimizing a source of nonresponse error. Collecting data for a 9-pixel cluster centered on the sampled pixel was simple to implement, and provided better information on rarer vegetation classes as well as substantial gains in precision relative to observing data at a single-pixel.

  6. Statistical Analysis of Adaptive Beam-Forming Methods

    DTIC Science & Technology

    1988-05-01

    minimum amount of computing resources? * What are the tradeoffs being made when a system design selects block averaging over exponential averaging? Will...understood by many signal processing practitioners, however, is how system parameters and the number of sensors effect the distribution of the... system performance improve and if so by how much? b " It is well known that the noise sampled at adjacent sensors is not statistically independent

  7. Statistical Algorithms Accounting for Background Density in the Detection of UXO Target Areas at DoD Munitions Sites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzke, Brett D.; Wilson, John E.; Hathaway, J.

    2008-02-12

    Statistically defensible methods are presented for developing geophysical detector sampling plans and analyzing data for munitions response sites where unexploded ordnance (UXO) may exist. Detection methods for identifying areas of elevated anomaly density from background density are shown. Additionally, methods are described which aid in the choice of transect pattern and spacing to assure with degree of confidence that a target area (TA) of specific size, shape, and anomaly density will be identified using the detection methods. Methods for evaluating the sensitivity of designs to variation in certain parameters are also discussed. Methods presented have been incorporated into the Visualmore » Sample Plan (VSP) software (free at http://dqo.pnl.gov/vsp) and demonstrated at multiple sites in the United States. Application examples from actual transect designs and surveys from the previous two years are demonstrated.« less

  8. Effects of Simplifying Choice Tasks on Estimates of Taste Heterogeneity in Stated-Choice Surveys

    PubMed Central

    Johnson, F. Reed; Ozdemir, Semra; Phillips, Kathryn A

    2011-01-01

    Researchers usually employ orthogonal arrays or D-optimal designs with little or no attribute overlap in stated-choice surveys. The challenge is to balance statistical efficiency and respondent burden to minimize the overall error in the survey responses. This study examined whether simplifying the choice task, by using a design with more overlap, provides advantages over standard minimum-overlap methods. We administered two designs for eliciting HIV test preferences to split samples. Surveys were undertaken at four HIV testing locations in San Francisco, California. Personal characteristics had different effects on willingness to pay for the two treatments, and gains in statistical efficiency in the minimal-overlap version more than compensated for possible imprecision from increased measurement error. PMID:19880234

  9. Statistics 101 for Radiologists.

    PubMed

    Anvari, Arash; Halpern, Elkan F; Samir, Anthony E

    2015-10-01

    Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.

  10. Interpreting “statistical hypothesis testing” results in clinical research

    PubMed Central

    Sarmukaddam, Sanjeev B.

    2012-01-01

    Difference between “Clinical Significance and Statistical Significance” should be kept in mind while interpreting “statistical hypothesis testing” results in clinical research. This fact is already known to many but again pointed out here as philosophy of “statistical hypothesis testing” is sometimes unnecessarily criticized mainly due to failure in considering such distinction. Randomized controlled trials are also wrongly criticized similarly. Some scientific method may not be applicable in some peculiar/particular situation does not mean that the method is useless. Also remember that “statistical hypothesis testing” is not for decision making and the field of “decision analysis” is very much an integral part of science of statistics. It is not correct to say that “confidence intervals have nothing to do with confidence” unless one understands meaning of the word “confidence” as used in context of confidence interval. Interpretation of the results of every study should always consider all possible alternative explanations like chance, bias, and confounding. Statistical tests in inferential statistics are, in general, designed to answer the question “How likely is the difference found in random sample(s) is due to chance” and therefore limitation of relying only on statistical significance in making clinical decisions should be avoided. PMID:22707861

  11. Adjusted Wald Confidence Interval for a Difference of Binomial Proportions Based on Paired Data

    ERIC Educational Resources Information Center

    Bonett, Douglas G.; Price, Robert M.

    2012-01-01

    Adjusted Wald intervals for binomial proportions in one-sample and two-sample designs have been shown to perform about as well as the best available methods. The adjusted Wald intervals are easy to compute and have been incorporated into introductory statistics courses. An adjusted Wald interval for paired binomial proportions is proposed here and…

  12. The Influence of Experimental Design on the Detection of Performance Differences

    ERIC Educational Resources Information Center

    Bates, B. T.; Dufek, J. S.; James, C. R.; Harry, J. R.; Eggleston, J. D.

    2016-01-01

    We demonstrate the effect of sample and trial size on statistical outcomes for single-subject analyses (SSA) and group analyses (GA) for a frequently studied performance activity and common intervention. Fifty strides of walking data collected in two blocks of 25 trials for two shoe conditions were analyzed for samples of five, eight, 10, and 12…

  13. Human metabolic profiles are stably controlled by genetic and environmental variation

    PubMed Central

    Nicholson, George; Rantalainen, Mattias; Maher, Anthony D; Li, Jia V; Malmodin, Daniel; Ahmadi, Kourosh R; Faber, Johan H; Hallgrímsdóttir, Ingileif B; Barrett, Amy; Toft, Henrik; Krestyaninova, Maria; Viksna, Juris; Neogi, Sudeshna Guha; Dumas, Marc-Emmanuel; Sarkans, Ugis; The MolPAGE Consortium; Silverman, Bernard W; Donnelly, Peter; Nicholson, Jeremy K; Allen, Maxine; Zondervan, Krina T; Lindon, John C; Spector, Tim D; McCarthy, Mark I; Holmes, Elaine; Baunsgaard, Dorrit; Holmes, Chris C

    2011-01-01

    1H Nuclear Magnetic Resonance spectroscopy (1H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired 1H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in 1H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect 1H NMR-based biomarkers quantifying predisposition to disease. PMID:21878913

  14. Ichthyoplankton abundance and variance in a large river system concerns for long-term monitoring

    USGS Publications Warehouse

    Holland-Bartels, Leslie E.; Dewey, Michael R.; Zigler, Steven J.

    1995-01-01

    System-wide spatial patterns of ichthyoplankton abundance and variability were assessed in the upper Mississippi and lower Illinois rivers to address the experimental design and statistical confidence in density estimates. Ichthyoplankton was sampled from June to August 1989 in primary milieus (vegetated and non-vegated backwaters and impounded areas, main channels and main channel borders) in three navigation pools (8, 13 and 26) of the upper Mississippi River and in a downstream reach of the Illinois River. Ichthyoplankton densities varied among stations of similar aquatic landscapes (milieus) more than among subsamples within a station. An analysis of sampling effort indicated that the collection of single samples at many stations in a given milieu type is statistically and economically preferable to the collection of multiple subsamples at fewer stations. Cluster analyses also revealed that stations only generally grouped by their preassigned milieu types. Pilot studies such as this can define station groupings and sources of variation beyond an a priori habitat classification. Thus the minimum intensity of sampling required to achieve a desired statistical confidence can be identified before implementing monitoring efforts.

  15. Raising the bar for reproducible science at the U.S. Environmental Protection Agency Office of Research and Development.

    PubMed

    George, Barbara Jane; Sobus, Jon R; Phelps, Lara P; Rashleigh, Brenda; Simmons, Jane Ellen; Hines, Ronald N

    2015-05-01

    Considerable concern has been raised regarding research reproducibility both within and outside the scientific community. Several factors possibly contribute to a lack of reproducibility, including a failure to adequately employ statistical considerations during study design, bias in sample selection or subject recruitment, errors in developing data inclusion/exclusion criteria, and flawed statistical analysis. To address some of these issues, several publishers have developed checklists that authors must complete. Others have either enhanced statistical expertise on existing editorial boards, or formed distinct statistics editorial boards. Although the U.S. Environmental Protection Agency, Office of Research and Development, already has a strong Quality Assurance Program, an initiative was undertaken to further strengthen statistics consideration and other factors in study design and also to ensure these same factors are evaluated during the review and approval of study protocols. To raise awareness of the importance of statistical issues and provide a forum for robust discussion, a Community of Practice for Statistics was formed in January 2014. In addition, three working groups were established to develop a series of questions or criteria that should be considered when designing or reviewing experimental, observational, or modeling focused research. This article describes the process used to develop these study design guidance documents, their contents, how they are being employed by the Agency's research enterprise, and expected benefits to Agency science. The process and guidance documents presented here may be of utility for any research enterprise interested in enhancing the reproducibility of its science. © The Author 2015. Published by Oxford University Press on behalf of the Society of Toxicology.

  16. Analysis and meta-analysis of single-case designs: an introduction.

    PubMed

    Shadish, William R

    2014-04-01

    The last 10 years have seen great progress in the analysis and meta-analysis of single-case designs (SCDs). This special issue includes five articles that provide an overview of current work on that topic, including standardized mean difference statistics, multilevel models, Bayesian statistics, and generalized additive models. Each article analyzes a common example across articles and presents syntax or macros for how to do them. These articles are followed by commentaries from single-case design researchers and journal editors. This introduction briefly describes each article and then discusses several issues that must be addressed before we can know what analyses will eventually be best to use in SCD research. These issues include modeling trend, modeling error covariances, computing standardized effect size estimates, assessing statistical power, incorporating more accurate models of outcome distributions, exploring whether Bayesian statistics can improve estimation given the small samples common in SCDs, and the need for annotated syntax and graphical user interfaces that make complex statistics accessible to SCD researchers. The article then discusses reasons why SCD researchers are likely to incorporate statistical analyses into their research more often in the future, including changing expectations and contingencies regarding SCD research from outside SCD communities, changes and diversity within SCD communities, corrections of erroneous beliefs about the relationship between SCD research and statistics, and demonstrations of how statistics can help SCD researchers better meet their goals. Copyright © 2013 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.

  17. Integrated flight/propulsion control system design based on a decentralized, hierarchical approach

    NASA Technical Reports Server (NTRS)

    Mattern, Duane; Garg, Sanjay; Bullard, Randy

    1989-01-01

    A sample integrated flight/propulsion control system design is presented for the piloted longitudinal landing task with a modern, statistically unstable fighter aircraft. The design procedure is summarized. The vehicle model used in the sample study is described, and the procedure for partitioning the integrated system is presented along with a description of the subsystems. The high-level airframe performance specifications and control design are presented and the control performance is evaluated. The generation of the low-level (engine) subsystem specifications from the airframe requirements are discussed, and the engine performance specifications are presented along with the subsystem control design. A compensator to accommodate the influence of airframe outputs on the engine subsystem is also considered. Finally, the entire closed loop system performance and stability characteristics are examined.

  18. Integrated flight/propulsion control system design based on a decentralized, hierarchical approach

    NASA Technical Reports Server (NTRS)

    Mattern, Duane; Garg, Sanjay; Bullard, Randy

    1989-01-01

    A sample integrated flight/propulsion control system design is presented for the piloted longitiudinal landing task with a modern, statistically unstable fighter aircraft. The design procedure is summarized, the vehicle model used in the sample study is described, and the procedure for partitioning the integrated system is presented along with a description of the subsystems. The high-level airframe performance specifications and control design are presented and the control performance is evaluated. The generation of the low-level (engine) subsystem specifications from the airframe requirements are discussed, and the engine performance specifications are presented along with the subsystem control design. A compensator to accommodate the influence of airframe outputs on the engine subsystem is also considered. Finally, the entire closed loop system performance and stability characteristics are examined.

  19. Design of point-of-care (POC) microfluidic medical diagnostic devices

    NASA Astrophysics Data System (ADS)

    Leary, James F.

    2018-02-01

    Design of inexpensive and portable hand-held microfluidic flow/image cytometry devices for initial medical diagnostics at the point of initial patient contact by emergency medical personnel in the field requires careful design in terms of power/weight requirements to allow for realistic portability as a hand-held, point-of-care medical diagnostics device. True portability also requires small micro-pumps for high-throughput capability. Weight/power requirements dictate use of super-bright LEDs and very small silicon photodiodes or nanophotonic sensors that can be powered by batteries. Signal-to-noise characteristics can be greatly improved by appropriately pulsing the LED excitation sources and sampling and subtracting noise in between excitation pulses. The requirements for basic computing, imaging, GPS and basic telecommunications can be simultaneously met by use of smartphone technologies, which become part of the overall device. Software for a user-interface system, limited real-time computing, real-time imaging, and offline data analysis can be accomplished through multi-platform software development systems that are well-suited to a variety of currently available cellphone technologies which already contain all of these capabilities. Microfluidic cytometry requires judicious use of small sample volumes and appropriate statistical sampling by microfluidic cytometry or imaging for adequate statistical significance to permit real-time (typically < 15 minutes) medical decisions for patients at the physician's office or real-time decision making in the field. One or two drops of blood obtained by pin-prick should be able to provide statistically meaningful results for use in making real-time medical decisions without the need for blood fractionation, which is not realistic in the field.

  20. Design-based stereology: Planning, volumetry and sampling are crucial steps for a successful study.

    PubMed

    Tschanz, Stefan; Schneider, Jan Philipp; Knudsen, Lars

    2014-01-01

    Quantitative data obtained by means of design-based stereology can add valuable information to studies performed on a diversity of organs, in particular when correlated to functional/physiological and biochemical data. Design-based stereology is based on a sound statistical background and can be used to generate accurate data which are in line with principles of good laboratory practice. In addition, by adjusting the study design an appropriate precision can be achieved to find relevant differences between groups. For the success of the stereological assessment detailed planning is necessary. In this review we focus on common pitfalls encountered during stereological assessment. An exemplary workflow is included, and based on authentic examples, we illustrate a number of sampling principles which can be implemented to obtain properly sampled tissue blocks for various purposes. Copyright © 2013 Elsevier GmbH. All rights reserved.

  1. Data-Division-Specific Robustness and Power of Randomization Tests for ABAB Designs

    ERIC Educational Resources Information Center

    Manolov, Rumen; Solanas, Antonio; Bulte, Isis; Onghena, Patrick

    2010-01-01

    This study deals with the statistical properties of a randomization test applied to an ABAB design in cases where the desirable random assignment of the points of change in phase is not possible. To obtain information about each possible data division, the authors carried out a conditional Monte Carlo simulation with 100,000 samples for each…

  2. 40 CFR Appendix 6 to Subpart A of... - Reverse Phase Extraction (RPE) Method for Detection of Oil Contamination in Non-Aqueous Drilling...

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... rigorous statistical experimental design and interpretation (Reference 16.4). 14.0Pollution Prevention 14... fluids. 1.4This method has been designed to show positive contamination for 5% of representative crude....1Sample collection bottles/jars—New, pre-cleaned bottles/jars, lot-certified to be free of artifacts...

  3. 40 CFR Appendix 6 to Subpart A of... - Reverse Phase Extraction (RPE) Method for Detection of Oil Contamination in Non-Aqueous Drilling...

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... rigorous statistical experimental design and interpretation (Reference 16.4). 14.0Pollution Prevention 14... fluids. 1.4This method has been designed to show positive contamination for 5% of representative crude....1Sample collection bottles/jars—New, pre-cleaned bottles/jars, lot-certified to be free of artifacts...

  4. 40 CFR Appendix 6 to Subpart A of... - Reverse Phase Extraction (RPE) Method for Detection of Oil Contamination in Non-Aqueous Drilling...

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... rigorous statistical experimental design and interpretation (Reference 16.4). 14.0Pollution Prevention 14... fluids. 1.4This method has been designed to show positive contamination for 5% of representative crude....1Sample collection bottles/jars—New, pre-cleaned bottles/jars, lot-certified to be free of artifacts...

  5. 40 CFR Appendix 6 to Subpart A of... - Reverse Phase Extraction (RPE) Method for Detection of Oil Contamination in Non-Aqueous Drilling...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... rigorous statistical experimental design and interpretation (Reference 16.4). 14.0Pollution Prevention 14... oil contamination in drilling fluids. 1.4This method has been designed to show positive contamination....1Sample collection bottles/jars—New, pre-cleaned bottles/jars, lot-certified to be free of artifacts...

  6. EPA-ORD MEASUREMENT SCIENCE SUPPORT FOR HOMELAND SECURITY

    EPA Science Inventory

    This presentation will describe the organization and the research and development activities of the ORD National Exposure Measurements Center and will focus on the Center's planned role in providing analytical method development, statistical sampling and design guidance, quality ...

  7. Statistical Assessment of a Paired-site Approach for Verification of Carbon and Nitrogen Sequestration on CRP Land

    NASA Astrophysics Data System (ADS)

    Kucharik, C.; Roth, J.

    2002-12-01

    The threat of global climate change has provoked policy-makers to consider plausible strategies to slow the accumulation of greenhouse gases, especially carbon dioxide, in the atmosphere. One such idea involves the sequestration of atmospheric carbon (C) in degraded agricultural soils as part of the Conservation Reserve Program (CRP). While the potential for significant C sequestration in CRP grassland ecosystems has been demonstrated, the paired-site sampling approach traditionally used to quantify soil C changes has not been evaluated with robust statistical analysis. In this study, 14 paired CRP (> 8 years old) and cropland sites in Dane County, Wisconsin (WI) were used to assess whether a paired-site sampling design could detect statistically significant differences (ANOVA) in mean soil organic C and total nitrogen (N) storage. We compared surface (0 to 10 cm) bulk density, and sampled soils (0 to 5, 5 to 10, and 10 to 25 cm) for textural differences and chemical analysis of organic matter (OM), soil organic C (SOC), total N, and pH. The CRP contributed to lowering soil bulk density by 13% (p < 0.0001) and increased SOC and OM storage (kg m-2) by 13 to 17% in the 0 to 5 cm layer (p = 0.1). We tested the statistical power associated with ANOVA for measured soil properties, and calculated minimum detectable differences (MDD). We concluded that 40 to 65 paired sites and soil sampling in 5 cm increments near the surface were needed to achieve an 80% confidence level (α = 0.05; β = 0.20) in soil C and N sequestration rates. Because soil C and total N storage was highly variable among these sites (CVs > 20%), only a 23 to 29% change in existing total organic C and N pools could be reliably detected. While C and N sequestration (247 kg C ha{-1 } yr-1 and 17 kg N ha-1 yr-1) may be occurring and confined to the surface 5 cm as part of the WI CRP, our sampling design did not statistically support the desired 80% confidence level. We conclude that usage of statistical power analysis is essential to insure a high level of confidence in soil C and N sequestration rates that are quantified using paired plots.

  8. Evaluation of Primary Immunization Coverage of Infants Under Universal Immunization Programme in an Urban Area of Bangalore City Using Cluster Sampling and Lot Quality Assurance Sampling Techniques

    PubMed Central

    K, Punith; K, Lalitha; G, Suman; BS, Pradeep; Kumar K, Jayanth

    2008-01-01

    Research Question: Is LQAS technique better than cluster sampling technique in terms of resources to evaluate the immunization coverage in an urban area? Objective: To assess and compare the lot quality assurance sampling against cluster sampling in the evaluation of primary immunization coverage. Study Design: Population-based cross-sectional study. Study Setting: Areas under Mathikere Urban Health Center. Study Subjects: Children aged 12 months to 23 months. Sample Size: 220 in cluster sampling, 76 in lot quality assurance sampling. Statistical Analysis: Percentages and Proportions, Chi square Test. Results: (1) Using cluster sampling, the percentage of completely immunized, partially immunized and unimmunized children were 84.09%, 14.09% and 1.82%, respectively. With lot quality assurance sampling, it was 92.11%, 6.58% and 1.31%, respectively. (2) Immunization coverage levels as evaluated by cluster sampling technique were not statistically different from the coverage value as obtained by lot quality assurance sampling techniques. Considering the time and resources required, it was found that lot quality assurance sampling is a better technique in evaluating the primary immunization coverage in urban area. PMID:19876474

  9. A Non-Intrusive Algorithm for Sensitivity Analysis of Chaotic Flow Simulations

    NASA Technical Reports Server (NTRS)

    Blonigan, Patrick J.; Wang, Qiqi; Nielsen, Eric J.; Diskin, Boris

    2017-01-01

    We demonstrate a novel algorithm for computing the sensitivity of statistics in chaotic flow simulations to parameter perturbations. The algorithm is non-intrusive but requires exposing an interface. Based on the principle of shadowing in dynamical systems, this algorithm is designed to reduce the effect of the sampling error in computing sensitivity of statistics in chaotic simulations. We compare the effectiveness of this method to that of the conventional finite difference method.

  10. Factors That Explain the Attitude towards Statistics in High-School Students: Empirical Evidence at Technological Study Center of the Sea in Veracruz, Mexico

    ERIC Educational Resources Information Center

    Rojas-Kramer, Carlos; Limón-Suárez, Enrique; Moreno-García, Elena; García-Santillán, Arturo

    2018-01-01

    The aim of this paper was to analyze attitude towards statistics in high-school students using the SATS scale designed by Auzmendi (1992). The sample was 200 students from the sixth semester of the afternoon shift, who were enrolled in technical careers from the Technological Study Center of the Sea (Centro de Estudios Tecnológicos del Mar 07…

  11. Assessing signal-to-noise in quantitative proteomics: multivariate statistical analysis in DIGE experiments.

    PubMed

    Friedman, David B

    2012-01-01

    All quantitative proteomics experiments measure variation between samples. When performing large-scale experiments that involve multiple conditions or treatments, the experimental design should include the appropriate number of individual biological replicates from each condition to enable the distinction between a relevant biological signal from technical noise. Multivariate statistical analyses, such as principal component analysis (PCA), provide a global perspective on experimental variation, thereby enabling the assessment of whether the variation describes the expected biological signal or the unanticipated technical/biological noise inherent in the system. Examples will be shown from high-resolution multivariable DIGE experiments where PCA was instrumental in demonstrating biologically significant variation as well as sample outliers, fouled samples, and overriding technical variation that would not be readily observed using standard univariate tests.

  12. Design of Neural Networks for Fast Convergence and Accuracy

    NASA Technical Reports Server (NTRS)

    Maghami, Peiman G.; Sparks, Dean W., Jr.

    1998-01-01

    A novel procedure for the design and training of artificial neural networks, used for rapid and efficient controls and dynamics design and analysis for flexible space systems, has been developed. Artificial neural networks are employed to provide a means of evaluating the impact of design changes rapidly. Specifically, two-layer feedforward neural networks are designed to approximate the functional relationship between the component spacecraft design changes and measures of its performance. A training algorithm, based on statistical sampling theory, is presented, which guarantees that the trained networks provide a designer-specified degree of accuracy in mapping the functional relationship. Within each iteration of this statistical-based algorithm, a sequential design algorithm is used for the design and training of the feedforward network to provide rapid convergence to the network goals. Here, at each sequence a new network is trained to minimize the error of previous network. The design algorithm attempts to avoid the local minima phenomenon that hampers the traditional network training. A numerical example is performed on a spacecraft application in order to demonstrate the feasibility of the proposed approach.

  13. Factorial analysis of trihalomethanes formation in drinking water.

    PubMed

    Chowdhury, Shakhawat; Champagne, Pascale; McLellan, P James

    2010-06-01

    Disinfection of drinking water reduces pathogenic infection, but may pose risks to human health through the formation of disinfection byproducts. The effects of different factors on the formation of trihalomethanes were investigated using a statistically designed experimental program, and a predictive model for trihalomethanes formation was developed. Synthetic water samples with different factor levels were produced, and trihalomethanes concentrations were measured. A replicated fractional factorial design with center points was performed, and significant factors were identified through statistical analysis. A second-order trihalomethanes formation model was developed from 92 experiments, and the statistical adequacy was assessed through appropriate diagnostics. This model was validated using additional data from the Drinking Water Surveillance Program database and was applied to the Smiths Falls water supply system in Ontario, Canada. The model predictions were correlated strongly to the measured trihalomethanes, with correlations of 0.95 and 0.91, respectively. The resulting model can assist in analyzing risk-cost tradeoffs in the design and operation of water supply systems.

  14. Rank-based permutation approaches for non-parametric factorial designs.

    PubMed

    Umlauft, Maria; Konietschke, Frank; Pauly, Markus

    2017-11-01

    Inference methods for null hypotheses formulated in terms of distribution functions in general non-parametric factorial designs are studied. The methods can be applied to continuous, ordinal or even ordered categorical data in a unified way, and are based only on ranks. In this set-up Wald-type statistics and ANOVA-type statistics are the current state of the art. The first method is asymptotically exact but a rather liberal statistical testing procedure for small to moderate sample size, while the latter is only an approximation which does not possess the correct asymptotic α level under the null. To bridge these gaps, a novel permutation approach is proposed which can be seen as a flexible generalization of the Kruskal-Wallis test to all kinds of factorial designs with independent observations. It is proven that the permutation principle is asymptotically correct while keeping its finite exactness property when data are exchangeable. The results of extensive simulation studies foster these theoretical findings. A real data set exemplifies its applicability. © 2017 The British Psychological Society.

  15. A bibliometric analysis of statistical terms used in American Physical Therapy Association journals (2011-2012): evidence for educating physical therapists.

    PubMed

    Tilson, Julie K; Marshall, Katie; Tam, Jodi J; Fetters, Linda

    2016-04-22

    A primary barrier to the implementation of evidence based practice (EBP) in physical therapy is therapists' limited ability to understand and interpret statistics. Physical therapists demonstrate limited skills and report low self-efficacy for interpreting results of statistical procedures. While standards for physical therapist education include statistics, little empirical evidence is available to inform what should constitute such curricula. The purpose of this study was to conduct a census of the statistical terms and study designs used in physical therapy literature and to use the results to make recommendations for curricular development in physical therapist education. We conducted a bibliometric analysis of 14 peer-reviewed journals associated with the American Physical Therapy Association over 12 months (Oct 2011-Sept 2012). Trained raters recorded every statistical term appearing in identified systematic reviews, primary research reports, and case series and case reports. Investigator-reported study design was also recorded. Terms representing the same statistical test or concept were combined into a single, representative term. Cumulative percentage was used to identify the most common representative statistical terms. Common representative terms were organized into eight categories to inform curricular design. Of 485 articles reviewed, 391 met the inclusion criteria. These 391 articles used 532 different terms which were combined into 321 representative terms; 13.1 (sd = 8.0) terms per article. Eighty-one representative terms constituted 90% of all representative term occurrences. Of the remaining 240 representative terms, 105 (44%) were used in only one article. The most common study design was prospective cohort (32.5%). Physical therapy literature contains a large number of statistical terms and concepts for readers to navigate. However, in the year sampled, 81 representative terms accounted for 90% of all occurrences. These "common representative terms" can be used to inform curricula to promote physical therapists' skills, competency, and confidence in interpreting statistics in their professional literature. We make specific recommendations for curriculum development informed by our findings.

  16. Design and Field Procedures in the US National Comorbidity Survey Replication Adolescent Supplement (NCS-A)

    PubMed Central

    Kessler, Ronald C.; Avenevoli, Shelli; Costello, E. Jane; Green, Jennifer Greif; Gruber, Michael J.; Heeringa, Steven; Merikangas, Kathleen R.; Pennell, Beth-Ellen; Sampson, Nancy A.; Zaslavsky, Alan M.

    2009-01-01

    An overview is presented of the design and field procedures of the US National Comorbidity Survey Replication Adolescent Supplement (NCS-A), a US face-to-face household survey of the prevalence and correlates of DSM-IV mental disorders. The survey was based on a dual-frame design that included 904 adolescent residents of the households that participated in the US National Comorbidity Survey Replication (85.9% response rate) and 9,244 adolescent students selected from a nationally representative sample of 320 schools (74.7% response rate). After expositing the logic of dual-frame designs, comparisons are presented of sample and population distributions on Census socio-demographic variables and, in the school sample, school characteristics. These document only minor differences between the samples and the population. The results of statistical analysis of the bias-efficiency trade-off in weight trimming are then presented. These show that modest trimming meaningfully reduces mean squared error. Analysis of comparative sample efficiency shows that the household sample is more efficient than the school sample, leading to the household sample getting a higher weight relative to its size in the consolidated sample relative to the school sample. Taken together, these results show that the NCS-A is an efficient sample of the target population with good representativeness on a range of socio-demographic and geographic variables. PMID:19507169

  17. Commodity Movements on the Texas Highway System: Data Collection and Survey Results

    DOT National Transportation Integrated Search

    1991-11-01

    This report presents the survey procedures used and data collected in the : development of commodity flow statistics for movements over Texas Highways. : Response rates, sampling procedures, questionnaire design and the types of data : provided by th...

  18. Statistical design and analysis plan for an impact evaluation of an HIV treatment and prevention intervention for female sex workers in Zimbabwe: a study protocol for a cluster randomised controlled trial.

    PubMed

    Hargreaves, James R; Fearon, Elizabeth; Davey, Calum; Phillips, Andrew; Cambiano, Valentina; Cowan, Frances M

    2016-01-05

    Pragmatic cluster-randomised trials should seek to make unbiased estimates of effect and be reported according to CONSORT principles, and the study population should be representative of the target population. This is challenging when conducting trials amongst 'hidden' populations without a sample frame. We describe a pair-matched cluster-randomised trial of a combination HIV-prevention intervention to reduce the proportion of female sex workers (FSW) with a detectable HIV viral load in Zimbabwe, recruiting via respondent driven sampling (RDS). We will cross-sectionally survey approximately 200 FSW at baseline and at endline to characterise each of 14 sites. RDS is a variant of chain referral sampling and has been adapted to approximate random sampling. Primary analysis will use the 'RDS-2' method to estimate cluster summaries and will adapt Hayes and Moulton's '2-step' method to adjust effect estimates for individual-level confounders and further adjust for cluster baseline prevalence. We will adapt CONSORT to accommodate RDS. In the absence of observable refusal rates, we will compare the recruitment process between matched pairs. We will need to investigate whether cluster-specific recruitment or the intervention itself affects the accuracy of the RDS estimation process, potentially causing differential biases. To do this, we will calculate RDS-diagnostic statistics for each cluster at each time point and compare these statistics within matched pairs and time points. Sensitivity analyses will assess the impact of potential biases arising from assumptions made by the RDS-2 estimation. We are not aware of any other completed pragmatic cluster RCTs that are recruiting participants using RDS. Our statistical design and analysis approach seeks to transparently document participant recruitment and allow an assessment of the representativeness of the study to the target population, a key aspect of pragmatic trials. The challenges we have faced in the design of this trial are likely to be shared in other contexts aiming to serve the needs of legally and/or socially marginalised populations for which no sampling frame exists and especially when the social networks of participants are both the target of intervention and the means of recruitment. The trial was registered at Pan African Clinical Trials Registry (PACTR201312000722390) on 9 December 2013.

  19. A statistical approach to selecting and confirming validation targets in -omics experiments

    PubMed Central

    2012-01-01

    Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145

  20. How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?

    PubMed

    West, Brady T; Sakshaug, Joseph W; Aurelien, Guy Alain S

    2016-01-01

    Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data.

  1. How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?

    PubMed Central

    West, Brady T.; Sakshaug, Joseph W.; Aurelien, Guy Alain S.

    2016-01-01

    Secondary analyses of survey data collected from large probability samples of persons or establishments further scientific progress in many fields. The complex design features of these samples improve data collection efficiency, but also require analysts to account for these features when conducting analysis. Unfortunately, many secondary analysts from fields outside of statistics, biostatistics, and survey methodology do not have adequate training in this area, and as a result may apply incorrect statistical methods when analyzing these survey data sets. This in turn could lead to the publication of incorrect inferences based on the survey data that effectively negate the resources dedicated to these surveys. In this article, we build on the results of a preliminary meta-analysis of 100 peer-reviewed journal articles presenting analyses of data from a variety of national health surveys, which suggested that analytic errors may be extremely prevalent in these types of investigations. We first perform a meta-analysis of a stratified random sample of 145 additional research products analyzing survey data from the Scientists and Engineers Statistical Data System (SESTAT), which describes features of the U.S. Science and Engineering workforce, and examine trends in the prevalence of analytic error across the decades used to stratify the sample. We once again find that analytic errors appear to be quite prevalent in these studies. Next, we present several example analyses of real SESTAT data, and demonstrate that a failure to perform these analyses correctly can result in substantially biased estimates with standard errors that do not adequately reflect complex sample design features. Collectively, the results of this investigation suggest that reviewers of this type of research need to pay much closer attention to the analytic methods employed by researchers attempting to publish or present secondary analyses of survey data. PMID:27355817

  2. Statistical generation of training sets for measuring NO3(-), NH4(+) and major ions in natural waters using an ion selective electrode array.

    PubMed

    Mueller, Amy V; Hemond, Harold F

    2016-05-18

    Knowledge of ionic concentrations in natural waters is essential to understand watershed processes. Inorganic nitrogen, in the form of nitrate and ammonium ions, is a key nutrient as well as a participant in redox, acid-base, and photochemical processes of natural waters, leading to spatiotemporal patterns of ion concentrations at scales as small as meters or hours. Current options for measurement in situ are costly, relying primarily on instruments adapted from laboratory methods (e.g., colorimetric, UV absorption); free-standing and inexpensive ISE sensors for NO3(-) and NH4(+) could be attractive alternatives if interferences from other constituents were overcome. Multi-sensor arrays, coupled with appropriate non-linear signal processing, offer promise in this capacity but have not yet successfully achieved signal separation for NO3(-) and NH4(+)in situ at naturally occurring levels in unprocessed water samples. A novel signal processor, underpinned by an appropriate sensor array, is proposed that overcomes previous limitations by explicitly integrating basic chemical constraints (e.g., charge balance). This work further presents a rationalized process for the development of such in situ instrumentation for NO3(-) and NH4(+), including a statistical-modeling strategy for instrument design, training/calibration, and validation. Statistical analysis reveals that historical concentrations of major ionic constituents in natural waters across New England strongly covary and are multi-modal. This informs the design of a statistically appropriate training set, suggesting that the strong covariance of constituents across environmental samples can be exploited through appropriate signal processing mechanisms to further improve estimates of minor constituents. Two artificial neural network architectures, one expanded to incorporate knowledge of basic chemical constraints, were tested to process outputs of a multi-sensor array, trained using datasets of varying degrees of statistical representativeness to natural water samples. The accuracy of ANN results improves monotonically with the statistical representativeness of the training set (error decreases by ∼5×), while the expanded neural network architecture contributes a further factor of 2-3.5 decrease in error when trained with the most representative sample set. Results using the most statistically accurate set of training samples (which retain environmentally relevant ion concentrations but avoid the potential interference of humic acids) demonstrated accurate, unbiased quantification of nitrate and ammonium at natural environmental levels (±20% down to <10 μM), as well as the major ions Na(+), K(+), Ca(2+), Mg(2+), Cl(-), and SO4(2-), in unprocessed samples. These results show promise for the development of new in situ instrumentation for the support of scientific field work.

  3. Design of partially supervised classifiers for multispectral image data

    NASA Technical Reports Server (NTRS)

    Jeon, Byeungwoo; Landgrebe, David

    1993-01-01

    A partially supervised classification problem is addressed, especially when the class definition and corresponding training samples are provided a priori only for just one particular class. In practical applications of pattern classification techniques, a frequently observed characteristic is the heavy, often nearly impossible requirements on representative prior statistical class characteristics of all classes in a given data set. Considering the effort in both time and man-power required to have a well-defined, exhaustive list of classes with a corresponding representative set of training samples, this 'partially' supervised capability would be very desirable, assuming adequate classifier performance can be obtained. Two different classification algorithms are developed to achieve simplicity in classifier design by reducing the requirement of prior statistical information without sacrificing significant classifying capability. The first one is based on optimal significance testing, where the optimal acceptance probability is estimated directly from the data set. In the second approach, the partially supervised classification is considered as a problem of unsupervised clustering with initially one known cluster or class. A weighted unsupervised clustering procedure is developed to automatically define other classes and estimate their class statistics. The operational simplicity thus realized should make these partially supervised classification schemes very viable tools in pattern classification.

  4. Tips and Tricks for Successful Application of Statistical Methods to Biological Data.

    PubMed

    Schlenker, Evelyn

    2016-01-01

    This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (random clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increase the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.

  5. Are quantitative trait-dependent sampling designs cost-effective for analysis of rare and common variants?

    PubMed

    Yilmaz, Yildiz E; Bull, Shelley B

    2011-11-29

    Use of trait-dependent sampling designs in whole-genome association studies of sequence data can reduce total sequencing costs with modest losses of statistical efficiency. In a quantitative trait (QT) analysis of data from the Genetic Analysis Workshop 17 mini-exome for unrelated individuals in the Asian subpopulation, we investigate alternative designs that sequence only 50% of the entire cohort. In addition to a simple random sampling design, we consider extreme-phenotype designs that are of increasing interest in genetic association analysis of QTs, especially in studies concerned with the detection of rare genetic variants. We also evaluate a novel sampling design in which all individuals have a nonzero probability of being selected into the sample but in which individuals with extreme phenotypes have a proportionately larger probability. We take differential sampling of individuals with informative trait values into account by inverse probability weighting using standard survey methods which thus generalizes to the source population. In replicate 1 data, we applied the designs in association analysis of Q1 with both rare and common variants in the FLT1 gene, based on knowledge of the generating model. Using all 200 replicate data sets, we similarly analyzed Q1 and Q4 (which is known to be free of association with FLT1) to evaluate relative efficiency, type I error, and power. Simulation study results suggest that the QT-dependent selection designs generally yield greater than 50% relative efficiency compared to using the entire cohort, implying cost-effectiveness of 50% sample selection and worthwhile reduction of sequencing costs.

  6. GOST: A generic ordinal sequential trial design for a treatment trial in an emerging pandemic.

    PubMed

    Whitehead, John; Horby, Peter

    2017-03-01

    Conducting clinical trials to assess experimental treatments for potentially pandemic infectious diseases is challenging. Since many outbreaks of infectious diseases last only six to eight weeks, there is a need for trial designs that can be implemented rapidly in the face of uncertainty. Outbreaks are sudden and unpredictable and so it is essential that as much planning as possible takes place in advance. Statistical aspects of such trial designs should be evaluated and discussed in readiness for implementation. This paper proposes a generic ordinal sequential trial design (GOST) for a randomised clinical trial comparing an experimental treatment for an emerging infectious disease with standard care. The design is intended as an off-the-shelf, ready-to-use robust and flexible option. The primary endpoint is a categorisation of patient outcome according to an ordinal scale. A sequential approach is adopted, stopping as soon as it is clear that the experimental treatment has an advantage or that sufficient advantage is unlikely to be detected. The properties of the design are evaluated using large-sample theory and verified for moderate sized samples using simulation. The trial is powered to detect a generic clinically relevant difference: namely an odds ratio of 2 for better rather than worse outcomes. Total sample sizes (across both treatments) of between 150 and 300 patients prove to be adequate in many cases, but the precise value depends on both the magnitude of the treatment advantage and the nature of the ordinal scale. An advantage of the approach is that any erroneous assumptions made at the design stage about the proportion of patients falling into each outcome category have little effect on the error probabilities of the study, although they can lead to inaccurate forecasts of sample size. It is important and feasible to pre-determine many of the statistical aspects of an efficient trial design in advance of a disease outbreak. The design can then be tailored to the specific disease under study once its nature is better understood.

  7. Feasibility Study for Design of a Biocybernetic Communication System

    DTIC Science & Technology

    1975-08-01

    electrode for the Within Words variance and Between Words variance for each of the 255 data samples in the 6-sec epoch. If a given sample point was not...contributing to the computer classification of the word, the ratio of the two variances (i.e., the F-statistic) should be small. On the other hand...if the Between Word variance was signifi- cantly higher than the Within Word variance for a given sample point, we can assume with some confidence

  8. Modified Toxicity Probability Interval Design: A Safer and More Reliable Method Than the 3 + 3 Design for Practical Phase I Trials

    PubMed Central

    Ji, Yuan; Wang, Sue-Jane

    2013-01-01

    The 3 + 3 design is the most common choice among clinicians for phase I dose-escalation oncology trials. In recent reviews, more than 95% of phase I trials have been based on the 3 + 3 design. Given that it is intuitive and its implementation does not require a computer program, clinicians can conduct 3 + 3 dose escalations in practice with virtually no logistic cost, and trial protocols based on the 3 + 3 design pass institutional review board and biostatistics reviews quickly. However, the performance of the 3 + 3 design has rarely been compared with model-based designs in simulation studies with matched sample sizes. In the vast majority of statistical literature, the 3 + 3 design has been shown to be inferior in identifying true maximum-tolerated doses (MTDs), although the sample size required by the 3 + 3 design is often orders-of-magnitude smaller than model-based designs. In this article, through comparative simulation studies with matched sample sizes, we demonstrate that the 3 + 3 design has higher risks of exposing patients to toxic doses above the MTD than the modified toxicity probability interval (mTPI) design, a newly developed adaptive method. In addition, compared with the mTPI design, the 3 + 3 design does not yield higher probabilities in identifying the correct MTD, even when the sample size is matched. Given that the mTPI design is equally transparent, costless to implement with free software, and more flexible in practical situations, we highly encourage its adoption in early dose-escalation studies whenever the 3 + 3 design is also considered. We provide free software to allow direct comparisons of the 3 + 3 design with other model-based designs in simulation studies with matched sample sizes. PMID:23569307

  9. Implications of clinical trial design on sample size requirements.

    PubMed

    Leon, Andrew C

    2008-07-01

    The primary goal in designing a randomized controlled clinical trial (RCT) is to minimize bias in the estimate of treatment effect. Randomized group assignment, double-blinded assessments, and control or comparison groups reduce the risk of bias. The design must also provide sufficient statistical power to detect a clinically meaningful treatment effect and maintain a nominal level of type I error. An attempt to integrate neurocognitive science into an RCT poses additional challenges. Two particularly relevant aspects of such a design often receive insufficient attention in an RCT. Multiple outcomes inflate type I error, and an unreliable assessment process introduces bias and reduces statistical power. Here we describe how both unreliability and multiple outcomes can increase the study costs and duration and reduce the feasibility of the study. The objective of this article is to consider strategies that overcome the problems of unreliability and multiplicity.

  10. SRS 2010 Vegetation Inventory GeoStatistical Mapping Results for Custom Reaction Intensity and Total Dead Fuels.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Edwards, Lloyd A.; Paresol, Bernard

    This report of the geostatistical analysis results of the fire fuels response variables, custom reaction intensity and total dead fuels is but a part of an SRS 2010 vegetation inventory project. For detailed description of project, theory and background including sample design, methods, and results please refer to USDA Forest Service Savannah River Site internal report “SRS 2010 Vegetation Inventory GeoStatistical Mapping Report”, (Edwards & Parresol 2013).

  11. DESIGNA ND ANALYSIS FOR THEMATIC MAP ACCURACY ASSESSMENT: FUNDAMENTAL PRINCIPLES

    EPA Science Inventory

    Before being used in scientific investigations and policy decisions, thematic maps constructed from remotely sensed data should be subjected to a statistically rigorous accuracy assessment. The three basic components of an accuracy assessment are: 1) the sampling design used to s...

  12. 39 CFR 3001.31 - Evidence.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... methods employed in statistical compilations. The principal title of each exhibit should state what it... furnished: (i) Market research. (a) The following data and information shall be provided: (1) A clear and detailed description of the sample, observational, and data preparation designs, including definitions of...

  13. 39 CFR 3001.31 - Evidence.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... item of information used and the methods employed in statistical compilations. The principal title of... furnished: (i) Market research. (a) The following data and information shall be provided: (1) A clear and detailed description of the sample, observational, and data preparation designs, including definitions of...

  14. 39 CFR 3001.31 - Evidence.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... item of information used and the methods employed in statistical compilations. The principal title of... should be furnished: (i) Market research. (a) The following data and information shall be provided: (1) A clear and detailed description of the sample, observational, and data preparation designs, including...

  15. 39 CFR 3001.31 - Evidence.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... item of information used and the methods employed in statistical compilations. The principal title of... should be furnished: (i) Market research. (a) The following data and information shall be provided: (1) A clear and detailed description of the sample, observational, and data preparation designs, including...

  16. 39 CFR 3001.31 - Evidence.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... item of information used and the methods employed in statistical compilations. The principal title of... should be furnished: (i) Market research. (a) The following data and information shall be provided: (1) A clear and detailed description of the sample, observational, and data preparation designs, including...

  17. Nurse Practitioners, Certified Nurse Midwives, and Physician Assistants in Physician Offices

    MedlinePlus

    ... on Vital and Health Statistics Annual Reports Health Survey Research Methods Conference Reports from the National Medical Care ... each sample visit that takes all stages of design into account. The survey data are inflated or weighted to produce unbiased ...

  18. 77 FR 3241 - Notice of Proposed Information Collection Requests

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-01-23

    ... particular content of the survey, describing the sample design, the timeline for the survey activities, and..., school districts, schools, postsecondary institutions, and libraries. Surveys of teachers, students... Education Statistics Quick Response Information System (QRIS) consists of the Fast Response Survey System...

  19. Robust matching for voice recognition

    NASA Astrophysics Data System (ADS)

    Higgins, Alan; Bahler, L.; Porter, J.; Blais, P.

    1994-10-01

    This paper describes an automated method of comparing a voice sample of an unknown individual with samples from known speakers in order to establish or verify the individual's identity. The method is based on a statistical pattern matching approach that employs a simple training procedure, requires no human intervention (transcription, work or phonetic marketing, etc.), and makes no assumptions regarding the expected form of the statistical distributions of the observations. The content of the speech material (vocabulary, grammar, etc.) is not assumed to be constrained in any way. An algorithm is described which incorporates frame pruning and channel equalization processes designed to achieve robust performance with reasonable computational resources. An experimental implementation demonstrating the feasibility of the concept is described.

  20. On the repeated measures designs and sample sizes for randomized controlled trials.

    PubMed

    Tango, Toshiro

    2016-04-01

    For the analysis of longitudinal or repeated measures data, generalized linear mixed-effects models provide a flexible and powerful tool to deal with heterogeneity among subject response profiles. However, the typical statistical design adopted in usual randomized controlled trials is an analysis of covariance type analysis using a pre-defined pair of "pre-post" data, in which pre-(baseline) data are used as a covariate for adjustment together with other covariates. Then, the major design issue is to calculate the sample size or the number of subjects allocated to each treatment group. In this paper, we propose a new repeated measures design and sample size calculations combined with generalized linear mixed-effects models that depend not only on the number of subjects but on the number of repeated measures before and after randomization per subject used for the analysis. The main advantages of the proposed design combined with the generalized linear mixed-effects models are (1) it can easily handle missing data by applying the likelihood-based ignorable analyses under the missing at random assumption and (2) it may lead to a reduction in sample size, compared with the simple pre-post design. The proposed designs and the sample size calculations are illustrated with real data arising from randomized controlled trials. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  1. Inference with viral quasispecies diversity indices: clonal and NGS approaches.

    PubMed

    Gregori, Josep; Salicrú, Miquel; Domingo, Esteban; Sanchez, Alex; Esteban, Juan I; Rodríguez-Frías, Francisco; Quer, Josep

    2014-04-15

    Given the inherent dynamics of a viral quasispecies, we are often interested in the comparison of diversity indices of sequential samples of a patient, or in the comparison of diversity indices of virus in groups of patients in a treated versus control design. It is then important to make sure that the diversity measures from each sample may be compared with no bias and within a consistent statistical framework. In the present report, we review some indices often used as measures for viral quasispecies complexity and provide means for statistical inference, applying procedures taken from the ecology field. In particular, we examine the Shannon entropy and the mutation frequency, and we discuss the appropriateness of different normalization methods of the Shannon entropy found in the literature. By taking amplicons ultra-deep pyrosequencing (UDPS) raw data as a surrogate of a real hepatitis C virus viral population, we study through in-silico sampling the statistical properties of these indices under two methods of viral quasispecies sampling, classical cloning followed by Sanger sequencing (CCSS) and next-generation sequencing (NGS) such as UDPS. We propose solutions specific to each of the two sampling methods-CCSS and NGS-to guarantee statistically conforming conclusions as free of bias as possible. josep.gregori@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. Dynamic principle for ensemble control tools.

    PubMed

    Samoletov, A; Vasiev, B

    2017-11-28

    Dynamical equations describing physical systems in contact with a thermal bath are commonly extended by mathematical tools called "thermostats." These tools are designed for sampling ensembles in statistical mechanics. Here we propose a dynamic principle underlying a range of thermostats which is derived using fundamental laws of statistical physics and ensures invariance of the canonical measure. The principle covers both stochastic and deterministic thermostat schemes. Our method has a clear advantage over a range of proposed and widely used thermostat schemes that are based on formal mathematical reasoning. Following the derivation of the proposed principle, we show its generality and illustrate its applications including design of temperature control tools that differ from the Nosé-Hoover-Langevin scheme.

  3. Evaluation of statistical designs in phase I expansion cohorts: the Dana-Farber/Harvard Cancer Center experience.

    PubMed

    Dahlberg, Suzanne E; Shapiro, Geoffrey I; Clark, Jeffrey W; Johnson, Bruce E

    2014-07-01

    Phase I trials have traditionally been designed to assess toxicity and establish phase II doses with dose-finding studies and expansion cohorts but are frequently exceeding the traditional sample size to further assess endpoints in specific patient subsets. The scientific objectives of phase I expansion cohorts and their evolving role in the current era of targeted therapies have yet to be systematically examined. Adult therapeutic phase I trials opened within Dana-Farber/Harvard Cancer Center (DF/HCC) from 1988 to 2012 were identified for sample size details. Statistical designs and study objectives of those submitted in 2011 were reviewed for expansion cohort details. Five hundred twenty-two adult therapeutic phase I trials were identified during the 25 years. The average sample size of a phase I study has increased from 33.8 patients to 73.1 patients over that time. The proportion of trials with planned enrollment of 50 or fewer patients dropped from 93.0% during the time period 1988 to 1992 to 46.0% between 2008 and 2012; at the same time, the proportion of trials enrolling 51 to 100 patients and more than 100 patients increased from 5.3% and 1.8%, respectively, to 40.5% and 13.5% (χ(2) test, two-sided P < .001). Sixteen of the 60 trials (26.7%) in 2011 enrolled patients to three or more sub-cohorts in the expansion phase. Sixty percent of studies provided no statistical justification of the sample size, although 91.7% of trials stated response as an objective. Our data suggest that phase I studies have dramatically changed in size and scientific scope within the last decade. Additional studies addressing the implications of this trend on research processes, ethical concerns, and resource burden are needed. © The Author 2014. Published by Oxford University Press. All rights reserved.

  4. Urban Land Cover Mapping Accuracy Assessment - A Cost-benefit Analysis Approach

    NASA Astrophysics Data System (ADS)

    Xiao, T.

    2012-12-01

    One of the most important components in urban land cover mapping is mapping accuracy assessment. Many statistical models have been developed to help design simple schemes based on both accuracy and confidence levels. It is intuitive that an increased number of samples increases the accuracy as well as the cost of an assessment. Understanding cost and sampling size is crucial in implementing efficient and effective of field data collection. Few studies have included a cost calculation component as part of the assessment. In this study, a cost-benefit sampling analysis model was created by combining sample size design and sampling cost calculation. The sampling cost included transportation cost, field data collection cost, and laboratory data analysis cost. Simple Random Sampling (SRS) and Modified Systematic Sampling (MSS) methods were used to design sample locations and to extract land cover data in ArcGIS. High resolution land cover data layers of Denver, CO and Sacramento, CA, street networks, and parcel GIS data layers were used in this study to test and verify the model. The relationship between the cost and accuracy was used to determine the effectiveness of each sample method. The results of this study can be applied to other environmental studies that require spatial sampling.

  5. Utilizing the Zero-One Linear Programming Constraints to Draw Multiple Sets of Matched Samples from a Non-Treatment Population as Control Groups for the Quasi-Experimental Design

    ERIC Educational Resources Information Center

    Li, Yuan H.; Yang, Yu N.; Tompkins, Leroy J.; Modarresi, Shahpar

    2005-01-01

    The statistical technique, "Zero-One Linear Programming," that has successfully been used to create multiple tests with similar characteristics (e.g., item difficulties, test information and test specifications) in the area of educational measurement, was deemed to be a suitable method for creating multiple sets of matched samples to be…

  6. Assessment of Validity, Reliability and Difficulty Indices for Teacher-Built Physics Exam Questions in First Year High School

    ERIC Educational Resources Information Center

    Jandaghi, Gholamreza

    2010-01-01

    The purpose of the research is to determine high school teachers' skill rate in designing exam questions in physics subject. The statistical population was all of physics exam shits for two semesters in one school year from which a sample of 364 exam shits was drawn using multistage cluster sampling. Two experts assessed the shits and by using…

  7. Using the Multiple-Matched-Sample and Statistical Controls to Examine the Effects of Magnet School Programs on the Reading and Mathematics Performance of Students

    ERIC Educational Resources Information Center

    Yang, Yu N.; Li, Yuan H.; Tompkins, Leroy J.; Modarresi, Shahpar

    2005-01-01

    This summative evaluation of magnet programs employed a quasi-experimental design to investigate whether or not students enrolled in magnet programs gained any achievement advantage over students who were not enrolled in a magnet program. Researchers used Zero-One Linear Programming to draw multiple sets of matched samples from the non-magnet…

  8. Statistical power comparisons at 3T and 7T with a GO / NOGO task.

    PubMed

    Torrisi, Salvatore; Chen, Gang; Glen, Daniel; Bandettini, Peter A; Baker, Chris I; Reynolds, Richard; Yen-Ting Liu, Jeffrey; Leshin, Joseph; Balderston, Nicholas; Grillon, Christian; Ernst, Monique

    2018-07-15

    The field of cognitive neuroscience is weighing evidence about whether to move from standard field strength to ultra-high field (UHF). The present study contributes to the evidence by comparing a cognitive neuroscience paradigm at 3 Tesla (3T) and 7 Tesla (7T). The goal was to test and demonstrate the practical effects of field strength on a standard GO/NOGO task using accessible preprocessing and analysis tools. Two independent matched healthy samples (N = 31 each) were analyzed at 3T and 7T. Results show gains at 7T in statistical strength, the detection of smaller effects and group-level power. With an increased availability of UHF scanners, these gains may be exploited by cognitive neuroscientists and other neuroimaging researchers to develop more efficient or comprehensive experimental designs and, given the same sample size, achieve greater statistical power at 7T. Published by Elsevier Inc.

  9. Statistical power for nonequivalent pretest-posttest designs. The impact of change-score versus ANCOVA models.

    PubMed

    Oakes, J M; Feldman, H A

    2001-02-01

    Nonequivalent controlled pretest-posttest designs are central to evaluation science, yet no practical and unified approach for estimating power in the two most widely used analytic approaches to these designs exists. This article fills the gap by presenting and comparing useful, unified power formulas for ANCOVA and change-score analyses, indicating the implications of each on sample-size requirements. The authors close with practical recommendations for evaluators. Mathematical details and a simple spreadsheet approach are included in appendices.

  10. A design methodology for nonlinear systems containing parameter uncertainty

    NASA Technical Reports Server (NTRS)

    Young, G. E.; Auslander, D. M.

    1983-01-01

    In the present design methodology for nonlinear systems containing parameter uncertainty, a generalized sensitivity analysis is incorporated which employs parameter space sampling and statistical inference. For the case of a system with j adjustable and k nonadjustable parameters, this methodology (which includes an adaptive random search strategy) is used to determine the combination of j adjustable parameter values which maximize the probability of those performance indices which simultaneously satisfy design criteria in spite of the uncertainty due to k nonadjustable parameters.

  11. Simulation methods to estimate design power: an overview for applied research.

    PubMed

    Arnold, Benjamin F; Hogan, Daniel R; Colford, John M; Hubbard, Alan E

    2011-06-20

    Estimating the required sample size and statistical power for a study is an integral part of study design. For standard designs, power equations provide an efficient solution to the problem, but they are unavailable for many complex study designs that arise in practice. For such complex study designs, computer simulation is a useful alternative for estimating study power. Although this approach is well known among statisticians, in our experience many epidemiologists and social scientists are unfamiliar with the technique. This article aims to address this knowledge gap. We review an approach to estimate study power for individual- or cluster-randomized designs using computer simulation. This flexible approach arises naturally from the model used to derive conventional power equations, but extends those methods to accommodate arbitrarily complex designs. The method is universally applicable to a broad range of designs and outcomes, and we present the material in a way that is approachable for quantitative, applied researchers. We illustrate the method using two examples (one simple, one complex) based on sanitation and nutritional interventions to improve child growth. We first show how simulation reproduces conventional power estimates for simple randomized designs over a broad range of sample scenarios to familiarize the reader with the approach. We then demonstrate how to extend the simulation approach to more complex designs. Finally, we discuss extensions to the examples in the article, and provide computer code to efficiently run the example simulations in both R and Stata. Simulation methods offer a flexible option to estimate statistical power for standard and non-traditional study designs and parameters of interest. The approach we have described is universally applicable for evaluating study designs used in epidemiologic and social science research.

  12. Simulation methods to estimate design power: an overview for applied research

    PubMed Central

    2011-01-01

    Background Estimating the required sample size and statistical power for a study is an integral part of study design. For standard designs, power equations provide an efficient solution to the problem, but they are unavailable for many complex study designs that arise in practice. For such complex study designs, computer simulation is a useful alternative for estimating study power. Although this approach is well known among statisticians, in our experience many epidemiologists and social scientists are unfamiliar with the technique. This article aims to address this knowledge gap. Methods We review an approach to estimate study power for individual- or cluster-randomized designs using computer simulation. This flexible approach arises naturally from the model used to derive conventional power equations, but extends those methods to accommodate arbitrarily complex designs. The method is universally applicable to a broad range of designs and outcomes, and we present the material in a way that is approachable for quantitative, applied researchers. We illustrate the method using two examples (one simple, one complex) based on sanitation and nutritional interventions to improve child growth. Results We first show how simulation reproduces conventional power estimates for simple randomized designs over a broad range of sample scenarios to familiarize the reader with the approach. We then demonstrate how to extend the simulation approach to more complex designs. Finally, we discuss extensions to the examples in the article, and provide computer code to efficiently run the example simulations in both R and Stata. Conclusions Simulation methods offer a flexible option to estimate statistical power for standard and non-traditional study designs and parameters of interest. The approach we have described is universally applicable for evaluating study designs used in epidemiologic and social science research. PMID:21689447

  13. Using and Evaluating Resampling Simulations in SPSS and Excel.

    ERIC Educational Resources Information Center

    Smith, Brad

    2003-01-01

    Describes and evaluates three computer-assisted simulations used with Statistical Package for the Social Sciences (SPSS) and Microsoft Excel. Designed the simulations to reinforce and enhance student understanding of sampling distributions, confidence intervals, and significance tests. Reports evaluations revealed improved student comprehension of…

  14. Statistics based sampling for controller and estimator design

    NASA Astrophysics Data System (ADS)

    Tenne, Dirk

    The purpose of this research is the development of statistical design tools for robust feed-forward/feedback controllers and nonlinear estimators. This dissertation is threefold and addresses the aforementioned topics nonlinear estimation, target tracking and robust control. To develop statistically robust controllers and nonlinear estimation algorithms, research has been performed to extend existing techniques, which propagate the statistics of the state, to achieve higher order accuracy. The so-called unscented transformation has been extended to capture higher order moments. Furthermore, higher order moment update algorithms based on a truncated power series have been developed. The proposed techniques are tested on various benchmark examples. Furthermore, the unscented transformation has been utilized to develop a three dimensional geometrically constrained target tracker. The proposed planar circular prediction algorithm has been developed in a local coordinate framework, which is amenable to extension of the tracking algorithm to three dimensional space. This tracker combines the predictions of a circular prediction algorithm and a constant velocity filter by utilizing the Covariance Intersection. This combined prediction can be updated with the subsequent measurement using a linear estimator. The proposed technique is illustrated on a 3D benchmark trajectory, which includes coordinated turns and straight line maneuvers. The third part of this dissertation addresses the design of controller which include knowledge of parametric uncertainties and their distributions. The parameter distributions are approximated by a finite set of points which are calculated by the unscented transformation. This set of points is used to design robust controllers which minimize a statistical performance of the plant over the domain of uncertainty consisting of a combination of the mean and variance. The proposed technique is illustrated on three benchmark problems. The first relates to the design of prefilters for a linear and nonlinear spring-mass-dashpot system and the second applies a feedback controller to a hovering helicopter. Lastly, the statistical robust controller design is devoted to a concurrent feed-forward/feedback controller structure for a high-speed low tension tape drive.

  15. Internal pilots for a class of linear mixed models with Gaussian and compound symmetric data

    PubMed Central

    Gurka, Matthew J.; Coffey, Christopher S.; Muller, Keith E.

    2015-01-01

    SUMMARY An internal pilot design uses interim sample size analysis, without interim data analysis, to adjust the final number of observations. The approach helps to choose a sample size sufficiently large (to achieve the statistical power desired), but not too large (which would waste money and time). We report on recent research in cerebral vascular tortuosity (curvature in three dimensions) which would benefit greatly from internal pilots due to uncertainty in the parameters of the covariance matrix used for study planning. Unfortunately, observations correlated across the four regions of the brain and small sample sizes preclude using existing methods. However, as in a wide range of medical imaging studies, tortuosity data have no missing or mistimed data, a factorial within-subject design, the same between-subject design for all responses, and a Gaussian distribution with compound symmetry. For such restricted models, we extend exact, small sample univariate methods for internal pilots to linear mixed models with any between-subject design (not just two groups). Planning a new tortuosity study illustrates how the new methods help to avoid sample sizes that are too small or too large while still controlling the type I error rate. PMID:17318914

  16. Guidelines for the design and statistical analysis of experiments in papers submitted to ATLA.

    PubMed

    Festing, M F

    2001-01-01

    In vitro experiments need to be well designed and correctly analysed if they are to achieve their full potential to replace the use of animals in research. An "experiment" is a procedure for collecting scientific data in order to answer a hypothesis, or to provide material for generating new hypotheses, and differs from a survey because the scientist has control over the treatments that can be applied. Most experiments can be classified into one of a few formal designs, the most common being completely randomised, and randomised block designs. These are quite common with in vitro experiments, which are often replicated in time. Some experiments involve a single independent (treatment) variable, while other "factorial" designs simultaneously vary two or more independent variables, such as drug treatment and cell line. Factorial designs often provide additional information at little extra cost. Experiments need to be carefully planned to avoid bias, be powerful yet simple, provide for a valid statistical analysis and, in some cases, have a wide range of applicability. Virtually all experiments need some sort of statistical analysis in order to take account of biological variation among the experimental subjects. Parametric methods using the t test or analysis of variance are usually more powerful than non-parametric methods, provided the underlying assumptions of normality of the residuals and equal variances are approximately valid. The statistical analyses of data from a completely randomised design, and from a randomised-block design are demonstrated in Appendices 1 and 2, and methods of determining sample size are discussed in Appendix 3. Appendix 4 gives a checklist for authors submitting papers to ATLA.

  17. Statistical distribution of mechanical properties for three graphite-epoxy material systems

    NASA Technical Reports Server (NTRS)

    Reese, C.; Sorem, J., Jr.

    1981-01-01

    Graphite-epoxy composites are playing an increasing role as viable alternative materials in structural applications necessitating thorough investigation into the predictability and reproducibility of their material strength properties. This investigation was concerned with tension, compression, and short beam shear coupon testing of large samples from three different material suppliers to determine their statistical strength behavior. Statistical results indicate that a two Parameter Weibull distribution model provides better overall characterization of material behavior for the graphite-epoxy systems tested than does the standard Normal distribution model that is employed for most design work. While either a Weibull or Normal distribution model provides adequate predictions for average strength values, the Weibull model provides better characterization in the lower tail region where the predictions are of maximum design interest. The two sets of the same material were found to have essentially the same material properties, and indicate that repeatability can be achieved.

  18. Landsat image and sample design for water reservoirs (Rapel dam Central Chile).

    PubMed

    Lavanderos, L; Pozo, M E; Pattillo, C; Miranda, H

    1990-01-01

    Spatial heterogeneity of the Rapel reservoir surface waters is analyzed through Landsat images. The image digital counts are used with the aim or developing an aprioristic quantitative sample design.Natural horizontal stratification of the Rapel Reservoir (Central Chile) is produced mainly by suspended solids. The spatial heterogeneity conditions of the reservoir for the Spring 86-Summer 87 period were determined by qualitative analysis and image processing of the MSS Landsat, bands 1 and 3. The space-time variations of the different observed strata obtained with multitemporal image analysis.A random stratified sample design (r.s.s.d) was developed, based on the digital counts statistical analysis. Strata population size as well as the average, variance and sampling size of the digital counts were obtained by the r.s.s.d method.Stratification determined by analysis of satellite images were later correlated with ground data. Though the stratification of the reservoir is constant over time, the shape and size of the strata varys.

  19. Sample size determination for estimating antibody seroconversion rate under stable malaria transmission intensity.

    PubMed

    Sepúlveda, Nuno; Drakeley, Chris

    2015-04-03

    In the last decade, several epidemiological studies have demonstrated the potential of using seroprevalence (SP) and seroconversion rate (SCR) as informative indicators of malaria burden in low transmission settings or in populations on the cusp of elimination. However, most of studies are designed to control ensuing statistical inference over parasite rates and not on these alternative malaria burden measures. SP is in essence a proportion and, thus, many methods exist for the respective sample size determination. In contrast, designing a study where SCR is the primary endpoint, is not an easy task because precision and statistical power are affected by the age distribution of a given population. Two sample size calculators for SCR estimation are proposed. The first one consists of transforming the confidence interval for SP into the corresponding one for SCR given a known seroreversion rate (SRR). The second calculator extends the previous one to the most common situation where SRR is unknown. In this situation, data simulation was used together with linear regression in order to study the expected relationship between sample size and precision. The performance of the first sample size calculator was studied in terms of the coverage of the confidence intervals for SCR. The results pointed out to eventual problems of under or over coverage for sample sizes ≤250 in very low and high malaria transmission settings (SCR ≤ 0.0036 and SCR ≥ 0.29, respectively). The correct coverage was obtained for the remaining transmission intensities with sample sizes ≥ 50. Sample size determination was then carried out for cross-sectional surveys using realistic SCRs from past sero-epidemiological studies and typical age distributions from African and non-African populations. For SCR < 0.058, African studies require a larger sample size than their non-African counterparts in order to obtain the same precision. The opposite happens for the remaining transmission intensities. With respect to the second sample size calculator, simulation unravelled the likelihood of not having enough information to estimate SRR in low transmission settings (SCR ≤ 0.0108). In that case, the respective estimates tend to underestimate the true SCR. This problem is minimized by sample sizes of no less than 500 individuals. The sample sizes determined by this second method highlighted the prior expectation that, when SRR is not known, sample sizes are increased in relation to the situation of a known SRR. In contrast to the first sample size calculation, African studies would now require lesser individuals than their counterparts conducted elsewhere, irrespective of the transmission intensity. Although the proposed sample size calculators can be instrumental to design future cross-sectional surveys, the choice of a particular sample size must be seen as a much broader exercise that involves weighting statistical precision with ethical issues, available human and economic resources, and possible time constraints. Moreover, if the sample size determination is carried out on varying transmission intensities, as done here, the respective sample sizes can also be used in studies comparing sites with different malaria transmission intensities. In conclusion, the proposed sample size calculators are a step towards the design of better sero-epidemiological studies. Their basic ideas show promise to be applied to the planning of alternative sampling schemes that may target or oversample specific age groups.

  20. Variance Estimation, Design Effects, and Sample Size Calculations for Respondent-Driven Sampling

    PubMed Central

    2006-01-01

    Hidden populations, such as injection drug users and sex workers, are central to a number of public health problems. However, because of the nature of these groups, it is difficult to collect accurate information about them, and this difficulty complicates disease prevention efforts. A recently developed statistical approach called respondent-driven sampling improves our ability to study hidden populations by allowing researchers to make unbiased estimates of the prevalence of certain traits in these populations. Yet, not enough is known about the sample-to-sample variability of these prevalence estimates. In this paper, we present a bootstrap method for constructing confidence intervals around respondent-driven sampling estimates and demonstrate in simulations that it outperforms the naive method currently in use. We also use simulations and real data to estimate the design effects for respondent-driven sampling in a number of situations. We conclude with practical advice about the power calculations that are needed to determine the appropriate sample size for a study using respondent-driven sampling. In general, we recommend a sample size twice as large as would be needed under simple random sampling. PMID:16937083

  1. Comparison of Sample Size by Bootstrap and by Formulas Based on Normal Distribution Assumption.

    PubMed

    Wang, Zuozhen

    2018-01-01

    Bootstrapping technique is distribution-independent, which provides an indirect way to estimate the sample size for a clinical trial based on a relatively smaller sample. In this paper, sample size estimation to compare two parallel-design arms for continuous data by bootstrap procedure are presented for various test types (inequality, non-inferiority, superiority, and equivalence), respectively. Meanwhile, sample size calculation by mathematical formulas (normal distribution assumption) for the identical data are also carried out. Consequently, power difference between the two calculation methods is acceptably small for all the test types. It shows that the bootstrap procedure is a credible technique for sample size estimation. After that, we compared the powers determined using the two methods based on data that violate the normal distribution assumption. To accommodate the feature of the data, the nonparametric statistical method of Wilcoxon test was applied to compare the two groups in the data during the process of bootstrap power estimation. As a result, the power estimated by normal distribution-based formula is far larger than that by bootstrap for each specific sample size per group. Hence, for this type of data, it is preferable that the bootstrap method be applied for sample size calculation at the beginning, and that the same statistical method as used in the subsequent statistical analysis is employed for each bootstrap sample during the course of bootstrap sample size estimation, provided there is historical true data available that can be well representative of the population to which the proposed trial is planning to extrapolate.

  2. Microwave-assisted extraction and mild saponification for determination of organochlorine pesticides in oyster samples.

    PubMed

    Carro, N; García, I; Ignacio, M-C; Llompart, M; Yebra, M-C; Mouteira, A

    2002-10-01

    A sample-preparation procedure (extraction and saponification) using microwave energy is proposed for determination of organochlorine pesticides in oyster samples. A Plackett-Burman factorial design has been used to optimize the microwave-assisted extraction and mild saponification on a freeze dried sample spiked with a mixture of aldrin, endrin, dieldrin, heptachlor, heptachorepoxide, isodrin, transnonachlor, p, p'-DDE, and p, p'-DDD. Six variables: solvent volume, extraction time, extraction temperature, amount of acetone (%) in the extractant solvent, amount of sample, and volume of NaOH solution were considered in the optimization process. The results show that the amount of sample is statistically significant for dieldrin, aldrin, p, p'-DDE, heptachlor, and transnonachlor and solvent volume for dieldrin, aldrin, and p, p'-DDE. The volume of NaOH solution is statistically significant for aldrin and p, p'-DDE only. Extraction temperature and extraction time seem to be the main factors determining the efficiency of extraction process for isodrin and p, p'-DDE, respectively. The optimized procedure was compared with conventional Soxhlet extraction.

  3. Biological effect of low-head sea lamprey barriers: Designs for extensive surveys and the value of incorporating intensive process-oriented research

    USGS Publications Warehouse

    Hayes, D.B.; Baylis, J.R.; Carl, L.M.; Dodd, H.R.; Goldstein, J.D.; McLaughlin, R.L.; Noakes, D.L.G.; Porto, L.M.

    2003-01-01

    Four sampling designs for quantifying the effect of low-head sea lamprey (Petromyzon marinus) barriers on fish communities were evaluated, and the contribution of process-oriented research to the overall confidence of results obtained was discussed. The designs include: (1) sample barrier streams post-construction; (2) sample barrier and reference streams post-construction; (3) sample barrier streams pre- and post-construction; and (4) sample barrier and reference streams pre- and post-construction. In the statistical literature, the principal basis for comparison of sampling designs is generally the precision achieved by each design. In addition to precision, designs should be compared based on the interpretability of results and on the scale to which the results apply. Using data collected in a broad survey of streams with and without sea lamprey barriers, some of the tradeoffs that occur among precision, scale, and interpretability are illustrated. Although circumstances such as funding and availability of pre-construction data may limit which design can be implemented, a pre/post-construction design including barrier and reference streams provides the most meaningful information for use in barrier management decisions. Where it is not feasible to obtain pre-construction data, a design including reference streams is important to maintain the interpretability of results. Regardless of the design used, process-oriented research provides a framework for interpreting results obtained in broad surveys. As such, information from both extensive surveys and intensive process-oriented research provides the best basis for fishery management actions, and gives researchers and managers the most confidence in the conclusions reached regarding the effects of sea lamprey barriers.

  4. Conceptual Model of Clinical Governance Information System for Statistical Indicators by Using UML in Two Sample Hospitals.

    PubMed

    Jeddi, Fatemeh Rangraz; Farzandipoor, Mehrdad; Arabfard, Masoud; Hosseini, Azam Haj Mohammad

    2014-04-01

    The purpose of this study was investigating situation and presenting a conceptual model for clinical governance information system by using UML in two sample hospitals. However, use of information is one of the fundamental components of clinical governance; but unfortunately, it does not pay much attention to information management. A cross sectional study was conducted in October 2012- May 2013. Data were gathered through questionnaires and interviews in two sample hospitals. Face and content validity of the questionnaire has been confirmed by experts. Data were collected from a pilot hospital and reforms were carried out and Final questionnaire was prepared. Data were analyzed by descriptive statistics and SPSS 16 software. With the scenario derived from questionnaires, UML diagrams are presented by using Rational Rose 7 software. The results showed that 32.14 percent Indicators of the hospitals were calculated. Database was not designed and 100 percent of the hospital's clinical governance was required to create a database. Clinical governance unit of hospitals to perform its mission, do not have access to all the needed indicators. Defining of Processes and drawing of models and creating of database are essential for designing of information systems.

  5. Conceptual Model of Clinical Governance Information System for Statistical Indicators by Using UML in Two Sample Hospitals.

    PubMed

    Jeddi, Fatemeh Rangraz; Farzandipoor, Mehrdad; Arabfard, Masoud; Hosseini, Azam Haj Mohammad

    2016-04-01

    The purpose of this study was investigating situation and presenting a conceptual model for clinical governance information system by using UML in two sample hospitals. However, use of information is one of the fundamental components of clinical governance; but unfortunately, it does not pay much attention to information management. A cross sectional study was conducted in October 2012- May 2013. Data were gathered through questionnaires and interviews in two sample hospitals. Face and content validity of the questionnaire has been confirmed by experts. Data were collected from a pilot hospital and reforms were carried out and Final questionnaire was prepared. Data were analyzed by descriptive statistics and SPSS 16 software. With the scenario derived from questionnaires, UML diagrams are presented by using Rational Rose 7 software. The results showed that 32.14 percent Indicators of the hospitals were calculated. Database was not designed and 100 percent of the hospital's clinical governance was required to create a database. Clinical governance unit of hospitals to perform its mission, do not have access to all the needed indicators. Defining of Processes and drawing of models and creating of database are essential for designing of information systems.

  6. Sample size considerations for paired experimental design with incomplete observations of continuous outcomes.

    PubMed

    Zhu, Hong; Xu, Xiaohan; Ahn, Chul

    2017-01-01

    Paired experimental design is widely used in clinical and health behavioral studies, where each study unit contributes a pair of observations. Investigators often encounter incomplete observations of paired outcomes in the data collected. Some study units contribute complete pairs of observations, while the others contribute either pre- or post-intervention observations. Statistical inference for paired experimental design with incomplete observations of continuous outcomes has been extensively studied in literature. However, sample size method for such study design is sparsely available. We derive a closed-form sample size formula based on the generalized estimating equation approach by treating the incomplete observations as missing data in a linear model. The proposed method properly accounts for the impact of mixed structure of observed data: a combination of paired and unpaired outcomes. The sample size formula is flexible to accommodate different missing patterns, magnitude of missingness, and correlation parameter values. We demonstrate that under complete observations, the proposed generalized estimating equation sample size estimate is the same as that based on the paired t-test. In the presence of missing data, the proposed method would lead to a more accurate sample size estimate comparing with the crude adjustment. Simulation studies are conducted to evaluate the finite-sample performance of the generalized estimating equation sample size formula. A real application example is presented for illustration.

  7. Criterion-Referenced Test Items for Welding.

    ERIC Educational Resources Information Center

    Davis, Diane, Ed.

    This test item bank on welding contains test questions based upon competencies found in the Missouri Welding Competency Profile. Some test items are keyed for multiple competencies. These criterion-referenced test items are designed to work with the Vocational Instructional Management System. Questions have been statistically sampled and validated…

  8. Using Screencast Videos to Enhance Undergraduate Students' Statistical Reasoning about Confidence Intervals

    ERIC Educational Resources Information Center

    Strazzeri, Kenneth Charles

    2013-01-01

    The purposes of this study were to investigate (a) undergraduate students' reasoning about the concepts of confidence intervals (b) undergraduate students' interactions with "well-designed" screencast videos on sampling distributions and confidence intervals, and (c) how screencast videos improve undergraduate students' reasoning ability…

  9. STATISTICAL ISSUES FOR MONITORING ECOLOGICAL AND NATURAL RESOURCES IN THE UNITED STATES

    EPA Science Inventory

    The United States funds a number of national monitoring programs to measure the status and trends of ecological and natural resources. Each of these programs has a unique focus: the scientific objectives are different as are the sample designs. However, individuals and committees...

  10. Statistical analyses to support guidelines for marine avian sampling. Final report

    USGS Publications Warehouse

    Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris

    2012-01-01

    Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger the mean (3x effect size) could be defined as a “hotspot,” and a location that is three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hot and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss. 3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical distribution to describe counts of a given species in a particular region and season. 4. Using a large database of historical at-sea seabird survey data, we applied this technique to identify appropriate statistical distributions for modeling a variety of species, allowing the distribution to vary by season. For each species and season, we used the selected distribution to calculate and map retrospective statistical power to detect hotspots and coldspots, and map pvalues from Monte Carlo significance tests of hotspots and coldspots, in discrete lease blocks designated by the U.S. Department of Interior, Bureau of Ocean Energy Management (BOEM). 5. Because our definition of hotspots and coldspots does not explicitly include variability over time, we examine the relationship between the temporal scale of sampling and the proportion of variance captured in time series of key environmental correlates of marine bird abundance, as well as available marine bird abundance time series, and use these analyses to develop recommendations for the temporal distribution of sampling to adequately represent both shortterm and long-term variability. We conclude by presenting a schematic “decision tree” showing how this power analysis approach would fit in a general framework for avian survey design, and discuss implications of model assumptions and results. We discuss avenues for future development of this work, and recommendations for practical implementation in the context of siting and wildlife assessment for offshore renewable energy development projects.

  11. Precision, Reliability, and Effect Size of Slope Variance in Latent Growth Curve Models: Implications for Statistical Power Analysis

    PubMed Central

    Brandmaier, Andreas M.; von Oertzen, Timo; Ghisletta, Paolo; Lindenberger, Ulman; Hertzog, Christopher

    2018-01-01

    Latent Growth Curve Models (LGCM) have become a standard technique to model change over time. Prediction and explanation of inter-individual differences in change are major goals in lifespan research. The major determinants of statistical power to detect individual differences in change are the magnitude of true inter-individual differences in linear change (LGCM slope variance), design precision, alpha level, and sample size. Here, we show that design precision can be expressed as the inverse of effective error. Effective error is determined by instrument reliability and the temporal arrangement of measurement occasions. However, it also depends on another central LGCM component, the variance of the latent intercept and its covariance with the latent slope. We derive a new reliability index for LGCM slope variance—effective curve reliability (ECR)—by scaling slope variance against effective error. ECR is interpretable as a standardized effect size index. We demonstrate how effective error, ECR, and statistical power for a likelihood ratio test of zero slope variance formally relate to each other and how they function as indices of statistical power. We also provide a computational approach to derive ECR for arbitrary intercept-slope covariance. With practical use cases, we argue for the complementary utility of the proposed indices of a study's sensitivity to detect slope variance when making a priori longitudinal design decisions or communicating study designs. PMID:29755377

  12. Study on Measuring the Viscosity of Lubricating Oil by Viscometer Based on Hele - Shaw Principle

    NASA Astrophysics Data System (ADS)

    Li, Longfei

    2017-12-01

    In order to explore the method of accurately measuring the viscosity value of oil samples using the viscometer based on Hele-Shaw principle, three different measurement methods are designed in the laboratory, and the statistical characteristics of the measured values are compared, in order to get the best measurement method. The results show that the oil sample to be measured is placed in the magnetic field formed by the magnet, and the oil sample can be sucked from the same distance from the magnet. The viscosity value of the sample can be measured accurately.

  13. Evaluation of a regional monitoring program's statistical power to detect temporal trends in forest health indicators

    USGS Publications Warehouse

    Perles, Stephanie J.; Wagner, Tyler; Irwin, Brian J.; Manning, Douglas R.; Callahan, Kristina K.; Marshall, Matthew R.

    2014-01-01

    Forests are socioeconomically and ecologically important ecosystems that are exposed to a variety of natural and anthropogenic stressors. As such, monitoring forest condition and detecting temporal changes therein remain critical to sound public and private forestland management. The National Parks Service’s Vital Signs monitoring program collects information on many forest health indicators, including species richness, cover by exotics, browse pressure, and forest regeneration. We applied a mixed-model approach to partition variability in data for 30 forest health indicators collected from several national parks in the eastern United States. We then used the estimated variance components in a simulation model to evaluate trend detection capabilities for each indicator. We investigated the extent to which the following factors affected ability to detect trends: (a) sample design: using simple panel versus connected panel design, (b) effect size: increasing trend magnitude, (c) sample size: varying the number of plots sampled each year, and (d) stratified sampling: post-stratifying plots into vegetation domains. Statistical power varied among indicators; however, indicators that measured the proportion of a total yielded higher power when compared to indicators that measured absolute or average values. In addition, the total variability for an indicator appeared to influence power to detect temporal trends more than how total variance was partitioned among spatial and temporal sources. Based on these analyses and the monitoring objectives of theVital Signs program, the current sampling design is likely overly intensive for detecting a 5 % trend·year−1 for all indicators and is appropriate for detecting a 1 % trend·year−1 in most indicators.

  14. Optimal two-phase sampling design for comparing accuracies of two binary classification rules.

    PubMed

    Xu, Huiping; Hui, Siu L; Grannis, Shaun

    2014-02-10

    In this paper, we consider the design for comparing the performance of two binary classification rules, for example, two record linkage algorithms or two screening tests. Statistical methods are well developed for comparing these accuracy measures when the gold standard is available for every unit in the sample, or in a two-phase study when the gold standard is ascertained only in the second phase in a subsample using a fixed sampling scheme. However, these methods do not attempt to optimize the sampling scheme to minimize the variance of the estimators of interest. In comparing the performance of two classification rules, the parameters of primary interest are the difference in sensitivities, specificities, and positive predictive values. We derived the analytic variance formulas for these parameter estimates and used them to obtain the optimal sampling design. The efficiency of the optimal sampling design is evaluated through an empirical investigation that compares the optimal sampling with simple random sampling and with proportional allocation. Results of the empirical study show that the optimal sampling design is similar for estimating the difference in sensitivities and in specificities, and both achieve a substantial amount of variance reduction with an over-sample of subjects with discordant results and under-sample of subjects with concordant results. A heuristic rule is recommended when there is no prior knowledge of individual sensitivities and specificities, or the prevalence of the true positive findings in the study population. The optimal sampling is applied to a real-world example in record linkage to evaluate the difference in classification accuracy of two matching algorithms. Copyright © 2013 John Wiley & Sons, Ltd.

  15. A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data

    PubMed Central

    Chen, Yi-Hau

    2017-01-01

    Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data stays difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as common approach; which fundamentally implies genes or proteins as independent units. The independent assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimation if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed by the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA. PMID:28622336

  16. A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.

    PubMed

    Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin

    2017-06-01

    Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data stays difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as common approach; which fundamentally implies genes or proteins as independent units. The independent assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimation if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed by the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA.

  17. Lunar soils grain size catalog

    NASA Technical Reports Server (NTRS)

    Graf, John C.

    1993-01-01

    This catalog compiles every available grain size distribution for Apollo surface soils, trench samples, cores, and Luna 24 soils. Original laboratory data are tabled, and cumulative weight distribution curves and histograms are plotted. Standard statistical parameters are calculated using the method of moments. Photos and location comments describe the sample environment and geological setting. This catalog can help researchers describe the geotechnical conditions and site variability of the lunar surface essential to the design of a lunar base.

  18. The impact of study design and diagnostic approach in a large multi-centre ADHD study: Part 2: Dimensional measures of psychopathology and intelligence.

    PubMed

    Müller, Ueli C; Asherson, Philip; Banaschewski, Tobias; Buitelaar, Jan K; Ebstein, Richard P; Eisenberg, Jaques; Gill, Michael; Manor, Iris; Miranda, Ana; Oades, Robert D; Roeyers, Herbert; Rothenberger, Aribert; Sergeant, Joseph A; Sonuga-Barke, Edmund Js; Thompson, Margaret; Faraone, Stephen V; Steinhausen, Hans-Christoph

    2011-04-07

    The International Multi-centre ADHD Genetics (IMAGE) project with 11 participating centres from 7 European countries and Israel has collected a large behavioural and genetic database for present and future research. Behavioural data were collected from 1068 probands with ADHD and 1446 unselected siblings. The aim was to describe and analyse questionnaire data and IQ measures from all probands and siblings. In particular, to investigate the influence of age, gender, family status (proband vs. sibling), informant, and centres on sample homogeneity in psychopathological measures. Conners' Questionnaires, Strengths and Difficulties Questionnaires, and Wechsler Intelligence Scores were used to describe the phenotype of the sample. Data were analysed by use of robust statistical multi-way procedures. Besides main effects of age, gender, informant, and centre, there were considerable interaction effects on questionnaire data. The larger differences between probands and siblings at home than at school may reflect contrast effects in the parents. Furthermore, there were marked gender by status effects on the ADHD symptom ratings with girls scoring one standard deviation higher than boys in the proband sample but lower than boys in the siblings sample. The multi-centre design is another important source of heterogeneity, particularly in the interaction with the family status. To a large extent the centres differed from each other with regard to differences between proband and sibling scores. When ADHD probands are diagnosed by use of fixed symptom counts, the severity of the disorder in the proband sample may markedly differ between boys and girls and across age, particularly in samples with a large age range. A multi-centre design carries the risk of considerable phenotypic differences between centres and, consequently, of additional heterogeneity of the sample even if standardized diagnostic procedures are used. These possible sources of variance should be counteracted in genetic analyses either by using age and gender adjusted diagnostic procedures and regional normative data or by adjusting for design artefacts by use of covariate statistics, by eliminating outliers, or by other methods suitable for reducing heterogeneity.

  19. Factorial versus multi-arm multi-stage designs for clinical trials with multiple treatments.

    PubMed

    Jaki, Thomas; Vasileiou, Despina

    2017-02-20

    When several treatments are available for evaluation in a clinical trial, different design options are available. We compare multi-arm multi-stage with factorial designs, and in particular, we will consider a 2 × 2 factorial design, where groups of patients will either take treatments A, B, both or neither. We investigate the performance and characteristics of both types of designs under different scenarios and compare them using both theory and simulations. For the factorial designs, we construct appropriate test statistics to test the hypothesis of no treatment effect against the control group with overall control of the type I error. We study the effect of the choice of the allocation ratios on the critical value and sample size requirements for a target power. We also study how the possibility of an interaction between the two treatments A and B affects type I and type II errors when testing for significance of each of the treatment effects. We present both simulation results and a case study on an osteoarthritis clinical trial. We discover that in an optimal factorial design in terms of minimising the associated critical value, the corresponding allocation ratios differ substantially to those of a balanced design. We also find evidence of potentially big losses in power in factorial designs for moderate deviations from the study design assumptions and little gain compared with multi-arm multi-stage designs when the assumptions hold. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  20. A BASIS FOR MODIFYING THE TANK 12 COMPOSITE SAMPLING DESIGN

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shine, G.

    The SRR sampling campaign to obtain residual solids material from the Savannah River Site (SRS) Tank Farm Tank 12 primary vessel resulted in obtaining appreciable material in all 6 planned source samples from the mound strata but only in 5 of the 6 planned source samples from the floor stratum. Consequently, the design of the compositing scheme presented in the Tank 12 Sampling and Analysis Plan, Pavletich (2014a), must be revised. Analytical Development of SRNL statistically evaluated the sampling uncertainty associated with using various compositing arrays and splitting one or more samples for compositing. The variance of the simple meanmore » of composite sample concentrations is a reasonable standard to investigate the impact of the following sampling options. Composite Sample Design Option (a). Assign only 1 source sample from the floor stratum and 1 source sample from each of the mound strata to each of the composite samples. Each source sample contributes material to only 1 composite sample. Two source samples from the floor stratum would not be used. Composite Sample Design Option (b). Assign 2 source samples from the floor stratum and 1 source sample from each of the mound strata to each composite sample. This infers that one source sample from the floor must be used twice, with 2 composite samples sharing material from this particular source sample. All five source samples from the floor would be used. Composite Sample Design Option (c). Assign 3 source samples from the floor stratum and 1 source sample from each of the mound strata to each composite sample. This infers that several of the source samples from the floor stratum must be assigned to more than one composite sample. All 5 source samples from the floor would be used. Using fewer than 12 source samples will increase the sampling variability over that of the Basic Composite Sample Design, Pavletich (2013). Considering the impact to the variance of the simple mean of the composite sample concentrations, the recommendation is to construct each sample composite using four or five source samples. Although the variance using 5 source samples per composite sample (Composite Sample Design Option (c)) was slightly less than the variance using 4 source samples per composite sample (Composite Sample Design Option (b)), there is no practical difference between those variances. This does not consider that the measurement error variance, which is the same for all composite sample design options considered in this report, will further dilute any differences. Composite Sample Design Option (a) had the largest variance for the mean concentration in the three composite samples and should be avoided. These results are consistent with Pavletich (2014b) which utilizes a low elevation and a high elevation mound source sample and two floor source samples for each composite sample. Utilizing the four source samples per composite design, Pavletich (2014b) utilizes aliquots of Floor Sample 4 for two composite samples.« less

  1. Accounting for measurement error: a critical but often overlooked process.

    PubMed

    Harris, Edward F; Smith, Richard N

    2009-12-01

    Due to instrument imprecision and human inconsistencies, measurements are not free of error. Technical error of measurement (TEM) is the variability encountered between dimensions when the same specimens are measured at multiple sessions. A goal of a data collection regimen is to minimise TEM. The few studies that actually quantify TEM, regardless of discipline, report that it is substantial and can affect results and inferences. This paper reviews some statistical approaches for identifying and controlling TEM. Statistically, TEM is part of the residual ('unexplained') variance in a statistical test, so accounting for TEM, which requires repeated measurements, enhances the chances of finding a statistically significant difference if one exists. The aim of this paper was to review and discuss common statistical designs relating to types of error and statistical approaches to error accountability. This paper addresses issues of landmark location, validity, technical and systematic error, analysis of variance, scaled measures and correlation coefficients in order to guide the reader towards correct identification of true experimental differences. Researchers commonly infer characteristics about populations from comparatively restricted study samples. Most inferences are statistical and, aside from concerns about adequate accounting for known sources of variation with the research design, an important source of variability is measurement error. Variability in locating landmarks that define variables is obvious in odontometrics, cephalometrics and anthropometry, but the same concerns about measurement accuracy and precision extend to all disciplines. With increasing accessibility to computer-assisted methods of data collection, the ease of incorporating repeated measures into statistical designs has improved. Accounting for this technical source of variation increases the chance of finding biologically true differences when they exist.

  2. Gene coexpression measures in large heterogeneous samples using count statistics.

    PubMed

    Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

    2014-11-18

    With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.

  3. Multi-arm group sequential designs with a simultaneous stopping rule.

    PubMed

    Urach, S; Posch, M

    2016-12-30

    Multi-arm group sequential clinical trials are efficient designs to compare multiple treatments to a control. They allow one to test for treatment effects already in interim analyses and can have a lower average sample number than fixed sample designs. Their operating characteristics depend on the stopping rule: We consider simultaneous stopping, where the whole trial is stopped as soon as for any of the arms the null hypothesis of no treatment effect can be rejected, and separate stopping, where only recruitment to arms for which a significant treatment effect could be demonstrated is stopped, but the other arms are continued. For both stopping rules, the family-wise error rate can be controlled by the closed testing procedure applied to group sequential tests of intersection and elementary hypotheses. The group sequential boundaries for the separate stopping rule also control the family-wise error rate if the simultaneous stopping rule is applied. However, we show that for the simultaneous stopping rule, one can apply improved, less conservative stopping boundaries for local tests of elementary hypotheses. We derive corresponding improved Pocock and O'Brien type boundaries as well as optimized boundaries to maximize the power or average sample number and investigate the operating characteristics and small sample properties of the resulting designs. To control the power to reject at least one null hypothesis, the simultaneous stopping rule requires a lower average sample number than the separate stopping rule. This comes at the cost of a lower power to reject all null hypotheses. Some of this loss in power can be regained by applying the improved stopping boundaries for the simultaneous stopping rule. The procedures are illustrated with clinical trials in systemic sclerosis and narcolepsy. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  4. Design of Neural Networks for Fast Convergence and Accuracy: Dynamics and Control

    NASA Technical Reports Server (NTRS)

    Maghami, Peiman G.; Sparks, Dean W., Jr.

    1997-01-01

    A procedure for the design and training of artificial neural networks, used for rapid and efficient controls and dynamics design and analysis for flexible space systems, has been developed. Artificial neural networks are employed, such that once properly trained, they provide a means of evaluating the impact of design changes rapidly. Specifically, two-layer feedforward neural networks are designed to approximate the functional relationship between the component/spacecraft design changes and measures of its performance or nonlinear dynamics of the system/components. A training algorithm, based on statistical sampling theory, is presented, which guarantees that the trained networks provide a designer-specified degree of accuracy in mapping the functional relationship. Within each iteration of this statistical-based algorithm, a sequential design algorithm is used for the design and training of the feedforward network to provide rapid convergence to the network goals. Here, at each sequence a new network is trained to minimize the error of previous network. The proposed method should work for applications wherein an arbitrary large source of training data can be generated. Two numerical examples are performed on a spacecraft application in order to demonstrate the feasibility of the proposed approach.

  5. Design of neural networks for fast convergence and accuracy: dynamics and control.

    PubMed

    Maghami, P G; Sparks, D R

    2000-01-01

    A procedure for the design and training of artificial neural networks, used for rapid and efficient controls and dynamics design and analysis for flexible space systems, has been developed. Artificial neural networks are employed, such that once properly trained, they provide a means of evaluating the impact of design changes rapidly. Specifically, two-layer feedforward neural networks are designed to approximate the functional relationship between the component/spacecraft design changes and measures of its performance or nonlinear dynamics of the system/components. A training algorithm, based on statistical sampling theory, is presented, which guarantees that the trained networks provide a designer-specified degree of accuracy in mapping the functional relationship. Within each iteration of this statistical-based algorithm, a sequential design algorithm is used for the design and training of the feedforward network to provide rapid convergence to the network goals. Here, at each sequence a new network is trained to minimize the error of previous network. The proposed method should work for applications wherein an arbitrary large source of training data can be generated. Two numerical examples are performed on a spacecraft application in order to demonstrate the feasibility of the proposed approach.

  6. An optimal stratified Simon two-stage design.

    PubMed

    Parashar, Deepak; Bowden, Jack; Starr, Colin; Wernisch, Lorenz; Mander, Adrian

    2016-07-01

    In Phase II oncology trials, therapies are increasingly being evaluated for their effectiveness in specific populations of interest. Such targeted trials require designs that allow for stratification based on the participants' molecular characterisation. A targeted design proposed by Jones and Holmgren (JH) Jones CL, Holmgren E: 'An adaptive Simon two-stage design for phase 2 studies of targeted therapies', Contemporary Clinical Trials 28 (2007) 654-661.determines whether a drug only has activity in a disease sub-population or in the wider disease population. Their adaptive design uses results from a single interim analysis to decide whether to enrich the study population with a subgroup or not; it is based on two parallel Simon two-stage designs. We study the JH design in detail and extend it by providing a few alternative ways to control the familywise error rate, in the weak sense as well as the strong sense. We also introduce a novel optimal design by minimising the expected sample size. Our extended design contributes to the much needed framework for conducting Phase II trials in stratified medicine. © 2016 The Authors Pharmaceutical Statistics Published by John Wiley & Sons Ltd. © 2016 The Authors Pharmaceutical Statistics Published by John Wiley & Sons Ltd.

  7. Testing non-inferiority of a new treatment in three-arm clinical trials with binary endpoints.

    PubMed

    Tang, Nian-Sheng; Yu, Bin; Tang, Man-Lai

    2014-12-18

    A two-arm non-inferiority trial without a placebo is usually adopted to demonstrate that an experimental treatment is not worse than a reference treatment by a small pre-specified non-inferiority margin due to ethical concerns. Selection of the non-inferiority margin and establishment of assay sensitivity are two major issues in the design, analysis and interpretation for two-arm non-inferiority trials. Alternatively, a three-arm non-inferiority clinical trial including a placebo is usually conducted to assess the assay sensitivity and internal validity of a trial. Recently, some large-sample approaches have been developed to assess the non-inferiority of a new treatment based on the three-arm trial design. However, these methods behave badly with small sample sizes in the three arms. This manuscript aims to develop some reliable small-sample methods to test three-arm non-inferiority. Saddlepoint approximation, exact and approximate unconditional, and bootstrap-resampling methods are developed to calculate p-values of the Wald-type, score and likelihood ratio tests. Simulation studies are conducted to evaluate their performance in terms of type I error rate and power. Our empirical results show that the saddlepoint approximation method generally behaves better than the asymptotic method based on the Wald-type test statistic. For small sample sizes, approximate unconditional and bootstrap-resampling methods based on the score test statistic perform better in the sense that their corresponding type I error rates are generally closer to the prespecified nominal level than those of other test procedures. Both approximate unconditional and bootstrap-resampling test procedures based on the score test statistic are generally recommended for three-arm non-inferiority trials with binary outcomes.

  8. Statistical Analyses of Femur Parameters for Designing Anatomical Plates.

    PubMed

    Wang, Lin; He, Kunjin; Chen, Zhengming

    2016-01-01

    Femur parameters are key prerequisites for scientifically designing anatomical plates. Meanwhile, individual differences in femurs present a challenge to design well-fitting anatomical plates. Therefore, to design anatomical plates more scientifically, analyses of femur parameters with statistical methods were performed in this study. The specific steps were as follows. First, taking eight anatomical femur parameters as variables, 100 femur samples were classified into three classes with factor analysis and Q-type cluster analysis. Second, based on the mean parameter values of the three classes of femurs, three sizes of average anatomical plates corresponding to the three classes of femurs were designed. Finally, based on Bayes discriminant analysis, a new femur could be assigned to the proper class. Thereafter, the average anatomical plate suitable for that new femur was selected from the three available sizes of plates. Experimental results showed that the classification of femurs was quite reasonable based on the anatomical aspects of the femurs. For instance, three sizes of condylar buttress plates were designed. Meanwhile, 20 new femurs are judged to which classes the femurs belong. Thereafter, suitable condylar buttress plates were determined and selected.

  9. Steep discounting of delayed monetary and food rewards in obesity: a meta-analysis.

    PubMed

    Amlung, M; Petker, T; Jackson, J; Balodis, I; MacKillop, J

    2016-08-01

    An increasing number of studies have investigated delay discounting (DD) in relation to obesity, but with mixed findings. This meta-analysis synthesized the literature on the relationship between monetary and food DD and obesity, with three objectives: (1) to characterize the relationship between DD and obesity in both case-control comparisons and continuous designs; (2) to examine potential moderators, including case-control v. continuous design, money v. food rewards, sample sex distribution, and sample age (18 years); and (3) to evaluate publication bias. From 134 candidate articles, 39 independent investigations yielded 29 case-control and 30 continuous comparisons (total n = 10 278). Random-effects meta-analysis was conducted using Cohen's d as the effect size. Publication bias was evaluated using fail-safe N, Begg-Mazumdar and Egger tests, meta-regression of publication year and effect size, and imputation of missing studies. The primary analysis revealed a medium effect size across studies that was highly statistically significant (d = 0.43, p < 10-14). None of the moderators examined yielded statistically significant differences, although notably larger effect sizes were found for studies with case-control designs, food rewards and child/adolescent samples. Limited evidence of publication bias was present, although the Begg-Mazumdar test and meta-regression suggested a slightly diminishing effect size over time. Steep DD of food and money appears to be a robust feature of obesity that is relatively consistent across the DD assessment methodologies and study designs examined. These findings are discussed in the context of research on DD in drug addiction, the neural bases of DD in obesity, and potential clinical applications.

  10. NEW APPROACHES TO ESTIMATION OF SOLID-WASTE QUANTITY AND COMPOSITION

    EPA Science Inventory

    Efficient and statistically sound sampling protocols for estimating the quantity and composition of solid waste over a stated period of time in a given location, such as a landfill site or at a specific point in an industrial or commercial process, are essential to the design ...

  11. Roadmap for Navy Family Research.

    DTIC Science & Technology

    1980-08-01

    of methodological limitations, including: small, often non -representative or narrowly defined samples; inadequate statistical controls, inadequate...1-1 1.2 Overview of the Research Roadmap ..................... 1-2 2. Methodology ...the Office of Naval Research by the Westinghouse Public Applied Systems Division, and is designed to provide the Navy with a systematic framework for

  12. Reasoning about Shape as a Pattern in Variability

    ERIC Educational Resources Information Center

    Bakker, Arthur

    2004-01-01

    This paper examines ways in which coherent reasoning about key concepts such as variability, sampling, data, and distribution can be developed as part of statistics education. Instructional activities that could support such reasoning were developed through design research conducted with students in grades 7 and 8. Results are reported from a…

  13. A Comparison of Conjoint Analysis Response Formats

    Treesearch

    Kevin J. Boyle; Thomas P. Holmes; Mario F. Teisl; Brian Roe

    2001-01-01

    A split-sample design is used to evaluate the convergent validity of three response formats used in conjoint analysis experiments. WC investigate whether recoding rating data to rankings and choose-one formats, and recoding ranking data to choose one. result in structural models and welfare estimates that are statistically indistinguishable from...

  14. Measurement guidelines for the sequestration of forest carbon

    Treesearch

    Timothy R.H. Pearson; Sandra L. Brown; Richard A. Birdsey

    2007-01-01

    Measurement guidelines for forest carbon sequestration were developed to support reporting by public and private entities to greenhouse gas registries. These guidelines are intended to be a reference for designing a forest carbon inventory and monitoring system by professionals with a knowledge of sampling, statistical estimation, and forest measurements. This report...

  15. 78 FR 11661 - Request for Information: Main Study Design for the National Children's Study

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-19

    ....nationalchildrensstudy.gov/research/workshops/Pages/nationalacademyofsciencesworkshop.aspx . DATES: RFI Release Date is...-response relationships, substudies embedded in the Vanguard Study or the Main Study, and formative research... fall of 2012, the NCS held a series of meetings with federal and non-federal statistical sampling...

  16. Intraclass Correlation Values for Planning Group-Randomized Trials in Education

    ERIC Educational Resources Information Center

    Hedges, Larry V.; Hedberg, E. C.

    2007-01-01

    Experiments that assign intact groups to treatment conditions are increasingly common in social research. In educational research, the groups assigned are often schools. The design of group-randomized experiments requires knowledge of the intraclass correlation structure to compute statistical power and sample sizes required to achieve adequate…

  17. Learning Opportunities for Group Learning

    ERIC Educational Resources Information Center

    Gil, Alfonso J.; Mataveli, Mara

    2017-01-01

    Purpose: This paper aims to analyse the impact of organizational learning culture and learning facilitators in group learning. Design/methodology/approach: This study was conducted using a survey method applied to a statistically representative sample of employees from Rioja wine companies in Spain. A model was tested using a structural equation…

  18. Developing and Refining the Taiwan Birth Cohort Study (TBCS): Five Years of Experience

    ERIC Educational Resources Information Center

    Lung, For-Wey; Chiang, Tung-Liang; Lin, Shio-Jean; Shu, Bih-Ching; Lee, Meng-Chih

    2011-01-01

    The Taiwan Birth Cohort Study (TBCS) is the first nationwide birth cohort database in Asia designed to establish national norms of children's development. Several challenges during database development and data analysis were identified. Challenges include sampling methods, instrument development and statistical approach to missing data. The…

  19. Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module

    NASA Astrophysics Data System (ADS)

    Martinez, Gregory D.; McKay, James; Farmer, Ben; Scott, Pat; Roebber, Elinore; Putze, Antje; Conrad, Jan

    2017-11-01

    We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics.

  20. How conservative is Fisher's exact test? A quantitative evaluation of the two-sample comparative binomial trial.

    PubMed

    Crans, Gerald G; Shuster, Jonathan J

    2008-08-15

    The debate as to which statistical methodology is most appropriate for the analysis of the two-sample comparative binomial trial has persisted for decades. Practitioners who favor the conditional methods of Fisher, Fisher's exact test (FET), claim that only experimental outcomes containing the same amount of information should be considered when performing analyses. Hence, the total number of successes should be fixed at its observed level in hypothetical repetitions of the experiment. Using conditional methods in clinical settings can pose interpretation difficulties, since results are derived using conditional sample spaces rather than the set of all possible outcomes. Perhaps more importantly from a clinical trial design perspective, this test can be too conservative, resulting in greater resource requirements and more subjects exposed to an experimental treatment. The actual significance level attained by FET (the size of the test) has not been reported in the statistical literature. Berger (J. R. Statist. Soc. D (The Statistician) 2001; 50:79-85) proposed assessing the conservativeness of conditional methods using p-value confidence intervals. In this paper we develop a numerical algorithm that calculates the size of FET for sample sizes, n, up to 125 per group at the two-sided significance level, alpha = 0.05. Additionally, this numerical method is used to define new significance levels alpha(*) = alpha+epsilon, where epsilon is a small positive number, for each n, such that the size of the test is as close as possible to the pre-specified alpha (0.05 for the current work) without exceeding it. Lastly, a sample size and power calculation example are presented, which demonstrates the statistical advantages of implementing the adjustment to FET (using alpha(*) instead of alpha) in the two-sample comparative binomial trial. 2008 John Wiley & Sons, Ltd

  1. The power and promise of RNA-seq in ecology and evolution.

    PubMed

    Todd, Erica V; Black, Michael A; Gemmell, Neil J

    2016-03-01

    Reference is regularly made to the power of new genomic sequencing approaches. Using powerful technology, however, is not the same as having the necessary power to address a research question with statistical robustness. In the rush to adopt new and improved genomic research methods, limitations of technology and experimental design may be initially neglected. Here, we review these issues with regard to RNA sequencing (RNA-seq). RNA-seq adds large-scale transcriptomics to the toolkit of ecological and evolutionary biologists, enabling differential gene expression (DE) studies in nonmodel species without the need for prior genomic resources. High biological variance is typical of field-based gene expression studies and means that larger sample sizes are often needed to achieve the same degree of statistical power as clinical studies based on data from cell lines or inbred animal models. Sequencing costs have plummeted, yet RNA-seq studies still underutilize biological replication. Finite research budgets force a trade-off between sequencing effort and replication in RNA-seq experimental design. However, clear guidelines for negotiating this trade-off, while taking into account study-specific factors affecting power, are currently lacking. Study designs that prioritize sequencing depth over replication fail to capitalize on the power of RNA-seq technology for DE inference. Significant recent research effort has gone into developing statistical frameworks and software tools for power analysis and sample size calculation in the context of RNA-seq DE analysis. We synthesize progress in this area and derive an accessible rule-of-thumb guide for designing powerful RNA-seq experiments relevant in eco-evolutionary and clinical settings alike. © 2016 John Wiley & Sons Ltd.

  2. Inventory and mapping of flood inundation using interactive digital image analysis techniques

    USGS Publications Warehouse

    Rohde, Wayne G.; Nelson, Charles A.; Taranik, J.V.

    1979-01-01

    LANDSAT digital data and color infra-red photographs were used in a multiphase sampling scheme to estimate the area of agricultural land affected by a flood. The LANDSAT data were classified with a maximum likelihood algorithm. Stratification of the LANDSAT data, prior to classification, greatly reduced misclassification errors. The classification results were used to prepare a map overlay showing the areal extent of flooding. These data also provided statistics required to estimate sample size in a two phase sampling scheme, and provided quick, accurate estimates of areas flooded for the first phase. The measurements made in the second phase, based on ground data and photo-interpretation, were used with two phase sampling statistics to estimate the area of agricultural land affected by flooding These results show that LANDSAT digital data can be used to prepare map overlays showing the extent of flooding on agricultural land and, with two phase sampling procedures, can provide acreage estimates with sampling errors of about 5 percent. This procedure provides a technique for rapidly assessing the areal extent of flood conditions on agricultural land and would provide a basis for designing a sampling framework to estimate the impact of flooding on crop production.

  3. Statistical inference from multiple iTRAQ experiments without using common reference standards.

    PubMed

    Herbrich, Shelley M; Cole, Robert N; West, Keith P; Schulze, Kerry; Yager, James D; Groopman, John D; Christian, Parul; Wu, Lee; O'Meally, Robert N; May, Damon H; McIntosh, Martin W; Ruczinski, Ingo

    2013-02-01

    Isobaric tags for relative and absolute quantitation (iTRAQ) is a prominent mass spectrometry technology for protein identification and quantification that is capable of analyzing multiple samples in a single experiment. Frequently, iTRAQ experiments are carried out using an aliquot from a pool of all samples, or "masterpool", in one of the channels as a reference sample standard to estimate protein relative abundances in the biological samples and to combine abundance estimates from multiple experiments. In this manuscript, we show that using a masterpool is counterproductive. We obtain more precise estimates of protein relative abundance by using the available biological data instead of the masterpool and do not need to occupy a channel that could otherwise be used for another biological sample. In addition, we introduce a simple statistical method to associate proteomic data from multiple iTRAQ experiments with a numeric response and show that this approach is more powerful than the conventionally employed masterpool-based approach. We illustrate our methods using data from four replicate iTRAQ experiments on aliquots of the same pool of plasma samples and from a 406-sample project designed to identify plasma proteins that covary with nutrient concentrations in chronically undernourished children from South Asia.

  4. Preparing and Presenting Effective Research Posters

    PubMed Central

    Miller, Jane E

    2007-01-01

    Objectives Posters are a common way to present results of a statistical analysis, program evaluation, or other project at professional conferences. Often, researchers fail to recognize the unique nature of the format, which is a hybrid of a published paper and an oral presentation. This methods note demonstrates how to design research posters to convey study objectives, methods, findings, and implications effectively to varied professional audiences. Methods A review of existing literature on research communication and poster design is used to identify and demonstrate important considerations for poster content and layout. Guidelines on how to write about statistical methods, results, and statistical significance are illustrated with samples of ineffective writing annotated to point out weaknesses, accompanied by concrete examples and explanations of improved presentation. A comparison of the content and format of papers, speeches, and posters is also provided. Findings Each component of a research poster about a quantitative analysis should be adapted to the audience and format, with complex statistical results translated into simplified charts, tables, and bulleted text to convey findings as part of a clear, focused story line. Conclusions Effective research posters should be designed around two or three key findings with accompanying handouts and narrative description to supply additional technical detail and encourage dialog with poster viewers. PMID:17355594

  5. 42 CFR 402.109 - Statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 2 2011-10-01 2011-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...

  6. 42 CFR 402.109 - Statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 2 2010-10-01 2010-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...

  7. Optimal sample sizes for the design of reliability studies: power consideration.

    PubMed

    Shieh, Gwowen

    2014-09-01

    Intraclass correlation coefficients are used extensively to measure the reliability or degree of resemblance among group members in multilevel research. This study concerns the problem of the necessary sample size to ensure adequate statistical power for hypothesis tests concerning the intraclass correlation coefficient in the one-way random-effects model. In view of the incomplete and problematic numerical results in the literature, the approximate sample size formula constructed from Fisher's transformation is reevaluated and compared with an exact approach across a wide range of model configurations. These comprehensive examinations showed that the Fisher transformation method is appropriate only under limited circumstances, and therefore it is not recommended as a general method in practice. For advance design planning of reliability studies, the exact sample size procedures are fully described and illustrated for various allocation and cost schemes. Corresponding computer programs are also developed to implement the suggested algorithms.

  8. Surrogate and clinical endpoints for studies in peripheral artery occlusive disease: Are statistics the brakes?

    PubMed

    Waliszewski, Matthias W; Redlich, Ulf; Breul, Victor; Tautenhahn, Jörg

    2017-04-30

    The aim of this review is to present the available clinical and surrogate endpoints that may be used in future studies performed in patients with peripheral artery occlusive disease (PAOD). Importantly, we describe statistical limitations of the most commonly used endpoints and offer some guidance with respect to study design for a given sample size. The proposed endpoints may be used in studies using surgical or interventional revascularization and/or drug treatments. Considering recently published study endpoints and designs, the usefulness of these endpoints for reimbursement is evaluated. Based on these potential study endpoints and patient sample size estimates with different non-inferiority or tests for difference hypotheses, a rating relative to their corresponding reimbursement values is attempted. As regards the benefit for the patients and for the payers, walking distance and the ankle brachial index (ABI) are the most feasible endpoints in a relatively small study samples given that other non-vascular impact factors can be controlled. Angiographic endpoints such as minimal lumen diameter (MLD) do not seem useful from a reimbursement standpoint despite their intuitiveness. Other surrogate endpoints, such as transcutaneous oxygen tension measurements, have yet to be established as useful endpoints in reasonably sized studies with patients with critical limb ischemia (CLI). From a reimbursement standpoint, WD and ABI are effective endpoints for a moderate study sample size given that non-vascular confounding factors can be controlled.

  9. Seven Pervasive Statistical Flaws in Cognitive Training Interventions

    PubMed Central

    Moreau, David; Kirk, Ian J.; Waldie, Karen E.

    2016-01-01

    The prospect of enhancing cognition is undoubtedly among the most exciting research questions currently bridging psychology, neuroscience, and evidence-based medicine. Yet, convincing claims in this line of work stem from designs that are prone to several shortcomings, thus threatening the credibility of training-induced cognitive enhancement. Here, we present seven pervasive statistical flaws in intervention designs: (i) lack of power; (ii) sampling error; (iii) continuous variable splits; (iv) erroneous interpretations of correlated gain scores; (v) single transfer assessments; (vi) multiple comparisons; and (vii) publication bias. Each flaw is illustrated with a Monte Carlo simulation to present its underlying mechanisms, gauge its magnitude, and discuss potential remedies. Although not restricted to training studies, these flaws are typically exacerbated in such designs, due to ubiquitous practices in data collection or data analysis. The article reviews these practices, so as to avoid common pitfalls when designing or analyzing an intervention. More generally, it is also intended as a reference for anyone interested in evaluating claims of cognitive enhancement. PMID:27148010

  10. Frontiers of Two-Dimensional Correlation Spectroscopy. Part 1. New concepts and noteworthy developments

    NASA Astrophysics Data System (ADS)

    Noda, Isao

    2014-07-01

    A comprehensive survey review of new and noteworthy developments, which are advancing forward the frontiers in the field of 2D correlation spectroscopy during the last four years, is compiled. This review covers books, proceedings, and review articles published on 2D correlation spectroscopy, a number of significant conceptual developments in the field, data pretreatment methods and other pertinent topics, as well as patent and publication trends and citation activities. Developments discussed include projection 2D correlation analysis, concatenated 2D correlation, and correlation under multiple perturbation effects, as well as orthogonal sample design, predicting 2D correlation spectra, manipulating and comparing 2D spectra, correlation strategy based on segmented data blocks, such as moving-window analysis, features like determination of sequential order and enhanced spectral resolution, statistical 2D spectroscopy using covariance and other statistical metrics, hetero-correlation analysis, and sample-sample correlation technique. Data pretreatment operations prior to 2D correlation analysis are discussed, including the correction for physical effects, background and baseline subtraction, selection of reference spectrum, normalization and scaling of data, derivatives spectra and deconvolution technique, and smoothing and noise reduction. Other pertinent topics include chemometrics and statistical considerations, peak position shift phenomena, variable sampling increments, computation and software, display schemes, such as color coded format, slice and power spectra, tabulation, and other schemes.

  11. Time series, periodograms, and significance

    NASA Astrophysics Data System (ADS)

    Hernandez, G.

    1999-05-01

    The geophysical literature shows a wide and conflicting usage of methods employed to extract meaningful information on coherent oscillations from measurements. This makes it difficult, if not impossible, to relate the findings reported by different authors. Therefore, we have undertaken a critical investigation of the tests and methodology used for determining the presence of statistically significant coherent oscillations in periodograms derived from time series. Statistical significance tests are only valid when performed on the independent frequencies present in a measurement. Both the number of possible independent frequencies in a periodogram and the significance tests are determined by the number of degrees of freedom, which is the number of true independent measurements, present in the time series, rather than the number of sample points in the measurement. The number of degrees of freedom is an intrinsic property of the data, and it must be determined from the serial coherence of the time series. As part of this investigation, a detailed study has been performed which clearly illustrates the deleterious effects that the apparently innocent and commonly used processes of filtering, de-trending, and tapering of data have on periodogram analysis and the consequent difficulties in the interpretation of the statistical significance thus derived. For the sake of clarity, a specific example of actual field measurements containing unevenly-spaced measurements, gaps, etc., as well as synthetic examples, have been used to illustrate the periodogram approach, and pitfalls, leading to the (statistical) significance tests for the presence of coherent oscillations. Among the insights of this investigation are: (1) the concept of a time series being (statistically) band limited by its own serial coherence and thus having a critical sampling rate which defines one of the necessary requirements for the proper statistical design of an experiment; (2) the design of a critical test for the maximum number of significant frequencies which can be used to describe a time series, while retaining intact the variance of the test sample; (3) a demonstration of the unnecessary difficulties that manipulation of the data brings into the statistical significance interpretation of said data; and (4) the resolution and correction of the apparent discrepancy in significance results obtained by the use of the conventional Lomb-Scargle significance test, when compared with the long-standing Schuster-Walker and Fisher tests.

  12. Samples in applied psychology: over a decade of research in review.

    PubMed

    Shen, Winny; Kiger, Thomas B; Davies, Stacy E; Rasch, Rena L; Simon, Kara M; Ones, Deniz S

    2011-09-01

    This study examines sample characteristics of articles published in Journal of Applied Psychology (JAP) from 1995 to 2008. At the individual level, the overall median sample size over the period examined was approximately 173, which is generally adequate for detecting the average magnitude of effects of primary interest to researchers who publish in JAP. Samples using higher units of analyses (e.g., teams, departments/work units, and organizations) had lower median sample sizes (Mdn ≈ 65), yet were arguably robust given typical multilevel design choices of JAP authors despite the practical constraints of collecting data at higher units of analysis. A substantial proportion of studies used student samples (~40%); surprisingly, median sample sizes for student samples were smaller than working adult samples. Samples were more commonly occupationally homogeneous (~70%) than occupationally heterogeneous. U.S. and English-speaking participants made up the vast majority of samples, whereas Middle Eastern, African, and Latin American samples were largely unrepresented. On the basis of study results, recommendations are provided for authors, editors, and readers, which converge on 3 themes: (a) appropriateness and match between sample characteristics and research questions, (b) careful consideration of statistical power, and (c) the increased popularity of quantitative synthesis. Implications are discussed in terms of theory building, generalizability of research findings, and statistical power to detect effects. PsycINFO Database Record (c) 2011 APA, all rights reserved

  13. Sampling of temporal networks: Methods and biases

    NASA Astrophysics Data System (ADS)

    Rocha, Luis E. C.; Masuda, Naoki; Holme, Petter

    2017-11-01

    Temporal networks have been increasingly used to model a diversity of systems that evolve in time; for example, human contact structures over which dynamic processes such as epidemics take place. A fundamental aspect of real-life networks is that they are sampled within temporal and spatial frames. Furthermore, one might wish to subsample networks to reduce their size for better visualization or to perform computationally intensive simulations. The sampling method may affect the network structure and thus caution is necessary to generalize results based on samples. In this paper, we study four sampling strategies applied to a variety of real-life temporal networks. We quantify the biases generated by each sampling strategy on a number of relevant statistics such as link activity, temporal paths and epidemic spread. We find that some biases are common in a variety of networks and statistics, but one strategy, uniform sampling of nodes, shows improved performance in most scenarios. Given the particularities of temporal network data and the variety of network structures, we recommend that the choice of sampling methods be problem oriented to minimize the potential biases for the specific research questions on hand. Our results help researchers to better design network data collection protocols and to understand the limitations of sampled temporal network data.

  14. A comparison of two experimental design approaches in applying conjoint analysis in patient-centered outcomes research: a randomized trial.

    PubMed

    Kinter, Elizabeth T; Prior, Thomas J; Carswell, Christopher I; Bridges, John F P

    2012-01-01

    While the application of conjoint analysis and discrete-choice experiments in health are now widely accepted, a healthy debate exists around competing approaches to experimental design. There remains, however, a paucity of experimental evidence comparing competing design approaches and their impact on the application of these methods in patient-centered outcomes research. Our objectives were to directly compare the choice-model parameters and predictions of an orthogonal and a D-efficient experimental design using a randomized trial (i.e., an experiment on experiments) within an application of conjoint analysis studying patient-centered outcomes among outpatients diagnosed with schizophrenia in Germany. Outpatients diagnosed with schizophrenia were surveyed and randomized to receive choice tasks developed using either an orthogonal or a D-efficient experimental design. The choice tasks elicited judgments from the respondents as to which of two patient profiles (varying across seven outcomes and process attributes) was preferable from their own perspective. The results from the two survey designs were analyzed using the multinomial logit model, and the resulting parameter estimates and their robust standard errors were compared across the two arms of the study (i.e., the orthogonal and D-efficient designs). The predictive performances of the two resulting models were also compared by computing their percentage of survey responses classified correctly, and the potential for variation in scale between the two designs of the experiments was tested statistically and explored graphically. The results of the two models were statistically identical. No difference was found using an overall chi-squared test of equality for the seven parameters (p = 0.69) or via uncorrected pairwise comparisons of the parameter estimates (p-values ranged from 0.30 to 0.98). The D-efficient design resulted in directionally smaller standard errors for six of the seven parameters, of which only two were statistically significant, and no differences were found in the observed D-efficiencies of their standard errors (p = 0.62). The D-efficient design resulted in poorer predictive performance, but this was not significant (p = 0.73); there was some evidence that the parameters of the D-efficient design were biased marginally towards the null. While no statistical difference in scale was detected between the two designs (p = 0.74), the D-efficient design had a higher relative scale (1.06). This could be observed when the parameters were explored graphically, as the D-efficient parameters were lower. Our results indicate that orthogonal and D-efficient experimental designs have produced results that are statistically equivalent. This said, we have identified several qualitative findings that speak to the potential differences in these results that may have been statistically identified in a larger sample. While more comparative studies focused on the statistical efficiency of competing design strategies are needed, a more pressing research problem is to document the impact the experimental design has on respondent efficiency.

  15. Power calculation for overall hypothesis testing with high-dimensional commensurate outcomes.

    PubMed

    Chi, Yueh-Yun; Gribbin, Matthew J; Johnson, Jacqueline L; Muller, Keith E

    2014-02-28

    The complexity of system biology means that any metabolic, genetic, or proteomic pathway typically includes so many components (e.g., molecules) that statistical methods specialized for overall testing of high-dimensional and commensurate outcomes are required. While many overall tests have been proposed, very few have power and sample size methods. We develop accurate power and sample size methods and software to facilitate study planning for high-dimensional pathway analysis. With an account of any complex correlation structure between high-dimensional outcomes, the new methods allow power calculation even when the sample size is less than the number of variables. We derive the exact (finite-sample) and approximate non-null distributions of the 'univariate' approach to repeated measures test statistic, as well as power-equivalent scenarios useful to generalize our numerical evaluations. Extensive simulations of group comparisons support the accuracy of the approximations even when the ratio of number of variables to sample size is large. We derive a minimum set of constants and parameters sufficient and practical for power calculation. Using the new methods and specifying the minimum set to determine power for a study of metabolic consequences of vitamin B6 deficiency helps illustrate the practical value of the new results. Free software implementing the power and sample size methods applies to a wide range of designs, including one group pre-intervention and post-intervention comparisons, multiple parallel group comparisons with one-way or factorial designs, and the adjustment and evaluation of covariate effects. Copyright © 2013 John Wiley & Sons, Ltd.

  16. Reporting of various methodological and statistical parameters in negative studies published in prominent Indian Medical Journals: a systematic review.

    PubMed

    Charan, J; Saxena, D

    2014-01-01

    Biased negative studies not only reflect poor research effort but also have an impact on 'patient care' as they prevent further research with similar objectives, leading to potential research areas remaining unexplored. Hence, published 'negative studies' should be methodologically strong. All parameters that may help a reader to judge validity of results and conclusions should be reported in published negative studies. There is a paucity of data on reporting of statistical and methodological parameters in negative studies published in Indian Medical Journals. The present systematic review was designed with an aim to critically evaluate negative studies published in prominent Indian Medical Journals for reporting of statistical and methodological parameters. Systematic review. All negative studies published in 15 Science Citation Indexed (SCI) medical journals published from India were included in present study. Investigators involved in the study evaluated all negative studies for the reporting of various parameters. Primary endpoints were reporting of "power" and "confidence interval." Power was reported in 11.8% studies. Confidence interval was reported in 15.7% studies. Majority of parameters like sample size calculation (13.2%), type of sampling method (50.8%), name of statistical tests (49.1%), adjustment of multiple endpoints (1%), post hoc power calculation (2.1%) were reported poorly. Frequency of reporting was more in clinical trials as compared to other study designs and in journals having impact factor more than 1 as compared to journals having impact factor less than 1. Negative studies published in prominent Indian medical journals do not report statistical and methodological parameters adequately and this may create problems in the critical appraisal of findings reported in these journals by its readers.

  17. Design and analysis of three-arm trials with negative binomially distributed endpoints.

    PubMed

    Mütze, Tobias; Munk, Axel; Friede, Tim

    2016-02-20

    A three-arm clinical trial design with an experimental treatment, an active control, and a placebo control, commonly referred to as the gold standard design, enables testing of non-inferiority or superiority of the experimental treatment compared with the active control. In this paper, we propose methods for designing and analyzing three-arm trials with negative binomially distributed endpoints. In particular, we develop a Wald-type test with a restricted maximum-likelihood variance estimator for testing non-inferiority or superiority. For this test, sample size and power formulas as well as optimal sample size allocations will be derived. The performance of the proposed test will be assessed in an extensive simulation study with regard to type I error rate, power, sample size, and sample size allocation. For the purpose of comparison, Wald-type statistics with a sample variance estimator and an unrestricted maximum-likelihood estimator are included in the simulation study. We found that the proposed Wald-type test with a restricted variance estimator performed well across the considered scenarios and is therefore recommended for application in clinical trials. The methods proposed are motivated and illustrated by a recent clinical trial in multiple sclerosis. The R package ThreeArmedTrials, which implements the methods discussed in this paper, is available on CRAN. Copyright © 2015 John Wiley & Sons, Ltd.

  18. Optimal Experimental Design for Model Discrimination

    PubMed Central

    Myung, Jay I.; Pitt, Mark A.

    2009-01-01

    Models of a psychological process can be difficult to discriminate experimentally because it is not easy to determine the values of the critical design variables (e.g., presentation schedule, stimulus structure) that will be most informative in differentiating them. Recent developments in sampling-based search methods in statistics make it possible to determine these values, and thereby identify an optimal experimental design. After describing the method, it is demonstrated in two content areas in cognitive psychology in which models are highly competitive: retention (i.e., forgetting) and categorization. The optimal design is compared with the quality of designs used in the literature. The findings demonstrate that design optimization has the potential to increase the informativeness of the experimental method. PMID:19618983

  19. Groundwater-quality data for the Sierra Nevada study unit, 2008: Results from the California GAMA program

    USGS Publications Warehouse

    Shelton, Jennifer L.; Fram, Miranda S.; Munday, Cathy M.; Belitz, Kenneth

    2010-01-01

    Groundwater quality in the approximately 25,500-square-mile Sierra Nevada study unit was investigated in June through October 2008, as part of the Priority Basin Project of the Groundwater Ambient Monitoring and Assessment (GAMA) Program. The GAMA Priority Basin Project is being conducted by the U.S. Geological Survey (USGS) in cooperation with the California State Water Resources Control Board (SWRCB). The Sierra Nevada study was designed to provide statistically robust assessments of untreated groundwater quality within the primary aquifer systems in the study unit, and to facilitate statistically consistent comparisons of groundwater quality throughout California. The primary aquifer systems (hereinafter, primary aquifers) are defined by the depth of the screened or open intervals of the wells listed in the California Department of Public Health (CDPH) database of wells used for public and community drinking-water supplies. The quality of groundwater in shallower or deeper water-bearing zones may differ from that in the primary aquifers; shallow groundwater may be more vulnerable to contamination from the surface. In the Sierra Nevada study unit, groundwater samples were collected from 84 wells (and springs) in Lassen, Plumas, Butte, Sierra, Yuba, Nevada, Placer, El Dorado, Amador, Alpine, Calaveras, Tuolumne, Madera, Mariposa, Fresno, Inyo, Tulare, and Kern Counties. The wells were selected on two overlapping networks by using a spatially-distributed, randomized, grid-based approach. The primary grid-well network consisted of 30 wells, one well per grid cell in the study unit, and was designed to provide statistical representation of groundwater quality throughout the entire study unit. The lithologic grid-well network is a secondary grid that consisted of the wells in the primary grid-well network plus 53 additional wells and was designed to provide statistical representation of groundwater quality in each of the four major lithologic units in the Sierra Nevada study unit: granitic, metamorphic, sedimentary, and volcanic rocks. One natural spring that is not used for drinking water was sampled for comparison with a nearby primary grid well in the same cell. Groundwater samples were analyzed for organic constituents (volatile organic compounds [VOC], pesticides and pesticide degradates, and pharmaceutical compounds), constituents of special interest (N-nitrosodimethylamine [NDMA] and perchlorate), naturally occurring inorganic constituents (nutrients, major ions, total dissolved solids, and trace elements), and radioactive constituents (radium isotopes, radon-222, gross alpha and gross beta particle activities, and uranium isotopes). Naturally occurring isotopes and geochemical tracers (stable isotopes of hydrogen and oxygen in water, stable isotopes of carbon, carbon-14, strontium isotopes, and tritium), and dissolved noble gases also were measured to help identify the sources and ages of the sampled groundwater. Three types of quality-control samples (blanks, replicates, and samples for matrix spikes) each were collected at approximately 10 percent of the wells sampled for each analysis, and the results for these samples were used to evaluate the quality of the data for the groundwater samples. Field blanks rarely contained detectable concentrations of any constituent, suggesting that contamination from sample collection, handling, and analytical procedures was not a significant source of bias in the data for the groundwater samples. Differences between replicate samples were within acceptable ranges, with few exceptions. Matrix-spike recoveries were within acceptable ranges for most compounds. This study did not attempt to evaluate the quality of water delivered to consumers; after withdrawal from the ground, groundwater typically is treated, disinfected, or blended with other waters to maintain water quality. Regulatory benchmarks apply to finished drinking water that is served to the consumer, not to untre

  20. Sample size considerations when groups are the appropriate unit of analyses

    PubMed Central

    Sadler, Georgia Robins; Ko, Celine Marie; Alisangco, Jennifer; Rosbrook, Bradley P.; Miller, Eric; Fullerton, Judith

    2007-01-01

    This paper discusses issues to be considered by nurse researchers when groups should be used as a unit of randomization. Advantages and disadvantages are presented, with statistical calculations needed to determine effective sample size. Examples of these concepts are presented using data from the Black Cosmetologists Promoting Health Program. Different hypothetical scenarios and their impact on sample size are presented. Given the complexity of calculating sample size when using groups as a unit of randomization, it’s advantageous for researchers to work closely with statisticians when designing and implementing studies that anticipate the use of groups as the unit of randomization. PMID:17693219

  1. Statistical Analysis of a Large Sample Size Pyroshock Test Data Set Including Post Flight Data Assessment. Revision 1

    NASA Technical Reports Server (NTRS)

    Hughes, William O.; McNelis, Anne M.

    2010-01-01

    The Earth Observing System (EOS) Terra spacecraft was launched on an Atlas IIAS launch vehicle on its mission to observe planet Earth in late 1999. Prior to launch, the new design of the spacecraft's pyroshock separation system was characterized by a series of 13 separation ground tests. The analysis methods used to evaluate this unusually large amount of shock data will be discussed in this paper, with particular emphasis on population distributions and finding statistically significant families of data, leading to an overall shock separation interface level. The wealth of ground test data also allowed a derivation of a Mission Assurance level for the flight. All of the flight shock measurements were below the EOS Terra Mission Assurance level thus contributing to the overall success of the EOS Terra mission. The effectiveness of the statistical methodology for characterizing the shock interface level and for developing a flight Mission Assurance level from a large sample size of shock data is demonstrated in this paper.

  2. Maximum type 1 error rate inflation in multiarmed clinical trials with adaptive interim sample size modifications.

    PubMed

    Graf, Alexandra C; Bauer, Peter; Glimm, Ekkehard; Koenig, Franz

    2014-07-01

    Sample size modifications in the interim analyses of an adaptive design can inflate the type 1 error rate, if test statistics and critical boundaries are used in the final analysis as if no modification had been made. While this is already true for designs with an overall change of the sample size in a balanced treatment-control comparison, the inflation can be much larger if in addition a modification of allocation ratios is allowed as well. In this paper, we investigate adaptive designs with several treatment arms compared to a single common control group. Regarding modifications, we consider treatment arm selection as well as modifications of overall sample size and allocation ratios. The inflation is quantified for two approaches: a naive procedure that ignores not only all modifications, but also the multiplicity issue arising from the many-to-one comparison, and a Dunnett procedure that ignores modifications, but adjusts for the initially started multiple treatments. The maximum inflation of the type 1 error rate for such types of design can be calculated by searching for the "worst case" scenarios, that are sample size adaptation rules in the interim analysis that lead to the largest conditional type 1 error rate in any point of the sample space. To show the most extreme inflation, we initially assume unconstrained second stage sample size modifications leading to a large inflation of the type 1 error rate. Furthermore, we investigate the inflation when putting constraints on the second stage sample sizes. It turns out that, for example fixing the sample size of the control group, leads to designs controlling the type 1 error rate. © 2014 The Author. Biometrical Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Statistical methods used in the public health literature and implications for training of public health professionals

    PubMed Central

    Hayat, Matthew J.; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L.

    2017-01-01

    Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals. PMID:28591190

  4. Statistical methods used in the public health literature and implications for training of public health professionals.

    PubMed

    Hayat, Matthew J; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L

    2017-01-01

    Statistical literacy and knowledge is needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals.

  5. iCFD: Interpreted Computational Fluid Dynamics - Degeneration of CFD to one-dimensional advection-dispersion models using statistical experimental design - The secondary clarifier.

    PubMed

    Guyonvarch, Estelle; Ramin, Elham; Kulahci, Murat; Plósz, Benedek Gy

    2015-10-15

    The present study aims at using statistically designed computational fluid dynamics (CFD) simulations as numerical experiments for the identification of one-dimensional (1-D) advection-dispersion models - computationally light tools, used e.g., as sub-models in systems analysis. The objective is to develop a new 1-D framework, referred to as interpreted CFD (iCFD) models, in which statistical meta-models are used to calculate the pseudo-dispersion coefficient (D) as a function of design and flow boundary conditions. The method - presented in a straightforward and transparent way - is illustrated using the example of a circular secondary settling tank (SST). First, the significant design and flow factors are screened out by applying the statistical method of two-level fractional factorial design of experiments. Second, based on the number of significant factors identified through the factor screening study and system understanding, 50 different sets of design and flow conditions are selected using Latin Hypercube Sampling (LHS). The boundary condition sets are imposed on a 2-D axi-symmetrical CFD simulation model of the SST. In the framework, to degenerate the 2-D model structure, CFD model outputs are approximated by the 1-D model through the calibration of three different model structures for D. Correlation equations for the D parameter then are identified as a function of the selected design and flow boundary conditions (meta-models), and their accuracy is evaluated against D values estimated in each numerical experiment. The evaluation and validation of the iCFD model structure is carried out using scenario simulation results obtained with parameters sampled from the corners of the LHS experimental region. For the studied SST, additional iCFD model development was carried out in terms of (i) assessing different density current sub-models; (ii) implementation of a combined flocculation, hindered, transient and compression settling velocity function; and (iii) assessment of modelling the onset of transient and compression settling. Furthermore, the optimal level of model discretization both in 2-D and 1-D was undertaken. Results suggest that the iCFD model developed for the SST through the proposed methodology is able to predict solid distribution with high accuracy - taking a reasonable computational effort - when compared to multi-dimensional numerical experiments, under a wide range of flow and design conditions. iCFD tools could play a crucial role in reliably predicting systems' performance under normal and shock events. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Estimation of a partially linear additive model for data from an outcome-dependent sampling design with a continuous outcome

    PubMed Central

    Tan, Ziwen; Qin, Guoyou; Zhou, Haibo

    2016-01-01

    Outcome-dependent sampling (ODS) designs have been well recognized as a cost-effective way to enhance study efficiency in both statistical literature and biomedical and epidemiologic studies. A partially linear additive model (PLAM) is widely applied in real problems because it allows for a flexible specification of the dependence of the response on some covariates in a linear fashion and other covariates in a nonlinear non-parametric fashion. Motivated by an epidemiological study investigating the effect of prenatal polychlorinated biphenyls exposure on children's intelligence quotient (IQ) at age 7 years, we propose a PLAM in this article to investigate a more flexible non-parametric inference on the relationships among the response and covariates under the ODS scheme. We propose the estimation method and establish the asymptotic properties of the proposed estimator. Simulation studies are conducted to show the improved efficiency of the proposed ODS estimator for PLAM compared with that from a traditional simple random sampling design with the same sample size. The data of the above-mentioned study is analyzed to illustrate the proposed method. PMID:27006375

  7. Forest statistics for Central Alabama counties

    Treesearch

    Arnold Hedlund; J.M. Earles

    1972-01-01

    This report tabulates information from a new forest inventory of counties in central Alabama. The tables are intended for use as source data in compiling estimates for groups of counties. Because the sampling procedure is designed primarily to furnish inventory data for the State as a whole, estimates for individual counties have limited and variable accuracy.

  8. Forest statistics for Southwest Alabama counties

    Treesearch

    Arnold Hedlund; J. M. Earles

    1972-01-01

    This report tabulates information from a new forest inventory of counties in southwestern Alabama. The tables are intended for sue as source data in compiling estimates for groups of counties. Because the sampling procedure is designed primarily to furnish inventory data for the State as a whole, estimates for individual counties have limited and variable accuracy....

  9. Forest statistics for North Alabama counties

    Treesearch

    Arnold Hedlund; J. M. Earles

    1972-01-01

    This report tabulates information from a new forest inventory of counties in northern Alabama. The tables are intended for use as a source data in compiling estimates for groups of counties. Because the sampling procedure is designed primarily to furnish inventory data for the state as a whole, estimates for individual counties have limited and variable accuracy.

  10. The Current State of Sustainability in Bioscience Laboratories: A Statistical Examination of a UK Tertiary Institute

    ERIC Educational Resources Information Center

    Wright, Hazel A.; Ironside, Joseph E.; Gwynn-Jones, Dylan

    2008-01-01

    Purpose: This study aims to identify the current barriers to sustainability in the bioscience laboratory setting and to determine which mechanisms are likely to increase sustainable behaviours in this specialised environment. Design/methodology/approach: The study gathers qualitative data from a sample of laboratory researchers presently…

  11. Analysis Tools (AT)

    Treesearch

    Larry J. Gangi

    2006-01-01

    The FIREMON Analysis Tools program is designed to let the user perform grouped or ungrouped summary calculations of single measurement plot data, or statistical comparisons of grouped or ungrouped plot data taken at different sampling periods. The program allows the user to create reports and graphs, save and print them, or cut and paste them into a word processor....

  12. A Monte Carlo Approach to Unidimensionality Testing in Polytomous Rasch Models

    ERIC Educational Resources Information Center

    Christensen, Karl Bang; Kreiner, Svend

    2007-01-01

    Many statistical tests are designed to test the different assumptions of the Rasch model, but only few are directed at detecting multidimensionality. The Martin-Lof test is an attractive approach, the disadvantage being that its null distribution deviates strongly from the asymptotic chi-square distribution for most realistic sample sizes. A Monte…

  13. Exploring the Use of Participatory Information to Improve Monitoring, Mapping and Assessment of Aquatic Ecosystem Services at Landascape Scales

    EPA Science Inventory

    Traditionally, the EPA has monitored aquatic ecosystems using statistically rigorous sample designs and intensive field efforts which provide high quality datasets. But by their nature they leave many aquatic systems unsampled, follow a top down approach, have a long lag between ...

  14. Effects of spatial allocation and parameter variability on lakewide estimates from surveys of Lake Superior, North America’s largest lake

    EPA Science Inventory

    Lake Superior was sampled in 2011 using a Generalized Random Tessellation Stratified design (n=54 sites) to characterize biological and chemical properties of this huge aquatic resource, with statistical confidence. The lake was divided into two strata (inshore <100m and offsh...

  15. Health Surveys Using Mobile Phones in Developing Countries: Automated Active Strata Monitoring and Other Statistical Considerations for Improving Precision and Reducing Biases

    PubMed Central

    Blynn, Emily; Ahmed, Saifuddin; Gibson, Dustin; Pariyo, George; Hyder, Adnan A

    2017-01-01

    In low- and middle-income countries (LMICs), historically, household surveys have been carried out by face-to-face interviews to collect survey data related to risk factors for noncommunicable diseases. The proliferation of mobile phone ownership and the access it provides in these countries offers a new opportunity to remotely conduct surveys with increased efficiency and reduced cost. However, the near-ubiquitous ownership of phones, high population mobility, and low cost require a re-examination of statistical recommendations for mobile phone surveys (MPS), especially when surveys are automated. As with landline surveys, random digit dialing remains the most appropriate approach to develop an ideal survey-sampling frame. Once the survey is complete, poststratification weights are generally applied to reduce estimate bias and to adjust for selectivity due to mobile ownership. Since weights increase design effects and reduce sampling efficiency, we introduce the concept of automated active strata monitoring to improve representativeness of the sample distribution to that of the source population. Although some statistical challenges remain, MPS represent a promising emerging means for population-level data collection in LMICs. PMID:28476726

  16. Conceptual Model of Clinical Governance Information System for Statistical Indicators by Using UML in Two Sample Hospitals

    PubMed Central

    Jeddi, Fatemeh Rangraz; Farzandipoor, Mehrdad; Arabfard, Masoud; Hosseini, Azam Haj Mohammad

    2016-01-01

    Objective: The purpose of this study was investigating situation and presenting a conceptual model for clinical governance information system by using UML in two sample hospitals. Background: However, use of information is one of the fundamental components of clinical governance; but unfortunately, it does not pay much attention to information management. Material and Methods: A cross sectional study was conducted in October 2012- May 2013. Data were gathered through questionnaires and interviews in two sample hospitals. Face and content validity of the questionnaire has been confirmed by experts. Data were collected from a pilot hospital and reforms were carried out and Final questionnaire was prepared. Data were analyzed by descriptive statistics and SPSS 16 software. Results: With the scenario derived from questionnaires, UML diagrams are presented by using Rational Rose 7 software. The results showed that 32.14 percent Indicators of the hospitals were calculated. Database was not designed and 100 percent of the hospital’s clinical governance was required to create a database. Conclusion: Clinical governance unit of hospitals to perform its mission, do not have access to all the needed indicators. Defining of Processes and drawing of models and creating of database are essential for designing of information systems. PMID:27147804

  17. Conceptual Model of Clinical Governance Information System for Statistical Indicators by Using UML in Two Sample Hospitals

    PubMed Central

    Jeddi, Fatemeh Rangraz; Farzandipoor, Mehrdad; Arabfard, Masoud; Hosseini, Azam Haj Mohammad

    2014-01-01

    Objective: The purpose of this study was investigating situation and presenting a conceptual model for clinical governance information system by using UML in two sample hospitals. Background: However, use of information is one of the fundamental components of clinical governance; but unfortunately, it does not pay much attention to information management. Material and Methods: A cross sectional study was conducted in October 2012- May 2013. Data were gathered through questionnaires and interviews in two sample hospitals. Face and content validity of the questionnaire has been confirmed by experts. Data were collected from a pilot hospital and reforms were carried out and Final questionnaire was prepared. Data were analyzed by descriptive statistics and SPSS 16 software. Results: With the scenario derived from questionnaires, UML diagrams are presented by using Rational Rose 7 software. The results showed that 32.14 percent Indicators of the hospitals were calculated. Database was not designed and 100 percent of the hospital’s clinical governance was required to create a database. Conclusion: Clinical governance unit of hospitals to perform its mission, do not have access to all the needed indicators. Defining of Processes and drawing of models and creating of database are essential for designing of information systems. PMID:24825933

  18. Microcrystallography using single-bounce monocapillary optics

    PubMed Central

    Gillilan, R. E.; Cook, M. J.; Cornaby, S. W.; Bilderback, D. H.

    2010-01-01

    X-ray microbeams have become increasingly valuable in protein crystallography. A number of synchrotron beamlines worldwide have adapted to handling smaller and more challenging samples by providing a combination of high-precision sample-positioning hardware, special visible-light optics for sample visualization, and small-diameter X-ray beams with low background scatter. Most commonly, X-ray microbeams with diameters ranging from 50 µm to 1 µm are produced by Kirkpatrick and Baez mirrors in combination with defining apertures and scatter guards. A simple alternative based on single-bounce glass monocapillary X-ray optics is presented. The basic capillary design considerations are discussed and a practical and robust implementation that capitalizes on existing beamline hardware is presented. A design for mounting the capillary is presented which eliminates parasitic scattering and reduces deformations of the optic to a degree suitable for use on next-generation X-ray sources. Comparison of diffraction data statistics for microcrystals using microbeam and conventional aperture-collimated beam shows that capillary-focused beam can deliver significant improvement. Statistics also confirm that the annular beam profile produced by the capillary optic does not impact data quality in an observable way. Examples are given of new structures recently solved using this technology. Single-bounce monocapillary optics can offer an attractive alternative for retrofitting existing beamlines for microcrystallography. PMID:20157276

  19. Assessment of sampling stability in ecological applications of discriminant analysis

    USGS Publications Warehouse

    Williams, B.K.; Titus, K.

    1988-01-01

    A simulation study was undertaken to assess the sampling stability of the variable loadings in linear discriminant function analysis. A factorial design was used for the factors of multivariate dimensionality, dispersion structure, configuration of group means, and sample size. A total of 32,400 discriminant analyses were conducted, based on data from simulated populations with appropriate underlying statistical distributions. A review of 60 published studies and 142 individual analyses indicated that sample sizes in ecological studies often have met that requirement. However, individual group sample sizes frequently were very unequal, and checks of assumptions usually were not reported. The authors recommend that ecologists obtain group sample sizes that are at least three times as large as the number of variables measured.

  20. Atmospheric turbulence power spectral measurements to long wavelengths for several meteorological conditions

    NASA Technical Reports Server (NTRS)

    Rhyne, R. H.; Murrow, H. N.; Sidwell, K.

    1976-01-01

    Use of power spectral design techniques for supersonic transports requires accurate definition of atmospheric turbulence in the long wavelength region below the knee of the power spectral density function curve. Examples are given of data obtained from a current turbulence flight sampling program. These samples are categorized as (1) convective, (2) wind shear, (3) rotor, and (4) mountain-wave turbulence. Time histories, altitudes, root-mean-square values, statistical degrees of freedom, power spectra, and integral scale values are shown and discussed.

  1. An experimental study of the temporal statistics of radio signals scattered by rain

    NASA Technical Reports Server (NTRS)

    Hubbard, R. W.; Hull, J. A.; Rice, P. L.; Wells, P. I.

    1973-01-01

    A fixed-beam bistatic CW experiment designed to measure the temporal statistics of the volume reflectivity produced by hydrometeors at several selected altitudes, scattering angles, and at two frequencies (3.6 and 7.8 GHz) is described. Surface rain gauge data, local meteorological data, surveillance S-band radar, and great-circle path propagation measurements were also made to describe the general weather and propagation conditions and to distinguish precipitation scatter signals from those caused by ducting and other nonhydrometeor scatter mechanisms. The data analysis procedures were designed to provide an assessment of a one-year sample of data with a time resolution of one minute. The cumulative distributions of the bistatic signals for all of the rainy minutes during this period are presented for the several path geometries.

  2. Critical appraisal of fundamental items in approved clinical trial research proposals in Mashhad University of Medical Sciences

    PubMed Central

    Shakeri, Mohammad-Taghi; Taghipour, Ali; Sadeghi, Masoumeh; Nezami, Hossein; Amirabadizadeh, Ali-Reza; Bonakchi, Hossein

    2017-01-01

    Background: Writing, designing, and conducting a clinical trial research proposal has an important role in achieving valid and reliable findings. Thus, this study aimed at critically appraising fundamental information in approved clinical trial research proposals in Mashhad University of Medical Sciences (MUMS) from 2008 to 2014. Methods: This cross-sectional study was conducted on all 935 approved clinical trial research proposals in MUMS from 2008 to 2014. A valid and reliable as well as comprehensive, simple, and usable checklist in sessions with biostatisticians and methodologists, consisting of 11 main items as research tool, were used. Agreement rate between the reviewers of the proposals, who were responsible for data collection, was assessed during 3 sessions, and Kappa statistics was calculated at the last session as 97%. Results: More than 60% of the research proposals had a methodologist consultant, moreover, type of study or study design had been specified in almost all of them (98%). Appropriateness of study aims with hypotheses was not observed in a significant number of research proposals (585 proposals, 62.6%). The required sample size for 66.8% of the approved proposals was based on a sample size formula; however, in 25% of the proposals, sample size formula was not in accordance with the study design. Data collection tool was not selected appropriately in 55.2% of the approved research proposals. Type and method of randomization were unknown in 21% of the proposals and dealing with missing data had not been described in most of them (98%). Inclusion and exclusion criteria were (92%) fully and adequately explained. Moreover, 44% and 31% of the research proposals were moderate and weak in rank, respectively, with respect to the correctness of the statistical analysis methods. Conclusion: Findings of the present study revealed that a large portion of the approved proposals were highly biased or ambiguous with respect to randomization, blinding, dealing with missing data, data collection tool, sampling methods, and statistical analysis. Thus, it is essential to consult and collaborate with a methodologist in all parts of a proposal to control the possible and specific biases in clinical trials. PMID:29445703

  3. Critical appraisal of fundamental items in approved clinical trial research proposals in Mashhad University of Medical Sciences.

    PubMed

    Shakeri, Mohammad-Taghi; Taghipour, Ali; Sadeghi, Masoumeh; Nezami, Hossein; Amirabadizadeh, Ali-Reza; Bonakchi, Hossein

    2017-01-01

    Background: Writing, designing, and conducting a clinical trial research proposal has an important role in achieving valid and reliable findings. Thus, this study aimed at critically appraising fundamental information in approved clinical trial research proposals in Mashhad University of Medical Sciences (MUMS) from 2008 to 2014. Methods: This cross-sectional study was conducted on all 935 approved clinical trial research proposals in MUMS from 2008 to 2014. A valid and reliable as well as comprehensive, simple, and usable checklist in sessions with biostatisticians and methodologists, consisting of 11 main items as research tool, were used. Agreement rate between the reviewers of the proposals, who were responsible for data collection, was assessed during 3 sessions, and Kappa statistics was calculated at the last session as 97%. Results: More than 60% of the research proposals had a methodologist consultant, moreover, type of study or study design had been specified in almost all of them (98%). Appropriateness of study aims with hypotheses was not observed in a significant number of research proposals (585 proposals, 62.6%). The required sample size for 66.8% of the approved proposals was based on a sample size formula; however, in 25% of the proposals, sample size formula was not in accordance with the study design. Data collection tool was not selected appropriately in 55.2% of the approved research proposals. Type and method of randomization were unknown in 21% of the proposals and dealing with missing data had not been described in most of them (98%). Inclusion and exclusion criteria were (92%) fully and adequately explained. Moreover, 44% and 31% of the research proposals were moderate and weak in rank, respectively, with respect to the correctness of the statistical analysis methods. Conclusion: Findings of the present study revealed that a large portion of the approved proposals were highly biased or ambiguous with respect to randomization, blinding, dealing with missing data, data collection tool, sampling methods, and statistical analysis. Thus, it is essential to consult and collaborate with a methodologist in all parts of a proposal to control the possible and specific biases in clinical trials.

  4. Sample size determination for mediation analysis of longitudinal data.

    PubMed

    Pan, Haitao; Liu, Suyu; Miao, Danmin; Yuan, Ying

    2018-03-27

    Sample size planning for longitudinal data is crucial when designing mediation studies because sufficient statistical power is not only required in grant applications and peer-reviewed publications, but is essential to reliable research results. However, sample size determination is not straightforward for mediation analysis of longitudinal design. To facilitate planning the sample size for longitudinal mediation studies with a multilevel mediation model, this article provides the sample size required to achieve 80% power by simulations under various sizes of the mediation effect, within-subject correlations and numbers of repeated measures. The sample size calculation is based on three commonly used mediation tests: Sobel's method, distribution of product method and the bootstrap method. Among the three methods of testing the mediation effects, Sobel's method required the largest sample size to achieve 80% power. Bootstrapping and the distribution of the product method performed similarly and were more powerful than Sobel's method, as reflected by the relatively smaller sample sizes. For all three methods, the sample size required to achieve 80% power depended on the value of the ICC (i.e., within-subject correlation). A larger value of ICC typically required a larger sample size to achieve 80% power. Simulation results also illustrated the advantage of the longitudinal study design. The sample size tables for most encountered scenarios in practice have also been published for convenient use. Extensive simulations study showed that the distribution of the product method and bootstrapping method have superior performance to the Sobel's method, but the product method was recommended to use in practice in terms of less computation time load compared to the bootstrapping method. A R package has been developed for the product method of sample size determination in mediation longitudinal study design.

  5. Ever Enrolled Medicare Population Estimates from the MCBS Access to Care Files

    PubMed Central

    Petroski, Jason; Ferraro, David; Chu, Adam

    2014-01-01

    Objective The Medicare Current Beneficiary Survey’s (MCBS) Access to Care (ATC) file is designed to provide timely access to information on the Medicare population, yet because of the survey’s complex sampling design and expedited processing it is difficult to use the file to make both “always-enrolled” and “ever-enrolled” estimates on the Medicare population. In this study, we describe the ATC file and sample design, and we evaluate and review various alternatives for producing “ever-enrolled” estimates. Methods We created “ever enrolled” estimates for key variables in the MCBS using three separate approaches. We tested differences between the alternative approaches for statistical significance and show the relative magnitude of difference between approaches. Results Even when estimates derived from the different approaches were statistically different, the magnitude of the difference was often sufficiently small so as to result in little practical difference among the alternate approaches. However, when considering more than just the estimation method, there are advantages to using certain approaches over others. Conclusion There are several plausible approaches to achieving “ever-enrolled” estimates in the MCBS ATC file; however, the most straightforward approach appears to be implementation and usage of a new set of “ever-enrolled” weights for this file. PMID:24991484

  6. An efficient algorithm for generating diverse microstructure sets and delineating properties closures

    DOE PAGES

    Johnson, Oliver K.; Kurniawan, Christian

    2018-02-03

    Properties closures delineate the theoretical objective space for materials design problems, allowing designers to make informed trade-offs between competing constraints and target properties. In this paper, we present a new algorithm called hierarchical simplex sampling (HSS) that approximates properties closures more efficiently and faithfully than traditional optimization based approaches. By construction, HSS generates samples of microstructure statistics that span the corresponding microstructure hull. As a result, we also find that HSS can be coupled with synthetic polycrystal generation software to generate diverse sets of microstructures for subsequent mesoscale simulations. Finally, by more broadly sampling the space of possible microstructures, itmore » is anticipated that such diverse microstructure sets will expand our understanding of the influence of microstructure on macroscale effective properties and inform the construction of higher-fidelity mesoscale structure-property models.« less

  7. An efficient algorithm for generating diverse microstructure sets and delineating properties closures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, Oliver K.; Kurniawan, Christian

    Properties closures delineate the theoretical objective space for materials design problems, allowing designers to make informed trade-offs between competing constraints and target properties. In this paper, we present a new algorithm called hierarchical simplex sampling (HSS) that approximates properties closures more efficiently and faithfully than traditional optimization based approaches. By construction, HSS generates samples of microstructure statistics that span the corresponding microstructure hull. As a result, we also find that HSS can be coupled with synthetic polycrystal generation software to generate diverse sets of microstructures for subsequent mesoscale simulations. Finally, by more broadly sampling the space of possible microstructures, itmore » is anticipated that such diverse microstructure sets will expand our understanding of the influence of microstructure on macroscale effective properties and inform the construction of higher-fidelity mesoscale structure-property models.« less

  8. Effects of sample size, number of markers, and allelic richness on the detection of spatial genetic pattern

    USGS Publications Warehouse

    Landguth, Erin L.; Gedy, Bradley C.; Oyler-McCance, Sara J.; Garey, Andrew L.; Emel, Sarah L.; Mumma, Matthew; Wagner, Helene H.; Fortin, Marie-Josée; Cushman, Samuel A.

    2012-01-01

    The influence of study design on the ability to detect the effects of landscape pattern on gene flow is one of the most pressing methodological gaps in landscape genetic research. To investigate the effect of study design on landscape genetics inference, we used a spatially-explicit, individual-based program to simulate gene flow in a spatially continuous population inhabiting a landscape with gradual spatial changes in resistance to movement. We simulated a wide range of combinations of number of loci, number of alleles per locus and number of individuals sampled from the population. We assessed how these three aspects of study design influenced the statistical power to successfully identify the generating process among competing hypotheses of isolation-by-distance, isolation-by-barrier, and isolation-by-landscape resistance using a causal modelling approach with partial Mantel tests. We modelled the statistical power to identify the generating process as a response surface for equilibrium and non-equilibrium conditions after introduction of isolation-by-landscape resistance. All three variables (loci, alleles and sampled individuals) affect the power of causal modelling, but to different degrees. Stronger partial Mantel r correlations between landscape distances and genetic distances were found when more loci were used and when loci were more variable, which makes comparisons of effect size between studies difficult. Number of individuals did not affect the accuracy through mean equilibrium partial Mantel r, but larger samples decreased the uncertainty (increasing the precision) of equilibrium partial Mantel r estimates. We conclude that amplifying more (and more variable) loci is likely to increase the power of landscape genetic inferences more than increasing number of individuals.

  9. Effects of sample size, number of markers, and allelic richness on the detection of spatial genetic pattern

    USGS Publications Warehouse

    Landguth, E.L.; Fedy, B.C.; Oyler-McCance, S.J.; Garey, A.L.; Emel, S.L.; Mumma, M.; Wagner, H.H.; Fortin, M.-J.; Cushman, S.A.

    2012-01-01

    The influence of study design on the ability to detect the effects of landscape pattern on gene flow is one of the most pressing methodological gaps in landscape genetic research. To investigate the effect of study design on landscape genetics inference, we used a spatially-explicit, individual-based program to simulate gene flow in a spatially continuous population inhabiting a landscape with gradual spatial changes in resistance to movement. We simulated a wide range of combinations of number of loci, number of alleles per locus and number of individuals sampled from the population. We assessed how these three aspects of study design influenced the statistical power to successfully identify the generating process among competing hypotheses of isolation-by-distance, isolation-by-barrier, and isolation-by-landscape resistance using a causal modelling approach with partial Mantel tests. We modelled the statistical power to identify the generating process as a response surface for equilibrium and non-equilibrium conditions after introduction of isolation-by-landscape resistance. All three variables (loci, alleles and sampled individuals) affect the power of causal modelling, but to different degrees. Stronger partial Mantel r correlations between landscape distances and genetic distances were found when more loci were used and when loci were more variable, which makes comparisons of effect size between studies difficult. Number of individuals did not affect the accuracy through mean equilibrium partial Mantel r, but larger samples decreased the uncertainty (increasing the precision) of equilibrium partial Mantel r estimates. We conclude that amplifying more (and more variable) loci is likely to increase the power of landscape genetic inferences more than increasing number of individuals. ?? 2011 Blackwell Publishing Ltd.

  10. Geographic and temporal validity of prediction models: Different approaches were useful to examine model performance

    PubMed Central

    Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.

    2017-01-01

    Objective Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics. Conclusion This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, Richard O.

    The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation procedure techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Somemore » statistical techniques given here are not commonly seen in statistics book. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.« less

  12. A Bayesian nonparametric method for prediction in EST analysis

    PubMed Central

    Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor

    2007-01-01

    Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445

  13. Sampling design for spatially distributed hydrogeologic and environmental processes

    USGS Publications Warehouse

    Christakos, G.; Olea, R.A.

    1992-01-01

    A methodology for the design of sampling networks over space is proposed. The methodology is based on spatial random field representations of nonhomogeneous natural processes, and on optimal spatial estimation techniques. One of the most important results of random field theory for physical sciences is its rationalization of correlations in spatial variability of natural processes. This correlation is extremely important both for interpreting spatially distributed observations and for predictive performance. The extent of site sampling and the types of data to be collected will depend on the relationship of subsurface variability to predictive uncertainty. While hypothesis formulation and initial identification of spatial variability characteristics are based on scientific understanding (such as knowledge of the physics of the underlying phenomena, geological interpretations, intuition and experience), the support offered by field data is statistically modelled. This model is not limited by the geometric nature of sampling and covers a wide range in subsurface uncertainties. A factorization scheme of the sampling error variance is derived, which possesses certain atttactive properties allowing significant savings in computations. By means of this scheme, a practical sampling design procedure providing suitable indices of the sampling error variance is established. These indices can be used by way of multiobjective decision criteria to obtain the best sampling strategy. Neither the actual implementation of the in-situ sampling nor the solution of the large spatial estimation systems of equations are necessary. The required values of the accuracy parameters involved in the network design are derived using reference charts (readily available for various combinations of data configurations and spatial variability parameters) and certain simple yet accurate analytical formulas. Insight is gained by applying the proposed sampling procedure to realistic examples related to sampling problems in two dimensions. ?? 1992.

  14. Methodological reporting of randomized trials in five leading Chinese nursing journals.

    PubMed

    Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu

    2014-01-01

    Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34 ± 0.97 (Mean ± SD). No RCT reported descriptions and changes in "trial design," changes in "outcomes" and "implementation," or descriptions of the similarity of interventions for "blinding." Poor reporting was found in detailing the "settings of participants" (13.1%), "type of randomization sequence generation" (1.8%), calculation methods of "sample size" (0.4%), explanation of any interim analyses and stopping guidelines for "sample size" (0.3%), "allocation concealment mechanism" (0.3%), additional analyses in "statistical methods" (2.1%), and targeted subjects and methods of "blinding" (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of "participants," "interventions," and definitions of the "outcomes" and "statistical methods." The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods.

  15. Flow Chamber System for the Statistical Evaluation of Bacterial Colonization on Materials

    PubMed Central

    Menzel, Friederike; Conradi, Bianca; Rodenacker, Karsten; Gorbushina, Anna A.; Schwibbert, Karin

    2016-01-01

    Biofilm formation on materials leads to high costs in industrial processes, as well as in medical applications. This fact has stimulated interest in the development of new materials with improved surfaces to reduce bacterial colonization. Standardized tests relying on statistical evidence are indispensable to evaluate the quality and safety of these new materials. We describe here a flow chamber system for biofilm cultivation under controlled conditions with a total capacity for testing up to 32 samples in parallel. In order to quantify the surface colonization, bacterial cells were DAPI (4`,6-diamidino-2-phenylindole)-stained and examined with epifluorescence microscopy. More than 100 images of each sample were automatically taken and the surface coverage was estimated using the free open source software g’mic, followed by a precise statistical evaluation. Overview images of all gathered pictures were generated to dissect the colonization characteristics of the selected model organism Escherichia coli W3310 on different materials (glass and implant steel). With our approach, differences in bacterial colonization on different materials can be quantified in a statistically validated manner. This reliable test procedure will support the design of improved materials for medical, industrial, and environmental (subaquatic or subaerial) applications. PMID:28773891

  16. Disentangling Vulnerabilities from Outcomes: Distinctions between Trait Affect and Depressive Symptoms in Adolescent and Adult Samples

    PubMed Central

    Harding, Kaitlin A.; Willey, Brittany; Ahles, Joshua; Mezulis, Amy

    2016-01-01

    Background Trait negative affect and trait positive affect are affective vulnerabilities to depressive symptoms in adolescence and adulthood. While trait affect and the state affect characteristic of depressive symptoms are proposed to be theoretically distinct, no studies have established that these constructs are statistically distinct. Therefore, the purpose of the current study was to determine whether the trait affect (e.g. temperament dimensions) that predicts depressive symptoms and the state affect characteristic of depressive symptoms are statistically distinct among early adolescents and adults. We hypothesized that trait negative affect, trait positive affect, and depressive symptoms would represent largely distinct factors in both samples. Method Participants were 268 early adolescents (53.73% female) and 321 young adults (70.09% female) who completed self-report measures of demographic information, trait affect, and depressive symptoms. Results Principal axis factoring with oblique rotation for both samples indicated distinct adolescent factor loadings and overlapping adult factor loadings. Confirmatory factor analyses in both samples supported distinct but related relationships between trait NA, trait PA, and depressive symptoms. Limitations Study limitations include our cross-sectional design that prevented examination of self-reported fluctuations in trait affect and depressive symptoms and the unknown potential effects of self-report biases among adolescents and adults. Conclusions Findings support existing theoretical distinctions between adolescent constructs but highlight a need to revise or remove items to distinguish measurements of adult trait affect and depressive symptoms. Adolescent trait affect and depressive symptoms are statistically distinct, but adult trait affect and depressive symptoms statistically overlap and warrant further consideration. PMID:27085163

  17. Experiments with central-limit properties of spatial samples from locally covariant random fields

    USGS Publications Warehouse

    Barringer, T.H.; Smith, T.E.

    1992-01-01

    When spatial samples are statistically dependent, the classical estimator of sample-mean standard deviation is well known to be inconsistent. For locally dependent samples, however, consistent estimators of sample-mean standard deviation can be constructed. The present paper investigates the sampling properties of one such estimator, designated as the tau estimator of sample-mean standard deviation. In particular, the asymptotic normality properties of standardized sample means based on tau estimators are studied in terms of computer experiments with simulated sample-mean distributions. The effects of both sample size and dependency levels among samples are examined for various value of tau (denoting the size of the spatial kernel for the estimator). The results suggest that even for small degrees of spatial dependency, the tau estimator exhibits significantly stronger normality properties than does the classical estimator of standardized sample means. ?? 1992.

  18. Statistical modelling as an aid to the design of retail sampling plans for mycotoxins in food.

    PubMed

    MacArthur, Roy; MacDonald, Susan; Brereton, Paul; Murray, Alistair

    2006-01-01

    A study has been carried out to assess appropriate statistical models for use in evaluating retail sampling plans for the determination of mycotoxins in food. A compound gamma model was found to be a suitable fit. A simulation model based on the compound gamma model was used to produce operating characteristic curves for a range of parameters relevant to retail sampling. The model was also used to estimate the minimum number of increments necessary to minimize the overall measurement uncertainty. Simulation results showed that measurements based on retail samples (for which the maximum number of increments is constrained by cost) may produce fit-for-purpose results for the measurement of ochratoxin A in dried fruit, but are unlikely to do so for the measurement of aflatoxin B1 in pistachio nuts. In order to produce a more accurate simulation, further work is required to determine the degree of heterogeneity associated with batches of food products. With appropriate parameterization in terms of physical and biological characteristics, the systems developed in this study could be applied to other analyte/matrix combinations.

  19. Gender differences in a clinic-referred sample of Taiwanese attention-deficit/hyperactivity disorder children.

    PubMed

    Yang, Pinchen; Jong, Yuh-Jyh; Chung, Li-Chen; Chen, Cheng-Sheng

    2004-12-01

    The purpose of this study was to examine gender differences within a clinic-referred sample of 6-11-year-old Taiwanese children with attention-deficit/hyperactivity disorder (ADHD)- combined subtype. The subjects were 21 girls with a diagnosis of ADHD from the 4th edition of the Diagnostic and Statistical Manual and 21 age-matched boys with ADHD. Comparisons were made of behavioral ratings, cognitive profiles, and vigilance/attention assessments between these two groups. The results found ADHD girls and ADHD boys to be statistically indistinguishable on nearly all measures except the subtests of block design (P = 0.016), the discrepancy between Performance Intelligence Quotient and Verbal Intelligence Quotient (P = 0.019), and the discrepancy between fluid and crystallized IQ (P = 0.041). In the study samples, ADHD girls and ADHD boys were strikingly similar on a wide range of measures. ADHD boys and girls in clinics may be expected to show more similarities than differences in treatment needs. However, these results should be interpreted with caution since data were only from clinic-referred samples.

  20. Teaching Efficacy in the Classroom: Skill Based Training for Teachers' Empowerment

    ERIC Educational Resources Information Center

    Karimzadeh, Mansoureh; Salehi, Hadi; Embi, Mohamed Amin; Nasiri, Mehdi; Shojaee, Mohammad

    2014-01-01

    This study aims to use an experimental research design to enhance teaching efficacy by social-emotional skills training in teachers. The statistical sample comprised of 68 elementary teachers (grades 4 and 5) with at least 10 years teaching experience and a bachelor's degree who were randomly assigned into control (18 female, 16 male) and…

  1. Detecting Mixtures from Structural Model Differences Using Latent Variable Mixture Modeling: A Comparison of Relative Model Fit Statistics

    ERIC Educational Resources Information Center

    Henson, James M.; Reise, Steven P.; Kim, Kevin H.

    2007-01-01

    The accuracy of structural model parameter estimates in latent variable mixture modeling was explored with a 3 (sample size) [times] 3 (exogenous latent mean difference) [times] 3 (endogenous latent mean difference) [times] 3 (correlation between factors) [times] 3 (mixture proportions) factorial design. In addition, the efficacy of several…

  2. THEMATIC ACCURACY OF THE 1992 NATIONAL LAND-COVER DATA (NLCD) FOR THE EASTERN UNITED STATES: STATISTICAL METHODOLOGY AND REGIONAL RESULTS

    EPA Science Inventory

    The accuracy of the National Land Cover Data (NLCD) map is assessed via a probability sampling design incorporating three levels of stratification and two stages of selection. Agreement between the map and reference land-cover labels is defined as a match between the primary or a...

  3. Perceptions of Support, Induction, and Intentions by Secondary Science and Mathematics Teachers on Job Retention

    ERIC Educational Resources Information Center

    Bond, Sharon C.

    2012-01-01

    This study was designed to examine the teacher characteristics, workplace factors, and type of induction supports that contribute to the retention of secondary science and mathematics teachers. Using the sample of secondary science and mathematics teachers extracted from the National Center for Educational Statistics (NCES) 2007-2008 Schools and…

  4. Alternative Natural Resource Monitoring Strategies in the Mexican States of Jalisco and Colima

    Treesearch

    Cele Aguirre-Bravo; Hans Schreuder

    2005-01-01

    This paper presents a strategy for inventorying and monitoring the natural resources in the Mexican states of Jalisco and Colima. The strategy emphasizes a strong linkage between remote sensing with field sampling design to produce statistical summaries and spatial estimates at multiple scales and resolution levels. Outputs derived from this strategy are expected to...

  5. Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets

    DTIC Science & Technology

    2017-07-01

    principled methodology for two-sample graph testing; designed a provably almost-surely perfect vertex clustering algorithm for block model graphs; proved...3.7 Semi-Supervised Clustering Methodology ...................................................................... 9 3.8 Robust Hypothesis Testing...dimensional Euclidean space – allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for

  6. Women's Liberation Scale (WLS): A Measure of Attitudes Toward Positions Advocated by Women's Groups.

    ERIC Educational Resources Information Center

    Goldberg, Carlos

    The Women's Liberation Scale (WLS) is a 14-item, Likert-type scale designed to measure attitudes toward positions advocated by women's groups. The WLS and its four-alternative response schema is presented, along with descriptive statistics of scores based on male and female college samples. Reliability and validity measures are reported, and the…

  7. Optimal Sample Size Determinations for the Heteroscedastic Two One-Sided Tests of Mean Equivalence: Design Schemes and Software Implementations

    ERIC Educational Resources Information Center

    Jan, Show-Li; Shieh, Gwowen

    2017-01-01

    Equivalence assessment is becoming an increasingly important topic in many application areas including behavioral and social sciences research. Although there exist more powerful tests, the two one-sided tests (TOST) procedure is a technically transparent and widely accepted method for establishing statistical equivalence. Alternatively, a direct…

  8. Randomness, Sample Size, Imagination and Metacognition: Making Judgments about Differences in Data Sets

    ERIC Educational Resources Information Center

    Stack, Sue; Watson, Jane

    2013-01-01

    There is considerable research on the difficulties students have in conceptualising individual concepts of probability and statistics (see for example, Bryant & Nunes, 2012; Jones, 2005). The unit of work developed for the action research project described in this article is specifically designed to address some of these in order to help…

  9. Indicator organisms in meat and poultry slaughter operations: their potential use in process control and the role of emerging technologies.

    PubMed

    Saini, Parmesh K; Marks, Harry M; Dreyfuss, Moshe S; Evans, Peter; Cook, L Victor; Dessai, Uday

    2011-08-01

    Measuring commonly occurring, nonpathogenic organisms on poultry products may be used for designing statistical process control systems that could result in reductions of pathogen levels. The extent of pathogen level reduction that could be obtained from actions resulting from monitoring these measurements over time depends upon the degree of understanding cause-effect relationships between processing variables, selected output variables, and pathogens. For such measurements to be effective for controlling or improving processing to some capability level within the statistical process control context, sufficiently frequent measurements would be needed to help identify processing deficiencies. Ultimately the correct balance of sampling and resources is determined by those characteristics of deficient processing that are important to identify. We recommend strategies that emphasize flexibility, depending upon sampling objectives. Coupling the measurement of levels of indicator organisms with practical emerging technologies and suitable on-site platforms that decrease the time between sample collections and interpreting results would enhance monitoring process control.

  10. [Application of statistics on chronic-diseases-relating observational research papers].

    PubMed

    Hong, Zhi-heng; Wang, Ping; Cao, Wei-hua

    2012-09-01

    To study the application of statistics on Chronic-diseases-relating observational research papers which were recently published in the Chinese Medical Association Magazines, with influential index above 0.5. Using a self-developed criterion, two investigators individually participated in assessing the application of statistics on Chinese Medical Association Magazines, with influential index above 0.5. Different opinions reached an agreement through discussion. A total number of 352 papers from 6 magazines, including the Chinese Journal of Epidemiology, Chinese Journal of Oncology, Chinese Journal of Preventive Medicine, Chinese Journal of Cardiology, Chinese Journal of Internal Medicine and Chinese Journal of Endocrinology and Metabolism, were reviewed. The rate of clear statement on the following contents as: research objectives, t target audience, sample issues, objective inclusion criteria and variable definitions were 99.43%, 98.57%, 95.43%, 92.86% and 96.87%. The correct rates of description on quantitative and qualitative data were 90.94% and 91.46%, respectively. The rates on correctly expressing the results, on statistical inference methods related to quantitative, qualitative data and modeling were 100%, 95.32% and 87.19%, respectively. 89.49% of the conclusions could directly response to the research objectives. However, 69.60% of the papers did not mention the exact names of the study design, statistically, that the papers were using. 11.14% of the papers were in lack of further statement on the exclusion criteria. Percentage of the papers that could clearly explain the sample size estimation only taking up as 5.16%. Only 24.21% of the papers clearly described the variable value assignment. Regarding the introduction on statistical conduction and on database methods, the rate was only 24.15%. 18.75% of the papers did not express the statistical inference methods sufficiently. A quarter of the papers did not use 'standardization' appropriately. As for the aspect of statistical inference, the rate of description on statistical testing prerequisite was only 24.12% while 9.94% papers did not even employ the statistical inferential method that should be used. The main deficiencies on the application of Statistics used in papers related to Chronic-diseases-related observational research were as follows: lack of sample-size determination, variable value assignment description not sufficient, methods on statistics were not introduced clearly or properly, lack of consideration for pre-requisition regarding the use of statistical inferences.

  11. Sampling design for long-term regional trends in marine rocky intertidal communities

    USGS Publications Warehouse

    Irvine, Gail V.; Shelley, Alice

    2013-01-01

    Probability-based designs reduce bias and allow inference of results to the pool of sites from which they were chosen. We developed and tested probability-based designs for monitoring marine rocky intertidal assemblages at Glacier Bay National Park and Preserve (GLBA), Alaska. A multilevel design was used that varied in scale and inference. The levels included aerial surveys, extensive sampling of 25 sites, and more intensive sampling of 6 sites. Aerial surveys of a subset of intertidal habitat indicated that the original target habitat of bedrock-dominated sites with slope ≤30° was rare. This unexpected finding illustrated one value of probability-based surveys and led to a shift in the target habitat type to include steeper, more mixed rocky habitat. Subsequently, we evaluated the statistical power of different sampling methods and sampling strategies to detect changes in the abundances of the predominant sessile intertidal taxa: barnacles Balanomorpha, the mussel Mytilus trossulus, and the rockweed Fucus distichus subsp. evanescens. There was greatest power to detect trends in Mytilus and lesser power for barnacles and Fucus. Because of its greater power, the extensive, coarse-grained sampling scheme was adopted in subsequent years over the intensive, fine-grained scheme. The sampling attributes that had the largest effects on power included sampling of “vertical” line transects (vs. horizontal line transects or quadrats) and increasing the number of sites. We also evaluated the power of several management-set parameters. Given equal sampling effort, sampling more sites fewer times had greater power. The information gained through intertidal monitoring is likely to be useful in assessing changes due to climate, including ocean acidification; invasive species; trampling effects; and oil spills.

  12. Network Sampling with Memory: A proposal for more efficient sampling from social networks.

    PubMed

    Mouw, Ted; Verdery, Ashton M

    2012-08-01

    Techniques for sampling from networks have grown into an important area of research across several fields. For sociologists, the possibility of sampling from a network is appealing for two reasons: (1) A network sample can yield substantively interesting data about network structures and social interactions, and (2) it is useful in situations where study populations are difficult or impossible to survey with traditional sampling approaches because of the lack of a sampling frame. Despite its appeal, methodological concerns about the precision and accuracy of network-based sampling methods remain. In particular, recent research has shown that sampling from a network using a random walk based approach such as Respondent Driven Sampling (RDS) can result in high design effects (DE)-the ratio of the sampling variance to the sampling variance of simple random sampling (SRS). A high design effect means that more cases must be collected to achieve the same level of precision as SRS. In this paper we propose an alternative strategy, Network Sampling with Memory (NSM), which collects network data from respondents in order to reduce design effects and, correspondingly, the number of interviews needed to achieve a given level of statistical power. NSM combines a "List" mode, where all individuals on the revealed network list are sampled with the same cumulative probability, with a "Search" mode, which gives priority to bridge nodes connecting the current sample to unexplored parts of the network. We test the relative efficiency of NSM compared to RDS and SRS on 162 school and university networks from Add Health and Facebook that range in size from 110 to 16,278 nodes. The results show that the average design effect for NSM on these 162 networks is 1.16, which is very close to the efficiency of a simple random sample (DE=1), and 98.5% lower than the average DE we observed for RDS.

  13. Network Sampling with Memory: A proposal for more efficient sampling from social networks

    PubMed Central

    Mouw, Ted; Verdery, Ashton M.

    2013-01-01

    Techniques for sampling from networks have grown into an important area of research across several fields. For sociologists, the possibility of sampling from a network is appealing for two reasons: (1) A network sample can yield substantively interesting data about network structures and social interactions, and (2) it is useful in situations where study populations are difficult or impossible to survey with traditional sampling approaches because of the lack of a sampling frame. Despite its appeal, methodological concerns about the precision and accuracy of network-based sampling methods remain. In particular, recent research has shown that sampling from a network using a random walk based approach such as Respondent Driven Sampling (RDS) can result in high design effects (DE)—the ratio of the sampling variance to the sampling variance of simple random sampling (SRS). A high design effect means that more cases must be collected to achieve the same level of precision as SRS. In this paper we propose an alternative strategy, Network Sampling with Memory (NSM), which collects network data from respondents in order to reduce design effects and, correspondingly, the number of interviews needed to achieve a given level of statistical power. NSM combines a “List” mode, where all individuals on the revealed network list are sampled with the same cumulative probability, with a “Search” mode, which gives priority to bridge nodes connecting the current sample to unexplored parts of the network. We test the relative efficiency of NSM compared to RDS and SRS on 162 school and university networks from Add Health and Facebook that range in size from 110 to 16,278 nodes. The results show that the average design effect for NSM on these 162 networks is 1.16, which is very close to the efficiency of a simple random sample (DE=1), and 98.5% lower than the average DE we observed for RDS. PMID:24159246

  14. Measurement of bite force variables related to human discrimination of left-right hardness differences of silicone rubber samples placed between the incisors.

    PubMed

    Dan, Haruka; Azuma, Teruaki; Hayakawa, Fumiyo; Kohyama, Kaoru

    2005-05-01

    This study was designed to examine human subjects' ability to discriminate between spatially different bite pressures. We measured actual bite pressure distribution when subjects simultaneously bit two silicone rubber samples with different hardnesses using their right and left incisors. They were instructed to compare the hardness of these two rubber samples and indicate which was harder (right or left). The correct-answer rates were statistically significant at P < 0.05 for all pairs of different right and left silicone rubber hardnesses. Simultaneous bite measurements using a multiple-point sheet sensor demonstrated that the bite force, active pressure and maximum pressure point were greater for the harder silicone rubber sample. The difference between the left and right was statistically significant (P < 0.05) for all pairs with different silicone rubber hardnesses. We demonstrated for the first time that subjects could perceive and discriminate between spatially different bite pressures during a single bite with incisors. Differences of the bite force, pressure and the maximum pressure point between the right and left silicone samples should be sensory cues for spatial hardness discrimination.

  15. A methodological analysis of chaplaincy research: 2000-2009.

    PubMed

    Galek, Kathleen; Flannelly, Kevin J; Jankowski, Katherine R B; Handzo, George F

    2011-01-01

    The present article presents a comprehensive review and analysis of quantitative research conducted in the United States on chaplaincy and closely related topics published between 2000 and 2009. A combined search strategy identified 49 quantitative studies in 13 journals. The analysis focuses on the methodological sophistication of the studies, compared to earlier research on chaplaincy and pastoral care. Cross-sectional surveys of convenience samples still dominate the field, but sample sizes have increased somewhat over the past three decades. Reporting of the validity and reliability of measures continues to be low, although reporting of response rates has improved. Improvements in the use of inferential statistics and statistical controls were also observed, compared to previous research. The authors conclude that more experimental research is needed on chaplaincy, along with an increased use of hypothesis testing, regardless of the research designs that are used.

  16. Environmental Monitoring and Assessment Program Western Pilot Project - Information about selected fish and macroinvertebrates sampled from North Dakota perennial streams, 2000-2003

    USGS Publications Warehouse

    Vining, Kevin C.; Lundgren, Robert F.

    2008-01-01

    Sixty-five sampling sites, selected by a statistical design to represent lengths of perennial streams in North Dakota, were chosen to be sampled for fish and aquatic insects (macroinvertebrates) to establish unbiased baseline data. Channel catfish and common carp were the most abundant game and large fish species in the Cultivated Plains and Rangeland Plains, respectively. Blackflies were present in more than 50 percent of stream lengths sampled in the State; mayflies and caddisflies were present in more than 80 percent. Dragonflies were present in a greater percentage of stream lengths in the Rangeland Plains than in the Cultivated Plains.

  17. Studies in Support of the Application of Statistical Theory to Design and Evaluation of Operational Tests. Annex D. An Application of Bayesian Statistical Methods in the Determination of Sample Size for Operational Testing in the U.S. Army

    DTIC Science & Technology

    1977-07-01

    SIZE C XNI. C UE2 - UTILITY OF EXPERIMENT OF SIZE C XN2. C ICHECK - VARIABLE USLD TO CHECK FOR C TERMINATION, C~C DIMENSION SUBLIM{20),UPLIM(20),UEI(20...1J=UPLIM(K4-I)-XNI (K+1)+SU8LIt1(K+i*. C CHECK FOR TERMINATION. 944 ICHECK =SUBLIM(K)+2 IFIICHECK.GEUPLiHMK.,OR.K.G1.20’ GO TO 930 GO TO 920 930

  18. Conception et realisation d'un echantillonneur de grande vitesse en technologie HIGFET (transistor a effet de champ avec heterostructure et grille isolee)

    NASA Astrophysics Data System (ADS)

    Tazlauanu, Mihai

    The research work reported in this thesis details a new fabrication technology for high speed integrated circuits in the broadest sense, including original contributions to device modeling, circuit simulation, integrated circuit design, wafer fabrication, micro-physical and electrical characterization, process flow and final device testing as part of an electrical system. The primary building block of this technology is the heterostructure insulated gate field effect transistor, HIGFET. We used an InP/InGaAs epitaxial heterostructure to ensure a high charge carrier mobility and hence obtain a higher operating frequency than that currently possible for silicon devices. We designed and built integrated circuits with two system architectures. The first architecture integrates the clock signal generator with the sample and hold circuitry on the InP die, while the second is a hybrid architecture of an InP sample and hold assembled with an external clock signal generator made with ECL circuits on GaAs. To generate the clock signals on the same die with the sample and hold circuits, we developed a digital circuit family based on an original inverter, appropriate for depletion mode NMOS technology. We used this circuit to design buffer amplifiers and ring oscillators. Four mask sets produced in a Cadence environment, have permitted the fabrication of test and working devices. Each new mask generation has reflected the previous achievements and has implemented new structures and circuit techniques. The fabrication technology has undergone successive modifications and refinements to optimize device manufacturing. Particular attention has been paid to the technological robustness. The plasma enhanced etching process (RIE) had been used for an exhaustive study for the statistical simulation of the technological steps. Electrical measurements, performed on the experimental samples, have permitted the modeling of the devices, technological processing to be adjusted and circuit design improved. Electrical measurements performed on dedicated test structures, during the fabrication cycle, allowed the identification and correction of some technological problems (ohmic contacts, current leakage, interconnection integrity, and thermal instabilities). Feedback corrections were validated by dedicated experiments with the experimental effort optimized by statistical techniques (factorial fractional design). (Abstract shortened by UMI.)

  19. Misrepresenting random sampling? A systematic review of research papers in the Journal of Advanced Nursing.

    PubMed

    Williamson, Graham R

    2003-11-01

    This paper discusses the theoretical limitations of the use of random sampling and probability theory in the production of a significance level (or P-value) in nursing research. Potential alternatives, in the form of randomization tests, are proposed. Research papers in nursing, medicine and psychology frequently misrepresent their statistical findings, as the P-values reported assume random sampling. In this systematic review of studies published between January 1995 and June 2002 in the Journal of Advanced Nursing, 89 (68%) studies broke this assumption because they used convenience samples or entire populations. As a result, some of the findings may be questionable. The key ideas of random sampling and probability theory for statistical testing (for generating a P-value) are outlined. The result of a systematic review of research papers published in the Journal of Advanced Nursing is then presented, showing how frequently random sampling appears to have been misrepresented. Useful alternative techniques that might overcome these limitations are then discussed. REVIEW LIMITATIONS: This review is limited in scope because it is applied to one journal, and so the findings cannot be generalized to other nursing journals or to nursing research in general. However, it is possible that other nursing journals are also publishing research articles based on the misrepresentation of random sampling. The review is also limited because in several of the articles the sampling method was not completely clearly stated, and in this circumstance a judgment has been made as to the sampling method employed, based on the indications given by author(s). Quantitative researchers in nursing should be very careful that the statistical techniques they use are appropriate for the design and sampling methods of their studies. If the techniques they employ are not appropriate, they run the risk of misinterpreting findings by using inappropriate, unrepresentative and biased samples.

  20. A Guerilla Guide to Common Problems in ‘Neurostatistics’: Essential Statistical Topics in Neuroscience

    PubMed Central

    Smith, Paul F.

    2017-01-01

    Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins. PMID:29371855

  1. A Guerilla Guide to Common Problems in 'Neurostatistics': Essential Statistical Topics in Neuroscience.

    PubMed

    Smith, Paul F

    2017-01-01

    Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins.

  2. Combining band recovery data and Pollock's robust design to model temporary and permanent emigration

    USGS Publications Warehouse

    Lindberg, M.S.; Kendall, W.L.; Hines, J.E.; Anderson, M.G.

    2001-01-01

    Capture-recapture models are widely used to estimate demographic parameters of marked populations. Recently, this statistical theory has been extended to modeling dispersal of open populations. Multistate models can be used to estimate movement probabilities among subdivided populations if multiple sites are sampled. Frequently, however, sampling is limited to a single site. Models described by Burnham (1993, in Marked Individuals in the Study of Bird Populations, 199-213), which combined open population capture-recapture and band-recovery models, can be used to estimate permanent emigration when sampling is limited to a single population. Similarly, Kendall, Nichols, and Hines (1997, Ecology 51, 563-578) developed models to estimate temporary emigration under Pollock's (1982, Journal of Wildlife Management 46, 757-760) robust design. We describe a likelihood-based approach to simultaneously estimate temporary and permanent emigration when sampling is limited to a single population. We use a sampling design that combines the robust design and recoveries of individuals obtained immediately following each sampling period. We present a general form for our model where temporary emigration is a first-order Markov process, and we discuss more restrictive models. We illustrate these models with analysis of data on marked Canvasback ducks. Our analysis indicates that probability of permanent emigration for adult female Canvasbacks was 0.193 (SE = 0.082) and that birds that were present at the study area in year i - 1 had a higher probability of presence in year i than birds that were not present in year i - 1.

  3. Systematic versus random sampling in stereological studies.

    PubMed

    West, Mark J

    2012-12-01

    The sampling that takes place at all levels of an experimental design must be random if the estimate is to be unbiased in a statistical sense. There are two fundamental ways by which one can make a random sample of the sections and positions to be probed on the sections. Using a card-sampling analogy, one can pick any card at all out of a deck of cards. This is referred to as independent random sampling because the sampling of any one card is made without reference to the position of the other cards. The other approach to obtaining a random sample would be to pick a card within a set number of cards and others at equal intervals within the deck. Systematic sampling along one axis of many biological structures is more efficient than random sampling, because most biological structures are not randomly organized. This article discusses the merits of systematic versus random sampling in stereological studies.

  4. Design and analysis issues in gene and environment studies

    PubMed Central

    2012-01-01

    Both nurture (environmental) and nature (genetic factors) play an important role in human disease etiology. Traditionally, these effects have been thought of as independent. This perspective is ill informed for non-mendelian complex disorders which result as an interaction between genetics and environment. To understand health and disease we must study how nature and nurture interact. Recent advances in human genomics and high-throughput biotechnology make it possible to study large numbers of genetic markers and gene products simultaneously to explore their interactions with environment. The purpose of this review is to discuss design and analytic issues for gene-environment interaction studies in the “-omics” era, with a focus on environmental and genetic epidemiological studies. We present an expanded environmental genomic disease paradigm. We discuss several study design issues for gene-environmental interaction studies, including confounding and selection bias, measurement of exposures and genotypes. We discuss statistical issues in studying gene-environment interactions in different study designs, such as choices of statistical models, assumptions regarding biological factors, and power and sample size considerations, especially in genome-wide gene-environment studies. Future research directions are also discussed. PMID:23253229

  5. Design and analysis issues in gene and environment studies.

    PubMed

    Liu, Chen-yu; Maity, Arnab; Lin, Xihong; Wright, Robert O; Christiani, David C

    2012-12-19

    Both nurture (environmental) and nature (genetic factors) play an important role in human disease etiology. Traditionally, these effects have been thought of as independent. This perspective is ill informed for non-mendelian complex disorders which result as an interaction between genetics and environment. To understand health and disease we must study how nature and nurture interact. Recent advances in human genomics and high-throughput biotechnology make it possible to study large numbers of genetic markers and gene products simultaneously to explore their interactions with environment. The purpose of this review is to discuss design and analytic issues for gene-environment interaction studies in the "-omics" era, with a focus on environmental and genetic epidemiological studies. We present an expanded environmental genomic disease paradigm. We discuss several study design issues for gene-environmental interaction studies, including confounding and selection bias, measurement of exposures and genotypes. We discuss statistical issues in studying gene-environment interactions in different study designs, such as choices of statistical models, assumptions regarding biological factors, and power and sample size considerations, especially in genome-wide gene-environment studies. Future research directions are also discussed.

  6. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, P.; Beaudet, P.

    1980-01-01

    The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on apriori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classfication is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76 compared to the theoretically optimum .79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.

  7. 45 CFR 160.536 - Statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 45 Public Welfare 1 2010-10-01 2010-10-01 false Statistical sampling. 160.536 Section 160.536... REQUIREMENTS GENERAL ADMINISTRATIVE REQUIREMENTS Procedures for Hearings § 160.536 Statistical sampling. (a) In... statistical sampling study as evidence of the number of violations under § 160.406 of this part, or the...

  8. 42 CFR 1003.133 - Statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 5 2011-10-01 2011-10-01 false Statistical sampling. 1003.133 Section 1003.133... AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting... statistical sampling study as evidence of the number and amount of claims and/or requests for payment as...

  9. 45 CFR 160.536 - Statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 45 Public Welfare 1 2011-10-01 2011-10-01 false Statistical sampling. 160.536 Section 160.536... REQUIREMENTS GENERAL ADMINISTRATIVE REQUIREMENTS Procedures for Hearings § 160.536 Statistical sampling. (a) In... statistical sampling study as evidence of the number of violations under § 160.406 of this part, or the...

  10. 42 CFR 1003.133 - Statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 5 2010-10-01 2010-10-01 false Statistical sampling. 1003.133 Section 1003.133... AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting... statistical sampling study as evidence of the number and amount of claims and/or requests for payment as...

  11. 42 CFR 405.1064 - ALJ decisions involving statistical samples.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 2 2011-10-01 2011-10-01 false ALJ decisions involving statistical samples. 405... Medicare Coverage Policies § 405.1064 ALJ decisions involving statistical samples. When an appeal from the QIC involves an overpayment issue and the QIC used a statistical sample in reaching its...

  12. 42 CFR 405.1064 - ALJ decisions involving statistical samples.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 2 2010-10-01 2010-10-01 false ALJ decisions involving statistical samples. 405... Medicare Coverage Policies § 405.1064 ALJ decisions involving statistical samples. When an appeal from the QIC involves an overpayment issue and the QIC used a statistical sample in reaching its...

  13. Users manual for the US baseline corn and soybean segment classification procedure

    NASA Technical Reports Server (NTRS)

    Horvath, R.; Colwell, R. (Principal Investigator); Hay, C.; Metzler, M.; Mykolenko, O.; Odenweller, J.; Rice, D.

    1981-01-01

    A user's manual for the classification component of the FY-81 U.S. Corn and Soybean Pilot Experiment in the Foreign Commodity Production Forecasting Project of AgRISTARS is presented. This experiment is one of several major experiments in AgRISTARS designed to measure and advance the remote sensing technologies for cropland inventory. The classification procedure discussed is designed to produce segment proportion estimates for corn and soybeans in the U.S. Corn Belt (Iowa, Indiana, and Illinois) using LANDSAT data. The estimates are produced by an integrated Analyst/Machine procedure. The Analyst selects acquisitions, participates in stratification, and assigns crop labels to selected samples. In concert with the Analyst, the machine digitally preprocesses LANDSAT data to remove external effects, stratifies the data into field like units and into spectrally similar groups, statistically samples the data for Analyst labeling, and combines the labeled samples into a final estimate.

  14. POWER ANALYSIS FOR COMPLEX MEDIATIONAL DESIGNS USING MONTE CARLO METHODS

    PubMed Central

    Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.

    2013-01-01

    Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex mediational models. The approach is based on the well known technique of generating a large number of samples in a Monte Carlo study, and estimating power as the percentage of cases in which an estimate of interest is significantly different from zero. Examples of power calculation for commonly used mediational models are provided. Power analyses for the single mediator, multiple mediators, three-path mediation, mediation with latent variables, moderated mediation, and mediation in longitudinal designs are described. Annotated sample syntax for Mplus is appended and tabled values of required sample sizes are shown for some models. PMID:23935262

  15. Statistical analysis of nonmonotonic dose-response relationships: research design and analysis of nasal cell proliferation in rats exposed to formaldehyde.

    PubMed

    Gaylor, David W; Lutz, Werner K; Conolly, Rory B

    2004-01-01

    Statistical analyses of nonmonotonic dose-response curves are proposed, experimental designs to detect low-dose effects of J-shaped curves are suggested, and sample sizes are provided. For quantal data such as cancer incidence rates, much larger numbers of animals are required than for continuous data such as biomarker measurements. For example, 155 animals per dose group are required to have at least an 80% chance of detecting a decrease from a 20% incidence in controls to an incidence of 10% at a low dose. For a continuous measurement, only 14 animals per group are required to have at least an 80% chance of detecting a change of the mean by one standard deviation of the control group. Experimental designs based on three dose groups plus controls are discussed to detect nonmonotonicity or to estimate the zero equivalent dose (ZED), i.e., the dose that produces a response equal to the average response in the controls. Cell proliferation data in the nasal respiratory epithelium of rats exposed to formaldehyde by inhalation are used to illustrate the statistical procedures. Statistically significant departures from a monotonic dose response were obtained for time-weighted average labeling indices with an estimated ZED at a formaldehyde dose of 5.4 ppm, with a lower 95% confidence limit of 2.7 ppm. It is concluded that demonstration of a statistically significant bi-phasic dose-response curve, together with estimation of the resulting ZED, could serve as a point-of departure in establishing a reference dose for low-dose risk assessment.

  16. A statistical experiment design approach for optimizing biodegradation of weathered crude oil in coastal sediments.

    PubMed

    Mohajeri, Leila; Aziz, Hamidi Abdul; Isa, Mohamed Hasnain; Zahed, Mohammad Ali

    2010-02-01

    This work studied the bioremediation of weathered crude oil (WCO) in coastal sediment samples using central composite face centered design (CCFD) under response surface methodology (RSM). Initial oil concentration, biomass, nitrogen and phosphorus concentrations were used as independent variables (factors) and oil removal as dependent variable (response) in a 60 days trial. A statistically significant model for WCO removal was obtained. The coefficient of determination (R(2)=0.9732) and probability value (P<0.0001) demonstrated significance for the regression model. Numerical optimization based on desirability function were carried out for initial oil concentration of 2, 16 and 30 g per kg sediment and 83.13, 78.06 and 69.92 per cent removal were observed respectively, compare to 77.13, 74.17 and 69.87 per cent removal for un-optimized results.

  17. A modified weighted function method for parameter estimation of Pearson type three distribution

    NASA Astrophysics Data System (ADS)

    Liang, Zhongmin; Hu, Yiming; Li, Binquan; Yu, Zhongbo

    2014-04-01

    In this paper, an unconventional method called Modified Weighted Function (MWF) is presented for the conventional moment estimation of a probability distribution function. The aim of MWF is to estimate the coefficient of variation (CV) and coefficient of skewness (CS) from the original higher moment computations to the first-order moment calculations. The estimators for CV and CS of Pearson type three distribution function (PE3) were derived by weighting the moments of the distribution with two weight functions, which were constructed by combining two negative exponential-type functions. The selection of these weight functions was based on two considerations: (1) to relate weight functions to sample size in order to reflect the relationship between the quantity of sample information and the role of weight function and (2) to allocate more weights to data close to medium-tail positions in a sample series ranked in an ascending order. A Monte-Carlo experiment was conducted to simulate a large number of samples upon which statistical properties of MWF were investigated. For the PE3 parent distribution, results of MWF were compared to those of the original Weighted Function (WF) and Linear Moments (L-M). The results indicate that MWF was superior to WF and slightly better than L-M, in terms of statistical unbiasness and effectiveness. In addition, the robustness of MWF, WF, and L-M were compared by designing the Monte-Carlo experiment that samples are obtained from Log-Pearson type three distribution (LPE3), three parameter Log-Normal distribution (LN3), and Generalized Extreme Value distribution (GEV), respectively, but all used as samples from the PE3 distribution. The results show that in terms of statistical unbiasness, no one method possesses the absolutely overwhelming advantage among MWF, WF, and L-M, while in terms of statistical effectiveness, the MWF is superior to WF and L-M.

  18. Sampling considerations for disease surveillance in wildlife populations

    USGS Publications Warehouse

    Nusser, S.M.; Clark, W.R.; Otis, D.L.; Huang, L.

    2008-01-01

    Disease surveillance in wildlife populations involves detecting the presence of a disease, characterizing its prevalence and spread, and subsequent monitoring. A probability sample of animals selected from the population and corresponding estimators of disease prevalence and detection provide estimates with quantifiable statistical properties, but this approach is rarely used. Although wildlife scientists often assume probability sampling and random disease distributions to calculate sample sizes, convenience samples (i.e., samples of readily available animals) are typically used, and disease distributions are rarely random. We demonstrate how landscape-based simulation can be used to explore properties of estimators from convenience samples in relation to probability samples. We used simulation methods to model what is known about the habitat preferences of the wildlife population, the disease distribution, and the potential biases of the convenience-sample approach. Using chronic wasting disease in free-ranging deer (Odocoileus virginianus) as a simple illustration, we show that using probability sample designs with appropriate estimators provides unbiased surveillance parameter estimates but that the selection bias and coverage errors associated with convenience samples can lead to biased and misleading results. We also suggest practical alternatives to convenience samples that mix probability and convenience sampling. For example, a sample of land areas can be selected using a probability design that oversamples areas with larger animal populations, followed by harvesting of individual animals within sampled areas using a convenience sampling method.

  19. Statistical sensitivity analysis of a simple nuclear waste repository model

    NASA Astrophysics Data System (ADS)

    Ronen, Y.; Lucius, J. L.; Blow, E. M.

    1980-06-01

    A preliminary step in a comprehensive sensitivity analysis of the modeling of a nuclear waste repository. The purpose of the complete analysis is to determine which modeling parameters and physical data are most important in determining key design performance criteria and then to obtain the uncertainty in the design for safety considerations. The theory for a statistical screening design methodology is developed for later use in the overall program. The theory was applied to the test case of determining the relative importance of the sensitivity of near field temperature distribution in a single level salt repository to modeling parameters. The exact values of the sensitivities to these physical and modeling parameters were then obtained using direct methods of recalculation. The sensitivity coefficients found to be important for the sample problem were thermal loading, distance between the spent fuel canisters and their radius. Other important parameters were those related to salt properties at a point of interest in the repository.

  20. Sampling designs matching species biology produce accurate and affordable abundance indices

    PubMed Central

    Farley, Sean; Russell, Gareth J.; Butler, Matthew J.; Selinger, Jeff

    2013-01-01

    Wildlife biologists often use grid-based designs to sample animals and generate abundance estimates. Although sampling in grids is theoretically sound, in application, the method can be logistically difficult and expensive when sampling elusive species inhabiting extensive areas. These factors make it challenging to sample animals and meet the statistical assumption of all individuals having an equal probability of capture. Violating this assumption biases results. Does an alternative exist? Perhaps by sampling only where resources attract animals (i.e., targeted sampling), it would provide accurate abundance estimates more efficiently and affordably. However, biases from this approach would also arise if individuals have an unequal probability of capture, especially if some failed to visit the sampling area. Since most biological programs are resource limited, and acquiring abundance data drives many conservation and management applications, it becomes imperative to identify economical and informative sampling designs. Therefore, we evaluated abundance estimates generated from grid and targeted sampling designs using simulations based on geographic positioning system (GPS) data from 42 Alaskan brown bears (Ursus arctos). Migratory salmon drew brown bears from the wider landscape, concentrating them at anadromous streams. This provided a scenario for testing the targeted approach. Grid and targeted sampling varied by trap amount, location (traps placed randomly, systematically or by expert opinion), and traps stationary or moved between capture sessions. We began by identifying when to sample, and if bears had equal probability of capture. We compared abundance estimates against seven criteria: bias, precision, accuracy, effort, plus encounter rates, and probabilities of capture and recapture. One grid (49 km2 cells) and one targeted configuration provided the most accurate results. Both placed traps by expert opinion and moved traps between capture sessions, which raised capture probabilities. The grid design was least biased (−10.5%), but imprecise (CV 21.2%), and used most effort (16,100 trap-nights). The targeted configuration was more biased (−17.3%), but most precise (CV 12.3%), with least effort (7,000 trap-nights). Targeted sampling generated encounter rates four times higher, and capture and recapture probabilities 11% and 60% higher than grid sampling, in a sampling frame 88% smaller. Bears had unequal probability of capture with both sampling designs, partly because some bears never had traps available to sample them. Hence, grid and targeted sampling generated abundance indices, not estimates. Overall, targeted sampling provided the most accurate and affordable design to index abundance. Targeted sampling may offer an alternative method to index the abundance of other species inhabiting expansive and inaccessible landscapes elsewhere, provided their attraction to resource concentrations. PMID:24392290

  1. Cost-efficient designs for three-arm trials with treatment delivered by health professionals: Sample sizes for a combination of nested and crossed designs

    PubMed Central

    Moerbeek, Mirjam

    2018-01-01

    Background This article studies the design of trials that compare three treatment conditions that are delivered by two types of health professionals. The one type of health professional delivers one treatment, and the other type delivers two treatments, hence, this design is a combination of a nested and crossed design. As each health professional treats multiple patients, the data have a nested structure. This nested structure has thus far been ignored in the design of such trials, which may result in an underestimate of the required sample size. In the design stage, the sample sizes should be determined such that a desired power is achieved for each of the three pairwise comparisons, while keeping costs or sample size at a minimum. Methods The statistical model that relates outcome to treatment condition and explicitly takes the nested data structure into account is presented. Mathematical expressions that relate sample size to power are derived for each of the three pairwise comparisons on the basis of this model. The cost-efficient design achieves sufficient power for each pairwise comparison at lowest costs. Alternatively, one may minimize the total number of patients. The sample sizes are found numerically and an Internet application is available for this purpose. The design is also compared to a nested design in which each health professional delivers just one treatment. Results Mathematical expressions show that this design is more efficient than the nested design. For each pairwise comparison, power increases with the number of health professionals and the number of patients per health professional. The methodology of finding a cost-efficient design is illustrated using a trial that compares treatments for social phobia. The optimal sample sizes reflect the costs for training and supervising psychologists and psychiatrists, and the patient-level costs in the three treatment conditions. Conclusion This article provides the methodology for designing trials that compare three treatment conditions while taking the nesting of patients within health professionals into account. As such, it helps to avoid underpowered trials. To use the methodology, a priori estimates of the total outcome variances and intraclass correlation coefficients must be obtained from experts’ opinions or findings in the literature. PMID:29316807

  2. Development and Validation of Pathogen Environmental Monitoring Programs for Small Cheese Processing Facilities.

    PubMed

    Beno, Sarah M; Stasiewicz, Matthew J; Andrus, Alexis D; Ralyea, Robert D; Kent, David J; Martin, Nicole H; Wiedmann, Martin; Boor, Kathryn J

    2016-12-01

    Pathogen environmental monitoring programs (EMPs) are essential for food processing facilities of all sizes that produce ready-to-eat food products exposed to the processing environment. We developed, implemented, and evaluated EMPs targeting Listeria spp. and Salmonella in nine small cheese processing facilities, including seven farmstead facilities. Individual EMPs with monthly sample collection protocols were designed specifically for each facility. Salmonella was detected in only one facility, with likely introduction from the adjacent farm indicated by pulsed-field gel electrophoresis data. Listeria spp. were isolated from all nine facilities during routine sampling. The overall Listeria spp. (other than Listeria monocytogenes ) and L. monocytogenes prevalences in the 4,430 environmental samples collected were 6.03 and 1.35%, respectively. Molecular characterization and subtyping data suggested persistence of a given Listeria spp. strain in seven facilities and persistence of L. monocytogenes in four facilities. To assess routine sampling plans, validation sampling for Listeria spp. was performed in seven facilities after at least 6 months of routine sampling. This validation sampling was performed by independent individuals and included collection of 50 to 150 samples per facility, based on statistical sample size calculations. Two of the facilities had a significantly higher frequency of detection of Listeria spp. during the validation sampling than during routine sampling, whereas two other facilities had significantly lower frequencies of detection. This study provides a model for a science- and statistics-based approach to developing and validating pathogen EMPs.

  3. Field comparison of analytical results from discrete-depth ground water samplers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zemo, D.A.; Delfino, T.A.; Gallinatti, J.D.

    1995-07-01

    Discrete-depth ground water samplers are used during environmental screening investigations to collect ground water samples in lieu of installing and sampling monitoring wells. Two of the most commonly used samplers are the BAT Enviroprobe and the QED HydroPunch I, which rely on differing sample collection mechanics. Although these devices have been on the market for several years, it was unknown what, if any, effect the differences would have on analytical results for ground water samples containing low to moderate concentrations of chlorinated volatile organic compounds (VOCs). This study investigated whether the discrete-depth ground water sampler used introduces statistically significant differencesmore » in analytical results. The goal was to provide a technical basis for allowing the two devices to be used interchangeably during screening investigations. Because this study was based on field samples, it included several sources of potential variability. It was necessary to separate differences due to sampler type from variability due to sampling location, sample handling, and laboratory analytical error. To statistically evaluate these sources of variability, the experiment was arranged in a nested design. Sixteen ground water samples were collected from eight random locations within a 15-foot by 15-foot grid. The grid was located in an area where shallow ground water was believed to be uniformly affected by VOCs. The data were evaluated using analysis of variance.« less

  4. Précis of statistical significance: rationale, validity, and utility.

    PubMed

    Chow, S L

    1998-04-01

    The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of data. Neither can any of them fulfill the nonstatistical functions expected of them by critics.

  5. A field test of three LQAS designs to assess the prevalence of acute malnutrition.

    PubMed

    Deitchler, Megan; Valadez, Joseph J; Egge, Kari; Fernandez, Soledad; Hennigan, Mary

    2007-08-01

    The conventional method for assessing the prevalence of Global Acute Malnutrition (GAM) in emergency settings is the 30 x 30 cluster-survey. This study describes alternative approaches: three Lot Quality Assurance Sampling (LQAS) designs to assess GAM. The LQAS designs were field-tested and their results compared with those from a 30 x 30 cluster-survey. Computer simulations confirmed that small clusters instead of a simple random sample could be used for LQAS assessments of GAM. Three LQAS designs were developed (33 x 6, 67 x 3, Sequential design) to assess GAM thresholds of 10, 15 and 20%. The designs were field-tested simultaneously with a 30 x 30 cluster-survey in Siraro, Ethiopia during June 2003. Using a nested study design, anthropometric, morbidity and vaccination data were collected on all children 6-59 months in sampled households. Hypothesis tests about GAM thresholds were conducted for each LQAS design. Point estimates were obtained for the 30 x 30 cluster-survey and the 33 x 6 and 67 x 3 LQAS designs. Hypothesis tests showed GAM as <10% for the 33 x 6 design and GAM as > or =10% for the 67 x 3 and Sequential designs. Point estimates for the 33 x 6 and 67 x 3 designs were similar to those of the 30 x 30 cluster-survey for GAM (6.7%, CI = 3.2-10.2%; 8.2%, CI = 4.3-12.1%, 7.4%, CI = 4.8-9.9%) and all other indicators. The CIs for the LQAS designs were only slightly wider than the CIs for the 30 x 30 cluster-survey; yet the LQAS designs required substantially less time to administer. The LQAS designs provide statistically appropriate alternatives to the more time-consuming 30 x 30 cluster-survey. However, additional field-testing is needed using independent samples rather than a nested study design.

  6. Comparative Financial Statistics for Public Two-Year Colleges: FY 1993 National Sample.

    ERIC Educational Resources Information Center

    Dickmeyer, Nathan; Meeker, Bradley

    This report provides comparative information derived from a national sample of 516 public two-year colleges, highlighting financial statistics for fiscal year, 1992-93. This report provides space for colleges to compare their institutional statistics with national sample medians, quartile data for the national sample, and statistics presented in a…

  7. 7 CFR 52.38a - Definitions of terms applicable to statistical sampling.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 7 Agriculture 2 2011-01-01 2011-01-01 false Definitions of terms applicable to statistical... Sampling § 52.38a Definitions of terms applicable to statistical sampling. (a) Terms applicable to both on... acceptable as a process average. At the AQL's contained in the statistical sampling plans of this subpart...

  8. 7 CFR 52.38a - Definitions of terms applicable to statistical sampling.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 7 Agriculture 2 2010-01-01 2010-01-01 false Definitions of terms applicable to statistical... Sampling § 52.38a Definitions of terms applicable to statistical sampling. (a) Terms applicable to both on... acceptable as a process average. At the AQL's contained in the statistical sampling plans of this subpart...

  9. Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.

    PubMed

    Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira

    2016-01-01

    Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4.

  10. Reducing uncertainty in dust monitoring to detect aeolian sediment transport responses to land cover change

    NASA Astrophysics Data System (ADS)

    Webb, N.; Chappell, A.; Van Zee, J.; Toledo, D.; Duniway, M.; Billings, B.; Tedela, N.

    2017-12-01

    Anthropogenic land use and land cover change (LULCC) influence global rates of wind erosion and dust emission, yet our understanding of the magnitude of the responses remains poor. Field measurements and monitoring provide essential data to resolve aeolian sediment transport patterns and assess the impacts of human land use and management intensity. Data collected in the field are also required for dust model calibration and testing, as models have become the primary tool for assessing LULCC-dust cycle interactions. However, there is considerable uncertainty in estimates of dust emission due to the spatial variability of sediment transport. Field sampling designs are currently rudimentary and considerable opportunities are available to reduce the uncertainty. Establishing the minimum detectable change is critical for measuring spatial and temporal patterns of sediment transport, detecting potential impacts of LULCC and land management, and for quantifying the uncertainty of dust model estimates. Here, we evaluate the effectiveness of common sampling designs (e.g., simple random sampling, systematic sampling) used to measure and monitor aeolian sediment transport rates. Using data from the US National Wind Erosion Research Network across diverse rangeland and cropland cover types, we demonstrate how only large changes in sediment mass flux (of the order 200% to 800%) can be detected when small sample sizes are used, crude sampling designs are implemented, or when the spatial variation is large. We then show how statistical rigour and the straightforward application of a sampling design can reduce the uncertainty and detect change in sediment transport over time and between land use and land cover types.

  11. Health Surveys Using Mobile Phones in Developing Countries: Automated Active Strata Monitoring and Other Statistical Considerations for Improving Precision and Reducing Biases.

    PubMed

    Labrique, Alain; Blynn, Emily; Ahmed, Saifuddin; Gibson, Dustin; Pariyo, George; Hyder, Adnan A

    2017-05-05

    In low- and middle-income countries (LMICs), historically, household surveys have been carried out by face-to-face interviews to collect survey data related to risk factors for noncommunicable diseases. The proliferation of mobile phone ownership and the access it provides in these countries offers a new opportunity to remotely conduct surveys with increased efficiency and reduced cost. However, the near-ubiquitous ownership of phones, high population mobility, and low cost require a re-examination of statistical recommendations for mobile phone surveys (MPS), especially when surveys are automated. As with landline surveys, random digit dialing remains the most appropriate approach to develop an ideal survey-sampling frame. Once the survey is complete, poststratification weights are generally applied to reduce estimate bias and to adjust for selectivity due to mobile ownership. Since weights increase design effects and reduce sampling efficiency, we introduce the concept of automated active strata monitoring to improve representativeness of the sample distribution to that of the source population. Although some statistical challenges remain, MPS represent a promising emerging means for population-level data collection in LMICs. ©Alain Labrique, Emily Blynn, Saifuddin Ahmed, Dustin Gibson, George Pariyo, Adnan A Hyder. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 05.05.2017.

  12. Comparing a single case to a control group - Applying linear mixed effects models to repeated measures data.

    PubMed

    Huber, Stefan; Klein, Elise; Moeller, Korbinian; Willmes, Klaus

    2015-10-01

    In neuropsychological research, single-cases are often compared with a small control sample. Crawford and colleagues developed inferential methods (i.e., the modified t-test) for such a research design. In the present article, we suggest an extension of the methods of Crawford and colleagues employing linear mixed models (LMM). We first show that a t-test for the significance of a dummy coded predictor variable in a linear regression is equivalent to the modified t-test of Crawford and colleagues. As an extension to this idea, we then generalized the modified t-test to repeated measures data by using LMMs to compare the performance difference in two conditions observed in a single participant to that of a small control group. The performance of LMMs regarding Type I error rates and statistical power were tested based on Monte-Carlo simulations. We found that starting with about 15-20 participants in the control sample Type I error rates were close to the nominal Type I error rate using the Satterthwaite approximation for the degrees of freedom. Moreover, statistical power was acceptable. Therefore, we conclude that LMMs can be applied successfully to statistically evaluate performance differences between a single-case and a control sample. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. An astronomer's guide to period searching

    NASA Astrophysics Data System (ADS)

    Schwarzenberg-Czerny, A.

    2003-03-01

    We concentrate on analysis of unevenly sampled time series, interrupted by periodic gaps, as often encountered in astronomy. While some of our conclusions may appear surprising, all are based on classical statistical principles of Fisher & successors. Except for discussion of the resolution issues, it is best for the reader to forget temporarily about Fourier transforms and to concentrate on problems of fitting of a time series with a model curve. According to their statistical content we divide the issues into several sections, consisting of: (ii) statistical numerical aspects of model fitting, (iii) evaluation of fitted models as hypotheses testing, (iv) the role of the orthogonal models in signal detection (v) conditions for equivalence of periodograms (vi) rating sensitivity by test power. An experienced observer working with individual objects would benefit little from formalized statistical approach. However, we demonstrate the usefulness of this approach in evaluation of performance of periodograms and in quantitative design of large variability surveys.

  14. Evaluation of three-dimensional virtual perception of garments

    NASA Astrophysics Data System (ADS)

    Aydoğdu, G.; Yeşilpinar, S.; Erdem, D.

    2017-10-01

    In recent years, three-dimensional design, dressing and simulation programs came into prominence in the textile industry. By these programs, the need to produce clothing samples for every design in design process has been eliminated. Clothing fit, design, pattern, fabric and accessory details and fabric drape features can be evaluated easily. Also, body size of virtual mannequin can be adjusted so more realistic simulations can be created. Moreover, three-dimensional virtual garment images created by these programs can be used while presenting the product to end-user instead of two-dimensional photograph images. In this study, a survey was carried out to investigate the visual perception of consumers. The survey was conducted for three different garment types, separately. Questions about gender, profession etc. was asked to the participants and expected them to compare real samples and artworks or three-dimensional virtual images of garments. When survey results were analyzed statistically, it is seen that demographic situation of participants does not affect visual perception and three-dimensional virtual garment images reflect the real sample characteristics better than artworks for each garment type. Also, it is reported that there is no perception difference depending on garment type between t-shirt, sweatshirt and tracksuit bottom.

  15. Impact of Sample Size and Variability on the Power and Type I Error Rates of Equivalence Tests: A Simulation Study

    ERIC Educational Resources Information Center

    Rusticus, Shayna A.; Lovato, Chris Y.

    2014-01-01

    The question of equivalence between two or more groups is frequently of interest to many applied researchers. Equivalence testing is a statistical method designed to provide evidence that groups are comparable by demonstrating that the mean differences found between groups are small enough that they are considered practically unimportant. Few…

  16. Nutrition Education in Public Elementary and Secondary Schools. Statistical Analysis Report. Fast Response Survey System (FRSS).

    ERIC Educational Resources Information Center

    Celebuski, Carin; Farris, Elizabeth

    This report presents the findings from the "Nutrition Education in Public Schools, K-12" survey that was designed to provide data on the status of nutrition education in U.S. public schools. Questionnaires were sent to 1,000 school principals of a nationally representative sample of U.S. elementary, middle, and high schools. The survey…

  17. Comparison of Efficiency of Jackknife and Variance Component Estimators of Standard Errors. Program Statistics Research. Technical Report.

    ERIC Educational Resources Information Center

    Longford, Nicholas T.

    Large scale surveys usually employ a complex sampling design and as a consequence, no standard methods for estimation of the standard errors associated with the estimates of population means are available. Resampling methods, such as jackknife or bootstrap, are often used, with reference to their properties of robustness and reduction of bias. A…

  18. Informal Statistics Help Desk

    NASA Technical Reports Server (NTRS)

    Ploutz-Snyder, R. J.; Feiveson, A. H.

    2015-01-01

    Back by popular demand, the JSC Biostatistics Lab is offering an opportunity for informal conversation about challenges you may have encountered with issues of experimental design, analysis, data visualization or related topics. Get answers to common questions about sample size, repeated measures, violation of distributional assumptions, missing data, multiple testing, time-to-event data, when to trust the results of your analyses (reproducibility issues) and more.

  19. Effects of a Worksite Tobacco Control Intervention in India: The Mumbai Worksite Tobacco Control Study, a Cluster Randomized Trial

    PubMed Central

    Sorensen, Glorian; Pednekar, Mangesh; Cordeira, Laura Shulman; Pawar, Pratibha; Nagler, Eve; Stoddard, Anne M.; Kim, Hae-Young; Gupta, Prakash C.

    2016-01-01

    Objectives We assessed a worksite intervention designed to promote tobacco control among manufacturing workers in Greater Mumbai, India. Methods We used a cluster-randomized design to test an integrated health promotion/health protection intervention, which addressed changes at the management and worker levels. Between July 2012 and July 2013, we recruited 20 worksites on a rolling basis and randomly assigned them to intervention or delayed-intervention control conditions. The follow-up survey was conducted between December 2013 and November 2014. Results The difference in 30-day quit rates between intervention and control conditions was statistically significant for production workers (OR=2.25, P=0.03), although not for the overall sample (OR=1.70; P=0.12). The intervention resulted in a doubling of the 6-month cessation rates among workers in the intervention worksites compared to those in the control, for production workers (OR=2.29; P=0.07) and for the overall sample (OR=1.81; P=0.13), but the difference did not reach statistical significance. Conclusions These findings demonstrate the potential impact of a tobacco control intervention that combined tobacco control and health protection programming within Indian manufacturing worksites. PMID:26883793

  20. Evaluating concentration estimation errors in ELISA microarray experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daly, Don S.; White, Amanda M.; Varnum, Susan M.

    Enzyme-linked immunosorbent assay (ELISA) is a standard immunoassay to predict a protein concentration in a sample. Deploying ELISA in a microarray format permits simultaneous prediction of the concentrations of numerous proteins in a small sample. These predictions, however, are uncertain due to processing error and biological variability. Evaluating prediction error is critical to interpreting biological significance and improving the ELISA microarray process. Evaluating prediction error must be automated to realize a reliable high-throughput ELISA microarray system. Methods: In this paper, we present a statistical method based on propagation of error to evaluate prediction errors in the ELISA microarray process. Althoughmore » propagation of error is central to this method, it is effective only when comparable data are available. Therefore, we briefly discuss the roles of experimental design, data screening, normalization and statistical diagnostics when evaluating ELISA microarray prediction errors. We use an ELISA microarray investigation of breast cancer biomarkers to illustrate the evaluation of prediction errors. The illustration begins with a description of the design and resulting data, followed by a brief discussion of data screening and normalization. In our illustration, we fit a standard curve to the screened and normalized data, review the modeling diagnostics, and apply propagation of error.« less

  1. An experimental loop design for the detection of constitutional chromosomal aberrations by array CGH

    PubMed Central

    2009-01-01

    Background Comparative genomic hybridization microarrays for the detection of constitutional chromosomal aberrations is the application of microarray technology coming fastest into routine clinical application. Through genotype-phenotype association, it is also an important technique towards the discovery of disease causing genes and genomewide functional annotation in human. When using a two-channel microarray of genomic DNA probes for array CGH, the basic setup consists in hybridizing a patient against a normal reference sample. Two major disadvantages of this setup are (1) the use of half of the resources to measure a (little informative) reference sample and (2) the possibility that deviating signals are caused by benign copy number variation in the "normal" reference instead of a patient aberration. Instead, we apply an experimental loop design that compares three patients in three hybridizations. Results We develop and compare two statistical methods (linear models of log ratios and mixed models of absolute measurements). In an analysis of 27 patients seen at our genetics center, we observed that the linear models of the log ratios are advantageous over the mixed models of the absolute intensities. Conclusion The loop design and the performance of the statistical analysis contribute to the quick adoption of array CGH as a routine diagnostic tool. They lower the detection limit of mosaicisms and improve the assignment of copy number variation for genetic association studies. PMID:19925645

  2. Phenolic Analysis and Theoretic Design for Chinese Commercial Wines' Authentication.

    PubMed

    Li, Si-Yu; Zhu, Bao-Qing; Reeves, Malcolm J; Duan, Chang-Qing

    2018-01-01

    To develop a robust tool for Chinese commercial wines' varietal, regional, and vintage authentication, phenolic compounds in 121 Chinese commercial dry red wines were detected and quantified by using high-performance liquid chromatography triple-quadrupole mass spectrometry (HPLC-QqQ-MS/MS), and differentiation abilities of principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and orthogonal partial least squares discriminant analysis (OPLS-DA) were compared. Better than PCA and PLS-DA, OPLS-DA models used to differentiate wines according to their varieties (Cabernet Sauvignon or other varieties), regions (east or west Cabernet Sauvignon wines), and vintages (young or old Cabernet Sauvignon wines) were ideally established. The S-plot provided in OPLS-DA models showed the key phenolic compounds which were both statistically and biochemically significant in sample differentiation. Besides, the potential of the OPLS-DA models in deeper sample differentiating of more detailed regional and vintage information of wines was proved optimistic. On the basis of our results, a promising theoretic design for wine authentication was further proposed for the first time, which might be helpful in practical authentication of more commercial wines. The phenolic data of 121 Chinese commercial dry red wines was processed with different statistical tools for varietal, regional, and vintage differentiation. A promising theoretical design was summarized, which might be helpful for wine authentication in practical situation. © 2017 Institute of Food Technologists®.

  3. An experimental loop design for the detection of constitutional chromosomal aberrations by array CGH.

    PubMed

    Allemeersch, Joke; Van Vooren, Steven; Hannes, Femke; De Moor, Bart; Vermeesch, Joris Robert; Moreau, Yves

    2009-11-19

    Comparative genomic hybridization microarrays for the detection of constitutional chromosomal aberrations is the application of microarray technology coming fastest into routine clinical application. Through genotype-phenotype association, it is also an important technique towards the discovery of disease causing genes and genomewide functional annotation in human. When using a two-channel microarray of genomic DNA probes for array CGH, the basic setup consists in hybridizing a patient against a normal reference sample. Two major disadvantages of this setup are (1) the use of half of the resources to measure a (little informative) reference sample and (2) the possibility that deviating signals are caused by benign copy number variation in the "normal" reference instead of a patient aberration. Instead, we apply an experimental loop design that compares three patients in three hybridizations. We develop and compare two statistical methods (linear models of log ratios and mixed models of absolute measurements). In an analysis of 27 patients seen at our genetics center, we observed that the linear models of the log ratios are advantageous over the mixed models of the absolute intensities. The loop design and the performance of the statistical analysis contribute to the quick adoption of array CGH as a routine diagnostic tool. They lower the detection limit of mosaicisms and improve the assignment of copy number variation for genetic association studies.

  4. Statistically significant performance results of a mine detector and fusion algorithm from an x-band high-resolution SAR

    NASA Astrophysics Data System (ADS)

    Williams, Arnold C.; Pachowicz, Peter W.

    2004-09-01

    Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms are of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently over trained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision Laboratory and Electron Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves that have been obtained through processing this multilook data for the high resolution SAR data of the Veridian X-Band radar. We discuss the implications of these results on mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.

  5. An Analytic Solution to the Computation of Power and Sample Size for Genetic Association Studies under a Pleiotropic Mode of Inheritance.

    PubMed

    Gordon, Derek; Londono, Douglas; Patel, Payal; Kim, Wonkuk; Finch, Stephen J; Heiman, Gary A

    2016-01-01

    Our motivation here is to calculate the power of 3 statistical tests used when there are genetic traits that operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by use of thresholds for the multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype, and we apply thresholds to define qualitative phenotypes (affected, unaffected) and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend the analytic power and minimum-sample-size-necessary (MSSN) formulas for 2 categorical data-based tests (genotype, linear trend test [LTT]) of genetic association to the pleiotropic model. We further compare the MSSN of the genotype test and the LTT with that of a multivariate ANOVA (Pillai). We approximate the MSSN for statistics by linear models using a factorial design and ANOVA. With ANOVA decomposition, we determine which factors most significantly change the power/MSSN for all statistics. Finally, we determine which test statistics have the smallest MSSN. In this work, MSSN calculations are for 2 traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits. Our key findings are that the genotype test usually has lower MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% vs. top/bottom 10%) have higher sample size requirements. The Pillai test has a much larger MSSN than both the genotype test and the LTT, as a result of sample selection. With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes. © 2017 S. Karger AG, Basel.

  6. Statistical considerations in evaluating pharmacogenomics-based clinical effect for confirmatory trials.

    PubMed

    Wang, Sue-Jane; O'Neill, Robert T; Hung, Hm James

    2010-10-01

    The current practice for seeking genomically favorable patients in randomized controlled clinical trials using genomic convenience samples. To discuss the extent of imbalance, confounding, bias, design efficiency loss, type I error, and type II error that can occur in the evaluation of the convenience samples, particularly when they are small samples. To articulate statistical considerations for a reasonable sample size to minimize the chance of imbalance, and, to highlight the importance of replicating the subgroup finding in independent studies. Four case examples reflecting recent regulatory experiences are used to underscore the problems with convenience samples. Probability of imbalance for a pre-specified subgroup is provided to elucidate sample size needed to minimize the chance of imbalance. We use an example drug development to highlight the level of scientific rigor needed, with evidence replicated for a pre-specified subgroup claim. The convenience samples evaluated ranged from 18% to 38% of the intent-to-treat samples with sample size ranging from 100 to 5000 patients per arm. The baseline imbalance can occur with probability higher than 25%. Mild to moderate multiple confounders yielding the same directional bias in favor of the treated group can make treatment group incomparable at baseline and result in a false positive conclusion that there is a treatment difference. Conversely, if the same directional bias favors the placebo group or there is loss in design efficiency, the type II error can increase substantially. Pre-specification of a genomic subgroup hypothesis is useful only for some degree of type I error control. Complete ascertainment of genomic samples in a randomized controlled trial should be the first step to explore if a favorable genomic patient subgroup suggests a treatment effect when there is no clear prior knowledge and understanding about how the mechanism of a drug target affects the clinical outcome of interest. When stratified randomization based on genomic biomarker status cannot be implemented in designing a pharmacogenomics confirmatory clinical trial, if there is one genomic biomarker prognostic for clinical response, as a general rule of thumb, a sample size of at least 100 patients may be needed to be considered for the lower prevalence genomic subgroup to minimize the chance of an imbalance of 20% or more difference in the prevalence of the genomic marker. The sample size may need to be at least 150, 350, and 1350, respectively, if an imbalance of 15%, 10% and 5% difference is of concern.

  7. Applying Monte Carlo Simulation to Launch Vehicle Design and Requirements Verification

    NASA Technical Reports Server (NTRS)

    Hanson, John M.; Beard, Bernard B.

    2010-01-01

    This paper is focused on applying Monte Carlo simulation to probabilistic launch vehicle design and requirements verification. The approaches developed in this paper can be applied to other complex design efforts as well. Typically the verification must show that requirement "x" is met for at least "y" % of cases, with, say, 10% consumer risk or 90% confidence. Two particular aspects of making these runs for requirements verification will be explored in this paper. First, there are several types of uncertainties that should be handled in different ways, depending on when they become known (or not). The paper describes how to handle different types of uncertainties and how to develop vehicle models that can be used to examine their characteristics. This includes items that are not known exactly during the design phase but that will be known for each assembled vehicle (can be used to determine the payload capability and overall behavior of that vehicle), other items that become known before or on flight day (can be used for flight day trajectory design and go/no go decision), and items that remain unknown on flight day. Second, this paper explains a method (order statistics) for determining whether certain probabilistic requirements are met or not and enables the user to determine how many Monte Carlo samples are required. Order statistics is not new, but may not be known in general to the GN&C community. The methods also apply to determining the design values of parameters of interest in driving the vehicle design. The paper briefly discusses when it is desirable to fit a distribution to the experimental Monte Carlo results rather than using order statistics.

  8. FabricS: A user-friendly, complete and robust software for particle shape-fabric analysis

    NASA Astrophysics Data System (ADS)

    Moreno Chávez, G.; Castillo Rivera, F.; Sarocchi, D.; Borselli, L.; Rodríguez-Sedano, L. A.

    2018-06-01

    Shape-fabric is a textural parameter related to the spatial arrangement of elongated particles in geological samples. Its usefulness spans a range from sedimentary petrology to igneous and metamorphic petrology. Independently of the process being studied, when a material flows, the elongated particles are oriented with the major axis in the direction of flow. In sedimentary petrology this information has been used for studies of paleo-flow direction of turbidites, the origin of quartz sediments, and locating ignimbrite vents, among others. In addition to flow direction and its polarity, the method enables flow rheology to be inferred. The use of shape-fabric has been limited due to the difficulties of automatically measuring particles and analyzing them with reliable circular statistics programs. This has dampened interest in the method for a long time. Shape-fabric measurement has increased in popularity since the 1980s thanks to the development of new image analysis techniques and circular statistics software. However, the programs currently available are unreliable, old and are incompatible with newer operating systems, or require programming skills. The goal of our work is to develop a user-friendly program, in the MATLAB environment, with a graphical user interface, that can process images and includes editing functions, and thresholds (elongation and size) for selecting a particle population and analyzing it with reliable circular statistics algorithms. Moreover, the method also has to produce rose diagrams, orientation vectors, and a complete series of statistical parameters. All these requirements are met by our new software. In this paper, we briefly explain the methodology from collection of oriented samples in the field to the minimum number of particles needed to obtain reliable fabric data. We obtained the data using specific statistical tests and taking into account the degree of iso-orientation of the samples and the required degree of reliability. The program has been verified by means of several simulations performed using appropriately designed features and by analyzing real samples.

  9. On the Analysis of Case-Control Studies in Cluster-correlated Data Settings.

    PubMed

    Haneuse, Sebastien; Rivera-Rodriguez, Claudia

    2018-01-01

    In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy makers consider evaluating individual-level outcomes such as treatment adherence or mortality, the well-known case-control design is appealing in that it provides efficiency gains over random sampling. In the context that motivates this article, valid estimation and inference requires acknowledging any clustering, although, to our knowledge, no statistical methods have been published for the analysis of case-control data for which the underlying population exhibits clustering. Furthermore, in the specific context of an ongoing collaboration in Malawi, rather than performing case-control sampling across all clinics, case-control sampling within clinics has been suggested as a more practical strategy. To our knowledge, although similar outcome-dependent sampling schemes have been described in the literature, a case-control design specific to correlated data settings is new. In this article, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case-control studies in cluster-correlated settings based on inverse probability-weighted generalized estimating equations. Inference is based on a robust sandwich estimator with correlation parameters estimated to ensure appropriate accounting of the outcome-dependent sampling scheme. We conduct comprehensive simulations, based in part on real data on a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and potential trade-offs associated with standard case-control sampling or when case-control sampling is performed within clusters.

  10. Point process statistics in atom probe tomography.

    PubMed

    Philippe, T; Duguay, S; Grancher, G; Blavette, D

    2013-09-01

    We present a review of spatial point processes as statistical models that we have designed for the analysis and treatment of atom probe tomography (APT) data. As a major advantage, these methods do not require sampling. The mean distance to nearest neighbour is an attractive approach to exhibit a non-random atomic distribution. A χ(2) test based on distance distributions to nearest neighbour has been developed to detect deviation from randomness. Best-fit methods based on first nearest neighbour distance (1 NN method) and pair correlation function are presented and compared to assess the chemical composition of tiny clusters. Delaunay tessellation for cluster selection has been also illustrated. These statistical tools have been applied to APT experiments on microelectronics materials. Copyright © 2012 Elsevier B.V. All rights reserved.

  11. Cleanroom certification model

    NASA Technical Reports Server (NTRS)

    Currit, P. A.

    1983-01-01

    The Cleanroom software development methodology is designed to take the gamble out of product releases for both suppliers and receivers of the software. The ingredients of this procedure are a life cycle of executable product increments, representative statistical testing, and a standard estimate of the MTTF (Mean Time To Failure) of the product at the time of its release. A statistical approach to software product testing using randomly selected samples of test cases is considered. A statistical model is defined for the certification process which uses the timing data recorded during test. A reasonableness argument for this model is provided that uses previously published data on software product execution. Also included is a derivation of the certification model estimators and a comparison of the proposed least squares technique with the more commonly used maximum likelihood estimators.

  12. Adolescent cigarette smoking and health risk behavior.

    PubMed

    Busen, N H; Modeland, V; Kouzekanani, K

    2001-06-01

    During the past 30 years, tobacco use among adolescents has substantially increased, resulting in major health problems associated with tobacco consumption. The purpose of this study was to identify adolescent smoking behaviors and to determine the relationship among smoking, specific demographic variables, and health risk behaviors. The sample consisted of 93 self-selecting adolescents. An ex post facto design was used for this study and data were analyzed by using nonparametric statistics. Findings included a statistically significant relationship between lifetime cigarette use and ethnicity. Statistically significant relationships were also found among current cigarette use and ethnicity, alcohol use, marijuana use, suicidal thoughts, and age at first sexual intercourse. Nurses and other providers must recognize that cigarette smoking may indicate other risk behaviors common among adolescents. Copyright 2001 by W.B. Saunders Company

  13. How allele frequency and study design affect association test statistics with misrepresentation errors.

    PubMed

    Escott-Price, Valentina; Ghodsi, Mansoureh; Schmidt, Karl Michael

    2014-04-01

    We evaluate the effect of genotyping errors on the type-I error of a general association test based on genotypes, showing that, in the presence of errors in the case and control samples, the test statistic asymptotically follows a scaled non-central $\\chi ^2$ distribution. We give explicit formulae for the scaling factor and non-centrality parameter for the symmetric allele-based genotyping error model and for additive and recessive disease models. They show how genotyping errors can lead to a significantly higher false-positive rate, growing with sample size, compared with the nominal significance levels. The strength of this effect depends very strongly on the population distribution of the genotype, with a pronounced effect in the case of rare alleles, and a great robustness against error in the case of large minor allele frequency. We also show how these results can be used to correct $p$-values.

  14. A Statistics-Based Cracking Criterion of Resin-Bonded Silica Sand for Casting Process Simulation

    NASA Astrophysics Data System (ADS)

    Wang, Huimin; Lu, Yan; Ripplinger, Keith; Detwiler, Duane; Luo, Alan A.

    2017-02-01

    Cracking of sand molds/cores can result in many casting defects such as veining. A robust cracking criterion is needed in casting process simulation for predicting/controlling such defects. A cracking probability map, relating to fracture stress and effective volume, was proposed for resin-bonded silica sand based on Weibull statistics. Three-point bending test results of sand samples were used to generate the cracking map and set up a safety line for cracking criterion. Tensile test results confirmed the accuracy of the safety line for cracking prediction. A laboratory casting experiment was designed and carried out to predict cracking of a cup mold during aluminum casting. The stress-strain behavior and the effective volume of the cup molds were calculated using a finite element analysis code ProCAST®. Furthermore, an energy dispersive spectroscopy fractographic examination of the sand samples confirmed the binder cracking in resin-bonded silica sand.

  15. Integrated Assessment and Improvement of the Quality Assurance System for the Cosworth Casting Process

    NASA Astrophysics Data System (ADS)

    Yousif, Dilon

    The purpose of this study was to improve the Quality Assurance (QA) System at the Nemak Windsor Aluminum Plant (WAP). The project used Six Sigma method based on Define, Measure, Analyze, Improve, and Control (DMAIC). Analysis of in process melt at WAP was based on chemical, thermal, and mechanical testing. The control limits for the W319 Al Alloy were statistically recalculated using the composition measured under stable conditions. The "Chemistry Viewer" software was developed for statistical analysis of alloy composition. This software features the Silicon Equivalency (SiBQ) developed by the IRC. The Melt Sampling Device (MSD) was designed and evaluated at WAP to overcome traditional sampling limitations. The Thermal Analysis "Filters" software was developed for cooling curve analysis of the 3XX Al Alloy(s) using IRC techniques. The impact of low melting point impurities on the start of melting was evaluated using the Universal Metallurgical Simulator and Analyzer (UMSA).

  16. Planetary mass function and planetary systems

    NASA Astrophysics Data System (ADS)

    Dominik, M.

    2011-02-01

    With planets orbiting stars, a planetary mass function should not be seen as a low-mass extension of the stellar mass function, but a proper formalism needs to take care of the fact that the statistical properties of planet populations are linked to the properties of their respective host stars. This can be accounted for by describing planet populations by means of a differential planetary mass-radius-orbit function, which together with the fraction of stars with given properties that are orbited by planets and the stellar mass function allows the derivation of all statistics for any considered sample. These fundamental functions provide a framework for comparing statistics that result from different observing techniques and campaigns which all have their very specific selection procedures and detection efficiencies. Moreover, recent results both from gravitational microlensing campaigns and radial-velocity surveys of stars indicate that planets tend to cluster in systems rather than being the lonely child of their respective parent star. While planetary multiplicity in an observed system becomes obvious with the detection of several planets, its quantitative assessment however comes with the challenge to exclude the presence of further planets. Current exoplanet samples begin to give us first hints at the population statistics, whereas pictures of planet parameter space in its full complexity call for samples that are 2-4 orders of magnitude larger. In order to derive meaningful statistics, however, planet detection campaigns need to be designed in such a way that well-defined fully deterministic target selection, monitoring and detection criteria are applied. The probabilistic nature of gravitational microlensing makes this technique an illustrative example of all the encountered challenges and uncertainties.

  17. Failure to Replicate a Genetic Association May Provide Important Clues About Genetic Architecture

    PubMed Central

    Greene, Casey S.; Penrod, Nadia M.; Williams, Scott M.; Moore, Jason H.

    2009-01-01

    Replication has become the gold standard for assessing statistical results from genome-wide association studies. Unfortunately this replication requirement may cause real genetic effects to be missed. A real result can fail to replicate for numerous reasons including inadequate sample size or variability in phenotype definitions across independent samples. In genome-wide association studies the allele frequencies of polymorphisms may differ due to sampling error or population differences. We hypothesize that some statistically significant independent genetic effects may fail to replicate in an independent dataset when allele frequencies differ and the functional polymorphism interacts with one or more other functional polymorphisms. To test this hypothesis, we designed a simulation study in which case-control status was determined by two interacting polymorphisms with heritabilities ranging from 0.025 to 0.4 with replication sample sizes ranging from 400 to 1600 individuals. We show that the power to replicate the statistically significant independent main effect of one polymorphism can drop dramatically with a change of allele frequency of less than 0.1 at a second interacting polymorphism. We also show that differences in allele frequency can result in a reversal of allelic effects where a protective allele becomes a risk factor in replication studies. These results suggest that failure to replicate an independent genetic effect may provide important clues about the complexity of the underlying genetic architecture. We recommend that polymorphisms that fail to replicate be checked for interactions with other polymorphisms, particularly when samples are collected from groups with distinct ethnic backgrounds or different geographic regions. PMID:19503614

  18. Rasch fit statistics and sample size considerations for polytomous data.

    PubMed

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-05-29

    Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire - 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges.

  19. Rasch fit statistics and sample size considerations for polytomous data

    PubMed Central

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-01-01

    Background Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID:18510722

  20. The Marshall Islands Data Management Program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stoker, A.C.; Conrado, C.L.

    1995-09-01

    This report is a resource document of the methods and procedures used currently in the Data Management Program of the Marshall Islands Dose Assessment and Radioecology Project. Since 1973, over 60,000 environmental samples have been collected. Our program includes relational database design, programming and maintenance; sample and information management; sample tracking; quality control; and data entry, evaluation and reduction. The usefulness of scientific databases involves careful planning in order to fulfill the requirements of any large research program. Compilation of scientific results requires consolidation of information from several databases, and incorporation of new information as it is generated. The successmore » in combining and organizing all radionuclide analysis, sample information and statistical results into a readily accessible form, is critical to our project.« less

  1. Understanding the Sampling Distribution and the Central Limit Theorem.

    ERIC Educational Resources Information Center

    Lewis, Charla P.

    The sampling distribution is a common source of misuse and misunderstanding in the study of statistics. The sampling distribution, underlying distribution, and the Central Limit Theorem are all interconnected in defining and explaining the proper use of the sampling distribution of various statistics. The sampling distribution of a statistic is…

  2. Comparing problem-based learning and lecture as methods to teach whole-systems design to engineering students

    NASA Astrophysics Data System (ADS)

    Dukes, Michael Dickey

    The objective of this research is to compare problem-based learning and lecture as methods to teach whole-systems design to engineering students. A case study, Appendix A, exemplifying successful whole-systems design was developed and written by the author in partnership with the Rocky Mountain Institute. Concepts to be tested were then determined, and a questionnaire was developed to test students' preconceptions. A control group of students was taught using traditional lecture methods, and a sample group of students was taught using problem-based learning methods. After several weeks, the students were given the same questionnaire as prior to the instruction, and the data was analyzed to determine if the teaching methods were effective in correcting misconceptions. A statistically significant change in the students' preconceptions was observed in both groups on the topic of cost related to the design process. There was no statistically significant change in the students' preconceptions concerning the design process, technical ability within five years, and the possibility of drastic efficiency gains with current technologies. However, the results were inconclusive in determining that problem-based learning is more effective than lecture as a method for teaching the concept of whole-systems design, or vice versa.

  3. Developing Students' Reasoning about Samples and Sampling Variability as a Path to Expert Statistical Thinking

    ERIC Educational Resources Information Center

    Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani

    2015-01-01

    This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…

  4. A new u-statistic with superior design sensitivity in matched observational studies.

    PubMed

    Rosenbaum, Paul R

    2011-09-01

    In an observational or nonrandomized study of treatment effects, a sensitivity analysis indicates the magnitude of bias from unmeasured covariates that would need to be present to alter the conclusions of a naïve analysis that presumes adjustments for observed covariates suffice to remove all bias. The power of sensitivity analysis is the probability that it will reject a false hypothesis about treatment effects allowing for a departure from random assignment of a specified magnitude; in particular, if this specified magnitude is "no departure" then this is the same as the power of a randomization test in a randomized experiment. A new family of u-statistics is proposed that includes Wilcoxon's signed rank statistic but also includes other statistics with substantially higher power when a sensitivity analysis is performed in an observational study. Wilcoxon's statistic has high power to detect small effects in large randomized experiments-that is, it often has good Pitman efficiency-but small effects are invariably sensitive to small unobserved biases. Members of this family of u-statistics that emphasize medium to large effects can have substantially higher power in a sensitivity analysis. For example, in one situation with 250 pair differences that are Normal with expectation 1/2 and variance 1, the power of a sensitivity analysis that uses Wilcoxon's statistic is 0.08 while the power of another member of the family of u-statistics is 0.66. The topic is examined by performing a sensitivity analysis in three observational studies, using an asymptotic measure called the design sensitivity, and by simulating power in finite samples. The three examples are drawn from epidemiology, clinical medicine, and genetic toxicology. © 2010, The International Biometric Society.

  5. Split-mouth design in Paediatric Dentistry clinical trials.

    PubMed

    Pozos-Guillén, A; Chavarría-Bolaños, D; Garrocho-Rangel, A

    2017-03-01

    The aim of this article was to describe the essential concepts of the split-mouth design, its underlying assumptions, advantages, limitations, statistical considerations, and possible applications in Paediatric Dentistry clinical investigation. In Paediatric Dentistry clinical investigation, and as part of randomised controlled trials, the split-mouth design is commonly used. The design is characterised by subdividing the child's dentition into halves (right and left), where two different treatment modalities are assigned to one side randomly, in order to allow further outcome evaluation. Each participant acts as their own control by making within- patient rather than between-patient comparisons, thus diminishing inter-subject variability and increasing study accuracy and power. However, the main problem with this design comprises the potential contamination of the treatment effect from one side to the other, or the "carry-across effect"; likewise, this design is not indicated when the oral disease to be treated is not symmetrically distributed (e.g. severity) in the mouth of children. Thus, in spite of its advantages, the split-mouth design can only be applied in a limited number of strictly selected cases. In order to obtain valid and reliable data from split mouth design studies, it is necessary to evaluate the risk of carry-across effect as well as to carefully analise and select adequate inclusion criteria, sample-size calculation and method of statistical analysis.

  6. Prevalence of diseases and statistical power of the Japan Nurses' Health Study.

    PubMed

    Fujita, Toshiharu; Hayashi, Kunihiko; Katanoda, Kota; Matsumura, Yasuhiro; Lee, Jung Su; Takagi, Hirofumi; Suzuki, Shosuke; Mizunuma, Hideki; Aso, Takeshi

    2007-10-01

    The Japan Nurses' Health Study (JNHS) is a long-term, large-scale cohort study investigating the effects of various lifestyle factors and healthcare habits on the health of Japanese women. Based on currently limited statistical data regarding the incidence of disease among Japanese women, our initial sample size was tentatively set at 50,000 during the design phase. The actual number of women who agreed to participate in follow-up surveys was approximately 18,000. Taking into account the actual sample size and new information on disease frequency obtained during the baseline component, we established the prevalence of past diagnoses of target diseases, predicted their incidence, and calculated the statistical power for JNHS follow-up surveys. For all diseases except ovarian cancer, the prevalence of a past diagnosis increased markedly with age, and incidence rates could be predicted based on the degree of increase in prevalence between two adjacent 5-yr age groups. The predicted incidence rate for uterine myoma, hypercholesterolemia, and hypertension was > or =3.0 (per 1,000 women, per year), while the rate of thyroid disease, hepatitis, gallstone disease, and benign breast tumor was predicted to be > or =1.0. For these diseases, the statistical power to detect risk factors with a relative risk of 1.5 or more within ten years, was 70% or higher.

  7. Knowledge level of effect size statistics, confidence intervals and meta-analysis in Spanish academic psychologists.

    PubMed

    Badenes-Ribera, Laura; Frias-Navarro, Dolores; Pascual-Soler, Marcos; Monterde-I-Bort, Héctor

    2016-11-01

    The statistical reform movement and the American Psychological Association (APA) defend the use of estimators of the effect size and its confidence intervals, as well as the interpretation of the clinical significance of the findings. A survey was conducted in which academic psychologists were asked about their behavior in designing and carrying out their studies. The sample was composed of 472 participants (45.8% men). The mean number of years as a university professor was 13.56 years (SD= 9.27). The use of effect-size estimators is becoming generalized, as well as the consideration of meta-analytic studies. However, several inadequate practices still persist. A traditional model of methodological behavior based on statistical significance tests is maintained, based on the predominance of Cohen’s d and the unadjusted R2/η2, which are not immune to outliers or departure from normality and the violations of statistical assumptions, and the under-reporting of confidence intervals of effect-size statistics. The paper concludes with recommendations for improving statistical practice.

  8. Designing Studies That Would Address the Multilayered Nature of Health Care

    PubMed Central

    Pennell, Michael; Rhoda, Dale; Hade, Erinn M.; Paskett, Electra D.

    2010-01-01

    We review design and analytic methods available for multilevel interventions in cancer research with particular attention to study design, sample size requirements, and potential to provide statistical evidence for causal inference. The most appropriate methods will depend on the stage of development of the research and whether randomization is possible. Early on, fractional factorial designs may be used to screen intervention components, particularly when randomization of individuals is possible. Quasi-experimental designs, including time-series and multiple baseline designs, can be useful once the intervention is designed because they require few sites and can provide the preliminary evidence to plan efficacy studies. In efficacy and effectiveness studies, group-randomized trials are preferred when randomization is possible and regression discontinuity designs are preferred otherwise if assignment based on a quantitative score is possible. Quasi-experimental designs may be used, especially when combined with recent developments in analytic methods to reduce bias in effect estimates. PMID:20386057

  9. Stated Choice design comparison in a developing country: recall and attribute nonattendance

    PubMed Central

    2014-01-01

    Background Experimental designs constitute a vital component of all Stated Choice (aka discrete choice experiment) studies. However, there exists limited empirical evaluation of the statistical benefits of Stated Choice (SC) experimental designs that employ non-zero prior estimates in constructing non-orthogonal constrained designs. This paper statistically compares the performance of contrasting SC experimental designs. In so doing, the effect of respondent literacy on patterns of Attribute non-Attendance (ANA) across fractional factorial orthogonal and efficient designs is also evaluated. The study uses a ‘real’ SC design to model consumer choice of primary health care providers in rural north India. A total of 623 respondents were sampled across four villages in Uttar Pradesh, India. Methods Comparison of orthogonal and efficient SC experimental designs is based on several measures. Appropriate comparison of each design’s respective efficiency measure is made using D-error results. Standardised Akaike Information Criteria are compared between designs and across recall periods. Comparisons control for stated and inferred ANA. Coefficient and standard error estimates are also compared. Results The added complexity of the efficient SC design, theorised elsewhere, is reflected in higher estimated amounts of ANA among illiterate respondents. However, controlling for ANA using stated and inferred methods consistently shows that the efficient design performs statistically better. Modelling SC data from the orthogonal and efficient design shows that model-fit of the efficient design outperform the orthogonal design when using a 14-day recall period. The performance of the orthogonal design, with respect to standardised AIC model-fit, is better when longer recall periods of 30-days, 6-months and 12-months are used. Conclusions The effect of the efficient design’s cognitive demand is apparent among literate and illiterate respondents, although, more pronounced among illiterate respondents. This study empirically confirms that relaxing the orthogonality constraint of SC experimental designs increases the information collected in choice tasks, subject to the accuracy of the non-zero priors in the design and the correct specification of a ‘real’ SC recall period. PMID:25386388

  10. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures.

    PubMed

    Scheid, Anika; Nebel, Markus E

    2012-07-09

    Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case - without sacrificing much of the accuracy of the results. Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms.

  11. Evaluating the effect of disturbed ensemble distributions on SCFG based statistical sampling of RNA secondary structures

    PubMed Central

    2012-01-01

    Background Over the past years, statistical and Bayesian approaches have become increasingly appreciated to address the long-standing problem of computational RNA structure prediction. Recently, a novel probabilistic method for the prediction of RNA secondary structures from a single sequence has been studied which is based on generating statistically representative and reproducible samples of the entire ensemble of feasible structures for a particular input sequence. This method samples the possible foldings from a distribution implied by a sophisticated (traditional or length-dependent) stochastic context-free grammar (SCFG) that mirrors the standard thermodynamic model applied in modern physics-based prediction algorithms. Specifically, that grammar represents an exact probabilistic counterpart to the energy model underlying the Sfold software, which employs a sampling extension of the partition function (PF) approach to produce statistically representative subsets of the Boltzmann-weighted ensemble. Although both sampling approaches have the same worst-case time and space complexities, it has been indicated that they differ in performance (both with respect to prediction accuracy and quality of generated samples), where neither of these two competing approaches generally outperforms the other. Results In this work, we will consider the SCFG based approach in order to perform an analysis on how the quality of generated sample sets and the corresponding prediction accuracy changes when different degrees of disturbances are incorporated into the needed sampling probabilities. This is motivated by the fact that if the results prove to be resistant to large errors on the distinct sampling probabilities (compared to the exact ones), then it will be an indication that these probabilities do not need to be computed exactly, but it may be sufficient and more efficient to approximate them. Thus, it might then be possible to decrease the worst-case time requirements of such an SCFG based sampling method without significant accuracy losses. If, on the other hand, the quality of sampled structures can be observed to strongly react to slight disturbances, there is little hope for improving the complexity by heuristic procedures. We hence provide a reliable test for the hypothesis that a heuristic method could be implemented to improve the time scaling of RNA secondary structure prediction in the worst-case – without sacrificing much of the accuracy of the results. Conclusions Our experiments indicate that absolute errors generally lead to the generation of useless sample sets, whereas relative errors seem to have only small negative impact on both the predictive accuracy and the overall quality of resulting structure samples. Based on these observations, we present some useful ideas for developing a time-reduced sampling method guaranteeing an acceptable predictive accuracy. We also discuss some inherent drawbacks that arise in the context of approximation. The key results of this paper are crucial for the design of an efficient and competitive heuristic prediction method based on the increasingly accepted and attractive statistical sampling approach. This has indeed been indicated by the construction of prototype algorithms. PMID:22776037

  12. Navigating complex sample analysis using national survey data.

    PubMed

    Saylor, Jennifer; Friedmann, Erika; Lee, Hyeon Joo

    2012-01-01

    The National Center for Health Statistics conducts the National Health and Nutrition Examination Survey and other national surveys with probability-based complex sample designs. Goals of national surveys are to provide valid data for the population of the United States. Analyses of data from population surveys present unique challenges in the research process but are valuable avenues to study the health of the United States population. The aim of this study was to demonstrate the importance of using complex data analysis techniques for data obtained with complex multistage sampling design and provide an example of analysis using the SPSS Complex Samples procedure. Illustration of challenges and solutions specific to secondary data analysis of national databases are described using the National Health and Nutrition Examination Survey as the exemplar. Oversampling of small or sensitive groups provides necessary estimates of variability within small groups. Use of weights without complex samples accurately estimates population means and frequency from the sample after accounting for over- or undersampling of specific groups. Weighting alone leads to inappropriate population estimates of variability, because they are computed as if the measures were from the entire population rather than a sample in the data set. The SPSS Complex Samples procedure allows inclusion of all sampling design elements, stratification, clusters, and weights. Use of national data sets allows use of extensive, expensive, and well-documented survey data for exploratory questions but limits analysis to those variables included in the data set. The large sample permits examination of multiple predictors and interactive relationships. Merging data files, availability of data in several waves of surveys, and complex sampling are techniques used to provide a representative sample but present unique challenges. In sophisticated data analysis techniques, use of these data is optimized.

  13. Statistical controversies in clinical research: building the bridge to phase II-efficacy estimation in dose-expansion cohorts.

    PubMed

    Boonstra, P S; Braun, T M; Taylor, J M G; Kidwell, K M; Bellile, E L; Daignault, S; Zhao, L; Griffith, K A; Lawrence, T S; Kalemkerian, G P; Schipper, M J

    2017-07-01

    Regulatory agencies and others have expressed concern about the uncritical use of dose expansion cohorts (DECs) in phase I oncology trials. Nonetheless, by several metrics-prevalence, size, and number-their popularity is increasing. Although early efficacy estimation in defined populations is a common primary endpoint of DECs, the types of designs best equipped to identify efficacy signals have not been established. We conducted a simulation study of six phase I design templates with multiple DECs: three dose-assignment/adjustment mechanisms multiplied by two analytic approaches for estimating efficacy after the trial is complete. We also investigated the effect of sample size and interim futility analysis on trial performance. Identifying populations in which the treatment is efficacious (true positives) and weeding out inefficacious treatment/populations (true negatives) are competing goals in these trials. Thus, we estimated true and false positive rates for each design. Adaptively updating the MTD during the DEC improved true positive rates by 8-43% compared with fixing the dose during the DEC phase while maintaining false positive rates. Inclusion of an interim futility analysis decreased the number of patients treated under inefficacious DECs without hurting performance. A substantial gain in efficiency is obtainable using a design template that statistically models toxicity and efficacy against dose level during expansion. Design choices for dose expansion should be motivated by and based upon expected performance. Similar to the common practice in single-arm phase II trials, cohort sample sizes should be justified with respect to their primary aim and include interim analyses to allow for early stopping. © The Author 2017. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  14. An evaluation of inferential procedures for adaptive clinical trial designs with pre-specified rules for modifying the sample size.

    PubMed

    Levin, Gregory P; Emerson, Sarah C; Emerson, Scott S

    2014-09-01

    Many papers have introduced adaptive clinical trial methods that allow modifications to the sample size based on interim estimates of treatment effect. There has been extensive commentary on type I error control and efficiency considerations, but little research on estimation after an adaptive hypothesis test. We evaluate the reliability and precision of different inferential procedures in the presence of an adaptive design with pre-specified rules for modifying the sampling plan. We extend group sequential orderings of the outcome space based on the stage at stopping, likelihood ratio statistic, and sample mean to the adaptive setting in order to compute median-unbiased point estimates, exact confidence intervals, and P-values uniformly distributed under the null hypothesis. The likelihood ratio ordering is found to average shorter confidence intervals and produce higher probabilities of P-values below important thresholds than alternative approaches. The bias adjusted mean demonstrates the lowest mean squared error among candidate point estimates. A conditional error-based approach in the literature has the benefit of being the only method that accommodates unplanned adaptations. We compare the performance of this and other methods in order to quantify the cost of failing to plan ahead in settings where adaptations could realistically be pre-specified at the design stage. We find the cost to be meaningful for all designs and treatment effects considered, and to be substantial for designs frequently proposed in the literature. © 2014, The International Biometric Society.

  15. Data mining and statistical inference in selective laser melting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamath, Chandrika

    Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations andmore » experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.« less

  16. Data mining and statistical inference in selective laser melting

    DOE PAGES

    Kamath, Chandrika

    2016-01-11

    Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations andmore » experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.« less

  17. [Review of research design and statistical methods in Chinese Journal of Cardiology].

    PubMed

    Zhang, Li-jun; Yu, Jin-ming

    2009-07-01

    To evaluate the research design and the use of statistical methods in Chinese Journal of Cardiology. Peer through the research design and statistical methods in all of the original papers in Chinese Journal of Cardiology from December 2007 to November 2008. The most frequently used research designs are cross-sectional design (34%), prospective design (21%) and experimental design (25%). In all of the articles, 49 (25%) use wrong statistical methods, 29 (15%) lack some sort of statistic analysis, 23 (12%) have inconsistencies in description of methods. There are significant differences between different statistical methods (P < 0.001). The correction rates of multifactor analysis were low and repeated measurement datas were not used repeated measurement analysis. Many problems exist in Chinese Journal of Cardiology. Better research design and correct use of statistical methods are still needed. More strict review by statistician and epidemiologist is also required to improve the literature qualities.

  18. Accounting for selection bias in association studies with complex survey data.

    PubMed

    Wirth, Kathleen E; Tchetgen Tchetgen, Eric J

    2014-05-01

    Obtaining representative information from hidden and hard-to-reach populations is fundamental to describe the epidemiology of many sexually transmitted diseases, including HIV. Unfortunately, simple random sampling is impractical in these settings, as no registry of names exists from which to sample the population at random. However, complex sampling designs can be used, as members of these populations tend to congregate at known locations, which can be enumerated and sampled at random. For example, female sex workers may be found at brothels and street corners, whereas injection drug users often come together at shooting galleries. Despite the logistical appeal, complex sampling schemes lead to unequal probabilities of selection, and failure to account for this differential selection can result in biased estimates of population averages and relative risks. However, standard techniques to account for selection can lead to substantial losses in efficiency. Consequently, researchers implement a variety of strategies in an effort to balance validity and efficiency. Some researchers fully or partially account for the survey design, whereas others do nothing and treat the sample as a realization of the population of interest. We use directed acyclic graphs to show how certain survey sampling designs, combined with subject-matter considerations unique to individual exposure-outcome associations, can induce selection bias. Finally, we present a novel yet simple maximum likelihood approach for analyzing complex survey data; this approach optimizes statistical efficiency at no cost to validity. We use simulated data to illustrate this method and compare it with other analytic techniques.

  19. Random versus fixed-site sampling when monitoring relative abundance of fishes in headwater streams of the upper Colorado River basin

    USGS Publications Warehouse

    Quist, M.C.; Gerow, K.G.; Bower, M.R.; Hubert, W.A.

    2006-01-01

    Native fishes of the upper Colorado River basin (UCRB) have declined in distribution and abundance due to habitat degradation and interactions with normative fishes. Consequently, monitoring populations of both native and nonnative fishes is important for conservation of native species. We used data collected from Muddy Creek, Wyoming (2003-2004), to compare sample size estimates using a random and a fixed-site sampling design to monitor changes in catch per unit effort (CPUE) of native bluehead suckers Catostomus discobolus, flannelmouth suckers C. latipinnis, roundtail chub Gila robusta, and speckled dace Rhinichthys osculus, as well as nonnative creek chub Semotilus atromaculatus and white suckers C. commersonii. When one-pass backpack electrofishing was used, detection of 10% or 25% changes in CPUE (fish/100 m) at 60% statistical power required 50-1,000 randomly sampled reaches among species regardless of sampling design. However, use of a fixed-site sampling design with 25-50 reaches greatly enhanced the ability to detect changes in CPUE. The addition of seining did not appreciably reduce required effort. When detection of 25-50% changes in CPUE of native and nonnative fishes is acceptable, we recommend establishment of 25-50 fixed reaches sampled by one-pass electrofishing in Muddy Creek. Because Muddy Creek has habitat and fish assemblages characteristic of other headwater streams in the UCRB, our results are likely to apply to many other streams in the basin. ?? Copyright by the American Fisheries Society 2006.

  20. Promotional methods used by representatives of drug companies: A prospective survey in general practice

    PubMed Central

    Schramm, Jesper; Andersen, Morten; Vach, Kirstin; Kragstrup, Jakob; Peter Kampmann, Jens; Søndergaard, Jens

    2007-01-01

    Objective To examine the extent and composition of pharmaceutical industry representatives’ marketing techniques with a particular focus on drug sampling in relation to drug age. Design A group of 47 GPs prospectively collected data on drug promotional activities during a six-month period, and a sub-sample of 10 GPs furthermore recorded the representatives’ marketing techniques in detail. Setting Primary healthcare. Subjects General practitioners in the County of Funen, Denmark. Main outcome measures. Promotional visits and corresponding marketing techniques. Results The 47 GPs recorded 1050 visits corresponding to a median of 19 (range 3 to 63) per GP in the six months. The majority of drugs promoted (52%) were marketed more than five years ago. There was a statistically significant decline in the proportion of visits where drug samples were offered with drug age, but the decline was small OR 0.97 (95% CI 0.95;0.98) per year. Leaflets (68%), suggestions on how to improve therapy for a specific patient registered with the practice (53%), drug samples (48%), and gifts (36%) were the most frequently used marketing techniques. Conclusion Drug-industry representatives use a variety of promotional methods. The tendency to hand out drug samples was statistically significantly associated with drug age, but the decline was small. PMID:17497486

Top