Sample records for statistical data

  1. Development of computer-assisted instruction application for statistical data analysis android platform as learning resource

    NASA Astrophysics Data System (ADS)

    Hendikawati, P.; Arifudin, R.; Zahid, M. Z.

    2018-03-01

    This study aims to design an Android Statistics Data Analysis application that can be accessed through mobile devices, making it easier for users to access. The application covers various topics in basic statistics along with parametric statistical data analysis. Its output is parametric statistical analysis that serves students, lecturers, and other users who need the results of statistical calculations quickly and in an easily understood form. The Android application is written in Java; the server side uses PHP with the CodeIgniter framework, and the database is MySQL. Development followed the Waterfall methodology, with stages of analysis, design, coding, testing, implementation, and system maintenance. The application is expected to support statistics lectures and help students understand statistical analysis on mobile devices.

  2. Neural Systems with Numerically Matched Input-Output Statistic: Isotonic Bivariate Statistical Modeling

    PubMed Central

    Fiori, Simone

    2007-01-01

    Bivariate statistical modeling from incomplete data is a useful statistical tool that makes it possible to discover the model underlying two data sets even when the data in the two sets correspond neither in size nor in ordering. Such a situation may occur when the sizes of the two data sets do not match (i.e., there are “holes” in the data) or when the data sets have been acquired independently. Also, statistical modeling is useful when the amount of available data is enough to show relevant statistical features of the phenomenon underlying the data. We propose to tackle the problem of statistical modeling via a neural (nonlinear) system that is able to match its input-output statistic to the statistic of the available data sets. A key point of the new implementation proposed here is that it is based on look-up-table (LUT) neural systems, which guarantee a computationally advantageous way of implementing neural systems. A number of numerical experiments, performed on both synthetic and real-world data sets, illustrate the features of the proposed modeling procedure. PMID:18566641
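
    One classical way to realize the input-output statistic matching this record describes is a monotone look-up table built by empirical CDF matching, f = G^{-1}(F(x)). The sketch below shows only that generic construction, not the authors' algorithm; the node count and toy distributions are illustrative.

    ```python
    import numpy as np

    def build_lut(x_samples, y_samples, n_nodes=64):
        """Tabulate a monotone map f so that f(x) is distributed like y.

        Classic CDF matching, f = G^{-1}(F(x)), with F and G the empirical
        CDFs of the input and target data sets (sizes need not match).
        """
        qs = np.linspace(0.0, 1.0, n_nodes)
        return np.quantile(x_samples, qs), np.quantile(y_samples, qs)

    def apply_lut(x, x_nodes, y_nodes):
        # Piecewise-linear interpolation between the LUT nodes.
        return np.interp(x, x_nodes, y_nodes)

    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)               # input data set
    y = rng.exponential(size=3000)          # target data set, different size
    xn, yn = build_lut(x, y)
    out = apply_lut(x, xn, yn)
    print(np.quantile(out, [0.25, 0.5, 0.75]))   # ~ quartiles of y
    ```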

  3. Reproducible detection of disease-associated markers from gene expression data.

    PubMed

    Omae, Katsuhiro; Komori, Osamu; Eguchi, Shinto

    2016-08-18

    Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among different data sets. Such irreproducibility may be caused by disease heterogeneity. When we divided data into two subsets, we found that the signs of the two t-statistics were often reversed. Focusing on such instability, we propose a sign-sum statistic that counts the signs of the t-statistics over all possible subsets. The proposed method excludes genes affected by heterogeneity, thereby improving the reproducibility of gene ranking. We compared the sign-sum statistic with the t-statistic by a theoretical evaluation of the upper confidence limit. Through simulations and applications to real data sets, we show that the sign-sum statistic exhibits superior performance. We derive the sign-sum statistic to obtain a robust gene ranking: it gives a more reproducible ranking than the t-statistic. Using simulated data sets we show that the sign-sum statistic excludes hetero-type genes well. On real data sets, too, the sign-sum statistic performs well from the viewpoint of ranking reproducibility.
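
    The abstract defines the sign-sum as a count of t-statistic signs over all possible subsets, which is combinatorially expensive; here is a hedged sketch using random subsets instead (the subset fraction and number of splits are arbitrary choices, not the paper's):

    ```python
    import numpy as np
    from scipy import stats

    def sign_sum(case, control, n_splits=200, frac=0.5, seed=0):
        """Approximate sign-sum: average sign of the t-statistic over
        random subsets of each group (the paper enumerates all subsets)."""
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(n_splits):
            a = rng.choice(case, size=max(2, int(frac * len(case))), replace=False)
            b = rng.choice(control, size=max(2, int(frac * len(control))), replace=False)
            t, _ = stats.ttest_ind(a, b, equal_var=False)
            total += np.sign(t)
        # Near +/-1: stable sign (reproducible gene); near 0: hetero-type gene.
        return total / n_splits
    ```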

  4. [Big data in official statistics].

    PubMed

    Zwick, Markus

    2015-08-01

    The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The task of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Before big data can be used correctly in official statistics, many questions need to be answered and problems solved: data quality, data protection, privacy, and the sustainable availability of data are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges for statistical education systems, in universities and inside the statistical offices. The national statistical offices of the European Union have agreed on a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.

  5. Statistical Analysis of Research Data | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview on the general principles of statistical analysis of research data.  The first day will feature univariate data analysis, including descriptive statistics, probability distributions, one- and two-sample inferential statistics.

  6. Perspectives on statistics education: observations from statistical consulting in an academic nursing environment.

    PubMed

    Hayat, Matthew J; Schmiege, Sarah J; Cook, Paul F

    2014-04-01

    Statistics knowledge is essential for understanding the nursing and health care literature, as well as for applying rigorous science in nursing research. Statistical consultants providing services to faculty and students in an academic nursing program have the opportunity to identify gaps and challenges in statistics education for nursing students. This information may be useful to curriculum committees and statistics educators. This article aims to provide perspective on statistics education stemming from the experiences of three experienced statistics educators who regularly collaborate and consult with nurse investigators. The authors share their knowledge and express their views about data management, data screening and manipulation, statistical software, types of scientific investigation, and advanced statistical topics not covered in the usual coursework. The suggestions provided promote a call for data to study these topics. Relevant data about statistics education can assist educators in developing comprehensive statistics coursework for nursing students. Copyright 2014, SLACK Incorporated.

  7. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A general problem with conventional multivariate statistical approaches to classifying data of multiple types is that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable, so the sources need to be weighted according to their reliability, yet most statistical classification methods have no mechanism for this. This research focuses first on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Second, this research focuses on neural network models. The neural networks are distribution-free, since no prior knowledge of the statistical distribution of the data is needed; this is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.
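
    A minimal sketch of the consensus-theoretic flavor of multisource classification described here, using a linear opinion pool: each source contributes class probabilities, weighted by an assumed reliability (both the probabilities and weights below are illustrative):

    ```python
    import numpy as np

    def consensus_classify(source_probs, weights):
        """Linear opinion pool: weighted sum of per-source class
        probabilities, then argmax over classes.

        source_probs: list of (n_samples, n_classes) arrays, one per source.
        weights: per-source reliability weights summing to 1.
        """
        pooled = sum(w * p for w, p in zip(weights, source_probs))
        return pooled.argmax(axis=1)

    p1 = np.array([[0.6, 0.3, 0.1]])   # source 1 (e.g., multispectral)
    p2 = np.array([[0.2, 0.5, 0.3]])   # source 2 (e.g., topographic)
    print(consensus_classify([p1, p2], weights=[0.7, 0.3]))   # -> [0]
    ```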

  8. Security of statistical data bases: invasion of privacy through attribute correlational modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palley, M.A.

    This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.
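
    A toy illustration of the ACM idea on synthetic numbers: build a "synthetic data base" by applying proportional random variation to query responses, then regress the confidential attribute on public ones. Every variable name and coefficient here is invented for illustration:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    age = rng.uniform(20, 65, n)                      # public attribute
    tenure = age - 20 + rng.normal(0, 2, n)           # public attribute
    salary = 20000 + 900 * age + 400 * tenure + rng.normal(0, 5000, n)  # confidential

    def jitter(v):
        # Proportional random variation, standing in for noisy query responses.
        return v * rng.normal(1.0, 0.05, v.shape)

    X = np.column_stack([np.ones(n), jitter(age), jitter(tenure)])
    beta, *_ = np.linalg.lstsq(X, jitter(salary), rcond=None)

    victim = np.array([1.0, 40.0, 18.0])   # known public attributes of one record
    print("inferred confidential salary:", victim @ beta)
    ```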

  9. Muscular Dystrophy: Data and Statistics

    MedlinePlus

    ... MD STARnet Data and Statistics ... The following data and statistics come from MD STARnet, the Muscular Dystrophy Surveillance, Tracking, and Research Network. Data from the MD ...

  10. Statistical Tutorial | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. ST is designed as a follow up to Statistical Analysis of Research Data (SARD) held in April 2018. The tutorial will apply the general principles of statistical analysis of research data including descriptive statistics, z- and t-tests of means and mean differences, simple and multiple linear regression, ANOVA tests, and Chi-Squared distribution.

  11. 77 FR 65585 - Renewal of the Bureau of Labor Statistics Data Users Advisory Committee

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... DEPARTMENT OF LABOR Bureau of Labor Statistics Renewal of the Bureau of Labor Statistics Data..., the Secretary of Labor has determined that the renewal of the Bureau of Labor Statistics Data Users... duties imposed upon the Commissioner of Labor Statistics by 29 U.S.C. 1 and 2. This determination follows...

  12. A statistical package for computing time and frequency domain analysis

    NASA Technical Reports Server (NTRS)

    Brownlow, J.

    1978-01-01

    The spectrum analysis (SPA) program is a general purpose digital computer program designed to aid in data analysis. The program does time and frequency domain statistical analyses as well as some preanalysis data preparation. The capabilities of the SPA program include linear trend removal and/or digital filtering of data, plotting and/or listing of both filtered and unfiltered data, time domain statistical characterization of data, and frequency domain statistical characterization of data.
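
    SPA itself is a 1970s mainframe program, but its pipeline maps directly onto a few SciPy calls; a sketch with made-up signal parameters:

    ```python
    import numpy as np
    from scipy import signal

    rng = np.random.default_rng(2)
    fs = 64.0                                        # assumed sampling rate, Hz
    t = np.arange(2048) / fs
    x = 0.05 * t + np.sin(2 * np.pi * 5 * t) + rng.normal(0, 0.5, t.size)

    x = signal.detrend(x, type="linear")             # linear trend removal
    b, a = signal.butter(4, 20, fs=fs, btype="low")  # optional digital filtering
    x = signal.filtfilt(b, a, x)

    print(f"time-domain stats: mean {x.mean():.3f}, std {x.std():.3f}")
    f, pxx = signal.periodogram(x, fs=fs)            # frequency-domain analysis
    print(f"dominant frequency: {f[pxx.argmax()]:.2f} Hz")
    ```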

  13. The Lure of Statistics in Data Mining

    ERIC Educational Resources Information Center

    Grover, Lovleen Kumar; Mehra, Rajni

    2008-01-01

    The field of Data Mining like Statistics concerns itself with "learning from data" or "turning data into information". For statisticians the term "Data mining" has a pejorative meaning. Instead of finding useful patterns in large volumes of data as in the case of Statistics, data mining has the connotation of searching for data to fit preconceived…

  14. A nonparametric spatial scan statistic for continuous data.

    PubMed

    Jung, Inkyung; Cho, Ho Jin

    2015-10-20

    Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compared the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
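
    A hedged sketch of the core computation: scan circular windows around each site, score each window by the Wilcoxon rank-sum statistic of values inside versus outside, and keep the most extreme window. Real scan statistics obtain p-values from a Monte Carlo permutation null, omitted here:

    ```python
    import numpy as np
    from scipy import stats

    def wilcoxon_scan(coords, values, radii):
        """Return (center index, radius, z) of the most extreme window."""
        best = (None, None, 0.0)
        for i, c in enumerate(coords):
            d = np.linalg.norm(coords - c, axis=1)
            for r in radii:
                inside = d <= r
                if 1 < inside.sum() < len(values) - 1:
                    z, _ = stats.ranksums(values[inside], values[~inside])
                    if abs(z) > abs(best[2]):
                        best = (i, r, z)
        return best
    ```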

  15. Rasch fit statistics and sample size considerations for polytomous data.

    PubMed

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-05-29

    Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire - 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges.
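
    The mean-square statistics examined here have standard closed forms given a fitted Rasch model's expected scores and variances; a sketch (the model fitting itself is not shown):

    ```python
    import numpy as np

    def rasch_mean_squares(obs, exp, var):
        """Infit/outfit mean squares for one item.

        obs, exp, var: per-person observed scores, model-expected scores,
        and model variances from a fitted Rasch / partial credit model.
        """
        z2 = (obs - exp) ** 2 / var                    # squared standardized residuals
        outfit = z2.mean()                             # unweighted mean square
        infit = ((obs - exp) ** 2).sum() / var.sum()   # information-weighted mean square
        return infit, outfit                           # values near 1.0 indicate fit
    ```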

  16. Rasch fit statistics and sample size considerations for polytomous data

    PubMed Central

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-01-01

    Background Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID:18510722

  17. Interfaces between statistical analysis packages and the ESRI geographic information system

    NASA Technical Reports Server (NTRS)

    Masuoka, E.

    1980-01-01

    Interfaces between ESRI's geographic information system (GIS) data files and real valued data files written to facilitate statistical analysis and display of spatially referenced multivariable data are described. An example of data analysis which utilized the GIS and the statistical analysis system is presented to illustrate the utility of combining the analytic capability of a statistical package with the data management and display features of the GIS.

  18. Policy Safeguards and the Legitimacy of Highway Interdiction

    DTIC Science & Technology

    2016-12-01

    Table of contents (excerpt): B. Bias within Law Enforcement; C. Statistical Data Gathering; 3. Controlling Discretion; 4. Statistical Data Collection for Traffic Stops; A. Description of Statistical Data Collected; B. Data Organization and Analysis.

  19. Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Udey, Ruth Norma

    Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.

  20. 49 CFR Schedule G to Subpart B of... - Selected Statistical Data

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 49 Transportation 8 2010-10-01 2010-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data [Dollars in thousands] Greyhound Lines, Inc.; Trailways combined; All study carriers.... 9002, L. 9, col. (b) Other Statistics: Number of regular-route intercity passenger miles, Sch. 9002...

  1. Errors in statistical decision making Chapter 2 in Applied Statistics in Agricultural, Biological, and Environmental Sciences

    USDA-ARS?s Scientific Manuscript database

    Agronomic and Environmental research experiments result in data that are analyzed using statistical methods. These data are unavoidably accompanied by uncertainty. Decisions about hypotheses, based on statistical analyses of these data are therefore subject to error. This error is of three types,...

  2. TOWARDS A BAYESIAN PERSPECTIVE ON STATISTICAL DISCLOSURE LIMITATION

    EPA Science Inventory

    National statistical offices and other organizations collect data on individual subjects (person, businesses, organizations), typically while assuring the subject that data pertaining to them will be held confidential. These data provide the raw material for statistical data pro...

  3. Statistical Tutorial | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data.  ST is designed as a follow up to Statistical Analysis of Research Data (SARD) held in April 2018.  The tutorial will apply the general principles of statistical analysis of research data including descriptive statistics, z- and t-tests of means and mean differences, simple and multiple linear regression, ANOVA tests, and Chi-Squared distribution.

  4. Novel Kalman Filter Algorithm for Statistical Monitoring of Extensive Landscapes with Synoptic Sensor Data

    PubMed Central

    Czaplewski, Raymond L.

    2015-01-01

    Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of study variables and auxiliary sensor variables. A National Forest Inventory (NFI) illustrates application within an official statistics program. Practical recommendations regarding remote sensing and statistical issues are offered. This algorithm has the potential to increase the value of synoptic sensor data for statistical monitoring of large geographic areas. PMID:26393588
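
    As a point of reference for the numerical-robustness concern the abstract raises, the Joseph form of the Kalman filter measurement update keeps the covariance symmetric positive semidefinite; this generic update is not the paper's algorithm:

    ```python
    import numpy as np

    def kf_update(x, P, z, H, R):
        """One Kalman filter measurement update, Joseph-form covariance."""
        S = H @ P @ H.T + R                      # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
        x_new = x + K @ (z - H @ x)
        I_KH = np.eye(len(x)) - K @ H
        P_new = I_KH @ P @ I_KH.T + K @ R @ K.T  # Joseph form: stays PSD
        return x_new, P_new
    ```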

  5. Multiple alignment-free sequence comparison

    PubMed Central

    Ren, Jie; Song, Kai; Sun, Fengzhu; Deng, Minghua; Reinert, Gesine

    2013-01-01

    Motivation: Recently, a range of new statistics have become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, extensions of the pairwise statistics to the joint k-tuple content of all the sequences and, second, averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences and, second, to measure the similarity within a set of sequences. Results: Our investigation uses both simulated data as well as cis-regulatory module data where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although for real data all of our statistics show a similar performance, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics. Availability: Our implementation of the five statistics is available as an R package named ‘multiAlignFree’ at http://www-rcf.usc.edu/∼fsun/Programs/multiAlignFree/multiAlignFreemain.html. Contact: reinert@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23990418
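
    The pairwise building block behind such suites is a k-tuple (k-mer) count statistic; the sketch below uses the plain D2 inner product together with the "averages of sums of pairwise statistics" construction the abstract mentions. The paper's exact statistics are variants with centering and normalization not reproduced here:

    ```python
    from collections import Counter
    from itertools import combinations

    def kmer_counts(seq, k):
        return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    def d2(seq_a, seq_b, k):
        """D2: inner product of the k-tuple count vectors of two sequences."""
        ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
        return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())

    def average_pairwise_d2(seqs, k):
        """Average-of-pairwise-statistics version for multiple sequences."""
        pairs = list(combinations(seqs, 2))
        return sum(d2(a, b, k) for a, b in pairs) / len(pairs)

    print(average_pairwise_d2(["ACGTACGT", "ACGTTTTT", "CCGTACGA"], k=3))
    ```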

  6. DATA ON YOUTH, 1967, A STATISTICAL DOCUMENT.

    ERIC Educational Resources Information Center

    SCHEIDER, GEORGE

    THE DATA IN THIS REPORT ARE STATISTICS ON YOUTH THROUGHOUT THE UNITED STATES AND IN NEW YORK STATE. INCLUDED ARE DATA ON POPULATION, SCHOOL STATISTICS, EMPLOYMENT, FAMILY INCOME, JUVENILE DELINQUENCY AND YOUTH CRIME (INCLUDING NEW YORK CITY FIGURES), AND TRAFFIC ACCIDENTS. THE STATISTICS ARE PRESENTED IN THE TEXT AND IN TABLES AND CHARTS. (NH)

  7. The Ontology of Biological and Clinical Statistics (OBCS) for standardized and reproducible statistical analysis.

    PubMed

    Zheng, Jie; Harris, Marcelline R; Masci, Anna Maria; Lin, Yu; Hero, Alfred; Smith, Barry; He, Yongqun

    2016-09-14

    Statistics play a critical role in biological and clinical research. However, most reports of scientific results in the published literature make it difficult for the reader to reproduce the statistical analyses performed in achieving those results, because they provide inadequate documentation of the statistical tests and algorithms applied. The Ontology of Biological and Clinical Statistics (OBCS) is put forward here as a step towards solving this problem. The terms in OBCS, including 'data collection', 'data transformation in statistics', 'data visualization', 'statistical data analysis', and 'drawing a conclusion based on data', cover the major types of statistical processes used in basic biological research and clinical outcome studies. OBCS is aligned with the Basic Formal Ontology (BFO) and extends the Ontology of Biomedical Investigations (OBI), an OBO (Open Biological and Biomedical Ontologies) Foundry ontology supported by over 20 research communities. Currently, OBCS comprises 878 terms, representing 20 BFO classes, 403 OBI classes, 229 OBCS-specific classes, and 122 classes imported from ten other OBO ontologies. We discuss two examples illustrating how the ontology is being applied. In the first (biological) use case, we describe how OBCS was applied to represent the high-throughput microarray data analysis of immunological transcriptional profiles in human subjects vaccinated with an influenza vaccine. In the second (clinical outcomes) use case, we applied OBCS to represent the processing of electronic health care data to determine the associations between hospital staffing levels and patient mortality. Our case studies were designed to show how OBCS can be used for the consistent representation of statistical analysis pipelines under two different research paradigms. Other ongoing projects using OBCS for statistical data processing are also discussed. The OBCS source code and documentation are available at: https://github.com/obcs/obcs . OBCS is a community-based open source ontology in the domain of biological and clinical statistics; it represents statistics-related terms and their relations in a rigorous fashion, facilitates standard data analysis and integration, and supports reproducible biological and clinical research.

  8. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics

    USGS Publications Warehouse

    Antweiler, Ronald C.; Taylor, Howard E.

    2008-01-01

    The main classes of statistical treatment of below-detection limit (left-censored) environmental data for the determination of basic statistics that have been used in the literature are substitution methods, maximum likelihood, regression on order statistics (ROS), and nonparametric techniques. These treatments, along with using all instrument-generated data (even those below detection), were evaluated by examining data sets in which the true values of the censored data were known. It was found that for data sets with less than 70% censored data, the best technique overall for determination of summary statistics was the nonparametric Kaplan-Meier technique. ROS and the two substitution methods of assigning one-half the detection limit value to censored data or assigning a random number between zero and the detection limit to censored data were adequate alternatives. The use of these two substitution methods, however, requires a thorough understanding of how the laboratory censored the data. The technique of employing all instrument-generated data - including numbers below the detection limit - was found to be less adequate than the above techniques. At high degrees of censoring (greater than 70% censored data), no technique provided good estimates of summary statistics. Maximum likelihood techniques were found to be far inferior to all other treatments except substituting zero or the detection limit value to censored data.
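
    One standard way to apply Kaplan-Meier to left-censored (below detection limit) data is the flipping trick: subtract all values from a large constant so left-censoring becomes right-censoring, estimate, then flip back. A self-contained sketch with invented concentrations:

    ```python
    import numpy as np

    def kaplan_meier(times, observed):
        """Right-censored KM: (time, S(t)) after each event; no tie handling."""
        order = np.argsort(times)
        times, observed = times[order], observed[order]
        n, s, out = len(times), 1.0, []
        for i, (t, d) in enumerate(zip(times, observed)):
            if d:                        # an observed (detected) value
                s *= 1.0 - 1.0 / (n - i)
                out.append((t, s))
        return out

    vals = np.array([0.5, 1.2, 0.8, 2.0, 3.1, 0.5, 0.5])           # concentrations
    detected = np.array([False, True, True, True, True, False, False])

    M = vals.max() + 1.0                 # flip: left-censoring -> right-censoring
    km = kaplan_meier(M - vals, detected)
    median_flipped = next(t for t, s in km if s <= 0.5)
    print("KM median on the original scale:", M - median_flipped)
    ```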

  9. Using Pooled Data and Data Visualization to Introduce Statistical Concepts in the General Chemistry Laboratory

    ERIC Educational Resources Information Center

    Olsen, Robert J.

    2008-01-01

    I describe how data pooling and data visualization can be employed in the first-semester general chemistry laboratory to introduce core statistical concepts such as central tendency and dispersion of a data set. The pooled data are plotted as a 1-D scatterplot, a purpose-designed number line through which statistical features of the data are…

  10. Analysis of Longitudinal Outcome Data with Missing Values in Total Knee Arthroplasty.

    PubMed

    Kang, Yeon Gwi; Lee, Jang Taek; Kang, Jong Yeal; Kim, Ga Hye; Kim, Tae Kyun

    2016-01-01

    We sought to determine the influence of missing data on the statistical results, and to determine which statistical method is most appropriate for the analysis of longitudinal outcome data of TKA with missing values among repeated measures ANOVA, generalized estimating equation (GEE) and mixed effects model repeated measures (MMRM). Data sets with missing values were generated with different proportion of missing data, sample size and missing-data generation mechanism. Each data set was analyzed with three statistical methods. The influence of missing data was greater with higher proportion of missing data and smaller sample size. MMRM tended to show least changes in the statistics. When missing values were generated by 'missing not at random' mechanism, no statistical methods could fully avoid deviations in the results. Copyright © 2016 Elsevier Inc. All rights reserved.
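
    The mixed-model (MMRM-style) and GEE analyses compared in this study can be run in statsmodels; a sketch on simulated longitudinal scores with random missingness (all column names and effect sizes invented):

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n, times = 100, np.array([0, 1, 2, 3])
    subj = np.repeat(np.arange(n), len(times))
    y = (50 + 5 * np.tile(times, n)                 # fixed time effect
         + rng.normal(0, 3, n)[subj]                # subject random intercept
         + rng.normal(0, 2, subj.size))             # residual noise
    df = pd.DataFrame({"id": subj, "time": np.tile(times, n), "score": y})
    df.loc[rng.random(len(df)) < 0.2, "score"] = np.nan   # ~20% missing

    obs = df.dropna()
    mixed = smf.mixedlm("score ~ time", obs, groups="id").fit()
    gee = smf.gee("score ~ time", groups="id", data=obs,
                  cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(mixed.params["time"], gee.params["time"])
    ```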

  11. A Framework for Assessing High School Students' Statistical Reasoning.

    PubMed

    Chan, Shiau Wei; Ismail, Zaleha; Sumintono, Bambang

    2016-01-01

    Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students' statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students' statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework's cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments.

  12. A Framework for Assessing High School Students' Statistical Reasoning

    PubMed Central

    2016-01-01

    Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students’ statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students’ statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework’s cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments. PMID:27812091

  13. Data and Statistics of DVT/PE

    MedlinePlus

    ... Venous Thromboembolism (Blood Clots) ... HA-VTE Data & Statistics ... HA-VTE Resources ... Blood Clots and Travel ... Research and Treatment Centers ... Data & Statistics ...

  14. The Importance of Statistical Modeling in Data Analysis and Inference

    ERIC Educational Resources Information Center

    Rollins, Derrick, Sr.

    2017-01-01

    Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…

  15. Internet Data Analysis for the Undergraduate Statistics Curriculum

    ERIC Educational Resources Information Center

    Sanchez, Juana; He, Yan

    2005-01-01

    Statistics textbooks for undergraduates have not caught up with the enormous amount of analysis of Internet data that is taking place these days. Case studies that use Web server log data or Internet network traffic data are rare in undergraduate Statistics education. And yet these data provide numerous examples of skewed and bimodal…

  16. 29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 29 Labor 5 2010-07-01 2010-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the form...

  17. Descriptive statistics.

    PubMed

    Nick, Todd G

    2007-01-01

    Statistics is defined by the Medical Subject Headings (MeSH) thesaurus as the science and art of collecting, summarizing, and analyzing data that are subject to random variation. The two broad categories of summarizing and analyzing data are referred to as descriptive and inferential statistics. This chapter considers the science and art of summarizing data where descriptive statistics and graphics are used to display data. In this chapter, we discuss the fundamentals of descriptive statistics, including describing qualitative and quantitative variables. For describing quantitative variables, measures of location and spread, for example the standard deviation, are presented along with graphical presentations. We also discuss distributions of statistics, for example the variance, as well as the use of transformations. The concepts in this chapter are useful for uncovering patterns within the data and for effectively presenting the results of a project.

  18. Statistical Policy Working Paper 24. Electronic Dissemination of Statistical Data

    DOT National Transportation Integrated Search

    1995-11-01

    The report, Statistical Policy Working Paper 24, Electronic Dissemination of Statistical Data, includes several topics, such as Options and Best Uses for Different Media Operation of Electronic Dissemination Service, Customer Service Programs, Cost a...

  19. Data Literacy is Statistical Literacy

    ERIC Educational Resources Information Center

    Gould, Robert

    2017-01-01

    Past definitions of statistical literacy should be updated in order to account for the greatly amplified role that data now play in our lives. Experience working with high-school students in an innovative data science curriculum has shown that teaching statistical literacy, augmented by data literacy, can begin early.

  20. Key Statistics from the National Survey of Family Growth: Vasectomy

    MedlinePlus

    ... NCHS Key Statistics from the National Survey of Family Growth - V Listing ... Surveys and Data Collection Systems ... Vital Statistics: Birth Data ...

  1. An Analysis of the Navy’s Voluntary Education Program

    DTIC Science & Technology

    2007-03-01

    Table of contents (excerpt): Naval Analysis VOLED Study: 1. Data; 2. Statistical Models; 3. ... B. Employer-Financed General Training: 1. Data; 2. Statistical Model ... 1. Data; 2. Statistical Model; 3. Findings.

  2. Statistical Diversions

    ERIC Educational Resources Information Center

    Petocz, Peter; Sowey, Eric

    2012-01-01

    The term "data snooping" refers to the practice of choosing which statistical analyses to apply to a set of data after having first looked at those data. Data snooping contradicts a fundamental precept of applied statistics, that the scheme of analysis is to be planned in advance. In this column, the authors shall elucidate the…

  3. 75 FR 24718 - Guidance for Industry on Documenting Statistical Analysis Programs and Data Files; Availability

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-05

    ...] Guidance for Industry on Documenting Statistical Analysis Programs and Data Files; Availability AGENCY... Programs and Data Files.'' This guidance is provided to inform study statisticians of recommendations for documenting statistical analyses and data files submitted to the Center for Veterinary Medicine (CVM) for the...

  4. 49 CFR Schedule G to Subpart B of... - Selected Statistical Data

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 49 Transportation 8 2011-10-01 2011-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data [Dollars in thousands] () Greyhound Lines, Inc. () Trailways combined () All study carriers... purpose of Schedule G is to develop selected property, labor and operational data for use in evaluating...

  5. Using Facebook Data to Turn Introductory Statistics Students into Consultants

    ERIC Educational Resources Information Center

    Childers, Adam F.

    2017-01-01

    Facebook provides businesses and organizations with copious data that describe how users are interacting with their page. This data affords an excellent opportunity to turn introductory statistics students into consultants to analyze the Facebook data using descriptive and inferential statistics. This paper details a semester-long project that…

  6. Communications Link Characterization Experiment (CLCE) technical data report, volume 2

    NASA Technical Reports Server (NTRS)

    1977-01-01

    Results are presented of the long-term rain rate statistical analysis and of the investigation into determining worst-month statistics from measured attenuation data caused by precipitation. The rain rate statistics cover a period of 11 months, from July 1974 to May 1975, for measurements taken at the NASA Rosman station. The rain rate statistical analysis is a continuation of the analysis of the rain rate data accumulated for the ATS-6 Millimeter Wave Propagation Experiment. The statistical characteristics of the rain rate data through December 1974 are also presented for the above experiment.

  7. Teaching for Statistical Literacy: Utilising Affordances in Real-World Data

    ERIC Educational Resources Information Center

    Chick, Helen L.; Pierce, Robyn

    2012-01-01

    It is widely held that context is important in teaching mathematics and statistics. Consideration of context is central to statistical thinking, and any teaching of statistics must incorporate this aspect. Indeed, it has been advocated that real-world data sets can motivate the learning of statistical principles. It is not, however, a…

  8. Small Sample Statistics for Incomplete Nonnormal Data: Extensions of Complete Data Formulae and a Monte Carlo Comparison

    ERIC Educational Resources Information Center

    Savalei, Victoria

    2010-01-01

    Incomplete nonnormal data are common occurrences in applied research. Although these 2 problems are often dealt with separately by methodologists, they often co-occur. Very little has been written about statistics appropriate for evaluating models with such data. This article extends several existing statistics for complete nonnormal data to…

  9. Hold My Calls: An Activity for Introducing the Statistical Process

    ERIC Educational Resources Information Center

    Abel, Todd; Poling, Lisa

    2015-01-01

    Working with practicing teachers, this article demonstrates, through the facilitation of a statistical activity, how to introduce and investigate the unique qualities of the statistical process including: formulate a question, collect data, analyze data, and interpret data.

  10. The U.S. geological survey rass-statpac system for management and statistical reduction of geochemical data

    USGS Publications Warehouse

    VanTrump, G.; Miesch, A.T.

    1977-01-01

    RASS is an acronym for Rock Analysis Storage System and STATPAC, for Statistical Package. The RASS and STATPAC computer programs are integrated into the RASS-STATPAC system for the management and statistical reduction of geochemical data. The system, in its present form, has been in use for more than 9 yr by scores of U.S. Geological Survey geologists, geochemists, and other scientists engaged in a broad range of geologic and geochemical investigations. The principal advantage of the system is the flexibility afforded the user both in data searches and retrievals and in the manner of statistical treatment of data. The statistical programs provide for most types of statistical reduction normally used in geochemistry and petrology, but also contain bridges to other program systems for statistical processing and automatic plotting. ?? 1977.

  11. Application of Ontology Technology in Health Statistic Data Analysis.

    PubMed

    Guo, Minjiang; Hu, Hongpu; Lei, Xingyun

    2017-01-01

    Research purpose: to establish a health management ontology for the analysis of health statistics data. Proposed methods: this paper established a health management ontology based on an analysis of the concepts in the China Health Statistics Yearbook, and used Protégé to define the syntactic and semantic structure of health statistical data. Six classes of top-level ontology concepts and their subclasses were extracted, and object properties and data properties were defined to establish the construction of these classes. Through ontology instantiation, multi-source heterogeneous data can be integrated, enabling administrators to gain an overall understanding and analysis of the health statistics data. Ontology technology provides a comprehensive and unified information integration structure for the health management domain and lays a foundation for the efficient analysis of multi-source, heterogeneous health system management data and enhancement of management efficiency.

  12. Testing independence of bivariate interval-censored data using modified Kendall's tau statistic.

    PubMed

    Kim, Yuneung; Lim, Johan; Park, DoHwan

    2015-11-01

    In this paper, we study a nonparametric procedure to test independence of bivariate interval-censored data, for both current status data (case 1 interval-censored data) and case 2 interval-censored data. To do this, we propose a score-based modification of the Kendall's tau statistic for bivariate interval-censored data. Our modification defines the Kendall's tau statistic with expected numbers of concordant and discordant pairs of data. The performance of the modified approach is illustrated by simulation studies and application to the AIDS study. We compare our method to alternative approaches, such as the two-stage estimation method by Sun et al. (Scandinavian Journal of Statistics, 2006) and the multiple imputation method by Betensky and Finkelstein (Statistics in Medicine, 1999b). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
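
    The modification described replaces observed concordance indicators with their expected values given the censoring intervals; below is a Monte Carlo stand-in assuming values uniform within their intervals (the paper's score-based construction differs):

    ```python
    import numpy as np

    def expected_tau(x_ints, y_ints, n_draws=2000, seed=0):
        """Kendall's tau from expected concordant/discordant pair counts
        for interval-censored data (uniform-within-interval assumption)."""
        rng = np.random.default_rng(seed)
        n = len(x_ints)
        x = rng.uniform(*np.transpose(x_ints), size=(n_draws, n))
        y = rng.uniform(*np.transpose(y_ints), size=(n_draws, n))
        s = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                # E[sign product] = P(concordant) - P(discordant)
                s += np.mean(np.sign(x[:, i] - x[:, j]) * np.sign(y[:, i] - y[:, j]))
        return s / (n * (n - 1) / 2)

    print(expected_tau([(1, 2), (2, 4), (3, 5)], [(0, 2), (1, 3), (2, 6)]))
    ```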

  13. Technical Note: The Initial Stages of Statistical Data Analysis

    PubMed Central

    Tandy, Richard D.

    1998-01-01

    Objective: To provide an overview of several important data-related considerations in the design stage of a research project and to review the levels of measurement and their relationship to the statistical technique chosen for the data analysis. Background: When planning a study, the researcher must clearly define the research problem and narrow it down to specific, testable questions. The next steps are to identify the variables in the study, decide how to group and treat subjects, and determine how to measure, and the underlying level of measurement of, the dependent variables. Then the appropriate statistical technique can be selected for data analysis. Description: The four levels of measurement in increasing complexity are nominal, ordinal, interval, and ratio. Nominal data are categorical or “count” data, and the numbers are treated as labels. Ordinal data can be ranked in a meaningful order by magnitude. Interval data possess the characteristics of ordinal data and also have equal distances between levels. Ratio data have a natural zero point. Nominal and ordinal data are analyzed with nonparametric statistical techniques and interval and ratio data with parametric statistical techniques. Advantages: Understanding the four levels of measurement and when it is appropriate to use each is important in determining which statistical technique to use when analyzing data. PMID:16558489
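
    The level-of-measurement rule described above reduces to a small decision table; a sketch:

    ```python
    def technique_family(level):
        """Map a level of measurement to the broad class of techniques
        recommended in the text."""
        level = level.lower()
        if level in ("nominal", "ordinal"):
            return "nonparametric (e.g., chi-square, Mann-Whitney)"
        if level in ("interval", "ratio"):
            return "parametric (e.g., t-test, ANOVA)"
        raise ValueError(f"unknown level of measurement: {level}")

    print(technique_family("ordinal"))
    ```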

  14. STATISTICAL DATA ON CHEMICAL COMPOUNDS.

    DTIC Science & Technology

    Descriptors: DATA STORAGE SYSTEMS, FEASIBILITY STUDIES, COMPUTERS, STATISTICAL DATA, DOCUMENTS, ARMY, CHEMICAL COMPOUNDS, INFORMATION RETRIEVAL, MOLECULAR STRUCTURE, BIBLIOGRAPHIES, DATA PROCESSING.

  15. 75 FR 67121 - Re-Establishment of the Bureau of Labor Statistics Data Users Advisory Committee

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-01

    ... DEPARTMENT OF LABOR Bureau of Labor Statistics Re-Establishment of the Bureau of Labor Statistics... Statistics Data Users Advisory Committee (the ``Committee'') is in the public interest in connection with the performance of duties imposed upon the Commissioner of Labor Statistics by 29 U.S.C. 1 and 2. This...

  16. Statistical Package User’s Guide.

    DTIC Science & Technology

    1980-08-01

    Contents (excerpt): C. STACH, Nonparametric Descriptive Statistics; D. CHIRA, Coefficient of Concordance. Test data: the programs were tested using data from John Neter and William Wasserman, Applied Linear Statistical Models: Regression... Comments: ranked data are used for program CHIRA.

  17. Strategies Used by Students to Compare Two Data Sets

    ERIC Educational Resources Information Center

    Reaburn, Robyn

    2012-01-01

    One of the common tasks of inferential statistics is to compare two data sets. Long before formal statistical procedures, however, students can be encouraged to make comparisons between data sets and therefore build up intuitive statistical reasoning. Such tasks also give meaning to the data collection students may do. This study describes the…

  18. Computers as an Instrument for Data Analysis. Technical Report No. 11.

    ERIC Educational Resources Information Center

    Muller, Mervin E.

    A review of statistical data analysis involving computers as a multi-dimensional problem provides the perspective for consideration of the use of computers in statistical analysis and the problems associated with large data files. An overall description of STATJOB, a particular system for doing statistical data analysis on a digital computer,…

  19. 78 FR 17942 - Data Users Advisory Committee; Notice of Meeting and Agenda

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-25

    ... DEPARTMENT OF LABOR Bureau of Labor Statistics Data Users Advisory Committee; Notice of Meeting and Agenda The Bureau of Labor Statistics Data Users Advisory Committee will meet on Tuesday May 7..., DC. The Committee provides advice to the Bureau of Labor Statistics from the points of view of data...

  20. 78 FR 24236 - Data Users Advisory Committee; Notice of Meeting and Agenda

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-04-24

    ... DEPARTMENT OF LABOR Bureau of Labor Statistics Data Users Advisory Committee; Notice of Meeting and Agenda The Bureau of Labor Statistics Data Users Advisory Committee will meet on [[Page 24237... of Labor Statistics from the points of view of data users from various sectors of the U.S. economy...

  1. 77 FR 20850 - Data Users Advisory Committee; Notice of Meeting and Agenda

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-04-06

    ... DEPARTMENT OF LABOR Bureau of Labor Statistics Data Users Advisory Committee; Notice of Meeting and Agenda The Bureau of Labor Statistics Data Users Advisory Committee will meet on Thursday May 17..., DC. The Committee provides advice to the Bureau of Labor Statistics from the points of view of data...

  2. Statistical Compression for Climate Model Output

    NASA Astrophysics Data System (ADS)

    Hammerling, D.; Guinness, J.; Soh, Y. J.

    2017-12-01

    Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset (one year of daily mean temperature data), particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.
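
    A toy version of the compress/decompress cycle: store per-block means and residual standard deviations as the summary statistics, then decompress by conditional expectation (smooth) or conditional simulation (mean plus matched noise). The actual method models spatial nonstationarity far more carefully:

    ```python
    import numpy as np

    def compress(field, block=8):
        """Summary statistics: per-block mean and within-block std."""
        h, w = field.shape
        blocks = field.reshape(h // block, block, w // block, block)
        return blocks.mean(axis=(1, 3)), blocks.std(axis=(1, 3))

    def decompress(means, stds, block=8, simulate=False, seed=0):
        cond_exp = np.kron(means, np.ones((block, block)))  # conditional expectation
        if not simulate:
            return cond_exp                                 # smooth reconstruction
        rng = np.random.default_rng(seed)
        noise = rng.normal(size=cond_exp.shape) * np.kron(stds, np.ones((block, block)))
        return cond_exp + noise                             # conditional simulation

    field = np.random.default_rng(4).normal(size=(64, 64)).cumsum(axis=0)
    means, stds = compress(field)
    recon = decompress(means, stds, simulate=True)   # neither too smooth nor too rough
    ```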

  3. Tips and Tricks for Successful Application of Statistical Methods to Biological Data.

    PubMed

    Schlenker, Evelyn

    2016-01-01

    This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (random clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increase the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.
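
    The decision flow described (check the distribution, transform, or fall back to a rank-based test) fits in a few lines of SciPy; the 0.05 screening threshold and lognormal toy data are arbitrary choices:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    a = rng.lognormal(0.0, 0.8, 40)      # skewed group A
    b = rng.lognormal(0.3, 0.8, 40)      # skewed group B

    if min(stats.shapiro(a).pvalue, stats.shapiro(b).pvalue) > 0.05:
        print(stats.ttest_ind(a, b))                          # parametric route
    elif min(stats.shapiro(np.log(a)).pvalue,
             stats.shapiro(np.log(b)).pvalue) > 0.05:
        print(stats.ttest_ind(np.log(a), np.log(b)))          # log transform normalized
    else:
        print(stats.mannwhitneyu(a, b))                       # nonparametric fallback
    ```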

  4. ISSUES IN THE STATISTICAL ANALYSIS OF SMALL-AREA HEALTH DATA. (R825173)

    EPA Science Inventory

    The availability of geographically indexed health and population data, with advances in computing, geographical information systems and statistical methodology, have opened the way for serious exploration of small area health statistics based on routine data. Such analyses may be...

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kogalovskii, M.R.

    This paper presents a review of problems related to statistical database systems, which are widespread in various fields of activity. Statistical databases (SDBs) are databases whose data are used for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored data compression techniques, and statistical data representation means. Also examined is whether present database management systems (DBMS) satisfy SDB requirements. Some current research directions in SDB systems are considered.

  6. Statistical Inference for Data Adaptive Target Parameters.

    PubMed

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
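
    A sketch of the V-fold construction: each parameter-generating sample chooses the target data-adaptively (here, the feature most correlated with the outcome, purely as an illustration), the held-out estimation sample estimates it, and the V estimates are averaged:

    ```python
    import numpy as np

    def data_adaptive_estimate(X, y, V=5, seed=0):
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(y)), V)
        estimates = []
        for v in range(V):
            est = folds[v]                                      # estimation sample
            gen = np.concatenate([f for u, f in enumerate(folds) if u != v])
            # Parameter-generating sample: pick the target adaptively.
            corrs = [abs(np.corrcoef(X[gen, j], y[gen])[0, 1])
                     for j in range(X.shape[1])]
            j_star = int(np.argmax(corrs))
            # Estimation sample: estimate the chosen parameter.
            estimates.append(np.corrcoef(X[est, j_star], y[est])[0, 1])
        return float(np.mean(estimates))    # sample-split data adaptive parameter
    ```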

  7. [Sem: a suitable statistical software adaptated for research in oncology].

    PubMed

    Kwiatkowski, F; Girard, M; Hacene, K; Berlie, J

    2000-10-01

    Many software packages have been adapted for medical use, but they rarely support both data management and statistics conveniently. A recent cooperative effort produced a new package, Sem (Statistics Epidemiology Medicine), which allows data management of trials as well as statistical treatment of the data. Very convenient, it can be used by non-professionals in statistics (biologists, doctors, researchers, data managers) since, except with multivariate models, the software itself selects the most appropriate test, after which complementary tests can be requested if needed. The Sem database manager (DBM) is not compatible with the usual DBMs: this constitutes a first protection against loss of privacy. Other safeguards (passwords, encryption...) strengthen data security, all the more necessary today since Sem can be run on computer networks. Data organization permits multiplicity: forms can be duplicated per patient. Dates are treated in a special but transparent manner (sorting, date and delay calculations...). Sem communicates with common desktop software, often with a simple copy/paste, so statistics can be easily performed on data stored in external spreadsheets, and slides can be made by pasting graphs with a single mouse click (survival curves...). Already in daily use at over fifty sites in different hospitals, this product, combining data management and statistics, appears to be a convenient and innovative solution.

  8. Evaluation of Fuzzy-Logic Framework for Spatial Statistics Preserving Methods for Estimation of Missing Precipitation Data

    NASA Astrophysics Data System (ADS)

    El Sharif, H.; Teegavarapu, R. S.

    2012-12-01

    Spatial interpolation methods used for estimation of missing precipitation data at a site seldom check for their ability to preserve site and regional statistics. Such statistics are primarily defined by spatial correlations and other site-to-site statistics in a region. Preservation of site and regional statistics represents a means of assessing the validity of missing precipitation estimates at a site. This study evaluates the efficacy of a fuzzy-logic methodology for infilling missing historical daily precipitation data in preserving site and regional statistics. Rain gauge sites in the state of Kentucky, USA, are used as a case study for evaluation of this newly proposed method in comparison to traditional data infilling techniques. Several error and performance measures will be used to evaluate the methods and trade-offs in accuracy of estimation and preservation of site and regional statistics.
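
    As a concrete, hypothetical illustration of the evaluation this abstract describes, the R sketch below infills missing values at one gauge with a simple inverse-distance-weighted average of neighboring gauges (a traditional technique, not the fuzzy-logic method itself) and then checks whether site and site-to-site statistics are preserved; all data and site locations are simulated.

      set.seed(2)
      n_days <- 365; n_sites <- 5
      # synthetic correlated daily precipitation (non-negative)
      base <- rgamma(n_days, shape = 0.4, scale = 8)
      P <- sapply(1:n_sites, function(s) pmax(base + rnorm(n_days, sd = 2), 0))
      coords <- cbind(runif(n_sites), runif(n_sites))

      obs <- P
      miss <- sample(n_days, 60)                # days missing at site 1
      obs[miss, 1] <- NA

      # inverse-distance-weighted estimate at site 1 from the other sites
      d <- sqrt(colSums((t(coords[-1, ]) - coords[1, ])^2))
      w <- (1 / d^2) / sum(1 / d^2)
      filled <- obs
      filled[miss, 1] <- obs[miss, -1] %*% w

      # check preservation of site and regional statistics
      c(true_mean = mean(P[, 1]), filled_mean = mean(filled[, 1]))
      c(true_cor = cor(P[, 1], P[, 2]), filled_cor = cor(filled[, 1], filled[, 2]))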

  9. L-statistics for Repeated Measurements Data With Application to Trimmed Means, Quantiles and Tolerance Intervals.

    PubMed

    Assaad, Houssein I; Choudhary, Pankaj K

    2013-01-01

    The L-statistics form an important class of estimators in nonparametric statistics. Their members include trimmed means and sample quantiles and functions thereof. This article is devoted to theory and applications of L-statistics for repeated measurements data, wherein the measurements on the same subject are dependent and the measurements from different subjects are independent. This article has three main goals: (a) Show that the L-statistics are asymptotically normal for repeated measurements data. (b) Present three statistical applications of this result, namely, location estimation using trimmed means, quantile estimation, and construction of tolerance intervals. (c) Obtain a Bahadur representation for sample quantiles. These results are generalizations of similar results for independently and identically distributed data. The practical usefulness of these results is illustrated by analyzing a real data set involving measurement of systolic blood pressure. The properties of the proposed point and interval estimators are examined via simulation.
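
    A short base-R illustration of the L-statistics discussed above, on invented repeated-measurements data: a 10% trimmed mean and sample quantiles, with a subject-level (cluster) bootstrap standing in for the article's asymptotic variance formulas so that within-subject dependence is respected.

      set.seed(3)
      # 30 subjects x 4 repeated systolic blood pressure measurements
      id <- rep(1:30, each = 4)
      y  <- rnorm(30, 130, 12)[id] + rnorm(120, 0, 6)

      mean(y, trim = 0.10)                 # 10% trimmed mean (an L-statistic)
      quantile(y, c(0.25, 0.5, 0.75))      # sample quantiles (also L-statistics)

      # resample whole subjects so within-subject dependence is preserved
      B <- 2000
      boot <- replicate(B, {
        ids <- sample(unique(id), replace = TRUE)
        mean(unlist(lapply(ids, function(s) y[id == s])), trim = 0.10)
      })
      sd(boot)                             # bootstrap SE of the trimmed mean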

  10. 28 CFR 22.21 - Use of identifiable data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... STATISTICAL INFORMATION § 22.21 Use of identifiable data. Research or statistical information identifiable to a private person may be used only for research or statistical purposes. ... 28 Judicial Administration 1 2010-07-01 2010-07-01 false Use of identifiable data. 22.21 Section...

  11. 7 CFR 3601.7 - Requests for published data and information.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... STATISTICS SERVICE, DEPARTMENT OF AGRICULTURE PUBLIC INFORMATION § 3601.7 Requests for published data and... historic publications is available from the Secretary, Agricultural Statistics Board, NASS, U.S. Department... Publications, NASS Catalog, NASS Periodicals and Annual Reports. Published data, from each State Statistical...

  12. 46 CFR 555.4 - Petitions.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ..., business or operation; (3) The name and address of each United States carrier alleged to be adversely... to petitioner: (i) Statistical data documenting present or prospective cargo loss by United States... on that basis, and the sources of the statistical data; (ii) Statistical data or other information...

  13. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2013-09-30

    data. The Niwa and Anderson models were compared with 3-D multi-beam data collected by Paramo and Gerlotto. The data were consistent with the... Bhatia, S., T.K. Stanton, J. Paramo, and F. Gerlotto (under revision), “Modeling statistics of fish school dimensions using 3-D data from a

  14. Critical Views of 8th Grade Students toward Statistical Data in Newspaper Articles: Analysis in Light of Statistical Literacy

    ERIC Educational Resources Information Center

    Guler, Mustafa; Gursoy, Kadir; Guven, Bulent

    2016-01-01

    Understanding and interpreting biased data, decision-making in accordance with the data, and critically evaluating situations involving data are among the fundamental skills necessary in the modern world. To develop these required skills, emphasis on statistical literacy in school mathematics has been gradually increased in recent years. The…

  15. Challenges of Big Data Analysis.

    PubMed

    Fan, Jianqing; Han, Fang; Liu, Han

    2014-06-01

    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible to find with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and how these features force a change of paradigm in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.
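
    The spurious-correlation point is easy to reproduce. In the R sketch below (illustrative, not from the paper), the outcome is independent of every predictor, yet the maximum absolute sample correlation grows steadily with the dimension p at a fixed sample size n:

      set.seed(4)
      n <- 60                                 # modest sample size
      sapply(c(100, 1000, 10000), function(p) {
        X <- matrix(rnorm(n * p), n, p)       # pure noise predictors
        y <- rnorm(n)                         # outcome independent of X
        max(abs(cor(X, y)))                   # large despite full independence
      })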

  16. Challenges of Big Data Analysis

    PubMed Central

    Fan, Jianqing; Han, Fang; Liu, Han

    2014-01-01

    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible to find with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and how these features force a change of paradigm in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions. PMID:25419469

  17. Data Processing System (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research.

    PubMed

    Tang, Qi-Yi; Zhang, Chuan-Xi

    2013-04-01

    A comprehensive but simple-to-use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. This program runs on standard Windows computers. Many of the functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology. © 2012 The Authors Insect Science © 2012 Institute of Zoology, Chinese Academy of Sciences.

  18. Neural network approaches versus statistical methods in classification of multisource remote sensing data

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon A.; Swain, Philip H.; Ersoy, Okan K.

    1990-01-01

    Neural network learning procedures and statistical classification methods are applied and compared empirically in the classification of multisource remote sensing and geographic data. Statistical multisource classification by means of a method based on Bayesian classification theory is also investigated and modified. The modifications permit control of the influence of the data sources involved in the classification process. Reliability measures are introduced to rank the quality of the data sources. The data sources are then weighted according to these rankings in the statistical multisource classification. Four data sources are used in experiments: Landsat MSS data and three forms of topographic data (elevation, slope, and aspect). Experimental results show that the two approaches have unique advantages and disadvantages in this classification application.

  19. Online Statistical Modeling (Regression Analysis) for Independent Responses

    NASA Astrophysics Data System (ADS)

    Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

    2017-06-01

    Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Statistical models have now been developed in various directions to handle various types of data and complex relationships. A rich variety of advanced and recent statistical modelling tools is available in open-source software (one of which is R). However, these advanced statistical modelling tools are not very friendly to novice R users, since they are based on programming scripts or a command-line interface. Our research aims to develop a web interface (based on R and Shiny), so that recent and advanced statistical modelling is readily available, accessible, and applicable on the web. We previously built interfaces in the form of e-tutorials for several modern and advanced statistical models in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM, and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of a data analysis workflow, including models using computer-intensive statistics (bootstrap and Markov chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes these statistical models easier to apply and easier to compare, in order to find the most appropriate model for the data.

  20. CIDR

    Science.gov Websites

    Quality Control Statistics. CIDR is dedicated to producing the highest quality data for our investigators. These cumulative quality control statistics are based on data from 419 released CIDR Program

  1. The use and misuse of statistical methodologies in pharmacology research.

    PubMed

    Marino, Michael J

    2014-01-01

    Descriptive, exploratory, and inferential statistics are necessary components of hypothesis-driven biomedical research. Despite the ubiquitous need for these tools, the emphasis on statistical methods in pharmacology has become dominated by inferential methods, often chosen more for the availability of user-friendly software than for any understanding of the data set or the critical assumptions of the statistical tests. Such frank misuse of statistical methodology, and the quest to reach the mystical α<0.05 criterion, has hampered research via the publication of incorrect analyses driven by rudimentary statistical training. Perhaps more critically, a poor understanding of statistical tools limits the conclusions that may be drawn from a study by divorcing the investigator from their own data. The net result is a decrease in the quality of and confidence in research findings, fueling recent controversies over the reproducibility of high-profile findings and effects that appear to diminish over time. The recent development of "omics" approaches, leading to the production of massive, higher-dimensional data sets, has amplified these issues, making it clear that new approaches are needed to appropriately and effectively mine this type of data. Unfortunately, statistical education in the field has not kept pace. This commentary provides a foundation for an intuitive understanding of statistics that fosters an exploratory approach and an appreciation for the assumptions of various statistical tests, in the hope of increasing the correct use of statistics, the application of exploratory data analysis, and the use of statistical study design, with the goal of increasing reproducibility and confidence in the literature. Copyright © 2013. Published by Elsevier Inc.

  2. Design of order statistics filters using feedforward neural networks

    NASA Astrophysics Data System (ADS)

    Maslennikova, Yu. S.; Bochkarev, V. V.

    2016-08-01

    In recent years significant progress has been made in the development of nonlinear data processing techniques. Such techniques are widely used in digital data filtering and image enhancement. Many of the most effective nonlinear filters are based on order statistics; the widely used median filter is the best-known order statistic filter. A generalized form of these filters can be constructed based on Lloyd's statistics. Filters based on order statistics have excellent robustness properties in the presence of impulsive noise. In this paper, we present a special approach for the synthesis of order statistics filters using artificial neural networks. Optimal Lloyd's statistics are used for selecting the initial weights of the neural network. The adaptive properties of neural networks provide opportunities to optimize order statistics filters for data with asymmetric distribution functions. Different examples demonstrate the properties and performance of the presented approach.
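
    To make the filter family concrete, here is a minimal R sketch (ours, not the paper's): a sliding-window L-filter whose output is a weighted sum of the sorted samples in each window. The median filter is the special case putting all weight on the middle order statistic; the paper's contribution is to learn such weights with a neural network initialized from Lloyd-optimal values, rather than fixing them by hand as done here.

      # order statistic (L-) filter: weighted sum of sorted window samples
      l_filter <- function(x, weights) {
        k <- length(weights)
        half <- (k - 1) %/% 2
        n <- length(x)
        out <- x
        for (i in (half + 1):(n - half)) {
          out[i] <- sum(sort(x[(i - half):(i + half)]) * weights)
        }
        out
      }

      set.seed(5)
      x <- sin(seq(0, 4 * pi, length.out = 200))
      x[sample(200, 10)] <- 4                        # impulsive noise
      med <- l_filter(x, c(0, 0, 1, 0, 0))           # 5-point median filter
      tri <- l_filter(x, c(0, 0.25, 0.5, 0.25, 0))   # smoother hand-chosen L-filter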

  3. Statistics Report on TEQSA Registered Higher Education Providers

    ERIC Educational Resources Information Center

    Australian Government Tertiary Education Quality and Standards Agency, 2015

    2015-01-01

    This statistics report provides a comprehensive snapshot of national statistics on all parts of the sector for the year 2013, by bringing together data collected directly by TEQSA with data sourced from the main higher education statistics collections managed by the Australian Government Department of Education and Training. The report provides…

  4. Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?

    PubMed

    Zhu, Yeyi; Hernandez, Ladia M; Mueller, Peter; Dong, Yongquan; Forman, Michele R

    2013-01-01

    The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study.

  5. The Empirical Nature and Statistical Treatment of Missing Data

    ERIC Educational Resources Information Center

    Tannenbaum, Christyn E.

    2009-01-01

    Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…

  6. China Report Economic Affairs No. 371 Chinese Statistical Abstract

    DTIC Science & Technology

    1983-08-12

    Third Plenary Session of the 11th Party Central Committee. These statistical data essentially include the major indices of various sectors of the... 1983 is used in accordance with the practice at home and abroad in compiling economic data, although its contents are the statistical data of 1982... JPRS 84111, 12 August 1983, China Report, Economic Affairs No. 371, Chinese Statistical Abstract.

  7. Disparities in Minority Promotion Rates: A Total Quality Approach

    DTIC Science & Technology

    1992-01-01

    UCL = p + 3·sqrt(p(1 − p)/n)... The statistical theory of logistic regression is beyond the scope of this report. Several computer statistical... Statistics. Richard D. Irwin, Inc., Homewood IL: 1986. Feagin, J. R., Discrimination American style: Institutional racism and sexism. Englewood Cliffs... current year data and the previous three years. Data for fiscal years 1987, 1988, 1989, 1990, and... One purpose of this project is to provide a statistical...

  8. [Continuity of hospital identifiers in hospital discharge data - Analysis of the nationwide German DRG Statistics from 2005 to 2013].

    PubMed

    Nimptsch, Ulrike; Wengler, Annelene; Mansky, Thomas

    2016-11-01

    In Germany, nationwide hospital discharge data (DRG statistics provided by the research data centers of the Federal Statistical Office and the Statistical Offices of the 'Länder') are increasingly used as a data source for health services research. Within these data, hospitals can be distinguished via their hospital identifier ([Institutionskennzeichen] IK). However, this hospital identifier primarily designates the invoicing unit and is not necessarily equivalent to one hospital location. Aiming to investigate the direction and extent of possible bias in hospital-level analyses, this study examines the continuity of the hospital identifier within a cross-sectional and longitudinal approach and compares the results to official hospital census statistics. Within the DRG statistics from 2005 to 2013, the annual number of hospitals as classified by hospital identifiers was counted for each year of observation. The annual number of hospitals derived from the DRG statistics was compared to the number of hospitals in the official census statistics 'Grunddaten der Krankenhäuser'. Subsequently, the temporal continuity of hospital identifiers in the DRG statistics was analyzed within cohorts of hospitals. Until 2013, the annual number of hospital identifiers in the DRG statistics fell by 175 (from 1,725 to 1,550). This decline affected only providers with small or medium case volume. The number of hospitals identified in the DRG statistics was lower than the number given in the census statistics (e.g., in 2013, 1,550 IK vs. 1,668 hospitals in the census statistics). The longitudinal analyses revealed that the majority of hospital identifiers persisted over the years of observation, while one fifth of hospital identifiers changed. In cross-sectional studies of German hospital discharge data, separating hospitals via the hospital identifier might lead to underestimating the number of hospitals and a consequent overestimation of caseload per hospital. Discontinuities of hospital identifiers over time might impair the follow-up of hospital cohorts. These limitations must be taken into account in analyses of German hospital discharge data focusing on the hospital level. Copyright © 2016. Published by Elsevier GmbH.

  9. Incorporating an Interactive Statistics Workshop into an Introductory Biology Course-Based Undergraduate Research Experience (CURE) Enhances Students’ Statistical Reasoning and Quantitative Literacy Skills †

    PubMed Central

    Olimpo, Jeffrey T.; Pevey, Ryan S.; McCabe, Thomas M.

    2018-01-01

    Course-based undergraduate research experiences (CUREs) provide an avenue for student participation in authentic scientific opportunities. Within the context of such coursework, students are often expected to collect, analyze, and evaluate data obtained from their own investigations. Yet, limited research has been conducted that examines mechanisms for supporting students in these endeavors. In this article, we discuss the development and evaluation of an interactive statistics workshop that was expressly designed to provide students with an open platform for graduate teaching assistant (GTA)-mentored data processing, statistical testing, and synthesis of their own research findings. Mixed methods analyses of pre/post-intervention survey data indicated a statistically significant increase in students’ reasoning and quantitative literacy abilities in the domain, as well as enhancement of student self-reported confidence in and knowledge of the application of various statistical metrics to real-world contexts. Collectively, these data reify an important role for scaffolded instruction in statistics in preparing emergent scientists to be data-savvy researchers in a globally expansive STEM workforce. PMID:29904549

  10. Incorporating an Interactive Statistics Workshop into an Introductory Biology Course-Based Undergraduate Research Experience (CURE) Enhances Students' Statistical Reasoning and Quantitative Literacy Skills.

    PubMed

    Olimpo, Jeffrey T; Pevey, Ryan S; McCabe, Thomas M

    2018-01-01

    Course-based undergraduate research experiences (CUREs) provide an avenue for student participation in authentic scientific opportunities. Within the context of such coursework, students are often expected to collect, analyze, and evaluate data obtained from their own investigations. Yet, limited research has been conducted that examines mechanisms for supporting students in these endeavors. In this article, we discuss the development and evaluation of an interactive statistics workshop that was expressly designed to provide students with an open platform for graduate teaching assistant (GTA)-mentored data processing, statistical testing, and synthesis of their own research findings. Mixed methods analyses of pre/post-intervention survey data indicated a statistically significant increase in students' reasoning and quantitative literacy abilities in the domain, as well as enhancement of student self-reported confidence in and knowledge of the application of various statistical metrics to real-world contexts. Collectively, these data reify an important role for scaffolded instruction in statistics in preparing emergent scientists to be data-savvy researchers in a globally expansive STEM workforce.

  11. National transportation statistics 1999

    DOT National Transportation Integrated Search

    2000-01-01

    National Transportation Statistics 1999 is a companion document to the Transportation Statistics Annual Report, which analyzes the data presented here. The report has four chapters. Chapter 1 provides data on the extent, condition, use, and performan...

  12. Uterine Cancer Statistics

    MedlinePlus

    ... the most commonly diagnosed gynecologic cancer. U.S. Cancer Statistics Data Visualizations Tool: The Data Visualizations tool makes ...

  13. Testing the statistical compatibility of independent data sets

    NASA Astrophysics Data System (ADS)

    Maltoni, M.; Schwetz, T.

    2003-08-01

    We discuss a goodness-of-fit method which tests the compatibility between statistically independent data sets. The method gives sensible results even in cases where the χ2 minima of the individual data sets are very low or when several parameters are fitted to a large number of data points. In particular, it avoids the problem that a possible disagreement between data sets becomes diluted by data points which are insensitive to the crucial parameters. A formal derivation of the probability distribution function of the proposed test statistic is given, based on standard theorems of statistics. The application of the method is illustrated on data from neutrino oscillation experiments, and its complementarity to the standard goodness-of-fit test is discussed.
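
    The statistic itself is simple to compute once each data set's χ2 profile is available: minimize the summed χ2 jointly, subtract the individual minima, and refer the difference to a χ2 distribution whose degrees of freedom are the total number of parameters fitted across the individual data sets minus the number of joint parameters. A toy R sketch with two Gaussian (quadratic) profiles and one common parameter, our illustration rather than the paper's neutrino application:

      chisq1 <- function(theta) ((theta - 1.0) / 0.2)^2   # data set 1 profile
      chisq2 <- function(theta) ((theta - 1.6) / 0.3)^2   # data set 2 profile

      joint <- optimize(function(th) chisq1(th) + chisq2(th), c(-5, 5))

      # compatibility statistic: joint minimum minus the sum of the
      # individual minima (both individual minima are zero here)
      chi2_pg <- joint$objective - (0 + 0)
      pchisq(chi2_pg, df = 2 - 1, lower.tail = FALSE)     # 2 fitted, 1 joint parameter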

  14. Biostatistics primer: part I.

    PubMed

    Overholser, Brian R; Sowinski, Kevin M

    2007-12-01

    Biostatistics is the application of statistics to biologic data. The field of statistics can be broken down into 2 fundamental parts: descriptive and inferential. Descriptive statistics are commonly used to categorize, display, and summarize data. Inferential statistics can be used to make predictions based on a sample obtained from a population or some large body of information. It is these inferences that are used to test specific research hypotheses. This 2-part review will outline important features of descriptive and inferential statistics as they apply to commonly conducted research studies in the biomedical literature. Part 1 in this issue will discuss fundamental topics of statistics and data analysis. Additionally, some of the most commonly used statistical tests found in the biomedical literature will be reviewed in Part 2 in the February 2008 issue.
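
    The descriptive/inferential split in this primer maps directly onto a two-step analysis. A base-R example on the built-in sleep data set (our choice of data, purely for illustration):

      data(sleep)   # extra sleep under two drugs, 10 subjects measured twice
      # descriptive statistics: categorize, display, and summarize the sample
      tapply(sleep$extra, sleep$group, mean)
      tapply(sleep$extra, sleep$group, sd)
      # inferential statistics: test a hypothesis about the underlying population
      t.test(sleep$extra[sleep$group == 1],
             sleep$extra[sleep$group == 2], paired = TRUE)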

  15. Analyzing longitudinal data with the linear mixed models procedure in SPSS.

    PubMed

    West, Brady T

    2009-09-01

    Many applied researchers analyzing longitudinal data share a common misconception: that specialized statistical software is necessary to fit hierarchical linear models (also known as linear mixed models [LMMs], or multilevel models) to longitudinal data sets. Although several specialized statistical software programs of high quality are available that allow researchers to fit these models to longitudinal data sets (e.g., HLM), rapid advances in general purpose statistical software packages have recently enabled analysts to fit these same models when using preferred packages that also enable other more common analyses. One of these general purpose statistical packages is SPSS, which includes a very flexible and powerful procedure for fitting LMMs to longitudinal data sets with continuous outcomes. This article aims to present readers with a practical discussion of how to analyze longitudinal data using the LMMs procedure in the SPSS statistical software package.
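
    The article's point is that a general-purpose package suffices for LMMs. Since the examples in this collection use R rather than SPSS, here is a hedged R analogue (via the lme4 package, an assumption on our part) of the kind of random-intercept-and-slope growth model the SPSS MIXED procedure fits, on simulated longitudinal data:

      library(lme4)
      set.seed(8)
      # longitudinal toy data: 50 subjects, 4 visits each
      d <- data.frame(id = factor(rep(1:50, each = 4)), time = rep(0:3, 50))
      b0 <- rnorm(50, 10, 2); b1 <- rnorm(50, 1.5, 0.5)   # subject-specific effects
      d$y <- b0[d$id] + b1[d$id] * d$time + rnorm(200, 0, 1)

      # random intercept and slope per subject; SPSS's MIXED fits the same model
      m <- lmer(y ~ time + (1 + time | id), data = d)
      summary(m)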

  16. The Health of Children--1970: Selected Data From the National Center for Health Statistics.

    ERIC Educational Resources Information Center

    National Center for Health Statistics (DHEW/PHS), Hyattsville, MD.

    In this booklet, charts and graphs present data from four divisions of the National Center for Health Statistics. The divisions represented are those concerned with vital statistics (births, deaths, fetal deaths, marriages and divorces); health interview statistics (information on health and demographic factors related to illness); health…

  17. Exploring Factors Related to Completion of an Online Undergraduate-Level Introductory Statistics Course

    ERIC Educational Resources Information Center

    Zimmerman, Whitney Alicia; Johnson, Glenn

    2017-01-01

    Data were collected from 353 online undergraduate introductory statistics students at the beginning of a semester using the Goals and Outcomes Associated with Learning Statistics (GOALS) instrument and an abbreviated form of the Statistics Anxiety Rating Scale (STARS). Data included a survey of expected grade, expected time commitment, and the…

  18. Statistical Machine Learning for Structured and High Dimensional Data

    DTIC Science & Technology

    2014-09-17

    AFRL-OSR-VA-TR-2014-0234, Statistical Machine Learning for Structured and High Dimensional Data, Larry Wasserman, Carnegie Mellon University. Final report, Dec 2009 - Aug 2014. ... the area of resource-constrained statistical estimation. Keywords: machine learning, high-dimensional statistics. John Lafferty, 773-702-3813. Research under

  19. Notes on numerical reliability of several statistical analysis programs

    USGS Publications Warehouse

    Landwehr, J.M.; Tasker, Gary D.

    1999-01-01

    This report presents a benchmark analysis of several statistical analysis programs currently in use in the USGS. The benchmark consists of a comparison between the values provided by a statistical analysis program for variables in the reference data set ANASTY and their known or calculated theoretical values. The ANASTY data set is an amendment of the Wilkinson NASTY data set that has been used in the statistical literature to assess the reliability (computational correctness) of calculated analytical results.

  20. Statistics for characterizing data on the periphery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Theiler, James P; Hush, Donald R

    2010-01-01

    We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
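
    A minimal R sketch of the ellipsoidal baseline these peripheral statistics improve upon (the baseline only; the authors' alternatives to the sample covariance are not reproduced here): score each sample by its Mahalanobis distance under the fitted mean and covariance, and flag the points far out on the periphery.

      set.seed(9)
      n <- 500; p <- 5
      X <- matrix(rnorm(n * p), n, p)
      X[1, ] <- X[1, ] + 4                       # one anomalous sample

      # ellipsoidal model: Mahalanobis distance under the sample covariance
      mu <- colMeans(X)
      S  <- cov(X)
      d2 <- mahalanobis(X, mu, S)
      which(d2 > qchisq(0.999, df = p))          # flag peripheral points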

  1. Fetal Alcohol Spectrum Disorders (FASDs): Data and Statistics

    MedlinePlus

    ... alcohol screening and counseling for all women. Prevalence of ... conducted annually by the National Center for Health Statistics (NCHS), CDC, to produce national estimates for a ...

  2. Prison Radicalization: The New Extremist Training Grounds?

    DTIC Science & Technology

    2007-09-01

    distributing and collecting survey data, and the data analysis. The analytical methodology includes descriptive and inferential statistical methods, in... statistical analysis of the responses to identify significant correlations and relationships. B. SURVEY DATA COLLECTION. To effectively access a... Q18, Q19, Q20, and Q21. Due to the exploratory nature of this small survey, data analyses were confined mostly to descriptive statistics and

  3. Data Comparability and Public Policy: New Interest in Public Library Data. Papers Presented at Meetings of the American Statistical Association. Working Paper Series.

    ERIC Educational Resources Information Center

    National Center for Education Statistics (ED), Washington, DC.

    The four papers contained in this volume were presented at the August 1994 meetings of the American Statistical Association as a session titled, "Public Policy and Data Comparability: New Interest in Public Library Data." The first paper, "Public Library Statistics: Two Systems Compared" (Mary Jo Lynch), describes two systems…

  4. A Commercial IOTV Cleaning Study

    DTIC Science & Technology

    2010-04-12

    manufacturer's list price without taking possible volume discounts into consideration. Equipment depreciation cost was calculated based on... Laundering with Prewash Spot Cleaning), 32; Table 12: Shrinkage Statistical Data (Traditional Wet Laundering without Prewash Spot Cleaning)... Statistical Data (Computer-controlled Wet Cleaning without Prewash Spot Cleaning), 35; Table 15: Shrinkage Statistical Data (Liquid CO2 Cleaning

  5. Financial Statistics. Higher Education General Information Survey (HEGIS) [machine-readable data file].

    ERIC Educational Resources Information Center

    Center for Education Statistics (ED/OERI), Washington, DC.

    The Financial Statistics machine-readable data file (MRDF) is a subfile of the larger Higher Education General Information Survey (HEGIS). It contains basic financial statistics for over 3,000 institutions of higher education in the United States and its territories. The data are arranged sequentially by institution, with institutional…

  6. Educational Statistics for Selected Health Occupations.

    ERIC Educational Resources Information Center

    Johnson, Donald W.; Holz, Frank M.

    Detailed statistics on education are provided for a number of health occupations. Data are given as far back as 1950-1951 for medical and dental schools, while for schools of public health, the data begin in 1975-1976. Complete 1980 data are provided only for dentistry, pharmacy, and veterinary medicine. Statistical tables are included on the…

  7. Statistical Literacy: Data Tell a Story

    ERIC Educational Resources Information Center

    Sole, Marla A.

    2016-01-01

    Every day, students collect, organize, and analyze data to make decisions. In this data-driven world, people need to assess how much trust they can place in summary statistics. The results of every survey and the safety of every drug that undergoes a clinical trial depend on the correct application of appropriate statistics. Recognizing the…

  8. 42 CFR 417.806 - Financial records, statistical data, and cost finding.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 3 2011-10-01 2011-10-01 false Financial records, statistical data, and cost finding. 417.806 Section 417.806 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF..., statistical data, and cost finding. (a) The principles specified in § 417.568 apply to HCPPs, except those in...

  9. 76 FR 61386 - Data Users Advisory Committee; Notice of Meeting and Agenda

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-10-04

    ... DEPARTMENT OF LABOR Bureau of Labor Statistics Data Users Advisory Committee; Notice of Meeting and Agenda The Bureau of Labor Statistics Data Users Advisory Committee will meet on Tuesday, October...., Washington, DC. The Committee provides advice to the Bureau of Labor Statistics from the points of view of...

  10. 78 FR 64023 - Data Users Advisory Committee; Notice of Meeting and Agenda

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-10-25

    ... DEPARTMENT OF LABOR Bureau of Labor Statistics Data Users Advisory Committee; Notice of Meeting and Agenda The Bureau of Labor Statistics Data Users Advisory Committee will meet on Tuesday, November...., Washington, DC. The Committee provides advice to the Bureau of Labor Statistics from the points of view of...

  11. Using Data from Climate Science to Teach Introductory Statistics

    ERIC Educational Resources Information Center

    Witt, Gary

    2013-01-01

    This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…

  12. Use of Data Visualisation in the Teaching of Statistics: A New Zealand Perspective

    ERIC Educational Resources Information Center

    Forbes, Sharleen; Chapman, Jeanette; Harraway, John; Stirling, Doug; Wild, Chris

    2014-01-01

    For many years, students have been taught to visualise data by drawing graphs. Recently, there has been a growing trend to teach statistics, particularly statistical concepts, using interactive and dynamic visualisation tools. Free down-loadable teaching and simulation software designed specifically for schools, and more general data visualisation…

  13. 42 CFR 417.806 - Financial records, statistical data, and cost finding.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 3 2010-10-01 2010-10-01 false Financial records, statistical data, and cost finding. 417.806 Section 417.806 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF..., statistical data, and cost finding. (a) The principles specified in § 417.568 apply to HCPPs, except those in...

  14. 77 FR 59991 - Data Users Advisory Committee; Notice of Meeting and Agenda

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-01

    ... DEPARTMENT OF LABOR Bureau of Labor Statistics Data Users Advisory Committee; Notice of Meeting and Agenda The Bureau of Labor Statistics Data Users Advisory Committee will meet on Thursday November...., Washington, DC. The Committee provides advice to the Bureau of Labor Statistics from the points of view of...

  15. A log-Weibull spatial scan statistic for time to event data.

    PubMed

    Usman, Iram; Rosychuk, Rhonda J

    2018-06-13

    Spatial scan statistics have been used for the identification of geographic clusters of elevated numbers of cases of a condition, such as disease outbreaks. Accompanied by an appropriate distribution, these statistics can also identify geographic areas with either longer or shorter times to events. Other authors have proposed spatial scan statistics based on the exponential and Weibull distributions. We propose the log-Weibull as an alternative distribution for the spatial scan statistic for time-to-event data and compare and contrast the log-Weibull and Weibull distributions through simulation studies. The effects of type I differential censoring and power were investigated through simulated data. The methods are also illustrated on time-to-specialist-visit data for discharged patients presenting to emergency departments for atrial fibrillation and flutter in Alberta during 2010-2011. We found that northern regions of Alberta had longer times to specialist visit than other areas. We propose the spatial scan statistic for the log-Weibull distribution as a new approach for detecting spatial clusters in time-to-event data. The simulation studies suggest that the test performs well for log-Weibull data.
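
    Setting the scan itself aside, the distributional comparison at the heart of the proposal can be sketched in a few lines of R with the survival package: fit the same censored times under a Weibull model and under a Gumbel model (often called the log-Weibull), then compare log-likelihoods. The data below are simulated for illustration, and the use of survreg's "extreme" distribution for the Gumbel is our assumption.

      library(survival)
      set.seed(10)
      t_event <- rweibull(300, shape = 1.5, scale = 30)   # time to specialist visit
      cens    <- runif(300, 20, 90)                       # administrative censoring
      time    <- pmin(t_event, cens)
      status  <- as.numeric(t_event <= cens)              # 1 = event observed

      # compare candidate event-time distributions by log-likelihood
      fw <- survreg(Surv(time, status) ~ 1, dist = "weibull")
      fg <- survreg(Surv(time, status) ~ 1, dist = "extreme")  # Gumbel / log-Weibull
      c(weibull = logLik(fw), log_weibull = logLik(fg))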

  16. How Statisticians Speak Risk

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Redus, K.S.

    2007-07-01

    The foundation of statistics deals with (a) how to measure and collect data and (b) how to identify models using estimates of statistical parameters derived from the data. Risk is a term used by the statistical community and those that employ statistics to express the results of a statistically based study. Statistical risk is represented as a probability that, for example, a statistical model is sufficient to describe a data set; but risk is also interpreted as a measure of worth of one alternative when compared to another. The common thread of any risk-based problem is the combination of (a) the chance an event will occur, with (b) the value of the event. This paper presents an introduction to, and some examples of, statistical risk-based decision making from a quantitative, visual, and linguistic sense. This should help in understanding areas of radioactive waste management that can be suitably expressed using statistical risk and vice-versa. (authors)

  17. Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

    NASA Astrophysics Data System (ADS)

    Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

    2017-12-01

    Hyperparameterization of statistical models, i.e., automated model scoring and selection via evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There, Elm uses the NSGA-2 multiobjective optimization algorithm to optimize statistical preprocessing of forcing data and improve goodness-of-fit for statistical models (i.e., feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used to automate the selection of soil moisture forecast statistical models for North America.

  18. Descriptive statistics: the specification of statistical measures and their presentation in tables and graphs. Part 7 of a series on evaluation of scientific publications.

    PubMed

    Spriestersbach, Albert; Röhrig, Bernd; du Prel, Jean-Baptist; Gerhold-Ay, Aslihan; Blettner, Maria

    2009-09-01

    Descriptive statistics are an essential part of biometric analysis and a prerequisite for the understanding of further statistical evaluations, including the drawing of inferences. When data are well presented, it is usually obvious whether the author has collected and evaluated them correctly and in keeping with accepted practice in the field. Statistical variables in medicine may be of either the metric (continuous, quantitative) or categorical (nominal, ordinal) type. Easily understandable examples are given. Basic techniques for the statistical description of collected data are presented and illustrated with examples. The goal of a scientific study must always be clearly defined. The definition of the target value or clinical endpoint determines the level of measurement of the variables in question. Nearly all variables, whatever their level of measurement, can be usefully presented graphically and numerically. The level of measurement determines what types of diagrams and statistical values are appropriate. There are also different ways of presenting combinations of two independent variables graphically and numerically. The description of collected data is indispensable. If the data are of good quality, valid and important conclusions can already be drawn when they are properly described. Furthermore, data description provides a basis for inferential statistics.

  19. Clinical study of the Erlanger silver catheter--data management and biometry.

    PubMed

    Martus, P; Geis, C; Lugauer, S; Böswald, M; Guggenbichler, J P

    1999-01-01

    The clinical evaluation of venous catheters for catheter-induced infections must conform to a strict biometric methodology. The statistical planning of the study (target population, design, degree of blinding), data management (database design, definition of variables, coding), quality assurance (data inspection at several levels), and the biometric evaluation of the Erlanger silver catheter project are described. The three-step data flow included: 1) primary data from the hospital, 2) a relational database, 3) files accessible for statistical evaluation. Two different statistical models were compared: including only the first catheter of each patient in the analysis (independent data), and analyzing several catheters from the same patient (dependent data) by means of the generalized estimating equations (GEE) method. The main result of the study was based on the comparison of both statistical models.

  20. Generating an Empirical Probability Distribution for the Andrews-Pregibon Statistic.

    ERIC Educational Resources Information Center

    Jarrell, Michele G.

    A probability distribution was developed for the Andrews-Pregibon (AP) statistic. The statistic, developed by D. F. Andrews and D. Pregibon (1978), identifies multivariate outliers. It is a ratio of the determinant of the data matrix with an observation deleted to the determinant of the entire data matrix. Although the AP statistic has been used…

  1. Progress of Education in the Asian Region. Statistical Supplement.

    ERIC Educational Resources Information Center

    United Nations Educational, Scientific, and Cultural Organization, Bangkok (Thailand). Regional Office for Education in Asia and Oceania.

    This work is a supplement to an earlier work entitled, "Progress of Education in the Asian Region: a Statistic Review", (ED 035 490) which contained statistical data up to 1967. This supplement presents statistical data up to 1969 for regional aggregates and up to 1970 for individual countries in some cases. As in the Review, the…

  2. Informal Statistics Help Desk

    NASA Technical Reports Server (NTRS)

    Young, M.; Koslovsky, M.; Schaefer, Caroline M.; Feiveson, A. H.

    2017-01-01

    Back by popular demand, the JSC Biostatistics Laboratory and LSAH statisticians are offering an opportunity to discuss your statistical challenges and needs. Take the opportunity to meet the individuals offering expert statistical support to the JSC community. Join us for an informal conversation about any questions you may have encountered with issues of experimental design, analysis, or data visualization. Get answers to common questions about sample size, repeated measures, statistical assumptions, missing data, multiple testing, time-to-event data, and when to trust the results of your analyses.

  3. Randomization Procedures Applied to Analysis of Ballistic Data

    DTIC Science & Technology

    1991-06-01

    Keywords: data analysis; computationally intensive statistics; randomization tests; permutation tests; nonparametric statistics. ...be 0.13. Any reasonable statistical procedure would fail to support the notion of improvement of dynamic over standard indexing based on this data. AD-A238 389, Technical Report BRL-TR-3245, BRL, Randomization Procedures Applied to Analysis of Ballistic Data, Malcolm S. Taylor, Barry A. Bodt, June

  4. Cluster detection methods applied to the Upper Cape Cod cancer data.

    PubMed

    Ozonoff, Al; Webster, Thomas; Vieira, Veronica; Weinberg, Janice; Ozonoff, David; Aschengrau, Ann

    2005-09-15

    A variety of statistical methods have been suggested to assess the degree and/or the location of spatial clustering of disease cases. However, there is relatively little in the literature devoted to comparison and critique of different methods. Most of the available comparative studies rely on simulated data rather than real data sets. We have chosen three methods currently used for examining spatial disease patterns: the M-statistic of Bonetti and Pagano; the Generalized Additive Model (GAM) method as applied by Webster; and Kulldorff's spatial scan statistic. We apply these statistics to analyze breast cancer data from the Upper Cape Cancer Incidence Study using three different latency assumptions. The three different latency assumptions produced three different spatial patterns of cases and controls. For 20 year latency, all three methods generally concur. However, for 15 year latency and no latency assumptions, the methods produce different results when testing for global clustering. The comparative analyses of real data sets by different statistical methods provides insight into directions for further research. We suggest a research program designed around examining real data sets to guide focused investigation of relevant features using simulated data, for the purpose of understanding how to interpret statistical methods applied to epidemiological data with a spatial component.
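
    Of the three methods compared, the spatial scan statistic is the easiest to miniaturize. The R sketch below is a from-scratch Bernoulli scan over circular windows with a Monte Carlo p-value, written purely for illustration on simulated points; it is not SaTScan, and the GAM and M-statistic approaches are not reproduced.

      set.seed(11)
      n <- 200
      xy <- matrix(runif(2 * n), n, 2)                     # case/control locations
      inside <- sqrt((xy[, 1] - 0.7)^2 + (xy[, 2] - 0.7)^2) < 0.15
      case <- rbinom(n, 1, ifelse(inside, 0.5, 0.2))       # elevated risk in one disc
      D <- as.matrix(dist(xy))

      xlogy <- function(x, y) ifelse(x > 0, x * log(y), 0) # safe 0*log(0) = 0

      # Bernoulli log-likelihood ratio of a window with c_in cases among n_in points
      llr <- function(c_in, n_in, C, N) {
        p <- c_in / n_in; q <- (C - c_in) / (N - n_in)
        stat <- xlogy(c_in, p) + xlogy(n_in - c_in, 1 - p) +
          xlogy(C - c_in, q) + xlogy(N - n_in - (C - c_in), 1 - q) -
          (xlogy(C, C / N) + xlogy(N - C, 1 - C / N))
        ifelse(p > q, stat, 0)                             # high-rate windows only
      }

      # scan circular windows: each point's k nearest neighbours, k <= N/2
      scan_stat <- function(case) {
        C <- sum(case); N <- length(case)
        ks <- 2:(N %/% 2); best <- 0
        for (i in 1:N) {
          cs <- cumsum(case[order(D[i, ])])                # cases by distance from i
          best <- max(best, llr(cs[ks], ks, C, N))
        }
        best
      }

      obs <- scan_stat(case)
      null <- replicate(99, scan_stat(sample(case)))       # permutation null
      mean(c(obs, null) >= obs)                            # Monte Carlo p-value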

  5. Compilation and Analysis of 20 and 30 GHz Rain Fade Events at the ACTS NASA Ground Station: Statistics and Model Assessment

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1996-01-01

    The purpose of the propagation studies within the ACTS Project Office is to acquire 20 and 30 GHz rain fade statistics using the ACTS beacon links received at the NGS (NASA Ground Station) in Cleveland. Other than the raw, statistically unprocessed rain fade events that occur in real time, relevant rain fade statistics derived from such events are the cumulative rain fade statistics as well as fade duration statistics (beyond given fade thresholds) over monthly and yearly time intervals. Concurrent with the data logging exercise, monthly maximum rainfall levels recorded at the US Weather Service at Hopkins Airport are appended to the database to facilitate comparison of observed fade statistics with those predicted by the ACTS Rain Attenuation Model. Also, the raw fade data will be in a format, complete with documentation, for use by other investigators who require realistic fade event evolution in time for simulation purposes or further analysis for comparisons with other rain fade prediction models, etc. The raw time series data from the 20 and 30 GHz beacon signals is purged of non relevant data intervals where no rain fading has occurred. All other data intervals which contain rain fade events are archived with the accompanying time stamps. The definition of just what constitutes a rain fade event will be discussed later. The archived data serves two purposes. First, all rain fade event data is recombined into a contiguous data series every month and every year; this will represent an uninterrupted record of the actual (i.e., not statistically processed) temporal evolution of rain fade at 20 and 30 GHz at the location of the NGS. The second purpose of the data in such a format is to enable a statistical analysis of prevailing propagation parameters such as cumulative distributions of attenuation on a monthly and yearly basis as well as fade duration probabilities below given fade thresholds, also on a monthly and yearly basis. In addition, various subsidiary statistics such as attenuation rate probabilities are derived. The purged raw rain fade data as well as the results of the analyzed data will be made available for use by parties in the private sector upon their request. The process which will be followed in this dissemination is outlined in this paper.

  6. Health Resources Statistics; Health Manpower and Health Facilities, 1968. Public Health Service Publication No. 1509.

    ERIC Educational Resources Information Center

    National Center for Health Statistics (DHEW/PHS), Hyattsville, MD.

    This report is a part of the program of the National Center for Health Statistics to provide current statistics as baseline data for the evaluation, planning, and administration of health programs. Part I presents data concerning the occupational fields: (1) administration, (2) anthropology and sociology, (3) data processing, (4) basic sciences,…

  7. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2013-09-30

    published 3-D multi-beam data. The Niwa and Anderson models were compared with 3-D multi-beam data collected by Paramo and Gerlotto. The data were... [submitted, refereed] Bhatia, S., T.K. Stanton, J. Paramo, and F. Gerlotto (under revision), “Modeling statistics of fish school dimensions using 3-D

  8. Modeling Statistics of Fish Patchiness and Predicting Associated Influence on Statistics of Acoustic Echoes

    DTIC Science & Technology

    2012-09-30

    data collected by Paramo and Gerlotto. The data were consistent with the Anderson model in that both the data and model had a mode in the... 10.1098/rsfs.2012.0027 [published, refereed] Bhatia, S., T.K. Stanton, J. Paramo, and F. Gerlotto (submitted), “Modeling statistics of fish school

  9. 3 CFR - Enhanced Collection of Relevant Data and Statistics Relating to Women

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 3 The President 1 2012-01-01 2012-01-01 false Enhanced Collection of Relevant Data and Statistics Relating to Women Presidential Documents Other Presidential Documents Memorandum of March 4, 2011 Enhanced Collection of Relevant Data and Statistics Relating to Women Memorandum for the Heads of Executive Departments and Agencies I am proud to work...

  10. Crash Lethality Model

    DTIC Science & Technology

    2012-06-06

    Statistical Data, 45; 31: Parametric Model for Rotor Wing Debris Area, 46; 32: Skid Distance Statistical Data... results. The curve that related the BC value to the probability of skull fracture resulted in a tight confidence interval and a two-tailed statistical p

  11. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

    PubMed

    Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

    2017-03-01

    High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

  12. China Macroeconomic: Documentation Data Banks. Volume 3. Foreign Trade.

    DTIC Science & Technology

    1984-11-15

    1983 data from Chinese "Customs Statistics," Economic Information and Agency, Hong Kong; previous years from CIA "International Trade Statistics". Load from... 204,097 145,564 515,062; 1984 N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A; Sources, see below... 1983 data from Chinese "Customs Statistics," Economic Information and Agency, Hong Kong; previous years from CIA "International Trade Statistics". Load from row 12 to row 54.

  13. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    PubMed

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure the statistical power to identify these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g., a few hundreds or thousands). Meanwhile, summary statistics generated by single-variant-based analyses are becoming publicly available, and the sample sizes associated with these summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing the statistical power of identifying risk variants and improving the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods which take either individual-level data or summary statistics as input. We applied IGESS to perform an integrative analysis of Crohn's disease data from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and to improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240,000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . Contact: zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  14. Statistics on continuous IBD data: Exact distribution evaluation for a pair of full(half)-sibs and a pair of a (great-) grandchild with a (great-) grandparent

    PubMed Central

    Stefanov, Valeri T

    2002-01-01

    Background Pairs of related individuals are widely used in linkage analysis. Most of the tests for linkage analysis are based on statistics associated with identity by descent (IBD) data. The current biotechnology provides data on very densely packed loci, and therefore, it may provide almost continuous IBD data for pairs of closely related individuals. Therefore, the distribution theory for statistics on continuous IBD data is of interest. In particular, distributional results which allow the evaluation of p-values for relevant tests are of importance. Results A technology is provided for numerical evaluation, with any given accuracy, of the cumulative probabilities of some statistics on continuous genome data for pairs of closely related individuals. In the case of a pair of full-sibs, the following statistics are considered: (i) the proportion of genome with 2 (at least 1) haplotypes shared identical-by-descent (IBD) on a chromosomal segment, (ii) the number of distinct pieces (subsegments) of a chromosomal segment, on each of which exactly 2 (at least 1) haplotypes are shared IBD. The natural counterparts of these statistics for the other relationships are also considered. Relevant Maple codes are provided for a rapid evaluation of the cumulative probabilities of such statistics. The genomic continuum model, with Haldane's model for the crossover process, is assumed. Conclusions A technology, together with relevant software codes for its automated implementation, are provided for exact evaluation of the distributions of relevant statistics associated with continuous genome data on closely related individuals. PMID:11996673

  15. Statistical Techniques to Analyze Pesticide Data Program Food Residue Observations.

    PubMed

    Szarka, Arpad Z; Hayworth, Carol G; Ramanarayanan, Tharacad S; Joseph, Robert S I

    2018-06-26

    The U.S. EPA conducts dietary-risk assessments to ensure that levels of pesticides on food in the U.S. food supply are safe. Often these assessments utilize conservative residue estimates, maximum residue levels (MRLs), and a high-end estimate derived from registrant-generated field-trial data sets. A more realistic estimate of consumers' pesticide exposure from food may be obtained by utilizing residues from food-monitoring programs, such as the Pesticide Data Program (PDP) of the U.S. Department of Agriculture. A substantial portion of food-residue concentrations in PDP monitoring programs are below the limits of detection (left-censored), which makes the comparison of regulatory-field-trial and PDP residue levels difficult. In this paper, we present a novel adaptation of established statistical techniques, the Kaplan-Meier estimator (K-M), robust regression on order statistics (ROS), and the maximum-likelihood estimator (MLE), to quantify pesticide-residue concentrations in the presence of heavily censored data sets. The examined statistical approaches include the most commonly used parametric and nonparametric methods for handling left-censored data that have been used in the fields of medical and environmental sciences. This work presents a case study in which data of thiamethoxam residue on bell pepper generated from registrant field trials were compared with PDP-monitoring residue values. The results from the statistical techniques were evaluated and compared with commonly used simple substitution methods for the determination of summary statistics. It was found that the maximum-likelihood estimator (MLE) is the most appropriate statistical method to analyze this residue data set. Using the MLE technique, the data analyses showed that the median and mean PDP bell pepper residue levels were approximately 19 and 7 times lower, respectively, than the corresponding statistics of the field-trial residues.
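    Since the abstract identifies the MLE as the preferred estimator for this left-censored residue data set, a minimal sketch of that idea may help: for a lognormal model, detects contribute density terms to the likelihood and nondetects contribute the probability of falling below the detection limit. This is an illustration on simulated data, not the authors' code; the variable names and the single shared detection limit are assumptions.

```python
import numpy as np
from scipy import stats, optimize

# Simulated residue data: detects carry a measured value; nondetects are
# only known to lie below the (assumed single, shared) detection limit.
rng = np.random.default_rng(0)
true_conc = rng.lognormal(mean=-1.0, sigma=0.8, size=200)
lod = 0.2
detected = true_conc >= lod
obs = true_conc[detected]
n_cens = int((~detected).sum())

def neg_log_lik(params):
    """Censored lognormal likelihood: detects contribute the density,
    nondetects contribute P(X < lod)."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    ll_detects = np.sum(stats.norm.logpdf(np.log(obs), mu, sigma) - np.log(obs))
    ll_censored = n_cens * stats.norm.logcdf((np.log(lod) - mu) / sigma)
    return -(ll_detects + ll_censored)

fit = optimize.minimize(neg_log_lik, x0=[0.0, 1.0], method="Nelder-Mead")
mu_hat, sigma_hat = fit.x
print(f"median = {np.exp(mu_hat):.3f}, "
      f"mean = {np.exp(mu_hat + sigma_hat**2 / 2):.3f}")
```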

  16. Searching for hidden unexpected features in the SnIa data

    NASA Astrophysics Data System (ADS)

    Shafieloo, A.; Perivolaropoulos, L.

    2010-06-01

    It is known that the χ² statistic and likelihood analysis may not be sensitive to all features of the data. Although the χ² statistic measures the overall goodness of fit of a model confronted with a data set, some specific features of the data can remain undetected. For instance, it has been pointed out that there is an unexpected brightness of the SnIa data at z > 1 in the Union compilation. We quantify this statement by constructing a new statistic, called the Binned Normalized Difference (BND) statistic, which is applicable directly to the Type Ia Supernova (SnIa) distance moduli. This statistic is designed to pick up systematic brightness trends of SnIa data points with respect to a best-fit cosmological model at high redshifts. According to this statistic, the consistency between the spatially flat ΛCDM model and the Gold06, Union08, and Constitution09 data is 2.2%, 5.3%, and 12.6%, respectively, when the real data are compared with many realizations of simulated Monte Carlo datasets. The corresponding realization probability in the context of a (w0,w1) = (-1.4,2) model is more than 30% for all of the mentioned datasets, indicating a much better consistency for this model with respect to the BND statistic. The unexpected high-z brightness of SnIa can be interpreted as a trend towards more deceleration at high z than expected in the context of ΛCDM, as a statistical fluctuation, or as a systematic effect, perhaps due to mild SnIa evolution at high z.
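    The published definition of the BND statistic is not reproduced in this record, so the following is only a schematic reconstruction of the general approach the abstract describes: bin the distance-modulus residuals around a best-fit model, normalize each bin's mean by a rough standard error, and calibrate the observed value against Monte Carlo realizations generated under the model. All names and thresholds are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def bnd_like_statistic(residuals, errors, n_bins=10):
    """Sum over bins of |mean residual| in units of the bin's rough
    standard error; large values flag systematic brightness trends."""
    bins = np.array_split(np.arange(residuals.size), n_bins)
    z = [residuals[b].mean() / (errors[b].mean() / np.sqrt(b.size)) for b in bins]
    return np.sum(np.abs(z))

# Stand-in for SnIa distance-modulus residuals around a best-fit model,
# ordered by redshift, with their reported errors.
errors = rng.uniform(0.1, 0.3, size=300)
residuals = rng.normal(0.0, errors)

stat_obs = bnd_like_statistic(residuals, errors)
# Monte Carlo realizations under the best-fit model give the null distribution.
null = np.array([bnd_like_statistic(rng.normal(0.0, errors), errors)
                 for _ in range(2000)])
consistency = np.mean(null >= stat_obs)
print(f"consistency probability = {consistency:.3f}")
```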

  17. SHARE: system design and case studies for statistical health information release

    PubMed Central

    Gardner, James; Xiong, Li; Xiao, Yonghui; Gao, Jingjing; Post, Andrew R; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2013-01-01

    Objectives We present SHARE, a new system for statistical health information release with differential privacy. We present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework on biomedical data. Materials and Methods SHARE releases statistical information from electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the Surveillance, Epidemiology, and End Results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE. Results Experimental results indicate that SHARE can handle the heterogeneous data types present in medical data, and that the released statistics are useful. The Kullback–Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 and 0.01 for seven-dimensional and three-dimensional data cubes generated from the SEER dataset, respectively. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying statistical data release using the differential privacy framework for higher dimensional data. Conclusions SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses. PMID:23059729
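    SHARE's released histograms rest on the differential-privacy framework; the basic building block is the Laplace mechanism, sketched below for a one-dimensional histogram. Adding or removing one record changes exactly one count by one (sensitivity 1), so Laplace noise of scale 1/ε yields ε-differential privacy. This is the textbook mechanism, not SHARE's more elaborate multidimensional methods; the data and parameters are invented.

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_histogram(values, bins, epsilon):
    """Release a histogram with epsilon-differential privacy via the
    Laplace mechanism: each count gets Laplace(1/epsilon) noise, since
    one record changes exactly one count by one (sensitivity 1)."""
    counts, edges = np.histogram(values, bins=bins)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.size)
    return np.clip(np.round(noisy), 0, None).astype(int), edges

ages = rng.integers(20, 90, size=5000)  # invented patient ages
released, edges = dp_histogram(ages, bins=7, epsilon=0.5)
print(released)
```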

  18. An Exercise in Exploring Big Data for Producing Reliable Statistical Information.

    PubMed

    Rey-Del-Castillo, Pilar; Cardeñosa, Jesús

    2016-06-01

    The availability of copious data about many human, social, and economic phenomena is considered an opportunity for the production of official statistics. National statistical organizations and other institutions are more and more involved in new projects for developing what is sometimes seen as a possible change of paradigm in the way statistical figures are produced. Nevertheless, there are hardly any systems in production using Big Data sources. Arguments of confidentiality, data ownership, representativeness, and others make it a difficult task to get results in the short term. Using Call Detail Records from Ivory Coast as an illustration, this article shows some of the issues that must be dealt with when producing statistical indicators from Big Data sources. A proposal of a graphical method to evaluate one specific aspect of the quality of the computed figures is also presented, demonstrating that the visual insight provided improves the results obtained using other traditional procedures.

  19. Content-Related Issues Pertaining to Teaching Statistics: Making Decisions about Educational Objectives in Statistics Courses.

    ERIC Educational Resources Information Center

    Bliss, Leonard B.; Tashakkori, Abbas

    This paper discusses the objectives that would be appropriate for statistics classes for students who are not majoring in statistics, evaluation, or quantitative research design. These "non-majors" should be able to choose appropriate analytical methods for specific sets of data based on the research question and the nature of the data, and they…

  20. Using Real-Life Data When Teaching Statistics: Student Perceptions of this Strategy in an Introductory Statistics Course

    ERIC Educational Resources Information Center

    Neumann, David L.; Hood, Michelle; Neumann, Michelle M.

    2013-01-01

    Many teachers of statistics recommend using real-life data during class lessons. However, there has been little systematic study of what effect this teaching method has on student engagement and learning. The present study examined this question in a first-year university statistics course. Students (n = 38) were interviewed and their reflections…

  1. Recent statistical methods for orientation data

    NASA Technical Reports Server (NTRS)

    Batschelet, E.

    1972-01-01

    The application of statistical methods to problems of animal orientation and navigation is discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations, and tables of data are developed to show the value of the information obtained by statistical analysis.

  2. Developing Conceptual Understanding in a Statistics Course: Merrill's First Principles and Real Data at Work

    ERIC Educational Resources Information Center

    Tu, Wendy; Snyder, Martha M.

    2017-01-01

    Difficulties in learning statistics primarily at the college-level led to a reform movement in statistics education in the early 1990s. Although much work has been done, effective learning designs that facilitate active learning, conceptual understanding of statistics, and the use of real-data in the classroom are needed. Guided by Merrill's First…

  3. OPR-PPR, a Computer Program for Assessing Data Importance to Model Predictions Using Linear Statistics

    USGS Publications Warehouse

    Tonkin, Matthew J.; Tiedeman, Claire; Ely, D. Matthew; Hill, Mary C.

    2007-01-01

    The OPR-PPR program calculates the Observation-Prediction (OPR) and Parameter-Prediction (PPR) statistics that can be used to evaluate the relative importance of various kinds of data to simulated predictions. The data considered fall into three categories: (1) existing observations, (2) potential observations, and (3) potential information about parameters. The first two are addressed by the OPR statistic; the third is addressed by the PPR statistic. The statistics are based on linear theory and measure the leverage of the data, which depends on the location, the type, and possibly the time of the data being considered. For example, in a ground-water system the type of data might be a head measurement at a particular location and time. As a measure of leverage, the statistics do not take into account the value of the measurement. As linear measures, the OPR and PPR statistics require minimal computational effort once sensitivities have been calculated. Sensitivities need to be calculated for only one set of parameter values; commonly these are the values estimated through model calibration. OPR-PPR can calculate the OPR and PPR statistics for any mathematical model that produces the necessary OPR-PPR input files. In this report, OPR-PPR capabilities are presented in the context of using the ground-water model MODFLOW-2000 and the universal inverse program UCODE_2005. The method used to calculate the OPR and PPR statistics is based on the linear equation for prediction standard deviation. Using sensitivities and other information, OPR-PPR calculates (a) the percent increase in the prediction standard deviation that results when one or more existing observations are omitted from the calibration data set; (b) the percent decrease in the prediction standard deviation that results when one or more potential observations are added to the calibration data set; or (c) the percent decrease in the prediction standard deviation that results when potential information on one or more parameters is added.
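    In the ordinary-least-squares case, the OPR computation the report describes reduces to comparing prediction standard deviations from linear theory with and without an observation's row of sensitivities. The sketch below illustrates that leverage calculation under simplifying assumptions (unit weights, synthetic sensitivities); OPR-PPR itself works with weighted sensitivities produced by MODFLOW-2000 and UCODE_2005 runs.

```python
import numpy as np

rng = np.random.default_rng(3)

# X: sensitivities of 20 observations to 3 parameters (the Jacobian);
# x_p: sensitivity of one prediction to the same parameters. Both invented.
X = rng.normal(size=(20, 3))
x_p = np.array([1.0, 0.5, -0.2])

def prediction_sd(X):
    # Linear theory: prediction variance proportional to x_p' (X'X)^-1 x_p
    return np.sqrt(x_p @ np.linalg.inv(X.T @ X) @ x_p)

base = prediction_sd(X)
for i in range(X.shape[0]):
    sd_without_i = prediction_sd(np.delete(X, i, axis=0))
    opr = 100.0 * (sd_without_i - base) / base
    print(f"obs {i:2d}: +{opr:5.1f}% prediction SD if omitted")
```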

  4. Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results

    PubMed Central

    Wicherts, Jelte M.; Bakker, Marjan; Molenaar, Dylan

    2011-01-01

    Background The widespread reluctance to share published research data is often hypothesized to be due to the authors' fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically. Methods and Findings We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance. Conclusions Our findings on the basis of psychological papers suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data archiving policies. PMID:22073203

  5. Integrated Analysis of Pharmacologic, Clinical, and SNP Microarray Data using Projection onto the Most Interesting Statistical Evidence with Adaptive Permutation Testing

    PubMed Central

    Pounds, Stan; Cao, Xueyuan; Cheng, Cheng; Yang, Jun; Campana, Dario; Evans, William E.; Pui, Ching-Hon; Relling, Mary V.

    2010-01-01

    Powerful methods for integrated analysis of multiple biological data sets are needed to maximize interpretation capacity and acquire meaningful knowledge. We recently developed Projection Onto the Most Interesting Statistical Evidence (PROMISE). PROMISE is a statistical procedure that incorporates prior knowledge about the biological relationships among endpoint variables into an integrated analysis of microarray gene expression data with multiple biological and clinical endpoints. Here, PROMISE is adapted to the integrated analysis of pharmacologic, clinical, and genome-wide genotype data, incorporating knowledge about the biological relationships among pharmacologic and clinical response data. An efficient permutation-testing algorithm is introduced so that statistical calculations are computationally feasible in this higher-dimension setting. The new method is applied to a pediatric leukemia data set. The results clearly indicate that PROMISE is a powerful statistical tool for identifying genomic features that exhibit a biologically meaningful pattern of association with multiple endpoint variables. PMID:21516175

  6. Statistical Relational Learning (SRL) as an Enabling Technology for Data Acquisition and Data Fusion in Video

    DTIC Science & Technology

    2013-05-02

    In particular, it is important to reason about which portions of video require expensive analysis and storage. This project aims to make these inferences using new and existing tools from Statistical Relational Learning (SRL), a recently emerging technology that enables effective data acquisition and data fusion in video.

  7. Statistical Methods in Integrative Genomics

    PubMed Central

    Richardson, Sylvia; Tseng, George C.; Sun, Wei

    2016-01-01

    Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531

  8. A global goodness-of-fit statistic for Cox regression models.

    PubMed

    Parzen, M; Lipsitz, S R

    1999-06-01

    In this paper, a global goodness-of-fit test statistic for a Cox regression model, which has an approximate chi-squared distribution when the model has been correctly specified, is proposed. Our goodness-of-fit statistic is global and has power to detect if interactions or higher order powers of covariates in the model are needed. The proposed statistic is similar to the Hosmer and Lemeshow (1980, Communications in Statistics A10, 1043-1069) goodness-of-fit statistic for binary data as well as Schoenfeld's (1980, Biometrika 67, 145-153) statistic for the Cox model. The methods are illustrated using data from a Mayo Clinic trial in primary biliary cirrhosis of the liver (Fleming and Harrington, 1991, Counting Processes and Survival Analysis), in which the outcome is the time until liver transplantation or death. There are 17 possible covariates. Two Cox proportional hazards models are fit to the data, and the proposed goodness-of-fit statistic is applied to the fitted models.

  9. Illustrating the practice of statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hamada, Christina A; Hamada, Michael S

    2009-01-01

    The practice of statistics involves analyzing data and planning data collection schemes to answer scientific questions. Issues often arise with the data that must be dealt with and can lead to new procedures. In analyzing data, these issues can sometimes be addressed through the statistical models that are developed. Simulation can also be helpful in evaluating a new procedure. Moreover, simulation coupled with optimization can be used to plan a data collection scheme. The practice of statistics as just described is much more than just using a statistical package. In analyzing the data, it involves understanding the scientific problem and incorporating the scientist's knowledge. In modeling the data, it involves understanding how the data were collected and accounting for limitations of the data where possible. Moreover, the modeling is likely to be iterative, considering a series of models and evaluating the fit of these models. Designing a data collection scheme involves understanding the scientist's goal and staying within his/her budget in terms of time and the available resources. Consequently, a practicing statistician is faced with such tasks and requires skills and tools to do them quickly. We have written this article for students to provide a glimpse of the practice of statistics. To illustrate the practice of statistics, we consider a problem motivated by some precipitation data that our relative, Masaru Hamada, collected some years ago. We describe his rain gauge observational study in Section 2. We describe modeling and an initial analysis of the precipitation data in Section 3. In Section 4, we consider alternative analyses that address potential issues with the precipitation data. In Section 5, we consider the impact of incorporating additional information. We design a data collection scheme to illustrate the use of simulation and optimization in Section 6. We conclude this article in Section 7 with a discussion.

  10. Classical Statistics and Statistical Learning in Imaging Neuroscience

    PubMed Central

    Bzdok, Danilo

    2017-01-01

    Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-test and ANOVA. Throughout recent years, statistical learning methods enjoy increasing popularity especially for applications in rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It is retraced how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques. PMID:29056896

  11. [Design and implementation of online statistical analysis function in information system of air pollution and health impact monitoring].

    PubMed

    Lü, Yiran; Hao, Shuxin; Zhang, Guoqing; Liu, Jie; Liu, Yue; Xu, Dongqun

    2018-01-01

    To implement an online statistical analysis function in the information system for air pollution and health impact monitoring, and to obtain data analysis results in real time. Descriptive statistics, time-series analysis, and multivariate regression analysis were implemented online using SQL and visualization tools on top of the database software. The system generates basic statistical tables and summary tables of air pollution exposure and health impact data online; generates trend charts for each part of the data, with interactive connections to the database; and generates interface tables that can be exported directly to R, SAS, and SPSS. The information system for air pollution and health impact monitoring thus implements online statistical analysis and can provide real-time analysis results to its users.

  12. Data Mining: Going beyond Traditional Statistics

    ERIC Educational Resources Information Center

    Zhao, Chun-Mei; Luan, Jing

    2006-01-01

    The authors provide an overview of data mining, giving special attention to the relationship between data mining and statistics to unravel some misunderstandings about the two techniques. (Contains 1 figure.)

  13. Defense and Development in Sub-Saharan Africa: Codebook.

    DTIC Science & Technology

    1988-03-01

    The codebook documents the database by presenting the different data sources and explaining how they were compiled. The statistics in the database cover 41 African countries. In addition to the economic and military data, some statistics have been compiled that monitor social and political conditions; sources and notes on the collection of the data are included.

  14. Novel Kalman filter algorithm for statistical monitoring of extensive landscapes with synoptic sensor data

    Treesearch

    Raymond L. Czaplewski

    2015-01-01

    Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of...

  15. Use of Statistics from National Data Sources to Inform Rehabilitation Program Planning, Evaluation, and Advocacy

    ERIC Educational Resources Information Center

    Bruyere, Susanne M.; Houtenville, Andrew J.

    2006-01-01

    Data on people with disabilities can be used to confirm service needs and to evaluate the resulting impact of services. Disability statistics from surveys and administrative records can play a meaningful role in such efforts. In this article, the authors describe the array of available data and statistics and their potential uses in rehabilitation…

  16. Increasing Confidence in a Statistics Course: Assessing Students' Beliefs and Using the Data to Design Curriculum with Students

    ERIC Educational Resources Information Center

    Huchting, Karen

    2013-01-01

    Students were involved in the curriculum design of a statistics course. They completed a pre-assessment of their confidence and skills using quantitative methods and statistics. Scores were aggregated, and anonymous data were shown on the first night of class. Using these data, the course was designed, demonstrating evidence-based instructional…

  17. Data analysis report on ATS-F COMSAT millimeter wave propagation experiment, part 1. [effects of hydrometeors on ground to satellite communication

    NASA Technical Reports Server (NTRS)

    Hyde, G.

    1976-01-01

    The 13/18 GHz COMSAT Propagation Experiment (CPE) was performed to measure attenuation caused by hydrometeors along slant paths from transmitting terminals on the ground to the ATS-6 satellite. The effectiveness of site diversity in overcoming this impairment was also studied. Problems encountered in assembling a valid data base of rain induced attenuation data for statistical analysis are considered. The procedures used to obtain the various statistics are then outlined. The graphs and tables of statistical data for the 15 dual frequency (13 and 18 GHz) site diversity locations are discussed. Cumulative rain rate statistics for the Fayetteville and Boston sites based on point rainfall data collected are presented along with extrapolations of the attenuation and point rainfall data.

  18. Testing for nonlinearity in time series: The method of surrogate data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Theiler, J.; Galdrikian, B.; Longtin, A.

    1991-01-01

    We describe a statistical approach for identifying nonlinearity in time series; in particular, we want to avoid claims of chaos when simpler models (such as linearly correlated noise) can explain the data. The method requires a careful statement of the null hypothesis which characterizes a candidate linear process, the generation of an ensemble of "surrogate" data sets which are similar to the original time series but consistent with the null hypothesis, and the computation of a discriminating statistic for the original and for each of the surrogate data sets. The idea is to test the original time series against the null hypothesis by checking whether the discriminating statistic computed for the original time series differs significantly from the statistics computed for each of the surrogate sets. We present algorithms for generating surrogate data under various null hypotheses, and we show the results of numerical experiments on artificial data using correlation dimension, Lyapunov exponent, and forecasting error as discriminating statistics. Finally, we consider a number of experimental time series -- including sunspots, electroencephalogram (EEG) signals, and fluid convection -- and evaluate the statistical significance of the evidence for nonlinear structure in each case.
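    A compact illustration of the surrogate-data recipe described above: generate phase-randomized surrogates that preserve the power spectrum (consistent with the linear null hypothesis), compute a discriminating statistic for the original series and for each surrogate, and measure the deviation. The AR(1) test series and the skewness-of-increments statistic are stand-ins chosen for brevity, not the statistics used in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def phase_randomized_surrogate(x):
    """Surrogate with the same power spectrum (hence autocorrelation) as x
    but random Fourier phases, consistent with the linear null hypothesis."""
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spec.size)
    phases[0] = 0.0          # keep the mean real
    if x.size % 2 == 0:
        phases[-1] = 0.0     # keep the Nyquist component real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=x.size)

def discriminating_stat(x):
    # A simple nonlinearity-sensitive statistic: skewness of increments
    dx = np.diff(x)
    return np.mean(dx**3) / np.mean(dx**2) ** 1.5

# AR(1) test series: a linear process, so the null should not be rejected
x = np.zeros(1024)
for t in range(1, x.size):
    x[t] = 0.9 * x[t - 1] + rng.normal()

stat_obs = discriminating_stat(x)
null = [discriminating_stat(phase_randomized_surrogate(x)) for _ in range(200)]
z = (stat_obs - np.mean(null)) / np.std(null)
print(f"deviation from surrogate ensemble: {z:+.2f} sigma")
```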

  19. Selecting Summary Statistics in Approximate Bayesian Computation for Calibrating Stochastic Models

    PubMed Central

    Burr, Tom

    2013-01-01

    Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the “go-to” option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example. PMID:24288668

  20. Selecting summary statistics in approximate Bayesian computation for calibrating stochastic models.

    PubMed

    Burr, Tom; Skurikhin, Alexei

    2013-01-01

    Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the "go-to" option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example.
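    To make the role of the user-chosen summary statistics concrete, here is a minimal rejection-ABC sketch: draw parameters from the prior, simulate, and keep draws whose simulated summaries land close to the observed summaries. The toy Gamma model, the particular summaries, and the tolerance are all illustrative assumptions; swapping in less informative summaries visibly degrades the posterior approximation, which is the paper's central point.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(theta, n=200):
    # Stand-in stochastic model: Gamma observations with scale theta
    return rng.gamma(shape=2.0, scale=theta, size=n)

def summaries(x):
    # User-chosen summary statistics; their quality drives ABC accuracy
    return np.array([np.mean(x), np.std(x), np.median(x)])

s_obs = summaries(simulate(1.5))   # pretend theta = 1.5 generated the data

# Rejection ABC: keep prior draws whose simulated summaries land near s_obs
accepted = []
for theta in rng.uniform(0.1, 5.0, size=20000):
    s = summaries(simulate(theta))
    if np.linalg.norm((s - s_obs) / s_obs) < 0.1:
        accepted.append(theta)

posterior = np.array(accepted)
print(f"posterior mean ~ {posterior.mean():.2f} from {posterior.size} draws")
```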

  1. Further developments in cloud statistics for computer simulations

    NASA Technical Reports Server (NTRS)

    Chang, D. T.; Willand, J. H.

    1972-01-01

    This study is a part of NASA's continued program to provide global statistics of cloud parameters for computer simulation. The primary emphasis was on the development of the data bank of the global statistical distributions of cloud types and cloud layers and their applications in the simulation of the vertical distributions of in-cloud parameters such as liquid water content. These statistics were compiled from actual surface observations as recorded in Standard WBAN forms. Data for a total of 19 stations were obtained and reduced. These stations were selected to be representative of the 19 primary cloud climatological regions defined in previous studies of cloud statistics. Using the data compiled in this study, a limited study was conducted of the homogeneity of cloud regions, the latitudinal dependence of cloud-type distributions, the dependence of these statistics on sample size, and other factors in the statistics which are of significance to the problem of simulation. The application of the statistics in cloud simulation was investigated. In particular, the inclusion of the new statistics in an expanded multi-step Monte Carlo simulation scheme is suggested and briefly outlined.

  2. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
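    A minimal sketch of the ROS idea for the single-detection-limit case (the S-language/R tools described above handle multiple limits with more careful plotting-position formulas): censored values occupy the lowest ranks, a line is fit to log concentration versus normal quantiles of the detected values, and nondetects are imputed from their plotting positions on that line. The data and names below are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Trace concentrations with one detection limit (the full ROS handles
# multiple limits with more careful plotting-position formulas).
true_conc = rng.lognormal(-1.0, 1.0, size=100)
dl = 0.3
detects = np.sort(true_conc[true_conc >= dl])
n_cens = int((true_conc < dl).sum())
n = true_conc.size

# Censored values occupy the lowest ranks; Weibull plotting positions
pp_detect = (n_cens + np.arange(1, detects.size + 1)) / (n + 1)
pp_cens = np.arange(1, n_cens + 1) / (n + 1)

# Fit log(concentration) vs. normal quantiles using the detects only
slope, intercept, *_ = stats.linregress(stats.norm.ppf(pp_detect),
                                        np.log(detects))

# Impute each nondetect from its plotting position on the fitted line
imputed = np.exp(intercept + slope * stats.norm.ppf(pp_cens))
augmented = np.concatenate([imputed, detects])
print(f"ROS mean = {augmented.mean():.3f}, median = {np.median(augmented):.3f}")
```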

  3. Experimental statistics for biological sciences.

    PubMed

    Bang, Heejung; Davidian, Marie

    2010-01-01

    In this chapter, we cover basic and fundamental principles and methods in statistics - from "What are Data and Statistics?" to "ANOVA and linear regression" - which are the basis of any statistical thinking and undertaking. Readers can easily find the selected topics in most introductory statistics textbooks, but we have tried to assemble and structure them in a succinct and reader-friendly manner in a stand-alone chapter. This text has long been used in real classroom settings for both undergraduate and graduate students who do or do not major in statistical sciences. We hope that from this chapter readers will understand the key statistical concepts and terminology, how to design a study (experimental or observational), how to analyze the data (e.g., describe the data and/or estimate the parameter(s) and make inference), and how to interpret the results. This text is most useful as supplemental material while readers take their own statistics courses, or as a self-teaching reference to accompany the manual of any statistical software.

  4. Statistical Policy Working Paper 25. Data Editing Workshop and Exposition

    DOT National Transportation Integrated Search

    1996-12-01

    Statistical Policy Working Paper 25 is the written record of the Data Editing Workshop and Exposition held March 22, 1996, at the Bureau of Labor Statistics (BLS) Conference and Training Center. The program consisted of 44 oral presentations and 19 s...

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

    Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, pointwise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics (which we discussed in [1]), where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
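    The map-reduce structure described above can be sketched in a few lines: each worker tabulates its shard into a local contingency table, and tables merge by cell-wise addition. The sketch below (not the paper's implementation) uses Python's Counter as the table; it also shows why communication grows with the number of distinct cells, since the whole table is what gets shipped between processes.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def tabulate(shard):
    # Map step: each worker builds a local contingency table for its shard
    return Counter(shard)

def merge(t1, t2):
    # Reduce step: tables combine by cell-wise addition; the table itself
    # is what crosses process boundaries, so communication grows with the
    # number of distinct cells.
    t1.update(t2)
    return t1

if __name__ == "__main__":
    data = [("smoker", "case"), ("smoker", "control"),
            ("nonsmoker", "control"), ("smoker", "case")] * 1000
    shards = [data[i::4] for i in range(4)]
    with Pool(4) as pool:
        table = reduce(merge, pool.map(tabulate, shards))
    print(table)
```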

  6. Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics

    PubMed Central

    Dowding, Irene; Haufe, Stefan

    2018-01-01

    Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
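    The contrast the authors draw can be illustrated with a fixed-effect precision-weighted estimate: instead of t-testing the subject means alone, weight each mean by its within-subject precision. This sketch uses simulated trials and ignores between-subject variance, so it is a simplified instance of the sufficient-summary-statistic idea rather than the paper's full procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# 15 subjects, each with a different number of trials (nested data)
means, variances = [], []
for _ in range(15):
    n_trials = int(rng.integers(20, 200))
    trials = rng.normal(0.3, 1.0, size=n_trials)     # true effect = 0.3
    means.append(trials.mean())
    variances.append(trials.var(ddof=1) / n_trials)  # variance of the mean

m, v = np.array(means), np.array(variances)

# Naive group-level t-test on subject means ignores within-subject precision
t_naive, p_naive = stats.ttest_1samp(m, 0.0)

# Fixed-effect precision weighting uses the within-subject variances too
w = 1.0 / v
estimate = np.sum(w * m) / np.sum(w)
se = np.sqrt(1.0 / np.sum(w))
p_weighted = 2.0 * stats.norm.sf(abs(estimate / se))
print(f"naive p = {p_naive:.4f}, precision-weighted p = {p_weighted:.4f}")
```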

  7. Chi-squared and C statistic minimization for low count per bin data

    NASA Astrophysics Data System (ADS)

    Nousek, John A.; Shue, David R.

    1989-07-01

    Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.

  8. Chi-squared and C statistic minimization for low count per bin data. [sampling in X ray astronomy

    NASA Technical Reports Server (NTRS)

    Nousek, John A.; Shue, David R.

    1989-01-01

    Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.
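    Cash's C statistic for Poisson-binned data is, up to a data-only constant, C = 2 Σᵢ (mᵢ − dᵢ ln mᵢ), where mᵢ are model-predicted counts and dᵢ observed counts; minimizing it is a drop-in alternative to chi-squared minimization and behaves better at low counts. A sketch on a simulated low-count spectrum follows; the model form, parameters, and optimizer are illustrative, not those of the paper.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(8)

# Simulated low-count spectrum: few counts per bin
x = np.linspace(1.0, 10.0, 50)
counts = rng.poisson(3.0 * np.exp(-x / 2.5))

def model(params):
    norm, scale = params
    return norm * np.exp(-x / scale)

def cash(params):
    m = model(params)
    if np.any(m <= 0):
        return np.inf
    return 2.0 * np.sum(m - counts * np.log(m))  # Poisson-based C statistic

def chi2(params):
    m = model(params)
    # Pearson-style chi-squared; ill-behaved when model counts near zero
    return np.sum((counts - m) ** 2 / np.maximum(m, 1e-9))

fit_c = optimize.minimize(cash, x0=[1.0, 1.0], method="Nelder-Mead")
fit_x2 = optimize.minimize(chi2, x0=[1.0, 1.0], method="Nelder-Mead")
print("C-stat fit:", fit_c.x, " chi2 fit:", fit_x2.x)
```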

  9. New Statistics for Testing Differential Expression of Pathways from Microarray Data

    NASA Astrophysics Data System (ADS)

    Siu, Hoicheong; Dong, Hua; Jin, Li; Xiong, Momiao

    Exploring biological meaning from microarray data is very important but remains a great challenge. Here, we developed three new statistics: linear combination test, quadratic test and de-correlation test to identify differentially expressed pathways from gene expression profile. We apply our statistics to two rheumatoid arthritis datasets. Notably, our results reveal three significant pathways and 275 genes in common in two datasets. The pathways we found are meaningful to uncover the disease mechanisms of rheumatoid arthritis, which implies that our statistics are a powerful tool in functional analysis of gene expression data.

  10. Fundamentals and Catalytic Innovation: The Statistical and Data Management Center of the Antibacterial Resistance Leadership Group

    PubMed Central

    Huvane, Jacqueline; Komarow, Lauren; Hill, Carol; Tran, Thuy Tien T.; Pereira, Carol; Rosenkranz, Susan L.; Finnemeyer, Matt; Earley, Michelle; Jiang, Hongyu (Jeanne); Wang, Rui; Lok, Judith

    2017-01-01

    The Statistical and Data Management Center (SDMC) provides the Antibacterial Resistance Leadership Group (ARLG) with statistical and data management expertise to advance the ARLG research agenda. The SDMC is active at all stages of a study, including design; data collection and monitoring; data analyses and archival; and publication of study results. The SDMC enhances the scientific integrity of ARLG studies through the development and implementation of innovative and practical statistical methodologies and by educating research colleagues regarding the application of clinical trial fundamentals. This article summarizes the challenges and roles, as well as the innovative contributions in the design, monitoring, and analyses of clinical trials and diagnostic studies, of the ARLG SDMC. PMID:28350899

  11. Correcting for Optimistic Prediction in Small Data Sets

    PubMed Central

    Smith, Gordon C. S.; Seaman, Shaun R.; Wood, Angela M.; Royston, Patrick; White, Ian R.

    2014-01-01

    The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. We used clinical data sets (United Kingdom Down syndrome screening data from Glasgow (1991–2003), Edinburgh (1999–2003), and Cambridge (1990–2006), as well as Scottish national pregnancy discharge data (2004–2007)) to evaluate different approaches to adjustment for optimism. We found that sample splitting, cross-validation without replication, and leave-1-out cross-validation produced optimism-adjusted estimates of the C statistic that were biased and/or associated with greater absolute error than other available methods. Cross-validation with replication, bootstrapping, and a new method (leave-pair-out cross-validation) all generated unbiased optimism-adjusted estimates of the C statistic and had similar absolute errors in the clinical data set. Larger simulation studies confirmed that all 3 methods performed similarly with 10 or more events per variable, or when the C statistic was 0.9 or greater. However, with lower events per variable or lower C statistics, bootstrapping tended to be optimistic but with lower absolute and mean squared errors than both methods of cross-validation. PMID:24966219
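    Of the unbiased methods the study identifies, bootstrapping is the easiest to sketch: Harrell-style optimism correction refits the model on each bootstrap resample and averages the gap between the resample C statistic and the original-sample C statistic of the refit model. The bare-bones logistic fit and the data below are invented for illustration; the paper's leave-pair-out cross-validation is a different (also unbiased) method.

```python
import numpy as np

rng = np.random.default_rng(9)

def c_statistic(risk, outcome):
    """C statistic = P(risk of a case exceeds risk of a control)."""
    cases, controls = risk[outcome == 1], risk[outcome == 0]
    diff = cases[:, None] - controls[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def fit_logistic(X, y, iters=200, lr=0.1):
    # Bare-bones logistic regression by gradient ascent (illustration only)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / y.size
    return w

# Small data set with many noise predictors: overfitting inflates apparent C
n, k = 80, 10
X = rng.normal(size=(n, k))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(float)

w = fit_logistic(X, y)
apparent = c_statistic(X @ w, y)

# Bootstrap optimism: refit on each resample and average the gap between
# resample performance and original-sample performance of the refit model
optimism = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    wb = fit_logistic(X[idx], y[idx])
    optimism.append(c_statistic(X[idx] @ wb, y[idx]) - c_statistic(X @ wb, y))

print(f"apparent C = {apparent:.3f}, "
      f"optimism-corrected C = {apparent - np.mean(optimism):.3f}")
```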

  12. Physics in Perspective Volume II, Part C, Statistical Data.

    ERIC Educational Resources Information Center

    National Academy of Sciences - National Research Council, Washington, DC. Physics Survey Committee.

    Statistical data relating to the sociology and economics of the physics enterprise are presented and explained. The data are divided into three sections: manpower data, data on funding and costs, and data on the literature of physics. Each section includes numerous studies, with notes on the sources and types of data, gathering procedures, and…

  13. Analysis Code - Data Analysis in 'Leveraging Multiple Statistical Methods for Inverse Prediction in Nuclear Forensics Applications' (LMSMIPNFA) v. 1.0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lewis, John R

    R code that performs the analysis of a data set presented in the paper ‘Leveraging Multiple Statistical Methods for Inverse Prediction in Nuclear Forensics Applications’ by Lewis, J., Zhang, A., Anderson-Cook, C. It provides functions for doing inverse predictions in this setting using several different statistical methods. The data set is a publicly available data set from a historical Plutonium production experiment.

  14. Section 6 Schools in Six States: Eleven Case Studies of Transfer Issues

    DTIC Science & Technology

    1991-01-01

    Statistical data on Virginia's public schools are drawn from Facing Up--21: Statistical Data on Virginia's Public Schools (State Department of Education, Virginia, June 1987) and Facing Up--23: Statistical Data on Virginia's Public Schools (Virginia Department of Education, April 1989). Although the county has experienced some growth, the data are reported at the district and state level.

  15. Aircraft Maneuvers for the Evaluation of Flying Qualities and Agility. Volume 1. Maneuver Development Process and Initial Maneuver Set

    DTIC Science & Technology

    1993-08-01

    subtitled "Simulation Data," consists of detailed information on the design parameter variations tested and the subsequent statistical analyses conducted...used with confidence during the design process. The data quality can be examined in various forms, such as statistical analyses of measure-of-merit data...merit, such as time to capture or maximum pitch rate, can be calculated from the simulation time-history data. Statistical techniques are then used

  16. A Stochastic Fractional Dynamics Model of Rainfall Statistics

    NASA Astrophysics Data System (ADS)

    Kundu, Prasun; Travis, James

    2013-04-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is designed to faithfully reflect the scale dependence and is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and times scales. The main restriction is the assumption that the statistics of the precipitation field is spatially homogeneous and isotropic and stationary in time. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of the radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment. Some data sets, containing periods of non-stationary behavior with occasional anomalously correlated rain events, present a challenge for the model.

  17. Metro U.S.A. Data Sheet: Population Estimates and Selected Demographic Indicators for the Metropolitan Areas of the United States. Special edition of the United States Population Data Sheet.

    ERIC Educational Resources Information Center

    Population Reference Bureau, Inc., Washington, DC.

    This poster-size data sheet presents population estimates and selected demographic indicators for the nation's 281 metropolitan areas. These areas are divided into 261 Metropolitan Statistical Areas (MSAs) and 20 Consolidated Metropolitan Statistical Areas (CMSAs), reporting units which replace the Standard Metropolitan Statistical Areas (SMSAs)…

  18. Statistical Analysis of Research Data | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview of the general

  19. FT. Sam 91 Whiskey Combat Medic Medical Simulation Training Quantitative Integration Enhancement Program

    DTIC Science & Technology

    2011-07-01

    Dr. Collin, an employee of the University of Pittsburgh, joined the project team in the statistical and research coordination role. Completed milestones include statistical analysis planning and review of the planned data metrics and data-gathering tools, supporting an approach to performance assessment for continuous quality improvement and the analysis of data with modern statistical techniques.

  20. Make measurable what is not so: national monitoring of the status of persons with intellectual disability.

    PubMed

    Fujiura, Glenn T; Rutkowski-Kmitta, Violet; Owen, Randall

    2010-12-01

    Statistics are critical in holding governments accountable for the well-being of citizens with disability. International initiatives are underway to improve the quality of disability statistics, but meaningful ID data is exceptionally rare. The status of ID data was evaluated in a review of 12 national statistical systems. Recurring data collection by national ministries was identified and the availability of measures of poverty, exclusion, and disadvantage was assessed. A total of 131 recurring systems coordinated by 50 different ministries were identified. The majority included general disability but less than 25% of the systems screened ID. Of these, few provided policy-relevant data. The scope of ID data was dismal at best, though a significant statistical infrastructure exists for the integration of ID data. Advocacy will be necessary. There is no optimal form of data monitoring, and decisions regarding priorities in purpose, targeted audiences, and the goals for surveillance must be resolved.

  1. 20 CFR 634.4 - Statistical standards.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    20 CFR § 634.4 Statistical standards (Employees' Benefits, revised as of April 1, 2011): Recipients shall agree to provide required data following the statistical standards prescribed by the Bureau of Labor Statistics for cooperative statistical programs.

  2. 20 CFR 634.4 - Statistical standards.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    20 CFR § 634.4 Statistical standards (Employees' Benefits, revised as of April 1, 2010): Recipients shall agree to provide required data following the statistical standards prescribed by the Bureau of Labor Statistics for cooperative statistical programs.

  3. Towards good practice for health statistics: lessons from the Millennium Development Goal health indicators.

    PubMed

    Murray, Christopher J L

    2007-03-10

    Health statistics are at the centre of an increasing number of worldwide health controversies. Several factors are sharpening the tension between the supply and demand for high quality health information, and the health-related Millennium Development Goals (MDGs) provide a high-profile example. With thousands of indicators recommended but few measured well, the worldwide health community needs to focus its efforts on improving measurement of a small set of priority areas. Priority indicators should be selected on the basis of public-health significance and several dimensions of measurability. Health statistics can be divided into three types: crude, corrected, and predicted. Health statistics are necessary inputs to planning and strategic decision making, programme implementation, monitoring progress towards targets, and assessment of what works and what does not. Crude statistics that are biased have no role in any of these steps; corrected statistics are preferred. For strategic decision making, when corrected statistics are unavailable, predicted statistics can play an important part. For monitoring progress towards agreed targets and assessment of what works and what does not, however, predicted statistics should not be used. Perhaps the most effective method to decrease controversy over health statistics and to encourage better primary data collection and the development of better analytical methods is a strong commitment to provision of an explicit data audit trail. This initiative would make available the primary data, all post-data collection adjustments, models including covariates used for farcasting and forecasting, and necessary documentation to the public.

  4. Assessing the Kansas water-level monitoring program: An example of the application of classical statistics to a geological problem

    USGS Publications Warehouse

    Davis, J.C.

    2000-01-01

    Geologists may feel that geological data are not amenable to statistical analysis, or at best require specialized approaches such as nonparametric statistics and geostatistics. However, there are many circumstances, particularly in systematic studies conducted for environmental or regulatory purposes, where traditional parametric statistical procedures can be beneficial. An example is the application of analysis of variance to data collected in an annual program of measuring groundwater levels in Kansas. Influences such as well conditions, operator effects, and use of the water can be assessed and wells that yield less reliable measurements can be identified. Such statistical studies have resulted in yearly improvements in the quality and reliability of the collected hydrologic data. Similar benefits may be achieved in other geological studies by the appropriate use of classical statistical tools.
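    The parametric workhorse in such a monitoring study is the one-way analysis of variance, which tests whether group means (e.g., measurements grouped by operator or well condition) differ by more than within-group scatter would explain. A minimal sketch with invented data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)

# Year-to-year water-level changes grouped by measurement operator;
# a systematic operator effect shows up as a between-group difference.
operator_a = rng.normal(-0.5, 1.0, size=40)
operator_b = rng.normal(-0.5, 1.0, size=35)
operator_c = rng.normal(-1.2, 1.0, size=30)   # systematically different

f_stat, p_value = stats.f_oneway(operator_a, operator_b, operator_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```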

  5. Applications of spatial statistical network models to stream data

    USGS Publications Warehouse

    Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal

    2014-01-01

    Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.

  6. Transportation Statistical Data and Information

    DOT National Transportation Integrated Search

    1976-12-01

    The document contains an extensive review of internal and external sources of transportation data and statistics especially created for data administrators. Organized around the transportation industry and around the elements of the U.S. Department o...

  7. Launch commit criteria performance trending analysis, phase 1, revision A. SRM and QA mission services

    NASA Technical Reports Server (NTRS)

    1989-01-01

    An assessment is made of quantitative methods and measures for evaluating launch commit criteria (LCC) performance trends. A statistical performance trending analysis pilot study was processed and compared to STS-26 mission data. This study used four selected shuttle measurement types (solid rocket booster, external tank, space shuttle main engine, and range safety switch safe and arm device) from the five missions prior to mission 51-L. After obtaining raw data coordinates, each set of measurements was processed to obtain statistical confidence bounds and mean data profiles for each of the selected measurement types. STS-26 measurements were compared to the statistical data base profiles to verify the statistical capability of assessing occurrences of data trend anomalies and abnormal time-varying operational conditions associated with data amplitude and phase shifts.
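
    A minimal sketch of the trending approach described above, assuming synthetic data: build a mean profile and confidence bounds from prior-mission measurements, then flag new-mission data falling outside the bounds.

        # Mean profile +/- 3 SD bounds from prior missions; flag excursions.
        import numpy as np

        rng = np.random.default_rng(0)
        prior = rng.normal(0.0, 1.0, size=(5, 200))  # 5 missions x 200 time points
        mean_profile = prior.mean(axis=0)
        sd_profile = prior.std(axis=0, ddof=1)
        upper = mean_profile + 3 * sd_profile
        lower = mean_profile - 3 * sd_profile

        new_mission = rng.normal(0.0, 1.0, size=200)
        new_mission[150:160] += 5.0                  # injected anomaly
        anomalous = (new_mission > upper) | (new_mission < lower)
        print("anomalous time indices:", np.flatnonzero(anomalous))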

  8. Statistical Record of Native North Americans. Second Edition.

    ERIC Educational Resources Information Center

    Reddy, Marlita A., Ed.

    This book compiles statistical data on Native North American populations, including Alaska and Canada Natives. Data sources include federal and state agencies, census records, tribal governments, associations, and other organizations. The book includes statistics on Native North Americans as compared with other racial and ethnic groups under…

  9. Confidentiality of Research and Statistical Data.

    ERIC Educational Resources Information Center

    Law Enforcement Assistance Administration (Dept. of Justice), Washington, DC.

    This document was prepared by the Privacy and Security Staff, National Criminal Justice Information and Statistics Service, in conjunction with the Law Enforcement Assistance Administration (LEAA) Office of General Counsel, to explain and discuss the requirements of the LEAA regulations governing confidentiality of research and statistical data.…

  10. 28 CFR 22.22 - Revelation of identifiable data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... STATISTICAL INFORMATION § 22.22 Revelation of identifiable data. (a) Except as noted in paragraph (b) of this section, research and statistical information relating to a private person may be revealed in identifiable... Act. (3) Persons or organizations for research or statistical purposes. Information may only be...

  11. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods: estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes, proteins, and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules and understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big Data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
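
    As one concrete instance of the large-covariance-estimation theme reviewed above, the sketch below applies Ledoit-Wolf shrinkage via scikit-learn; the library choice and data are assumptions, not the paper's method.

        # Shrinkage covariance estimation when p >> n, where the raw
        # sample covariance is singular and unstable.
        import numpy as np
        from sklearn.covariance import LedoitWolf

        rng = np.random.default_rng(9)
        n_samples, n_genes = 50, 200          # p >> n, as in pharmacogenomics
        X = rng.normal(size=(n_samples, n_genes))

        sample_cov = np.cov(X, rowvar=False)  # rank-deficient here
        lw = LedoitWolf().fit(X)
        print("shrinkage intensity:", round(lw.shrinkage_, 3))
        print("condition numbers:",
              np.linalg.cond(sample_cov), np.linalg.cond(lw.covariance_))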

  12. Scaling up to address data science challenges

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wendelberger, Joanne R.

    Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. This article explores various challenges in Data Science and highlights statistical approaches that can facilitate the analysis of large-scale data, including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.

  13. Scaling up to address data science challenges

    DOE PAGES

    Wendelberger, Joanne R.

    2017-04-27

    Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. This article explores various challenges in Data Science and highlights statistical approaches that can facilitate the analysis of large-scale data, including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.

  14. Statistical properties of measures of association and the Kappa statistic for assessing the accuracy of remotely sensed data using double sampling

    Treesearch

    Mohammed A. Kalkhan; Robin M. Reich; Raymond L. Czaplewski

    1996-01-01

    A Monte Carlo simulation was used to evaluate the statistical properties of measures of association and the Kappa statistic under double sampling with replacement. Three error matrices, representing three levels of classification accuracy of Landsat TM data, consisted of four forest cover types in North Carolina. The overall accuracy of the five indices ranged from 0.35...

  15. Binomial test statistics using Psi functions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bowman, Kimiko o

    2007-01-01

    For the negative binomial model (probability generating function (p + 1 − pt)^(−k)) a logarithmic derivative is the Psi function difference ψ(k + x) − ψ(k); this and its derivatives lead to a test statistic to decide on the validity of a specified model. The test statistic is applied to real data bases, so a comparison between theory and application is available. Note that the test function is not dominated by outliers. Applications to (i) Fisher's tick data, (ii) accidents data, (iii) Weldon's dice data are included.
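
    The logarithmic-derivative quantity above is straightforward to compute with SciPy's digamma function; the parameter values below are illustrative, not taken from the cited data sets.

        # psi(k + x) - psi(k) for a range of observed counts x,
        # the building block of the test statistic described above.
        import numpy as np
        from scipy.special import digamma

        k = 2.5                   # illustrative negative binomial shape
        x = np.arange(0, 10)      # observed counts
        print(digamma(k + x) - digamma(k))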

  16. Library Statistical Data Base Formats and Definitions.

    ERIC Educational Resources Information Center

    Jones, Dennis; And Others

    Represented are the detailed set of data structures relevant to the categorization of information, terminology, and definitions employed in the design of the library statistical data base. The data base, or management information system, provides administrators with a framework of information and standardized data for library management, planning,…

  17. Interactive Visualisations and Statistical Literacy

    ERIC Educational Resources Information Center

    Sutherland, Sinclair; Ridgway, Jim

    2017-01-01

    Statistical literacy involves engagement with the data one encounters. New forms of data and new ways to engage with data--notably via interactive data visualisations--are emerging. Some of the skills required to work effectively with these new visualisation tools are described. We argue that interactive data visualisations will have as profound…

  18. INTERFACING SAS TO ORACLE IN THE UNIX ENVIRONMENT

    EPA Science Inventory

    SAS is an EPA standard data and statistical analysis software package while ORACLE is EPA's standard data base management system software package. ORACLE has the advantage over SAS in data retrieval and storage capabilities but has limited data and statistical analysis capability....

  19. Tools for Basic Statistical Analysis

    NASA Technical Reports Server (NTRS)

    Luz, Paul L.

    2005-01-01

    Statistical Analysis Toolset is a collection of eight Microsoft Excel spreadsheet programs, each of which performs calculations pertaining to an aspect of statistical analysis. These programs present input and output data in user-friendly, menu-driven formats, with automatic execution. The following types of calculations are performed: Descriptive statistics are computed for a set of data x(i) (i = 1, 2, 3 . . . ) entered by the user. Normal Distribution Estimates will calculate the statistical value that corresponds to cumulative probability values, given a sample mean and standard deviation of the normal distribution. Normal Distribution from Two Data Points will extend and generate a cumulative normal distribution for the user, given two data points and their associated probability values. Two programs perform two-way analysis of variance (ANOVA) with no replication or generalized ANOVA for two factors with four levels and three repetitions. Linear Regression-ANOVA will curve-fit data to the linear equation y=f(x) and will do an ANOVA to check its significance.
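
    A Python analogue (an assumption, not the Excel toolset itself) of the Normal Distribution Estimates program: return the value corresponding to a given cumulative probability for a stated mean and standard deviation.

        # Inverse normal CDF: the value at a given cumulative probability.
        from scipy import stats

        mean, sd = 50.0, 4.0
        for p in (0.05, 0.50, 0.95):
            print(p, stats.norm.ppf(p, loc=mean, scale=sd))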

  20. Standard deviation and standard error of the mean.

    PubMed

    Lee, Dong Kyu; In, Junyong; Lee, Sangseok

    2015-06-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However, the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.
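
    A minimal numerical illustration of the distinction drawn above, using made-up measurements: SD describes the dispersion of the data, while SEM describes the precision of the sample mean.

        # SD vs SEM on a small sample.
        import numpy as np

        x = np.array([4.9, 5.1, 5.0, 4.7, 5.3, 5.2, 4.8, 5.0])
        sd = x.std(ddof=1)            # sample standard deviation
        sem = sd / np.sqrt(x.size)    # standard error of the mean
        print(f"SD = {sd:.3f}, SEM = {sem:.3f}")
        # SEM shrinks as n grows; SD estimates a fixed population property.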

  1. Standard deviation and standard error of the mean

    PubMed Central

    In, Junyong; Lee, Sangseok

    2015-01-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However, the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results. PMID:26045923

  2. Anomaly detection of turbopump vibration in Space Shuttle Main Engine using statistics and neural networks

    NASA Technical Reports Server (NTRS)

    Lo, C. F.; Wu, K.; Whitehead, B. A.

    1993-01-01

    Statistical and neural network methods were applied to investigate the feasibility of detecting anomalies in SSME turbopump vibration. The anomalies are detected based on the amplitude of peaks of fundamental and harmonic frequencies in the power spectral density. These data are reduced to the proper format from sensor data measured by strain gauges and accelerometers. Both methods proved feasible for detecting vibration anomalies. The statistical method requires sufficient data points to establish a reasonable statistical distribution data bank; it is applicable for on-line operation. The neural networks method likewise needs enough data to train the networks. The testing procedure can be utilized at any time so long as the characteristics of the components remain unchanged.

  3. BTS statistical standards manual

    DOT National Transportation Integrated Search

    2005-10-01

    The Bureau of Transportation Statistics (BTS), like other federal statistical agencies, establishes professional standards to guide the methods and procedures for the collection, processing, storage, and presentation of statistical data. Standards an...

  4. Statistical assessment of the learning curves of health technologies.

    PubMed

    Ramsay, C R; Grant, A M; Wallace, S A; Garthwaite, P H; Monk, A F; Russell, I T

    2001-01-01

    OBJECTIVES - (1) To describe systematically studies that directly assessed the learning curve effect of health technologies. (2) Systematically to identify 'novel' statistical techniques applied to learning curve data in other fields, such as psychology and manufacturing. (3) To test these statistical techniques in data sets from studies of varying designs to assess health technologies in which learning curve effects are known to exist. METHODS - STUDY SELECTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): For a study to be included, it had to include a formal analysis of the learning curve of a health technology using a graphical, tabular or statistical technique. METHODS - STUDY SELECTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): For a study to be included, it had to include a formal assessment of a learning curve using a statistical technique that had not been identified in the previous search. METHODS - DATA SOURCES: Six clinical and 16 non-clinical biomedical databases were searched. A limited amount of handsearching and scanning of reference lists was also undertaken. METHODS - DATA EXTRACTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): A number of study characteristics were abstracted from the papers such as study design, study size, number of operators and the statistical method used. METHODS - DATA EXTRACTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): The new statistical techniques identified were categorised into four subgroups of increasing complexity: exploratory data analysis; simple series data analysis; complex data structure analysis; and generic techniques. METHODS - TESTING OF STATISTICAL METHODS: Some of the statistical methods identified in the systematic searches for single (simple) operator series data and for multiple (complex) operator series data were illustrated and explored using three data sets. The first was a case series of 190 consecutive laparoscopic fundoplication procedures performed by a single surgeon; the second was a case series of consecutive laparoscopic cholecystectomy procedures performed by ten surgeons; the third was randomised trial data derived from the laparoscopic procedure arm of a multicentre trial of groin hernia repair, supplemented by data from non-randomised operations performed during the trial. RESULTS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: Of 4571 abstracts identified, 272 (6%) were later included in the study after review of the full paper. Some 51% of studies assessed a surgical minimal access technique and 95% were case series. The statistical method used most often (60%) was splitting the data into consecutive parts (such as halves or thirds), with only 14% attempting a more formal statistical analysis. The reporting of the studies was poor, with 31% giving no details of data collection methods. RESULTS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: Of 9431 abstracts assessed, 115 (1%) were deemed appropriate for further investigation and, of these, 18 were included in the study. All of the methods for complex data sets were identified in the non-clinical literature. These were discriminant analysis, two-stage estimation of learning rates, generalised estimating equations, multilevel models, latent curve models, time series models and stochastic parameter models. In addition, eight new shapes of learning curves were identified. RESULTS - TESTING OF STATISTICAL METHODS: No one particular shape of learning curve performed significantly better than another. The performance of 'operation time' as a proxy for learning differed between the three procedures. Multilevel modelling using the laparoscopic cholecystectomy data demonstrated and measured surgeon-specific and confounding effects. The inclusion of non-randomised cases, despite the possible limitations of the method, enhanced the interpretation of learning effects. CONCLUSIONS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: The statistical methods used for assessing learning effects in health technology assessment have been crude and the reporting of studies poor. CONCLUSIONS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: A number of statistical methods for assessing learning effects were identified that had not hitherto been used in health technology assessment. There was a hierarchy of methods for the identification and measurement of learning, and the more sophisticated methods for both have had little if any use in health technology assessment. This demonstrated the value of considering fields outside clinical research when addressing methodological issues in health technology assessment. CONCLUSIONS - TESTING OF STATISTICAL METHODS: It has been demonstrated that the portfolio of techniques identified can enhance investigations of learning curve effects. (ABSTRACT TRUNCATED)
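
    One of the simpler single-operator analyses discussed above can be sketched as a power-law fit of operation time against case number; the data below are synthetic, and the power-law form is just one of the curve shapes the review identifies.

        # Fit a power-law learning curve: time = plateau + t0 * case^(-rate).
        import numpy as np
        from scipy.optimize import curve_fit

        def power_law(case, t0, rate, plateau):
            return plateau + t0 * case ** (-rate)

        cases = np.arange(1, 101, dtype=float)
        rng = np.random.default_rng(1)
        times = power_law(cases, 60.0, 0.5, 45.0) + rng.normal(0, 3, cases.size)

        params, _ = curve_fit(power_law, cases, times, p0=(50.0, 0.3, 40.0))
        print("t0=%.1f rate=%.2f plateau=%.1f" % tuple(params))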

  5. Library Statistics of Colleges and Universities, 1963-1964. Analytic Report.

    ERIC Educational Resources Information Center

    Samore, Theodore

    The series of analytic reports on management and salary data of the academic libraries, paralleling the series titled "Library Statistics of Colleges and Universities, Institutional Data," is continued by this publication. The statistical tables of this report are of value to administrators, librarians, and others because: (1) they help…

  6. Improving Student Understanding of Spatial Ecology Statistics

    ERIC Educational Resources Information Center

    Hopkins, Robert, II; Alberts, Halley

    2015-01-01

    This activity is designed as a primer to teaching population dispersion analysis. The aim is to help improve students' spatial thinking and their understanding of how spatial statistic equations work. Students use simulated data to develop their own statistic and apply that equation to experimental behavioral data for Gambusia affinis (western…

  7. Educating the Educator: U.S. Government Statistical Sources for Geographic Research and Teaching.

    ERIC Educational Resources Information Center

    Fryman, James F.; Wilkinson, Patrick J.

    Appropriate for college geography students and researchers, this paper briefly introduces basic federal statistical publications and corresponding finding aids. General references include "Statistical Abstract of the United States," and three complementary publications: "County and City Data Book,""State and Metropolitan Area Data Book," and…

  8. The Utility of Robust Means in Statistics

    ERIC Educational Resources Information Center

    Goodwyn, Fara

    2012-01-01

    Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…

  9. Statistical Cost Estimation in Higher Education: Some Alternatives.

    ERIC Educational Resources Information Center

    Brinkman, Paul T.; Niwa, Shelley

    Recent developments in econometrics that are relevant to the task of estimating costs in higher education are reviewed. The relative effectiveness of alternative statistical procedures for estimating costs are also tested. Statistical cost estimation involves three basic parts: a model, a data set, and an estimation procedure. Actual data are used…

  10. Statistics Report on TEQSA Registered Higher Education Providers, 2017

    ERIC Educational Resources Information Center

    Australian Government Tertiary Education Quality and Standards Agency, 2017

    2017-01-01

    The "Statistics Report on TEQSA Registered Higher Education Providers" ("the Statistics Report") is the fourth release of selected higher education sector data held by is the fourth release of selected higher education sector data held by the Australian Government Tertiary Education Quality and Standards Agency (TEQSA) for its…

  11. 75 FR 7412 - Reporting Information Regarding Falsification of Data

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-02-19

    ... concomitant medications or treatments; omitting data so that a statistical analysis yields a result that would..., results, statistics, items of information, or statements made by individuals. This proposed rule would..., Bureau of Labor Statistics ( www.bls.gov/oes/current/naics4_325400.htm ); compliance officer wage rate...

  12. Data management in large-scale collaborative toxicity studies: how to file experimental data for automated statistical analysis.

    PubMed

    Stanzel, Sven; Weimer, Marc; Kopp-Schneider, Annette

    2013-06-01

    High-throughput screening approaches are carried out for the toxicity assessment of a large number of chemical compounds. In such large-scale in vitro toxicity studies several hundred or thousand concentration-response experiments are conducted. The automated evaluation of concentration-response data using statistical analysis scripts saves time and yields more consistent results in comparison to data analysis performed by the use of menu-driven statistical software. Automated statistical analysis requires that concentration-response data are available in a standardised data format across all compounds. To obtain consistent data formats, a standardised data management workflow must be established, including guidelines for data storage, data handling and data extraction. In this paper two procedures for data management within large-scale toxicological projects are proposed. Both procedures are based on Microsoft Excel files as the researcher's primary data format and use a computer programme to automate the handling of data files. The first procedure assumes that data collection has not yet started whereas the second procedure can be used when data files already exist. Successful implementation of the two approaches into the European project ACuteTox is illustrated. Copyright © 2012 Elsevier Ltd. All rights reserved.

  13. Trends in Fertility in the United States. Vital and Health Statistics, Data from the National Vital Statistics System. Series 21, Number 28.

    ERIC Educational Resources Information Center

    Taffel, Selma

    This report presents and interprets birth statistics for the United States with particular emphasis on changes that took place during the period 1970-73. Data for the report were based on information entered on birth certificates collected from all states. The majority of the document comprises graphs and tables of data, but there are four short…

  14. Low-flow statistics of selected streams in Chester County, Pennsylvania

    USGS Publications Warehouse

    Schreffler, Curtis L.

    1998-01-01

    Low-flow statistics for many streams in Chester County, Pa., were determined on the basis of data from 14 continuous-record streamflow stations in Chester County and data from 1 station in Maryland and 1 station in Delaware. The stations in Maryland and Delaware are on streams that drain large areas within Chester County. Streamflow data through the 1994 water year were used in the analyses. The low-flow statistics summarized are the 1Q10, 7Q10, 30Q10, and harmonic mean. Low-flow statistics were estimated at 34 partial-record stream sites throughout Chester County.
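
    Two of the statistics named above can be sketched from a daily streamflow series as follows; the series is synthetic, and fitting the 10-year recurrence distribution to annual 7-day minima (the second step of a true 7Q10) is omitted.

        # Annual 7-day low flow (the basis of 7Q10) and harmonic mean flow.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(7)
        daily_flow = rng.lognormal(mean=2.0, sigma=0.6, size=365)  # one year, cfs

        # 7-day moving average, then the annual minimum of that series
        seven_day = np.convolve(daily_flow, np.ones(7) / 7, mode="valid")
        print("annual 7-day low flow:", seven_day.min())
        print("harmonic mean flow:", stats.hmean(daily_flow))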

  15. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  16. Toward Global Comparability of Sexual Orientation Data in Official Statistics: A Conceptual Framework of Sexual Orientation for Health Data Collection in New Zealand's Official Statistics System

    PubMed Central

    Gray, Alistair; Veale, Jaimie F.; Binson, Diane; Sell, Randell L.

    2013-01-01

    Objective. Effectively addressing health disparities experienced by sexual minority populations requires high-quality official data on sexual orientation. We developed a conceptual framework of sexual orientation to improve the quality of sexual orientation data in New Zealand's Official Statistics System. Methods. We reviewed conceptual and methodological literature, culminating in a draft framework. To improve the framework, we held focus groups and key-informant interviews with sexual minority stakeholders and producers and consumers of official statistics. An advisory board of experts provided additional guidance. Results. The framework proposes working definitions of the sexual orientation topic and measurement concepts, describes dimensions of the measurement concepts, discusses variables framing the measurement concepts, and outlines conceptual grey areas. Conclusion. The framework proposes standard definitions and concepts for the collection of official sexual orientation data in New Zealand. It presents a model for producers of official statistics in other countries, who wish to improve the quality of health data on their citizens. PMID:23840231

  17. Petroleum supply monthly, February 1999, with data for December 1998

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    Data presented in the Petroleum Supply Monthly (PSM) describe the supply and disposition of petroleum products in the United States and major US geographic regions. The data series describe production, imports and exports, inter-Petroleum Administration for Defense (PAD) District movements, and inventories by the primary suppliers of petroleum products in the United States (50 States and the District of Columbia). The reporting universe includes those petroleum sectors in primary supply. Included are: petroleum refiners, motor gasoline blenders, operators of natural gas processing plants and fractionators, inter-PAD transporters, importers, and major inventory holders of petroleum products and crude oil. When aggregated, the data reported by these sectors approximately represent the consumption of petroleum products in the United States. Data presented in the PSM are divided into two sections: Summary Statistics and Detailed Statistics. The tables and figures in the Summary Statistics section of the PSM present a time series of selected petroleum data on a US level. The Detailed Statistics tables of the PSM present statistics for the most current month available as well as year-to-date. 16 figs., 66 tabs.

  18. A Sorting Statistic with Application in Neurological Magnetic Resonance Imaging of Autism.

    PubMed

    Levman, Jacob; Takahashi, Emi; Forgeron, Cynthia; MacDonald, Patrick; Stewart, Natalie; Lim, Ashley; Martel, Anne

    2018-01-01

    Effect size refers to the assessment of the extent of differences between two groups of samples on a single measurement. Assessing effect size in medical research is typically accomplished with Cohen's d statistic. Cohen's d statistic assumes that average values are good estimators of the position of a distribution of numbers and also assumes Gaussian (or bell-shaped) underlying data distributions. In this paper, we present an alternative evaluative statistic that can quantify differences between two data distributions in a manner that is similar to traditional effect size calculations; however, the proposed approach avoids making assumptions regarding the shape of the underlying data distribution. The proposed sorting statistic is compared with Cohen's d statistic and is demonstrated to be capable of identifying feature measurements of potential interest for which Cohen's d statistic implies the measurement would be of little use. This proposed sorting statistic has been evaluated on a large clinical autism dataset from Boston Children's Hospital, Harvard Medical School, demonstrating that it can potentially play a constructive role in future healthcare technologies.
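
    The sketch below computes Cohen's d alongside a rank-based alternative (Cliff's delta, a stand-in for, not identical to, the authors' sorting statistic) on skewed synthetic data of the kind where mean-based effect sizes can mislead.

        # Cohen's d (mean-based) vs Cliff's delta (distribution-free).
        import numpy as np

        def cohens_d(a, b):
            na, nb = len(a), len(b)
            pooled_sd = np.sqrt(((na - 1) * np.var(a, ddof=1) +
                                 (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2))
            return (np.mean(a) - np.mean(b)) / pooled_sd

        def cliffs_delta(a, b):
            diffs = np.asarray(a)[:, None] - np.asarray(b)[None, :]
            return (np.sum(diffs > 0) - np.sum(diffs < 0)) / diffs.size

        rng = np.random.default_rng(3)
        group1 = rng.lognormal(0.0, 1.0, 50)   # skewed: d can mislead here
        group2 = rng.lognormal(0.4, 1.0, 50)
        print("Cohen's d:", cohens_d(group1, group2))
        print("Cliff's delta:", cliffs_delta(group1, group2))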

  19. A Sorting Statistic with Application in Neurological Magnetic Resonance Imaging of Autism

    PubMed Central

    Takahashi, Emi; Lim, Ashley; Martel, Anne

    2018-01-01

    Effect size refers to the assessment of the extent of differences between two groups of samples on a single measurement. Assessing effect size in medical research is typically accomplished with Cohen's d statistic. Cohen's d statistic assumes that average values are good estimators of the position of a distribution of numbers and also assumes Gaussian (or bell-shaped) underlying data distributions. In this paper, we present an alternative evaluative statistic that can quantify differences between two data distributions in a manner that is similar to traditional effect size calculations; however, the proposed approach avoids making assumptions regarding the shape of the underlying data distribution. The proposed sorting statistic is compared with Cohen's d statistic and is demonstrated to be capable of identifying feature measurements of potential interest for which Cohen's d statistic implies the measurement would be of little use. This proposed sorting statistic has been evaluated on a large clinical autism dataset from Boston Children's Hospital, Harvard Medical School, demonstrating that it can potentially play a constructive role in future healthcare technologies. PMID:29796236

  20. The crossing statistic: dealing with unknown errors in the dispersion of Type Ia supernovae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shafieloo, Arman; Clifton, Timothy; Ferreira, Pedro, E-mail: arman@ewha.ac.kr, E-mail: tclifton@astro.ox.ac.uk, E-mail: p.ferreira1@physics.ox.ac.uk

    2011-08-01

    We propose a new statistic that has been designed to be used in situations where the intrinsic dispersion of a data set is not well known: the Crossing Statistic. This statistic is in general less sensitive than χ² to the intrinsic dispersion of the data, and hence allows us to make progress in distinguishing between different models using goodness of fit to the data even when the errors involved are poorly understood. The proposed statistic makes use of the shape and trends of a model's predictions in a quantifiable manner. It is applicable to a variety of circumstances, although we consider it to be especially well suited to the task of distinguishing between different cosmological models using type Ia supernovae. We show that this statistic can easily distinguish between different models in cases where the χ² statistic fails. We also show that the last mode of the Crossing Statistic is identical to χ², so that it can be considered as a generalization of χ².

  1. Rank-based testing of equal survivorship based on cross-sectional survival data with or without prospective follow-up.

    PubMed

    Chan, Kwun Chuen Gary; Qin, Jing

    2015-10-01

    Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information is available from backward recurrence times and is frequently collected from health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposed statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer's disease with prospective follow-up. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  2. CAN'T MISS--conquer any number task by making important statistics simple. Part 2. Probability, populations, samples, and normal distributions.

    PubMed

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.

  3. Obtaining Streamflow Statistics for Massachusetts Streams on the World Wide Web

    USGS Publications Warehouse

    Ries, Kernell G.; Steeves, Peter A.; Freeman, Aleda; Singh, Raj

    2000-01-01

    A World Wide Web application has been developed to make it easy to obtain streamflow statistics for user-selected locations on Massachusetts streams. The Web application, named STREAMSTATS (available at http://water.usgs.gov/osw/streamstats/massachusetts.html ), can provide peak-flow frequency, low-flow frequency, and flow-duration statistics for most streams in Massachusetts. These statistics describe the magnitude (how much), frequency (how often), and duration (how long) of flow in a stream. The U.S. Geological Survey (USGS) has published streamflow statistics, such as the 100-year peak flow, the 7-day, 10-year low flow, and flow-duration statistics, for its data-collection stations in numerous reports. Federal, State, and local agencies need these statistics to plan and manage use of water resources and to regulate activities in and around streams. Engineering and environmental consulting firms, utilities, industry, and others use the statistics to design and operate water-supply systems, hydropower facilities, industrial facilities, wastewater treatment facilities, and roads, bridges, and other structures. Until now, streamflow statistics for data-collection stations have often been difficult to obtain because they are scattered among many reports, some of which are not readily available to the public. In addition, streamflow statistics are often needed for locations where no data are available. STREAMSTATS helps solve these problems. STREAMSTATS was developed jointly by the USGS and MassGIS, the State Geographic Information Systems (GIS) agency, in cooperation with the Massachusetts Departments of Environmental Management and Environmental Protection. The application consists of three major components: (1) a user interface that displays maps and allows users to select stream locations for which they want streamflow statistics (fig. 1), (2) a data base of previously published streamflow statistics and descriptive information for 725 USGS data-collection stations, and (3) an automated procedure that determines characteristics of the land-surface area (basin) that drains to the stream and inserts those characteristics into equations that estimate the streamflow statistics. Each of these components is described and guidance for using STREAMSTATS is provided below.

  4. Statistical analysis of solid waste composition data: Arithmetic mean, standard deviation and correlation coefficients.

    PubMed

    Edjabou, Maklawe Essonanawe; Martín-Fernández, Josep Antoni; Scheutz, Charlotte; Astrup, Thomas Fruergaard

    2017-11-01

    Data for fractional solid waste composition provide relative magnitudes of individual waste fractions, the percentages of which always sum to 100, thereby connecting them intrinsically. Due to this sum constraint, waste composition data represent closed data, and their interpretation and analysis require statistical methods, other than classical statistics that are suitable only for non-constrained data such as absolute values. However, the closed characteristics of waste composition data are often ignored when analysed. The results of this study showed, for example, that unavoidable animal-derived food waste amounted to 2.21±3.12% with a confidence interval of (-4.03; 8.45), which highlights the problem of the biased negative proportions. A Pearson's correlation test, applied to waste fraction generation (kg mass), indicated a positive correlation between avoidable vegetable food waste and plastic packaging. However, correlation tests applied to waste fraction compositions (percentage values) showed a negative association in this regard, thus demonstrating that statistical analyses applied to compositional waste fraction data, without addressing the closed characteristics of these data, have the potential to generate spurious or misleading results. Therefore, compositional data should be transformed adequately prior to any statistical analysis, such as computing mean, standard deviation and correlation coefficients. Copyright © 2017 Elsevier Ltd. All rights reserved.
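
    The kind of transformation the authors call for can be sketched with a centered log-ratio (clr) transform, one standard choice for compositional data; the waste composition below is invented.

        # Centered log-ratio transform of a composition before analysis.
        import numpy as np

        def clr(composition):
            """clr transform of a strictly positive composition."""
            x = np.asarray(composition, dtype=float)
            g = np.exp(np.mean(np.log(x)))   # geometric mean
            return np.log(x / g)

        waste_pct = [42.0, 23.0, 18.0, 12.0, 5.0]  # five fractions, sum 100
        z = clr(waste_pct)
        print(np.round(z, 3), "sum =", round(z.sum(), 10))  # clr sums to zero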

  5. Fetal Alcohol Spectrum Disorders (FASDs)

    MedlinePlus

    Provides data and statistics highlights; intervention resources, including the CHOICES program and alcohol screening and brief intervention (SBI); and education and training tools, training centers, and educational resources.

  6. The Heuristics of Statistical Argumentation: Scaffolding at the Postsecondary Level

    ERIC Educational Resources Information Center

    Pardue, Teneal Messer

    2017-01-01

    Language plays a key role in statistics and, by extension, in statistics education. Enculturating students into the practice of statistics requires preparing them to communicate results of data analysis. Statistical argumentation is one way of providing structure to facilitate discourse in the statistics classroom. In this study, a teaching…

  7. Analysis and interpretation of cost data in randomised controlled trials: review of published studies

    PubMed Central

    Barber, Julie A; Thompson, Simon G

    1998-01-01

    Objective To review critically the statistical methods used for health economic evaluations in randomised controlled trials where an estimate of cost is available for each patient in the study. Design Survey of published randomised trials including an economic evaluation with cost values suitable for statistical analysis; 45 such trials published in 1995 were identified from Medline. Main outcome measures The use of statistical methods for cost data was assessed in terms of the descriptive statistics reported, use of statistical inference, and whether the reported conclusions were justified. Results Although all 45 trials reviewed apparently had cost data for each patient, only 9 (20%) reported adequate measures of variability for these data and only 25 (56%) gave results of statistical tests or a measure of precision for the comparison of costs between the randomised groups. Only 16 (36%) of the articles gave conclusions which were justified on the basis of results presented in the paper. No paper reported sample size calculations for costs. Conclusions The analysis and interpretation of cost data from published trials reveal a lack of statistical awareness. Strong and potentially misleading conclusions about the relative costs of alternative therapies have often been reported in the absence of supporting statistical evidence. Improvements in the analysis and reporting of health economic assessments are urgently required. Health economic guidelines need to be revised to incorporate more detailed statistical advice. Key messages: Health economic evaluations required for important healthcare policy decisions are often carried out in randomised controlled trials. A review of such published economic evaluations assessed whether statistical methods for cost outcomes have been appropriately used and interpreted. Few publications presented adequate descriptive information for costs or performed appropriate statistical analyses. In at least two thirds of the papers, the main conclusions regarding costs were not justified. The analysis and reporting of health economic assessments within randomised controlled trials urgently need improving. PMID:9794854

  8. Sources of international migration statistics in Africa.

    PubMed

    1984-01-01

    The sources of international migration data for Africa may be classified into 2 main categories: 1) administrative records and 2) censuses and survey data. Both categories are sources for the direct measurement of migration, but the 2nd category can be used for the indirect estimation of net international migration. The administrative records from which data on international migration may be derived include 1) entry/departure cards or forms completed at international borders, 2) residence/work permits issued to aliens, and 3) general population registers and registers of aliens. The statistics derived from the entry/departure cards may be described as 1) land frontier control statistics and 2) port control statistics. The former refer to data derived from movements across land borders and the latter refer to information collected at international airports and seaports. Other administrative records which are potential sources of statistics on international migration in some African countries include some limited population registers, records of the registration of aliens, and particulars of residence/work permits issued to aliens. Although frontier control data are considered the most important source of international migration statistics, in many African countries these data are too deficient to provide a satisfactory indication of the level of international migration. Thus decennial population censuses and/or sample surveys are the major sources of the available statistics on the stock and characteristics of international migration. Indirect methods can be used to supplement census data with intercensal estimates of net migration using census data on the total population. This indirect method of obtaining information on migration can be used to evaluate estimates derived from frontier control records, and it also offers the means of obtaining alternative information on international migration in African countries which have not directly investigated migration topics in their censuses or surveys.

  9. Statistical methods and errors in family medicine articles between 2010 and 2014-Suez Canal University, Egypt: A cross-sectional study.

    PubMed

    Nour-Eldein, Hebatallah

    2016-01-01

    Given the limited statistical knowledge of most physicians, it is not uncommon to find statistical errors in research articles. To determine the statistical methods and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-test 17/66 (25.8%) were the most commonly used inferential tests. Within the 60 FM articles with identified inferential statistics, errors included: no prior sample size calculation in 19/60 (31.7%); application of the wrong statistical test in 17/60 (28.3%); incomplete documentation of statistics in 59/60 (98.3%); reporting of P values without test statistics in 32/60 (53.3%); no confidence interval reported with effect size measures in 12/60 (20.0%); use of the mean (standard deviation) to describe ordinal or non-normal data in 8/60 (13.3%); and errors of interpretation, mainly conclusions unsupported by the study data, in 5/60 (8.3%). Inferential statistics were used in the majority of FM articles. Data analysis and reporting statistics are areas for improvement in FM research articles.

  10. Statistical methods and errors in family medicine articles between 2010 and 2014-Suez Canal University, Egypt: A cross-sectional study

    PubMed Central

    Nour-Eldein, Hebatallah

    2016-01-01

    Background: Given the limited statistical knowledge of most physicians, it is not uncommon to find statistical errors in research articles. Objectives: To determine the statistical methods and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. Methods: This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Results: Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-test 17/66 (25.8%) were the most commonly used inferential tests. Within the 60 FM articles with identified inferential statistics, errors included: no prior sample size calculation in 19/60 (31.7%); application of the wrong statistical test in 17/60 (28.3%); incomplete documentation of statistics in 59/60 (98.3%); reporting of P values without test statistics in 32/60 (53.3%); no confidence interval reported with effect size measures in 12/60 (20.0%); use of the mean (standard deviation) to describe ordinal or non-normal data in 8/60 (13.3%); and errors of interpretation, mainly conclusions unsupported by the study data, in 5/60 (8.3%). Conclusion: Inferential statistics were used in the majority of FM articles. Data analysis and reporting statistics are areas for improvement in FM research articles. PMID:27453839

  11. How to interpret the results of medical time series data analysis: Classical statistical approaches versus dynamic Bayesian network modeling.

    PubMed

    Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall

    2016-01-01

    Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches (the Kaplan-Meier estimator, the Cox model, and the dynamic Bayesian network). However, our analysis also discusses several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
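
    The first of the classical tools named above, the Kaplan-Meier estimator, is simple enough to sketch directly; the right-censored survival times below are invented.

        # Minimal Kaplan-Meier estimator; events: 1 = event, 0 = censored.
        # Ties follow the usual convention of processing events before
        # censorings at the same time (argsort is stable, so list the
        # event first when entering tied observations).
        import numpy as np

        def kaplan_meier(times, events):
            order = np.argsort(times)
            times = np.asarray(times)[order]
            events = np.asarray(events)[order]
            at_risk, survival, curve = len(times), 1.0, []
            for t, e in zip(times, events):
                if e == 1:
                    survival *= (at_risk - 1) / at_risk
                    curve.append((t, survival))
                at_risk -= 1
            return curve

        times = [2, 3, 3, 5, 8, 9, 12, 14]
        events = [1, 1, 0, 1, 0, 1, 1, 0]
        for t, s in kaplan_meier(times, events):
            print(f"t={t}: S(t)={s:.3f}")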

  12. A random-sum Wilcoxon statistic and its application to analysis of ROC and LROC data.

    PubMed

    Tang, Liansheng Larry; Balakrishnan, N

    2011-01-01

    The Wilcoxon-Mann-Whitney statistic is commonly used for a distribution-free comparison of two groups. One requirement for its use is that the sample sizes of the two groups are fixed. This is violated in some of the applications such as medical imaging studies and diagnostic marker studies; in the former, the violation occurs since the number of correctly localized abnormal images is random, while in the latter the violation is due to some subjects not having observable measurements. For this reason, we propose here a random-sum Wilcoxon statistic for comparing two groups in the presence of ties, and derive its variance as well as its asymptotic distribution for large sample sizes. The proposed statistic includes the regular Wilcoxon rank-sum statistic as a special case. Finally, we apply the proposed statistic for summarizing location response operating characteristic data from a liver computed tomography study, and also for summarizing diagnostic accuracy of biomarker data.
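
    The fixed-sample-size building block that the proposed statistic generalizes can be computed with SciPy; the marker values below are invented, and the tie handling is SciPy's default.

        # Wilcoxon/Mann-Whitney comparison of two groups with ties.
        from scipy import stats

        diseased = [3.1, 4.2, 4.2, 5.0, 6.3, 7.1]
        healthy = [2.0, 2.8, 3.1, 3.5, 4.2, 4.8]

        u_stat, p_value = stats.mannwhitneyu(diseased, healthy,
                                             alternative="two-sided")
        print(f"U = {u_stat}, p = {p_value:.4f}")
        # U/(m*n) also estimates the area under the marker's ROC curve.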

  13. Recovering incomplete data using Statistical Multiple Imputations (SMI): a case study in environmental chemistry.

    PubMed

    Mercer, Theresa G; Frostick, Lynne E; Walmsley, Anthony D

    2011-10-15

    This paper presents a statistical technique that can be applied to environmental chemistry data where missing values and limit of detection levels prevent the application of statistics. A working example is taken from an environmental leaching study that was set up to determine if there were significant differences in levels of leached arsenic (As), chromium (Cr) and copper (Cu) between lysimeters containing preservative treated wood waste and those containing untreated wood. Fourteen lysimeters were setup and left in natural conditions for 21 weeks. The resultant leachate was analysed by ICP-OES to determine the As, Cr and Cu concentrations. However, due to the variation inherent in each lysimeter combined with the limits of detection offered by ICP-OES, the collected quantitative data was somewhat incomplete. Initial data analysis was hampered by the number of 'missing values' in the data. To recover the dataset, the statistical tool of Statistical Multiple Imputation (SMI) was applied, and the data was re-analysed successfully. It was demonstrated that using SMI did not affect the variance in the data, but facilitated analysis of the complete dataset. Copyright © 2011 Elsevier B.V. All rights reserved.
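
    A sketch of the multiple-imputation workflow, assuming scikit-learn's IterativeImputer as the imputation engine (the paper does not specify an implementation): draw several completed data sets, analyse each, and pool the estimates.

        # Multiple imputation: several stochastic completions, then pooling.
        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        rng = np.random.default_rng(5)
        X = rng.normal(size=(40, 3))
        X[rng.random(X.shape) < 0.15] = np.nan        # ~15% missing values

        estimates = []
        for m in range(5):                            # five imputations
            imputer = IterativeImputer(sample_posterior=True, random_state=m)
            X_complete = imputer.fit_transform(X)
            estimates.append(X_complete[:, 0].mean())  # analysis: a column mean

        print("pooled estimate:", np.mean(estimates))  # point estimate only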

  14. A system for classifying wood-using industries and recording statistics for automatic data processing.

    Treesearch

    E.W. Fobes; R.W. Rowe

    1968-01-01

    A system for classifying wood-using industries and recording pertinent statistics for automatic data processing is described. Forms and coding instructions for recording data of primary processing plants are included.

  15. Routine hospital data - is it good enough for trials? An example using England's Hospital Episode Statistics in the SHIFT trial of Family Therapy vs. Treatment as Usual in adolescents following self-harm.

    PubMed

    Wright-Hughes, Alexandra; Graham, Elizabeth; Cottrell, David; Farrin, Amanda

    2018-04-01

    Use of routine data sources within clinical research is increasing and is endorsed by the National Institute for Health Research to increase trial efficiencies; however, there is limited evidence for its use in clinical trials, especially in relation to self-harm. One source of routine data, Hospital Episode Statistics, is collated and distributed by NHS Digital and contains details of admissions, outpatient, and Accident and Emergency attendances provided periodically by English National Health Service hospitals. We explored the reliability and accuracy of Hospital Episode Statistics, compared to data collected directly from hospital records, to assess whether it would provide a complete, accurate, and reliable means of acquiring hospital attendances for self-harm - the primary outcome for the SHIFT (Self-Harm Intervention: Family Therapy) trial evaluating Family Therapy for adolescents following self-harm. Participant identifiers were linked to Hospital Episode Statistics Accident and Emergency, and Admissions data, and episodes combined to describe participants' complete hospital attendance. Attendance data were initially compared to data previously gathered by trial researchers from pre-identified hospitals. Final comparison was conducted of subsequent attendances collected through Hospital Episode Statistics and researcher follow-up. Consideration was given to linkage rates; number and proportion of attendances retrieved; reliability of Accident and Emergency, and Admissions data; percentage of self-harm episodes recorded and coded appropriately; and percentage of required data items retrieved. Participants were first linked to Hospital Episode Statistics with an acceptable match rate of 95%, identifying a total of 341 complete hospital attendances, compared to 139 reported by the researchers at the time. More than double the proportion of Hospital Episode Statistics Accident and Emergency episodes could not be classified in relation to self-harm (75%) compared to 34.9% of admitted episodes, and of overall attendances, 18% were classified as self-harm related and 20% not related, while ambiguity or insufficient information meant 62% were unclassified. Of 39 self-harm-related attendances reported by the researchers, Hospital Episode Statistics identified 24 (62%) as self-harm related while 15 (38%) were unclassified. Based on final data received, 1490 complete hospital attendances were identified and comparison to researcher follow-up found Hospital Episode Statistics underestimated the number of self-harm attendances by 37.2% (95% confidence interval 32.6%-41.9%). Advantages of routine data collection via NHS Digital included the acquisition of more comprehensive and timely trial outcome data, identifying more than double the number of hospital attendances found by researchers. Disadvantages included ambiguity in the classification of self-harm relatedness. Our resulting primary outcome data collection strategy used routine data to identify hospital attendances supplemented by targeted researcher data collection for attendances requiring further self-harm classification.

  16. A Data Warehouse Architecture for DoD Healthcare Performance Measurements.

    DTIC Science & Technology

    1999-09-01

    With the DoD healthcare ... framework, this thesis defines a methodology to design, develop, implement, and apply statistical analysis and data mining tools to a Data Warehouse of healthcare metrics.

  17. Using Data Mining to Teach Applied Statistics and Correlation

    ERIC Educational Resources Information Center

    Hartnett, Jessica L.

    2016-01-01

    This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…

  18. Usage Data for Electronic Resources: A Comparison between Locally Collected and Vendor-Provided Statistics.

    ERIC Educational Resources Information Center

    Duy, Joanna; Vaughan, Liwen

    2003-01-01

    Vendor-provided electronic resource usage statistics are not currently standardized across vendors. This study investigates the feasibility of using locally collected data to check the reliability of vendor-provided data. Vendor-provided data were compared with local data collected from North Carolina State University (NCSU) Libraries' Web…

  19. What You Learn is What You See: Using Eye Movements to Study Infant Cross-Situational Word Learning

    PubMed Central

    Smith, Linda

    2016-01-01

    Recent studies show that both adults and young children possess powerful statistical learning capabilities to solve the word-to-world mapping problem. However, the underlying mechanisms that make statistical learning possible and powerful are not yet known. With the goal of providing new insights into this issue, the research reported in this paper used an eye tracker to record the moment-by-moment eye movement data of 14-month-old babies in statistical learning tasks. Various measures, such as looking duration and shift rate (the number of shifts in gaze from one visual object to the other), are applied to such fine-grained temporal data trial by trial, revealing different eye movement patterns between strong and weak statistical learners. Moreover, an information-theoretic measure is developed and applied to gaze data to quantify the degree of learning uncertainty trial by trial. Next, a simple associative statistical learning model is applied to eye movement data and these simulation results are compared with empirical results from young children, showing strong correlations between the two. This suggests that an associative learning mechanism with selective attention can provide a cognitively plausible model of cross-situational statistical learning. The work represents the first steps in using eye movement data to infer underlying real-time processes in statistical word learning. PMID:22213894

  20. Statistical approaches in published ophthalmic clinical science papers: a comparison to statistical practice two decades ago.

    PubMed

    Zhang, Harrison G; Ying, Gui-Shuang

    2018-02-09

    The aim of this study is to evaluate the current practice of statistical analysis of eye data in clinical science papers published in British Journal of Ophthalmology (BJO) and to determine whether the practice of statistical analysis has improved in the past two decades. All clinical science papers (n=125) published in BJO in January-June 2017 were reviewed for their statistical analysis approaches for analysing the primary ocular measure. We compared our findings to the results from a previous paper that reviewed BJO papers in 1995. Of 112 papers eligible for analysis, half of the studies analysed the data at an individual level because of the nature of observation, 16 (14%) studies analysed data from one eye only, 36 (32%) studies analysed data from both eyes at ocular level, one study (1%) analysed the overall summary of ocular finding per individual and three (3%) studies used the paired comparison. Among studies with data available from both eyes, 50 (89%) of 56 papers in 2017 did not analyse data from both eyes or ignored the intereye correlation, compared with 60 (90%) of 67 papers in 1995 (P=0.96). Among studies that analysed data from both eyes at an ocular level, 33 (92%) of 36 studies completely ignored the intereye correlation in 2017, compared with 16 (89%) of 18 studies in 1995 (P=0.40). A majority of studies did not analyse the data properly when data from both eyes were available. The practice of statistical analysis did not improve in the past two decades. Collaborative efforts should be made in the vision research community to improve the practice of statistical analysis for ocular data. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  1. Estimating order statistics of network degrees

    NASA Astrophysics Data System (ADS)

    Chu, J.; Nadarajah, S.

    2018-01-01

    We model the order statistics of network degrees of big data sets by a range of generalised beta distributions. A three parameter beta distribution due to Libby and Novick (1982) is shown to give the best overall fit for at least four big data sets. The fit of this distribution is significantly better than the fit suggested by Olhede and Wolfe (2012) across the whole range of order statistics for all four data sets.
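
    As a rough illustration of the kind of fit this paper performs, the sketch below fits a standard two-parameter beta distribution to the normalized order statistics of a simulated degree sequence. This is not the paper's Libby-Novick three-parameter beta, which has no off-the-shelf scipy implementation; the simulated degrees, the normalization, and the distribution family are all assumptions made for the example.

      # Minimal sketch: fit a beta law to normalized degree order statistics
      # and check the fit with a Kolmogorov-Smirnov test.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      degrees = rng.binomial(n=500, p=0.02, size=2000)  # hypothetical degree data

      # Map sorted degrees strictly into (0, 1), like order statistics of a
      # bounded distribution.
      order_stats = (np.sort(degrees) + 1) / (degrees.max() + 2)

      # Fit a beta distribution with support fixed to [0, 1].
      a, b, loc, scale = stats.beta.fit(order_stats, floc=0, fscale=1)
      print(f"fitted beta shape parameters: a={a:.3f}, b={b:.3f}")

      # Goodness of fit of the data against the fitted law.
      ks = stats.kstest(order_stats, "beta", args=(a, b))
      print(f"KS statistic={ks.statistic:.4f}, p={ks.pvalue:.4f}")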

  2. Effects of War Tax Collection in Honduran Society: Evaluating the Social and Economic Cost

    DTIC Science & Technology

    2015-06-12

    Information gathering for this thesis relied on open electronic sources and statistical data from security and intelligence agencies. There are no ... commission of this crime have been identified. However, the main limitation encountered is the poor reliability of the statistical data, because of the high ... justice systems with crime and violence. According to statistical data presented in Air & Space Power Journal, at least 60 percent of the 2,576 murders

  3. On computations of variance, covariance and correlation for interval data

    NASA Astrophysics Data System (ADS)

    Kishida, Masako

    2017-02-01

    In many practical situations, the data on which statistical analysis is to be performed is only known with interval uncertainty. Different combinations of values from the interval data usually lead to different values of variance, covariance, and correlation. Hence, it is desirable to compute the endpoints of possible values of these statistics. This problem is, however, NP-hard in general. This paper shows that the problem of computing the endpoints of possible values of these statistics can be rewritten as the problem of computing skewed structured singular values ν, for which there exist feasible (polynomial-time) algorithms that compute reasonably tight bounds in most practical cases. This allows one to find tight intervals of the aforementioned statistics for interval data.
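
    To make the difficulty concrete, the sketch below computes the exact variance range for a tiny interval data set by brute force. Because the sample variance is a convex quadratic function of the data vector, its maximum over the box of intervals is attained at an endpoint combination, so enumerating all 2^n vertices is exact for the upper bound; the lower bound is found numerically. The toy intervals are assumptions for the example, and the exponential enumeration is exactly what the paper's polynomial-time bounds avoid.

      # Brute-force endpoints of sample variance for interval data (tiny n).
      import itertools
      import numpy as np
      from scipy.optimize import minimize

      intervals = [(1.0, 1.5), (2.0, 2.2), (2.9, 3.4), (4.0, 4.1)]  # toy data

      def sample_var(x):
          return np.var(x, ddof=1)

      # Upper endpoint: the maximum of a convex function over a box sits at a
      # vertex, so enumerate all 2^n endpoint combinations.
      upper = max(sample_var(c) for c in itertools.product(*intervals))

      # Lower endpoint: minimize the convex variance over the box numerically.
      x0 = np.array([(lo + hi) / 2 for lo, hi in intervals])
      res = minimize(sample_var, x0, bounds=intervals)
      print(f"variance range: [{res.fun:.4f}, {upper:.4f}]")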

  4. Development of new on-line statistical program for the Korean Society for Radiation Oncology

    PubMed Central

    Song, Si Yeol; Ahn, Seung Do; Chung, Weon Kuu; Choi, Eun Kyung; Cho, Kwan Ho

    2015-01-01

    Purpose To develop a new on-line statistical program for the Korean Society for Radiation Oncology (KOSRO) to collect and extract medical data in radiation oncology more efficiently. Materials and Methods The statistical program is a web-based program. The directory was placed in a sub-folder of the homepage of KOSRO and its web address is http://www.kosro.or.kr/asda. The server operating system is Linux and the web server is the Apache HTTP server. MySQL is adopted for the database (DB) server and PHP is the scripting language. Each ID and password are controlled independently and all screen pages for data input or analysis are designed to be user-friendly. Scroll-down menus are used extensively for user convenience and for consistency of data analysis. Results Year of data is one of the top categories, and main topics include human resources, equipment, clinical statistics, specialized treatment and research achievements. Each topic or category has several subcategorized topics. A real-time on-line report of analysis is produced immediately after entering each data item, and the administrator is able to monitor the status of data input of each hospital. Backups of data as spreadsheets can be accessed by the administrator and used for academic work by any member of the KOSRO. Conclusion The new on-line statistical program was developed to collect data from nationwide departments of radiation oncology. Its intuitive screens and consistent input structure are expected to promote data entry by member hospitals, and annual statistics should be a cornerstone of advances in radiation oncology. PMID:26157684

  5. Development of new on-line statistical program for the Korean Society for Radiation Oncology.

    PubMed

    Song, Si Yeol; Ahn, Seung Do; Chung, Weon Kuu; Shin, Kyung Hwan; Choi, Eun Kyung; Cho, Kwan Ho

    2015-06-01

    To develop a new on-line statistical program for the Korean Society for Radiation Oncology (KOSRO) to collect and extract medical data in radiation oncology more efficiently. The statistical program is web-based. The directory was placed in a sub-folder of the homepage of KOSRO and its web address is http://www.kosro.or.kr/asda. The server operating system is Linux and the web server is the Apache HTTP server. MySQL is adopted for the database (DB) server and PHP is the scripting language. Each ID and password are controlled independently and all screen pages for data input or analysis are designed to be user-friendly. Scroll-down menus are used extensively for user convenience and for consistency of data analysis. Year of data is one of the top categories, and main topics include human resources, equipment, clinical statistics, specialized treatment and research achievements. Each topic or category has several subcategorized topics. A real-time on-line report of analysis is produced immediately after entering each data item, and the administrator is able to monitor the status of data input of each hospital. Backups of data as spreadsheets can be accessed by the administrator and used for academic work by any member of the KOSRO. The new on-line statistical program was developed to collect data from nationwide departments of radiation oncology. Its intuitive screens and consistent input structure are expected to promote data entry by member hospitals, and annual statistics should be a cornerstone of advances in radiation oncology.

  6. Statistical analysis of 59 inspected SSME HPFTP turbine blades (uncracked and cracked)

    NASA Technical Reports Server (NTRS)

    Wheeler, John T.

    1987-01-01

    The numerical results of statistical analysis of test data from Space Shuttle Main Engine high-pressure fuel turbopump second-stage turbine blades, including some with cracks, are presented. Several statistical methods are applied to the test data to determine whether differences in frequency variations distinguish uncracked from cracked blades.

  7. 42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ..., COMPETITIVE MEDICAL PLANS, AND HEALTH CARE PREPAYMENT PLANS Medicare Payment: Cost Basis § 417.568 Adequate... definitions and accounting, statistics, and reporting practices that are widely accepted in the health care... 42 Public Health 3 2010-10-01 2010-10-01 false Adequate financial records, statistical data, and...

  8. 42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ..., COMPETITIVE MEDICAL PLANS, AND HEALTH CARE PREPAYMENT PLANS Medicare Payment: Cost Basis § 417.568 Adequate... definitions and accounting, statistics, and reporting practices that are widely accepted in the health care... 42 Public Health 3 2011-10-01 2011-10-01 false Adequate financial records, statistical data, and...

  9. Interpolative modeling of GaAs FET S-parameter data bases for use in Monte Carlo simulations

    NASA Technical Reports Server (NTRS)

    Campbell, L.; Purviance, J.

    1992-01-01

    A statistical interpolation technique is presented for modeling GaAs FET S-parameter measurements for use in the statistical analysis and design of circuits. This is accomplished by interpolating among the measurements in a GaAs FET S-parameter data base in a statistically valid manner.

  10. Statistical Literacy in the Data Science Workplace

    ERIC Educational Resources Information Center

    Grant, Robert

    2017-01-01

    Statistical literacy, the ability to understand and make use of statistical information including methods, has particular relevance in the age of data science, when complex analyses are undertaken by teams from diverse backgrounds. It is essential not only for communicating with the consumers of information but also within the team. Writing from the…

  11. A new statistic for the analysis of circular data in gamma-ray astronomy

    NASA Technical Reports Server (NTRS)

    Protheroe, R. J.

    1985-01-01

    A new statistic is proposed for the analysis of circular data. The statistic is designed specifically for situations where a test of uniformity is required which is powerful against alternatives in which a small fraction of the observations is grouped in a small range of directions, or phases.

  12. Comparative Financial Statistics for Public Two-Year Colleges: FY 1991 National Sample.

    ERIC Educational Resources Information Center

    Dickmeyer, Nathan; Cirino, Anna Marie

    This report provides comparative financial information derived from a national sample of 503 public two-year colleges. The report includes space for colleges to compare their institutional statistics with data provided on national sample medians; quartile data for the national sample; and statistics presented in various formats, including tables,…

  13. Explorations in Statistics: The Analysis of Ratios and Normalized Data

    ERIC Educational Resources Information Center

    Curran-Everett, Douglas

    2013-01-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This ninth installment of "Explorations in Statistics" explores the analysis of ratios and normalized--or standardized--data. As researchers, we compute a ratio--a numerator divided by a denominator--to compute a…

  14. Statistics Report on TEQSA Registered Higher Education Providers, 2016

    ERIC Educational Resources Information Center

    Australian Government Tertiary Education Quality and Standards Agency, 2016

    2016-01-01

    This Statistics Report is the third release of selected higher education sector data held by the Australian Government Tertiary Education Quality and Standards Agency (TEQSA) for its quality assurance activities. It provides a snapshot of national statistics on all parts of the sector by bringing together data collected directly by TEQSA with data…

  15. Web-Based Statistical Sampling and Analysis

    ERIC Educational Resources Information Center

    Quinn, Anne; Larson, Karen

    2016-01-01

    Consistent with the Common Core State Standards for Mathematics (CCSSI 2010), the authors write that they have asked students to do statistics projects with real data. To obtain real data, their students use the free Web-based app, Census at School, created by the American Statistical Association (ASA) to help promote civic awareness among school…

  16. Big Data and Neuroimaging.

    PubMed

    Webb-Vargas, Yenny; Chen, Shaojie; Fisher, Aaron; Mejia, Amanda; Xu, Yuting; Crainiceanu, Ciprian; Caffo, Brian; Lindquist, Martin A

    2017-12-01

    Big Data are of increasing importance in a variety of areas, especially in the biosciences. There is an emerging critical need for Big Data tools and methods, because of the potential impact of advancements in these areas. Importantly, statisticians and statistical thinking have a major role to play in creating meaningful progress in this arena. We would like to emphasize this point in this special issue, as it highlights both the dramatic need for statistical input for Big Data analysis and for a greater number of statisticians working on Big Data problems. We use the field of statistical neuroimaging to demonstrate these points. As such, this paper covers several applications and novel methodological developments of Big Data tools applied to neuroimaging data.

  17. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
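
    The flavor of a scan statistic can be sketched in one dimension, standing in for the paper's pathway-based scan (the real method works on gene networks and handles multiple testing, neither of which is shown): every contiguous window of genes is scored with a Bernoulli log-likelihood ratio comparing case/control CNV hits inside versus outside the window, and the maximum score over windows is the scan statistic. The toy counts below are invented, and significance would come from permutation replicates.

      # Simplified 1-D scan statistic over an ordered gene list.
      import numpy as np

      def llr(c_in, n_in, c_tot, n_tot):
          """Bernoulli log-likelihood ratio for a window (c_in cases of n_in hits)."""
          def ll(c, n):
              # Maximized binomial log-likelihood (constant terms cancel in the LLR).
              if n == 0 or c == 0 or c == n:
                  return 0.0
              p = c / n
              return c * np.log(p) + (n - c) * np.log(1 - p)
          c_out, n_out = c_tot - c_in, n_tot - n_in
          return ll(c_in, n_in) + ll(c_out, n_out) - ll(c_tot, n_tot)

      # hits[i] = (case CNV hits, total CNV hits) for gene i, in pathway order.
      hits = [(3, 4), (5, 5), (4, 6), (0, 3), (1, 4), (0, 2)]
      cases = np.array([h[0] for h in hits])
      totals = np.array([h[1] for h in hits])
      c_tot, n_tot = cases.sum(), totals.sum()

      best = max(
          (llr(cases[i:j].sum(), totals[i:j].sum(), c_tot, n_tot), i, j)
          for i in range(len(hits)) for j in range(i + 1, len(hits) + 1)
      )
      print(f"max LLR={best[0]:.3f} for gene window [{best[1]}, {best[2]})")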

  18. Capturing rogue waves by multi-point statistics

    NASA Astrophysics Data System (ADS)

    Hadjihosseini, A.; Wächter, Matthias; Hoffmann, N. P.; Peinke, J.

    2016-01-01

    As an example of a complex system with extreme events, we investigate ocean wave states exhibiting rogue waves. We present a statistical method of data analysis based on multi-point statistics which for the first time allows extreme rogue wave events to be captured in a statistically satisfactory manner. The key to the success of the approach is mapping the complexity of multi-point data onto the statistics of hierarchically ordered height increments for different time scales, for which we can show that a stochastic cascade process with Markov properties is governed by a Fokker-Planck equation. Conditional probabilities as well as the Fokker-Planck equation itself can be estimated directly from the available observational data. With this stochastic description, surrogate data sets can in turn be generated, which makes it possible to work out arbitrary statistical features of the complex sea state in general, and extreme rogue wave events in particular. The results also open up new perspectives for forecasting the occurrence probability of extreme rogue wave events, and even for forecasting the occurrence of individual rogue waves based on precursory dynamics.

  19. Empirical investigation into depth-resolution of Magnetotelluric data

    NASA Astrophysics Data System (ADS)

    Piana Agostinetti, N.; Ogaya, X.

    2017-12-01

    We investigate the depth-resolution of MT data by comparing reconstructed 1D resistivity profiles with measured resistivity and lithostratigraphy from borehole data. Inversion of MT data has been widely used to reconstruct the 1D fine-layered resistivity structure beneath an isolated Magnetotelluric (MT) station. Uncorrelated noise is generally assumed to be associated with MT data. However, wrong assumptions about error statistics have been proved to strongly bias the results obtained in geophysical inversions. In particular, the number of resolved layers at depth strongly depends on the error statistics. In this study, we applied a trans-dimensional McMC algorithm for reconstructing the 1D resistivity profile near the location of a 1500 m-deep borehole, using MT data. We solve the MT inverse problem imposing different models for the error statistics associated with the MT data. Following a Hierarchical Bayes' approach, we also inverted for the hyper-parameters associated with each error-statistics model. Preliminary results indicate that assuming uncorrelated noise leads to a number of resolved layers larger than expected from the retrieved lithostratigraphy. Moreover, comparison with the inversion of synthetic resistivity data obtained from the "true" resistivity stratification measured along the borehole shows that a consistent number of resistivity layers can be obtained using a Gaussian model for the error statistics, with substantial correlation length.

  20. Students' Emergent Articulations of Statistical Models and Modeling in Making Informal Statistical Inferences

    ERIC Educational Resources Information Center

    Braham, Hana Manor; Ben-Zvi, Dani

    2017-01-01

    A fundamental aspect of statistical inference is representation of real-world data using statistical models. This article analyzes students' articulations of statistical models and modeling during their first steps in making informal statistical inferences. An integrated modeling approach (IMA) was designed and implemented to help students…

  1. Directory of Michigan Library Statistics. 1994 Edition. Reporting 1992 and 1993 Statistical Activities including: Public Library Statistics, Library Cooperative Statistics, Regional/Subregional Statistics.

    ERIC Educational Resources Information Center

    Leaf, Donald C., Comp.; Neely, Linda, Comp.

    This edition focuses on statistical data supplied by Michigan public libraries, public library cooperatives, and those public libraries which serve as regional or subregional outlets for blind and physically handicapped services. Since statistics in Michigan academic libraries are typically collected in odd-numbered years, they are not included…

  2. Record statistics of financial time series and geometric random walks

    NASA Astrophysics Data System (ADS)

    Sabir, Behlool; Santhanam, M. S.

    2014-09-01

    The study of record statistics of correlated series in physics, such as random walks, is gaining momentum, and several analytical results have been obtained in the past few years. In this work, we study the record statistics of correlated empirical data for which random walk models have relevance. We obtain results for the records statistics of select stock market data and the geometric random walk, primarily through simulations. We show that the distribution of the age of records is a power law with the exponent α lying in the range 1.5≤α≤1.8. Further, the longest record ages follow the Fréchet distribution of extreme value theory. The records statistics of geometric random walk series is in good agreement with that obtained from empirical stock data.
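
    A small simulation along the lines of the paper's geometric-random-walk analysis, with all parameters invented for the example: collect the ages of records in simulated multiplicative random walks and estimate the power-law exponent of the age distribution from a log-log fit, to be compared with the reported range 1.5 to 1.8.

      # Record ages in a geometric random walk and a crude power-law fit.
      import numpy as np

      rng = np.random.default_rng(1)
      n_steps, n_series = 5000, 200
      ages = []
      for _ in range(n_series):
          # Geometric random walk: multiplicative Gaussian steps.
          x = np.cumprod(1.0 + rng.normal(0.0, 0.01, n_steps))
          record, age = x[0], 1
          for v in x[1:]:
              if v > record:         # a new record: store the old record's age
                  ages.append(age)
                  record, age = v, 1
              else:
                  age += 1

      # Estimate the exponent from a log-log fit to the age histogram.
      counts = np.bincount(ages)
      k = np.arange(1, counts.size)
      mask = counts[1:] > 0
      slope, _ = np.polyfit(np.log(k[mask]), np.log(counts[1:][mask]), 1)
      print(f"estimated power-law exponent: {-slope:.2f}")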

  3. Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science.

    PubMed

    Veldkamp, Coosje L S; Nuijten, Michèle B; Dominguez-Alvarez, Linda; van Assen, Marcel A L M; Wicherts, Jelte M

    2014-01-01

    Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this 'co-piloting' currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors.

  4. Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science

    PubMed Central

    Veldkamp, Coosje L. S.; Nuijten, Michèle B.; Dominguez-Alvarez, Linda; van Assen, Marcel A. L. M.; Wicherts, Jelte M.

    2014-01-01

    Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this ‘co-piloting’ currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors. PMID:25493918
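
    The kind of automated consistency check used in these two records can be sketched in a few lines (in the spirit of tools such as statcheck, though this is not that tool's implementation): recompute the p value implied by a reported t statistic and its degrees of freedom, flag mismatches with the reported p, and separately flag mismatches large enough to flip a significance decision. The tolerance and the reported values are assumptions for the example.

      # Recompute a two-sided p value from a reported t statistic and df.
      from scipy import stats

      def check_t_report(t, df, reported_p, alpha=0.05, tol=0.005):
          """Return (recomputed p, inconsistent?, decision-changing?)."""
          p = 2 * stats.t.sf(abs(t), df)          # two-sided p value
          inconsistent = abs(p - reported_p) > tol
          decision_flip = (p < alpha) != (reported_p < alpha)
          return p, inconsistent, inconsistent and decision_flip

      # Example: "t(28) = 2.05, p = .03" -- recomputation gives p close to
      # .0499: inconsistent, but not decision-changing at alpha = .05.
      p, bad, gross = check_t_report(t=2.05, df=28, reported_p=0.03)
      print(f"recomputed p={p:.4f}, inconsistent={bad}, decision-changing={gross}")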

  5. Descriptive data analysis.

    PubMed

    Thompson, Cheryl Bagley

    2009-01-01

    This 13th article of the Basics of Research series is the first in a short series on statistical analysis. These articles will discuss creating your statistical analysis plan, levels of measurement, descriptive statistics, probability theory, inferential statistics, and general considerations for interpretation of the results of a statistical analysis.

  6. Statistical Reference Datasets

    National Institute of Standards and Technology Data Gateway

    Statistical Reference Datasets (Web, free access)   The Statistical Reference Datasets project is also supported by the Standard Reference Data Program. The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software.

  7. Idaho State University Statistical Portrait, Academic Year 1998-1999.

    ERIC Educational Resources Information Center

    Idaho State Univ., Pocatello. Office of Institutional Research.

    This report provides basic statistical data for Idaho State University, and includes both point-of-time data as well as trend data. The information is divided into sections emphasizing students, programs, faculty and staff, finances, and physical facilities. Student data includes enrollment, geographical distribution, student/faculty ratios,…

  8. BLS Machine-Readable Data and Tabulating Routines.

    ERIC Educational Resources Information Center

    DiFillipo, Tony

    This report describes the machine-readable data and tabulating routines that the Bureau of Labor Statistics (BLS) is prepared to distribute. An introduction discusses the LABSTAT (Labor Statistics) database and the BLS policy on release of unpublished data. Descriptions summarizing data stored in 25 files follow this format: overview, data…

  9. 76 FR 12823 - Enhanced Collection of Relevant Data and Statistics Relating to Women

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-09

    ... greater understanding of policies and programs. Preparation of this report revealed the vast data... Collection of Relevant Data and Statistics Relating to Women Memorandum for the Heads of Executive... accompanying website collection of relevant data, will assist Government officials in crafting policies in...

  10. Use of the Global Test Statistic as a Performance Measurement in a Reanalysis of Environmental Health Data

    PubMed Central

    Dymova, Natalya; Hanumara, R. Choudary; Gagnon, Ronald N.

    2009-01-01

    Performance measurement is increasingly viewed as an essential component of environmental and public health protection programs. In characterizing program performance over time, investigators often observe multiple changes resulting from a single intervention across a range of categories. Although a variety of statistical tools allow evaluation of data one variable at a time, the global test statistic is uniquely suited for analyses of categories or groups of interrelated variables. Here we demonstrate how the global test statistic can be applied to environmental and occupational health data for the purpose of making overall statements on the success of targeted intervention strategies. PMID:19696393

  11. Use of the global test statistic as a performance measurement in a reanalysis of environmental health data.

    PubMed

    Dymova, Natalya; Hanumara, R Choudary; Enander, Richard T; Gagnon, Ronald N

    2009-10-01

    Performance measurement is increasingly viewed as an essential component of environmental and public health protection programs. In characterizing program performance over time, investigators often observe multiple changes resulting from a single intervention across a range of categories. Although a variety of statistical tools allow evaluation of data one variable at a time, the global test statistic is uniquely suited for analyses of categories or groups of interrelated variables. Here we demonstrate how the global test statistic can be applied to environmental and occupational health data for the purpose of making overall statements on the success of targeted intervention strategies.

  12. The disagreeable behaviour of the kappa statistic.

    PubMed

    Flight, Laura; Julious, Steven A

    2015-01-01

    It is often of interest to measure the agreement between a number of raters when an outcome is nominal or ordinal. The kappa statistic is used as a measure of agreement. The statistic is highly sensitive to the distribution of the marginal totals and can produce unreliable results. Other statistics such as the proportion of concordance, maximum attainable kappa and prevalence and bias adjusted kappa should be considered to indicate how well the kappa statistic represents agreement in the data. Each kappa should be considered and interpreted based on the context of the data being analysed. Copyright © 2014 John Wiley & Sons, Ltd.
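
    The marginal sensitivity the authors describe is easy to demonstrate: the two hypothetical rating tables below have identical observed agreement (90%) yet very different Cohen's kappa values, purely because of their marginal distributions.

      # Cohen's kappa from a 2x2 agreement table, shown for balanced versus
      # skewed marginals with the same observed agreement.
      import numpy as np

      def cohens_kappa(table):
          table = np.asarray(table, dtype=float)
          n = table.sum()
          po = np.trace(table) / n                          # observed agreement
          pe = (table.sum(0) * table.sum(1)).sum() / n**2   # chance agreement
          return (po - pe) / (1 - pe)

      balanced = [[45, 5], [5, 45]]   # balanced marginals, 90% agreement
      skewed   = [[85, 5], [5, 5]]    # skewed marginals, also 90% agreement
      print(f"kappa (balanced) = {cohens_kappa(balanced):.3f}")  # 0.800
      print(f"kappa (skewed)   = {cohens_kappa(skewed):.3f}")    # 0.444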

  13. The use of statistical tools in field testing of putative effects of genetically modified plants on nontarget organisms

    PubMed Central

    Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F

    2013-01-01

    To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches – for example, analysis of variance (ANOVA) – are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing. PMID:24567836

  14. The use of statistical tools in field testing of putative effects of genetically modified plants on nontarget organisms.

    PubMed

    Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F

    2013-08-01

    To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches - for example, analysis of variance (ANOVA) - are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing.
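
    A hedged sketch of the review's central statistical recommendation follows: fit count data from a field trial (for example, numbers of nontarget arthropods per plot) with a Poisson GLM that includes block as a covariate, rather than forcing the counts through ANOVA. The data frame, effect sizes, and block structure are invented for the example.

      # Poisson GLM for plot-level counts with a treatment and block term.
      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(2)
      n_plots = 40
      df = pd.DataFrame({
          "treatment": np.repeat(["GM", "control"], n_plots // 2),
          "block": np.tile([1, 2, 3, 4], n_plots // 4).astype(str),
      })
      # Hypothetical counts: Poisson with a modest treatment effect.
      rate = np.where(df["treatment"] == "GM", 8.0, 10.0)
      df["count"] = rng.poisson(rate)

      # Poisson GLM (log link by default) with block as a covariate.
      model = smf.glm("count ~ treatment + block", data=df,
                      family=sm.families.Poisson()).fit()
      print(model.summary().tables[1])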

  15. R and Spatial Data

    EPA Science Inventory

    R is an open source language and environment for statistical computing and graphics that can also be used for both spatial analysis (i.e. geoprocessing and mapping of different types of spatial data) and spatial data analysis (i.e. the application of statistical descriptions and ...

  16. Some Dimensions of Data Quality in Statistical Systems

    DOT National Transportation Integrated Search

    1997-07-01

    An important objective of a statistical data system is to enable users of the data to recommend (and organizations to take) rational action for solving problems or for improving quality of service of manufactured product. With this view in mind, this ...

  17. Statistical methods to estimate treatment effects from multichannel electroencephalography (EEG) data in clinical trials.

    PubMed

    Ma, Junshui; Wang, Shubing; Raubertas, Richard; Svetnik, Vladimir

    2010-07-15

    With the increasing popularity of using electroencephalography (EEG) to reveal the treatment effect in drug development clinical trials, the vast volume and complex nature of EEG data compose an intriguing, but challenging, topic. In this paper the statistical analysis methods recommended by the EEG community, along with methods frequently used in the published literature, are first reviewed. A straightforward adjustment of the existing methods to handle multichannel EEG data is then introduced. In addition, based on the spatial smoothness property of EEG data, a new category of statistical methods is proposed. The new methods use a linear combination of low-degree spherical harmonic (SPHARM) basis functions to represent a spatially smoothed version of the EEG data on the scalp, which is close to a sphere in shape. In total, seven statistical methods, including both the existing and the newly proposed methods, are applied to two clinical datasets to compare their power to detect a drug effect. Contrary to the EEG community's recommendation, our results suggest that (1) the nonparametric method does not outperform its parametric counterpart; and (2) including baseline data in the analysis does not always improve the statistical power. In addition, our results recommend that (3) simple paired statistical tests should be avoided due to their poor power; and (4) the proposed spatially smoothed methods perform better than their unsmoothed versions. Copyright 2010 Elsevier B.V. All rights reserved.
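
    The proposed smoothing idea can be sketched as follows: project multichannel scalp values onto a low-degree real spherical harmonic basis by least squares, which yields a spatially smoothed version of the data. The electrode angles, the signal, and the degree cutoff below are illustrative assumptions, not the paper's actual configuration.

      # Least-squares SPHARM smoothing of scalp values.
      import numpy as np
      from scipy.special import sph_harm

      def real_sph_basis(theta, phi, max_degree):
          """Design matrix of real spherical harmonics up to max_degree."""
          cols = []
          for n in range(max_degree + 1):
              for m in range(-n, n + 1):
                  y = sph_harm(abs(m), n, theta, phi)
                  cols.append(np.sqrt(2) * y.imag if m < 0
                              else y.real if m == 0
                              else np.sqrt(2) * y.real)
          return np.column_stack(cols)

      rng = np.random.default_rng(5)
      n_channels = 64
      theta = rng.uniform(0, 2 * np.pi, n_channels)   # electrode azimuth
      phi = rng.uniform(0, np.pi / 2, n_channels)     # polar angle (upper scalp)
      eeg = np.sin(phi) * np.cos(theta) + rng.normal(0, 0.2, n_channels)

      B = real_sph_basis(theta, phi, max_degree=3)    # low degree = smooth
      coef, *_ = np.linalg.lstsq(B, eeg, rcond=None)
      smoothed = B @ coef
      rms = np.sqrt(np.mean((eeg - smoothed) ** 2))
      print(f"residual RMS after smoothing: {rms:.3f}")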

  18. Statistical framework for evaluation of climate model simulations by use of climate proxy data from the last millennium - Part 1: Theory

    NASA Astrophysics Data System (ADS)

    Sundberg, R.; Moberg, A.; Hind, A.

    2012-08-01

    A statistical framework for comparing the output of ensemble simulations from global climate models with networks of climate proxy and instrumental records has been developed, focusing on near-surface temperatures for the last millennium. This framework includes the formulation of a joint statistical model for proxy data, instrumental data and simulation data, which is used to optimize a quadratic distance measure for ranking climate model simulations. An essential underlying assumption is that the simulations and the proxy/instrumental series have a shared component of variability that is due to temporal changes in external forcing, such as volcanic aerosol load, solar irradiance or greenhouse gas concentrations. Two statistical tests have been formulated. Firstly, a preliminary test establishes whether a significant temporal correlation exists between instrumental/proxy and simulation data. Secondly, the distance measure is expressed in the form of a test statistic of whether a forced simulation is closer to the instrumental/proxy series than unforced simulations. The proposed framework allows any number of proxy locations to be used jointly, with different seasons, record lengths and statistical precision. The goal is to objectively rank several competing climate model simulations (e.g. with alternative model parameterizations or alternative forcing histories) by means of their goodness of fit to the unobservable true past climate variations, as estimated from noisy proxy data and instrumental observations.

  19. Estimating chronic disease rates in Canada: which population-wide denominator to use?

    PubMed

    Ellison, J; Nagamuthu, C; Vanderloo, S; McRae, B; Waters, C

    2016-10-01

    Chronic disease rates are produced from the Public Health Agency of Canada's Canadian Chronic Disease Surveillance System (CCDSS) using administrative health data from provincial/territorial health ministries. Denominators for these rates are based on estimates of populations derived from health insurance files. However, these data may not be accessible to all researchers. Another source for population size estimates is the Statistics Canada census. The purpose of our study was to calculate the major differences between the CCDSS and Statistics Canada's population denominators and to identify the sources or reasons for the potential differences between these data sources. We compared the 2009 denominators from the CCDSS and Statistics Canada. The CCDSS denominator was adjusted for the growth components (births, deaths, emigration and immigration) from Statistics Canada's census data. The unadjusted CCDSS denominator was 34 429 804, 3.2% higher than Statistics Canada's estimate of population in 2009. After the CCDSS denominator was adjusted for the growth components, the difference between the two estimates was reduced to 431 323 people, a difference of 1.3%. The CCDSS overestimates the population relative to Statistics Canada overall. The largest difference between the two estimates was from the migrant growth component, while the smallest was from the emigrant component. Using these descriptions of each data source, researchers can decide which population to use in their calculations of disease frequency.

  20. Replicate This! Creating Individual-Level Data from Summary Statistics Using R

    ERIC Educational Resources Information Center

    Morse, Brendan J.

    2013-01-01

    Incorporating realistic data and research examples into quantitative (e.g., statistics and research methods) courses has been widely recommended for enhancing student engagement and comprehension. One way to achieve these ends is to use a data generator to emulate the data in published research articles. "MorseGen" is a free data generator that…

  1. Equivalent statistics and data interpretation.

    PubMed

    Francis, Gregory

    2017-08-01

    Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
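
    The claimed equivalence is easy to verify numerically for the two-sample t test: with the sample sizes known, the t statistic, the p value, and Cohen's d are deterministic transforms of one another, so each carries the same information about the data. The reported values below are invented for the demonstration.

      # Round-trip conversions between t, p, and Cohen's d for two samples.
      import numpy as np
      from scipy import stats

      n1, n2, t = 24, 26, 2.31                      # hypothetical report
      df = n1 + n2 - 2

      p = 2 * stats.t.sf(abs(t), df)                # t -> two-sided p
      d = t * np.sqrt(1 / n1 + 1 / n2)              # t -> Cohen's d
      t_back = d / np.sqrt(1 / n1 + 1 / n2)         # d -> t (round trip)

      print(f"p = {p:.4f}, d = {d:.3f}, t recovered = {t_back:.2f}")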

  2. Computer and Software Use in Teaching the Beginning Statistics Course.

    ERIC Educational Resources Information Center

    Bartz, Albert E.; Sabolik, Marisa A.

    2001-01-01

    Surveys the extent of computer usage in the beginning statistics course and the variety of statistics software used. Finds that 69% of the psychology departments used computers in beginning statistics courses and 90% used computer-assisted data analysis in statistics or other courses. (CMK)

  3. Drug safety data mining with a tree-based scan statistic.

    PubMed

    Kulldorff, Martin; Dashevsky, Inna; Avery, Taliser R; Chan, Arnold K; Davis, Robert L; Graham, David; Platt, Richard; Andrade, Susan E; Boudreau, Denise; Gunter, Margaret J; Herrinton, Lisa J; Pawloski, Pamala A; Raebel, Marsha A; Roblin, Douglas; Brown, Jeffrey S

    2013-05-01

    In post-marketing drug safety surveillance, data mining can potentially detect rare but serious adverse events. Assessing an entire collection of drug-event pairs is traditionally performed on a predefined level of granularity. It is unknown a priori whether a drug causes a very specific or a set of related adverse events, such as mitral valve disorders, all valve disorders, or different types of heart disease. This methodological paper evaluates the tree-based scan statistic data mining method to enhance drug safety surveillance. We use a three-million-member electronic health records database from the HMO Research Network. Using the tree-based scan statistic, we assess the safety of selected antifungal and diabetes drugs, simultaneously evaluating overlapping diagnosis groups at different granularity levels, adjusting for multiple testing. Expected and observed adverse event counts were adjusted for age, sex, and health plan, producing a log likelihood ratio test statistic. Out of 732 evaluated disease groupings, 24 were statistically significant, divided among 10 non-overlapping disease categories. Five of the 10 signals are known adverse effects, four are likely due to confounding by indication, while one may warrant further investigation. The tree-based scan statistic can be successfully applied as a data mining tool in drug safety surveillance using observational data. The total number of statistical signals was modest and does not imply a causal relationship. Rather, data mining results should be used to generate candidate drug-event pairs for rigorous epidemiological studies to evaluate the individual and comparative safety profiles of drugs. Copyright © 2013 John Wiley & Sons, Ltd.
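
    A much-simplified sketch of a tree-based scan follows (the production method adjusts counts for age, sex, and health plan and assesses significance by Monte Carlo replication, both omitted here): each node of a small diagnosis hierarchy aggregates the observed and expected event counts of its descendants and receives a Poisson log-likelihood-ratio score, and the maximum over nodes is the scan statistic. The hierarchy and counts are invented.

      # Tree-based scan over a toy diagnosis hierarchy.
      import numpy as np

      # Leaves: node -> (observed events, expected events).
      leaves = {
          "valve.mitral": (8, 3.1), "valve.aortic": (4, 3.8),
          "ischemic.mi": (5, 4.9), "ischemic.angina": (6, 5.5),
      }
      groups = {"valve": ["valve.mitral", "valve.aortic"],
                "ischemic": ["ischemic.mi", "ischemic.angina"],
                "heart": ["valve", "ischemic"]}

      def counts(node):
          """Aggregate (observed, expected) over a node's descendants."""
          if node in leaves:
              return leaves[node]
          obs = exp = 0.0
          for child in groups[node]:
              o, e = counts(child)
              obs, exp = obs + o, exp + e
          return obs, exp

      def poisson_llr(obs, exp):
          # One-sided Poisson LLR: score only an excess of observed events.
          return obs * np.log(obs / exp) - (obs - exp) if obs > exp else 0.0

      nodes = list(leaves) + list(groups)
      scores = {n: poisson_llr(*counts(n)) for n in nodes}
      best = max(scores, key=scores.get)
      print(f"strongest signal: {best} (LLR={scores[best]:.3f})")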

  4. Petroleum supply monthly, June 1999, with data for April 1999

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    Data presented in the Petroleum Supply Monthly (PSM) describe the supply and disposition of petroleum products in the US and major US geographic regions. The data series describe production, imports and exports, inter-Petroleum Administration for Defense (PAD) District movements, and inventories by the primary suppliers of petroleum products in the US (50 States and the District of Columbia). The reporting universe includes those petroleum sectors in primary supply. Included are: petroleum refiners, motor gasoline blenders, operators of natural gas processing plants and fractionators, inter-PAD transporters, importers, and major inventory holders of petroleum products and crude oil. When aggregated, the data reported by these sectors approximately represent the consumption of petroleum products in the US. Data presented in the PSM are divided into two sections: Summary Statistics and Detailed Statistics. The tables and figures in the Summary Statistics section of the PSM present a time series of selected petroleum data on a US level. The Detailed Statistics tables of the PSM present statistics for the most current month available as well as year-to-date. 16 figs., 66 tabs.

  5. How the Mastery Rubric for Statistical Literacy Can Generate Actionable Evidence about Statistical and Quantitative Learning Outcomes

    ERIC Educational Resources Information Center

    Tractenberg, Rochelle E.

    2017-01-01

    Statistical literacy is essential to an informed citizenry; and two emerging trends highlight a growing need for training that achieves this literacy. The first trend is towards "big" data: while automated analyses can exploit massive amounts of data, the interpretation--and possibly more importantly, the replication--of results are…

  6. Analysis of thrips distribution: application of spatial statistics and Kriging

    Treesearch

    John Aleong; Bruce L. Parker; Margaret Skinner; Diantha Howard

    1991-01-01

    Kriging is a statistical technique that provides predictions for spatially and temporally correlated data. Observations of thrips distribution and density in Vermont soils are made in both space and time. Traditional statistical analysis of such data assumes that the counts taken over space and time are independent, which is not necessarily true. Therefore, to analyze...

  7. The Higher Education System in Israel: Statistical Abstract and Analysis.

    ERIC Educational Resources Information Center

    Herskovic, Shlomo

    This edition of a statistical abstract published every few years on the higher education system in Israel presents the most recent data available through 1990-91. The data were gathered through the cooperation of the Central Bureau of Statistics and institutions of higher education. Chapter 1 presents a summary of principal findings covering the…

  8. Outliers in Questionnaire Data: Can They Be Detected and Should They Be Removed?

    ERIC Educational Resources Information Center

    Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

    2011-01-01

    Outliers in questionnaire data are unusual observations, which may bias statistical results, and outlier statistics may be used to detect such outliers. The authors investigated the effect outliers have on the specificity and the sensitivity of each of six different outlier statistics. The Mahalanobis distance and the item-pair based outlier…

  9. The Development Data Book: A Guide to Social and Economic Statistics. Second Edition.

    ERIC Educational Resources Information Center

    Sheram, Katherine

    This data book presents statistics on countries with populations of more than one million. The statistics relate to economic development and the changes it is bringing about in the world. These statistics are measures of social and economic conditions in developing and industrial countries. Five indicators of economic development are presented,…

  10. Comparative Financial Statistics for Public Two-Year Colleges: FY 1992 National Sample.

    ERIC Educational Resources Information Center

    Dickmeyer, Nathan; Cirino, Anna Marie

    This report, the 15th in an annual series, provides comparative information derived from a national sample of 544 public two-year colleges, highlighting financial statistics for fiscal year 1991-92. The report offers space for colleges to compare their institutional statistics with data provided on national sample medians; quartile data for the…

  11. What We Know about Community College Low-Income and Minority Student Outcomes: Descriptive Statistics from National Surveys

    ERIC Educational Resources Information Center

    Bailey, Thomas; Jenkins, Davis; Leinbach, Timothy

    2005-01-01

    This report summarizes the latest available national statistics on access and attainment by low income and minority community college students. The data come from the National Center for Education Statistics' (NCES) Integrated Postsecondary Education Data System (IPEDS) annual surveys of all postsecondary educational institutions and the NCES…

  12. 75 FR 79320 - Animal Drugs, Feeds, and Related Products; Regulation of Carcinogenic Compounds in Food-Producing...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-12-20

    ... is calculated from tumor data of the cancer bioassays using a statistical extrapolation procedure... carcinogenic concern currently set forth in Sec. 500.84 utilizes a statistical extrapolation procedure that... procedures did not rely on a statistical extrapolation of the data to a 1 in 1 million risk of cancer to test...

  13. Statistical Literacy as a Function of Online versus Hybrid Course Delivery Format for an Introductory Graduate Statistics Course

    ERIC Educational Resources Information Center

    Hahs-Vaughn, Debbie L.; Acquaye, Hannah; Griffith, Matthew D.; Jo, Hang; Matthews, Ken; Acharya, Parul

    2017-01-01

    Statistical literacy refers to understanding fundamental statistical concepts. Assessment of statistical literacy can take the forms of tasks that require students to identify, translate, compute, read, and interpret data. In addition, statistical instruction can take many forms encompassing course delivery format such as face-to-face, hybrid,…

  14. Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

    NASA Technical Reports Server (NTRS)

    Xu, Kuan-Man

    2006-01-01

    A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
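
    The proposed procedure can be sketched compactly: compute a Euclidean distance between two summary histograms, then obtain a significance level by bootstrapping from the pooled data under the null hypothesis that both samples share one distribution. The samples and bin edges below are invented stand-ins for the cloud-object data.

      # Bootstrap test of the difference between two summary histograms.
      import numpy as np

      rng = np.random.default_rng(3)
      a = rng.normal(0.0, 1.0, 600)      # e.g., one cloud-object size category
      b = rng.normal(0.3, 1.0, 400)      # another category, slightly shifted
      edges = np.linspace(-4, 4, 21)

      def hist_distance(x, y):
          hx, _ = np.histogram(x, bins=edges, density=True)
          hy, _ = np.histogram(y, bins=edges, density=True)
          return np.sqrt(np.sum((hx - hy) ** 2))   # Euclidean distance

      observed = hist_distance(a, b)

      # Null distribution: resample both groups from the pooled data.
      pooled = np.concatenate([a, b])
      n_boot = 2000
      null = np.empty(n_boot)
      for i in range(n_boot):
          resampled = rng.choice(pooled, size=pooled.size, replace=True)
          null[i] = hist_distance(resampled[:a.size], resampled[a.size:])

      p = np.mean(null >= observed)
      print(f"distance={observed:.4f}, bootstrap p={p:.4f}")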

  15. Modeling and replicating statistical topology and evidence for CMB nonhomogeneity

    PubMed Central

    Agami, Sarit

    2017-01-01

    Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional, data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “TDA,” or “topological data analysis,” one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis—the typical case for big data applications—the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity. PMID:29078301

  16. mvp - an open-source preprocessor for cleaning duplicate records and missing values in mass spectrometry data.

    PubMed

    Lee, Geunho; Lee, Hyun Beom; Jung, Byung Hwa; Nam, Hojung

    2017-07-01

    Mass spectrometry (MS) data are used to analyze biological phenomena based on chemical species. However, these data often contain unexpected duplicate records and missing values due to technical or biological factors. These 'dirty data' problems increase the difficulty of performing MS analyses because they lead to performance degradation when statistical or machine-learning tests are applied to the data. Thus, we have developed missing values preprocessor (mvp), an open-source software tool for preprocessing data that might include duplicate records and missing values. mvp uses the property of MS data in which identical chemical species present the same or similar values for key identifiers, such as the mass-to-charge ratio and intensity signal, and forms cliques via graph theory to process dirty data. We evaluated the validity of the mvp process via quantitative and qualitative analyses and compared the results from a statistical test that analyzed the original and mvp-applied data. This analysis showed that using mvp reduces problems associated with duplicate records and missing values. We also examined the effects of using unprocessed data in statistical tests and examined the improved statistical test results obtained with data preprocessed using mvp.
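
    The grouping step can be pictured with a small graph-based sketch: records are nodes, near-identical key identifiers create edges, and each resulting group is merged. This is an illustrative simplification (connected components rather than the cliques mvp actually forms), and the m/z values, intensities, and tolerance below are hypothetical.

```python
import networkx as nx
import numpy as np

def merge_near_duplicates(mz, intensity, mz_tol=0.01):
    """Merge records whose mass-to-charge ratios agree within mz_tol."""
    g = nx.Graph()
    g.add_nodes_from(range(len(mz)))
    for i in range(len(mz)):
        for j in range(i + 1, len(mz)):
            if abs(mz[i] - mz[j]) <= mz_tol:
                g.add_edge(i, j)          # likely the same chemical species
    merged = []
    for group in nx.connected_components(g):
        idx = list(group)
        merged.append((float(np.mean([mz[k] for k in idx])),
                       float(np.nanmean([intensity[k] for k in idx]))))
    return merged

# Two near-duplicate records at ~180.063; NaN marks a missing intensity
print(merge_near_duplicates([180.063, 180.064, 255.232],
                            [1.0e5, np.nan, 3.2e4]))
```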

  17. General Aviation Avionics Statistics : 1978 Data

    DOT National Transportation Integrated Search

    1980-12-01

    The report presents avionics statistics for the 1978 general aviation (GA) aircraft fleet and is the fifth in a series titled "General Aviation Statistics." The statistics are presented in a capability group framework which enables one to relate airb...

  18. General Aviation Avionics Statistics : 1979 Data

    DOT National Transportation Integrated Search

    1981-04-01

    This report presents avionics statistics for the 1979 general aviation (GA) aircraft fleet and is the sixth in a series titled General Aviation Avionics Statistics. The statistics are presented in a capability group framework which enables one to relate...

  19. [The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].

    PubMed

    Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel

    2017-01-01

    The statistical analysis can be divided into two main components: descriptive analysis and inferential analysis. Inference means drawing conclusions from tests performed on data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was drawn. However, choosing the appropriate statistical test often poses a challenge for novice researchers. Choosing the statistical test requires taking three aspects into account: the research design, the number of measurements, and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
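
    For the simplest case in this decision scheme, two independent groups measured on a continuous scale, the choice reduces to checking normality and then picking a parametric or nonparametric test. The following is a minimal sketch assuming recent SciPy and an unpaired design; real decisions should also weigh the research design and measurement scale, as the article stresses.

```python
import numpy as np
from scipy import stats

def compare_two_groups(x, y, alpha=0.05):
    """Pick an independent two-sample test based on a normality check."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    normal = (stats.shapiro(x).pvalue > alpha) and (stats.shapiro(y).pvalue > alpha)
    if normal:
        # Parametric route; Welch's variant avoids assuming equal variances
        return "Welch t-test", stats.ttest_ind(x, y, equal_var=False)
    # Nonparametric alternative when normality is doubtful
    return "Mann-Whitney U", stats.mannwhitneyu(x, y, alternative="two-sided")
```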

  20. Reserve Manpower Statistics, 1 January - 31 March 1986.

    DTIC Science & Technology

    1986-03-31

    This is the first issue of Reserve Manpower Statistics, a quarterly publication based upon data from the Reserve Components Common Personnel Data System.

  1. Applying Bayesian statistics to the study of psychological trauma: A suggestion for future research.

    PubMed

    Yalch, Matthew M

    2016-03-01

    Several contemporary researchers have noted the virtues of Bayesian methods of data analysis. Although debates continue about whether conventional or Bayesian statistics is the "better" approach for researchers in general, there are reasons why Bayesian methods may be well suited to the study of psychological trauma in particular. This article describes how Bayesian statistics offers practical solutions to the problems of data non-normality, small sample size, and missing data common in research on psychological trauma. After a discussion of these problems and the effects they have on trauma research, this article explains the basic philosophical and statistical foundations of Bayesian statistics and how it provides solutions to these problems using an applied example. Results of the literature review and the accompanying example indicate the utility of Bayesian statistics in addressing problems common in trauma research. Bayesian statistics provides a set of methodological tools and a broader philosophical framework that is useful for trauma researchers. Methodological resources are also provided so that interested readers can learn more. (c) 2016 APA, all rights reserved.
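
    As a toy illustration of the kind of solution described, a conjugate normal-normal update shows how an informative prior stabilizes a mean estimate from the small samples typical of trauma research. The scores, prior, and assumed known sampling standard deviation below are hypothetical, and this is far simpler than a full Bayesian analysis.

```python
import numpy as np

def posterior_mean_normal(data, prior_mu, prior_sd, sigma):
    """Conjugate normal-normal update for a mean with known sampling sd."""
    data = np.asarray(data, float)
    prec_prior = 1.0 / prior_sd ** 2
    prec_data = len(data) / sigma ** 2
    post_var = 1.0 / (prec_prior + prec_data)
    post_mu = post_var * (prec_prior * prior_mu + prec_data * data.mean())
    return post_mu, np.sqrt(post_var)

# Three symptom-scale scores; the informative prior steadies the estimate
mu, sd = posterior_mean_normal([42.0, 55.0, 47.0],
                               prior_mu=50.0, prior_sd=10.0, sigma=12.0)
print(f"posterior mean {mu:.1f}, posterior sd {sd:.1f}")
```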

  2. A Guerilla Guide to Common Problems in ‘Neurostatistics’: Essential Statistical Topics in Neuroscience

    PubMed Central

    Smith, Paul F.

    2017-01-01

    Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins. PMID:29371855

  3. A Guerilla Guide to Common Problems in 'Neurostatistics': Essential Statistical Topics in Neuroscience.

    PubMed

    Smith, Paul F

    2017-01-01

    Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins.
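
    One of the listed issues, the over-emphasis on binary significance, is commonly addressed by reporting a standardized effect size alongside the p value. Below is a minimal sketch of Cohen's d with a pooled standard deviation; the review itself does not prescribe this particular estimator, and the sample values are hypothetical.

```python
import numpy as np
from scipy import stats

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    pooled_sd = np.sqrt(((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1))
                        / (n1 + n2 - 2))
    return (x.mean() - y.mean()) / pooled_sd

control = [4.1, 3.8, 5.0, 4.4, 4.7]      # hypothetical firing rates (Hz)
treated = [5.2, 4.9, 5.8, 5.1, 6.0]
t, p = stats.ttest_ind(control, treated)
print(f"p = {p:.3f}, Cohen's d = {cohens_d(treated, control):.2f}")
```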

  4. Healthy Water

    MedlinePlus

  5. Supervised classification in the presence of misclassified training data: a Monte Carlo simulation study in the three group case.

    PubMed

    Bolin, Jocelyn Holden; Finch, W Holmes

    2014-01-01

    Statistical classification of phenomena into observed groups is very common in the social and behavioral sciences. Statistical classification methods, however, are affected by the characteristics of the data under study. Statistical classification can be further complicated by initial misclassification of the observed groups. The purpose of this study is to investigate the impact of initial training data misclassification on several statistical classification and data mining techniques. Misclassification conditions in the three group case will be simulated and results will be presented in terms of overall as well as subgroup classification accuracy. Results show decreased classification accuracy as sample size, group separation and group size ratio decrease and as misclassification percentage increases with random forests demonstrating the highest accuracy across conditions.
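
    A condensed version of such a Monte Carlo experiment can be sketched with scikit-learn: simulate three groups, flip a fraction of training labels at random, and track test accuracy. The parameters here (sample size, separation, classifier settings) are illustrative stand-ins for the study's full factorial design.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1500, n_features=10, n_informative=5,
                           n_classes=3, class_sep=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

for noise in (0.0, 0.1, 0.3):
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_noisy)) < noise
    y_noisy[flip] = rng.integers(0, 3, flip.sum())  # random relabeling
    acc = RandomForestClassifier(random_state=0).fit(X_tr, y_noisy).score(X_te, y_te)
    print(f"{noise:.0%} of training labels corrupted -> test accuracy {acc:.3f}")
```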

  6. Evaluation of data generated by statistically oriented end-result specifications : final report.

    DOT National Transportation Integrated Search

    1979-01-01

    This report is concerned with the review of data generated on projects let under the statistically oriented end-result specifications (ERS) on asphaltic concrete and Portland cement concrete. The review covered analysis of data for determination of p...

  7. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data

    PubMed Central

    Luo, Li; Zhu, Yun

    2012-01-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze the collective frequency differences of multiple variants between cases and controls have shifted the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T2, collapsing method, multivariate and collapsing (CMC) method, individual χ2 test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812

  8. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    PubMed

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze the collective frequency differences of multiple variants between cases and controls have shifted the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  9. Fundamentals and Catalytic Innovation: The Statistical and Data Management Center of the Antibacterial Resistance Leadership Group.

    PubMed

    Huvane, Jacqueline; Komarow, Lauren; Hill, Carol; Tran, Thuy Tien T; Pereira, Carol; Rosenkranz, Susan L; Finnemeyer, Matt; Earley, Michelle; Jiang, Hongyu Jeanne; Wang, Rui; Lok, Judith; Evans, Scott R

    2017-03-15

    The Statistical and Data Management Center (SDMC) provides the Antibacterial Resistance Leadership Group (ARLG) with statistical and data management expertise to advance the ARLG research agenda. The SDMC is active at all stages of a study, including design; data collection and monitoring; data analyses and archival; and publication of study results. The SDMC enhances the scientific integrity of ARLG studies through the development and implementation of innovative and practical statistical methodologies and by educating research colleagues regarding the application of clinical trial fundamentals. This article summarizes the challenges and roles, as well as the innovative contributions in the design, monitoring, and analyses of clinical trials and diagnostic studies, of the ARLG SDMC. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.

  10. Generalized massive optimal data compression

    NASA Astrophysics Data System (ADS)

    Alsing, Justin; Wandelt, Benjamin

    2018-05-01

    In this paper, we provide a general procedure for optimally compressing N data down to n summary statistics, where n is equal to the number of parameters of interest. We show that compression to the score function - the gradient of the log-likelihood with respect to the parameters - yields n compressed statistics that are optimal in the sense that they preserve the Fisher information content of the data. Our method generalizes earlier work on linear Karhunen-Loève compression for Gaussian data whilst recovering both lossless linear compression and quadratic estimation as special cases when they are optimal. We give a unified treatment that also includes the general non-Gaussian case as long as mild regularity conditions are satisfied, producing optimal non-linear summary statistics when appropriate. As a worked example, we derive explicitly the n optimal compressed statistics for Gaussian data in the general case where both the mean and covariance depend on the parameters.
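
    For the simplest Gaussian case, i.i.d. samples with parameters (mu, sigma), the compressed statistics are just the two components of the score, t = grad_theta log L. The sketch below covers only that special case; the paper's general expressions handle parameter-dependent means and covariances.

```python
import numpy as np

def score_compression(x, mu, sigma):
    """Compress N i.i.d. Gaussian samples to the score at fiducial (mu, sigma)."""
    x = np.asarray(x, float)
    t_mu = np.sum(x - mu) / sigma ** 2                        # d logL / d mu
    t_sigma = np.sum((x - mu) ** 2 - sigma ** 2) / sigma ** 3  # d logL / d sigma
    return np.array([t_mu, t_sigma])

rng = np.random.default_rng(0)
data = rng.normal(1.05, 2.0, size=10_000)          # N = 10,000 data points
print(score_compression(data, mu=1.0, sigma=2.0))  # n = 2 summary statistics
```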

  11. Experimental Mathematics and Computational Statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bailey, David H.; Borwein, Jonathan M.

    2009-04-30

    The field of statistics has long been noted for techniques to detect patterns and regularities in numerical data. In this article we explore connections between statistics and the emerging field of 'experimental mathematics'. These include applications of experimental mathematics in statistics as well as statistical methods applied to computational mathematics.

  12. Time Series Analysis Based on Running Mann Whitney Z Statistics

    USDA-ARS?s Scientific Manuscript database

    A sensitive and objective time series analysis method based on the calculation of Mann Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...
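
    The truncated abstract leaves the windowing details unspecified, but the core computation can be sketched as follows: compare each moving window with the one that follows it, and convert the U statistic to Z via the usual normal approximation (no tie correction). The window length and the adjacent-window pairing are assumptions for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def running_mw_z(series, window=20):
    """Z statistic comparing each window with the window that follows it."""
    series = np.asarray(series, float)
    z = []
    for start in range(len(series) - 2 * window + 1):
        a = series[start:start + window]
        b = series[start + window:start + 2 * window]
        u = mannwhitneyu(a, b, alternative="two-sided").statistic
        mu = window * window / 2.0                            # E[U] under H0
        sd = np.sqrt(window * window * (2 * window + 1) / 12.0)
        z.append((u - mu) / sd)   # normal approximation, no tie correction
    return np.array(z)
```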

  13. Flexible statistical modelling detects clinical functional magnetic resonance imaging activation in partially compliant subjects.

    PubMed

    Waites, Anthony B; Mannfolk, Peter; Shaw, Marnie E; Olsrud, Johan; Jackson, Graeme D

    2007-02-01

    Clinical functional magnetic resonance imaging (fMRI) occasionally fails to detect significant activation, often due to variability in task performance. The present study seeks to test whether a more flexible statistical analysis can better detect activation, by accounting for variance associated with variable compliance to the task over time. Experimental results and simulated data both confirm that even at 80% compliance to the task, such a flexible model outperforms standard statistical analysis when assessed using the extent of activation (experimental data), goodness of fit (experimental data), and area under the receiver operating characteristic curve (simulated data). Furthermore, retrospective examination of 14 clinical fMRI examinations reveals that in patients where the standard statistical approach yields activation, there is a measurable gain in model performance in adopting the flexible statistical model, with little or no penalty in lost sensitivity. This indicates that a flexible model should be considered, particularly for clinical patients who may have difficulty complying fully with the study task.

  14. Diffusion-Based Density-Equalizing Maps: an Interdisciplinary Approach to Visualizing Homicide Rates and Other Georeferenced Statistical Data

    NASA Astrophysics Data System (ADS)

    Mazzitello, Karina I.; Candia, Julián

    2012-12-01

    In every country, public and private agencies allocate extensive funding to collect large-scale statistical data, which in turn are studied and analyzed in order to determine local, regional, national, and international policies regarding all aspects relevant to the welfare of society. One important aspect of that process is the visualization of statistical data with embedded geographical information, which most often relies on archaic methods such as maps colored according to graded scales. In this work, we apply nonstandard visualization techniques based on physical principles. We illustrate the method with recent statistics on homicide rates in Brazil and their correlation to other publicly available data. This physics-based approach provides a novel tool that can be used by interdisciplinary teams investigating statistics and model projections in a variety of fields such as economics and gross domestic product research, public health and epidemiology, sociodemographics, political science, business and marketing, and many others.

  15. Advances in Statistical Methods for Substance Abuse Prevention Research

    PubMed Central

    MacKinnon, David P.; Lockwood, Chondra M.

    2010-01-01

    The paper describes advances in statistical methods for prevention research with a particular focus on substance abuse prevention. Standard analysis methods are extended to the typical research designs and characteristics of the data collected in prevention research. Prevention research often includes longitudinal measurement, clustering of data in units such as schools or clinics, missing data, and categorical as well as continuous outcome variables. Statistical methods to handle these features of prevention data are outlined. Developments in mediation, moderation, and implementation analysis allow for the extraction of more detailed information from a prevention study. Advancements in the interpretation of prevention research results include more widespread calculation of effect size and statistical power, the use of confidence intervals as well as hypothesis testing, detailed causal analysis of research findings, and meta-analysis. The increased availability of statistical software has contributed greatly to the use of new methods in prevention research. It is likely that the Internet will continue to stimulate the development and application of new methods. PMID:12940467

  16. Promoting Statistical Thinking in Schools with Road Injury Data

    ERIC Educational Resources Information Center

    Woltman, Marie

    2017-01-01

    Road injury is an immediately relevant topic for 9-19 year olds. Current availability of Open Data makes it increasingly possible to find locally relevant data. Statistical lessons developed from these data can mutually reinforce life lessons about minimizing risk on the road. Devon County Council demonstrate how a wide array of statistical…

  17. 75 FR 80364 - Sample Income Data To Meet the Low-Income Definition

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-12-22

    ..., that member data drawn from loan applications or member surveys be used to show a majority of the... particular credit union are compared with like categories of statistical data on income from the Census... of statistical data. For example, if a credit union provides individual income information for...

  18. The Effects of Data and Graph Type on Concepts and Visualizations of Variability

    ERIC Educational Resources Information Center

    Cooper, Linda L.; Shore, Felice S.

    2010-01-01

    Recognizing and interpreting variability in data lies at the heart of statistical reasoning. Since graphical displays should facilitate communication about data, statistical literacy should include an understanding of how variability in data can be gleaned from a graph. This paper identifies several types of graphs that students typically…

  19. Using Statistics to Lie, Distort, and Abuse Data

    ERIC Educational Resources Information Center

    Bintz, William; Moore, Sara; Adams, Cheryll; Pierce, Rebecca

    2009-01-01

    Statistics is a branch of mathematics that involves organization, presentation, and interpretation of data, both quantitative and qualitative. Data do not lie, but people do. On the surface, quantitative data are basically inanimate objects, nothing more than lifeless and meaningless symbols that appear on a page, calculator, computer, or in one's…

  20. Statistical Methodology for the Analysis of Repeated Duration Data in Behavioral Studies

    ERIC Educational Resources Information Center

    Letué, Frédérique; Martinez, Marie-José; Samson, Adeline; Vilain, Anne; Vilain, Coriandre

    2018-01-01

    Purpose: Repeated duration data are frequently used in behavioral studies. Classical linear or log-linear mixed models are often inadequate to analyze such data, because they usually consist of nonnegative and skew-distributed variables. Therefore, we recommend use of a statistical methodology specific to duration data. Method: We propose a…

  1. Primary, Secondary, and Meta-Analysis of Research

    ERIC Educational Resources Information Center

    Glass, Gene V.

    1976-01-01

    Examines data analysis at three levels: primary analysis, the original analysis of data in a research study; secondary analysis, the re-analysis of data for the purpose of answering the original research question with better statistical techniques, or answering new questions with old data; and meta-analysis, the statistical analysis of many analysis results from individual studies for…

  2. A high-resolution open biomass burning emission inventory based on statistical data and MODIS observations in mainland China

    NASA Astrophysics Data System (ADS)

    Xu, Y.; Fan, M.; Huang, Z.; Zheng, J.; Chen, L.

    2017-12-01

    Open biomass burning, which has adverse effects on air quality and human health, is an important source of gas and particulate matter (PM) in China. Current emission estimates for open biomass burning are generally based on a single source (either statistical data or satellite-derived data) and thus carry large uncertainty due to the limitations of each. In this study, to quantify the 2015-based amount of open biomass burning, we established a new estimation method for open biomass burning activity levels that combines bottom-up statistical data and top-down MODIS observations, considering three sub-category sources that use different activity data. For open crop residue burning, the "best estimate" of activity data was obtained by averaging the statistical data from China statistical yearbooks and satellite observations from the MODIS burned area product MCD64A1, weighted by their uncertainties. For forest and grassland fires, activity levels were represented by the combination of statistical data and the MODIS active fire product MCD14ML. Using fire radiative power (FRP), considered a better indicator of active fire level, as the spatial allocation surrogate, coarse gridded emissions were reallocated onto a 3 km × 3 km grid to produce a high-resolution emission inventory. Our results show that emissions of CO, NOx, SO2, NH3, VOCs, PM2.5, PM10, BC and OC in mainland China were 6607, 427, 84, 79, 1262, 1198, 1222, 159 and 686 Gg/yr, respectively. Among all provinces of China, Henan, Shandong and Heilongjiang were the top three contributors to the total emissions. The resulting high-resolution open biomass burning emission inventory can support air quality modeling and policy-making for pollution control.
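
    The abstract says the two activity estimates are averaged "weighted by their uncertainties" without giving the formula; inverse-variance weighting is one standard reading of that phrase, sketched below with hypothetical numbers for a single grid cell.

```python
import numpy as np

def weighted_estimate(x_stat, u_stat, x_sat, u_sat):
    """Combine two estimates weighted by inverse variance (u = 1-sigma uncertainty)."""
    w_stat, w_sat = 1.0 / u_stat ** 2, 1.0 / u_sat ** 2
    best = (w_stat * x_stat + w_sat * x_sat) / (w_stat + w_sat)
    u_best = np.sqrt(1.0 / (w_stat + w_sat))   # uncertainty of the combination
    return best, u_best

# Hypothetical burned crop residue in one cell (Gg dry matter): statistics
# yearbook says 120 +/- 30, the satellite product says 150 +/- 20
print(weighted_estimate(120.0, 30.0, 150.0, 20.0))
```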

  3. Highway Statistics 1994

    DOT National Transportation Integrated Search

    1995-10-01

    This is an annual report containing analyzed statistical data on motor fuel; motor vehicles; driver licensing; highway-user taxation; State highway finance; highway mileage; Federal aid for highways; highway finance data for municipalities, counties,...

  4. CADDIS Volume 4. Data Analysis: Biological and Environmental Data Requirements

    EPA Pesticide Factsheets

    Overview of PECBO Module, using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, methods for inferring environmental conditions, statistical scripts in module.

  5. Estimation of Global Network Statistics from Incomplete Data

    PubMed Central

    Bliss, Catherine A.; Danforth, Christopher M.; Dodds, Peter Sheridan

    2014-01-01

    Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week. PMID:25338183
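
    The flavor of such scaling corrections can be seen in a small experiment: under uniform node sampling with probability p, an edge of the induced subgraph survives only when both endpoints are sampled, so the observed mean degree is biased low by a factor of p. Below is a sketch on a simulated scale-free network; the paper's estimators cover many more statistics and sampling designs.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(1)
g = nx.barabasi_albert_graph(10_000, 3, seed=1)
true_mean_degree = 2 * g.number_of_edges() / g.number_of_nodes()

p = 0.2   # fraction of nodes observed
sampled_nodes = [n for n in g if rng.random() < p]
sub = g.subgraph(sampled_nodes)
observed = 2 * sub.number_of_edges() / sub.number_of_nodes()

# Edges need both endpoints sampled, so observed mean degree ~ p * true value
print(f"true {true_mean_degree:.2f}  observed {observed:.2f}  "
      f"rescaled {observed / p:.2f}")
```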

  6. Compression Algorithm Analysis of In-Situ (S)TEM Video: Towards Automatic Event Detection and Characterization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Teuton, Jeremy R.; Griswold, Richard L.; Mehdi, Beata L.

    Precise analysis of both (S)TEM images and video is a time- and labor-intensive process. As an example, determining when crystal growth and shrinkage occurs during the dynamic process of Li dendrite deposition and stripping involves manually scanning through each frame in the video to extract a specific set of frames/images. For large numbers of images, this process can be very time consuming, so a fast and accurate automated method is desirable. Given this need, we developed software that uses analysis of video compression statistics for detecting and characterizing events in large data sets. This software works by converting the data into a series of images which it compresses into an MPEG-2 video using the open source “avconv” utility [1]. The software does not use the video itself, but rather analyzes the video statistics from the first pass of the video encoding that avconv records in the log file. This file contains statistics for each frame of the video including the frame quality, intra-texture and predicted texture bits, forward and backward motion vector resolution, among others. In all, avconv records 15 statistics for each frame. By combining different statistics, we have been able to detect events in various types of data. We have developed an interactive tool for exploring the data and the statistics that aids the analyst in selecting useful statistics for each analysis. Going forward, an algorithm for detecting and possibly describing events automatically can be written based on statistic(s) for each data type.

  7. Pitfalls of national routine death statistics for maternal mortality study.

    PubMed

    Saucedo, Monica; Bouvier-Colle, Marie-Hélène; Chantry, Anne A; Lamarche-Vadel, Agathe; Rey, Grégoire; Deneux-Tharaux, Catherine

    2014-11-01

    The lessons learned from the study of maternal deaths depend on the accuracy of data. Our objective was to assess time trends in the underestimation of maternal mortality (MM) in the national routine death statistics in France and to evaluate their current accuracy for the selection and causes of maternal deaths. National data obtained by enhanced methods in 1989, 1999, and 2007-09 were used as the gold standard to assess time trends in the underestimation of MM ratios (MMRs) in death statistics. Enhanced data and death statistics for 2007-09 were further compared by characterising false negatives (FNs) and false positives (FPs). The distribution of cause-specific MMRs, as assessed by each system, was described. Underestimation of MM in death statistics decreased from 55.6% in 1989 to 11.4% in 2007-09 (P < 0.001). In 2007-09, of 787 pregnancy-associated deaths, 254 were classified as maternal by the enhanced system and 211 by the death statistics; 34% of maternal deaths in the enhanced system were FNs in the death statistics, and 20% of maternal deaths in the death statistics were FPs. The hierarchy of causes of MM differed between the two systems. The discordances were mainly explained by the lack of precision in the drafting of death certificates by clinicians. Although the underestimation of MM in routine death statistics has decreased substantially over time, one third of maternal deaths remain unidentified, and the main causes of death are incorrectly identified in these data. Defining relevant priorities in maternal health requires the use of enhanced methods for MM study. © 2014 John Wiley & Sons Ltd.

  8. Statistical studies of animal response data from USF toxicity screening test method

    NASA Technical Reports Server (NTRS)

    Hilado, C. J.; Machado, A. M.

    1978-01-01

    Statistical examination of animal response data obtained using Procedure B of the USF toxicity screening test method indicates that the data deviate only slightly from a normal or Gaussian distribution. This slight departure from normality is not expected to invalidate conclusions based on theoretical statistics. Comparison of times to staggering, convulsions, collapse, and death as endpoints shows that time to death appears to be the most reliable endpoint because it offers the lowest probability of missed observations and premature judgements.

  9. Regression Analysis of Long-term Profile Ozone Data Set from BUV Instruments

    NASA Technical Reports Server (NTRS)

    Frith, Stacey; Taylor, Steve; DeLand, Matt; Ahn, Chang-Woo; Stolarski, Richard S.

    2005-01-01

    We have produced a profile merged ozone data set (MOD) based on the SBUV/SBUV2 series of nadir-viewing satellite backscatter instruments, covering the period from November 1978 - December 2003. In 2004, data from the Nimbus 7 SBUV and NOAA 9, 11, and 16 SBUV/2 instruments were reprocessed using the Version 8 (V8) algorithm and most recent calibrations. More recently, data from the Nimbus 4 BUV instrument, which operated from 1970 - 1977, were also reprocessed using the V8 algorithm. As part of the V8 profile calibration, the Nimbus 7 and NOAA 9 (1993-1997 only) instrument calibrations have been adjusted to match the NOAA 11 calibration, which was established from comparisons with SSBUV shuttle flight data. Given the level of agreement between the data sets, we simply average the ozone values during periods of instrument overlap to produce the MOD profile data set. We use statistical time-series analysis of the MOD profile data set (1978-2003) to estimate the change in profile ozone due to changing stratospheric chlorine levels. The Nimbus 4 BUV data offer an opportunity to test the physical properties of our statistical model. We extrapolate our statistical model fit backwards in time and compare to the Nimbus 4 data. We compare the statistics of the residuals from the fit for the Nimbus 4 period to those obtained from the 1978-2003 period over which the statistical model coefficients were estimated.

  10. A Stochastic Model of Space-Time Variability of Mesoscale Rainfall: Statistics of Spatial Averages

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Bell, Thomas L.

    2003-01-01

    A characteristic feature of rainfall statistics is that they depend on the space and time scales over which rain data are averaged. A previously developed spectral model of rain statistics that is designed to capture this property predicts power-law scaling behavior for the second moment statistics of area-averaged rain rate on the averaging length scale L as L → 0. In the present work a more efficient method of estimating the model parameters is presented, and used to fit the model to the statistics of area-averaged rain rate derived from gridded radar precipitation data from TOGA COARE. Statistical properties of the data and the model predictions are compared over a wide range of averaging scales. An extension of the spectral model scaling relations to describe the dependence of the average fraction of grid boxes within an area containing nonzero rain (the "rainy area fraction") on the grid scale L is also explored.

  11. Statistical and data reporting guidelines for the European Journal of Cardio-Thoracic Surgery and the Interactive CardioVascular and Thoracic Surgery.

    PubMed

    Hickey, Graeme L; Dunning, Joel; Seifert, Burkhardt; Sodeck, Gottfried; Carr, Matthew J; Burger, Hans Ulrich; Beyersdorf, Friedhelm

    2015-08-01

    As part of the peer review process for the European Journal of Cardio-Thoracic Surgery (EJCTS) and the Interactive CardioVascular and Thoracic Surgery (ICVTS), a statistician reviews any manuscript that includes a statistical analysis. To facilitate authors considering submitting a manuscript and to make it clearer about the expectations of the statistical reviewers, we present up-to-date guidelines for authors on statistical and data reporting specifically in these journals. The number of statistical methods used in the cardiothoracic literature is vast, as are the ways in which data are presented. Therefore, we narrow the scope of these guidelines to cover the most common applications submitted to the EJCTS and ICVTS, focusing in particular on those that the statistical reviewers most frequently comment on. © The Author 2015. Published by Oxford University Press on behalf of the European Association for Cardio-Thoracic Surgery. All rights reserved.

  12. Routine hospital data – is it good enough for trials? An example using England’s Hospital Episode Statistics in the SHIFT trial of Family Therapy vs. Treatment as Usual in adolescents following self-harm

    PubMed Central

    Graham, Elizabeth; Cottrell, David; Farrin, Amanda

    2018-01-01

    Background: Use of routine data sources within clinical research is increasing and is endorsed by the National Institute for Health Research to increase trial efficiencies; however there is limited evidence for its use in clinical trials, especially in relation to self-harm. One source of routine data, Hospital Episode Statistics, is collated and distributed by NHS Digital and contains details of admissions, outpatient, and Accident and Emergency attendances provided periodically by English National Health Service hospitals. We explored the reliability and accuracy of Hospital Episode Statistics, compared to data collected directly from hospital records, to assess whether it would provide complete, accurate, and reliable means of acquiring hospital attendances for self-harm – the primary outcome for the SHIFT (Self-Harm Intervention: Family Therapy) trial evaluating Family Therapy for adolescents following self-harm. Methods: Participant identifiers were linked to Hospital Episode Statistics Accident and Emergency, and Admissions data, and episodes combined to describe participants’ complete hospital attendance. Attendance data were initially compared to data previously gathered by trial researchers from pre-identified hospitals. Final comparison was conducted of subsequent attendances collected through Hospital Episode Statistics and researcher follow-up. Consideration was given to linkage rates; number and proportion of attendances retrieved; reliability of Accident and Emergency, and Admissions data; percentage of self-harm episodes recorded and coded appropriately; and percentage of required data items retrieved. Results: Participants were first linked to Hospital Episode Statistics with an acceptable match rate of 95%, identifying a total of 341 complete hospital attendances, compared to 139 reported by the researchers at the time. More than double the proportion of Hospital Episode Statistics Accident and Emergency episodes could not be classified in relation to self-harm (75%) compared to 34.9% of admitted episodes, and of overall attendances, 18% were classified as self-harm related and 20% not related, while ambiguity or insufficient information meant 62% were unclassified. Of 39 self-harm-related attendances reported by the researchers, Hospital Episode Statistics identified 24 (62%) as self-harm related while 15 (38%) were unclassified. Based on final data received, 1490 complete hospital attendances were identified and comparison to researcher follow-up found Hospital Episode Statistics underestimated the number of self-harm attendances by 37.2% (95% confidence interval 32.6%–41.9%). Conclusion: Advantages of routine data collection via NHS Digital included the acquisition of more comprehensive and timely trial outcome data, identifying more than double the number of hospital attendances than researchers. Disadvantages included ambiguity in the classification of self-harm relatedness. Our resulting primary outcome data collection strategy used routine data to identify hospital attendances supplemented by targeted researcher data collection for attendances requiring further self-harm classification. PMID:29498542

  13. Statistical methods in personality assessment research.

    PubMed

    Schinka, J A; LaLone, L; Broeckel, J A

    1997-06-01

    Emerging models of personality structure and advances in the measurement of personality and psychopathology suggest that research in personality and personality assessment has entered a stage of advanced development. In this article, we examine whether researchers in these areas have taken advantage of new and evolving statistical procedures. We conducted a review of articles published in the Journal of Personality Assessment during the past 5 years. Of the 449 articles that included some form of data analysis, 12.7% used only descriptive statistics, most employed only univariate statistics, and fewer than 10% used multivariate methods of data analysis. We discuss the cost of using limited statistical methods, the possible reasons for the apparent reluctance to employ advanced statistical procedures, and potential solutions to this technical shortcoming.

  14. Measures of health sciences journal use: a comparison of vendor, link-resolver, and local citation statistics.

    PubMed

    De Groote, Sandra L; Blecic, Deborah D; Martin, Kristin

    2013-04-01

    Libraries require efficient and reliable methods to assess journal use. Vendors provide complete counts of articles retrieved from their platforms. However, if a journal is available on multiple platforms, several sets of statistics must be merged. Link-resolver reports merge data from all platforms into one report but only record partial use because users can access library subscriptions from other paths. Citation data are limited to publication use. Vendor, link-resolver, and local citation data were examined to determine correlation. Because link-resolver statistics are easy to obtain, the study library especially wanted to know if they correlate highly with the other measures. Vendor, link-resolver, and local citation statistics for the study institution were gathered for health sciences journals. Spearman rank-order correlation coefficients were calculated. There was a high positive correlation between all three data sets, with vendor data commonly showing the highest use. However, a small percentage of titles showed anomalous results. Link-resolver data correlate well with vendor and citation data, but due to anomalies, low link-resolver data would best be used to suggest titles for further evaluation using vendor data. Citation data may not be needed as it correlates highly with other measures.
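
    The comparison boils down to Spearman rank-order correlations across per-journal use counts from the three sources. Below is a minimal sketch with hypothetical counts for seven journals; the study's actual data and title list are of course much larger.

```python
from scipy.stats import spearmanr

# Hypothetical per-journal use counts from the three sources
vendor   = [1520, 310, 45, 980, 12, 205, 67]
resolver = [400, 90, 10, 260, 2, 70, 15]
citation = [35, 8, 1, 22, 0, 6, 2]

rho_vr, p_vr = spearmanr(vendor, resolver)
rho_vc, p_vc = spearmanr(vendor, citation)
print(f"vendor vs link-resolver: rho = {rho_vr:.2f} (p = {p_vr:.3f})")
print(f"vendor vs citation:      rho = {rho_vc:.2f} (p = {p_vc:.3f})")
```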

  15. Using Carbon Emissions Data to "Heat Up" Descriptive Statistics

    ERIC Educational Resources Information Center

    Brooks, Robert

    2012-01-01

    This article illustrates using carbon emissions data in an introductory statistics assignment. The carbon emissions data has desirable characteristics including: choice of measure; skewness; and outliers. These complexities allow research and public policy debate to be introduced. (Contains 4 figures and 2 tables.)

  16. Data Warehousing: How To Make Your Statistics Meaningful.

    ERIC Educational Resources Information Center

    Flaherty, William

    2001-01-01

    Examines how one school district found a way to turn data collection from a disparate mountain of statistics into more useful information by using their Instructional Decision Support System. System software is explained as is how the district solved some data management challenges. (GR)

  17. A spatial scan statistic for survival data based on Weibull distribution.

    PubMed

    Bhatt, Vijaya; Tiwari, Neeraj

    2014-05-20

    The spatial scan statistic has been developed as a geographical cluster detection analysis tool for different types of data sets such as Bernoulli, Poisson, ordinal, normal and exponential. We propose a scan statistic for survival data based on Weibull distribution. It may also be used for other survival distributions, such as exponential, gamma, and log normal. The proposed method is applied on the survival data of tuberculosis patients for the years 2004-2005 in Nainital district of Uttarakhand, India. Simulation studies reveal that the proposed method performs well for different survival distribution functions. Copyright © 2013 John Wiley & Sons, Ltd.

  18. Data Anonymization that Leads to the Most Accurate Estimates of Statistical Characteristics: Fuzzy-Motivated Approach

    PubMed Central

    Xiang, G.; Ferson, S.; Ginzburg, L.; Longpré, L.; Mayorga, E.; Kosheleva, O.

    2013-01-01

    To preserve privacy, the original data points (with exact values) are replaced by boxes containing each (inaccessible) data point. This privacy-motivated uncertainty leads to uncertainty in the statistical characteristics computed based on this data. In a previous paper, we described how to minimize this uncertainty under the assumption that we use the same standard statistical estimates for the desired characteristics. In this paper, we show that we can further decrease the resulting uncertainty if we allow fuzzy-motivated weighted estimates, and we explain how to optimally select the corresponding weights. PMID:25187183
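
    The starting point, before any weighting, is that replacing points by boxes turns every statistic into an interval. For the sample mean the bounds are simply the means of the box endpoints, as this sketch shows; the paper's fuzzy-motivated weighted estimates then narrow such intervals. The income boxes below are hypothetical.

```python
import numpy as np

def mean_bounds(lo, hi):
    """Tightest interval guaranteed to contain the true mean when each
    data point x_i is only known to lie in the box [lo_i, hi_i]."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    return lo.mean(), hi.mean()

# Incomes reported only as $10k-wide anonymization boxes (in $k)
lo = [20, 30, 30, 50, 70]
hi = [30, 40, 40, 60, 80]
print(mean_bounds(lo, hi))   # -> (40.0, 50.0)
```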

  19. Statistical methods used in the public health literature and implications for training of public health professionals

    PubMed Central

    Hayat, Matthew J.; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L.

    2017-01-01

    Statistical literacy and knowledge are needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals. PMID:28591190

  20. Statistical methods used in the public health literature and implications for training of public health professionals.

    PubMed

    Hayat, Matthew J; Powell, Amanda; Johnson, Tessa; Cadwell, Betsy L

    2017-01-01

    Statistical literacy and knowledge are needed to read and understand the public health literature. The purpose of this study was to quantify basic and advanced statistical methods used in public health research. We randomly sampled 216 published articles from seven top tier general public health journals. Studies were reviewed by two readers and a standardized data collection form completed for each article. Data were analyzed with descriptive statistics and frequency distributions. Results were summarized for statistical methods used in the literature, including descriptive and inferential statistics, modeling, advanced statistical techniques, and statistical software used. Approximately 81.9% of articles reported an observational study design and 93.1% of articles were substantively focused. Descriptive statistics in table or graphical form were reported in more than 95% of the articles, and statistical inference reported in more than 76% of the studies reviewed. These results reveal the types of statistical methods currently used in the public health literature. Although this study did not obtain information on what should be taught, information on statistical methods being used is useful for curriculum development in graduate health sciences education, as well as making informed decisions about continuing education for public health professionals.

  1. Brightness temperature and attenuation statistics at 20.6 and 31.65 GHz

    NASA Technical Reports Server (NTRS)

    Westwater, Edgeworth R.; Falls, M. J.

    1991-01-01

    Attenuation and brightness temperature statistics at 20.6 and 31.65 GHz are analyzed for a year's worth of data. The data were collected in 1988 at Denver and Platteville, Colorado. The locations are separated by 49 km. Single-station statistics are derived for the entire year. Quality control procedures are discussed and examples of their application are given.

  2. Enhancing the Teaching of Statistics: Portfolio Theory, an Application of Statistics in Finance

    ERIC Educational Resources Information Center

    Christou, Nicolas

    2008-01-01

    In this paper we present an application of statistics using real stock market data. Most, if not all, students have some familiarity with the stock market (or at least they have heard about it) and therefore can understand the problem easily. It is the real data analysis that students find interesting. Here we explore the building of efficient…

  3. 7 CFR 1717.860 - Lien accommodations and subordinations under section 306E of the RE Act.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... worth will be based on the borrower's most recent financial and statistical report, the data in which... statistical reports (Form 7 for distribution borrowers and Form 12a for power supply borrowers) are subject to... borrower's financial and statistical report, the data in which shall not be more than 60 days old when the...

  4. The Effect on Prospective Teachers of the Learning Environment Supported by Dynamic Statistics Software

    ERIC Educational Resources Information Center

    Koparan, Timur

    2016-01-01

    In this study, the effect of a learning environment supported by dynamic statistics software on the achievement and attitudes of prospective teachers is examined. With this aim, an achievement test, an attitude scale for statistics, and interviews were used as data collection tools. The achievement test comprises 8 problems based on statistical data, and the attitude scale comprises 13 Likert-type items. The study…

  5. Introduction to Statistics. Learning Packages in the Policy Sciences Series, PS-26. Revised Edition.

    ERIC Educational Resources Information Center

    Policy Studies Associates, Croton-on-Hudson, NY.

    The primary objective of this booklet is to introduce students to basic statistical skills that are useful in the analysis of public policy data. A few, selected statistical methods are presented, and theory is not emphasized. Chapter 1 provides instruction for using tables, bar graphs, bar graphs with grouped data, trend lines, pie diagrams,…

  6. Measuring an Effect Size from Dichotomized Data: Contrasted Results Whether Using a Correlation or an Odds Ratio

    ERIC Educational Resources Information Center

    Rousson, Valentin

    2014-01-01

    It is well known that dichotomizing continuous data has the effect to decrease statistical power when the goal is to test for a statistical association between two variables. Modern researchers however are focusing not only on statistical significance but also on an estimation of the "effect size" (i.e., the strength of association…
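
    The power loss from dichotomization is easy to demonstrate by simulation: draw correlated pairs, test the continuous association, then median-split both variables and test the resulting 2x2 table. Below is a sketch assuming recent SciPy; the sample size and true correlation are arbitrary choices.

```python
import numpy as np
from scipy.stats import pearsonr, chi2_contingency

rng = np.random.default_rng(0)
n, true_r, n_sim, alpha = 50, 0.3, 2000, 0.05
hits_cont = hits_dich = 0
for _ in range(n_sim):
    x = rng.standard_normal(n)
    y = true_r * x + np.sqrt(1 - true_r ** 2) * rng.standard_normal(n)
    hits_cont += pearsonr(x, y).pvalue < alpha
    # Median-split both variables and test the resulting 2x2 table
    table = np.histogram2d(x > np.median(x), y > np.median(y), bins=2)[0]
    hits_dich += chi2_contingency(table)[1] < alpha
print(f"power, continuous test:   {hits_cont / n_sim:.2f}")
print(f"power, median-split test: {hits_dich / n_sim:.2f}")
```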

  7. Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark studies.

    PubMed

    Boulesteix, Anne-Laure; Wilson, Rory; Hapfelmeier, Alexander

    2017-09-09

    The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments. We suggest that benchmark studies, a method of assessment of statistical methods using real-world datasets, might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.

  8. Implementation and evaluation of an efficient secure computation system using ‘R’ for healthcare statistics

    PubMed Central

    Chida, Koji; Morohashi, Gembu; Fuji, Hitoshi; Magata, Fumihiko; Fujimura, Akiko; Hamada, Koki; Ikarashi, Dai; Yamamoto, Ryuichi

    2014-01-01

    Background and objective: While the secondary use of medical data has gained attention, its adoption has been constrained due to protection of patient privacy. Making medical data secure by de-identification can be problematic, especially when the data concerns rare diseases. We require rigorous security management measures. Materials and methods: Using secure computation, an approach from cryptography, our system can compute various statistics over encrypted medical records without decrypting them. An issue of secure computation is that the amount of processing time required is immense. We implemented a system that securely computes healthcare statistics from the statistical computing software ‘R’ by effectively combining secret-sharing-based secure computation with original computation. Results: Testing confirmed that our system could correctly complete computation of average and unbiased variance of approximately 50 000 records of dummy insurance claim data in a little over a second. Computation including conditional expressions and/or comparison of values, for example, t test and median, could also be correctly completed in several tens of seconds to a few minutes. Discussion: If medical records are simply encrypted, the risk of leaks exists because decryption is usually required during statistical analysis. Our system possesses high-level security because medical records remain in encrypted state even during statistical analysis. Also, our system can securely compute some basic statistics with conditional expressions using ‘R’ that works interactively while secure computation protocols generally require a significant amount of processing time. Conclusions: We propose a secure statistical analysis system using ‘R’ for medical data that effectively integrates secret-sharing-based secure computation and original computation. PMID:24763677

  9. Implementation and evaluation of an efficient secure computation system using 'R' for healthcare statistics.

    PubMed

    Chida, Koji; Morohashi, Gembu; Fuji, Hitoshi; Magata, Fumihiko; Fujimura, Akiko; Hamada, Koki; Ikarashi, Dai; Yamamoto, Ryuichi

    2014-10-01

    While the secondary use of medical data has gained attention, its adoption has been constrained by the need to protect patient privacy. Making medical data secure by de-identification can be problematic, especially when the data concern rare diseases, so rigorous security management measures are required. Using secure computation, an approach from cryptography, our system can compute various statistics over encrypted medical records without decrypting them. An issue with secure computation is that the amount of processing time required is immense. We implemented a system that securely computes healthcare statistics from the statistical computing software 'R' by effectively combining secret-sharing-based secure computation with original computation. Testing confirmed that our system could correctly complete computation of the average and unbiased variance of approximately 50,000 records of dummy insurance claim data in a little over a second. Computation including conditional expressions and/or comparison of values, for example, the t test and median, could also be correctly completed in several tens of seconds to a few minutes. If medical records are simply encrypted, the risk of leaks exists because decryption is usually required during statistical analysis. Our system possesses high-level security because medical records remain encrypted even during statistical analysis. Also, our system can securely compute some basic statistics with conditional expressions using 'R', which works interactively, whereas secure computation protocols generally require a significant amount of processing time. We propose a secure statistical analysis system using 'R' for medical data that effectively integrates secret-sharing-based secure computation and original computation. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
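
    The secret-sharing idea underlying this system can be illustrated with a minimal sketch. This is not the authors' 'R'-integrated implementation; it is a toy three-party additive-sharing example in Python (all names, such as split_shares, are hypothetical), showing why no single server ever sees a record while sums, and hence means, remain computable.

```python
import random

PRIME = 2**61 - 1  # shares live in a field mod a large prime (assumption)

def split_shares(value, n_parties=3):
    """Additively split an integer into n_parties random shares mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Dummy "insurance claim" values; each one is split before leaving its owner.
records = [23, 41, 17, 35]
server_shares = [[], [], []]
for v in records:
    for srv, share in zip(server_shares, split_shares(v)):
        srv.append(share)

# Each server adds up only its own shares -- no server ever sees a record --
# and only the final partial sums are combined.
partial_sums = [sum(s) % PRIME for s in server_shares]
total = reconstruct(partial_sums)
print("mean =", total / len(records))  # 29.0
```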

  10. National Transportation Statistics (Annual Report, 1992)

    DOT National Transportation Integrated Search

    1992-06-01

    The "National Transportation Statistics" (NTS) Annual Report is a compendium of selected national transportation, and transportation-related energy data from a wide variety of government and private sources. The data illustrate transportation activit...

  11. 76 FR 34385 - Program Integrity: Gainful Employment-Debt Measures

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-06-13

    ... postsecondary education at a public institution. National Center for Education Statistics, 2004/2009 Beginning... reliable earnings information, including use of State data, survey data, or Bureau of Labor Statistics (BLS...

  12. Truth, Damn Truth, and Statistics

    ERIC Educational Resources Information Center

    Velleman, Paul F.

    2008-01-01

    Statisticians and Statistics teachers often have to push back against the popular impression that Statistics teaches how to lie with data. Those who believe incorrectly that Statistics is solely a branch of Mathematics (and thus algorithmic), often see the use of judgment in Statistics as evidence that we do indeed manipulate our results. In the…

  13. Statistical Literacy: Developing a Youth and Adult Education Statistical Project

    ERIC Educational Resources Information Center

    Conti, Keli Cristina; Lucchesi de Carvalho, Dione

    2014-01-01

    This article focuses on the notion of literacy--general and statistical--in the analysis of data from a fieldwork research project carried out as part of a master's degree that investigated the teaching and learning of statistics in adult education mathematics classes. We describe the statistical context of the project that involved the…

  14. Problematizing Statistical Literacy: An Intersection of Critical and Statistical Literacies

    ERIC Educational Resources Information Center

    Weiland, Travis

    2017-01-01

    In this paper, I problematize traditional notions of statistical literacy by juxtaposing it with critical literacy. At the school level statistical literacy is vitally important for students who are preparing to become citizens in modern societies that are increasingly shaped and driven by data based arguments. The teaching of statistics, which is…

  15. The Effect of a Student-Designed Data Collection: Project on Attitudes toward Statistics

    ERIC Educational Resources Information Center

    Carnell, Lisa J.

    2008-01-01

    Students often enter an introductory statistics class with less than positive attitudes about the subject. They tend to believe statistics is difficult and irrelevant to their lives. Observational evidence from previous studies suggests including projects in a statistics course may enhance students' attitudes toward statistics. This study examines…

  16. SOCR: Statistics Online Computational Resource

    PubMed Central

    Dinov, Ivo D.

    2011-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result, a number of attempts have been made to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students' intuition and enhance their learning. PMID:21451741

  17. 78 FR 9055 - National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-07

    ... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the..., Medical Systems Administrator, Classifications and Public Health Data Standards Staff, NCHS, 3311 Toledo...

  18. Inference Control Mechanism for Statistical Database: Frequency-Imposed Data Distortions.

    ERIC Educational Resources Information Center

    Liew, Chong K.; And Others

    1985-01-01

    Introduces two data distortion methods (Frequency-Imposed Distortion, Frequency-Imposed Probability Distortion) and uses a Monte Carlo study to compare their performance with that of other distortion methods (Point Distortion, Probability Distortion). Indications that data generated by these two methods produce accurate statistics and protect…

  19. 2012 Sexually Transmitted Diseases Surveillance

    MedlinePlus


  20. Secondary Analysis of Qualitative Data.

    ERIC Educational Resources Information Center

    Turner, Paul D.

    The reanalysis of data to answer the original research question with better statistical techniques or to answer new questions with old data is not uncommon in quantitative studies. Meta analysis and research syntheses have increased with the increase in research using similar statistical analyses, refinements of analytical techniques, and the…

  1. "Plateau"-related summary statistics are uninformative for comparing working memory models.

    PubMed

    van den Berg, Ronald; Ma, Wei Ji

    2014-10-01

    Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon (Ma, Husain, & Bays, Nature Neuroscience, 17, 347-356, 2014). Zhang and Luck (Nature, 453(7192), 233-235, 2008) and Anderson, Vogel, and Awh (Attention, Perception, & Psychophysics, 74(5), 891-910, 2011) noticed that as more items need to be remembered, "memory noise" seems to first increase and then reach a "stable plateau." They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided at most 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. Therefore, at realistic numbers of trials, plateau-related summary statistics are highly unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (Attention, Perception, & Psychophysics, 74(5), 891-910, 2011), we found that the evidence in the summary statistics was at most 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. The evidence in the raw data, in fact, strongly favored the slotless model. These findings call into question claims about working memory that are based on summary statistics.

  2. 17 CFR Appendix A to Part 145 - Compilation of Commission Records Available to the Public

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... photographs. (10) Statistical data concerning the Commission's budget. (11) Statistical data concerning... applicant's legal status and governance structure, including governance fitness information, and any other...

  3. STATISTICAL SAMPLING AND DATA ANALYSIS

    EPA Science Inventory

    Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...

  4. Comparisons of non-Gaussian statistical models in DNA methylation analysis.

    PubMed

    Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-06-16

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.

  5. Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

    PubMed Central

    Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-01-01

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
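
    The paper's central observation, that bounded (0, 1) methylation values are poorly served by Gaussian models, can be illustrated with a short sketch. It assumes scipy, and simulated beta-values stand in for real methylation data; this is not the authors' dimension-reduction or clustering pipeline.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated methylation beta-values: bounded in (0, 1) and strongly skewed.
x = rng.beta(0.5, 2.0, size=1000)

# Gaussian fit versus beta fit, compared by total log-likelihood.
mu, sd = stats.norm.fit(x)
ll_norm = stats.norm.logpdf(x, mu, sd).sum()

a, b, loc, scale = stats.beta.fit(x, floc=0, fscale=1)  # keep support on (0, 1)
ll_beta = stats.beta.logpdf(x, a, b, loc, scale).sum()

print(f"Gaussian log-likelihood: {ll_norm:8.1f}")
print(f"Beta     log-likelihood: {ll_beta:8.1f}")  # markedly higher
```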

  6. Statistical wind analysis for near-space applications

    NASA Astrophysics Data System (ADS)

    Roney, Jason A.

    2007-09-01

    Statistical wind models were developed based on the existing observational wind data for near-space altitudes between 60 000 and 100 000 ft (18-30 km) above ground level (AGL) at two locations, Akron, OH, USA, and White Sands, NM, USA. These two sites are envisioned as playing a crucial role in the first flights of high-altitude airships. The analysis shown in this paper has not been previously applied to this region of the stratosphere for such an application. Standard statistics were compiled for these data such as mean, median, maximum wind speed, and standard deviation, and the data were modeled with Weibull distributions. These statistics indicated, on a yearly average, there is a lull or a “knee” in the wind between 65 000 and 72 000 ft AGL (20-22 km). From the standard statistics, trends at both locations indicated substantial seasonal variation in the mean wind speed at these heights. The yearly and monthly statistical modeling indicated that Weibull distributions were a reasonable model for the data. Forecasts and hindcasts were done by using a Weibull model based on 2004 data and comparing the model with the 2003 and 2005 data. The 2004 distribution was also a reasonable model for these years. Lastly, the Weibull distribution and cumulative function were used to predict the 50%, 95%, and 99% winds, which are directly related to the expected power requirements of a near-space station-keeping airship. These values indicated that using only the standard deviation of the mean may underestimate the operational conditions.
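
    A hedged sketch of the core procedure, fitting a Weibull distribution to wind speeds and reading off the 50%, 95%, and 99% winds, is given below. It assumes scipy and substitutes synthetic wind speeds (in m/s) for the Akron and White Sands observations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic wind speeds (m/s) standing in for the observational record.
wind = rng.weibull(2.0, size=5000) * 12.0

# Two-parameter Weibull fit with the location fixed at zero, as for wind data.
shape, loc, scale = stats.weibull_min.fit(wind, floc=0)

# Percentile winds that drive airship station-keeping power requirements.
for p in (0.50, 0.95, 0.99):
    v = stats.weibull_min.ppf(p, shape, loc=loc, scale=scale)
    print(f"{p:.0%} wind: {v:5.1f} m/s")

# The mean-plus-one-standard-deviation rule the paper cautions against:
print(f"mean + 1 sd: {wind.mean() + wind.std():.1f} m/s")
```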

  7. Statistically significant performance results of a mine detector and fusion algorithm from an x-band high-resolution SAR

    NASA Astrophysics Data System (ADS)

    Williams, Arnold C.; Pachowicz, Peter W.

    2004-09-01

    Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms are of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently overtrained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision and Electronic Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves that have been obtained through processing this multilook data for the high resolution SAR data of the Veridian X-Band radar. We discuss the implications of these results on mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.

  8. A Stochastic Fractional Dynamics Model of Space-time Variability of Rain

    NASA Technical Reports Server (NTRS)

    Kundu, Prasun K.; Travis, James E.

    2013-01-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, which allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment.

  9. Multisample adjusted U-statistics that account for confounding covariates.

    PubMed

    Satten, Glen A; Kong, Maiying; Datta, Somnath

    2018-06-19

    Multisample U-statistics encompass a wide class of test statistics that allow the comparison of 2 or more distributions. U-statistics are especially powerful because they can be applied to both numeric and nonnumeric data, eg, ordinal and categorical data where a pairwise similarity or distance-like measure between categories is available. However, when comparing the distribution of a variable across 2 or more groups, observed differences may be due to confounding covariates. For example, in a case-control study, the distribution of exposure in cases may differ from that in controls entirely because of variables that are related to both exposure and case status and are distributed differently among case and control participants. We propose to use individually reweighted data (ie, using the stratification score for retrospective data or the propensity score for prospective data) to construct adjusted U-statistics that can test the equality of distributions across 2 (or more) groups in the presence of confounding covariates. Asymptotic normality of our adjusted U-statistics is established and a closed form expression of their asymptotic variance is presented. The utility of our approach is demonstrated through simulation studies, as well as in an analysis of data from a case-control study conducted among African-Americans, comparing whether the similarity in haplotypes (ie, sets of adjacent genetic loci inherited from the same parent) occurring in a case and a control participant differs from the similarity in haplotypes occurring in 2 control participants. Copyright © 2018 John Wiley & Sons, Ltd.
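
    The following sketch illustrates the general idea with an inverse-propensity-weighted two-sample U-statistic using the Mann-Whitney kernel h(a, b) = 1{a > b}. It is a simplified stand-in for the authors' adjusted U-statistics (which also cover retrospective stratification scores), assumes scikit-learn, and uses simulated data with a confounder but no true group effect.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 500

# A confounder z drives both group membership and the outcome x;
# there is no true group effect.
z = rng.normal(size=n)
group = rng.binomial(1, 1 / (1 + np.exp(-z)))
x = z + rng.normal(size=n)

# Propensity scores from the confounder, then inverse-probability weights.
ps = LogisticRegression().fit(z.reshape(-1, 1), group).predict_proba(z.reshape(-1, 1))[:, 1]
w = np.where(group == 1, 1 / ps, 1 / (1 - ps))

x1, w1 = x[group == 1], w[group == 1]
x0, w0 = x[group == 0], w[group == 0]

# Mann-Whitney kernel h(a, b) = 1{a > b} over all cross-group pairs ...
h = (x1[:, None] > x0[None, :]).astype(float)
u_raw = h.mean()

# ... and the adjusted version, with each pair reweighted by w_i * w_j.
ww = w1[:, None] * w0[None, :]
u_adj = (h * ww).sum() / ww.sum()

print(f"unadjusted U: {u_raw:.3f}  (pulled away from 0.5 by confounding)")
print(f"adjusted   U: {u_adj:.3f}  (near 0.5: no true group effect)")
```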

  10. Using meta-regression models to systematically evaluate data in the published literature: relative contributions of agricultural drift, para-occupational, and residential use exposure pathways to house dust pesticide concentrations

    EPA Science Inventory

    Background: Data reported in the published literature have been used qualitatively to aid exposure assessment activities in epidemiologic studies. Analyzing these data in computational models presents statistical challenges because these data are often reported as summary statist...

  11. Using Internet Search Data to Produce State-Level Measures: The Case of Tea Party Mobilization

    ERIC Educational Resources Information Center

    DiGrazia, Joseph

    2017-01-01

    This study proposes using Internet search data from search engines like Google to produce state-level metrics that are useful in social science research. Generally, state-level research relies on demographic statistics, official statistics produced by government agencies, or aggregated survey data. However, each of these data sources has serious…

  12. National Institute of Statistical Sciences Data Confidentiality Technical Panel: Final Report. NCES 2011-608

    ERIC Educational Resources Information Center

    Karr, Alan

    2011-01-01

    NCES asked the National Institute of Statistical Sciences (NISS) to convene a technical panel of survey and policy experts to examine the NCES current and planned data dissemination strategies for confidential data with respect to: mandates and directives that NCES make data available; current and prospective technologies for protecting and…

  13. U.S. Timber Production, Trade, Consumption, and Price Statistics, 1965-2013

    Treesearch

    James L. Howard; Kwameka C. Jones

    2016-01-01

    This report presents annual data but is published every 2 years. The data present current and historical information on production, trade, consumption, and prices of timber products in the United States. The report focuses on national statistics but includes some data for individual states and regions and for Canada. The data were collected from industry trade...

  14. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, George

    1993-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  15. Linear regression models and k-means clustering for statistical analysis of fNIRS data.

    PubMed

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-02-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.

  16. Linear regression models and k-means clustering for statistical analysis of fNIRS data

    PubMed Central

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-01-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performances in discriminating cortical activation and superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets. PMID:25780751
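
    A minimal sketch of the two-stage idea, a per-channel linear model followed by k-means on the fitted coefficients, is shown below. It is not the authors' algorithm (which was developed for time-domain fNIRS with minimal distributional assumptions); the boxcar regressor, effect size, and channel counts are synthetic assumptions, and scipy/scikit-learn are assumed.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
n_time, n_channels = 300, 20
task = np.zeros(n_time)
task[100:200] = 1.0                      # boxcar task regressor

# Synthetic channels: the first 8 respond to the task, the rest do not.
data = rng.normal(scale=1.0, size=(n_time, n_channels))
data[:, :8] += 0.8 * task[:, None]

# Per-channel linear regression of the signal on the task regressor.
betas = np.empty(n_channels)
for ch in range(n_channels):
    fit = stats.linregress(task, data[:, ch])
    betas[ch] = fit.slope

# K-means with k=2 splits channels into "activated" vs "not activated".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(betas.reshape(-1, 1))
active = labels == labels[np.argmax(betas)]   # cluster containing the largest beta
print("activated channels:", np.where(active)[0])
```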

  17. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, Stanislav

    1992-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  18. The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison

    PubMed Central

    Sioson, Allan A; Mane, Shrinivasrao P; Li, Pinghua; Sha, Wei; Heath, Lenwood S; Bohnert, Hans J; Grene, Ruth

    2006-01-01

    Background: Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data. Results: The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per-microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. Conclusion: The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. In one experiment, MIV is a better choice than IIV as input to data normalization and statistical analysis methods, as it yields a greater number of statistically significant differentially expressed genes; TM4 does not support the choice of MIV input data. Overall, the more flexible and extensive statistical models of Expresso achieve more accurate analytical results, when judged by the yardstick of qRT-PCR data, in the context of an experimental design of modest complexity. PMID:16626497

  19. The impact of mother's literacy on child dental caries: Individual data or aggregate data analysis?

    PubMed

    Haghdoost, Ali-Akbar; Hessari, Hossein; Baneshi, Mohammad Reza; Rad, Maryam; Shahravan, Arash

    2017-01-01

    To evaluate the impact of mother's literacy on child dental caries based on a national oral health survey in Iran, and to investigate the possibility of ecological fallacy in aggregate data analysis. Existing data were from the second national oral health survey, carried out in 2004, which included 8,725 six-year-old participants. The association of mother's literacy with caries occurrence (DMF (Decayed, Missing, Filled) total score >0) in her child was assessed using individual data with a logistic regression model. Then the association between the percentage of literate mothers and the percentage of decayed teeth in each of the 30 provinces of Iran was assessed using aggregated data retrieved from the second national oral health survey of Iran and, alternatively, from the census of the Statistical Center of Iran, using a linear regression model. The significance level was set at 0.05 for all analyses. Individual data analysis showed a statistically significant association between mother's literacy and decayed teeth of children (P = 0.02, odds ratio = 0.83). There was no statistically significant association between mother's literacy and child dental caries in the aggregate data analysis of the oral health survey (P = 0.79, B = 0.03) or of the census of the Statistical Center of Iran (P = 0.60, B = 0.14). Mothers' literacy has a preventive effect on the occurrence of dental caries in children. Given the high percentage of illiterate parents in Iran, it is logical to consider methods of oral health education that do not require reading or writing. Aggregate data analysis and individual data analysis had completely different results in this study.

  20. The suitability of using death certificates as a data source for cancer mortality assessment in Turkey

    PubMed Central

    Ulus, Tumer; Yurtseven, Eray; Cavdar, Sabanur; Erginoz, Ethem; Erdogan, M. Sarper

    2012-01-01

    Aim: To compare the quality of the 2008 cancer mortality data of the Istanbul Directorate of Cemeteries (IDC) with the 2008 data of the International Agency for Research on Cancer (IARC) and the Turkish Statistical Institute (TUIK), and to discuss the suitability of using this databank for estimates of cancer mortality in the future. Methods: We used the 2008 and 2010 death records of the IDC and compared them with TUIK and IARC data. Results: According to the WHO statistics, there were an estimated 67,255 cancer deaths in Turkey in 2008. As the population of Turkey was 71,517,100, the cancer mortality rate was 9.4 per 10,000. According to the IDC statistics, the cancer mortality rate in Istanbul in 2008 was 5.97 per 10,000. Conclusion: IDC estimates were lower than WHO estimates, probably because WHO bases its estimates on a sample group and because of the restrictions of the IDC data collection method. Death certificates could be a reliable and accurate data source for mortality statistics if the problems of data collection are solved. PMID:23100210

  1. Appropriate Statistics for Determining Chance-Removed Interpractitioner Agreement.

    PubMed

    Popplewell, Michael; Reizes, John; Zaslawski, Chris

    2018-05-31

    Fleiss' Kappa (FK) has been commonly, but incorrectly, employed as the "standard" for evaluating chance-removed inter-rater agreement with ordinal data. This practice may lead to misleading conclusions in inter-rater agreement research. An example is presented that demonstrates the conditions where FK produces inappropriate results, compared with Gwet's AC2, which is proposed as a more appropriate statistic. A novel format for recording Chinese Medicine (CM) diagnoses, called the Diagnostic System of Oriental Medicine (DSOM), was used to record and compare patient diagnostic data, which, unlike the contemporary CM diagnostic format, allows agreement by chance to be considered when evaluating patient data obtained with unrestricted diagnostic options available to diagnosticians. Five CM practitioners diagnosed 42 subjects drawn from an open population. Subjects' diagnoses were recorded using the DSOM format. All the available data were initially used to evaluate agreement. Then, the subjects were sorted into three groups to demonstrate the effects of differing data marginality on the calculated chance-removed agreement. Agreement between the practitioners for each subject was evaluated with linearly weighted simple agreement, FK and Gwet's AC2. In all cases, overall agreement was much lower with FK than with Gwet's AC2. Larger differences occurred when the data were more free marginal. Inter-rater agreement determined with FK statistics is unlikely to be correct unless it can be shown that the data from which agreement is determined are, in fact, fixed marginal. It follows that results obtained on agreement between practitioners with FK are probably incorrect. It is shown that inter-rater agreement evaluated with the AC2 statistic is an appropriate measure when fixed marginal data are neither expected nor guaranteed. The AC2 statistic should be used as the standard statistical approach for determining agreement between practitioners.
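
    To make the divergence concrete, the sketch below implements Fleiss' kappa and Gwet's (unweighted) AC1, the categorical counterpart of the ordinal AC2 used in the study, on skewed, free-marginal ratings. The function name and the simulated ratings are illustrative assumptions, not the study's data.

```python
import numpy as np

def agreement_stats(ratings, q):
    """ratings: (subjects x raters) integer category codes in {0..q-1}."""
    n, r = ratings.shape
    counts = np.stack([(ratings == j).sum(axis=1) for j in range(q)], axis=1)
    # Observed chance-uncorrected agreement across all rater pairs.
    p_a = ((counts * (counts - 1)).sum(axis=1) / (r * (r - 1))).mean()
    pi = counts.sum(axis=0) / (n * r)            # overall category proportions
    pe_fleiss = (pi ** 2).sum()                  # Fleiss' chance agreement
    pe_gwet = (pi * (1 - pi)).sum() / (q - 1)    # Gwet's chance agreement
    kappa = (p_a - pe_fleiss) / (1 - pe_fleiss)
    ac1 = (p_a - pe_gwet) / (1 - pe_gwet)
    return kappa, ac1

# Skewed ("free marginal") data: near-perfect agreement on a dominant category.
rng = np.random.default_rng(4)
ratings = np.zeros((42, 5), dtype=int)
flip = rng.choice(42, size=4, replace=False)
ratings[flip, 0] = 1                             # a few single-rater disagreements

kappa, ac1 = agreement_stats(ratings, q=2)
print(f"Fleiss kappa: {kappa:.2f}   Gwet AC1: {ac1:.2f}")
```

    With roughly 96% raw agreement concentrated on one category, kappa collapses toward zero while AC1 stays near 0.96, the same direction of divergence the study reports for free-marginal diagnostic data.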

  2. Evaluation of PDA Technical Report No 33. Statistical Testing Recommendations for a Rapid Microbiological Method Case Study.

    PubMed

    Murphy, Thomas; Schwedock, Julie; Nguyen, Kham; Mills, Anna; Jones, David

    2015-01-01

    New recommendations for the validation of rapid microbiological methods have been included in the revised Technical Report 33 release from the PDA. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This case study applies those statistical methods to accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological methods system being evaluated for water bioburden testing. Results presented demonstrate that the statistical methods described in the PDA Technical Report 33 chapter can all be successfully applied to the rapid microbiological method data sets and gave the same interpretation for equivalence to the standard method. The rapid microbiological method was in general able to pass the requirements of PDA Technical Report 33, though the study shows that there can be occasional outlying results and that caution should be used when applying statistical methods to low average colony-forming unit values. Prior to use in a quality-controlled environment, any new method or technology has to be shown to work as designed by the manufacturer for the purpose required. For new rapid microbiological methods that detect and enumerate contaminating microorganisms, additional recommendations have been provided in the revised PDA Technical Report No. 33. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This paper applies those statistical methods to analyze accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological method system being validated for water bioburden testing. The case study demonstrates that the statistical methods described in the PDA Technical Report No. 33 chapter can be successfully applied to rapid microbiological method data sets and give the same comparability results for similarity or difference as the standard method. © PDA, Inc. 2015.

  3. Using assemblage data in ecological indicators: A comparison and evaluation of commonly available statistical tools

    USGS Publications Warehouse

    Smith, Joseph M.; Mather, Martha E.

    2012-01-01

    Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, then we interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the future use of these commonly available ecological and statistical methods in preparing assemblage data for use in ecological indicators.
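
    The "identify convergent groupings" step can be approximated as sketched below: run several commonly available clusterings on a site-by-species matrix and measure their pairwise agreement. This is a schematic stand-in for the authors' process (metric MDS substitutes for NMDS, and the logistic regression, DCA, and guild-interpretation steps are omitted); scikit-learn and synthetic data are assumed.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.manifold import MDS
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(5)
# Synthetic site-by-species presence/absence matrix (60 sites, 25 species).
sites = rng.binomial(1, 0.3, size=(60, 25)).astype(float)
sites[:30, :10] = rng.binomial(1, 0.8, size=(30, 10))   # one assemblage type

k = 2
labels = {
    "kmeans": KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(sites),
    "hierarchical": AgglomerativeClustering(n_clusters=k).fit_predict(sites),
    # Metric MDS used here as a stand-in for NMDS ordination.
    "mds+kmeans": KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        MDS(n_components=2, random_state=0).fit_transform(sites)),
}

# Pairwise adjusted Rand index: high values flag convergent groupings.
names = list(labels)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        ari = adjusted_rand_score(labels[names[i]], labels[names[j]])
        print(f"{names[i]:>12} vs {names[j]:<12} ARI = {ari:.2f}")
```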

  4. Statistical summaries of selected Iowa streamflow data through September 2013

    USGS Publications Warehouse

    Eash, David A.; O'Shea, Padraic S.; Weber, Jared R.; Nguyen, Kevin T.; Montgomery, Nicholas L.; Simonson, Adrian J.

    2016-01-04

    Statistical summaries of streamflow data collected at 184 streamgages in Iowa are presented in this report. All streamgages included for analysis have at least 10 years of continuous record collected before or through September 2013. This report is an update to two previously published reports that presented statistical summaries of selected Iowa streamflow data through September 1988 and September 1996. The statistical summaries include (1) monthly and annual flow durations, (2) annual exceedance probabilities of instantaneous peak discharges (flood frequencies), (3) annual exceedance probabilities of high discharges, and (4) annual nonexceedance probabilities of low discharges and seasonal low discharges. Also presented for each streamgage are graphs of the annual mean discharges, mean annual mean discharges, 50-percent annual flow-duration discharges (median flows), harmonic mean flows, mean daily mean discharges, and flow-duration curves. Two sets of statistical summaries are presented for each streamgage: (1) long-term statistics for the entire period of streamflow record and (2) recent-term statistics for the 30-year period of record from 1984 to 2013. The recent-term statistics are only calculated for streamgages with streamflow records pre-dating the 1984 water year and with at least 10 years of record during 1984–2013. The streamflow statistics in this report are not adjusted for the effects of water use; although some of this water is used consumptively, most of it is returned to the streams.
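
    A minimal sketch of two of the tabulated summaries, flow-duration percentiles and the harmonic mean flow, computed from a daily discharge series with numpy; the synthetic lognormal record below stands in for an Iowa streamgage record.

```python
import numpy as np

rng = np.random.default_rng(6)
# Synthetic daily mean discharge (cfs) for a 30-year record.
q = rng.lognormal(mean=4.0, sigma=1.0, size=30 * 365)

# Flow-duration statistics: discharge equaled or exceeded p percent of the time.
for p in (10, 50, 90):
    print(f"Q{p:02d} (exceeded {p}% of days): {np.percentile(q, 100 - p):8.1f} cfs")

# Harmonic mean flow, as tabulated in the report's statistical summaries.
harmonic = len(q) / np.sum(1.0 / q)
print(f"harmonic mean flow: {harmonic:8.1f} cfs")
```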

  5. Potential errors and misuse of statistics in studies on leakage in endodontics.

    PubMed

    Lucena, C; Lopez, J M; Pulgar, R; Abalos, C; Valderrama, M J

    2013-04-01

    To assess the quality of the statistical methodology used in studies of leakage in Endodontics, and to compare the results found using appropriate versus inappropriate inferential statistical methods. The search strategy used the descriptors 'root filling', 'microleakage', 'dye penetration', 'dye leakage', 'polymicrobial leakage' and 'fluid filtration' for the time interval 2001-2010 in journals within the categories 'Dentistry, Oral Surgery and Medicine' and 'Materials Science, Biomaterials' of the Journal Citation Report. All retrieved articles were reviewed to find potential pitfalls in statistical methodology that may be encountered during study design, data management or data analysis. The database included 209 papers. In all the studies reviewed, the statistical methods used were appropriate for the category attributed to the outcome variable, but in 41% of the cases the chi-square test or parametric methods were subsequently applied inappropriately. In 2% of the papers, no statistical test was used. In 99% of cases, a statistically 'significant' or 'not significant' effect was reported as a main finding, whilst only 1% also presented an estimate of the magnitude of the effect. When appropriate statistical methods were applied in the studies with originally inappropriate data analysis, the conclusions changed in 19% of the cases. Statistical deficiencies in leakage studies may affect their results and interpretation and might be one of the reasons for the poor agreement amongst the reported findings. Therefore, more effort should be made to standardize statistical methodology. © 2012 International Endodontic Journal.

  6. Evidence-based orthodontics. Current statistical trends in published articles in one journal.

    PubMed

    Law, Scott V; Chudasama, Dipak N; Rinchuse, Donald J

    2010-09-01

    To ascertain the number, type, and overall usage of statistics in American Journal of Orthodontics and Dentofacial Orthopedics (AJODO) articles for 2008. These data were then compared to data from three previous years: 1975, 1985, and 2003. The original articles in the AJODO for 2008 were dichotomized into those using statistics and those not using statistics. Statistical procedures were then broadly divided into descriptive statistics (mean, standard deviation, range, percentage) and inferential statistics (t-test, analysis of variance). Descriptive statistics were used to make comparisons. In 1975, 1985, 2003, and 2008, the AJODO published 72, 87, 134, and 141 original articles, respectively. The percentage of original articles using statistics was 43.1% in 1975, 75.9% in 1985, 94.0% in 2003, and 92.9% in 2008; the proportion of original articles using statistics stayed relatively constant from 2003 to 2008, with only a small 1.1% decrease. The percentage of articles using inferential statistical analyses was 23.7% in 1975, 74.2% in 1985, 92.9% in 2003, and 84.4% in 2008. Comparing AJODO publications in 2003 and 2008, there was an 8.5% increase in articles using only descriptive statistics (from 7.1% to 15.6%) and an 8.5% decrease in articles using inferential statistics (from 92.9% to 84.4%).

  7. Differences in Performance Among Test Statistics for Assessing Phylogenomic Model Adequacy.

    PubMed

    Duchêne, David A; Duchêne, Sebastian; Ho, Simon Y W

    2018-05-18

    Statistical phylogenetic analyses of genomic data depend on models of nucleotide or amino acid substitution. The adequacy of these substitution models can be assessed using a number of test statistics, allowing the model to be rejected when it is found to provide a poor description of the evolutionary process. A potentially valuable use of model-adequacy test statistics is to identify when data sets are likely to produce unreliable phylogenetic estimates, but their differences in performance are rarely explored. We performed a comprehensive simulation study to identify test statistics that are sensitive to some of the most commonly cited sources of phylogenetic estimation error. Our results show that, for many test statistics, traditional thresholds for assessing model adequacy can fail to reject the model when the phylogenetic inferences are inaccurate and imprecise. This is particularly problematic when analysing loci that have few variable informative sites. We propose new thresholds for assessing substitution model adequacy and demonstrate their effectiveness in analyses of three phylogenomic data sets. These thresholds lead to frequent rejection of the model for loci that yield topological inferences that are imprecise and are likely to be inaccurate. We also propose the use of a summary statistic that provides a practical assessment of overall model adequacy. Our approach offers a promising means of enhancing model choice in genome-scale data sets, potentially leading to improvements in the reliability of phylogenomic inference.

  8. Hunting statistics: what data for what use? An account of an international workshop

    USGS Publications Warehouse

    Nichols, J.D.; Lancia, R.A.; Lebreton, J.D.

    2001-01-01

    Hunting interacts with the underlying dynamics of game species in several different ways and is, at the same time, a source of valuable information not easily obtained from populations that are not subjected to hunting. Specific questions, including the sustainability of hunting activities, can be addressed using hunting statistics. Such investigations will frequently require that hunting statistics be combined with data from other sources of population-level information. Such reflections served as a basis for the meeting, "Hunting Statistics: What Data for What Use?," held on January 15-18, 2001, in Saint-Benoist, France. We review here the 20 talks held during the workshop and the contribution of hunting statistics to our knowledge of the population dynamics of game species. Three specific topics (adaptive management, catch-effort models, and dynamics of exploited populations) were highlighted as important themes and are more extensively presented as boxes.

  9. Generic Feature Selection with Short Fat Data

    PubMed Central

    Clarke, B.; Chu, J.-H.

    2014-01-01

    SUMMARY: Consider a regression problem in which there are many more explanatory variables than data points, i.e., p ≫ n. Essentially, without reducing the number of variables inference is impossible. So, we group the p explanatory variables into blocks by clustering, evaluate statistics on the blocks and then regress the response on these statistics under a penalized error criterion to obtain estimates of the regression coefficients. We examine the performance of this approach for a variety of choices of n, p, classes of statistics, clustering algorithms, penalty terms, and data types. When n is not large, the discrimination over number of statistics is weak, but computations suggest regressing on approximately [n/K] statistics where K is the number of blocks formed by a clustering algorithm. Small deviations from this are observed when the blocks of variables are of very different sizes. Larger deviations are observed when the penalty term is an Lq norm with high enough q. PMID:25346546
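
    A sketch of the block-statistics regression under stated assumptions: the columns of X are clustered into K blocks, each block is summarized by its mean, and the response is regressed on the K summaries with an L1 penalty. The choice of k-means, the block mean as the statistic, and LassoCV are illustrative, not the paper's prescription; scikit-learn is assumed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)
n, p, K = 50, 500, 10          # "short fat" data: p >> n

X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:20] = 1.0                # only the first 20 variables matter
y = X @ beta + rng.normal(size=n)

# Group the p explanatory variables into K blocks by clustering the columns of X.
blocks = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X.T)

# One summary statistic per block (here the block mean), giving an n x K matrix.
Z = np.column_stack([X[:, blocks == k].mean(axis=1) for k in range(K)])

# Penalized regression of the response on the block statistics.
model = LassoCV(cv=5).fit(Z, y)
print("selected blocks:", np.where(model.coef_ != 0)[0])
```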

  10. Are Assumptions of Well-Known Statistical Techniques Checked, and Why (Not)?

    PubMed Central

    Hoekstra, Rink; Kiers, Henk A. L.; Johnson, Addie

    2012-01-01

    A valid interpretation of most statistical techniques requires that one or more assumptions be met. In published articles, however, little information tends to be reported on whether the data satisfy the assumptions underlying the statistical techniques used. This could be due to self-selection: Only manuscripts with data fulfilling the assumptions are submitted. Another explanation could be that violations of assumptions are rarely checked for in the first place. We studied whether and how 30 researchers checked fictitious data for violations of assumptions in their own working environment. Participants were asked to analyze the data as they would their own data, for which often used and well-known techniques such as the t-procedure, ANOVA and regression (or non-parametric alternatives) were required. It was found that the assumptions of the techniques were rarely checked, and that if they were, it was regularly by means of a statistical test. Interviews afterward revealed a general lack of knowledge about assumptions, the robustness of the techniques with regards to the assumptions, and how (or whether) assumptions should be checked. These data suggest that checking for violations of assumptions is not a well-considered choice, and that the use of statistics can be described as opportunistic. PMID:22593746
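
    For contrast with the opportunistic practice described above, here is a minimal example of checking assumptions before choosing a test (scipy assumed; the 0.05 screening threshold and the simulated groups are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
a = rng.normal(0.0, 1.0, size=40)
b = rng.exponential(1.0, size=40)      # a clearly non-normal group

# Check normality (Shapiro-Wilk) and equal variances (Levene) first.
for name, x in (("a", a), ("b", b)):
    res = stats.shapiro(x)
    print(f"Shapiro-Wilk {name}: p = {res.pvalue:.3f}")
print(f"Levene: p = {stats.levene(a, b).pvalue:.3f}")

# Choose the test accordingly: t-procedure if assumptions hold, otherwise a
# non-parametric alternative such as Mann-Whitney.
if stats.shapiro(a).pvalue > 0.05 and stats.shapiro(b).pvalue > 0.05:
    print(stats.ttest_ind(a, b, equal_var=False))
else:
    print(stats.mannwhitneyu(a, b))
```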

  11. 75 FR 39265 - National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-08

    ... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Prevention, Classifications and Public Health Data Standards, 3311 Toledo Road, Room 2337, Hyattsville, MD...

  12. 78 FR 53148 - National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-28

    ... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Administrator, Classifications and Public Health Data Standards Staff, NCHS, 3311 Toledo Road, Room 2337...

  13. 75 FR 56549 - National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-09-16

    ... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Disease Control and Prevention National Center for Health Statistics (NCHS), Classifications and Public Health Data Standards Staff, Announces the... Public Health Data Standards Staff, NCHS, 3311 Toledo Road, Room 2337, Hyattsville, Maryland 20782, e...

  14. World Population: Facts in Focus. World Population Data Sheet Workbook. Population Learning Series.

    ERIC Educational Resources Information Center

    Crews, Kimberly A.

    This workbook teaches population analysis using world population statistics. To complete the four student activity sheets, the students refer to the included "1988 World Population Data Sheet" which lists nations' statistical data that includes population totals, projected population, birth and death rates, fertility levels, and the…

  15. Secondary Analysis of National Longitudinal Transition Study 2 Data

    ERIC Educational Resources Information Center

    Hicks, Tyler A.; Knollman, Greg A.

    2015-01-01

    This review examines published secondary analyses of National Longitudinal Transition Study 2 (NLTS2) data, with a primary focus upon statistical objectives, paradigms, inferences, and methods. Its primary purpose was to determine which statistical techniques have been common in secondary analyses of NLTS2 data. The review begins with an…

  16. 76 FR 12996 - Data Users Advisory Committee, Notice of Meeting and Agenda

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-03-09

    ... DEPARTMENT OF LABOR Bureau of Labor Statistics Data Users Advisory Committee, Notice of Meeting... advice to the Bureau of Labor Statistics from the points of view of data users from various sectors of the U.S. economy, including the labor, business, research, academic, and government communities, on...

  17. Commentary to Library Statistical Data Base.

    ERIC Educational Resources Information Center

    Jones, Dennis; And Others

    The National Center for Higher Education Management Systems (NCHEMS) has developed a library statistical data base which concentrates on the management information needs of administrators of public and academic libraries. This document provides an overview of the framework and conceptual approach employed in the design of the data base. The data…

  18. 2015 Workplace and Gender Relations Survey of Reserve Component Members: Statistical Methodology Report

    DTIC Science & Technology

    2016-03-17

    2015 Workplace and Gender Relations Survey of Reserve Component Members: Statistical Methodology Report. Defense Research, Surveys, and Statistics Center (RSSC), Defense Manpower Data Center (DMDC). … Survey outcomes were modeled as a function of an extensive set of administrative variables available for both respondents and nonrespondents, resulting in six…

  19. A Guide to Disability Statistics from the Behavioral Risk Factors Surveillance System. Disability Statistics User Guide Series

    ERIC Educational Resources Information Center

    Erickson, William A.; Dumoulin-Smith, Adrien

    2009-01-01

    The mission of the Cornell StatsRRTC is to bridge the divide between the sources of disability data and the users of disability statistics. One product of this effort is a set of "User Guides" to national survey data that collect information on the disability population. The purpose of each of the "User Guides" is to provide…

  20. The Dependence of Strength in Plastics upon Polymer Chain Length and Chain Orientation: An Experiment Emphasizing the Statistical Handling and Evaluation of Data.

    ERIC Educational Resources Information Center

    Spencer, R. Donald

    1984-01-01

    Describes an experiment (using plastic bags) designed to give students practical understanding on using statistics to evaluate data and how statistical treatment of experimental results can enhance their value in solving scientific problems. Students also gain insight into the orientation and structure of polymers by examining the plastic bags.…

  1. Bildung in der Europaischen Union: Daten und Kennzahlen = Education across the European Union: Statistics and Indicators = Education dans l'Union Europeenne: Statistiques et Indicateurs, 1996.

    ERIC Educational Resources Information Center

    EURYDICE European Unit, Brussels (Belgium).

    This publication provides comparable statistics and indicators on education across the 15 member states of the European Union. The main source of data is the joint UOE (UNESCO, Organisation for Economic Co-operation and Development, Eurostat) revised questionnaire on education statistics introduced in 1995. Educational attainment information draws on data from the…

  2. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, real-time PCR based transgene copy number estimation tends to be ambiguous and subjective, stemming from the lack of proper statistical analysis and data quality control needed to render a reliable copy number estimate with a prediction value. Despite recent progress in the statistical analysis of real-time PCR, few publications have integrated these advancements into real-time PCR based transgene copy number determination. Three experimental designs and four statistical models with integrated data quality control are presented. For the first method, external calibration curves are established for the transgene based on serially diluted templates. The Ct numbers from a control transgenic event and a putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two-group t-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of the transgene was compared with that of the internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with the reference gene without a standard curve, based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches to amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow real-time PCR-based transgene copy number estimation to be more reliable and precise. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied to other real-time PCR-based quantification assays, including transfection efficiency analysis and pathogen quantification.
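
    The first experimental design, an external calibration curve plus a Ct comparison against a single-copy control event, can be sketched as follows. The Ct values are dummies, the doubling rule at the end assumes near-100% amplification efficiency, and the authors' full treatment adds regression/ANOVA modeling and quality control; scipy is assumed.

```python
import numpy as np
from scipy import stats

# External calibration curve: Ct vs log10(template amount) for serial dilutions.
log_template = np.log10([1e1, 1e2, 1e3, 1e4, 1e5])
ct = np.array([33.1, 29.8, 26.4, 23.1, 19.7])      # dummy Ct values

fit = stats.linregress(log_template, ct)
efficiency = 10 ** (-1 / fit.slope) - 1            # amplification efficiency
print(f"slope {fit.slope:.2f}, R^2 {fit.rvalue**2:.3f}, efficiency {efficiency:.1%}")

# Copy number of a putative event relative to a single-copy control event:
# each extra copy shifts Ct by about one doubling.
ct_control, ct_event = 24.0, 23.0
delta_ct = ct_control - ct_event
copies = 2 ** delta_ct                             # assumes ~100% efficiency
print(f"estimated copy number relative to control: {copies:.1f}")
```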

  3. Paleontology and Darwin's Theory of Evolution: The Subversive Role of Statistics at the End of the 19th Century.

    PubMed

    Tamborini, Marco

    2015-11-01

    This paper examines the subversive role of statistics in paleontology at the end of the 19th and the beginning of the 20th centuries. In particular, I focus on German paleontology and its relationship with statistics. I argue that in paleontology the quantitative method was questioned and strongly limited by the first decade of the 20th century because, as its opponents noted, when the fossil record was treated statistically, it generated results openly in conflict with the Darwinian theory of evolution. Essentially, statistics questioned the gradual mode of evolution and the role of natural selection. The main objections to statistics were addressed during the meetings at the Kaiserlich-Königliche Geologische Reichsanstalt in Vienna in the 1880s. After introducing the statistical treatment of the fossil record, I use the works of Charles Léo Lesquereux (1806-1889), Joachim Barrande (1799-1883), and Henry Shaler Williams (1847-1918) to compare the objections raised in Vienna with how the statistical treatment of the data worked in practice. Furthermore, I discuss the criticisms of Melchior Neumayr (1845-1890), one of the leading German opponents of statistical paleontology, to show why, and to what extent, statistics were questioned in Vienna. The final part of this paper considers what paleontologists can derive from a statistical notion of data: the necessity of opening a discussion about the completeness and nature of paleontological data. The Vienna discussion about which method paleontologists should follow offers an interesting case study for understanding the epistemic tensions within paleontology surrounding Darwin's theory, as well as the variety of non-Darwinian alternatives that emerged from the statistical treatment of the fossil record at the end of the 19th century.

  4. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets

    PubMed Central

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-01-01

    Purpose: With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and inter-institutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that address both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining the use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. Results: The approach provides the data needed to evaluate combinations of statistical measurements for their ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operating characteristic curves to identify a threshold value and combines the use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
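
    The filtering pipeline described here, an ROC-derived threshold followed by contingency-table and distributional tests, can be sketched in a few lines. The following is a minimal illustration on synthetic dose-outcome data, with Youden's J as one plausible way to pick the ROC threshold; it is not the authors' C#.Net/R implementation:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical outcomes data: a dose metric and a binary complication flag.
    dose = rng.uniform(0, 60, 200)
    complication = rng.random(200) < 1 / (1 + np.exp(-(dose - 35) / 5))

    # Scan candidate thresholds and pick the one maximizing Youden's J
    # (sensitivity + specificity - 1), a simple ROC-based threshold choice.
    def youden(t):
        pred = dose > t
        sens = np.mean(pred[complication])
        spec = np.mean(~pred[~complication])
        return sens + spec - 1
    best = max(np.linspace(dose.min(), dose.max(), 100), key=youden)

    # Filter step: test whether groups split at the threshold differ.
    above, below = complication[dose > best], complication[dose <= best]
    table = [[above.sum(), len(above) - above.sum()],
             [below.sum(), len(below) - below.sum()]]
    _, p_fisher = stats.fisher_exact(table)
    _, p_welch = stats.ttest_ind(dose[complication], dose[~complication], equal_var=False)
    _, p_ks = stats.ks_2samp(dose[complication], dose[~complication])
    print(f"threshold={best:.1f}, Fisher p={p_fisher:.3g}, "
          f"Welch p={p_welch:.3g}, KS p={p_ks:.3g}")
    ```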

  5. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets.

    PubMed

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-11-01

    With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and inter-institutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that address both issues. A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining the use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. The approach provides the data needed to evaluate combinations of statistical measurements for their ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operating characteristic curves to identify a threshold value and combines the use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. The work demonstrates the viability of the design approach and the software tool for analysis of large data sets.

  6. The Profile of Creativity and Proposing Statistical Problem Quality Level Reviewed From Cognitive Style

    NASA Astrophysics Data System (ADS)

    Awi; Ahmar, A. S.; Rahman, A.; Minggi, I.; Mulbar, U.; Asdar; Ruslan; Upu, H.; Alimuddin; Hamda; Rosidah; Sutamrin; Tiro, M. A.; Rusli

    2018-01-01

    This research aims to reveal the profile of the level of creativity and the ability to pose statistical problems among students of the Mathematics Education 2014 batch at the State University of Makassar, in terms of their cognitive style. The research uses an explorative qualitative method, providing metacognitive scaffolding during the study. The research hypothesis is that students with a field independent (FI) cognitive style, when posing statistical problems from the provided information, are already able to propose solvable problems that introduce new data, and such problems qualify as high-quality statistical problems, whereas students with a field dependent (FD) cognitive style are generally still limited to posing solvable problems that do not introduce new data, and such problems qualify as medium-quality statistical problems.

  7. Titanic: A Statistical Exploration.

    ERIC Educational Resources Information Center

    Takis, Sandra L.

    1999-01-01

    Uses the available data about the Titanic's passengers to interest students in exploring categorical data and the chi-square distribution. Describes activities incorporated into a statistics class and gives additional resources for collecting information about the Titanic. (ASK)
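
    As a flavor of the classroom activity, a chi-square test of independence between passenger class and survival takes only a few lines. The counts below are illustrative figures close to commonly cited Titanic tallies, not an authoritative data set:

    ```python
    import numpy as np
    from scipy import stats

    # Survived vs. perished counts by passenger class (illustrative figures).
    observed = np.array([[202, 123],   # 1st class: survived, perished
                         [118, 167],   # 2nd class
                         [178, 528]])  # 3rd class

    # Chi-square test of independence between class and survival.
    chi2, p, dof, expected = stats.chi2_contingency(observed)
    print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.2e}")
    ```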

  8. Illinois travel statistics, 2009

    DOT National Transportation Integrated Search

    2010-01-01

    The 2009 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  9. Illinois travel statistics, 2001

    DOT National Transportation Integrated Search

    2002-01-01

    The 2001 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  10. Illinois travel statistics, 2003

    DOT National Transportation Integrated Search

    2004-01-01

    The 2003 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  11. Illinois travel statistics, 2010

    DOT National Transportation Integrated Search

    2011-01-01

    The 2010 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  12. Illinois travel statistics, 2005

    DOT National Transportation Integrated Search

    2006-01-01

    The 2005 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  13. Illinois travel statistics, 2007

    DOT National Transportation Integrated Search

    2008-01-01

    The 2007 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  14. Illinois travel statistics, 2000

    DOT National Transportation Integrated Search

    2001-01-01

    The 2000 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  15. Illinois travel statistics, 2006

    DOT National Transportation Integrated Search

    2007-01-01

    The 2006 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  16. Illinois travel statistics, 2002

    DOT National Transportation Integrated Search

    2003-01-01

    The 2002 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  17. Illinois travel statistics, 2008

    DOT National Transportation Integrated Search

    2009-01-01

    The 2008 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  18. Illinois travel statistics, 2004

    DOT National Transportation Integrated Search

    2005-01-01

    The 2004 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  19. Illinois travel statistics, 1999

    DOT National Transportation Integrated Search

    2000-01-01

    The 1999 Illinois Travel Statistics publication is assembled to provide detailed traffic information to the different users of traffic data. While most users of traffic data at this level of detail are within the Illinois Department of Transporta...

  20. “Plateau”-related summary statistics are uninformative for comparing working memory models

    PubMed Central

    van den Berg, Ronald; Ma, Wei Ji

    2014-01-01

    Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon. Zhang and Luck (2008) and Anderson, Vogel, and Awh (2011) noticed that as more items need to be remembered, “memory noise” seems to first increase and then reach a “stable plateau.” They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided, at most, 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. At realistic numbers of trials, plateau-related summary statistics are completely unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (2011), we found that the evidence in the summary statistics was, at most, 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. These findings call into question claims about working memory that are based on summary statistics. PMID:24719235
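
    A toy illustration of the core point, that a summary statistic can retain only a sliver of the evidence carried by the raw data, can be built with two simple Gaussian stand-in models. These are not the slot/slotless working-memory models compared in the paper; the sketch only shows how evidence is lost when a non-sufficient summary replaces the raw observations:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    # Two toy stand-in models: responses are Gaussian with sigma = 1.0 under
    # model A and sigma = 1.3 under model B (invented for illustration).
    sigma_a, sigma_b, n = 1.0, 1.3, 500
    x = rng.normal(0, sigma_a, n)          # "raw data", truly from model A

    # Log Bayes factor from the raw data: sum of pointwise log-likelihood ratios.
    lbf_raw = np.sum(stats.norm.logpdf(x, 0, sigma_a)
                     - stats.norm.logpdf(x, 0, sigma_b))

    # Log Bayes factor from one summary statistic, the sample mean, whose
    # sampling distribution is N(0, sigma^2/n) under either model. Because the
    # mean says almost nothing about the spread, most evidence is thrown away.
    xbar = x.mean()
    lbf_mean = (stats.norm.logpdf(xbar, 0, sigma_a / np.sqrt(n))
                - stats.norm.logpdf(xbar, 0, sigma_b / np.sqrt(n)))

    print(f"log BF from raw data:    {lbf_raw:6.2f}")
    print(f"log BF from sample mean: {lbf_mean:6.2f}")
    ```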

  1. Evaluating national cause-of-death statistics: principles and application to the case of China.

    PubMed Central

    Rao, Chalapati; Lopez, Alan D.; Yang, Gonghuan; Begg, Stephen; Ma, Jiemin

    2005-01-01

    Mortality statistics systems provide basic information on the levels and causes of mortality in populations. Only a third of the world's countries have complete civil registration systems that yield adequate cause-specific mortality data for health policy-making and monitoring. This paper describes the development of a set of criteria for evaluating the quality of national mortality statistics and applies them to China as an example. The criteria cover a range of structural, statistical and technical aspects of national mortality data. Little is known about cause-of-death data in China, which is home to roughly one-fifth of the world's population. These criteria were used to evaluate the utility of data from two mortality statistics systems in use in China, namely the Ministry of Health-Vital Registration (MOH-VR) system and the Disease Surveillance Point (DSP) system. We concluded that mortality registration was incomplete in both. No statistics were available for geographical subdivisions of the country to inform resource allocation or for the monitoring of health programmes. Compilation and publication of statistics is irregular in the case of the DSP, and they are not made publicly available at all by the MOH-VR. More research is required to measure the content validity of cause-of-death attribution in the two systems, especially due to the use of verbal autopsy methods in rural areas. This framework of criteria-based evaluation is recommended for the evaluation of national mortality data in developing countries to determine their utility and to guide efforts to improve their value for guiding policy. PMID:16184281

  2. SPA- STATISTICAL PACKAGE FOR TIME AND FREQUENCY DOMAIN ANALYSIS

    NASA Technical Reports Server (NTRS)

    Brownlow, J. D.

    1994-01-01

    The need for statistical analysis often arises when data is in the form of a time series. This type of data is usually a collection of numerical observations made at specified time intervals. Two kinds of analysis may be performed on the data. First, the time series may be treated as a set of independent observations using a time domain analysis to derive the usual statistical properties including the mean, variance, and distribution form. Secondly, the order and time intervals of the observations may be used in a frequency domain analysis to examine the time series for periodicities. In almost all practical applications, the collected data is actually a mixture of the desired signal and a noise signal which is collected over a finite time period with a finite precision. Therefore, any statistical calculations and analyses are actually estimates. The Spectrum Analysis (SPA) program was developed to perform a wide range of statistical estimation functions. SPA can provide the data analyst with a rigorous tool for performing time and frequency domain studies. In a time domain statistical analysis the SPA program will compute the mean, variance, standard deviation, mean square, and root mean square. It also lists the data maximum, data minimum, and the number of observations included in the sample. In addition, a histogram of the time domain data is generated, a normal curve is fit to the histogram, and a goodness-of-fit test is performed. These time domain calculations may be performed on both raw and filtered data. For a frequency domain statistical analysis the SPA program computes the power spectrum, cross spectrum, coherence, phase angle, amplitude ratio, and transfer function. The estimates of the frequency domain parameters may be smoothed with the use of Hann-Tukey, Hamming, Bartlett, or moving average windows. Various digital filters are available to isolate data frequency components. Frequency components with periods longer than the data collection interval are removed by least-squares detrending. As many as ten channels of data may be analyzed at one time. Both tabular and plotted output may be generated by the SPA program. This program is written in FORTRAN IV and has been implemented on a CDC 6000 series computer with a central memory requirement of approximately 142K (octal) of 60 bit words. This core requirement can be reduced by segmentation of the program. The SPA program was developed in 1978.
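
    As a rough modern analogue of SPA's two analysis modes, the sketch below computes the time-domain summaries, a goodness-of-fit test against a fitted normal curve, and a windowed power spectrum for a synthetic series. The NumPy/SciPy calls are stand-ins, not the original FORTRAN IV code, and the signal parameters are invented:

    ```python
    import numpy as np
    from scipy import signal, stats

    rng = np.random.default_rng(2)

    # Synthetic time series: a 5 Hz sinusoid plus noise, sampled at 100 Hz.
    fs = 100.0
    t = np.arange(0, 10, 1 / fs)
    x = np.sin(2 * np.pi * 5 * t) + rng.normal(0, 0.5, t.size)

    # Time-domain statistics of the kind SPA reports.
    print(f"mean={x.mean():.3f}  var={x.var():.3f}  std={x.std():.3f}")
    print(f"rms={np.sqrt(np.mean(x**2)):.3f}  min={x.min():.3f}  max={x.max():.3f}")

    # Goodness-of-fit of a normal curve to the data.
    stat, p = stats.kstest(x, "norm", args=(x.mean(), x.std()))
    print(f"KS test against fitted normal: p = {p:.3f}")

    # Frequency domain: power spectrum with a Hamming window (one of the
    # smoothing windows SPA offers), after least-squares detrending.
    f, pxx = signal.welch(signal.detrend(x), fs=fs, window="hamming", nperseg=256)
    print(f"spectral peak at {f[np.argmax(pxx)]:.1f} Hz")
    ```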

  3. An Exploration of Student Attitudes and Satisfaction in a GAISE-Influenced Introductory Statistics Course

    ERIC Educational Resources Information Center

    Paul, Warren; Cunnington, R. Clare

    2017-01-01

    We used the Survey of Attitudes Toward Statistics to (1) evaluate, using presemester data, the Students' Attitudes Toward Statistics Model (SATS-M), and (2) test the effect on attitudes of an introductory statistics course redesigned according to the Guidelines for Assessment and Instruction in Statistics Education (GAISE) by examining the change in…

  4. Statistical Modeling for Radiation Hardness Assurance

    NASA Technical Reports Server (NTRS)

    Ladbury, Raymond L.

    2014-01-01

    We cover the models and statistics associated with single event effects (and total ionizing dose), why we need them, and how to use them: what models are used, what errors exist in real test data, and what the models allow us to say about the device under test (DUT). In addition, we cover how to use other sources of data, such as historical, heritage, and similar-part data, and how to apply experience, physics, and expert opinion to the analysis. Also included are concepts of Bayesian statistics, data fitting, and bounding rates.

  5. Joint Services Electronics Program Annual Progress Report.

    DTIC Science & Technology

    1985-11-01

    Experiments with (one-symbol memory) adaptive Huffman codes were performed, and the compression achieved was compared with that of Ziv-Lempel coding. As was expected… Work units in the information systems area include Real Time Statistical Data Processing (T. Kailath) and Data Compression for Computer Data Structures (J. Gill).

  6. Evaluation of SLAR and thematic mapper MSS data for forest cover mapping using computer-aided analysis techniques

    NASA Technical Reports Server (NTRS)

    Hoffer, R. M. (Principal Investigator); Knowlton, D. J.; Dean, M. E.

    1981-01-01

    A set of training statistics for the 30 meter resolution simulated thematic mapper MSS data was generated based on land use/land cover classes. In addition to this supervised data set, a nonsupervised multicluster block of training statistics is being defined in order to compare the classification results and evaluate the effect of the different training selection methods on classification performance. Two test data sets, defined using a stratified sampling procedure incorporating a grid system with dimensions of 50 lines by 50 columns, and another set based on an analyst supervised set of test fields were used to evaluate the classifications of the TMS data. The supervised training data set generated training statistics, and a per point Gaussian maximum likelihood classification of the 1979 TMS data was obtained. The August 1980 MSS data was radiometrically adjusted. The SAR data was redigitized and the SAR imagery was qualitatively analyzed.
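
    The per-point Gaussian maximum likelihood classification referred to above can be sketched compactly: estimate a mean vector and covariance matrix per class from training statistics, then assign each pixel to the class maximizing the Gaussian log-likelihood. The sketch below uses made-up two-band data and is not the original analysis software:

    ```python
    import numpy as np

    def train(pixels, labels):
        """Estimate a mean vector and covariance matrix per class."""
        class_stats = {}
        for c in np.unique(labels):
            x = pixels[labels == c]
            class_stats[c] = (x.mean(axis=0), np.cov(x, rowvar=False))
        return class_stats

    def classify(pixels, class_stats):
        """Assign each pixel to the class with maximum Gaussian log-likelihood."""
        classes = sorted(class_stats)
        scores = []
        for c in classes:
            mu, cov = class_stats[c]
            inv, (_, logdet) = np.linalg.inv(cov), np.linalg.slogdet(cov)
            d = pixels - mu
            ll = -0.5 * (logdet + np.einsum("ij,jk,ik->i", d, inv, d))
            scores.append(ll)
        return np.array(classes)[np.argmax(scores, axis=0)]

    # Tiny illustration with two spectral bands and two cover classes.
    rng = np.random.default_rng(3)
    train_px = np.vstack([rng.normal([30, 60], 5, (50, 2)),
                          rng.normal([70, 40], 5, (50, 2))])
    train_lb = np.repeat([0, 1], 50)
    model = train(train_px, train_lb)
    print(classify(rng.normal([30, 60], 5, (5, 2)), model))  # mostly class 0
    ```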

  7. PANDA-view: An easy-to-use tool for statistical analysis and visualization of quantitative proteomics data.

    PubMed

    Chang, Cheng; Xu, Kaikun; Guo, Chaoping; Wang, Jinxia; Yan, Qi; Zhang, Jian; He, Fuchu; Zhu, Yunping

    2018-05-22

    While numerous software tools have been developed for the identification and quantification of -omics data, there remains a lack of suitable tools for both downstream analysis and data visualization. To help researchers better understand the biological meaning in their -omics data, we present an easy-to-use tool, named PANDA-view, for both statistical analysis and visualization of quantitative proteomics data and other -omics data. PANDA-view contains various kinds of analysis methods, such as normalization, missing-value imputation, statistical tests, clustering and principal component analysis, as well as the most commonly used data visualization methods, including an interactive volcano plot. Additionally, it provides user-friendly interfaces for protein-peptide-spectrum representation of the quantitative proteomics data. PANDA-view is freely available at https://sourceforge.net/projects/panda-view/. Contact: 1987ccpacer@163.com and zhuyunping@gmail.com. Supplementary data are available at Bioinformatics online.

  8. WIND Toolkit Offshore Summary Dataset

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Draxl, Caroline; Musial, Walt; Scott, George

    This dataset contains summary statistics for offshore wind resources for the continental United States derived from the Wind Integration National Dataset (WIND) Toolkit. These data are available in two formats: GDB - Compressed geodatabases containing statistical summaries aligned with lease blocks (aliquots) stored in a GIS format. These data are partitioned into Pacific, Atlantic, and Gulf resource regions. HDF5 - Statistical summaries of all points in the offshore Pacific, Atlantic, and Gulf offshore regions. These data are located on the original WIND Toolkit grid and have not been reassigned or downsampled to lease blocks. These data were developed under contract by NREL for the Bureau of Ocean Energy Management (BOEM).

  9. Learning investment indicators through data extension

    NASA Astrophysics Data System (ADS)

    Dvořák, Marek

    2017-07-01

    Stock prices in the form of time series were analysed using univariate and multivariate statistical methods. After simple preprocessing in the form of logarithmic differences, we augmented the univariate time series to a multivariate representation. The method uses sliding windows to calculate several dozen new variables from simple statistics, such as the first and second moments, as well as more complicated statistics, such as autoregression coefficients and residual analysis, followed by an optional quadratic transformation used for further data extension. These served as explanatory variables in a regularized logistic LASSO regression that estimated a Buy-Sell Index (BSI) from real stock market data.
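
    The augmentation step lends itself to a compact sketch: compute moment and autocorrelation features over a sliding window, then fit an L1-regularized logistic regression. Everything below, including the window length, the feature set, the synthetic price series, and the next-step-up label standing in for the BSI, is illustrative rather than the paper's actual pipeline:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)

    # Toy price series -> log-differences, as in the preprocessing step above.
    price = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 600)))
    r = np.diff(np.log(price))

    # Augment the univariate series with sliding-window statistics.
    W = 20
    rows, target = [], []
    for i in range(W, len(r) - 1):
        w = r[i - W:i]
        ar1 = np.corrcoef(w[:-1], w[1:])[0, 1]      # lag-1 autocorrelation
        rows.append([w.mean(), w.std(), w.min(), w.max(), ar1])
        target.append(r[i + 1] > 0)                 # stand-in for a buy/sell label
    X, y = np.array(rows), np.array(target)

    # L1-regularized (LASSO-type) logistic regression on the extended features.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(X[:-100], y[:-100])
    print("held-out accuracy:", clf.score(X[-100:], y[-100:]))
    print("nonzero coefficients:", int(np.sum(clf.coef_ != 0)))
    ```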

  10. IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics

    PubMed Central

    2016-01-01

    Background We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. Objective To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. Methods The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Results Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. Conclusions IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise. PMID:27729304

  11. IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics.

    PubMed

    Hoyt, Robert Eugene; Snider, Dallas; Thompson, Carla; Mantravadi, Sarita

    2016-10-11

    We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix. IBMWA is a new alternative for data analytics software that automates descriptive, predictive, and visual analytics. This program is very user-friendly but requires data preprocessing, statistical conceptual understanding, and domain expertise.

  12. ParallABEL: an R library for generalized parallelization of genome-wide association studies.

    PubMed

    Sangket, Unitsa; Mahasirimongkol, Surakameth; Chantratita, Wasun; Tandayya, Pichaya; Aulchenko, Yurii S

    2010-04-29

    Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. However, acquiring the necessary programming skills to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files is arduous. Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP) or trait, such as SNP characterization statistics or association test statistics; the input data of this group are the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example the summary statistics of genotype quality for each sample; the input data of this group are the individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses; the input data of this group are pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as linkage disequilibrium characterisation; the input data of this group are pairs of SNPs. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped at 545,080 SNPs, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was reduced almost linearly, from approximately eight hours to one hour, when ParallABEL employed eight processors. Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL.
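
    The first computation group, one independent statistic per SNP, is the easiest to parallelize: partition the SNPs across workers and merge the results. The sketch below is a Python multiprocessing analogue of that idea, not the ParallABEL/Rmpi API, with synthetic genotypes and a simple regression-based association statistic:

    ```python
    import numpy as np
    from multiprocessing import Pool
    from scipy import stats

    # Synthetic data: 0/1/2 allele counts and a quantitative phenotype.
    rng = np.random.default_rng(5)
    n_ind, n_snp = 500, 2000
    genotypes = rng.integers(0, 3, (n_ind, n_snp))
    phenotype = rng.normal(0, 1, n_ind)

    def snp_test(j):
        """Per-SNP association statistic: regress phenotype on genotype."""
        slope, _, _, p, _ = stats.linregress(genotypes[:, j], phenotype)
        return j, slope, p

    if __name__ == "__main__":
        with Pool(4) as pool:                  # partition SNPs over 4 workers
            results = pool.map(snp_test, range(n_snp))
        print("smallest p-value:", min(r[2] for r in results))
    ```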

  13. Analysis of ground-water data for selected wells near Holloman Air Force Base, New Mexico, 1950-95

    USGS Publications Warehouse

    Huff, G.F.

    1996-01-01

    Ground-water-level, ground-water-withdrawal, and ground-water-quality data were evaluated for trends. Holloman Air Force Base is located in the west-central part of Otero County, New Mexico. Ground-water-data analyses include assembly and inspection of U.S. Geological Survey and Holloman Air Force Base data, including ground-water-level data for public-supply and observation wells and withdrawal and water-quality data for public-supply wells in the area. Well Douglas 4 shows a statistically significant decreasing trend in water levels for 1972-86 and a statistically significant increasing trend in water levels for 1986-90. Water levels in wells San Andres 5 and San Andres 6 show statistically significant decreasing trends for 1972-93 and 1981-89, respectively. A mixture of statistically significant increasing trends, statistically significant decreasing trends, and lack of statistically significant trends over periods ranging from the early 1970's to the early 1990's is indicated for the Boles wells and wells near the Boles wells. Well Boles 5 shows a statistically significant increasing trend in water levels for 1981-90. Well Boles 5 and well 17S.09E.25.343 show no statistically significant trends in water levels for 1990-93 and 1988-93, respectively. For 1986-93, well Frenchy 1 shows a statistically significant decreasing trend in water levels. Ground-water withdrawal from the San Andres and Douglas wells regularly exceeded estimated ground-water recharge from San Andres Canyon for 1963-87. For 1951-57 and 1960-86, ground-water withdrawal from the Boles wells regularly exceeded total estimated ground-water recharge from Mule, Arrow, and Lead Canyons. Ground-water withdrawal from the San Andres and Douglas wells and from the Boles wells nearly equaled estimated ground-water recharge for 1989-93 and 1986-93, respectively. For 1987-93, ground-water withdrawal from the Escondido well regularly exceeded estimated ground-water recharge from Escondido Canyon, and ground-water withdrawal from the Frenchy wells regularly exceeded total estimated ground-water recharge from Dog and Deadman Canyons. Water-quality samples were collected from selected Douglas, San Andres, and Boles public-supply wells from December 1994 to February 1995. Concentrations of dissolved nitrate show the most consistent increases between current and historical data. Current concentrations of dissolved nitrate are greater than historical concentrations in 7 of 10 wells.

  14. Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure

    PubMed Central

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu

    2015-01-01

    Abstract Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL‐distance in distinguishing equivalent from nonequivalent cell populations. FlowMap‐FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F‐measure of 0.88 was obtained, indicating high precision and recall of the FR‐based population matching results. FlowMap‐FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry PMID:26274018
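
    The heart of the FR statistic is easy to state: pool the two samples, build the Euclidean minimum spanning tree, and count edges that join points from different samples. A minimal sketch with synthetic Gaussian samples and SciPy's MST routine follows; it illustrates the statistic itself, not the FlowMap-FR package:

    ```python
    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse.csgraph import minimum_spanning_tree

    def fr_between_sample_edges(a, b):
        """Count between-sample edges in the Euclidean MST of the pooled data.

        Few between-sample edges means the two samples occupy different
        regions (distributions differ); many means they are well mixed.
        This edge count is the core of the Friedman-Rafsky test.
        """
        pooled = np.vstack([a, b])
        labels = np.array([0] * len(a) + [1] * len(b))
        mst = minimum_spanning_tree(squareform(pdist(pooled))).tocoo()
        return int(np.sum(labels[mst.row] != labels[mst.col]))

    rng = np.random.default_rng(6)
    same = fr_between_sample_edges(rng.normal(0, 1, (100, 3)),
                                   rng.normal(0, 1, (100, 3)))
    diff = fr_between_sample_edges(rng.normal(0, 1, (100, 3)),
                                   rng.normal(2, 1, (100, 3)))
    print(f"between-sample MST edges: same dist={same}, shifted dist={diff}")
    ```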

  15. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

    PubMed

    Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H

    2016-01-01

    Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © The Authors. Published by Wiley Periodicals, Inc. on behalf of ISAC.

  16. Statistical Methods for Assessments in Simulations and Serious Games. Research Report. ETS RR-14-12

    ERIC Educational Resources Information Center

    Fu, Jianbin; Zapata, Diego; Mavronikolas, Elia

    2014-01-01

    Simulation or game-based assessments produce outcome data and process data. In this article, some statistical models that can potentially be used to analyze data from simulation or game-based assessments are introduced. Specifically, cognitive diagnostic models that can be used to estimate latent skills from outcome data so as to scale these…

  17. Statistics: The Shape of the Data. Used Numbers: Real Data in the Classroom. Grades 4-6.

    ERIC Educational Resources Information Center

    Russell, Susan Jo; Corwin, Rebecca B.

    A unit of study that introduces collecting, representing, describing, and interpreting data is presented. Suitable for students in grades 4 through 6, it provides a foundation for further work in statistics and data analysis. The investigations may extend from one to four class sessions and are grouped into three parts: "Introduction to Data…

  18. Gaia DR2 documentation Chapter 1: Introduction

    NASA Astrophysics Data System (ADS)

    de Bruijne, J. H. J.; Abreu, A.; Brown, A. G. A.; Castañeda, J.; Cheek, N.; Crowley, C.; De Angeli, F.; Drimmel, R.; Fabricius, C.; Fleitas, J.; Gracia-Abril, G.; Guerra, R.; Hutton, A.; Messineo, R.; Mora, A.; Nienartowicz, K.; Panem, C.; Siddiqui, H.

    2018-04-01

    This chapter of the Gaia DR2 documentation describes the Gaia mission, the Gaia spacecraft, and the organisation of the Gaia Data Processing and Analysis Consortium (DPAC), which is responsible for the processing and analysis of the Gaia data. Furthermore, various properties of the data release are summarised, including statistical properties, object statistics, completeness, selection and filtering criteria, and limitations of the data.

  19. U.S. timber production, trade, consumption, and price statistics 1965 to 2005

    Treesearch

    James L. Howard

    2007-01-01

    This report presents annual data but is published every 2 years. The data present current and historical information on the production, trade, consumption, and prices of timber products in the United States. The report focuses on national statistics, but includes some data for individual States and regions and for Canada. The data were collected from industry trade...

  20. U.S. Timber Production, Trade, Consumption and Price Statistics 1965-2011

    Treesearch

    James L. Howard; Rebecca M. Westby

    2013-01-01

    This report presents annual data but is published every 2 years. The data present current and historical information on the production, trade, consumption, and prices of timber products in the United States. The report focuses on national statistics, but includes some data for individual states and regions and for Canada. The data were collected from industry trade...

  1. Maximum Likelihood Estimation of Spectra Information from Multiple Independent Astrophysics Data Sets

    NASA Technical Reports Server (NTRS)

    Howell, Leonard W., Jr.; Six, N. Frank (Technical Monitor)

    2002-01-01

    The Maximum Likelihood (ML) statistical theory required to estimate spectra information from an arbitrary number of astrophysics data sets produced by vastly different science instruments is developed in this paper. This theory and its successful implementation will facilitate the interpretation of spectral information from multiple astrophysics missions and thereby permit the derivation of superior spectral information based on the combination of data sets. The procedure is of significant value to both existing data sets and those to be produced by future astrophysics missions consisting of two or more detectors, by allowing instrument developers to optimize each detector's design parameters through simulation studies in order to design and build complementary detectors that will maximize the precision with which the science objectives may be obtained. The benefits of this ML theory and its application are measured in terms of the reduction of the statistical errors (standard deviations) of the spectral information when the multiple data sets are used in concert, as compared to the statistical errors when the data sets are considered separately, as well as in terms of any biases resulting from poor statistics in one or more of the individual data sets that may be reduced when the data sets are combined.
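
    Combining independent data sets under ML amounts to summing their log-likelihoods and maximizing over the shared spectral parameters. A minimal sketch with two hypothetical Poisson-counting detectors observing the same power-law spectrum follows; the detector names, exposures, and the power-law form are all invented for illustration and are not the paper's instruments:

    ```python
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import poisson

    rng = np.random.default_rng(7)

    # Two hypothetical instruments observe the same power-law spectrum
    # N(E) ~ A * E**(-gamma) with different exposures; counts are Poisson.
    energies = np.logspace(0, 2, 20)
    true_a, true_gamma = 50.0, 2.0
    exposure = {"det1": 1.0, "det2": 3.5}
    counts = {k: rng.poisson(eps * true_a * energies**-true_gamma)
              for k, eps in exposure.items()}

    def neg_log_lik(theta):
        """Joint Poisson likelihood: simply sum over the independent data sets."""
        a, gamma = theta
        if a <= 0:                         # keep the rate parameter positive
            return np.inf
        nll = 0.0
        for k, eps in exposure.items():
            mu = eps * a * energies**-gamma
            nll -= poisson.logpmf(counts[k], mu).sum()
        return nll

    fit = minimize(neg_log_lik, x0=[30.0, 1.5], method="Nelder-Mead")
    print("joint ML estimate (A, gamma):", fit.x)
    ```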

  2. Preparing for the first meeting with a statistician.

    PubMed

    De Muth, James E

    2008-12-15

    Practical statistical issues that should be considered when performing data collection and analysis are reviewed. The meeting with a statistician should take place early in the research development before any study data are collected. The process of statistical analysis involves establishing the research question, formulating a hypothesis, selecting an appropriate test, sampling correctly, collecting data, performing tests, and making decisions. Once the objectives are established, the researcher can determine the characteristics or demographics of the individuals required for the study, how to recruit volunteers, what type of data are needed to answer the research question(s), and the best methods for collecting the required information. There are two general types of statistics: descriptive and inferential. Presenting data in a more palatable format for the reader is called descriptive statistics. Inferential statistics involve making an inference or decision about a population based on results obtained from a sample of that population. In order for the results of a statistical test to be valid, the sample should be representative of the population from which it is drawn. When collecting information about volunteers, researchers should only collect information that is directly related to the study objectives. Important information that a statistician will require first is an understanding of the type of variables involved in the study and which variables can be controlled by researchers and which are beyond their control. Data can be presented in one of four different measurement scales: nominal, ordinal, interval, or ratio. Hypothesis testing involves two mutually exclusive and exhaustive statements related to the research question. Statisticians should not be replaced by computer software, and they should be consulted before any research data are collected. When preparing to meet with a statistician, the pharmacist researcher should be familiar with the steps of statistical analysis and consider several questions related to the study to be conducted.

  3. Electric power annual 1992

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    The Electric Power Annual presents a summary of electric utility statistics at national, regional and State levels. The objective of the publication is to provide industry decisionmakers, government policymakers, analysts and the general public with historical data that may be used in understanding US electricity markets. The Electric Power Annual is prepared by the Survey Management Division; Office of Coal, Nuclear, Electric and Alternate Fuels; Energy Information Administration (EIA); US Department of Energy. The "US Electric Power Industry at a Glance" section presents a profile of the electric power industry ownership and performance, and a review of key statistics for the year. Subsequent sections present data on generating capability, including proposed capability additions; net generation; fossil-fuel statistics; retail sales; revenue; financial statistics; environmental statistics; electric power transactions; demand-side management; and nonutility power producers. In addition, the appendices provide supplemental data on major disturbances and unusual occurrences in US electricity power systems. Each section contains related text and tables and refers the reader to the appropriate publication that contains more detailed data on the subject matter. Monetary values in this publication are expressed in nominal terms.

  4. Kappa statistic for clustered matched-pair data.

    PubMed

    Yang, Zhao; Zhou, Ming

    2014-07-10

    The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
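
    For reference, the kappa point estimate itself is simple to compute from a 2x2 matched-pair agreement table; the paper's contribution is the clustered variance estimator around it, which the sketch below does not attempt to reproduce. The table counts are invented:

    ```python
    import numpy as np

    def kappa_2x2(table):
        """Cohen's kappa for a 2x2 matched-pair agreement table.

        table[i][j] = number of pairs rated i by procedure 1 and j by
        procedure 2. Only the point estimate is computed here; the
        cluster-robust variance is the subject of the paper above.
        """
        t = np.asarray(table, dtype=float)
        n = t.sum()
        po = np.trace(t) / n                  # observed agreement
        pe = (t.sum(1) @ t.sum(0)) / n**2     # agreement expected by chance
        return (po - pe) / (1 - pe)

    print(f"kappa = {kappa_2x2([[40, 5], [10, 45]]):.3f}")
    ```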

  5. OSPAR standard method and software for statistical analysis of beach litter data.

    PubMed

    Schulz, Marcus; van Loon, Willem; Fleet, David M; Baggelaar, Paul; van der Meulen, Eit

    2017-09-15

    The aim of this study is to develop standard statistical methods and software for the analysis of beach litter data. The optimal ensemble of statistical methods comprises the Mann-Kendall trend test, the Theil-Sen slope estimation, the Wilcoxon step trend test and basic descriptive statistics. The application of Litter Analyst, a tailor-made software for analysing the results of beach litter surveys, to OSPAR beach litter data from seven beaches bordering on the south-eastern North Sea, revealed 23 significant trends in the abundances of beach litter types for the period 2009-2014. Litter Analyst revealed a large variation in the abundance of litter types between beaches. To reduce the effects of spatial variation, trend analysis of beach litter data can most effectively be performed at the beach or national level. Spatial aggregation of beach litter data within a region is possible, but resulted in a considerable reduction in the number of significant trends. Copyright © 2017 Elsevier Ltd. All rights reserved.
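
    Two of the ensemble's components take only a few lines with SciPy: a hand-rolled Mann-Kendall test (normal approximation, ignoring tie corrections) and the Theil-Sen slope. The yearly counts below are invented for illustration; Litter Analyst itself is a separate tailor-made tool:

    ```python
    import numpy as np
    from scipy import stats

    def mann_kendall(y):
        """Mann-Kendall trend test with the normal approximation (no ties)."""
        n = len(y)
        s = sum(np.sign(y[j] - y[i]) for i in range(n) for j in range(i + 1, n))
        var_s = n * (n - 1) * (2 * n + 5) / 18
        z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
        return z, 2 * stats.norm.sf(abs(z))

    # Hypothetical yearly litter counts for one item type on one beach.
    counts = np.array([120, 95, 101, 84, 77, 69])
    z, p = mann_kendall(counts)
    slope, intercept, lo, hi = stats.theilslopes(counts, np.arange(len(counts)))
    print(f"MK z={z:.2f}, p={p:.3f}; Theil-Sen slope={slope:.1f} items/year")
    ```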

  6. Pearson-type goodness-of-fit test with bootstrap maximum likelihood estimation.

    PubMed

    Yin, Guosheng; Ma, Yanyuan

    2013-01-01

    The Pearson test statistic is constructed by partitioning the data into bins and computing the difference between the observed and expected counts in these bins. If the maximum likelihood estimator (MLE) of the original data is used, the statistic generally does not follow a chi-squared distribution or any explicit distribution. We propose a bootstrap-based modification of the Pearson test statistic to recover the chi-squared distribution. We compute the observed and expected counts in the partitioned bins by using the MLE obtained from a bootstrap sample. This bootstrap-sample MLE adjusts exactly the right amount of randomness to the test statistic, and recovers the chi-squared distribution. The bootstrap chi-squared test is easy to implement, as it only requires fitting exactly the same model to the bootstrap data to obtain the corresponding MLE, and then constructs the bin counts based on the original data. We examine the test size and power of the new model diagnostic procedure using simulation studies and illustrate it with a real data set.
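
    The procedure described above can be sketched directly: compute the observed bin counts from the original data, but form the expected counts from the MLE refitted on a bootstrap resample, then refer the statistic to a full chi-squared distribution. A minimal sketch for a normal model with invented data (the bin scheme and sample size are arbitrary choices):

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)

    x = rng.normal(2.0, 1.5, 300)                  # original data
    edges = np.quantile(x, np.linspace(0, 1, 11))  # ten roughly equiprobable bins
    edges[0], edges[-1] = -np.inf, np.inf
    observed, _ = np.histogram(x, edges)

    # MLE from a bootstrap resample, not from the original data: this added
    # randomness is what recovers the chi-squared reference distribution.
    boot = rng.choice(x, size=len(x), replace=True)
    mu_b, sd_b = boot.mean(), boot.std()
    expected = len(x) * np.diff(stats.norm.cdf(edges, mu_b, sd_b))

    chi2 = np.sum((observed - expected) ** 2 / expected)
    dof = len(observed) - 1                        # full (bins - 1) degrees of freedom
    print(f"bootstrap Pearson chi2 = {chi2:.2f}, p = {stats.chi2.sf(chi2, dof):.3f}")
    ```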

  7. National Transportation Statistics (Annual Report, 1995)

    DOT National Transportation Integrated Search

    1995-11-01

    National Transportation Statistics is a compendium of selected national transportation and transportation-related statistics from a wide variety of government and private sources. The data illustrate transportation activity for the major transportati...

  8. National Transportation Statistics (Annual Report, 1996)

    DOT National Transportation Integrated Search

    1995-11-01

    National Transportation Statistics is a compendium of selected national transportation and transportation-related statistics from a wide variety of government and private sources. The data illustrate transportation activity for the major transportati...

  9. National Transportation Statistics (Annual Report, 1997)

    DOT National Transportation Integrated Search

    1997-12-01

    National Transportation Statistics is a compendium of selected national transportation and transportation-related statistics from a wide variety of government and private sources. The data illustrate transportation activity for the major transportati...

  10. National Transportation Statistics (Annual Report, 1977)

    DOT National Transportation Integrated Search

    1979-01-01

    National Transportation Statistics is a compendium of selected national transportation and transportation-related statistics from a wide variety of government and private sources. The data illustrate transportation activity for the major transportati...

  11. Transportation statistics annual report 2009

    DOT National Transportation Integrated Search

    2009-01-01

    The Transportation Statistics Annual Report (TSAR) presents data and information selected by the Bureau of Transportation Statistics (BTS), a component of the U.S. Department of Transportation's (USDOT's) Research and Innovative Technology Administra...

  12. Transportation statistics annual report 2010

    DOT National Transportation Integrated Search

    2011-01-01

    The Transportation Statistics Annual Report (TSAR) presents data and information compiled by the Bureau of Transportation Statistics (BTS), a component of the U.S. Department of Transportation's (USDOT's) Research and Innovative Technology Admini...

  13. Measures of health sciences journal use: a comparison of vendor, link-resolver, and local citation statistics*

    PubMed Central

    De Groote, Sandra L.; Blecic, Deborah D.; Martin, Kristin

    2013-01-01

    Objective: Libraries require efficient and reliable methods to assess journal use. Vendors provide complete counts of articles retrieved from their platforms. However, if a journal is available on multiple platforms, several sets of statistics must be merged. Link-resolver reports merge data from all platforms into one report but only record partial use because users can access library subscriptions from other paths. Citation data are limited to publication use. Vendor, link-resolver, and local citation data were examined to determine correlation. Because link-resolver statistics are easy to obtain, the study library especially wanted to know if they correlate highly with the other measures. Methods: Vendor, link-resolver, and local citation statistics for the study institution were gathered for health sciences journals. Spearman rank-order correlation coefficients were calculated. Results: There was a high positive correlation between all three data sets, with vendor data commonly showing the highest use. However, a small percentage of titles showed anomalous results. Discussion and Conclusions: Link-resolver data correlate well with vendor and citation data, but due to anomalies, low link-resolver data would best be used to suggest titles for further evaluation using vendor data. Citation data may not be needed as it correlates highly with other measures. PMID:23646026

  14. Descriptive and inferential statistical methods used in burns research.

    PubMed

    Al-Benna, Sammy; Al-Ajam, Yazan; Way, Benjamin; Steinstraesser, Lars

    2010-05-01

    Burns research articles utilise a variety of descriptive and inferential methods to present and analyse data. The aim of this study was to determine the descriptive methods (e.g. mean, median, SD, range, etc.) and survey the use of inferential methods (statistical tests) used in articles in the journal Burns. This study defined its population as all original articles published in the journal Burns in 2007. Letters to the editor, brief reports, reviews, and case reports were excluded. Study characteristics, use of descriptive statistics and the number and types of statistical methods employed were evaluated. Of the 51 articles analysed, 11 (22%) were randomised controlled trials, 18 (35%) were cohort studies, 11 (22%) were case control studies and 11 (22%) were case series. The study design and objectives were defined in all articles. All articles made use of continuous and descriptive data. Inferential statistics were used in 49 (96%) articles. Data dispersion was calculated by standard deviation in 30 (59%). Standard error of the mean was quoted in 19 (37%). The statistical software product was named in 33 (65%). Of the 49 articles that used inferential statistics, the tests were named in 47 (96%). The 6 most common tests used (Student's t-test (53%), analysis of variance/co-variance (33%), chi-squared test (27%), Wilcoxon & Mann-Whitney tests (22%), Fisher's exact test (12%)) accounted for the majority (72%) of statistical methods employed. A specified significance level was named in 43 (88%) and the exact significance levels were reported in 28 (57%). Descriptive analysis and basic statistical techniques account for most of the statistical tests reported. This information should prove useful in deciding which tests should be emphasised in educating burn care professionals. These results highlight the need for burn care professionals to have a sound understanding of basic statistics, which is crucial in interpreting and reporting data. Advice should be sought from professionals in the fields of biostatistics and epidemiology when using more advanced statistical techniques. Copyright 2009 Elsevier Ltd and ISBI. All rights reserved.

  15. Hemophilia Data and Statistics

    MedlinePlus

    ... genetic testing is done to diagnose hemophilia before birth. For the one-third ... rates and hospitalization rates for bleeding complications from hemophilia ...

  16. Teaching Statistics--Despite Its Applications

    ERIC Educational Resources Information Center

    Ridgway, Jim; Nicholson, James; McCusker, Sean

    2007-01-01

    Evidence-based policy requires sophisticated modelling and reasoning about complex social data. The current UK statistics curricula do not equip tomorrow's citizens to understand such reasoning. We advocate radical curriculum reform, designed to require students to reason from complex data.

  17. Petroleum marketing monthly, February 1999 with data for November 1998

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    1999-02-01

    The Petroleum Marketing Monthly (PMM) provides information and statistical data on a variety of crude oils and refined petroleum products. The publication presents statistics on crude oil costs and refined petroleum products sales for use by industry, government, private sector analysts, educational institutions, and consumers. Data on crude oil include the domestic first purchase price, the f.o.b. and landed cost of imported crude oil, and the refiners' acquisition cost of crude oil. Refined petroleum product sales data include motor gasoline, distillates, residuals, aviation fuels, kerosene, and propane. Monthly statistics on purchases of crude oil and sales of petroleum products are presented in the Petroleum Marketing Monthly in six sections: Initial Estimates; Summary Statistics; Crude Oil Prices; Prices of Petroleum Products; Volumes of Petroleum Products; and Prime Supplier Sales Volumes of Petroleum Products for Local Consumption. 7 figs., 50 tabs.

  18. Petroleum marketing monthly, March 1999 with data for December 1998

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    1999-03-01

    The Petroleum Marketing Monthly (PMM) provides information and statistical data on a variety of crude oils and refined petroleum products. The publication presents statistics on crude oil costs and refined petroleum products sales for use by industry, government, private sector analysts, educational institutions, and consumers. Data on crude oil include the domestic first purchase price, the f.o.b. and landed cost of imported crude oil, and the refiners' acquisition cost of crude oil. Refined petroleum product sales data include motor gasoline, distillates, residuals, aviation fuels, kerosene, and propane. Monthly statistics on purchases of crude oil and sales of petroleum products are presented in the Petroleum Marketing Monthly in five sections: summary statistics; crude oil prices; prices of petroleum products; volumes of petroleum products; and prime supplier sales volumes of petroleum products for local consumption. 7 figs., 50 tabs.

  19. Fusion of Local Statistical Parameters for Buried Underwater Mine Detection in Sonar Imaging

    NASA Astrophysics Data System (ADS)

    Maussang, F.; Rombaut, M.; Chanussot, J.; Hétet, A.; Amate, M.

    2008-12-01

    Detection of buried underwater objects, and especially mines, is a crucial strategic task. Images provided by sonar systems able to penetrate the sea floor, such as synthetic aperture sonar (SAS), are of great interest for the detection and classification of such objects. However, the signal-to-noise ratio is fairly low, and advanced information processing is required for a correct and reliable detection of the echoes generated by the objects. The detection method proposed in this paper is based on a data-fusion architecture using belief theory. The input data of this architecture are local statistical characteristics extracted from SAS data, corresponding to the first-, second-, third-, and fourth-order statistical properties of the sonar images, respectively. The choice of these parameters is motivated by a statistical model of the sonar data. Numerical criteria are also proposed to estimate the detection performance and to validate the method.

  20. Noise removing in encrypted color images by statistical analysis

    NASA Astrophysics Data System (ADS)

    Islam, N.; Puech, W.

    2012-03-01

    Cryptographic techniques are used to secure confidential data from unauthorized access, but these techniques are very sensitive to noise: a single bit change in encrypted data can have a catastrophic impact on the decrypted data. This paper addresses the problem of removing bit errors in visual data encrypted with the AES algorithm in CBC mode. To remove the noise, a method is proposed that is based on a statistical analysis of each block during decryption. The proposed method exploits local statistics of the visual data and the confusion/diffusion properties of the encryption algorithm to remove the errors. Experimental results show that the proposed method can be used at the receiving end as a possible solution for removing noise from visual data in the encrypted domain.

  1. Michigan Library Statistical Report. 1997 Edition.

    ERIC Educational Resources Information Center

    Michigan Library, Lansing.

    This 1997 edition focuses on statistical data supplied by Michigan public libraries, public library cooperatives, and those public libraries which serve as regional or subregional outlets for blind and physically handicapped patrons. Statistics on academic libraries are also presented in this edition, and summary statistics for prior fiscal years…

  2. 40 CFR 1065.602 - Statistics.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    § 1065.602 Statistics. (a) Overview. This section contains equations and example calculations for statistics that are specified in this part. In this section we use...

  3. Statistical primer: how to deal with missing data in scientific research?

    PubMed

    Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M

    2018-05-10

    Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice, and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice; however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, surpassing the disadvantages of the simpler approaches, but it should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
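
    As a concrete illustration of the multiple-imputation idea, a minimal sketch using scikit-learn's IterativeImputer run m times with posterior sampling; the DataFrame and column names are invented, and the Cox regression step from the paper's example is only indicated in a comment.

        import numpy as np
        import pandas as pd
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        df = pd.DataFrame({"age": [61, 54, np.nan, 70, 66],
                           "bmi": [27.1, np.nan, 31.4, 25.0, np.nan]})

        imputed_sets = []
        for seed in range(5):  # m = 5 imputations
            imp = IterativeImputer(sample_posterior=True, random_state=seed)
            imputed_sets.append(pd.DataFrame(imp.fit_transform(df),
                                             columns=df.columns))
        # Fit the substantive model (e.g., Cox regression) on each imputed set,
        # then pool the m estimates with Rubin's rules.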

  4. Algorithm for computing descriptive statistics for very large data sets and the exa-scale era

    NASA Astrophysics Data System (ADS)

    Beekman, Izaak

    2017-11-01

    An algorithm for Single-point, Parallel, Online, Converging Statistics (SPOCS) is presented. It is suited for in situ analysis that traditionally would be relegated to post-processing, and it can be used to monitor statistical convergence and to estimate the error/residual in the quantity, which is also useful for uncertainty quantification. Today, data may be generated at an overwhelming rate by numerical simulations and by proliferating sensing apparatuses in experiments and engineering applications. Monitoring descriptive statistics in real time lets costly computations and experiments be gracefully aborted if an error has occurred, and monitoring the level of statistical convergence allows them to be run for the shortest amount of time required to obtain good results. This algorithm extends work by Pébay (Sandia Report SAND2008-6212). Pébay's algorithms are recast into a converging delta formulation with provably favorable properties. The mean, variance, covariances and arbitrary higher-order statistical moments are computed in one pass. The algorithm is tested using Sillero, Jiménez, & Moser's (2013, 2014) publicly available UPM high Reynolds number turbulent boundary layer data set, demonstrating numerical robustness, efficiency and other favorable properties.
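
    The "converging delta formulation" above is in the spirit of the classic single-pass (Welford-type) update for running moments; the sketch below shows that recurrence for mean and variance and is not the paper's exact algorithm.

        class OnlineStats:
            """One-pass running mean and sample variance."""
            def __init__(self):
                self.n, self.mean, self.m2 = 0, 0.0, 0.0

            def update(self, x: float) -> None:
                self.n += 1
                delta = x - self.mean
                self.mean += delta / self.n          # converging delta correction
                self.m2 += delta * (x - self.mean)   # running sum of squared deviations

            @property
            def variance(self) -> float:
                return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

        s = OnlineStats()
        for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
            s.update(x)
        print(s.mean, s.variance)   # 5.0 4.571...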

  5. Killing Barney Fife: Law Enforcements Socially Constructed Perception of Violence and its Influence on Police Militarization

    DTIC Science & Technology

    2015-09-01

    then examines the correlation between violence and police militarization. A statistical analysis of crime data found an inverse relationship between levels of reported violence and ... events. The research then focused on the correlation between violence and police militarization. The research began with a detailed statistical

  6. Use of Services for Family Planning and Infertility, United States, 1982. (Data from the National Survey of Family Growth, Series 23, No. 13).

    ERIC Educational Resources Information Center

    Horn, Marjorie C.; Mosher, William D.

    1986-01-01

    The National Survey of Family Growth is a periodic survey conducted by the National Center for Health Statistics, and designed to produce national estimates of statistics on fertility, family planning, and aspects of maternal and child health that are closely related to childbearing. This report presents statistics based on data collected in the…

  7. Study/experimental/research design: much more than statistics.

    PubMed

    Knight, Kenneth L

    2010-01-01

    The purpose of study, experimental, or research design in scientific manuscripts has changed significantly over the years. It has evolved from an explanation of the design of the experiment (ie, data gathering or acquisition) to an explanation of the statistical analysis. This practice makes "Methods" sections hard to read and understand. To clarify the difference between study design and statistical analysis, to show the advantages of a properly written study design on article comprehension, and to encourage authors to correctly describe study designs. The role of study design is explored from the introduction of the concept by Fisher through modern-day scientists and the AMA Manual of Style. At one time, when experiments were simpler, the study design and statistical design were identical or very similar. With the complex research that is common today, which often includes manipulating variables to create new variables and the multiple (and different) analyses of a single data set, data collection is very different than statistical design. Thus, both a study design and a statistical design are necessary. Scientific manuscripts will be much easier to read and comprehend. A proper experimental design serves as a road map to the study methods, helping readers to understand more clearly how the data were obtained and, therefore, assisting them in properly analyzing the results.

  8. The Performance and Retention of Female Navy Officers with a Military Spouse

    DTIC Science & Technology

    2017-03-01

    2. Female Officer Retention and Dual-Military Couples; 3. Demographic Statistics; III. Data Description and Statistics; 2. Independent Variables; C. Summary Statistics

  9. A systematic review of the quality of statistical methods employed for analysing quality of life data in cancer randomised controlled trials.

    PubMed

    Hamel, Jean-Francois; Saulnier, Patrick; Pe, Madeline; Zikos, Efstathios; Musoro, Jammbe; Coens, Corneel; Bottomley, Andrew

    2017-09-01

    Over the last decades, Health-related Quality of Life (HRQoL) end-points have become an important outcome of the randomised controlled trials (RCTs). HRQoL methodology in RCTs has improved following international consensus recommendations. However, no international recommendations exist concerning the statistical analysis of such data. The aim of our study was to identify and characterise the quality of the statistical methods commonly used for analysing HRQoL data in cancer RCTs. Building on our recently published systematic review, we analysed a total of 33 published RCTs studying the HRQoL methods reported in RCTs since 1991. We focussed on the ability of the methods to deal with the three major problems commonly encountered when analysing HRQoL data: their multidimensional and longitudinal structure and the commonly high rate of missing data. All studies reported HRQoL being assessed repeatedly over time for a period ranging from 2 to 36 months. Missing data were common, with compliance rates ranging from 45% to 90%. From the 33 studies considered, 12 different statistical methods were identified. Twenty-nine studies analysed each of the questionnaire sub-dimensions without type I error adjustment. Thirteen studies repeated the HRQoL analysis at each assessment time again without type I error adjustment. Only 8 studies used methods suitable for repeated measurements. Our findings show a lack of consistency in statistical methods for analysing HRQoL data. Problems related to multiple comparisons were rarely considered leading to a high risk of false positive results. It is therefore critical that international recommendations for improving such statistical practices are developed. Copyright © 2017. Published by Elsevier Ltd.
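
    The type I error adjustment the review finds missing is straightforward to apply; a minimal sketch adjusting one p-value per HRQoL sub-dimension with statsmodels (the p-values below are invented for illustration):

        from statsmodels.stats.multitest import multipletests

        pvals = [0.012, 0.034, 0.049, 0.210, 0.003]   # one per sub-dimension (toy)
        for method in ("bonferroni", "holm"):
            reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
            print(method, p_adj.round(3), reject)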

  10. Detection and Evaluation of Spatio-Temporal Spike Patterns in Massively Parallel Spike Train Data with SPADE.

    PubMed

    Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M; Grün, Sonja

    2017-01-01

    Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis.

  11. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis.

    PubMed

    Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E

    2013-11-15

    Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (Available via CRAN) provides functionality and data to perform the methods in this article. reesese@vcu.edu
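
    A rough numpy paraphrase of the gPCA idea (a sketch of the concept, not the gPCA R package): compare the variance captured along a batch-guided direction with the variance along the first unguided principal component, and calibrate the ratio by permuting batch labels.

        import numpy as np

        def gpca_delta(X, batch):
            Xc = X - X.mean(axis=0)
            Y = np.eye(batch.max() + 1)[batch]        # n x b batch indicator matrix
            _, _, Vg = np.linalg.svd(Y.T @ Xc, full_matrices=False)  # guided loadings
            _, _, Vu = np.linalg.svd(Xc, full_matrices=False)        # unguided loadings
            return (Xc @ Vg[0]).var() / (Xc @ Vu[0]).var()

        rng = np.random.default_rng(1)
        X = rng.normal(size=(40, 200))
        batch = np.repeat([0, 1], 20)
        X[batch == 1] += 0.5                          # inject an artificial batch shift
        delta = gpca_delta(X, batch)
        null = [gpca_delta(X, rng.permutation(batch)) for _ in range(200)]
        p = np.mean([d >= delta for d in null])       # permutation p-value
        print(delta, p)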

  13. 42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    § 417.568 Adequate financial records, statistical data, and cost finding (Medicare Payment: Cost Basis; health maintenance organizations, competitive medical plans, and health care prepayment plans). ... health care industry. (b) Provision of data. (1) The HMO or CMP must provide adequate cost and...

  14. Library Impact Data Project: Looking for the Link between Library Usage and Student Attainment

    ERIC Educational Resources Information Center

    Stone, Graham; Ramsden, Bryony

    2013-01-01

    The Library Impact Data Project was a six-month project funded by Jisc and managed by the University of Huddersfield to investigate this hypothesis: "There is a statistically significant correlation across a number of universities between library activity data and student attainment." E-resources usage, library borrowing statistics, and…

  15. Audience Diversion Due to Cable Television: A Statistical Analysis of New Data.

    ERIC Educational Resources Information Center

    Park, Rolla Edward

    A statistical analysis of new data suggests that television broadcasting will continue to prosper, despite increasing competition from cable television carrying distant signals. Data on cable and non-cable audiences in 121 counties with well defined signal choice support generalized least squares estimates of two models: total audience and…

  16. Four types of ensemble coding in data visualizations.

    PubMed

    Szafir, Danielle Albers; Haroz, Steve; Gleicher, Michael; Franconeri, Steven

    2016-01-01

    Ensemble coding supports rapid extraction of visual statistics about distributed visual information. Researchers typically study this ability with the goal of drawing conclusions about how such coding extracts information from natural scenes. Here we argue that a second domain can serve as another strong inspiration for understanding ensemble coding: graphs, maps, and other visual presentations of data. Data visualizations allow observers to leverage their ability to perform visual ensemble statistics on distributions of spatial or featural visual information to estimate actual statistics on data. We survey the types of visual statistical tasks that occur within data visualizations across everyday examples, such as scatterplots, and more specialized images, such as weather maps or depictions of patterns in text. We divide these tasks into four categories: identification of sets of values, summarization across those values, segmentation of collections, and estimation of structure. We point to unanswered questions for each category and give examples of such cross-pollination in the current literature. Increased collaboration between the data visualization and perceptual psychology research communities can inspire new solutions to challenges in visualization while simultaneously exposing unsolved problems in perception research.

  17. Analysis of repeated measurement data in the clinical trials

    PubMed Central

    Singh, Vineeta; Rana, Rakesh Kumar; Singhal, Richa

    2013-01-01

    Statistics is an integral part of Clinical Trials. Elements of statistics span Clinical Trial design, data monitoring, analyses and reporting. A solid understanding of statistical concepts by clinicians improves the comprehension and the resulting quality of Clinical Trials. In biomedical research, researchers frequently use the t-test and ANOVA to compare means between groups of interest irrespective of the nature of the data. In Clinical Trials, however, data are recorded on the same patients at more than two time points. In such a situation, using standard ANOVA procedures is not appropriate, as they do not account for dependencies between observations within subjects. To deal with such study data, repeated-measures ANOVA should be used. In this article the application of one-way repeated-measures ANOVA is demonstrated using the software SPSS (Statistical Package for Social Sciences) Version 15.0 on data collected at four time points (0 day, 15th day, 30th day, and 45th day) of a multicentre clinical trial conducted on Pandu Roga (~Iron Deficiency Anemia) with an Ayurvedic formulation Dhatrilauha. PMID:23930038
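
    A minimal sketch of the one-way repeated-measures ANOVA described above, using statsmodels' AnovaRM rather than SPSS; the subjects, time points, and outcome values below are invented for illustration.

        import pandas as pd
        from statsmodels.stats.anova import AnovaRM

        df = pd.DataFrame({
            "subject": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
            "day":     [0, 15, 30, 45] * 3,            # four assessment time points
            "hb":      [8.1, 8.9, 9.6, 10.2, 7.8, 8.4, 9.1, 9.9,
                        8.5, 9.0, 9.8, 10.6],          # toy outcome values
        })
        print(AnovaRM(df, depvar="hb", subject="subject", within=["day"]).fit())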

  18. RepExplore: addressing technical replicate variance in proteomics and metabolomics data analysis.

    PubMed

    Glaab, Enrico; Schneider, Reinhard

    2015-07-01

    High-throughput omics datasets often contain technical replicates included to account for technical sources of noise in the measurement process. Although summarizing these replicate measurements by using robust averages may help to reduce the influence of noise on downstream data analysis, the information on the variance across the replicate measurements is lost in the averaging process and is therefore typically disregarded in subsequent statistical analyses. We introduce RepExplore, a web-service dedicated to exploiting the information captured in the technical replicate variance to provide more reliable and informative differential expression and abundance statistics for omics datasets. The software builds on previously published statistical methods, which have been applied successfully to biomedical omics data but are difficult to use without prior experience in programming or scripting. RepExplore facilitates the analysis by providing fully automated data processing and interactive ranking tables, whisker plot, heat map and principal component analysis visualizations to interpret omics data and derived statistics. Availability: freely available at http://www.repexplore.tk. Contact: enrico.glaab@uni.lu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  19. Perspectives on clinical trial data transparency and disclosure.

    PubMed

    Alemayehu, Demissie; Anziano, Richard J; Levenstein, Marcia

    2014-09-01

    The increased demand for transparency and disclosure of data from clinical trials sponsored by pharmaceutical companies poses considerable challenges and opportunities from a statistical perspective. A central issue is the need to protect patient privacy and adhere to Good Clinical and Statistical Practices, while ensuring access to patient-level data from clinical trials to the wider research community. This paper offers options to navigate this dilemma and balance competing priorities, with emphasis on the role of good clinical and statistical practices as proven safeguards for scientific integrity, the importance of adopting best practices for reporting of data from secondary analyses, and the need for optimal collaboration among stakeholders to facilitate data sharing. Copyright © 2014 Elsevier Inc. All rights reserved.

  20. Data on electrical energy conservation using high efficiency motors for the confidence bounds using statistical techniques.

    PubMed

    Shaikh, Muhammad Mujtaba; Memon, Abdul Jabbar; Hussain, Manzoor

    2016-09-01

    In this article, we describe details of the data used in the research paper "Confidence bounds for energy conservation in electric motors: An economical solution using statistical techniques" [1]. The data presented in this paper is intended to show benefits of high efficiency electric motors over the standard efficiency motors of similar rating in the industrial sector of Pakistan. We explain how the data was collected and then processed by means of formulas to show cost effectiveness of energy efficient motors in terms of three important parameters: annual energy saving, cost saving and payback periods. This data can be further used to construct confidence bounds for the parameters using statistical techniques as described in [1].
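
    The three parameters can be computed with standard motor-economics formulas; a minimal sketch, with ratings, tariff, and run hours invented for illustration (they are not the paper's data):

        P_kw, hours = 37.0, 6000        # shaft power (kW) and annual running hours
        eff_std, eff_hi = 0.91, 0.945   # standard vs high-efficiency motor efficiency
        tariff = 0.12                   # electricity price per kWh
        premium = 900.0                 # extra purchase cost of the efficient motor

        annual_kwh_saved = P_kw * hours * (1 / eff_std - 1 / eff_hi)
        annual_cost_saved = annual_kwh_saved * tariff
        payback_years = premium / annual_cost_saved
        print(f"{annual_kwh_saved:.0f} kWh/yr saved, {annual_cost_saved:.0f}/yr, "
              f"payback {payback_years:.2f} yr")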

  1. The 1985 Army Experience Survey. Data Sourcebook and User’s Manual

    DTIC Science & Technology

    1986-01-01

    on the survey data file produced for the 1985 AES. The survey data are available in Operating System (OS) as well as Statistical Analysis System ... version of the survey data files was produced using the Statistical Analysis System (SAS). The survey data were also produced in Operating System (OS ... impacts upon future enlistments. The OS data file was designed to make the survey data accessible on any IBM-compatible computer system.

  2. PIV Data Validation Software Package

    NASA Technical Reports Server (NTRS)

    Blackshire, James L.

    1997-01-01

    A PIV data validation and post-processing software package was developed to provide semi-automated data validation and data reduction capabilities for Particle Image Velocimetry data sets. The software provides three primary capabilities including (1) removal of spurious vector data, (2) filtering, smoothing, and interpolating of PIV data, and (3) calculations of out-of-plane vorticity, ensemble statistics, and turbulence statistics information. The software runs on an IBM PC/AT host computer working either under Microsoft Windows 3.1 or Windows 95 operating systems.
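
    One widely used test for removing spurious PIV vectors is the normalized median test of Westerweel and Scarano; the sketch below applies it to a 2-D field of u-components and shows a common approach, not necessarily this package's exact method.

        import numpy as np

        def normalized_median_flags(u, eps=0.1, thresh=2.0):
            """Flag vectors whose residual from the 8-neighbour median,
            normalized by the typical neighbour residual, exceeds thresh."""
            flags = np.zeros_like(u, dtype=bool)
            for i in range(1, u.shape[0] - 1):
                for j in range(1, u.shape[1] - 1):
                    nbrs = np.delete(u[i-1:i+2, j-1:j+2].ravel(), 4)  # 8 neighbours
                    med = np.median(nbrs)
                    r_med = np.median(np.abs(nbrs - med))  # typical neighbour residual
                    flags[i, j] = abs(u[i, j] - med) / (r_med + eps) > thresh
            return flags

        u = np.random.default_rng(2).normal(size=(8, 8))
        u[4, 4] = 25.0                            # plant one spurious vector
        print(normalized_median_flags(u)[4, 4])   # True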

  3. Annually and monthly resolved solar irradiance and atmospheric temperature data across the Hawaiian archipelago from 1998 to 2015 with interannual summary statistics.

    PubMed

    Bryce, Richard; Losada Carreño, Ignacio; Kumler, Andrew; Hodge, Bri-Mathias; Roberts, Billy; Brancucci Martinez-Anido, Carlo

    2018-08-01

    This article contains data and summary statistics of solar irradiance and dry bulb temperature across the Hawaiian archipelago resolved on a monthly basis and spanning years 1998-2015. This data was derived in association with an article titled "Consequences of Neglecting the Interannual Variability of the Solar Resource: A Case Study of Photovoltaic Power Among the Hawaiian Islands" (Bryce et al., 2018 [7]). The solar irradiance data is presented in terms of Direct Normal Irradiance (DNI), Diffuse Horizontal Irradiance (DHI), and Global Horizontal Irradiance (GHI) and was obtained from the satellite-derived data contained in the National Solar Radiation Database (NSRDB). The temperature data is also obtained from this source. We have processed the NSRDB data and compiled these monthly resolved data sets, along with interannual summary statistics including the interannual coefficient of variability.
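
    A minimal sketch of the interannual summary statistic, computing the coefficient of variability of annual-mean GHI across years; the values below are invented, not the NSRDB data.

        import numpy as np

        annual_ghi = np.array([5.41, 5.52, 5.37, 5.60, 5.45,
                               5.49, 5.33, 5.58, 5.44, 5.51])  # kWh/m2/day, one per year (toy)
        cv = annual_ghi.std(ddof=1) / annual_ghi.mean()
        print(f"interannual CV = {100 * cv:.1f}%")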

  4. A stochastic fractional dynamics model of space-time variability of rain

    NASA Astrophysics Data System (ADS)

    Kundu, Prasun K.; Travis, James E.

    2013-09-01

    Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, which allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and on the Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to fit the second moment statistics of radar data at the smaller spatiotemporal scales. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well at these scales without any further adjustment.

  5. Statistics at the Chinese Universities.

    DTIC Science & Technology

    1981-09-01

    education in China in the postwar years is provided to give some perspective. My observations on statistics at the Chinese universities are necessarily... has been accepted as a member society of ISI. 3. Education in China. Understanding of statistics in universities in China will be enhanced through some... programming), Statistical Mathematics (inference, data analysis, industrial statistics, information theory), Mathematical Physics (differential

  6. A New Statistic for Evaluating Item Response Theory Models for Ordinal Data. CRESST Report 839

    ERIC Educational Resources Information Center

    Cai, Li; Monroe, Scott

    2014-01-01

    We propose a new limited-information goodness of fit test statistic C[subscript 2] for ordinal IRT models. The construction of the new statistic lies formally between the M[subscript 2] statistic of Maydeu-Olivares and Joe (2006), which utilizes first and second order marginal probabilities, and the M*[subscript 2] statistic of Cai and Hansen…

  7. Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments

    DTIC Science & Technology

    2015-09-30

    statistical inference methodologies for ocean-acoustic problems by investigating and applying statistical methods to data collected from scale-model... to begin planning experiments for statistical inference applications. APPROACH: In the ocean acoustics community over the past two decades... solutions for waveguide parameters. With the introduction of statistical inference to the field of ocean acoustics came the desire to interpret marginal

  8. A flexibly shaped space-time scan statistic for disease outbreak detection and monitoring.

    PubMed

    Takahashi, Kunihiko; Kulldorff, Martin; Tango, Toshiro; Yih, Katherine

    2008-04-11

    Early detection of disease outbreaks enables public health officials to implement disease control and prevention measures at the earliest possible time. A time periodic geographical disease surveillance system based on a cylindrical space-time scan statistic has been used extensively for disease surveillance along with the SaTScan software. In the purely spatial setting, many different methods have been proposed to detect spatial disease clusters. In particular, some spatial scan statistics are aimed at detecting irregularly shaped clusters which may not be detected by the circular spatial scan statistic. Based on the flexible purely spatial scan statistic, we propose a flexibly shaped space-time scan statistic for early detection of disease outbreaks. The performance of the proposed space-time scan statistic is compared with that of the cylindrical scan statistic using benchmark data. In order to compare their performances, we have developed a space-time power distribution by extending the purely spatial bivariate power distribution. Daily syndromic surveillance data in Massachusetts, USA, are used to illustrate the proposed test statistic. The flexible space-time scan statistic is well suited for detecting and monitoring disease outbreaks in irregularly shaped areas.

  9. Low altitude wind shear statistics derived from measured and FAA proposed standard wind profiles

    NASA Technical Reports Server (NTRS)

    Dunham, R. E., Jr.; Usry, J. W.

    1984-01-01

    Wind shear statistics were calculated for a simulated data set using wind profiles proposed as a standard and compared to statistics derived from measured wind profile data. Wind shear values were grouped in altitude bands of 100 ft between 100 and 1400 ft, and in wind shear increments of 0.025 kt/ft between ±0.600 kt/ft for the simulated data set and between ±0.200 kt/ft for the measured set. No values existed outside the ±0.200 kt/ft boundaries for the measured data. Frequency distributions, means, and standard deviations were derived for each altitude band for both data sets, and compared. Also, frequency distributions were derived for the total sample for both data sets and compared. The frequency of occurrence of a given wind shear was about the same for both data sets for wind shears less than ±0.10 kt/ft, but the simulated data set had larger values outside these boundaries. Neglecting the vertical wind component did not significantly affect the statistics for these data sets. The frequency of occurrence of wind shears for the flight-measured data was essentially the same for each altitude band and the total sample, but the simulated data distributions were different for each altitude band. The larger wind shears for the flight-measured data were found to have short durations.
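
    The binning described above is a one-liner with numpy histograms; a minimal sketch on synthetic shear values (not the study's measurements):

        import numpy as np

        shear = np.random.default_rng(3).normal(0.0, 0.05, size=1000)  # kt/ft, one altitude band
        edges = np.arange(-0.200, 0.200 + 0.025, 0.025)   # 0.025 kt/ft increments
        counts, _ = np.histogram(shear, bins=edges)
        print(counts)                           # frequency distribution for the band
        print(shear.mean(), shear.std(ddof=1))  # band mean and standard deviation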

  10. Pathway analysis with next-generation sequencing data.

    PubMed

    Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao

    2015-04-01

    Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. Also the power of the SFPCA-based statistic and 22 additional existing statistics are evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic has much smaller P-values to identify pathway association than other existing methods.

  11. RADSS: an integration of GIS, spatial statistics, and network service for regional data mining

    NASA Astrophysics Data System (ADS)

    Hu, Haitang; Bao, Shuming; Lin, Hui; Zhu, Qing

    2005-10-01

    Regional data mining, which aims at the discovery of knowledge about spatial patterns, clusters or associations between regions, has wide applications nowadays in the social sciences, such as sociology, economics, epidemiology, and criminology. Many applications in the regional or other social sciences are more concerned with spatial relationships than with precise geographical location. Based on the spatial continuity rule derived from Tobler's first law of geography (observations at two sites tend to be more similar to each other if the sites are close together than if far apart), spatial statistics, as an important means of spatial data mining, allow users to extract interesting and useful information, such as spatial pattern, spatial structure, spatial association, spatial outliers and spatial interaction, from vast amounts of spatial or non-spatial data. Therefore, by integrating spatial statistical methods, geographical information systems become more powerful in gaining insights into the nature of the spatial structure of regional systems, and help researchers be more careful when selecting appropriate models. However, the lack of such tools holds back the application of spatial data analysis techniques and the development of new methods and models (e.g., spatio-temporal models). Herein, we make an attempt to develop such an integrated software package and apply it to complex system analysis of the Poyang Lake Basin. This paper presents a framework for integrating GIS, spatial statistics and network service in regional data mining, as well as its implementation. After discussing the spatial statistical methods involved in regional complex system analysis, we introduce RADSS (Regional Analysis and Decision Support System), our new regional data mining tool, which integrates GIS, spatial statistics and network service. RADSS includes functions for spatial data visualization, exploratory spatial data analysis, and spatial statistics. The tool also includes some fundamental spatial and non-spatial databases on regional population and environment, which can be updated from external databases via CD or network. Utilizing this data mining and exploratory analytical tool, users can easily and quickly analyse the huge amount of interrelated regional data, and better understand the spatial patterns and trends of regional development, so as to make credible and scientific decisions. Moreover, it can be used as an educational tool for spatial data analysis and environmental studies. In this paper, we also present a case study on the Poyang Lake Basin as an application of the tool and of spatial data mining in complex environmental studies. Finally, several concluding remarks are discussed.

  12. Transportation statistics annual report 2006

    DOT National Transportation Integrated Search

    2006-12-01

    This edition of the Transportation Statistics Annual Report presents selected transportation data on 13 topics specified in the legislative mandate of the Bureau of Transportation Statistics (BTS) of the U.S. Department of Transportation's Res...

  13. Test Statistics and Confidence Intervals to Establish Noninferiority between Treatments with Ordinal Categorical Data.

    PubMed

    Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka

    2015-01-01

    The problem of establishing the noninferiority of a new treatment relative to a standard (control) treatment is discussed for ordinal categorical data. A measure of treatment effect is used, and a method of specifying the noninferiority margin for the measure is provided. Two Z-type test statistics are proposed in which the estimation of variance is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of the existing ones, and the results show that the proposed test statistics are better in terms of deviation from the nominal level and power.

  14. Statistical summaries of selected Iowa streamflow data through September 2013.

    DOT National Transportation Integrated Search

    2015-01-01

    Statistical summaries of streamflow data collected at 184 streamgages in Iowa are presented in this report. All streamgages included for analysis have at least 10 years of continuous record collected before or through September 2013. This rep...

  15. Heads Up! a Calculation- & Jargon-Free Approach to Statistics

    ERIC Educational Resources Information Center

    Giese, Alan R.

    2012-01-01

    Evaluating the strength of evidence in noisy data is a critical step in scientific thinking that typically relies on statistics. Students without statistical training will benefit from heuristic models that highlight the logic of statistical analysis. The likelihood associated with various coin-tossing outcomes gives students such a model. There…

  16. Statistics for Geography Teachers: Topics in Geography, Number 2.

    ERIC Educational Resources Information Center

    National Council for Geographic Education.

    This publication is designed to provide geography teachers with useful statistical information. It presents tables, maps, graphs, diagrams, and explanations of statistical data in 24 areas. The areas in which statistics are given are conversions, measurement, astronomy, time, daylight, twilight, latitude and longitude as distance, the relationship…

  17. 77 FR 62517 - Proposed Data Collections Submitted for Public Comment and Recommendations

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-15

    ...-based vital statistics at the national level, referred to as the U.S. National Vital Statistics System... days of this notice. Proposed Project Vital Statistics Training Application, OMB No. 0920-0217--Revision exp. 5/31/2013--National Center for Health Statistics (NCHS), Centers for Disease Control and...

  18. Using Technology to Prompt Good Questions about Distributions in Statistics

    ERIC Educational Resources Information Center

    Nabbout-Cheiban, Marie; Fisher, Forest; Edwards, Michael Todd

    2017-01-01

    The Common Core State Standards for Mathematics envisions data analysis as a key component of K-grade 12 mathematics instruction with statistics introduced in the early grades. Nonetheless, deficiencies in statistical learning persist throughout elementary school and beyond. Too often, mathematics teachers lack the statistical knowledge for…

  19. Transport Statistics - Transport - UNECE

    Science.gov Websites

    Statistics and Data: online infocards, database, SDG papers, E-Road Census, and Traffic Census (2015 map available). Two new datasets have been added to the transport statistics database: bus and coach statistics.

  20. 7 CFR Appendix B to Subpart E of... - Federal Reserve Statistical Release

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    Appendix B to Subpart E of Part 1786 (Agriculture Regulations of the Department of Agriculture, Rural...): Federal Reserve Statistical Release. These data are released each Monday. The availability...
