Sample records for data interpretation statistical

  1. Localized Smart-Interpretation

    NASA Astrophysics Data System (ADS)

    Lundh Gulbrandsen, Mats; Mejer Hansen, Thomas; Bach, Torben; Pallesen, Tom

    2014-05-01

    The complex task of setting up a geological model consists not only of combining available geological information into a conceptually plausible model, but also requires consistency with available data, e.g. geophysical data. However, in many cases the direct geological information, e.g. borehole samples, is very sparse, so in order to create a geological model the geologist needs to rely on the geophysical data. The problem, however, is that the amount of geophysical data is in many cases so vast that it is practically impossible to integrate all of it in the manual interpretation process. This means that much of the information available from the geophysical surveys is unexploited, which is a problem because the resulting geological model does not fulfill its full potential and hence is less trustworthy. We suggest an approach to geological modeling that 1. allows all geophysical data to be considered when building the geological model, 2. is fast, and 3. allows quantification of geological modeling. The method is constructed to build a statistical model, f(d,m), describing the relation between what the geologist interprets, d, and what the geologist knows, m. The parameter m reflects any available information that can be quantified, such as geophysical data, the result of a geophysical inversion, elevation maps, etc. The parameter d reflects an actual interpretation, such as, for example, the depth to the base of a ground water reservoir. First we infer a statistical model f(d,m) by examining sets of actual interpretations made by a geological expert, [d1, d2, ...], and the information used to perform the interpretation, [m1, m2, ...]. This makes it possible to quantify how the geological expert performs interpolation through f(d,m). As the geological expert proceeds with the interpretation, the number of interpreted data points from which the statistical model is inferred increases, and therefore the accuracy of the statistical model increases. When a model f(d,m) has successfully been inferred, we are able to simulate how the geological expert would perform an interpretation given some external information m, through f(d|m). We will demonstrate this method applied to geological interpretation of densely sampled airborne electromagnetic data. In short, our goal is to build a statistical model describing how a geological expert performs geological interpretation given some geophysical data. We then wish to use this statistical model to perform semi-automatic interpretation, everywhere such geophysical data exist, in a manner consistent with the choices made by a geological expert. The benefits of such a statistical model are that 1. it provides a quantification of how a geological expert performs interpretation based on available diverse data, 2. all available geophysical information can be used, and 3. it allows much faster interpretation of large data sets.
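
    As a rough illustration of the idea described above, the sketch below fits a statistical relation between auxiliary data m and expert picks d on synthetic numbers and then predicts d, with an uncertainty estimate, where only m is known. The Gaussian-process regressor, the feature names, and the data are assumptions made for illustration; this is not the authors' implementation.

    ```python
    # Sketch: learn a relation between auxiliary data m (e.g., resistivity,
    # elevation) and expert picks d (e.g., depth to a layer boundary), then
    # predict d with an uncertainty estimate wherever m is available.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)

    # m: features available everywhere (two synthetic geophysical attributes)
    # d: depths interpreted by the expert at a few locations
    m_train = rng.uniform(0, 1, size=(30, 2))
    d_train = 40 + 25 * m_train[:, 0] - 10 * m_train[:, 1] + rng.normal(0, 1.5, 30)

    # f(d, m) is approximated here by a GP regression of d on m
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5) + WhiteKernel(),
                                  normalize_y=True)
    gp.fit(m_train, d_train)

    # Simulate the expert's interpretation f(d|m) at new locations, with uncertainty
    m_new = rng.uniform(0, 1, size=(5, 2))
    d_pred, d_std = gp.predict(m_new, return_std=True)
    for dp, ds in zip(d_pred, d_std):
        print(f"predicted depth: {dp:5.1f} m  (+/- {ds:.1f})")
    ```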

  2. Applied statistics in ecology: common pitfalls and simple solutions

    Treesearch

    E. Ashley Steel; Maureen C. Kennedy; Patrick G. Cunningham; John S. Stanovick

    2013-01-01

    The most common statistical pitfalls in ecological research are those associated with data exploration, the logic of sampling and design, and the interpretation of statistical results. Although one can find published errors in calculations, the majority of statistical pitfalls result from incorrect logic or interpretation despite correct numerical calculations. There...

  3. Interpretation of Statistical Data: The Importance of Affective Expressions

    ERIC Educational Resources Information Center

    Queiroz, Tamires; Monteiro, Carlos; Carvalho, Liliane; François, Karen

    2017-01-01

    In recent years, research on the teaching and learning of statistics has emphasized that the interpretation of data is a complex process that involves cognitive and technical aspects. However, it is a human activity that also involves contextual and affective aspects. This view is in line with research on affectivity and cognition. While the affective…

  4. Interpretations of Boxplots: Helping Middle School Students to Think outside the Box

    ERIC Educational Resources Information Center

    Edwards, Thomas G.; Özgün-Koca, Asli; Barr, John

    2017-01-01

    Boxplots are statistical representations for organizing and displaying data that are relatively easy to create with a five-number summary. However, boxplots are not as easy to understand, interpret, or connect with other statistical representations of the same data. We worked at two different schools with 259 middle school students who constructed…

  5. Hold My Calls: An Activity for Introducing the Statistical Process

    ERIC Educational Resources Information Center

    Abel, Todd; Poling, Lisa

    2015-01-01

    Working with practicing teachers, this article demonstrates, through the facilitation of a statistical activity, how to introduce and investigate the unique qualities of the statistical process: formulating a question, collecting data, analyzing data, and interpreting data.

  6. Teaching Business Statistics with Real Data to Undergraduates and the Use of Technology in the Class Room

    ERIC Educational Resources Information Center

    Singamsetti, Rao

    2007-01-01

    In this paper an attempt is made to highlight some issues of interpretation of statistical concepts and interpretation of results as taught in undergraduate Business statistics courses. The use of modern technology in the class room is shown to have increased the efficiency and the ease of learning and teaching in statistics. The importance of…

  7. Data exploration systems for databases

    NASA Technical Reports Server (NTRS)

    Greene, Richard J.; Hield, Christopher

    1992-01-01

    Data exploration systems apply machine learning techniques, multivariate statistical methods, information theory, and database theory to databases to identify significant relationships among the data and summarize information. The result of applying data exploration systems should be a better understanding of the structure of the data and a perspective of the data enabling an analyst to form hypotheses for interpreting the data. This paper argues that data exploration systems need a minimum amount of domain knowledge to guide both the statistical strategy and the interpretation of the resulting patterns discovered by these systems.

  8. Design, analysis, and interpretation of field quality-control data for water-sampling projects

    USGS Publications Warehouse

    Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.

    2015-01-01

    The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.
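
    As a minimal illustration of one method the report covers, the sketch below builds a t-based confidence interval on the mean of a set of field-blank results, the kind of quantity used to judge potential bias; the values and the 95% level are invented for the example.

    ```python
    # Sketch: t-based confidence interval on the mean of field-blank results.
    import numpy as np
    from scipy import stats

    blanks = np.array([0.02, 0.00, 0.05, 0.01, 0.03, 0.00, 0.04])  # mg/L, hypothetical
    n = blanks.size
    mean = blanks.mean()
    sem = blanks.std(ddof=1) / np.sqrt(n)

    # 95% two-sided confidence interval on the mean blank concentration
    t_crit = stats.t.ppf(0.975, df=n - 1)
    lo, hi = mean - t_crit * sem, mean + t_crit * sem
    print(f"mean blank = {mean:.3f} mg/L, 95% CI ({lo:.3f}, {hi:.3f})")
    ```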

  9. Using the U.S. Geological Survey National Water Quality Laboratory LT-MDL to Evaluate and Analyze Data

    USGS Publications Warehouse

    Bonn, Bernadine A.

    2008-01-01

    A long-term method detection level (LT-MDL) and laboratory reporting level (LRL) are used by the U.S. Geological Survey's National Water Quality Laboratory (NWQL) when reporting results from most chemical analyses of water samples. Changing to this method provided data users with additional information about their data and often resulted in more reported values in the low concentration range. Before this method was implemented, many of these values would have been censored. The use of the LT-MDL and LRL presents some challenges for the data user. Interpreting data in the low concentration range increases the need for adequate quality assurance because even small contamination or recovery problems can be relatively large compared to concentrations near the LT-MDL and LRL. In addition, the definition of the LT-MDL, as well as the inclusion of low values, can result in complex data sets with multiple censoring levels and reported values that are less than a censoring level. Improper interpretation or statistical manipulation of low-range results in these data sets can result in bias and incorrect conclusions. This document is designed to help data users use and interpret data reported with the LT-MDL/LRL method. The calculation and application of the LT-MDL and LRL are described. This document shows how to extract statistical information from the LT-MDL and LRL and how to use that information in USGS investigations, such as assessing the quality of field data, interpreting field data, and planning data collection for new projects. A set of 19 detailed examples is included in this document to help data users think about their data and properly interpret low-range data without introducing bias. Although this document is not meant to be a comprehensive resource of statistical methods, several useful methods of analyzing censored data are demonstrated, including Regression on Order Statistics and Kaplan-Meier Estimation. These two statistical methods handle complex censored data sets without resorting to substitution, thereby avoiding a common source of bias and inaccuracy.
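
    The sketch below is a deliberately simplified regression-on-order-statistics (ROS) calculation for a data set with a single censoring level, meant only to illustrate the idea of estimating summary statistics without substitution. The concentrations and reporting level are invented, and a real analysis should rely on a vetted implementation.

    ```python
    # Simplified ROS for one censoring level: fit log(concentration) vs normal
    # quantiles using detects only, then impute the censored values from the fit.
    import numpy as np
    from scipy import stats

    detects = np.array([0.8, 1.2, 1.5, 2.3, 3.1, 4.7])   # reported concentrations
    n_censored = 4                                        # values reported as "<0.5"
    dl = 0.5                                              # censoring level (hypothetical LRL)

    n = detects.size + n_censored
    # Weibull plotting positions for the ranked data set; with a single censoring
    # level below all detects, the censored values occupy the lowest ranks.
    pp = np.arange(1, n + 1) / (n + 1)
    z = stats.norm.ppf(pp)

    # Regression of log(detected concentration) on the normal quantiles of the detects
    slope, intercept, *_ = stats.linregress(z[n_censored:], np.log(np.sort(detects)))

    # Impute the censored observations from the fitted line at their plotting positions
    imputed = np.exp(intercept + slope * z[:n_censored])
    full = np.concatenate([imputed, np.sort(detects)])
    print(f"ROS mean = {full.mean():.2f}, ROS std = {full.std(ddof=1):.2f}")
    ```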

  10. Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Udey, Ruth Norma

    Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.

  11. Report: New analytical and statistical approaches for interpreting the relationships among environmental stressors and biomarkers

    EPA Science Inventory

    The broad topic of biomarker research has an often-overlooked component: the documentation and interpretation of the surrounding chemical environment and other meta-data, especially from visualization, analytical, and statistical perspectives (Pleil et al. 2014; Sobus et al. 2011...

  12. Paleomagnetism.org: An online multi-platform open source environment for paleomagnetic data analysis

    NASA Astrophysics Data System (ADS)

    Koymans, Mathijs R.; Langereis, Cor G.; Pastor-Galán, Daniel; van Hinsbergen, Douwe J. J.

    2016-08-01

    This contribution provides an overview of Paleomagnetism.org, an open-source, multi-platform online environment for paleomagnetic data analysis. Paleomagnetism.org provides an interactive environment where paleomagnetic data can be interpreted, evaluated, visualized, and exported. The Paleomagnetism.org application is split into an interpretation portal, a statistics portal, and a portal for miscellaneous paleomagnetic tools. In the interpretation portal, principal component analysis can be performed on visualized demagnetization diagrams. Interpreted directions and great circles can be combined to find great circle solutions. These directions can be used in the statistics portal, or exported as data and figures. The tools in the statistics portal cover standard Fisher statistics for directions and VGPs, including other statistical parameters used as reliability criteria. Other available tools include an eigenvector approach foldtest, two reversal tests including a Monte Carlo simulation on mean directions, and a coordinate bootstrap on the original data. An implementation is included for the detection and correction of inclination shallowing in sediments following TK03.GAD. Finally, we provide a module to visualize VGPs and expected paleolatitudes, declinations, and inclinations relative to widely used global apparent polar wander path models in coordinates of major continent-bearing plates. The tools in the miscellaneous portal include a net tectonic rotation (NTR) analysis to restore a body to its paleo-vertical and a bootstrapped oroclinal test using linear regression techniques, including a modified foldtest around a vertical axis. Paleomagnetism.org provides an integrated approach for researchers to work with visualized (e.g. hemisphere projections, Zijderveld diagrams) paleomagnetic data. The application constructs a custom exportable file that can be shared freely and included in public databases. This exported file contains all data and can later be imported to the application by other researchers. The accessibility and simplicity through which paleomagnetic data can be interpreted, analyzed, visualized, and shared makes Paleomagnetism.org of interest to the community.
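
    For readers unfamiliar with the directional statistics mentioned above, the following sketch computes a Fisher mean direction, the precision parameter k, and the alpha95 confidence cone for a handful of invented declination/inclination pairs; it is a textbook calculation, not code from Paleomagnetism.org.

    ```python
    # Fisher (1953) statistics for a small set of paleomagnetic directions.
    import numpy as np

    dec = np.radians([350.0, 5.0, 12.0, 358.0, 2.0, 7.0])   # declinations (deg)
    inc = np.radians([55.0, 60.0, 52.0, 58.0, 61.0, 57.0])  # inclinations (deg)

    # Convert to unit vectors, sum them, and take the resultant length R
    x = np.cos(inc) * np.cos(dec)
    y = np.cos(inc) * np.sin(dec)
    z = np.sin(inc)
    N = dec.size
    R = np.sqrt(x.sum() ** 2 + y.sum() ** 2 + z.sum() ** 2)

    mean_dec = np.degrees(np.arctan2(y.sum(), x.sum())) % 360
    mean_inc = np.degrees(np.arcsin(z.sum() / R))
    k = (N - 1) / (N - R)                                   # Fisher precision parameter
    a95 = np.degrees(np.arccos(1 - (N - R) / R * ((1 / 0.05) ** (1 / (N - 1)) - 1)))

    print(f"mean D/I = {mean_dec:.1f}/{mean_inc:.1f}, k = {k:.1f}, a95 = {a95:.1f} deg")
    ```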

  13. How to interpret the results of medical time series data analysis: Classical statistical approaches versus dynamic Bayesian network modeling.

    PubMed

    Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall

    2016-01-01

    Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approaches, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis also discusses several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach (1) is much more flexible in terms of modeling effort, and (2) offers an individualized risk assessment, which is more cumbersome to obtain with classical statistical approaches.
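
    A toy version of the two classical tools compared in the study is sketched below using the lifelines package on invented follow-up data; it is not the authors' code and does not use their cervical-cancer screening data.

    ```python
    # Kaplan-Meier estimate and Cox proportional hazards model on toy survival data.
    import pandas as pd
    from lifelines import KaplanMeierFitter, CoxPHFitter

    df = pd.DataFrame({
        "years":    [1.2, 3.4, 2.2, 5.0, 4.1, 0.9, 2.8, 3.9],  # follow-up time
        "event":    [1,   0,   1,   0,   1,   1,   0,   1],    # 1 = outcome observed
        "age":      [45,  52,  38,  61,  49,  57,  44,  50],
        "screened": [1,   1,   0,   1,   0,   0,   1,   1],
    })

    # Kaplan-Meier estimate of the survival function
    km = KaplanMeierFitter().fit(df["years"], event_observed=df["event"])
    print(km.survival_function_.tail())

    # Cox model relating covariates to the hazard (tiny sample, illustration only)
    cox = CoxPHFitter().fit(df, duration_col="years", event_col="event")
    cox.print_summary()
    ```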

  14. Critical Views of 8th Grade Students toward Statistical Data in Newspaper Articles: Analysis in Light of Statistical Literacy

    ERIC Educational Resources Information Center

    Guler, Mustafa; Gursoy, Kadir; Guven, Bulent

    2016-01-01

    Understanding and interpreting biased data, decision-making in accordance with the data, and critically evaluating situations involving data are among the fundamental skills necessary in the modern world. To develop these required skills, emphasis on statistical literacy in school mathematics has been gradually increased in recent years. The…

  15. Targeting Change: Assessing a Faculty Learning Community Focused on Increasing Statistics Content in Life Science Curricula

    ERIC Educational Resources Information Center

    Parker, Loran Carleton; Gleichsner, Alyssa M.; Adedokun, Omolola A.; Forney, James

    2016-01-01

    Transformation of research in all biological fields necessitates the design, analysis, and interpretation of large data sets. Preparing students with the requisite skills in experimental design, statistical analysis and interpretation, and mathematical reasoning will require both curricular reform and faculty who are willing and able to integrate…

  16. Australian Curriculum Linked Lessons: Statistics

    ERIC Educational Resources Information Center

    Day, Lorraine

    2014-01-01

    Students recognise and analyse data and draw inferences. They represent, summarise and interpret data and undertake purposeful investigations involving the collection and interpretation of data… They develop an increasingly sophisticated ability to critically evaluate chance and data concepts and make reasoned judgments and decisions, as well as…

  17. Combining data visualization and statistical approaches for interpreting measurements and meta-data: Integrating heatmaps, variable clustering, and mixed regression models

    EPA Science Inventory

    The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...

  18. Descriptive data analysis.

    PubMed

    Thompson, Cheryl Bagley

    2009-01-01

    This 13th article of the Basics of Research series is first in a short series on statistical analysis. These articles will discuss creating your statistical analysis plan, levels of measurement, descriptive statistics, probability theory, inferential statistics, and general considerations for interpretation of the results of a statistical analysis.

  19. Statistical Literacy: High School Students in Reading, Interpreting and Presenting Data

    NASA Astrophysics Data System (ADS)

    Hafiyusholeh, M.; Budayasa, K.; Siswono, T. Y. E.

    2018-01-01

    One of the foundations for high school students in statistics is to be able to read data: presenting data in the form of tables and diagrams and interpreting them. The purpose of this study is to describe high school students' competencies in reading, interpreting and presenting data. The subjects consisted of male and female students who had high levels of mathematical ability. Data were collected in the form of a task formulation, which was analyzed by reducing, presenting and verifying the data. Results showed that the students read the data based on explicit explanations in the diagram, such as explaining the points in the diagram as the relation between the x and y axes and determining the simple trend of a graph, including the maximum and minimum points. In interpreting and summarizing the data, both subjects paid attention to general data trends and used them to predict increases or decreases in the data. The male subject estimated the (n+1)th value of the weight data by using the mode of the data, while the female subject estimated the weight by using the average. The male subject tended not to consider the characteristics of the data, while the female subject considered the characteristics of the data more carefully.

  20. Assessment of water quality parameters using multivariate analysis for Klang River basin, Malaysia.

    PubMed

    Mohamed, Ibrahim; Othman, Faridah; Ibrahim, Adriana I N; Alaa-Eldin, M E; Yunus, Rossita M

    2015-01-01

    This case study uses several univariate and multivariate statistical techniques to evaluate and interpret a water quality data set obtained from the Klang River basin located within the state of Selangor and the Federal Territory of Kuala Lumpur, Malaysia. The river drains an area of 1,288 km(2), from the steep mountain rainforests of the main Central Range along Peninsular Malaysia to the river mouth in Port Klang, into the Straits of Malacca. Water quality was monitored at 20 stations, nine of which are situated along the main river and 11 along six tributaries. Data was collected from 1997 to 2007 for seven parameters used to evaluate the status of the water quality, namely dissolved oxygen, biochemical oxygen demand, chemical oxygen demand, suspended solids, ammoniacal nitrogen, pH, and temperature. The data were first investigated using descriptive statistical tools, followed by two practical multivariate analyses that reduced the data dimensions for better interpretation. The analyses employed were factor analysis and principal component analysis, which explain 60 and 81.6% of the total variation in the data, respectively. We found that the resulting latent variables from the factor analysis are interpretable and beneficial for describing the water quality in the Klang River. This study presents the usefulness of several statistical methods in evaluating and interpreting water quality data for the purpose of monitoring the effectiveness of water resource management. The results should provide more straightforward data interpretation as well as valuable insight for managers to conceive optimum action plans for controlling pollution in river water.
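
    The kind of dimension reduction applied in the study can be sketched as follows with standardisation followed by principal component analysis; the matrix of seven parameters is random stand-in data, not the Klang River measurements.

    ```python
    # Standardise monitored parameters and run PCA, reporting explained variance.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    # rows = water samples, columns = DO, BOD, COD, SS, NH3-N, pH, temperature
    X = rng.normal(size=(120, 7))

    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=3).fit(X_std)

    print("variance explained per component:", pca.explained_variance_ratio_.round(3))
    print("cumulative:", pca.explained_variance_ratio_.cumsum().round(3))
    print("loadings of PC1:", pca.components_[0].round(2))
    ```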

  1. Analysis and interpretation of cost data in randomised controlled trials: review of published studies

    PubMed Central

    Barber, Julie A; Thompson, Simon G

    1998-01-01

    Objective: To review critically the statistical methods used for health economic evaluations in randomised controlled trials where an estimate of cost is available for each patient in the study. Design: Survey of published randomised trials including an economic evaluation with cost values suitable for statistical analysis; 45 such trials published in 1995 were identified from Medline. Main outcome measures: The use of statistical methods for cost data was assessed in terms of the descriptive statistics reported, use of statistical inference, and whether the reported conclusions were justified. Results: Although all 45 trials reviewed apparently had cost data for each patient, only 9 (20%) reported adequate measures of variability for these data and only 25 (56%) gave results of statistical tests or a measure of precision for the comparison of costs between the randomised groups. Only 16 (36%) of the articles gave conclusions which were justified on the basis of results presented in the paper. No paper reported sample size calculations for costs. Conclusions: The analysis and interpretation of cost data from published trials reveal a lack of statistical awareness. Strong and potentially misleading conclusions about the relative costs of alternative therapies have often been reported in the absence of supporting statistical evidence. Improvements in the analysis and reporting of health economic assessments are urgently required. Health economic guidelines need to be revised to incorporate more detailed statistical advice. Key messages: Health economic evaluations required for important healthcare policy decisions are often carried out in randomised controlled trials. A review of such published economic evaluations assessed whether statistical methods for cost outcomes have been appropriately used and interpreted. Few publications presented adequate descriptive information for costs or performed appropriate statistical analyses. In at least two thirds of the papers, the main conclusions regarding costs were not justified. The analysis and reporting of health economic assessments within randomised controlled trials urgently need improving. PMID:9794854

  2. Statistics corner: A guide to appropriate use of correlation coefficient in medical research.

    PubMed

    Mukaka, M M

    2012-09-01

    Correlation is a statistical method used to assess a possible linear association between two continuous variables. It is simple both to calculate and to interpret. However, misuse of correlation is so common among researchers that some statisticians have wished that the method had never been devised at all. The aim of this article is to provide a guide to appropriate use of correlation in medical research and to highlight some misuse. Examples of the applications of the correlation coefficient have been provided using data from statistical simulations as well as real data. A rule of thumb for interpreting the size of a correlation coefficient has also been provided.
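
    A short, generic example of computing and reading a correlation coefficient is given below; the simulated data and the rule-of-thumb labels are illustrative assumptions rather than material from the article.

    ```python
    # Pearson correlation on simulated paired measurements.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(size=100)                        # first continuous variable
    y = 0.6 * x + rng.normal(scale=0.8, size=100)   # a correlated second variable

    r, p = stats.pearsonr(x, y)
    print(f"r = {r:.2f}, p = {p:.3g}")

    # One common rule of thumb (labels vary between textbooks):
    # |r| < 0.3 negligible, 0.3-0.5 low, 0.5-0.7 moderate, 0.7-0.9 high, > 0.9 very high
    ```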

  3. Assessing the suitability of fractional polynomial methods in health services research: a perspective on the categorization epidemic.

    PubMed

    Williams, Jennifer Stewart

    2011-07-01

    To show how fractional polynomial methods can usefully replace the practice of arbitrarily categorizing data in epidemiology and health services research. A health service setting is used to illustrate a structured and transparent way of representing non-linear data without arbitrary grouping. When age is a regressor its effects on an outcome will be interpreted differently depending upon the placing of cutpoints or the use of a polynomial transformation. Although it is common practice, categorization comes at a cost. Information is lost, and accuracy and statistical power reduced, leading to spurious statistical interpretation of the data. The fractional polynomial method is widely supported by statistical software programs, and deserves greater attention and use.

  4. Statistical Approaches to Interpretation of Local, Regional, and National Highway-Runoff and Urban-Stormwater Data

    USGS Publications Warehouse

    Tasker, Gary D.; Granato, Gregory E.

    2000-01-01

    Decision makers need viable methods for the interpretation of local, regional, and national-highway runoff and urban-stormwater data including flows, concentrations and loads of chemical constituents and sediment, potential effects on receiving waters, and the potential effectiveness of various best management practices (BMPs). Valid (useful for intended purposes), current, and technically defensible stormwater-runoff models are needed to interpret data collected in field studies, to support existing highway and urban-runoffplanning processes, to meet National Pollutant Discharge Elimination System (NPDES) requirements, and to provide methods for computation of Total Maximum Daily Loads (TMDLs) systematically and economically. Historically, conceptual, simulation, empirical, and statistical models of varying levels of detail, complexity, and uncertainty have been used to meet various data-quality objectives in the decision-making processes necessary for the planning, design, construction, and maintenance of highways and for other land-use applications. Water-quality simulation models attempt a detailed representation of the physical processes and mechanisms at a given site. Empirical and statistical regional water-quality assessment models provide a more general picture of water quality or changes in water quality over a region. All these modeling techniques share one common aspect-their predictive ability is poor without suitable site-specific data for calibration. To properly apply the correct model, one must understand the classification of variables, the unique characteristics of water-resources data, and the concept of population structure and analysis. Classifying variables being used to analyze data may determine which statistical methods are appropriate for data analysis. An understanding of the characteristics of water-resources data is necessary to evaluate the applicability of different statistical methods, to interpret the results of these techniques, and to use tools and techniques that account for the unique nature of water-resources data sets. Populations of data on stormwater-runoff quantity and quality are often best modeled as logarithmic transformations. Therefore, these factors need to be considered to form valid, current, and technically defensible stormwater-runoff models. Regression analysis is an accepted method for interpretation of water-resources data and for prediction of current or future conditions at sites that fit the input data model. Regression analysis is designed to provide an estimate of the average response of a system as it relates to variation in one or more known variables. To produce valid models, however, regression analysis should include visual analysis of scatterplots, an examination of the regression equation, evaluation of the method design assumptions, and regression diagnostics. A number of statistical techniques are described in the text and in the appendixes to provide information necessary to interpret data by use of appropriate methods. Uncertainty is an important part of any decisionmaking process. In order to deal with uncertainty problems, the analyst needs to know the severity of the statistical uncertainty of the methods used to predict water quality. Statistical models need to be based on information that is meaningful, representative, complete, precise, accurate, and comparable to be deemed valid, up to date, and technically supportable. 
To assess uncertainty in the analytical tools, the modeling methods, and the underlying data set, all of these components need to be documented and communicated in an accessible format within project publications.
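
    As a hedged illustration of the regression modelling the report discusses, the sketch below fits an ordinary least squares model to log-transformed runoff volumes and constituent loads; the variable names and simulated data are assumptions, and retransformation bias and residual diagnostics would need attention in real work.

    ```python
    # OLS regression on log-transformed stormwater data: log(load) ~ log(flow).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    flow = rng.lognormal(mean=2.0, sigma=0.7, size=80)                 # runoff volume
    load = np.exp(0.5 + 0.9 * np.log(flow) + rng.normal(0, 0.3, 80))   # constituent load

    X = sm.add_constant(np.log(flow))
    fit = sm.OLS(np.log(load), X).fit()
    print(fit.params)      # intercept and slope on the log scale
    print(fit.rsquared)

    # Residual plots and influence diagnostics should accompany the fit, and
    # predictions back-transformed to original units need a bias correction
    # (e.g., a smearing estimator).
    ```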

  5. The Effects of Data and Graph Type on Concepts and Visualizations of Variability

    ERIC Educational Resources Information Center

    Cooper, Linda L.; Shore, Felice S.

    2010-01-01

    Recognizing and interpreting variability in data lies at the heart of statistical reasoning. Since graphical displays should facilitate communication about data, statistical literacy should include an understanding of how variability in data can be gleaned from a graph. This paper identifies several types of graphs that students typically…

  6. Using Statistics to Lie, Distort, and Abuse Data

    ERIC Educational Resources Information Center

    Bintz, William; Moore, Sara; Adams, Cheryll; Pierce, Rebecca

    2009-01-01

    Statistics is a branch of mathematics that involves organization, presentation, and interpretation of data, both quantitative and qualitative. Data do not lie, but people do. On the surface, quantitative data are basically inanimate objects, nothing more than lifeless and meaningless symbols that appear on a page, calculator, computer, or in one's…

  7. How the Mastery Rubric for Statistical Literacy Can Generate Actionable Evidence about Statistical and Quantitative Learning Outcomes

    ERIC Educational Resources Information Center

    Tractenberg, Rochelle E.

    2017-01-01

    Statistical literacy is essential to an informed citizenry; and two emerging trends highlight a growing need for training that achieves this literacy. The first trend is towards "big" data: while automated analyses can exploit massive amounts of data, the interpretation--and possibly more importantly, the replication--of results are…

  8. Statistical Literacy as a Function of Online versus Hybrid Course Delivery Format for an Introductory Graduate Statistics Course

    ERIC Educational Resources Information Center

    Hahs-Vaughn, Debbie L.; Acquaye, Hannah; Griffith, Matthew D.; Jo, Hang; Matthews, Ken; Acharya, Parul

    2017-01-01

    Statistical literacy refers to understanding fundamental statistical concepts. Assessment of statistical literacy can take the forms of tasks that require students to identify, translate, compute, read, and interpret data. In addition, statistical instruction can take many forms encompassing course delivery format such as face-to-face, hybrid,…

  9. Evaluating Structural Equation Models for Categorical Outcomes: A New Test Statistic and a Practical Challenge of Interpretation.

    PubMed

    Monroe, Scott; Cai, Li

    2015-01-01

    This research is concerned with two topics in assessing model fit for categorical data analysis. The first topic involves the application of a limited-information overall test, introduced in the item response theory literature, to structural equation modeling (SEM) of categorical outcome variables. Most popular SEM test statistics assess how well the model reproduces estimated polychoric correlations. In contrast, limited-information test statistics assess how well the underlying categorical data are reproduced. Here, the recently introduced C2 statistic of Cai and Monroe (2014) is applied. The second topic concerns how the root mean square error of approximation (RMSEA) fit index can be affected by the number of categories in the outcome variable. This relationship creates challenges for interpreting RMSEA. While the two topics initially appear unrelated, they may conveniently be studied in tandem since RMSEA is based on an overall test statistic, such as C2. The results are illustrated with an empirical application to data from a large-scale educational survey.
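
    For orientation, the snippet below shows the usual way an RMSEA-type index is obtained from an overall chi-square-type statistic with df degrees of freedom and sample size N; the numbers are invented, and the C2-based calculation in the paper may differ in detail.

    ```python
    # RMSEA from an overall fit statistic, its degrees of freedom, and sample size.
    import math

    def rmsea(stat: float, df: int, n: int) -> float:
        """Root mean square error of approximation from a chi-square-type statistic."""
        return math.sqrt(max((stat - df) / (df * (n - 1)), 0.0))

    print(rmsea(stat=85.3, df=40, n=500))   # ~0.048, conventionally a "close" fit
    ```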

  10. Using assemblage data in ecological indicators: A comparison and evaluation of commonly available statistical tools

    USGS Publications Warehouse

    Smith, Joseph M.; Mather, Martha E.

    2012-01-01

    Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, then we interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the future use of these commonly available ecological and statistical methods in preparing assemblage data for use in ecological indicators.
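
    One of the orderings compared in the study, non-metric multidimensional scaling, can be sketched as follows on a random site-by-species matrix with Bray-Curtis dissimilarities; the use of scikit-learn's MDS here is an assumption for illustration (ecologists often use other NMDS implementations), not the authors' workflow.

    ```python
    # Non-metric MDS on Bray-Curtis dissimilarities between sites.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from sklearn.manifold import MDS

    rng = np.random.default_rng(4)
    counts = rng.poisson(lam=3, size=(20, 12))   # 20 sites x 12 species (stand-in data)

    # Pairwise Bray-Curtis dissimilarities, then a 2-D non-metric ordination
    d = squareform(pdist(counts, metric="braycurtis"))
    nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
               n_init=10, random_state=0)
    scores = nmds.fit_transform(d)

    print("stress:", round(nmds.stress_, 3))
    print(scores[:3])   # site scores on the first two NMDS axes
    ```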

  11. Trends in Fertility in the United States. Vital and Health Statistics, Data from the National Vital Statistics System. Series 21, Number 28.

    ERIC Educational Resources Information Center

    Taffel, Selma

    This report presents and interprets birth statistics for the United States with particular emphasis on changes that took place during the period 1970-73. Data for the report were based on information entered on birth certificates collected from all states. The majority of the document comprises graphs and tables of data, but there are four short…

  12. The use of statistical tools in field testing of putative effects of genetically modified plants on nontarget organisms

    PubMed Central

    Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F

    2013-01-01

    To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches – for example, analysis of variance (ANOVA) – are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing. PMID:24567836

  13. The use of statistical tools in field testing of putative effects of genetically modified plants on nontarget organisms.

    PubMed

    Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F

    2013-08-01

    To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches - for example, analysis of variance (ANOVA) - are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing.

  14. Statistics Refresher for Molecular Imaging Technologists, Part 2: Accuracy of Interpretation, Significance, and Variance.

    PubMed

    Farrell, Mary Beth

    2018-06-01

    This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic is, the higher is the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability ( P ) value. Calculation of P value is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around a mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, 68% of the data will fall within 1 SD, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. Confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being measured. A wide confidence interval indicates that if the experiment were repeated multiple times on other samples, the measured statistic would lie within a wide range of possibilities. The confidence interval relies on the SE. © 2018 by the Society of Nuclear Medicine and Molecular Imaging.
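
    A small example of the agreement statistics described above, contrasting raw percent agreement with the chance-corrected Cohen κ for two hypothetical readers, is sketched below; the reads are invented.

    ```python
    # Percent agreement vs. Cohen kappa for two readers' binary interpretations.
    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    reader_a = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0])  # 1 = positive read
    reader_b = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0])

    percent_agreement = np.mean(reader_a == reader_b)
    kappa = cohen_kappa_score(reader_a, reader_b)   # corrects for chance agreement
    print(f"agreement = {percent_agreement:.0%}, Cohen kappa = {kappa:.2f}")
    ```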

  15. Workplace Statistical Literacy for Teachers: Interpreting Box Plots

    ERIC Educational Resources Information Center

    Pierce, Robyn; Chick, Helen

    2013-01-01

    As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and, yet it is not clear that they always have the…

  16. Making Decisions with Data: Are We Environmentally Friendly?

    ERIC Educational Resources Information Center

    English, Lyn; Watson, Jane

    2016-01-01

    Statistical literacy is a vital component of numeracy. Students need to learn to critically evaluate and interpret statistical information if they are to become informed citizens. This article examines a Year 5 unit of work that uses the data collection and analysis cycle within a sustainability context.

  17. Digital recovery, modification, and analysis of Tetra Tech seismic horizon mapping, National Petroleum Reserve Alaska (NPRA), northern Alaska

    USGS Publications Warehouse

    Saltus, R.W.; Kulander, Christopher S.; Potter, Christopher J.

    2002-01-01

    We have digitized, modified, and analyzed seismic interpretation maps of 12 subsurface stratigraphic horizons spanning portions of the National Petroleum Reserve in Alaska (NPRA). These original maps were prepared by Tetra Tech, Inc., based on about 15,000 miles of seismic data collected from 1974 to 1981. We have also digitized interpreted faults and seismic velocities from Tetra Tech maps. The seismic surfaces were digitized as two-way travel time horizons and converted to depth using Tetra Tech seismic velocities. The depth surfaces were then modified by long-wavelength corrections based on recent USGS seismic re-interpretation along regional seismic lines. We have developed and executed an algorithm to identify and calculate statistics on the area, volume, height, and depth of closed structures based on these seismic horizons. These closure statistics are tabulated and have been used as input to oil and gas assessment calculations for the region. Directories accompanying this report contain basic digitized data, processed data, maps, tabulations of closure statistics, and software relating to this project.

  18. Statistical transformation and the interpretation of inpatient glucose control data from the intensive care unit.

    PubMed

    Saulnier, George E; Castro, Janna C; Cook, Curtiss B

    2014-05-01

    Glucose control can be problematic in critically ill patients. We evaluated the impact of statistical transformation on interpretation of intensive care unit inpatient glucose control data. Point-of-care blood glucose (POC-BG) data derived from patients in the intensive care unit for 2011 was obtained. Box-Cox transformation of POC-BG measurements was performed, and distribution of data was determined before and after transformation. Different data subsets were used to establish statistical upper and lower control limits. Exponentially weighted moving average (EWMA) control charts constructed from April, October, and November data determined whether out-of-control events could be identified differently in transformed versus nontransformed data. A total of 8679 POC-BG values were analyzed. POC-BG distributions in nontransformed data were skewed but approached normality after transformation. EWMA control charts revealed differences in projected detection of out-of-control events. In April, an out-of-control process resulting in the lower control limit being exceeded was identified at sample 116 in nontransformed data but not in transformed data. October transformed data detected an out-of-control process exceeding the upper control limit at sample 27 that was not detected in nontransformed data. Nontransformed November results remained in control, but transformation identified an out-of-control event less than 10 samples into the observation period. Using statistical methods to assess population-based glucose control in the intensive care unit could alter conclusions about the effectiveness of care processes for managing hyperglycemia. Further study is required to determine whether transformed versus nontransformed data change clinical decisions about the interpretation of care or intervention results. © 2014 Diabetes Technology Society.
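
    A compact sketch of the two steps combined in the study, Box-Cox transformation of skewed glucose values followed by an EWMA chart on the transformed series, is given below; the simulated values and the smoothing and control-limit constants are common textbook choices, not the authors' settings.

    ```python
    # Box-Cox transformation of skewed glucose data, then an EWMA control chart.
    import numpy as np
    import pandas as pd
    from scipy import stats

    rng = np.random.default_rng(5)
    glucose = rng.lognormal(mean=np.log(150), sigma=0.25, size=300)  # mg/dL, skewed

    # Box-Cox transformation toward normality
    transformed, lam = stats.boxcox(glucose)
    print(f"estimated Box-Cox lambda = {lam:.2f}")

    lambda_w, L = 0.2, 3.0                      # smoothing weight and limit width
    mu, sigma = transformed.mean(), transformed.std(ddof=1)

    # EWMA recursion z_t = (1-lambda)*z_{t-1} + lambda*x_t, started at the centre line mu
    z = pd.Series(np.concatenate([[mu], transformed])) \
          .ewm(alpha=lambda_w, adjust=False).mean().to_numpy()[1:]

    # Time-varying control limits for the EWMA statistic
    t = np.arange(1, transformed.size + 1)
    half_width = L * sigma * np.sqrt(lambda_w / (2 - lambda_w) * (1 - (1 - lambda_w) ** (2 * t)))
    out_of_control = (z > mu + half_width) | (z < mu - half_width)

    # For this stable simulated series the list is usually empty or very short
    print("out-of-control samples:", np.flatnonzero(out_of_control))
    ```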

  19. Statistical Transformation and the Interpretation of Inpatient Glucose Control Data From the Intensive Care Unit

    PubMed Central

    Saulnier, George E.; Castro, Janna C.

    2014-01-01

    Glucose control can be problematic in critically ill patients. We evaluated the impact of statistical transformation on interpretation of intensive care unit inpatient glucose control data. Point-of-care blood glucose (POC-BG) data derived from patients in the intensive care unit for 2011 was obtained. Box–Cox transformation of POC-BG measurements was performed, and distribution of data was determined before and after transformation. Different data subsets were used to establish statistical upper and lower control limits. Exponentially weighted moving average (EWMA) control charts constructed from April, October, and November data determined whether out-of-control events could be identified differently in transformed versus nontransformed data. A total of 8679 POC-BG values were analyzed. POC-BG distributions in nontransformed data were skewed but approached normality after transformation. EWMA control charts revealed differences in projected detection of out-of-control events. In April, an out-of-control process resulting in the lower control limit being exceeded was identified at sample 116 in nontransformed data but not in transformed data. October transformed data detected an out-of-control process exceeding the upper control limit at sample 27 that was not detected in nontransformed data. Nontransformed November results remained in control, but transformation identified an out-of-control event less than 10 samples into the observation period. Using statistical methods to assess population-based glucose control in the intensive care unit could alter conclusions about the effectiveness of care processes for managing hyperglycemia. Further study is required to determine whether transformed versus nontransformed data change clinical decisions about the interpretation of care or intervention results. PMID:24876620

  20. Statistics: The Shape of the Data. Used Numbers: Real Data in the Classroom. Grades 4-6.

    ERIC Educational Resources Information Center

    Russell, Susan Jo; Corwin, Rebecca B.

    A unit of study that introduces collecting, representing, describing, and interpreting data is presented. Suitable for students in grades 4 through 6, it provides a foundation for further work in statistics and data analysis. The investigations may extend from one to four class sessions and are grouped into three parts: "Introduction to Data…

  1. Survival analysis, or what to do with upper limits in astronomical surveys

    NASA Technical Reports Server (NTRS)

    Isobe, Takashi; Feigelson, Eric D.

    1986-01-01

    A field of applied statistics called survival analysis has been developed over several decades to deal with censored data, which occur in astronomical surveys when objects are too faint to be detected. How these methods can assist in the statistical interpretation of astronomical data is reviewed.

  2. What Is Missing in Counseling Research? Reporting Missing Data

    ERIC Educational Resources Information Center

    Sterner, William R.

    2011-01-01

    Missing data have long been problematic in quantitative research. Despite the statistical and methodological advances made over the past 3 decades, counseling researchers fail to provide adequate information on this phenomenon. Interpreting the complex statistical procedures and esoteric language seems to be a contributing factor. An overview of…

  3. Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments

    DTIC Science & Technology

    2015-09-30

    statistical inference methodologies for ocean-acoustic problems by investigating and applying statistical methods to data collected from scale-model...to begin planning experiments for statistical inference applications. APPROACH: In the ocean acoustics community over the past two decades...solutions for waveguide parameters. With the introduction of statistical inference to the field of ocean acoustics came the desire to interpret marginal

  4. The Sport Students’ Ability of Literacy and Statistical Reasoning

    NASA Astrophysics Data System (ADS)

    Hidayah, N.

    2017-03-01

    The ability of statistical literacy and reasoning is very important for students at a sport education college because the materials for statistical learning can be taken from their many activities, such as sport competitions, the results of tests and measurements, predicting achievement based on training, finding connections among variables, and others. This research describes the sport education college students' ability in statistical literacy and reasoning related to the identification of data types, probability, table interpretation, description and explanation using bar or pie graphs, explanation of variability, and the calculation and explanation of the mean, median, and mode, measured through an instrument. The instrument was administered to 50 college students majoring in sport; only 26% of the students scored above 30%, while the others scored below 30%. Across all subjects, 56% of students could identify data classifications; 49% could read, display and interpret tables through graphics; 27% could handle probability; 33% could describe variability; and 16.32% could read, calculate and describe the mean, median and mode. The results of this research show that the sport students' ability in statistical literacy and reasoning is not yet adequate and that their statistical study has not reached the level of conceptual understanding, literacy, and statistical reasoning, so it is critical to increase the sport students' ability in statistical literacy and reasoning.

  5. Statistical Analysis of Time-Series from Monitoring of Active Volcanic Vents

    NASA Astrophysics Data System (ADS)

    Lachowycz, S.; Cosma, I.; Pyle, D. M.; Mather, T. A.; Rodgers, M.; Varley, N. R.

    2016-12-01

    Despite recent advances in the collection and analysis of time-series from volcano monitoring, and the resulting insights into volcanic processes, challenges remain in forecasting and interpreting activity from near real-time analysis of monitoring data. Statistical methods have potential to characterise the underlying structure and facilitate intercomparison of these time-series, and so inform interpretation of volcanic activity. We explore the utility of multiple statistical techniques that could be widely applicable to monitoring data, including Shannon entropy and detrended fluctuation analysis, by their application to various data streams from volcanic vents during periods of temporally variable activity. Each technique reveals changes through time in the structure of some of the data that were not apparent from conventional analysis. For example, we calculate the Shannon entropy (a measure of the randomness of a signal) of time-series from the recent dome-forming eruptions of Volcán de Colima (Mexico) and Soufrière Hills (Montserrat). The entropy of real-time seismic measurements and the count rate of certain volcano-seismic event types from both volcanoes is found to be temporally variable, with these data generally having higher entropy during periods of lava effusion and/or larger explosions. In some instances, the entropy shifts prior to or coincident with changes in seismic or eruptive activity, some of which were not clearly recognised by real-time monitoring. Comparison with other statistics demonstrates the sensitivity of the entropy to the data distribution, but that it is distinct from conventional statistical measures such as coefficient of variation. We conclude that each analysis technique examined could provide valuable insights for interpretation of diverse monitoring time-series.
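
    As a rough illustration of one technique mentioned above, the sketch below computes the Shannon entropy of a synthetic monitoring signal in sliding windows with fixed bins, so that a shift to a noisier regime shows up as an entropy increase; it is not the authors' processing chain.

    ```python
    # Windowed Shannon entropy of a synthetic monitoring time-series.
    import numpy as np
    from scipy.stats import entropy

    rng = np.random.default_rng(6)
    # Synthetic RSAM-like signal: quiet background followed by noisier activity
    signal = np.concatenate([rng.normal(1.0, 0.1, 500), rng.normal(1.5, 0.6, 500)])

    # Fixed bin edges over the whole record so successive windows are comparable
    edges = np.histogram_bin_edges(signal, bins=30)

    def window_entropy(x):
        """Shannon entropy (bits) of the distribution of values in one window."""
        counts, _ = np.histogram(x, bins=edges)
        return entropy(counts, base=2)

    window = 100
    ent = [window_entropy(signal[i:i + window]) for i in range(0, signal.size, window)]
    print(np.round(ent, 2))   # entropy increases once the noisier regime begins
    ```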

  6. Efficient Geological Modelling of Large AEM Surveys

    NASA Astrophysics Data System (ADS)

    Bach, Torben; Martlev Pallesen, Tom; Jørgensen, Flemming; Lundh Gulbrandsen, Mats; Mejer Hansen, Thomas

    2014-05-01

    Combining geological expert knowledge with geophysical observations into a final 3D geological model is, in most cases, not a straightforward process. It typically involves many types of data and requires both an understanding of the data and the geological target. When dealing with very large areas, such as modelling of large AEM surveys, the manual task for the geologist of correctly evaluating and properly utilising all the data available in the survey area becomes overwhelming. In the ERGO project (Efficient High-Resolution Geological Modelling) we address these issues and propose a new modelling methodology enabling fast and consistent modelling of very large areas. The vision of the project is to build a user-friendly expert system that enables the combination of very large amounts of geological and geophysical data with geological expert knowledge. This is done in an "auto-pilot" type functionality, named Smart Interpretation, designed to aid the geologist in the interpretation process. The core of the expert system is a statistical model that describes the relation between data and geological interpretations made by a geological expert. This facilitates fast and consistent modelling of very large areas. It will enable the construction of models with high resolution, as the system will "learn" the geology of an area directly from interpretations made by a geological expert and instantly apply it to all hard data in the survey area, ensuring the utilisation of all the data available in the geological model. Another feature is that the statistical model the system creates for one area can be used in another area with similar data and geology. This feature can be useful as an aid to an untrained geologist in building a geological model, guided by the experienced geologist's way of interpretation as quantified by the expert system in the core statistical model. In this project presentation we provide some examples of the problems we are aiming to address in the project, and show some preliminary results.

  7. SOCR: Statistics Online Computational Resource

    ERIC Educational Resources Information Center

    Dinov, Ivo D.

    2006-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an…

  8. Thinking with Data

    ERIC Educational Resources Information Center

    Smith, Amy; Molinaro, Marco; Lee, Alisa; Guzman-Alvarez, Alberto

    2014-01-01

    For students to be successful in STEM, they need "statistical literacy," the ability to interpret, evaluate, and communicate statistical information (Gal 2002). The science and engineering practices dimension of the "Next Generation Science Standards" ("NGSS") highlights these skills, emphasizing the importance of…

  9. Emerging Personnel Requirements in Academic Libraries as Reflected in Recent Position Announcements.

    ERIC Educational Resources Information Center

    Block, David

    This study of the personnel requirements and hiring patterns of academic libraries draws on data collected from academic library position announcements issued nationwide during the fourth quarter of 1980. Data on 224 announcements were analyzed using the Statistical Package for the Social Sciences, and the resulting statistics are interpreted as a…

  10. Comparing the Lifetimes of Two Brands of Batteries

    ERIC Educational Resources Information Center

    Dunn, Peter K.

    2013-01-01

    In this paper, we report a case study that illustrates the importance of interpreting the results of statistical tests and shows the difference between practical importance and statistical significance. This case study presents three sets of data concerning the performance of two brands of batteries. The data are easy to describe and…

  11. Airborne gamma-ray spectrometer and magnetometer survey, Durango D, Colorado. Final report Volume II A. Detail area

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1983-01-01

    This volume contains geology of the Durango D detail area, radioactive mineral occurrences in Colorado, and geophysical data interpretation. Eight appendices provide: stacked profiles, geologic histograms, geochemical histograms, speed and altitude histograms, geologic statistical tables, geochemical statistical tables, magnetic and ancillary profiles, and test line data.

  12. Statistics and Data Interpretation for Social Work

    ERIC Educational Resources Information Center

    Rosenthal, James A.

    2011-01-01

    Written by a social worker for social work students, this is a nuts and bolts guide to statistics that presents complex calculations and concepts in clear, easy-to-understand language. It includes numerous examples, data sets, and issues that students will encounter in social work practice. The first section introduces basic concepts and terms to…

  13. Enhancing the Development of Statistical Literacy through the Robot Bioglyph

    ERIC Educational Resources Information Center

    Bragg, Leicha A.; Koch, Jessica; Willis, Ashley

    2017-01-01

    One way to heighten students' interest in the classroom is by personalising tasks. Through designing Robot Bioglyphs students are able to explore personalised data through a creative and engaging process. By understanding, producing and interpreting data, students are able to develop their statistical literacy, which is an essential skill in…

  14. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
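
    The authors' scan statistic operates on clusters within a gene pathway; the exact likelihood and cluster definition are not given in the abstract. The sketch below is a deliberately simplified one-dimensional analogue (all counts and parameters invented): genes are ordered along a pathway, CNV carriers are counted per gene in cases and controls, and a binomial log-likelihood ratio is maximised over contiguous windows. In practice, significance of the maximum would be assessed by permuting case/control labels.

    ```python
    import numpy as np

    def window_llr(case_in, ctrl_in, case_tot, ctrl_tot):
        """Binomial log-likelihood ratio for case enrichment inside one window."""
        n_in = case_in + ctrl_in
        n_out = (case_tot - case_in) + (ctrl_tot - ctrl_in)
        if n_in == 0 or n_out == 0:
            return 0.0
        p_in = case_in / n_in
        p_out = (case_tot - case_in) / n_out
        p0 = case_tot / (case_tot + ctrl_tot)

        def ll(k, n, p):
            # log-likelihood of k "case" events out of n at rate p (MLE plug-in values only)
            if p in (0.0, 1.0):
                return 0.0
            return k * np.log(p) + (n - k) * np.log(1 - p)

        llr = ll(case_in, n_in, p_in) + ll(case_tot - case_in, n_out, p_out) \
              - ll(case_tot, case_tot + ctrl_tot, p0)
        return llr if p_in > p0 else 0.0   # one-sided: only case enrichment

    def scan(case_counts, ctrl_counts, max_width=10):
        """Return the best (llr, start, width) over all contiguous gene windows."""
        case_tot, ctrl_tot = case_counts.sum(), ctrl_counts.sum()
        best = (0.0, None, None)
        for width in range(1, max_width + 1):
            for start in range(len(case_counts) - width + 1):
                llr = window_llr(case_counts[start:start + width].sum(),
                                 ctrl_counts[start:start + width].sum(),
                                 case_tot, ctrl_tot)
                if llr > best[0]:
                    best = (llr, start, width)
        return best

    rng = np.random.default_rng(2)
    cases = rng.poisson(0.5, size=100)
    cases[40:45] += rng.poisson(2.0, size=5)      # an enriched cluster of genes in cases
    controls = rng.poisson(0.5, size=100)
    print(scan(cases, controls))
    ```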

  15. Genome Expression Pathway Analysis Tool – Analysis and visualization of microarray gene expression data under genomic, proteomic and metabolic context

    PubMed Central

    Weniger, Markus; Engelmann, Julia C; Schultz, Jörg

    2007-01-01

    Background Regulation of gene expression is relevant to many areas of biology and medicine, in the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression level of thousands of mRNAs at the same time, allowing insight into or comparison of different cellular conditions. The data derived out of microarray experiments is highly dimensional and often noisy, and interpretation of the results can get intricate. Although programs for the statistical analysis of microarray data exist, most of them lack an integration of analysis results and biological interpretation. Results We have developed GEPAT, Genome Expression Pathway Analysis Tool, offering an analysis of gene expression data under genomic, proteomic and metabolic context. We provide an integration of statistical methods for data import and data analysis together with a biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various types of oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data, afterwards data annotation is performed. After import, GEPAT offers various statistical data analysis methods, as hierarchical, k-means and PCA clustering, a linear model based t-test or chromosomal profile comparison. The results of the analysis can be interpreted by enrichment of biological terms, pathway analysis or interaction networks. Different biological databases are included, to give various information for each probe on the chip. GEPAT offers no linear work flow, but allows the usage of any subset of probes and samples as a start for a new data analysis. GEPAT relies on established data analysis packages, offers a modular approach for an easy extension, and can be run on a computer grid to allow a large number of users. It is freely available under the LGPL open source license for academic and commercial users at . Conclusion GEPAT is a modular, scalable and professional-grade software integrating analysis and interpretation of microarray gene expression data. An installation available for academic users can be found at . PMID:17543125

  16. Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.

    PubMed

    Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V

    2007-01-01

    The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow one i) to determine natural groups or clusters of control strategies with similar behaviour, ii) to find and interpret hidden, complex and causal relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation of complex multicriteria data sets and allows an improved use of information for effective evaluation of control strategies.
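
    Neither BSM2 nor the paper's evaluation matrix is reproduced here; the sketch below only illustrates the generic workflow the abstract names (standardise the criteria, reduce dimension with PCA, then cluster the strategies), using random data and the scikit-learn API as assumptions.

    ```python
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    # Rows: simulated control strategies; columns: evaluation criteria
    # (e.g. effluent quality, aeration energy, sludge production) -- synthetic values only.
    X = rng.normal(size=(30, 8))

    Xs = StandardScaler().fit_transform(X)          # put all criteria on a common scale
    scores = PCA(n_components=2).fit_transform(Xs)  # low-dimensional view of the strategies
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)

    for k in range(3):                              # strategies grouped by similar behaviour
        print(f"cluster {k}: strategies {np.where(labels == k)[0].tolist()}")
    ```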

  17. Statistical transformation and the interpretation of inpatient glucose control data.

    PubMed

    Saulnier, George E; Castro, Janna C; Cook, Curtiss B

    2014-03-01

    To introduce a statistical method of assessing hospital-based non-intensive care unit (non-ICU) inpatient glucose control. Point-of-care blood glucose (POC-BG) data from hospital non-ICUs were extracted for January 1 through December 31, 2011. Glucose data distribution was examined before and after Box-Cox transformations and compared to normality. Different subsets of data were used to establish upper and lower control limits, and exponentially weighted moving average (EWMA) control charts were constructed from June, July, and October data as examples to determine if out-of-control events were identified differently in nontransformed versus transformed data. A total of 36,381 POC-BG values were analyzed. In all 3 monthly test samples, glucose distributions in nontransformed data were skewed but approached a normal distribution once transformed. Interpretation of out-of-control events from EWMA control chart analyses also revealed differences. In the June test data, an out-of-control process was identified at sample 53 with nontransformed data, whereas the transformed data remained in control for the duration of the observed period. Analysis of July data demonstrated an out-of-control process sooner in the transformed (sample 55) than nontransformed (sample 111) data, whereas for October, transformed data remained in control longer than nontransformed data. Statistical transformations increase the normal behavior of inpatient non-ICU glycemic data sets. The decision to transform glucose data could influence the interpretation and conclusions about the status of inpatient glycemic control. Further study is required to determine whether transformed versus nontransformed data influence clinical decisions or evaluation of interventions.
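
    The paper's data and control limits are not available; the following is a minimal sketch of the two steps described (a Box-Cox transformation toward normality, then an EWMA control chart), with the smoothing constant, limit width and synthetic glucose values chosen purely for illustration. In practice the in-control mean and standard deviation would come from a separate reference period rather than the charted data themselves.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    glucose = rng.lognormal(mean=np.log(140), sigma=0.25, size=300)   # skewed, mg/dL-like values

    # Box-Cox transformation (requires positive data); lambda estimated by maximum likelihood
    transformed, lam = stats.boxcox(glucose)

    def ewma_chart(x, lam_ewma=0.2, L=3.0):
        """EWMA values and control limits, with the in-control mean/sd taken from x itself."""
        mu, sigma = x.mean(), x.std(ddof=1)
        z = np.empty_like(x)
        prev = mu                        # chart is started at the in-control mean
        for i, xi in enumerate(x):
            prev = lam_ewma * xi + (1 - lam_ewma) * prev
            z[i] = prev
        t = np.arange(1, len(x) + 1)
        half = L * sigma * np.sqrt(lam_ewma / (2 - lam_ewma) * (1 - (1 - lam_ewma) ** (2 * t)))
        return z, mu - half, mu + half

    z, lcl, ucl = ewma_chart(transformed)
    out_of_control = np.where((z < lcl) | (z > ucl))[0]
    print(f"Box-Cox lambda = {lam:.2f}; out-of-control samples: {out_of_control.tolist()}")
    ```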

  18. Assessing compositional variability through graphical analysis and Bayesian statistical approaches: case studies on transgenic crops.

    PubMed

    Harrigan, George G; Harrison, Jay M

    2012-01-01

    New transgenic (GM) crops are subjected to extensive safety assessments that include compositional comparisons with conventional counterparts as a cornerstone of the process. The influence of germplasm, location, environment, and agronomic treatments on compositional variability is, however, often obscured in these pair-wise comparisons. Furthermore, classical statistical significance testing can often provide an incomplete and over-simplified summary of highly responsive variables such as crop composition. In order to more clearly describe the influence of the numerous sources of compositional variation we present an introduction to two alternative but complementary approaches to data analysis and interpretation. These include i) exploratory data analysis (EDA) with its emphasis on visualization and graphics-based approaches and ii) Bayesian statistical methodology that provides easily interpretable and meaningful evaluations of data in terms of probability distributions. The EDA case-studies include analyses of herbicide-tolerant GM soybean and insect-protected GM maize and soybean. Bayesian approaches are presented in an analysis of herbicide-tolerant GM soybean. Advantages of these approaches over classical frequentist significance testing include the more direct interpretation of results in terms of probabilities pertaining to quantities of interest and no confusion over the application of corrections for multiple comparisons. It is concluded that a standardized framework for these methodologies could provide specific advantages through enhanced clarity of presentation and interpretation in comparative assessments of crop composition.

  19. A Generalized Approach for the Interpretation of Geophysical Well Logs in Ground-Water Studies - Theory and Application

    USGS Publications Warehouse

    Paillet, Frederick L.; Crowder, R.E.

    1996-01-01

    Quantitative analysis of geophysical logs in ground-water studies often involves at least as broad a range of applications and variation in lithology as is typically encountered in petroleum exploration, making such logs difficult to calibrate and complicating inversion problem formulation. At the same time, data inversion and analysis depend on inversion model formulation and refinement, so that log interpretation cannot be deferred to a geophysical log specialist unless active involvement with interpretation can be maintained by such an expert over the lifetime of the project. We propose a generalized log-interpretation procedure designed to guide hydrogeologists in the interpretation of geophysical logs, and in the integration of log data into ground-water models that may be systematically refined and improved in an iterative way. The procedure is designed to maximize the effective use of three primary contributions from geophysical logs: (1) The continuous depth scale of the measurements along the well bore; (2) The in situ measurement of lithologic properties and the correlation with hydraulic properties of the formations over a finite sample volume; and (3) Multiple independent measurements that can potentially be inverted for multiple physical or hydraulic properties of interest. The approach is formulated in the context of geophysical inversion theory, and is designed to be interfaced with surface geophysical soundings and conventional hydraulic testing. The step-by-step procedures given in our generalized interpretation and inversion technique are based on both qualitative analysis designed to assist formulation of the interpretation model, and quantitative analysis used to assign numerical values to model parameters. The approach bases a decision as to whether quantitative inversion is statistically warranted by formulating an over-determined inversion. If no such inversion is consistent with the inversion model, quantitative inversion is judged not possible with the given data set. Additional statistical criteria such as the statistical significance of regressions are used to guide the subsequent calibration of geophysical data in terms of hydraulic variables in those situations where quantitative data inversion is considered appropriate.

  20. Statistics Poster Challenge for Schools

    ERIC Educational Resources Information Center

    Payne, Brad; Freeman, Jenny; Stillman, Eleanor

    2013-01-01

    The analysis and interpretation of data are important life skills. A poster challenge for schoolchildren provides an innovative outlet for these skills and demonstrates their relevance to daily life. We discuss our Statistics Poster Challenge and the lessons we have learned.

  1. Equivalent statistics and data interpretation.

    PubMed

    Francis, Gregory

    2017-08-01

    Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
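
    A small numerical illustration of the equivalence described (not code from the paper): with the two sample sizes fixed, the t statistic, the p value and Cohen's d are one-to-one transformations of one another, so each can be recovered from the others.

    ```python
    import numpy as np
    from scipy import stats

    n1, n2 = 25, 25
    rng = np.random.default_rng(0)
    x, y = rng.normal(0.5, 1, n1), rng.normal(0.0, 1, n2)

    t, p = stats.ttest_ind(x, y)                 # two-sample t test (equal variances)
    df = n1 + n2 - 2

    # Cohen's d is a deterministic function of t once n1 and n2 are known ...
    d_from_t = t * np.sqrt(1 / n1 + 1 / n2)

    # ... and t (hence p) can be recovered from d, so both summarise the same information
    t_from_d = d_from_t / np.sqrt(1 / n1 + 1 / n2)
    p_from_d = 2 * stats.t.sf(abs(t_from_d), df)

    print(f"t = {t:.3f}, p = {p:.4f}, d = {d_from_t:.3f}, p recovered from d = {p_from_d:.4f}")
    ```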

  2. Using 3D visualization and seismic attributes to improve structural and stratigraphic resolution of reservoirs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kerr, J.; Jones, G.L.

    1996-12-31

    Recent advances in hardware and software have given the interpreter and engineer new ways to view 3D seismic data and well bore information. Recent papers have also highlighted the use of various statistics and seismic attributes. By combining new 3D rendering technologies with recent trends in seismic analysis, the interpreter can improve the structural and stratigraphic resolution of hydrocarbon reservoirs. This paper gives several examples using 3D visualization to better define both the structural and stratigraphic aspects of several different structural types from around the world. Statistics, 3D visualization techniques and rapid animation are used to show complex faulting and detailed channel systems. These systems would be difficult to map using either 2D or 3D data with conventional interpretation techniques.

  4. Vocational students' learning preferences: the interpretability of ipsative data.

    PubMed

    Smith, P J

    2000-02-01

    A number of researchers have argued that ipsative data are not suitable for statistical procedures designed for normative data. Others have argued that the interpretability of such analyses of ipsative data is little affected where the number of variables and the sample size are sufficiently large. The research reported here represents a factor analysis of the scores on the Canfield Learning Styles Inventory for 1,252 students in vocational education. The results of the factor analysis of these ipsative data were examined in the context of existing theory and research on vocational students and lend support to the argument that the factor analysis of ipsative data can provide sensibly interpretable results.

  5. Airborne gamma-ray spectrometer and magnetometer survey, Durango A, Colorado. Final report Volume II A. Detail area

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1983-01-01

    This volume contains geology of the Durango A detail area, radioactive mineral occurrences in Colorado, and geophysical data interpretation. Eight appendices provide the following: stacked profiles, geologic histograms, geochemical histograms, speed and altitude histograms, geologic statistical tables, geochemical statistical tables, magnetic and ancillary profiles, and test line data.

  6. Airborne gamma-ray spectrometer and magnetometer survey, Durango B, Colorado. Final report Volume II A. Detail area

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1983-01-01

    The geology of the Durango B detail area, the radioactive mineral occurrences in Colorado and the geophysical data interpretation are included in this report. Seven appendices contain: stacked profiles, geologic histograms, geochemical histograms, speed and altitude histograms, geologic statistical tables, geochemical statistical tables, and test line data.

  7. Workplace statistical literacy for teachers: interpreting box plots

    NASA Astrophysics Data System (ADS)

    Pierce, Robyn; Chick, Helen

    2013-06-01

    As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are increasingly expected to interpret and apply complex data about student and school performance, and yet it is not clear that they always have the appropriate knowledge and experience to interpret the graphs, tables and other data that they receive. This study examined the statistical literacy demands placed on teachers, with a particular focus on box plot representations. Although box plots summarise the data in a way that makes visual comparisons possible across sets of data, this study showed that teachers do not always have the necessary fluency with the representation to describe correctly how the data are distributed in the representation. In particular, a significant number perceived the size of the regions of the box plot to be depicting frequencies rather than density, and there were misconceptions associated with outlying data that were not displayed on the plot. In addition, teachers' perceptions of box plots were found to relate to three themes: attitudes, perceived value and misconceptions.

  8. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  9. 100 Students

    ERIC Educational Resources Information Center

    Riskowski, Jody L.; Olbricht, Gayla; Wilson, Jennifer

    2010-01-01

    Statistics is the art and science of gathering, analyzing, and making conclusions from data. However, many people do not fully understand how to interpret statistical results and conclusions. Placing students in a collaborative environment involving project-based learning may enable them to overcome misconceptions of probability and enhance the…

  10. Rare earth element geochemistry of shallow carbonate outcropping strata in Saudi Arabia: Application for depositional environments prediction

    NASA Astrophysics Data System (ADS)

    Eltom, Hassan A.; Abdullatif, Osman M.; Makkawi, Mohammed H.; Eltoum, Isam-Eldin A.

    2017-03-01

    The interpretation of depositional environments provides important information to understand facies distribution and geometry. The classical approach to interpret depositional environments principally relies on the analysis of lithofacies, biofacies and stratigraphic data, among others. An alternative method, based on geochemical data (chemical element data), is advantageous because it can simply, reproducibly and efficiently interpret and refine the interpretation of the depositional environment of carbonate strata. Here we geochemically analyze and statistically model carbonate samples (n = 156) from seven sections of the Arab-D reservoir outcrop analog of central Saudi Arabia, to determine whether the elemental signatures (major, trace and rare earth elements [REEs]) can be effectively used to predict depositional environments. We find that lithofacies associations of the studied outcrop (peritidal to open marine depositional environments) possess altered REE signatures, and that this trend increases stratigraphically from bottom-to-top, which corresponds to an upward shallowing of depositional environments. The relationship between REEs and major, minor and trace elements indicates that contamination by detrital materials is the principal source of REEs, whereas redox condition, marine and diagenetic processes have minimal impact on the relative distribution of REEs in the lithofacies. In a statistical model (factor analysis and logistic regression), REEs, major and trace elements cluster together and serve as markers to differentiate between peritidal and open marine facies and to differentiate between intertidal and subtidal lithofacies within the peritidal facies. The results indicate that statistical modelling of the elemental composition of carbonate strata can be used as a quantitative method to predict depositional environments and regional paleogeography. The significance of this study lies in offering new assessments of the relationships between lithofacies and geochemical elements by using advanced statistical analysis, a method that could be used elsewhere to interpret depositional environment and refine facies models.

  11. The disagreeable behaviour of the kappa statistic.

    PubMed

    Flight, Laura; Julious, Steven A

    2015-01-01

    It is often of interest to measure the agreement between a number of raters when an outcome is nominal or ordinal. The kappa statistic is used as a measure of agreement. The statistic is highly sensitive to the distribution of the marginal totals and can produce unreliable results. Other statistics such as the proportion of concordance, maximum attainable kappa and prevalence and bias adjusted kappa should be considered to indicate how well the kappa statistic represents agreement in the data. Each kappa should be considered and interpreted based on the context of the data being analysed. Copyright © 2014 John Wiley & Sons, Ltd.
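
    A brief numerical sketch of the behaviour described: Cohen's kappa, the observed proportion of agreement and the prevalence-and-bias-adjusted kappa (PABAK) computed from a 2x2 agreement table; the counts are invented to show how skewed marginal totals depress kappa despite high raw agreement.

    ```python
    import numpy as np

    def kappa_summary(table):
        """Cohen's kappa, observed agreement and PABAK for a square agreement table."""
        table = np.asarray(table, dtype=float)
        n = table.sum()
        po = np.trace(table) / n                                   # observed agreement
        pe = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2  # chance agreement from marginals
        kappa = (po - pe) / (1 - pe)
        pabak = (table.shape[0] * po - 1) / (table.shape[0] - 1)   # 2*po - 1 for a 2x2 table
        return kappa, po, pabak

    # Two raters, binary outcome, strongly skewed marginal totals (counts are illustrative only)
    table = [[85, 5],
             [5, 5]]
    k, po, pabak = kappa_summary(table)
    print(f"observed agreement = {po:.2f}, kappa = {k:.2f}, PABAK = {pabak:.2f}")
    ```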

  12. The Australian Pharmaceutical Benefits Scheme data collection: a practical guide for researchers.

    PubMed

    Mellish, Leigh; Karanges, Emily A; Litchfield, Melisa J; Schaffer, Andrea L; Blanch, Bianca; Daniels, Benjamin J; Segrave, Alicia; Pearson, Sallie-Anne

    2015-11-02

    The Pharmaceutical Benefits Scheme (PBS) is Australia's national drug subsidy program. This paper provides a practical guide to researchers using PBS data to examine prescribed medicine use. Excerpts of the PBS data collection are available in a variety of formats. We describe the core components of four publicly available extracts (the Australian Statistics on Medicines, PBS statistics online, section 85 extract, under co-payment extract). We also detail common analytical challenges and key issues regarding the interpretation of utilisation using the PBS collection and its various extracts. Research using routinely collected data is increasing internationally. PBS data are a valuable resource for Australian pharmacoepidemiological and pharmaceutical policy research. A detailed knowledge of the PBS, the nuances of data capture, and the extracts available for research purposes are necessary to ensure robust methodology, interpretation, and translation of study findings into policy and practice.

  13. A Tutorial in Bayesian Potential Outcomes Mediation Analysis.

    PubMed

    Miočević, Milica; Gonzalez, Oscar; Valente, Matthew J; MacKinnon, David P

    2018-01-01

    Statistical mediation analysis is used to investigate intermediate variables in the relation between independent and dependent variables. Causal interpretation of mediation analyses is challenging because randomization of subjects to levels of the independent variable does not rule out the possibility of unmeasured confounders of the mediator to outcome relation. Furthermore, commonly used frequentist methods for mediation analysis compute the probability of the data given the null hypothesis, which is not the probability of a hypothesis given the data as in Bayesian analysis. Under certain assumptions, applying the potential outcomes framework to mediation analysis allows for the computation of causal effects, and statistical mediation in the Bayesian framework gives indirect effects probabilistic interpretations. This tutorial combines causal inference and Bayesian methods for mediation analysis so the indirect and direct effects have both causal and probabilistic interpretations. Steps in Bayesian causal mediation analysis are shown in the application to an empirical example.
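
    The tutorial's own approach is Bayesian estimation under the potential-outcomes framework; as a much simpler stand-in (an assumption, not the authors' method), the sketch below fits the two mediation regressions by ordinary least squares and approximates the distribution of the indirect effect a*b by Monte Carlo draws from normal approximations to the coefficient estimates, which under diffuse priors gives the interval a roughly probabilistic reading.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x = rng.normal(size=n)                       # randomised independent variable
    m = 0.5 * x + rng.normal(size=n)             # mediator
    y = 0.4 * m + 0.2 * x + rng.normal(size=n)   # outcome

    def ols(y, X):
        """Coefficient estimates and standard errors for y ~ 1 + X."""
        X = np.column_stack([np.ones(len(y)), X])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        sigma2 = resid @ resid / (len(y) - X.shape[1])
        se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
        return beta, se

    b_m, se_m = ols(m, x)                         # "a" path: x -> m   (coefficient index 1)
    b_y, se_y = ols(y, np.column_stack([m, x]))   # "b" path: m -> y given x (coefficient index 1)

    draws = 20000
    a_draws = rng.normal(b_m[1], se_m[1], draws)
    b_draws = rng.normal(b_y[1], se_y[1], draws)
    indirect = a_draws * b_draws                  # distribution of the indirect effect a*b
    lo, hi = np.percentile(indirect, [2.5, 97.5])
    print(f"indirect effect ~ {indirect.mean():.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
    ```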

  14. Comparison between deterministic and statistical wavelet estimation methods through predictive deconvolution: Seismic to well tie example from the North Sea

    NASA Astrophysics Data System (ADS)

    de Macedo, Isadora A. S.; da Silva, Carolina B.; de Figueiredo, J. J. S.; Omoboya, Bode

    2017-01-01

    Wavelet estimation and seismic-to-well tie procedures are at the core of every seismic interpretation workflow. In this paper we perform a comparative study of wavelet estimation methods for seismic-to-well tie. Two approaches to wavelet estimation are discussed: a deterministic estimation, based on both seismic and well log data, and a statistical estimation, based on predictive deconvolution and the classical assumptions of the convolutional model, which provides a minimum-phase wavelet. Our algorithms for both wavelet estimation methods introduce a semi-automatic approach to determine the optimum parameters of deterministic and statistical wavelet estimation and, further, to estimate the optimum seismic wavelet by searching for the highest correlation coefficient between the recorded trace and the synthetic trace when the time-depth relationship is accurate. Tests with numerical data yield some qualitative conclusions, obtained by comparing deterministic and statistical wavelet estimation in detail, that are likely to be useful for seismic inversion and interpretation of field data. The feasibility of this approach is verified on real seismic and well data from the Viking Graben field, North Sea, Norway. Our results also show the influence of washout zones in the well log data on the quality of the well-to-seismic tie.
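
    As a toy illustration of the semi-automatic search described (the synthetic reflectivity, the choice of a Ricker wavelet and the frequency grid are all assumptions), the code below scans candidate wavelets and keeps the one whose synthetic trace has the highest correlation coefficient with the "recorded" trace.

    ```python
    import numpy as np

    def ricker(freq, dt=0.002, length=0.128):
        """Zero-phase Ricker wavelet with central frequency `freq` (Hz)."""
        t = np.arange(-length / 2, length / 2, dt)
        a = (np.pi * freq * t) ** 2
        return (1 - 2 * a) * np.exp(-a)

    rng = np.random.default_rng(0)
    reflectivity = rng.normal(scale=0.1, size=500) * (rng.random(500) < 0.05)  # sparse spikes

    true_wavelet = ricker(30.0)
    recorded = np.convolve(reflectivity, true_wavelet, mode="same") \
               + rng.normal(scale=0.01, size=500)

    best_freq, best_corr = None, -np.inf
    for freq in np.arange(10.0, 61.0, 1.0):                 # candidate central frequencies
        synthetic = np.convolve(reflectivity, ricker(freq), mode="same")
        corr = np.corrcoef(recorded, synthetic)[0, 1]       # seismic-to-well-tie quality measure
        if corr > best_corr:
            best_freq, best_corr = freq, corr

    print(f"best central frequency = {best_freq} Hz, correlation = {best_corr:.3f}")
    ```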

  15. Application of multivariate statistical techniques in microbial ecology.

    PubMed

    Paliy, O; Shankar, V

    2016-03-01

    Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, a noticeable impact has been made in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques, including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.

  16. Exploratory Visual Analysis of Statistical Results from Microarray Experiments Comparing High and Low Grade Glioma

    PubMed Central

    Reif, David M.; Israel, Mark A.; Moore, Jason H.

    2007-01-01

    The biological interpretation of gene expression microarray results is a daunting challenge. For complex diseases such as cancer, wherein the body of published research is extensive, the incorporation of expert knowledge provides a useful analytical framework. We have previously developed the Exploratory Visual Analysis (EVA) software for exploring data analysis results in the context of annotation information about each gene, as well as biologically relevant groups of genes. We present EVA as a flexible combination of statistics and biological annotation that provides a straightforward visual interface for the interpretation of microarray analyses of gene expression in the most commonly occurring class of brain tumors, glioma. We demonstrate the utility of EVA for the biological interpretation of statistical results by analyzing publicly available gene expression profiles of two important glial tumors. The results of a statistical comparison between 21 malignant, high-grade glioblastoma multiforme (GBM) tumors and 19 indolent, low-grade pilocytic astrocytomas were analyzed using EVA. By using EVA to examine the results of a relatively simple statistical analysis, we were able to identify tumor class-specific gene expression patterns having both statistical and biological significance. Our interactive analysis highlighted the potential importance of genes involved in cell cycle progression, proliferation, signaling, adhesion, migration, motility, and structure, as well as candidate gene loci on a region of Chromosome 7 that has been implicated in glioma. Because EVA does not require statistical or computational expertise and has the flexibility to accommodate any type of statistical analysis, we anticipate EVA will prove a useful addition to the repertoire of computational methods used for microarray data analysis. EVA is available at no charge to academic users and can be found at http://www.epistasis.org. PMID:19390666

  17. Data Analysis and Statistical Methods for the Assessment and Interpretation of Geochronologic Data

    NASA Astrophysics Data System (ADS)

    Reno, B. L.; Brown, M.; Piccoli, P. M.

    2007-12-01

    Ages are traditionally reported as a weighted mean with an uncertainty based on least squares analysis of analytical error on individual dates. This method does not take into account geological uncertainties, and cannot accommodate asymmetries in the data. In most instances, this method will understate uncertainty on a given age, which may lead to over-interpretation of age data. Geologic uncertainty is difficult to quantify, but is typically greater than analytical uncertainty. These factors make traditional statistical approaches inadequate to fully evaluate geochronologic data. We propose a protocol to assess populations within multi-event datasets and to calculate age and uncertainty from each population of dates interpreted to represent a single geologic event using robust and resistant statistical methods. To assess whether populations thought to represent different events are statistically separate, exploratory data analysis is undertaken using a box plot, where the range of the data is represented by a 'box' of length given by the interquartile range, divided at the median of the data, with 'whiskers' that extend to the furthest datapoint that lies within 1.5 times the interquartile range beyond the box. If the boxes representing the populations do not overlap, they are interpreted to represent statistically different sets of dates. Ages are calculated from statistically distinct populations using a robust tool such as the tanh method of Kelsey et al. (2003, CMP, 146, 326-340), which is insensitive to any assumptions about the underlying probability distribution from which the data are drawn. Therefore, this method takes into account the full range of data, and is not drastically affected by outliers. The interquartile range of each population of dates gives a first pass at expressing uncertainty, which accommodates asymmetry in the dataset; outliers have a minor effect on the uncertainty. To better quantify the uncertainty, a resistant tool that is insensitive to local misbehavior of data is preferred, such as the normalized median absolute deviations proposed by Powell et al. (2002, Chem Geol, 185, 191-204). We illustrate the method using a dataset of 152 monazite dates determined using EPMA chemical data from a single sample from the Neoproterozoic Brasília Belt, Brazil. Results are compared with ages and uncertainties calculated using traditional methods to demonstrate the differences. The dataset was manually culled into three populations representing discrete compositional domains within chemically-zoned monazite grains. The weighted mean ages and least squares uncertainties for these populations are 633±6 (2σ) Ma for a core domain, 614±5 (2σ) Ma for an intermediate domain and 595±6 (2σ) Ma for a rim domain. Probability distribution plots indicate asymmetric distributions of all populations, which cannot be accounted for with traditional statistical tools. These three domains record distinct ages outside the interquartile range for each population of dates, with the core domain lying in the subrange 642-624 Ma, the intermediate domain 617-609 Ma and the rim domain 606-589 Ma. The tanh estimator yields ages of 631±7 (2σ) for the core domain, 616±7 (2σ) for the intermediate domain and 601±8 (2σ) for the rim domain.
Whereas the uncertainties derived using a resistant statistical tool are larger than those derived from traditional statistical tools, the method yields more realistic uncertainties that better address the spread in the dataset and account for asymmetry in the data.
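
    The tanh estimator of Kelsey et al. is not reproduced here; as a hedged sketch of the robust and resistant summaries the abstract describes, the code below computes the median, interquartile range, 1.5-IQR whisker limits and a normalised median absolute deviation for one population of dates (the dates themselves are invented).

    ```python
    import numpy as np

    def robust_summary(dates_ma):
        """Median, IQR, 1.5*IQR whisker limits and nMAD for one population of dates."""
        d = np.asarray(dates_ma, dtype=float)
        q1, med, q3 = np.percentile(d, [25, 50, 75])
        iqr = q3 - q1
        whiskers = (d[d >= q1 - 1.5 * iqr].min(), d[d <= q3 + 1.5 * iqr].max())
        nmad = 1.4826 * np.median(np.abs(d - med))   # consistent with sigma for normal data
        return med, iqr, whiskers, nmad

    # Invented monazite dates (Ma) for one compositional domain, with mild asymmetry
    rng = np.random.default_rng(0)
    core_domain = 633 + rng.normal(0, 3, 45) + rng.exponential(2, 45)

    med, iqr, whiskers, nmad = robust_summary(core_domain)
    print(f"median = {med:.1f} Ma, IQR = {iqr:.1f} Myr, whiskers = {whiskers}, nMAD = {nmad:.1f} Myr")
    ```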

  18. Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

    PubMed Central

    Hallgren, Kevin A.

    2012-01-01

    Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly-used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR. PMID:22833776
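
    The tutorial provides SPSS and R syntax; purely as an assumed Python counterpart (not taken from the paper), the sketch below computes a two-way random-effects, absolute-agreement, single-rater intra-class correlation, ICC(2,1), from its ANOVA mean squares.

    ```python
    import numpy as np

    def icc_2_1(ratings):
        """ICC(2,1): two-way random effects, absolute agreement, single rater.

        `ratings` is an (n subjects x k raters) array with no missing values.
        """
        Y = np.asarray(ratings, dtype=float)
        n, k = Y.shape
        grand = Y.mean()
        ms_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # between subjects
        ms_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # between raters
        ss_err = ((Y - Y.mean(axis=1, keepdims=True)
                     - Y.mean(axis=0, keepdims=True) + grand) ** 2).sum()
        ms_err = ss_err / ((n - 1) * (k - 1))
        return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

    # Three raters scoring 12 subjects (illustrative ratings only)
    rng = np.random.default_rng(0)
    truth = rng.normal(50, 10, size=(12, 1))
    ratings = truth + rng.normal(0, 3, size=(12, 3)) + np.array([0.0, 1.0, -1.0])  # rater bias
    print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")
    ```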

  19. Novice Interpretations of Progress Monitoring Graphs: Extreme Values and Graphical Aids

    ERIC Educational Resources Information Center

    Newell, Kirsten W.; Christ, Theodore J.

    2017-01-01

    Curriculum-Based Measurement of Reading (CBM-R) is frequently used to monitor instructional effects and evaluate response to instruction. Educators often view the data graphically on a time-series graph that might include a variety of statistical and visual aids, which are intended to facilitate the interpretation. This study evaluated the effects…

  20. Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark studies.

    PubMed

    Boulesteix, Anne-Laure; Wilson, Rory; Hapfelmeier, Alexander

    2017-09-09

    The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments. We suggest that benchmark studies, a method of assessing statistical methods using real-world datasets, might benefit from adopting (some) concepts from evidence-based medicine, towards the goal of more evidence-based statistical research.

  1. Mid-infrared interferometry of AGNs: A statistical view into the dusty nuclear environment of the Seyfert Galaxies.

    NASA Astrophysics Data System (ADS)

    Lopez-Gonzaga, N.

    2015-09-01

    The high resolution achieved by the instrument MIDI at the VLTI has made it possible to obtain more detailed information about the geometry and structure of the nuclear mid-infrared emission of AGNs, but due to the lack of real images, the interpretation of the results is not an easy task. To profit more from the high-resolution data, we developed a statistical tool that allows these data to be interpreted using clumpy torus models. A statistical approach is needed to overcome effects such as the randomness in the position of the clouds and the uncertainty of the true position angle on the sky. Our results, obtained by studying the mid-infrared emission at the highest resolution currently available, suggest that the dusty environment of Type I objects is formed by a smaller number of clouds than that of Type II objects.

  2. Structural interpretation of seismic data and inherent uncertainties

    NASA Astrophysics Data System (ADS)

    Bond, Clare

    2013-04-01

    Geoscience is perhaps unique in its reliance on incomplete datasets and on building knowledge from their interpretation. This interpretative basis for the science is fundamental at all levels, from the creation of a geological map to the interpretation of remotely sensed data. To better teach and understand the uncertainties involved in dealing with incomplete data, we need to understand the strategies individual practitioners deploy that make them effective interpreters. The nature of interpretation is such that the interpreter needs to use their cognitive ability in the analysis of the data to propose a sensible solution in their final output that is consistent not only with the original data but also with other knowledge and understanding. In a series of experiments, Bond et al. (2007, 2008, 2011, 2012) investigated the strategies and pitfalls of expert and non-expert interpretation of seismic images. These studies focused on large numbers of participants to provide a statistically sound basis for analysis of the results. The outcome of these experiments showed that a wide variety of conceptual models were applied to single seismic datasets, highlighting not only spatial variations in fault placements but also whether interpreters thought faults existed at all, or agreed on their sense of movement. Further, statistical analysis suggests that the strategies an interpreter employs are more important than expert knowledge per se in developing successful interpretations; experts are successful because of their application of these techniques. In a new set of experiments, a small number of experts are studied in detail to determine how they use their cognitive and reasoning skills in the interpretation of 2D seismic profiles. Live video and practitioner commentary were used to track the evolving interpretation and to gain insight into their decision processes. The outputs of the study allow us to create an educational resource of expert interpretation through online video footage and commentary, with associated further interpretation and analysis of the techniques and strategies employed. This resource will be of use to undergraduate, postgraduate, industry and academic professionals seeking to improve their seismic interpretation skills, develop reasoning strategies for dealing with incomplete datasets, and assess the uncertainty in these interpretations. Bond, C. E. et al. (2012). 'What makes an expert effective at interpreting seismic images?' Geology, 40, 75-78. Bond, C. E. et al. (2011). 'When there isn't a right answer: interpretation and reasoning, key skills for 21st century geoscience'. International Journal of Science Education, 33, 629-652. Bond, C. E. et al. (2008). 'Structural models: Optimizing risk analysis by understanding conceptual uncertainty'. First Break, 26, 65-71. Bond, C. E. et al. (2007). 'What do you think this is? "Conceptual uncertainty" in geoscience interpretation'. GSA Today, 17, 4-10.

  3. Collegiate Enrollments in the U.S., 1980-81: Statistics, Interpretations, and Trends. Combined 4-Year Institutions and 2-Year Institutions Edition.

    ERIC Educational Resources Information Center

    Mickler, J. Ernest

    Data on 1980-81 enrollments in U.S. four-year and two-year higher education institutions are presented and interpreted. Data from 1,818 four-year institutions in the United States, Puerto Rico, and the U.S. Territories indicated a total enrollment for fall 1980 of over 7 million, of which over 5 million were full-time and slightly over 2 million…

  4. Sources of Safety Data and Statistical Strategies for Design and Analysis: Clinical Trials.

    PubMed

    Zink, Richard C; Marchenko, Olga; Sanchez-Kam, Matilde; Ma, Haijun; Jiang, Qi

    2018-03-01

    There has been an increased emphasis on the proactive and comprehensive evaluation of safety endpoints to ensure patient well-being throughout the medical product life cycle. In fact, depending on the severity of the underlying disease, it is important to plan for a comprehensive safety evaluation at the start of any development program. Statisticians should be intimately involved in this process and contribute their expertise to study design, safety data collection, analysis, reporting (including data visualization), and interpretation. In this manuscript, we review the challenges associated with the analysis of safety endpoints and describe the safety data that are available to influence the design and analysis of premarket clinical trials. We share our recommendations for the statistical and graphical methodologies necessary to appropriately analyze, report, and interpret safety outcomes, and we discuss the advantages and disadvantages of safety data obtained from clinical trials compared to other sources. Clinical trials are an important source of safety data that contribute to the totality of safety information available to generate evidence for regulators, sponsors, payers, physicians, and patients. This work is a result of the efforts of the American Statistical Association Biopharmaceutical Section Safety Working Group.

  5. Statistically Characterizing Intra- and Inter-Individual Variability in Children with Developmental Coordination Disorder

    ERIC Educational Resources Information Center

    King, Bradley R.; Harring, Jeffrey R.; Oliveira, Marcio A.; Clark, Jane E.

    2011-01-01

    Previous research investigating children with Developmental Coordination Disorder (DCD) has consistently reported increased intra- and inter-individual variability during motor skill performance. Statistically characterizing this variability is not only critical for the analysis and interpretation of behavioral data, but also may facilitate our…

  6. Basic Statistical Concepts and Methods for Earth Scientists

    USGS Publications Warehouse

    Olea, Ricardo A.

    2008-01-01

    INTRODUCTION: Statistics is the science of collecting, analyzing, interpreting, modeling, and displaying masses of numerical data primarily for the characterization and understanding of incompletely known systems. Over the years, these objectives have led to a fair amount of analytical work to achieve, substantiate, and guide descriptions and inferences.

  7. CAN'T MISS--conquer any number task by making important statistics simple. Part 2. Probability, populations, samples, and normal distributions.

    PubMed

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.
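
    As a minimal worked example of the inferential step this series builds toward (the sample values and the 95% level are arbitrary), the code below computes a t-based confidence interval for a population mean from a single sample.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=120, scale=15, size=40)       # e.g. 40 measured values from a population

    mean = sample.mean()
    sem = sample.std(ddof=1) / np.sqrt(len(sample))       # standard error of the mean
    t_crit = stats.t.ppf(0.975, df=len(sample) - 1)       # two-sided 95% level
    ci = (mean - t_crit * sem, mean + t_crit * sem)

    print(f"sample mean = {mean:.1f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
    ```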

  8. Interrelationships Between Receiver/Relative Operating Characteristics Display, Binomial, Logit, and Bayes' Rule Probability of Detection Methodologies

    NASA Technical Reports Server (NTRS)

    Generazio, Edward R.

    2014-01-01

    Unknown risks are introduced into failure critical systems when probability of detection (POD) capabilities are accepted without a complete understanding of the statistical method applied and the interpretation of the statistical results. The presence of this risk in the nondestructive evaluation (NDE) community is revealed in common statements about POD. These statements are often interpreted in a variety of ways and therefore, the very existence of the statements identifies the need for a more comprehensive understanding of POD methodologies. Statistical methodologies have data requirements to be met, procedures to be followed, and requirements for validation or demonstration of adequacy of the POD estimates. Risks are further enhanced due to the wide range of statistical methodologies used for determining the POD capability. Receiver/Relative Operating Characteristics (ROC) Display, simple binomial, logistic regression, and Bayes' rule POD methodologies are widely used in determining POD capability. This work focuses on Hit-Miss data to reveal the framework of the interrelationships between Receiver/Relative Operating Characteristics Display, simple binomial, logistic regression, and Bayes' Rule methodologies as they are applied to POD. Knowledge of these interrelationships leads to an intuitive and global understanding of the statistical data, procedural and validation requirements for establishing credible POD estimates.
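
    A hedged sketch of one of the methodologies listed: a logistic-regression POD curve fitted to hit/miss data by maximum likelihood, reporting the flaw size at 90% POD (a90). The flaw sizes, the underlying curve and the optimiser choice are assumptions; confidence bounds (e.g. a90/95) would require additional steps not shown.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    size = rng.uniform(0.5, 5.0, 200)                      # flaw sizes (arbitrary units)
    true_pod = 1 / (1 + np.exp(-(2.0 * (size - 2.0))))     # assumed underlying POD curve
    hit = rng.random(200) < true_pod                       # binary hit/miss outcomes

    def neg_log_lik(beta):
        """Negative log-likelihood of the logistic POD model POD(a) = logit^-1(b0 + b1*a)."""
        eta = beta[0] + beta[1] * size
        p = 1 / (1 + np.exp(-eta))
        eps = 1e-12
        return -np.sum(hit * np.log(p + eps) + (~hit) * np.log(1 - p + eps))

    fit = minimize(neg_log_lik, x0=np.array([0.0, 1.0]), method="BFGS")
    b0, b1 = fit.x

    a90 = (np.log(0.9 / 0.1) - b0) / b1                    # flaw size with estimated 90% POD
    print(f"logit POD fit: intercept = {b0:.2f}, slope = {b1:.2f}, a90 = {a90:.2f}")
    ```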

  9. A Closer Look at Data Independence: Comment on “Lies, Damned Lies, and Statistics (in Geology)”

    NASA Astrophysics Data System (ADS)

    Kravtsov, Sergey; Saunders, Rolando Olivas

    2011-02-01

    In his Forum (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009), P. Vermeesch suggests that statistical tests are not fit to interpret long data records. He asserts that for large enough data sets any true null hypothesis will always be rejected. This is certainly not the case! Here we revisit this author's example of weekly distribution of earthquakes and show that statistical results support the commonsense expectation that seismic activity does not depend on weekday (see the online supplement to this Eos issue for details (http://www.agu.org/eos_elec/)).
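
    To make the statistical point concrete (the counts below are invented, not the earthquake catalogue under discussion), a chi-square goodness-of-fit test of the null hypothesis that events are uniformly distributed over the seven weekdays:

    ```python
    from scipy.stats import chisquare

    # Hypothetical weekly event counts (Mon..Sun); roughly uniform apart from sampling noise
    observed = [1412, 1389, 1447, 1401, 1430, 1398, 1423]

    stat, p = chisquare(observed)        # null: equal expected counts on every weekday
    print(f"chi-square = {stat:.2f}, p = {p:.3f}")   # large p: no evidence of a weekday effect
    ```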

  10. ArraySolver: An Algorithm for Colour-Coded Graphical Display and Wilcoxon Signed-Rank Statistics for Comparing Microarray Gene Expression Data

    PubMed Central

    2004-01-01

    The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-group comparison of microarray data is still lacking, and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to another. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann–Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-group comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets, whereas the former program appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform. PMID:18629036
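
    A small sketch of the test the tool is built around (the paired expression values are synthetic, and SciPy is used here rather than the Excel-based tool described):

    ```python
    import numpy as np
    from scipy.stats import wilcoxon

    rng = np.random.default_rng(0)
    group_a = rng.lognormal(mean=5.0, sigma=0.4, size=20)              # paired expression, condition A
    group_b = group_a * rng.lognormal(mean=0.15, sigma=0.1, size=20)   # condition B, mild up-shift

    stat, p = wilcoxon(group_a, group_b)      # paired, non-parametric two-group comparison
    print(f"Wilcoxon signed-rank: W = {stat:.1f}, p = {p:.4f}")
    ```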

  12. Critical Analysis of Primary Literature in a Master’s-Level Class: Effects on Self-Efficacy and Science-Process Skills

    PubMed Central

    Abdullah, Christopher; Parris, Julian; Lie, Richard; Guzdar, Amy; Tour, Ella

    2015-01-01

    The ability to think analytically and creatively is crucial for success in the modern workforce, particularly for graduate students, who often aim to become physicians or researchers. Analysis of the primary literature provides an excellent opportunity to practice these skills. We describe a course that includes a structured analysis of four research papers from diverse fields of biology and group exercises in proposing experiments that would follow up on these papers. To facilitate a critical approach to primary literature, we included a paper with questionable data interpretation and two papers investigating the same biological question yet reaching opposite conclusions. We report a significant increase in students’ self-efficacy in analyzing data from research papers, evaluating authors’ conclusions, and designing experiments. Using our science-process skills test, we observe a statistically significant increase in students’ ability to propose an experiment that matches the goal of investigation. We also detect gains in interpretation of controls and quantitative analysis of data. No statistically significant changes were observed in questions that tested the skills of interpretation, inference, and evaluation. PMID:26250564

  13. Can patients interpret health information? An assessment of the medical data interpretation test.

    PubMed

    Schwartz, Lisa M; Woloshin, Steven; Welch, H Gilbert

    2005-01-01

    To establish the reliability and validity of an 18-item test of patients' medical data interpretation skills. Survey with retest after 2 weeks. Subjects: 178 people recruited from advertisements in local newspapers, an outpatient clinic, and a hospital open house. The percentage of correct answers to individual items ranged from 20% to 87%, and medical data interpretation test scores (on a 0-100 scale) were normally distributed (median 61.1, mean 61.0, range 6-94). Reliability was good (test-retest correlation=0.67, Cronbach's alpha=0.71). Construct validity was supported in several ways. Higher scores were found among people with the highest versus lowest numeracy (71 v. 36, P<0.001), highest quantitative literacy (65 v. 28, P<0.001), and highest education (69 v. 42, P=0.004). Scores for 15 physician experts who also completed the survey were significantly higher than those of participants with other postgraduate degrees (mean score 89 v. 69, P<0.001). The medical data interpretation test is a reliable and valid measure of the ability to interpret medical statistics.

  14. Collegiate Enrollments in the U.S., 1979-80. Statistics, Interpretations, and Trends in 4-Year and Related Institutions.

    ERIC Educational Resources Information Center

    Mickler, J. Ernest

    This 60th annual report on collegiate enrollments in the United States is based on data received from 1,635 four-year institutions in the U.S., Puerto Rico, and the U.S. Territories. General notes, survey methodology notes, and a summary of findings are presented. Detailed statistical charts present institutional data on men and women students and…

  15. Biostatistical analysis of quantitative immunofluorescence microscopy images.

    PubMed

    Giles, C; Albrecht, M A; Lam, V; Takechi, R; Mamo, J C

    2016-12-01

    Semiquantitative immunofluorescence microscopy has become a key methodology in biomedical research. Typical statistical workflows are considered in the context of avoiding pseudo-replication and marginalising experimental error. However, immunofluorescence microscopy naturally generates hierarchically structured data that can be leveraged to improve statistical power and enrich biological interpretation. Herein, we describe a robust distribution-fitting procedure and compare several statistical tests, outlining their potential advantages and disadvantages in the context of biological interpretation. Further, we describe tractable procedures for power analysis that incorporate the underlying distribution, sample size and number of images captured per sample. The procedures outlined have significant potential for increasing understanding of biological processes and decreasing both ethical and financial burdens through experimental optimization. © 2016 The Authors Journal of Microscopy © 2016 Royal Microscopical Society.
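
    One way to exploit the hierarchical structure described here (several images captured per sample) is a mixed-effects model with a random intercept per sample, rather than averaging images or treating them as independent replicates. The following is a minimal sketch with statsmodels on simulated data; the column names (intensity, group, sample_id) and all values are hypothetical, not taken from the paper.

      # Sketch of a mixed-effects analysis for hierarchically structured
      # immunofluorescence data: images nested within samples. Data are simulated.
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      n_samples, n_images = 10, 8
      df = pd.DataFrame({
          "sample_id": np.repeat(np.arange(n_samples), n_images),
          "group": np.repeat(["control"] * 5 + ["treated"] * 5, n_images),
      })
      # Simulate a sample-level random effect plus image-level noise.
      sample_effect = rng.normal(0, 0.5, n_samples)[df["sample_id"]]
      df["intensity"] = (10 + 1.5 * (df["group"] == "treated")
                         + sample_effect + rng.normal(0, 1.0, len(df)))

      # A random intercept per sample avoids pseudo-replication from multiple images.
      model = smf.mixedlm("intensity ~ group", df, groups=df["sample_id"])
      result = model.fit()
      print(result.summary())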

  16. Method Designed to Respect Molecular Heterogeneity Can Profoundly Correct Present Data Interpretations for Genome-Wide Expression Analysis

    PubMed Central

    Chen, Chih-Hao; Hsu, Chueh-Lin; Huang, Shih-Hao; Chen, Shih-Yuan; Hung, Yi-Lin; Chen, Hsiao-Rong; Wu, Yu-Chung

    2015-01-01

    Although genome-wide expression analysis has become a routine tool for gaining insight into molecular mechanisms, extraction of information remains a major challenge. It has been unclear why standard statistical methods, such as the t-test and ANOVA, often lead to low levels of reproducibility, how likely applying fold-change cutoffs to enhance reproducibility is to miss key signals, and how adversely using such methods has affected data interpretations. We broadly examined expression data to investigate the reproducibility problem and discovered that molecular heterogeneity, a biological property of genetically different samples, has been improperly handled by the statistical methods. Here we give a mathematical description of the discovery and report the development of a statistical method, named HTA, for better handling of molecular heterogeneity. We broadly demonstrate the improved sensitivity and specificity of HTA over the conventional methods and show that using fold-change cutoffs has lost much information. We illustrate the particular usefulness of HTA for heterogeneous diseases, by applying it to existing data sets of schizophrenia, bipolar disorder and Parkinson's disease, and show it can abundantly and reproducibly uncover disease signatures not previously detectable. Based on 156 biological data sets, we estimate that the methodological issue has affected over 96% of expression studies and that HTA can profoundly correct 86% of the affected data interpretations. The methodological advancement can better facilitate systems understandings of biological processes, render biological inferences that are more reliable than they have hitherto been and engender translational medical applications, such as identifying diagnostic biomarkers and drug prediction, which are more robust. PMID:25793610

  17. Strategies for Dealing with Missing Accelerometer Data.

    PubMed

    Stephens, Samantha; Beyene, Joseph; Tremblay, Mark S; Faulkner, Guy; Pullnayegum, Eleanor; Feldman, Brian M

    2018-05-01

    Missing data is a universal research problem that can affect studies examining the relationship between physical activity measured with accelerometers and health outcomes. Statistical techniques are available to deal with missing data; however, the available techniques have not been synthesized. A scoping review was conducted to summarize the advantages and disadvantages of identified methods of dealing with missing data from accelerometers. Missing data poses a threat to the validity and interpretation of trials using physical activity data from accelerometry. Multiple imputation is recommended to deal with missing data and improve the validity and interpretation of studies using accelerometry. Copyright © 2018 Elsevier Inc. All rights reserved.
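
    As a hedged sketch of the general idea (not the specific procedures reviewed), missing daily activity summaries can be filled in with a chained-equations style imputer from scikit-learn; the data below are invented. A full multiple-imputation analysis would draw several completed data sets and pool the estimates, for example by Rubin's rules.

      # Sketch: impute missing daily accelerometer summaries with a
      # chained-equations style imputer. Data are hypothetical.
      import numpy as np
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer

      # Rows = participants, columns = mean daily activity counts for 7 days.
      rng = np.random.default_rng(1)
      activity = rng.normal(5000, 800, size=(20, 7))
      activity[rng.random(activity.shape) < 0.15] = np.nan   # 15% missing at random

      # Draw several completed data sets (multiple imputation) and pool estimates.
      completed = [IterativeImputer(sample_posterior=True, random_state=m)
                   .fit_transform(activity) for m in range(5)]
      weekly_means = [c.mean(axis=1).mean() for c in completed]
      print(f"pooled mean daily activity across imputations: {np.mean(weekly_means):.0f}")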

  18. Variance estimates and confidence intervals for the Kappa measure of classification accuracy

    Treesearch

    M. A. Kalkhan; R. M. Reich; R. L. Czaplewski

    1997-01-01

    The Kappa statistic is frequently used to characterize the results of an accuracy assessment used to evaluate land use and land cover classifications obtained by remotely sensed data. This statistic allows comparisons of alternative sampling designs, classification algorithms, photo-interpreters, and so forth. In order to make these comparisons, it is...
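
    For orientation, the sketch below computes Cohen's kappa for a hypothetical three-class error matrix, together with an approximate large-sample standard error and confidence interval using the common approximation se ≈ sqrt(p_o(1-p_o)/(n(1-p_e)^2)); the variance estimators discussed in the report are more refined, and the matrix is invented.

      # Sketch: Cohen's kappa with an approximate large-sample confidence interval
      # for a hypothetical 3-class land-cover error matrix.
      import numpy as np

      confusion = np.array([[45,  4,  1],
                            [ 6, 38,  6],
                            [ 2,  5, 43]], dtype=float)   # rows = map, cols = reference
      n = confusion.sum()
      po = np.trace(confusion) / n                                       # observed agreement
      pe = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / n**2  # chance agreement
      kappa = (po - pe) / (1 - pe)

      se = np.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))   # simple approximation
      lo, hi = kappa - 1.96 * se, kappa + 1.96 * se
      print(f"kappa = {kappa:.3f}, approx. 95% CI = ({lo:.3f}, {hi:.3f})")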

  19. The Use of Computer-Assisted Identification of ARIMA Time-Series.

    ERIC Educational Resources Information Center

    Brown, Roger L.

    This study was conducted to determine the effects of using various levels of tutorial statistical software for the tentative identification of nonseasonal ARIMA models, a statistical technique proposed by Box and Jenkins for the interpretation of time-series data. The Box-Jenkins approach is an iterative process encompassing several stages of…
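
    The tentative-identification step that the study focuses on amounts to inspecting the sample ACF and PACF of a (possibly differenced) series before fitting. A minimal sketch with statsmodels on a simulated AR(1) series, not connected to the study's own materials:

      # Sketch: tentative ARIMA identification on a simulated AR(1) series by
      # inspecting the sample ACF/PACF, then fitting the suggested model.
      import numpy as np
      from statsmodels.tsa.stattools import acf, pacf
      from statsmodels.tsa.arima.model import ARIMA

      rng = np.random.default_rng(2)
      y = np.zeros(300)
      for t in range(1, 300):                 # AR(1) with phi = 0.7
          y[t] = 0.7 * y[t - 1] + rng.normal()

      print("ACF :", np.round(acf(y, nlags=5), 2))    # tails off gradually
      print("PACF:", np.round(pacf(y, nlags=5), 2))   # cuts off after lag 1 -> AR(1)

      fit = ARIMA(y, order=(1, 0, 0)).fit()
      print(fit.params)                               # estimated AR coefficient near 0.7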

  20. Conference Report on Youth Unemployment: Its Measurements and Meaning.

    ERIC Educational Resources Information Center

    Employment and Training Administration (DOL), Washington, DC.

    Thirteen papers presented at a conference on employment statistics and youth are contained in this report. Reviewed are the problems of gathering, interpreting, and applying employment and unemployment data relating to youth. The titles of the papers are as follows: "Counting Youth: A Comparison of Youth Labor Force Statistics in the Current…

  1. Assistive Technologies for Second-Year Statistics Students Who Are Blind

    ERIC Educational Resources Information Center

    Erhardt, Robert J.; Shuman, Michael P.

    2015-01-01

    At Wake Forest University, a student who is blind enrolled in a second course in statistics. The course covered simple and multiple regression, model diagnostics, model selection, data visualization, and elementary logistic regression. These topics required that the student both interpret and produce three sets of materials: mathematical writing,…

  2. Statistical Challenges in "Big Data" Human Neuroimaging.

    PubMed

    Smith, Stephen M; Nichols, Thomas E

    2018-01-17

    Smith and Nichols discuss "big data" human neuroimaging studies, with very large subject numbers and amounts of data. These studies provide great opportunities for making new discoveries about the brain but raise many new analytical challenges and interpretational risks. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. [Globalization and infectious diseases in Mexico's indigenous population].

    PubMed

    Castro, Roberto; Erviti, Joaquina; Leyva, René

    2007-01-01

    This paper discusses the health status of indigenous populations in Mexico. The first section characterizes the concept of globalization and its links to the population's health. Based on available statistical data, the second section documents the current indigenous populations' health status in the country. The article then argues that the presupposition of equity, crucial to globalization theory, does not apply to this case. Using the Mexican National Health Survey (2000), the third section further analyzes the health status of indigenous populations and identifies important inconsistencies in the data. The discussion section contends that these inconsistencies derive from the fact that such health surveys fail to contemplate the cultural specificities of indigenous peoples, thus leading to erroneous interpretations of the data. The article concludes that statistics on indigenous peoples' health must be interpreted with extreme caution and always with the support of social science theories and research methods.

  4. How Statisticians Speak Risk

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Redus, K.S.

    2007-07-01

    The foundation of statistics deals with (a) how to measure and collect data and (b) how to identify models using estimates of statistical parameters derived from the data. Risk is a term used by the statistical community and those that employ statistics to express the results of a statistically based study. Statistical risk is represented as a probability that, for example, a statistical model is sufficient to describe a data set; but risk is also interpreted as a measure of worth of one alternative when compared to another. The common thread of any risk-based problem is the combination of (a) the chance an event will occur with (b) the value of the event. This paper presents an introduction to, and some examples of, statistical risk-based decision making from a quantitative, visual, and linguistic sense. This should help in understanding areas of radioactive waste management that can be suitably expressed using statistical risk and vice-versa. (authors)

  5. Statistical Knowledge and the Over-Interpretation of Student Evaluations of Teaching

    ERIC Educational Resources Information Center

    Boysen, Guy A.

    2017-01-01

    Research shows that teachers interpret small differences in student evaluations of teaching as meaningful even when available statistical information indicates that the differences are not reliable. The current research explored the effect of statistical training on college teachers' tendency to over-interpret student evaluation differences. A…

  6. Knowledge-base for interpretation of cerebrospinal fluid data patterns. Essentials in neurology and psychiatry.

    PubMed

    Reiber, Hansotto

    2016-06-01

    The physiological and biophysical knowledge base for the interpretation of cerebrospinal fluid (CSF) data and reference ranges is essential for the clinical pathologist and neurochemist. With the popular description of the CSF flow-dependent barrier function, the dynamics and concentration gradients of blood-derived, brain-derived and leptomeningeal proteins in CSF, and the specificity-independent functions of B-lymphocytes in the brain, the neurologist, psychiatrist, neurosurgeon and neuropharmacologist may also find essentials for diagnosis, research or the development of therapies. This review may help to replace outdated ideas such as "leakage" models of the barriers, linear immunoglobulin index interpretations or CSF electrophoresis. Calculations, interpretations and analytical pitfalls are described for albumin quotients, quantitation of immunoglobulin synthesis in Reibergrams, oligoclonal IgG, IgM analysis, the polyspecific (MRZ) antibody reaction, the statistical treatment of CSF data and general quality assessment in the CSF laboratory. The diagnostic relevance is documented in an accompanying review.

  7. Statistics Translated: A Step-by-Step Guide to Analyzing and Interpreting Data

    ERIC Educational Resources Information Center

    Terrell, Steven R.

    2012-01-01

    Written in a humorous and encouraging style, this text shows how the most common statistical tools can be used to answer interesting real-world questions, presented as mysteries to be solved. Engaging research examples lead the reader through a series of six steps, from identifying a researchable problem to stating a hypothesis, identifying…

  8. A Framework for Assessing High School Students' Statistical Reasoning.

    PubMed

    Chan, Shiau Wei; Ismail, Zaleha; Sumintono, Bambang

    2016-01-01

    Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students' statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students' statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework's cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments.

  9. A Framework for Assessing High School Students' Statistical Reasoning

    PubMed Central

    2016-01-01

    Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students’ statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students’ statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework’s cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments. PMID:27812091

  10. Deconstructing multivariate decoding for the study of brain function.

    PubMed

    Hebart, Martin N; Baker, Chris I

    2017-08-04

    Multivariate decoding methods were developed originally as tools to enable accurate predictions in real-world applications. The realization that these methods can also be employed to study brain function has led to their widespread adoption in the neurosciences. However, prior to the rise of multivariate decoding, the study of brain function was firmly embedded in a statistical philosophy grounded on univariate methods of data analysis. In this way, multivariate decoding for brain interpretation grew out of two established frameworks: multivariate decoding for predictions in real-world applications, and classical univariate analysis based on the study and interpretation of brain activation. We argue that this led to two confusions, one reflecting a mixture of multivariate decoding for prediction or interpretation, and the other a mixture of the conceptual and statistical philosophies underlying multivariate decoding and classical univariate analysis. Here we attempt to systematically disambiguate multivariate decoding for the study of brain function from the frameworks it grew out of. After elaborating these confusions and their consequences, we describe six, often unappreciated, differences between classical univariate analysis and multivariate decoding. We then focus on how the common interpretation of what is signal and noise changes in multivariate decoding. Finally, we use four examples to illustrate where these confusions may impact the interpretation of neuroimaging data. We conclude with a discussion of potential strategies to help resolve these confusions in interpreting multivariate decoding results, including the potential departure from multivariate decoding methods for the study of brain function. Copyright © 2017. Published by Elsevier Inc.

  11. Statistical geometric affinity in human brain electric activity

    NASA Astrophysics Data System (ADS)

    Chornet-Lurbe, A.; Oteo, J. A.; Ros, J.

    2007-05-01

    The representation of the human electroencephalogram (EEG) records by neurophysiologists demands standardized time-amplitude scales for their correct conventional interpretation. In a suite of graphical experiments involving scaling affine transformations we have been able to convert electroencephalogram samples corresponding to any particular sleep phase and relaxed wakefulness into each other. We propound a statistical explanation for that finding in terms of data collapse. As a sequel, we determine characteristic time and amplitude scales and outline a possible physical interpretation. An analysis for characteristic times based on lacunarity is also carried out as well as a study of the synchrony between left and right EEG channels.

  12. Neuronal Correlation Parameter and the Idea of Thermodynamic Entropy of an N-Body Gravitationally Bounded System.

    PubMed

    Haranas, Ioannis; Gkigkitzis, Ioannis; Kotsireas, Ilias; Austerlitz, Carlos

    2017-01-01

    Understanding how the brain encodes information and performs computation requires statistical and functional analysis. Given the complexity of the human brain, simple methods that facilitate the interpretation of statistical correlations among different brain regions can be very useful. In this report we introduce a numerical correlation measure that may serve the interpretation of correlational neuronal data and may assist in the evaluation of different brain states. The description of the dynamical brain system through a global numerical measure may indicate the presence of an action principle, which may facilitate the application of physics principles to the study of the human brain and cognition.

  13. Linear regression analysis: part 14 of a series on evaluation of scientific publications.

    PubMed

    Schneider, Astrid; Hommel, Gerhard; Blettner, Maria

    2010-11-01

    Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
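
    As a minimal illustration of the kind of analysis the article discusses (not its own worked example), a multivariable linear regression can be fitted with statsmodels on simulated data; the fitted coefficients are read as the expected change in the outcome per unit change in each predictor, holding the others fixed. All variable names and values below are invented.

      # Sketch: multivariable linear regression on simulated data, with the usual
      # interpretation of coefficients, confidence intervals and p-values.
      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(3)
      n = 200
      df = pd.DataFrame({
          "age": rng.uniform(30, 70, n),
          "bmi": rng.normal(26, 4, n),
      })
      # Simulated outcome: systolic blood pressure rising with age and BMI.
      df["sbp"] = 90 + 0.6 * df["age"] + 0.8 * df["bmi"] + rng.normal(0, 8, n)

      fit = smf.ols("sbp ~ age + bmi", data=df).fit()
      print(fit.summary())
      # Each coefficient is the expected change in sbp per one-unit increase in
      # that predictor, with the other predictor held constant.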

  14. The assignment of scores procedure for ordinal categorical data.

    PubMed

    Chen, Han-Ching; Wang, Nae-Sheng

    2014-01-01

    Ordinal data are the most frequently encountered type of data in the social sciences. Many statistical methods can be used to process such data. One common method is to assign scores to the data, convert them into interval data, and then perform further statistical analysis. Several authors have recently developed methods for assigning scores to ordered categorical data. This paper proposes an approach that defines a scoring system for an ordinal categorical variable based on an underlying continuous latent distribution, and illustrates its interpretation using three case-study examples. The results show that the proposed score system works well for skewed ordinal categorical data.
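
    One common way to realise the idea of scoring ordinal categories from an underlying continuous latent distribution is to place each category at a quantile of a standard normal corresponding to the midpoint of its cumulative probability slice. The sketch below is a simplification of that idea with invented frequencies, not necessarily identical to the authors' system.

      # Sketch: assign scores to ordered categories by mapping cumulative category
      # proportions onto quantiles of a latent standard normal distribution.
      import numpy as np
      from scipy.stats import norm

      # Hypothetical frequencies for a 5-point ordinal item (e.g. a Likert scale).
      counts = np.array([12, 25, 40, 60, 13], dtype=float)
      props = counts / counts.sum()

      cum = np.cumsum(props)
      midpoints = cum - props / 2          # midpoint of each category's probability slice
      scores = norm.ppf(midpoints)         # latent-normal score for each category

      for k, s in enumerate(scores, start=1):
          print(f"category {k}: score {s:+.3f}")
      # The resulting scores replace the raw 1..5 codes in downstream
      # interval-scale analyses.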

  15. Integrated Analysis of Pharmacologic, Clinical, and SNP Microarray Data using Projection onto the Most Interesting Statistical Evidence with Adaptive Permutation Testing

    PubMed Central

    Pounds, Stan; Cao, Xueyuan; Cheng, Cheng; Yang, Jun; Campana, Dario; Evans, William E.; Pui, Ching-Hon; Relling, Mary V.

    2010-01-01

    Powerful methods for integrated analysis of multiple biological data sets are needed to maximize interpretation capacity and acquire meaningful knowledge. We recently developed Projection Onto the Most Interesting Statistical Evidence (PROMISE). PROMISE is a statistical procedure that incorporates prior knowledge about the biological relationships among endpoint variables into an integrated analysis of microarray gene expression data with multiple biological and clinical endpoints. Here, PROMISE is adapted to the integrated analysis of pharmacologic, clinical, and genome-wide genotype data, incorporating knowledge about the biological relationships among pharmacologic and clinical response data. An efficient permutation-testing algorithm is introduced so that statistical calculations are computationally feasible in this higher-dimension setting. The new method is applied to a pediatric leukemia data set. The results clearly indicate that PROMISE is a powerful statistical tool for identifying genomic features that exhibit a biologically meaningful pattern of association with multiple endpoint variables. PMID:21516175

  16. Knock 'm Down.

    ERIC Educational Resources Information Center

    Hunt, Gordon

    1998-01-01

    Describes an activity that engages pupils in the whole statistical process from collecting data to interpreting the results. Discusses the importance of these activities in developing knowledge, skills, understanding, and ability in problem solving. (Author/ASK)

  17. Pilot dynamics for instrument approach tasks: Full panel multiloop and flight director operations

    NASA Technical Reports Server (NTRS)

    Weir, D. H.; Mcruer, D. T.

    1972-01-01

    Measurements and interpretations of single and multiloop pilot response properties during simulated instrument approach are presented. Pilot subjects flew Category 2-like ILS approaches in a fixed-base DC-8 simulation. A conventional instrument panel and controls were used, with simulated vertical gust and glide slope beam bend forcing functions. Reduced and interpreted pilot describing functions and remnant are given for pitch attitude, flight director, and multiloop (longitudinal) control tasks. The response data are correlated with simultaneously recorded eye scanning statistics, previously reported in NASA CR-1535. The resulting combined response and scanning data and their interpretations provide a basis for validating and extending the theory of manual control displays.

  18. Determining the optimal number of independent components for reproducible transcriptomic data analysis.

    PubMed

    Kairov, Ulykbek; Cantini, Laura; Greco, Alessandro; Molkenov, Askhat; Czerwinska, Urszula; Barillot, Emmanuel; Zinovyev, Andrei

    2017-09-11

    Independent Component Analysis (ICA) is a method that models gene expression data as an action of a set of statistically independent hidden factors. The output of ICA depends on a fundamental parameter: the number of components (factors) to compute. The optimal choice of this parameter, related to determining the effective data dimension, remains an open question in the application of blind source separation techniques to transcriptomic data. Here we address the question of optimizing the number of statistically independent components in the analysis of transcriptomic data for reproducibility of the components in multiple runs of ICA (within the same or within varying effective dimensions) and in multiple independent datasets. To this end, we introduce ranking of independent components based on their stability in multiple ICA computation runs and define a distinguished number of components (Most Stable Transcriptome Dimension, MSTD) corresponding to the point of the qualitative change of the stability profile. Based on a large body of data, we demonstrate that a sufficient number of dimensions is required for biological interpretability of the ICA decomposition and that the most stable components with ranks below MSTD have more chances to be reproduced in independent studies compared to the less stable ones. At the same time, we show that a transcriptomics dataset can be reduced to a relatively high number of dimensions without losing the interpretability of ICA, even though higher dimensions give rise to components driven by small gene sets. We suggest a protocol of ICA application to transcriptomics data with a possibility of prioritizing components with respect to their reproducibility that strengthens the biological interpretation. Computing too few components (much less than MSTD) is not optimal for interpretability of the results. The components ranked within MSTD range have more chances to be reproduced in independent studies.
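
    The stability ranking described here can be approximated, in a heavily simplified form, by running FastICA several times with different random initialisations and scoring each component of a reference run by its best absolute correlation with the components of the other runs; the MSTD procedure itself uses a more elaborate matching and clustering scheme. The matrix below is random noise standing in for a samples-by-genes table.

      # Very simplified sketch of component-stability ranking across ICA runs.
      # The published MSTD estimation uses a more elaborate matching scheme.
      import numpy as np
      from sklearn.decomposition import FastICA

      rng = np.random.default_rng(4)
      X = rng.normal(size=(200, 50))           # stand-in for a samples x genes matrix
      n_comp, n_runs = 10, 5

      runs = []
      for seed in range(n_runs):
          ica = FastICA(n_components=n_comp, random_state=seed, max_iter=1000)
          runs.append(ica.fit_transform(X))    # component activations for this run

      reference = runs[0]
      stability = np.zeros(n_comp)
      for j in range(n_comp):
          best = []
          for other in runs[1:]:
              corr = [abs(np.corrcoef(reference[:, j], other[:, k])[0, 1])
                      for k in range(n_comp)]
              best.append(max(corr))           # best match of component j in that run
          stability[j] = np.mean(best)

      order = np.argsort(stability)[::-1]      # rank components by stability
      print(np.round(stability[order], 2))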

  19. Autoadaptivity and optimization in distributed ECG interpretation.

    PubMed

    Augustyniak, Piotr

    2010-03-01

    This paper addresses principal issues of ECG interpretation adaptivity in a distributed surveillance network. In the age of pervasive access to wireless digital communication, distributed biosignal interpretation networks may not only optimally solve difficult medical cases, but also adapt the data acquisition, interpretation, and transmission to the patient's variable status and the availability of technical resources. The background of such adaptivity is the innovative use of results from the automatic ECG analysis for seamless remote modification of the interpreting software. Since the medical relevance of issued diagnostic data depends on the patient's status, the interpretation adaptivity implies flexibility of report content and frequency. The proposed solutions are based on research on human experts' behavior, procedure reliability, and usage statistics. Despite the limited scale of our prototype client-server application, the tests yielded very promising results: the transmission channel occupation was reduced by 2.6 to 5.6 times compared to the rigid reporting mode, and an improvement of the remotely computed diagnostic outcome was achieved in over 80% of software adaptation attempts.

  20. Automated lithology prediction from PGNAA and other geophysical logs.

    PubMed

    Borsaru, M; Zhou, B; Aizawa, T; Karashima, H; Hashimoto, T

    2006-02-01

    Different methods of lithology prediction from geophysical data have been developed in the last 15 years. The geophysical logs used for predicting lithology are the conventional logs: sonic, neutron-neutron, gamma (total natural gamma) and density (backscattered gamma-gamma). Prompt gamma neutron activation analysis (PGNAA) is another established geophysical logging technique for in situ element analysis of rocks in boreholes. The work described in this paper was carried out to investigate the application of PGNAA to lithology interpretation. The data interpretation was conducted using the automatic interpretation program LogTrans, which is based on statistical analysis. Limited testing suggests that PGNAA logging data can be used to predict lithology. A success rate of 73% for lithology prediction was achieved from PGNAA logging data alone. PGNAA can also be used in conjunction with the conventional geophysical logs to enhance the lithology prediction.

  1. 76 FR 22385 - Fisheries of the Caribbean; Southeast Data, Assessment, and Review (SEDAR); Public Meeting

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-04-21

    ... sources for providing information on life history characteristics, catch statistics, discard estimates... people with disabilities. Requests for sign language interpretation or other auxiliary aids should be...

  2. Using U.S. Census Data to Teach Mathematics for Social Justice

    ERIC Educational Resources Information Center

    Leonard, Jacqueline

    2010-01-01

    Teachers can use census data to teach important mathematics content while addressing issues of social justice. The author describes activities that teach students to read and write large numbers, interpret census data and statistics, and apply algebraic concepts to describe U.S. population growth. Additional learning outcomes include applying…

  3. Automated Box-Cox Transformations for Improved Visual Encoding.

    PubMed

    Maciejewski, Ross; Pattath, Avin; Ko, Sungahn; Hafen, Ryan; Cleveland, William S; Ebert, David S

    2013-01-01

    The concept of preconditioning data (utilizing a power transformation as an initial step) for analysis and visualization is well established within the statistical community and is employed as part of statistical modeling and analysis. Such transformations condition the data to various inherent assumptions of statistical inference procedures, as well as making the data more symmetric and easier to visualize and interpret. In this paper, we explore the use of the Box-Cox family of power transformations to semiautomatically adjust visual parameters. We focus on time-series scaling, axis transformations, and color binning for choropleth maps. We illustrate the usage of this transformation through various examples, and discuss the value and some issues in semiautomatically using these transformations for more effective data visualization.
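
    For reference, the basic preconditioning step the paper builds on is available in SciPy: the sketch below estimates the Box-Cox lambda for a skewed, positive-valued sample and transforms it before binning or axis scaling. The data are simulated, and the semiautomatic selection of visual parameters described above is beyond this snippet.

      # Sketch: Box-Cox power transformation of a right-skewed positive variable
      # before colour binning or axis scaling.
      import numpy as np
      from scipy.stats import boxcox, skew

      rng = np.random.default_rng(5)
      values = rng.lognormal(mean=3.0, sigma=0.8, size=1000)   # right-skewed, all > 0

      transformed, lam = boxcox(values)        # lambda chosen by maximum likelihood
      print(f"estimated lambda = {lam:.2f}")
      print(f"skewness before = {skew(values):.2f}, after = {skew(transformed):.2f}")
      # Colour bins or axis ticks can then be chosen on the transformed scale and
      # mapped back through the inverse transformation for labelling.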

  4. 40 CFR Appendix K to Part 50 - Interpretation of the National Ambient Air Quality Standards for Particulate Matter

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ..., other techniques, such as the use of statistical models or the use of historical data could be..., mathematical techniques should be applied to account for the trends to ensure that the expected annual values... emission patterns, either the most recent representative year(s) could be used or statistical techniques or...

  5. Student's Conceptions in Statistical Graph's Interpretation

    ERIC Educational Resources Information Center

    Kukliansky, Ida

    2016-01-01

    Histograms, box plots and cumulative distribution graphs are popular graphic representations for statistical distributions. The main research question that this study focuses on is how college students deal with interpretation of these statistical graphs when translating graphical representations into analytical concepts in descriptive statistics.…

  6. Making Sense of Learner and Learning Big Data: Reviewing Five Years of Data Wrangling at the Open University UK

    ERIC Educational Resources Information Center

    Rienties, Bart; Cross, Simon; Marsh, Vicky; Ullmann, Thomas

    2017-01-01

    Most distance learning institutions collect vast amounts of learning data. Making sense of this 'Big Data' can be a challenge, in particular when data are stored at different data warehouses and require advanced statistical skills to interpret complex patterns of data. As a leading institute on learning analytics, the Open University UK instigated…

  7. Imputing historical statistics, soils information, and other land-use data to crop area

    NASA Technical Reports Server (NTRS)

    Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.

    1982-01-01

    In foreign crop condition monitoring, satellite-acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map representing, at 60 deg latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.

  8. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models.

    PubMed

    Ding, Jiarui; Condon, Anne; Shah, Sohrab P

    2018-05-21

    Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.

  9. Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?

    PubMed

    Zhu, Yeyi; Hernandez, Ladia M; Mueller, Peter; Dong, Yongquan; Forman, Michele R

    2013-01-01

    The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study.

  10. Statistical interpretation of machine learning-based feature importance scores for biomarker discovery.

    PubMed

    Huynh-Thu, Vân Anh; Saeys, Yvan; Wehenkel, Louis; Geurts, Pierre

    2012-07-01

    Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple, fast and their output is easily interpretable by biologists, but they can only identify variables that provide a significant amount of information in isolation from the other variables. As biological processes are expected to involve complex interactions between variables, univariate methods thus potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practitioners. We evaluated several existing and novel procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, false discovery rates, or family-wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the trade-offs they achieve in terms of false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. Python source codes of all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the Supplementary Material.
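
    One generic idea evaluated in work of this kind is to build a null distribution of importance scores by refitting the learner on permuted labels and converting each observed score into an empirical p-value. The sketch below does this for random-forest importances on synthetic data; it is a simplified stand-in, not one of the specific procedures compared in the paper.

      # Sketch: empirical p-values for random-forest feature importances obtained
      # by refitting on permuted class labels (synthetic data).
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier

      X, y = make_classification(n_samples=200, n_features=30, n_informative=3,
                                 random_state=0)
      rng = np.random.default_rng(0)

      rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
      observed = rf.feature_importances_

      n_perm = 100
      null = np.empty((n_perm, X.shape[1]))
      for b in range(n_perm):
          y_perm = rng.permutation(y)          # break the feature-label link
          rf_null = RandomForestClassifier(n_estimators=100, random_state=b)
          null[b] = rf_null.fit(X, y_perm).feature_importances_

      # Empirical p-value: fraction of null importances at least as large as observed.
      p_values = (1 + (null >= observed).sum(axis=0)) / (n_perm + 1)
      print(np.argsort(p_values)[:5], np.sort(p_values)[:5])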

  11. The effectiveness of a primer to help people understand risk: two randomized trials in distinct populations.

    PubMed

    Woloshin, Steven; Schwartz, Lisa M; Welch, H Gilbert

    2007-02-20

    People need basic data interpretation skills to understand health risks and to weigh the harms and benefits of actions meant to reduce those risks. Although many studies document problems with understanding risk information, few assess ways to teach interpretation skills. To see whether a general education primer improves patients' medical data interpretation skills. Two randomized, controlled trials done in populations with high and low socioeconomic status (SES). The high SES trial included persons who attended a public lecture series at Dartmouth Medical School, Hanover, New Hampshire; and the low SES trial included veterans and their families from the waiting areas at the White River Junction Veterans Affairs Medical Center, White River Junction, Vermont. 334 adults in the high SES trial and 221 veterans and their families in the low SES trial were enrolled from October 2004 to August 2005. Completion rates for the primer and control groups in each trial were 95% versus 98% (high SES) and 85% versus 96% (low SES). The intervention in the primer groups was an educational booklet specifically developed to teach people the skills needed to understand risk. The control groups received a general health booklet developed by the U.S. Department of Health and Human Services Agency for Health Care Research and Quality. Score on a medical data interpretation test, a previously validated 100-point scale, in which 75 points or more is considered "passing." Secondary outcomes included 2 other 100-point validated scores (interest and confidence in interpreting medical statistics) and participants' ratings of the booklet's usefulness. In the high SES trial, 74% of participants in the primer group received a "passing grade" on the medical data interpretation test versus 56% in the control group (P = 0.001). Mean scores were 81 and 75, respectively (P = 0.0006). In the low SES trial, 44% versus 26% "passed" (P = 0.010): Mean scores were 69 and 62 in the primer and control groups, respectively (P = 0.008). The primer also significantly increased interest in medical statistics by 6 points in the high SES trial (a 4-point increase vs. a 2-point decrease from baseline) (P = 0.004) and by 8 points in the low SES trial (a 6-point increase vs. a 2-point decrease from baseline) (P = 0.004) compared with the control booklet. The primer, however, did not improve participants' confidence in interpreting medical statistics beyond the control booklet (a 2-point vs. a 4-point increase in the high SES trial [P = 0.36] and a 2-point versus a 6-point increase in the low SES trial [P = 0.166]). The primer was rated highly: 91% of participants in the high SES trial found it "helpful" or "very helpful," as did 95% of participants in the low SES trial. The primarily male low SES sample and the primarily female high SES sample limits generalizability. The authors did not assess whether better data interpretation skills improved decision-making. The primer improved medical data interpretation skills in people with high and low SES. ClinicalTrials.gov registration number: NCT00380432.

  12. A statistical model and national data set for partitioning fish-tissue mercury concentration variation between spatiotemporal and sample characteristic effects

    USGS Publications Warehouse

    Wente, Stephen P.

    2004-01-01

    Many Federal, Tribal, State, and local agencies monitor mercury in fish-tissue samples to identify sites with elevated fish-tissue mercury (fish-mercury) concentrations, track changes in fish-mercury concentrations over time, and produce fish-consumption advisories. Interpretation of such monitoring data commonly is impeded by difficulties in separating the effects of sample characteristics (species, tissues sampled, and sizes of fish) from the effects of spatial and temporal trends on fish-mercury concentrations. Without such a separation, variation in fish-mercury concentrations due to differences in the characteristics of samples collected over time or across space can be misattributed to temporal or spatial trends, and/or actual trends in fish-mercury concentration can be misattributed to differences in sample characteristics. This report describes a statistical model that can separate spatiotemporal and sample-characteristic effects in fish-mercury concentration data, together with a national data set (31,813 samples) for calibrating the model. This model could be useful for evaluating spatial and temporal trends in fish-mercury concentrations and developing fish-consumption advisories. The observed fish-mercury concentration data and model predictions can be accessed, displayed geospatially, and downloaded via the World Wide Web (http://emmma.usgs.gov). This report and the associated web site may assist in the interpretation of large amounts of data from widespread fish-mercury monitoring efforts.

  13. Data Interpretation: Using Probability

    ERIC Educational Resources Information Center

    Drummond, Gordon B.; Vowler, Sarah L.

    2011-01-01

    Experimental data are analysed statistically to allow researchers to draw conclusions from a limited set of measurements. The hard fact is that researchers can never be certain that measurements from a sample will exactly reflect the properties of the entire group of possible candidates available to be studied (although using a sample is often the…

  14. A Bayesian Missing Data Framework for Generalized Multiple Outcome Mixed Treatment Comparisons

    ERIC Educational Resources Information Center

    Hong, Hwanhee; Chu, Haitao; Zhang, Jing; Carlin, Bradley P.

    2016-01-01

    Bayesian statistical approaches to mixed treatment comparisons (MTCs) are becoming more popular because of their flexibility and interpretability. Many randomized clinical trials report multiple outcomes with possible inherent correlations. Moreover, MTC data are typically sparse (although richer than standard meta-analysis, comparing only two…

  15. Enhancing the Teaching and Learning of Mathematical Visual Images

    ERIC Educational Resources Information Center

    Quinnell, Lorna

    2014-01-01

    The importance of mathematical visual images is indicated by the introductory paragraph in the Statistics and Probability content strand of the Australian Curriculum, which draws attention to the importance of learners developing skills to analyse and draw inferences from data and "represent, summarise and interpret data and undertake…

  16. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
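
    The sketch below gives a heavily simplified, single-detection-limit version of the ROS idea in Python rather than S, on invented concentrations: log values of the uncensored observations are regressed on normal quantiles of simple plotting positions, the censored observations are filled in from the fitted line, and summary statistics are computed on the combined data. The published procedure uses Helsel-Cohn plotting positions that properly handle multiple detection limits, so this is only an illustration of the principle.

      # Heavily simplified regression-on-order-statistics (ROS) sketch for
      # left-censored data with a single detection limit. Data are hypothetical.
      import numpy as np
      from scipy.stats import norm

      values   = np.array([0.5, 0.5, 0.5, 0.8, 1.2, 1.9, 2.4, 3.1, 4.0, 6.5])
      censored = np.array([1,   1,   1,   0,   0,   0,   0,   0,   0,   0], dtype=bool)
      # censored=True means "< 0.5", i.e. reported below the detection limit.

      n = len(values)
      order = np.argsort(values, kind="stable")
      pp = (np.arange(1, n + 1) - 0.5) / n          # simple plotting positions
      z = norm.ppf(pp)                              # corresponding normal quantiles

      logv = np.log(values)[order]
      unc = ~censored[order]
      slope, intercept = np.polyfit(z[unc], logv[unc], 1)   # fit uncensored data only

      filled = logv.copy()
      filled[~unc] = intercept + slope * z[~unc]    # model-based values for censored obs

      estimates = np.exp(filled)
      print(f"ROS mean ~ {estimates.mean():.2f}, median ~ {np.median(estimates):.2f}")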

  17. Experimental statistics for biological sciences.

    PubMed

    Bang, Heejung; Davidian, Marie

    2010-01-01

    In this chapter, we cover basic and fundamental principles and methods in statistics - from "What are Data and Statistics?" to "ANOVA and linear regression," which are the basis of any statistical thinking and undertaking. Readers can easily find the selected topics in most introductory statistics textbooks, but we have tried to assemble and structure them in a succinct and reader-friendly manner in a stand-alone chapter. This text has long been used in real classroom settings for both undergraduate and graduate students who do or do not major in statistical sciences. We hope that from this chapter, readers will understand the key statistical concepts and terminologies, how to design a study (experimental or observational), how to analyze the data (e.g., describe the data and/or estimate the parameter(s) and make inference), and how to interpret the results. This text would be most useful as supplemental material while readers take their own statistics courses, or as a reference text to accompany a manual for any statistical software as a self-teaching guide.

  18. Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution

    PubMed Central

    Moretti, Stefano; van Leeuwen, Danitsja; Gmuender, Hans; Bonassi, Stefano; van Delft, Joost; Kleinjans, Jos; Patrone, Fioravante; Merlo, Domenico Franco

    2008-01-01

    Background: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, in the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. Results: In this paper we introduce a bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, called Comparative Analysis of Shapley value (CASh for short), is applied to data concerning gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for differential gene expression. Both the lists of genes provided by CASh and by the t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than the t-test for the detection of differential gene expression variability. Conclusion: CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that resulted in a selection of genes with a potential impact on the regulation of complex pathways. PMID:18764936

  19. Reporting and Interpreting Quantitative Research Findings: What Gets Reported and Recommendations for the Field

    ERIC Educational Resources Information Center

    Larson-Hall, Jenifer; Plonsky, Luke

    2015-01-01

    This paper presents a set of guidelines for reporting on five types of quantitative data issues: (1) Descriptive statistics, (2) Effect sizes and confidence intervals, (3) Instrument reliability, (4) Visual displays of data, and (5) Raw data. Our recommendations are derived mainly from various professional sources related to L2 research but…

  20. Decomposing biodiversity data using the Latent Dirichlet Allocation model, a probabilistic multivariate statistical method

    Treesearch

    Denis Valle; Benjamin Baiser; Christopher W. Woodall; Robin Chazdon; Jerome Chave

    2014-01-01

    We propose a novel multivariate method to analyse biodiversity data based on the Latent Dirichlet Allocation (LDA) model. LDA, a probabilistic model, reduces assemblages to sets of distinct component communities. It produces easily interpretable results, can represent abrupt and gradual changes in composition, accommodates missing data and allows for coherent estimates...
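
    A minimal sketch of the kind of decomposition the authors propose, using scikit-learn's LatentDirichletAllocation on a synthetic site-by-species abundance matrix (not the forest data sets analysed in the paper): the rows of one output matrix give the mixing proportions of component communities at each site, and the rows of the other give the species composition of each component.

      # Sketch: decompose a site-by-species count matrix into component
      # communities with Latent Dirichlet Allocation (synthetic data).
      import numpy as np
      from sklearn.decomposition import LatentDirichletAllocation

      rng = np.random.default_rng(6)
      n_sites, n_species, n_components = 30, 40, 3

      # Build synthetic counts from three "true" component communities.
      true_comp = rng.dirichlet(np.ones(n_species) * 0.2, size=n_components)
      true_mix = rng.dirichlet(np.ones(n_components), size=n_sites)
      counts = rng.poisson(200 * true_mix @ true_comp)

      lda = LatentDirichletAllocation(n_components=n_components, random_state=0)
      site_mix = lda.fit_transform(counts)                 # sites x components
      site_mix /= site_mix.sum(axis=1, keepdims=True)      # mixing proportions per site
      species_comp = lda.components_ / lda.components_.sum(axis=1, keepdims=True)

      print(np.round(site_mix[:5], 2))        # community composition of first 5 sites
      print(np.round(species_comp[:, :8], 3)) # species profile of each component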

  1. Response to traumatic brain injury neurorehabilitation through an artificial intelligence and statistics hybrid knowledge discovery from databases methodology.

    PubMed

    Gibert, Karina; García-Rudolph, Alejandro; García-Molina, Alberto; Roig-Rovira, Teresa; Bernabeu, Montse; Tormos, José María

    2008-01-01

    Objective: to develop a classificatory tool to identify different populations of patients with traumatic brain injury based on the characteristics of deficit and response to treatment. A KDD framework was used in which, first, descriptive statistics were computed for every variable, followed by data cleaning and selection of relevant variables. The data were then mined using a generalization of Clustering based on rules (CIBR), a hybrid AI-and-statistics technique that combines inductive learning (AI) and clustering (statistics). A prior Knowledge Base (KB) is considered to properly bias the clustering; semantic constraints implied by the KB hold in the final clusters, guaranteeing interpretability of the results. A generalization (Exogenous Clustering based on rules, ECIBR) is presented, which allows the KB to be defined in terms of variables that are not themselves used in the clustering process, for greater flexibility. Several tools, such as the class panel graph, are introduced in the methodology to assist final interpretation. A set of 5 classes was recommended by the system, and interpretation permitted the labeling of profiles. From the medical point of view, the composition of the classes corresponds well with different patterns of increasing response to rehabilitation treatments. All patients who were initially assessable form a single group. Severely impaired patients are subdivided into four profiles with clearly distinct response patterns. Particularly interesting is the partial-response profile, in which patients could not improve executive functions. Meaningful classes were obtained and, from a semantic point of view, the results were substantially improved relative to classical clustering, supporting our view that hybrid AI-and-statistics techniques are more powerful for KDD than either approach alone.

  2. Basin Characterisation by Means of Joint Inversion of Electromagnetic Geophysical Data, Borehole Data and Multivariate Statistical Methods: The Loop Head Peninsula, Western Ireland, Case Study

    NASA Astrophysics Data System (ADS)

    Campanya, J. L.; Ogaya, X.; Jones, A. G.; Rath, V.; McConnell, B.; Haughton, P.; Prada, M.

    2016-12-01

    The Science Foundation Ireland funded IRECCSEM project (www.ireccsem.ie) aims to evaluate Ireland's potential for onshore carbon sequestration in saline aquifers by integrating new electromagnetic geophysical data with existing geophysical and geological data. One of the objectives of this component of IRECCSEM is to characterise the subsurface beneath the Loop Head Peninsula (part of Clare Basin, Co. Clare, Ireland), and identify major electrical resistivity structures that can guide an interpretation of the carbon sequestration potential of this area. During the summer of 2014, a magnetotelluric (MT) survey was carried out on the Loop Head Peninsula, and data from a total of 140 sites were acquired, including audio-magnetotelluric (AMT) and broadband magnetotelluric (BBMT) soundings. The dataset was used to generate shallow three-dimensional (3-D) electrical resistivity models constraining the subsurface to depths of up to 3.5 km. The 3-D joint inversions were performed using three different types of electromagnetic data: the MT impedance tensor (Z), geomagnetic transfer functions (T), and inter-station horizontal magnetic transfer functions (H). The interpretation of the results was complemented with second-derivative models of the resulting electrical resistivity models and a quantitative comparison with borehole data using multivariate statistical methods. Second-derivative models were used to define the main interfaces between the geoelectrical structures, facilitating superior comparison with geological and seismic results and also reducing the influence of the colour scale when interpreting the results. A specific analysis was performed to compare the extant borehole data with the electrical resistivity model, identifying those structures that are better characterised by the resistivity model. Finally, the electrical resistivity model was also used to propagate some of the physical properties measured in the borehole, where a good relationship could be established between the different types of data. The final results were compared with independent geological and geophysical data for a high-quality interpretation.

  3. What Should We Grow in Our School Garden to Sell at the Farmers' Market? Initiating Statistical Literacy through Science and Mathematics Integration

    ERIC Educational Resources Information Center

    Selmer, Sarah J.; Rye, James A.; Malone, Elizabeth; Fernandez, Danielle; Trebino, Kathryn

    2014-01-01

    Statistical literacy is essential to scientific literacy, and the quest for such is best initiated in the elementary grades. The "Next Generation Science Standards and the Common Core State Standards for Mathematics" set forth practices (e.g., asking questions, using tools strategically to analyze and interpret data) and content (e.g.,…

  4. 75 FR 24883 - Fisheries of the South Atlantic; Southeast Data, Assessment, and Review (SEDAR); Public Meeting

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-06

    ... history characteristics, catch statistics, discard estimates, length and age composition, and fishery... interpretation or other auxiliary aids should be directed to the Council office (see ADDRESSES) at least 10...

  5. Identification of the isomers using principal component analysis (PCA) method

    NASA Astrophysics Data System (ADS)

    Kepceoǧlu, Abdullah; Gündoǧdu, Yasemin; Ledingham, Kenneth William David; Kilic, Hamdi Sukur

    2016-03-01

    In this work, we have carried out a detailed statistical analysis of experimental mass spectra from xylene isomers. Principal Component Analysis (PCA) was used to identify the isomers, which cannot be distinguished using conventional statistical methods for the interpretation of their mass spectra. Experiments were carried out using a linear TOF-MS coupled to a femtosecond laser system as an energy source for the ionisation processes. The collected data were analysed and interpreted using PCA as a multivariate analysis of these spectra. This demonstrates the strength of the method in providing insight for distinguishing isomers that cannot be identified through conventional mass analysis of the dissociative ionisation processes in these molecules. The dependence of the PCA results on the laser pulse energy and the background pressure in the spectrometer is also presented in this work.
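
    The record does not include the authors' analysis code; the following Python sketch (synthetic spectra, hypothetical variable names) only illustrates how PCA scores and explained variance might be computed for a matrix of mass spectra, with rows as individual spectra and columns as m/z intensity bins.

    # Illustrative only: PCA on a synthetic matrix of mass spectra.
    # Rows are individual spectra, columns are m/z intensity bins.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    n_spectra, n_bins = 60, 500
    spectra = rng.poisson(lam=5.0, size=(n_spectra, n_bins)).astype(float)

    # Mean-centre each m/z bin before extracting principal components.
    spectra -= spectra.mean(axis=0)

    pca = PCA(n_components=2)
    scores = pca.fit_transform(spectra)        # (60, 2) matrix of PC1/PC2 scores
    print(pca.explained_variance_ratio_)       # fraction of variance per component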

  6. Advances in Statistical Methods for Substance Abuse Prevention Research

    PubMed Central

    MacKinnon, David P.; Lockwood, Chondra M.

    2010-01-01

    The paper describes advances in statistical methods for prevention research with a particular focus on substance abuse prevention. Standard analysis methods are extended to the typical research designs and characteristics of the data collected in prevention research. Prevention research often includes longitudinal measurement, clustering of data in units such as schools or clinics, missing data, and categorical as well as continuous outcome variables. Statistical methods to handle these features of prevention data are outlined. Developments in mediation, moderation, and implementation analysis allow for the extraction of more detailed information from a prevention study. Advancements in the interpretation of prevention research results include more widespread calculation of effect size and statistical power, the use of confidence intervals as well as hypothesis testing, detailed causal analysis of research findings, and meta-analysis. The increased availability of statistical software has contributed greatly to the use of new methods in prevention research. It is likely that the Internet will continue to stimulate the development and application of new methods. PMID:12940467

  7. Maps and interpretation of geochemical anomalies, Chuckwalla Mountains Wilderness Study Area, Riverside County, California

    USGS Publications Warehouse

    Watts, K.C.

    1986-01-01

    This report discusses and interprets geochemical results as they are seen at the reconnaissance stage. Analytical results for all samples collected are released in a U.S. Geological Survey Open-File Report (Adrian and others, 1985). A statistical summary of the data from heavy-mineral concentrates and sieved stream sediments is shown in table 1. The analytical results for selected elements in rock samples are shown in table 2.

  8. Radiologist Uncertainty and the Interpretation of Screening

    PubMed Central

    Carney, Patricia A.; Elmore, Joann G.; Abraham, Linn A.; Gerrity, Martha S.; Hendrick, R. Edward; Taplin, Stephen H.; Barlow, William E.; Cutter, Gary R.; Poplack, Steven P.; D’Orsi, Carl J.

    2011-01-01

    Objective To determine radiologists’ reactions to uncertainty when interpreting mammography and the extent to which radiologist uncertainty explains variability in interpretive performance. Methods The authors used a mailed survey to assess demographic and clinical characteristics of radiologists and reactions to uncertainty associated with practice. Responses were linked to radiologists’ actual interpretive performance data obtained from 3 regionally located mammography registries. Results More than 180 radiologists were eligible to participate, and 139 consented for a response rate of 76.8%. Radiologist gender, more years interpreting, and higher volume were associated with lower uncertainty scores. Positive predictive value, recall rates, and specificity were more affected by reactions to uncertainty than sensitivity or negative predictive value; however, none of these relationships was statistically significant. Conclusion Certain practice factors, such as gender and years of interpretive experience, affect uncertainty scores. Radiologists’ reactions to uncertainty do not appear to affect interpretive performance. PMID:15155014

  9. Methods for trend analysis: Examples with problem/failure data

    NASA Technical Reports Server (NTRS)

    Church, Curtis K.

    1989-01-01

    Statistics play an important role in quality control and reliability. Accordingly, the NASA standard Trend Analysis Techniques recommends a variety of statistical methodologies that can be applied to time series data. The major goal of the working handbook, which uses data from the MSFC Problem Assessment System, is to illustrate some of the techniques in the NASA standard and some additional techniques, and to identify patterns in the data. The techniques used for trend estimation are regression (exponential, power, reciprocal, straight line) and Kendall's rank correlation coefficient. The important details of a statistical strategy for estimating a trend component are covered in the examples. However, careful analysis and interpretation are necessary because of small samples and frequent zero problem reports in a given time period. Further investigations to deal with these issues are being conducted.
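
    As a purely illustrative sketch (hypothetical counts, not MSFC data), the following Python snippet applies the two trend-estimation approaches named above to a monthly problem-report series: a straight-line regression fit and Kendall's rank correlation coefficient.

    # Illustrative only: trend estimation on a hypothetical monthly count series.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    months = np.arange(1, 25)                          # 24 monthly reporting periods
    reports = rng.poisson(lam=np.linspace(9, 3, 24))   # hypothetical problem counts

    fit = stats.linregress(months, reports)            # straight-line trend estimate
    tau, p_tau = stats.kendalltau(months, reports)     # monotonic-trend check
    print(f"slope = {fit.slope:.2f} reports/month (p = {fit.pvalue:.3f})")
    print(f"Kendall tau = {tau:.2f} (p = {p_tau:.3f})")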

  10. Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data

    PubMed Central

    Hu, Ming; Deng, Ke; Qin, Zhaohui; Liu, Jun S.

    2015-01-01

    Understanding how chromosomes fold provides insights into the transcription regulation, hence, the functional state of the cell. Using the next generation sequencing technology, the recently developed Hi-C approach enables a global view of spatial chromatin organization in the nucleus, which substantially expands our knowledge about genome organization and function. However, due to multiple layers of biases, noises and uncertainties buried in the protocol of Hi-C experiments, analyzing and interpreting Hi-C data poses great challenges, and requires novel statistical methods to be developed. This article provides an overview of recent Hi-C studies and their impacts on biomedical research, describes major challenges in statistical analysis of Hi-C data, and discusses some perspectives for future research. PMID:26124977

  11. Statistical tables and charts showing geochemical variation in the Mesoproterozoic Big Creek, Apple Creek, and Gunsight formations, Lemhi group, Salmon River Mountains and Lemhi Range, central Idaho

    USGS Publications Warehouse

    Lindsey, David A.; Tysdal, Russell G.; Taggart, Joseph E.

    2002-01-01

    The principal purpose of this report is to provide a reference archive for results of a statistical analysis of geochemical data for metasedimentary rocks of Mesoproterozoic age of the Salmon River Mountains and Lemhi Range, central Idaho. Descriptions of geochemical data sets, statistical methods, rationale for interpretations, and references to the literature are provided. Three methods of analysis are used: R-mode factor analysis of major oxide and trace element data for identifying petrochemical processes, analysis of variance for effects of rock type and stratigraphic position on chemical composition, and major-oxide ratio plots for comparison with the chemical composition of common clastic sedimentary rocks.

  12. Field and laboratory analyses of water from the Columbia aquifer in Eastern Maryland

    USGS Publications Warehouse

    Bachman, L.J.

    1984-01-01

    Field and laboratory analyses of pH, alkalinity, and specific conductance from water samples collected from the Columbia aquifer on the Delmarva Peninsula in eastern Maryland were compared to determine if laboratory analyses could be used for making regional water-quality interpretations. Kruskal-Wallis tests of field and laboratory data indicate that the difference between field and laboratory values is usually not enough to affect the outcome of the statistical tests. Thus, laboratory measurements of these constituents may be adequate for making certain regional water-quality interpretations, although they may result in errors if used for geochemical interpretations.
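
    For illustration only (hypothetical pH values, not the Columbia aquifer measurements), a Kruskal-Wallis comparison of field and laboratory determinations can be run in Python as follows.

    # Illustrative only: Kruskal-Wallis test comparing field and laboratory pH values.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    field_ph = rng.normal(5.6, 0.4, size=40)              # hypothetical field pH
    lab_ph = field_ph + rng.normal(0.05, 0.15, size=40)   # corresponding lab pH

    stat, p = stats.kruskal(field_ph, lab_ph)
    print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.3f}")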

  13. Distributed data collection for a database of radiological image interpretations

    NASA Astrophysics Data System (ADS)

    Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.

    1997-01-01

    The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window based 2048 X 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.

  14. Crop identification technology assessment for remote sensing (CITARS). Volume 10: Interpretation of results

    NASA Technical Reports Server (NTRS)

    Bizzell, R. M.; Feiveson, A. H.; Hall, F. G.; Bauer, M. E.; Davis, B. J.; Malila, W. A.; Rice, D. P.

    1975-01-01

    The CITARS was an experiment designed to quantitatively evaluate crop identification performance for corn and soybeans in various environments using a well-defined set of automatic data processing (ADP) techniques. Each technique was applied to the acquired data to recognize and estimate proportions of corn and soybeans. The CITARS documentation summarizes, interprets, and discusses the crop identification performances obtained using (1) different ADP procedures; (2) a linear versus a quadratic classifier; (3) prior probability information derived from historic data; (4) local versus nonlocal recognition training statistics and the associated use of preprocessing; (5) multitemporal data; (6) classification bias and mixed pixels in proportion estimation; and (7) data with different site characteristics, including crop, soil, atmospheric effects, and stages of crop maturity.

  15. The Heat Is On! Using a Stylised Graph to Engender Understanding

    ERIC Educational Resources Information Center

    Fitzallen, Noleine; Watson, Jane; Wright, Suzie

    2017-01-01

    When working within a meaningful context quite young students are capable of working with sophisticated data. Year 3 students investigate thermal insulation and the transfer of heat in a STEM inquiry, developing skills in measuring temperature by conducting a statistical investigation, and using a stylised graph to interpret their data.

  16. Recent Reliability Reporting Practices in "Psychological Assessment": Recognizing the People behind the Data

    ERIC Educational Resources Information Center

    Green, Carlton E.; Chen, Cynthia E.; Helms, Janet E.; Henze, Kevin T.

    2011-01-01

    Helms, Henze, Sass, and Mifsud (2006) defined good practices for internal consistency reporting, interpretation, and analysis consistent with an alpha-as-data perspective. Their viewpoint (a) expands on previous arguments that reliability coefficients are group-level summary statistics of samples' responses rather than stable properties of scales…

  17. Airborne gamma-ray spectrometer and magnetometer survey, Durango C, Colorado. Final report Volume II A. Detail area

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1983-01-01

    Geology of Durango C detail area, radioactive mineral occurrences in Colorado, and geophysical data interpretation are included in this report. Eight appendices provide: stacked profiles, geologic histograms, geochemical histograms, speed and altitude histograms, geologic statistical tables, magnetic and ancillary profiles, and test line data.

  18. STUDY OF FEASIBILITY OF NEW EDUCATIONAL MEDIA FOR DEVELOPING COUNTRIES.

    ERIC Educational Resources Information Center

    VAIZEY, J.

    Cost data on the use of the new instructional media are necessary in order to compare different forms of education, to determine the economically optimum rate of technical usage, and to assist administrators. The historical inaccuracy or statistical bias of sources and the incomparability of data pose difficulties in interpretation. The cost of…

  19. Making the Case for Evidence-Based Practice

    ERIC Educational Resources Information Center

    Bates, Joanne; McClure, Janelle; Spinks, Andy

    2010-01-01

    Evidence-based practice is the collection, interpretation, and use of data, such as collection statistics or assessment results, that measure the effectiveness of a library media program. In this article, the authors will present various forms of evidence and show that any library media specialist can use data to make informed decisions that…

  20. Dose-response relationships and extrapolation in toxicology - Mechanistic and statistical considerations

    EPA Science Inventory

    Controversy on toxicological dose-response relationships and low-dose extrapolation of respective risks is often the consequence of misleading data presentation, lack of differentiation between types of response variables, and diverging mechanistic interpretation. In this chapter...

  1. Statistical Literacy in Data Revolution Era: Building Blocks and Instructional Dilemmas

    ERIC Educational Resources Information Center

    Prodromou, Theodosia; Dunne, Tim

    2017-01-01

    The data revolution has given citizens access to enormous large-scale open databases. In order to take into account the full complexity of data, we have to change the way we think in terms of the nature of data and its availability, the ways in which it is displayed and used, and the skills that are required for its interpretation. Substantial…

  2. Students' Successes and Challenges Applying Data Analysis and Measurement Skills in a Fifth-Grade Integrated STEM Unit

    ERIC Educational Resources Information Center

    Glancy, Aran W.; Moore, Tamara J.; Guzey, Selcen; Smith, Karl A.

    2017-01-01

    An understanding of statistics and skills in data analysis are becoming more and more essential, yet research consistently shows that students struggle with these concepts at all levels. This case study documents some of the struggles four groups of fifth-grade students encounter as they collect, organize, and interpret data and then ultimately…

  3. [Intranarcotic infusion therapy -- a computer interpretation using the program package SPSS (Statistical Package for the Social Sciences)].

    PubMed

    Link, J; Pachaly, J

    1975-08-01

    In a retrospective 18-month study, the infusion therapy administered in a large anesthesia department is examined. For this purpose, the anesthesia course data routinely recorded on magnetic tape are analysed by computer with the statistical program SPSS. It could be demonstrated that the behaviour of the individual anesthetists differs considerably. Various correlations are discussed.

  4. "Magnitude-based inference": a statistical review.

    PubMed

    Welsh, Alan H; Knight, Emma J

    2015-04-01

    We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.

  5. Statistical analysis and interpretation of prenatal diagnostic imaging studies, Part 2: descriptive and inferential statistical methods.

    PubMed

    Tuuli, Methodius G; Odibo, Anthony O

    2011-08-01

    The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.

  6. Cluster detection methods applied to the Upper Cape Cod cancer data.

    PubMed

    Ozonoff, Al; Webster, Thomas; Vieira, Veronica; Weinberg, Janice; Ozonoff, David; Aschengrau, Ann

    2005-09-15

    A variety of statistical methods have been suggested to assess the degree and/or the location of spatial clustering of disease cases. However, there is relatively little in the literature devoted to comparison and critique of different methods. Most of the available comparative studies rely on simulated data rather than real data sets. We have chosen three methods currently used for examining spatial disease patterns: the M-statistic of Bonetti and Pagano; the Generalized Additive Model (GAM) method as applied by Webster; and Kulldorff's spatial scan statistic. We apply these statistics to analyze breast cancer data from the Upper Cape Cancer Incidence Study using three different latency assumptions. The three different latency assumptions produced three different spatial patterns of cases and controls. For 20 year latency, all three methods generally concur. However, for 15 year latency and no latency assumptions, the methods produce different results when testing for global clustering. The comparative analyses of real data sets by different statistical methods provide insight into directions for further research. We suggest a research program designed around examining real data sets to guide focused investigation of relevant features using simulated data, for the purpose of understanding how to interpret statistical methods applied to epidemiological data with a spatial component.

  7. Considerations for estimating microbial environmental data concentrations collected from a field setting

    PubMed Central

    Silvestri, Erin E; Yund, Cynthia; Taft, Sarah; Bowling, Charlena Yoder; Chappie, Daniel; Garrahan, Kevin; Brady-Roberts, Eletha; Stone, Harry; Nichols, Tonya L

    2017-01-01

    In the event of an indoor release of an environmentally persistent microbial pathogen such as Bacillus anthracis, the potential for human exposure will be considered when remedial decisions are made. Microbial site characterization and clearance sampling data collected in the field might be used to estimate exposure. However, there are many challenges associated with estimating environmental concentrations of B. anthracis or other spore-forming organisms after such an event before being able to estimate exposure. These challenges include: (1) collecting environmental field samples that are adequate for the intended purpose, (2) conducting laboratory analyses and selecting the reporting format needed for the laboratory data, and (3) analyzing and interpreting the data using appropriate statistical techniques. This paper summarizes some key challenges faced in collecting, analyzing, and interpreting microbial field data from a contaminated site. Although the paper was written with considerations for B. anthracis contamination, it may also be applicable to other bacterial agents. It explores the implications and limitations of using field data for determining environmental concentrations both before and after decontamination. Several findings were of interest. First, to date, the only validated surface/sampling device combinations are swabs and sponge-sticks on stainless steel surfaces, thus limiting availability of quantitative analytical results which could be used for statistical analysis. Second, agreement needs to be reached with the analytical laboratory on the definition of the countable range and on reporting of data below the limit of quantitation. Finally, the distribution of the microbial field data and the statistical methods needed for a particular data set could vary depending on the data that were collected, and guidance is needed on appropriate statistical software for handling microbial data. Further research is needed to develop better methods to estimate human exposure from pathogens using environmental data collected from a field setting. PMID:26883476

  8. Smart Interpretation - Application of Machine Learning in Geological Interpretation of AEM Data

    NASA Astrophysics Data System (ADS)

    Bach, T.; Gulbrandsen, M. L.; Jacobsen, R.; Pallesen, T. M.; Jørgensen, F.; Høyer, A. S.; Hansen, T. M.

    2015-12-01

    When using airborne geophysical measurements in e.g. groundwater mapping, an overwhelming amount of data is collected. Increasingly large survey areas, denser data collection, and limited resources combine into a growing problem: building geological models that use all the available data in a manner consistent with the geologist's knowledge about the geology of the survey area. In the ERGO project, funded by the Danish National Advanced Technology Foundation, we address this problem by developing new, usable tools that enable the geologist to utilize her geological knowledge directly in the interpretation of the AEM data, and thereby handle the large amount of data. In the project we have developed the mathematical basis for capturing geological expertise in a statistical model. Based on this, we have implemented new algorithms that have been operationalized and embedded in user-friendly software. In this software, the machine learning algorithm, Smart Interpretation, enables the geologist to use the system as an assistant in the geological modelling process. As the software 'learns' the geology from the geologist, the system suggests new modelling features in the data. In this presentation we demonstrate the application of the results from the ERGO project, including the proposed modelling workflow, on a variety of data examples.

  9. Statistical analysis of arthroplasty data

    PubMed Central

    2011-01-01

    It is envisaged that guidelines for statistical analysis and presentation of results will improve the quality and value of research. The Nordic Arthroplasty Register Association (NARA) has therefore developed guidelines for the statistical analysis of arthroplasty register data. The guidelines are divided into two parts, one with an introduction and a discussion of the background to the guidelines (Ranstam et al. 2011a, see pages x-y in this issue), and this one with a more technical statistical discussion on how specific problems can be handled. This second part contains (1) recommendations for the interpretation of methods used to calculate survival, (2) recommendations on how to deal with bilateral observations, and (3) a discussion of problems and pitfalls associated with analysis of factors that influence survival or comparisons between outcomes extracted from different hospitals. PMID:21619500

  10. Mathematics pre-service teachers’ statistical reasoning about meaning

    NASA Astrophysics Data System (ADS)

    Kristanto, Y. D.

    2018-01-01

    This article offers a descriptive qualitative analysis of 3 second-year pre-service teachers' statistical reasoning about the mean. Twenty-six pre-service teachers were tested using an open-ended problem in which they were expected to analyze a method for finding the mean of a data set. Three of their test results were selected for analysis. The results suggest that the pre-service teachers did not use context to develop an interpretation of the mean. Therefore, this article also offers strategies to promote statistical reasoning about the mean that use various contexts.

  11. Multicollinearity in Regression Analyses Conducted in Epidemiologic Studies

    PubMed Central

    Vatcheva, Kristina P.; Lee, MinJae; McCormick, Joseph B.; Rahbar, Mohammad H.

    2016-01-01

    The adverse impact of ignoring multicollinearity on findings and data interpretation in regression analysis is very well documented in the statistical literature. The failure to identify and report multicollinearity could result in misleading interpretations of the results. A review of epidemiological literature in PubMed from January 2004 to December 2013, illustrated the need for a greater attention to identifying and minimizing the effect of multicollinearity in analysis of data from epidemiologic studies. We used simulated datasets and real life data from the Cameron County Hispanic Cohort to demonstrate the adverse effects of multicollinearity in the regression analysis and encourage researchers to consider the diagnostic for multicollinearity as one of the steps in regression analysis. PMID:27274911

  12. Multicollinearity in Regression Analyses Conducted in Epidemiologic Studies.

    PubMed

    Vatcheva, Kristina P; Lee, MinJae; McCormick, Joseph B; Rahbar, Mohammad H

    2016-04-01

    The adverse impact of ignoring multicollinearity on findings and data interpretation in regression analysis is very well documented in the statistical literature. The failure to identify and report multicollinearity could result in misleading interpretations of the results. A review of epidemiological literature in PubMed from January 2004 to December 2013, illustrated the need for a greater attention to identifying and minimizing the effect of multicollinearity in analysis of data from epidemiologic studies. We used simulated datasets and real life data from the Cameron County Hispanic Cohort to demonstrate the adverse effects of multicollinearity in the regression analysis and encourage researchers to consider the diagnostic for multicollinearity as one of the steps in regression analysis.
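
    As an illustration of the recommended diagnostic (simulated predictors, not the Cameron County Hispanic Cohort data), the Python sketch below computes variance inflation factors (VIFs) directly from the R-squared obtained by regressing each predictor on the others; VIF values well above the commonly cited 5-10 range are a warning sign of multicollinearity.

    # Illustrative only: variance inflation factors for a predictor matrix X
    # (columns are predictors; no intercept column in X).
    import numpy as np

    def vif(X):
        """Return VIF_j = 1 / (1 - R^2_j) for each predictor column."""
        X = np.asarray(X, dtype=float)
        n, p = X.shape
        out = np.empty(p)
        for j in range(p):
            y = X[:, j]
            Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            resid = y - Z @ beta
            r2 = 1.0 - resid.var() / y.var()
            out[j] = 1.0 / (1.0 - r2)
        return out

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=200)
    x2 = 0.95 * x1 + 0.05 * rng.normal(size=200)    # nearly collinear with x1
    x3 = rng.normal(size=200)
    print(vif(np.column_stack([x1, x2, x3])))       # large VIFs flag x1 and x2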

  13. A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing

    PubMed Central

    Wang, Guoli; Ebrahimi, Nader

    2014-01-01

    Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ W H. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data. PMID:25821345

  14. A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing.

    PubMed

    Devarajan, Karthik; Wang, Guoli; Ebrahimi, Nader

    2015-04-01

    Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H , such that V ∼ W H . It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H . In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data.
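
    The generalized Renyi-divergence algorithm itself is not reproduced in this record; as a hedged illustration, the Python sketch below implements the classical multiplicative updates for NMF under the KL/Poisson divergence, which the abstract describes as a special case of the unified framework, on synthetic count data.

    # Illustrative only: Lee-Seung style multiplicative updates for NMF under the
    # KL (Poisson-likelihood) divergence; not the authors' Renyi generalization.
    import numpy as np

    def nmf_kl(V, rank, n_iter=200, eps=1e-9, seed=0):
        rng = np.random.default_rng(seed)
        n, m = V.shape
        W = rng.random((n, rank)) + eps
        H = rng.random((rank, m)) + eps
        for _ in range(n_iter):
            WH = W @ H + eps
            H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
        return W, H

    V = np.random.default_rng(1).poisson(3.0, size=(50, 40)).astype(float)
    W, H = nmf_kl(V, rank=5)
    print(np.linalg.norm(V - W @ H))   # reconstruction error after fitting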

  15. SOCR: Statistics Online Computational Resource

    PubMed Central

    Dinov, Ivo D.

    2011-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students' intuition and enhance their learning. PMID:21451741

  16. Petroleum supply annual, 1990. [Contains Glossary

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1991-05-30

    The Petroleum Supply Annual (PSA) contains information on the supply and disposition of crude oil and petroleum products. The publication reflects data that were collected from the petroleum industry during 1990 through annual and monthly surveys. The PSA is divided into two volumes. This first volume contains three sections, Summary Statistics, Detailed Statistics, and Refinery Capacity, each with final annual data. The second volume contains final statistics for each month of 1990, and replaces data previously published in the Petroleum Supply Monthly (PSM). The tables in Volumes 1 and 2 are similarly numbered to facilitate comparison between them. Explanatory Notes, located at the end of this publication, present information describing data collection, sources, estimation methodology, data quality control procedures, modifications to reporting requirements and interpretation of tables. Industry terminology and product definitions are listed alphabetically in the Glossary. 35 tabs.

  17. Petroleum supply annual 1992. [Contains glossary

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1993-05-27

    The Petroleum Supply Annual (PSA) contains information on the supply and disposition of crude oil and petroleum products. The publication reflects data that were collected from the petroleum industry during 1992 through annual and monthly surveys. The PSA is divided into two volumes. The first volume contains four sections: Summary Statistics, Detailed Statistics, Refinery Capacity, and Oxygenate Capacity, each with final annual data. This second volume contains final statistics for each month of 1992, and replaces data previously published in the Petroleum Supply Monthly (PSM). The tables in Volumes 1 and 2 are similarly numbered to facilitate comparison between them. Explanatory Notes, located at the end of this publication, present information describing data collection, sources, estimation methodology, data quality control procedures, modifications to reporting requirements and interpretation of tables. Industry terminology and product definitions are listed alphabetically in the Glossary.

  18. Statistical methods of fracture characterization using acoustic borehole televiewer log interpretation

    NASA Astrophysics Data System (ADS)

    Massiot, Cécile; Townend, John; Nicol, Andrew; McNamara, David D.

    2017-08-01

    Acoustic borehole televiewer (BHTV) logs provide measurements of fracture attributes (orientations, thickness, and spacing) at depth. Orientation, censoring, and truncation sampling biases similar to those described for one-dimensional outcrop scanlines, and other logging or drilling artifacts specific to BHTV logs, can affect the interpretation of fracture attributes from BHTV logs. K-means, fuzzy K-means, and agglomerative clustering methods provide transparent means of separating fracture groups on the basis of their orientation. Fracture spacing is calculated for each of these fracture sets. Maximum likelihood estimation using truncated distributions permits the fitting of several probability distributions to the fracture attribute data sets within truncation limits, which can then be extrapolated over the entire range where they naturally occur. Akaike Information Criterion (AIC) and Schwartz Bayesian Criterion (SBC) statistical information criteria rank the distributions by how well they fit the data. We demonstrate these attribute analysis methods with a data set derived from three BHTV logs acquired from the high-temperature Rotokawa geothermal field, New Zealand. Varying BHTV log quality reduces the number of input data points, but careful selection of the quality levels where fractures are deemed fully sampled increases the reliability of the analysis. Spacing data analysis comprising up to 300 data points and spanning three orders of magnitude can be approximated similarly well (similar AIC rankings) with several distributions. Several clustering configurations and probability distributions can often characterize the data at similar levels of statistical criteria. Thus, several scenarios should be considered when using BHTV log data to constrain numerical fracture models.
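
    As a simplified, untruncated illustration of the distribution-fitting and ranking step (synthetic spacing values, not the Rotokawa BHTV data), the Python sketch below fits two candidate distributions by maximum likelihood and ranks them with AIC.

    # Illustrative only: compare candidate distributions for fracture spacing
    # by maximum likelihood and AIC (truncation limits are ignored here).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    spacing = rng.lognormal(mean=0.0, sigma=1.0, size=300)   # synthetic spacings (m)

    candidates = {
        "lognormal":   (stats.lognorm, dict(floc=0)),
        "exponential": (stats.expon,   dict(floc=0)),
    }

    for name, (dist, fixed) in candidates.items():
        params = dist.fit(spacing, **fixed)
        loglik = np.sum(dist.logpdf(spacing, *params))
        k = len(params) - len(fixed)            # count only the free parameters
        aic = 2 * k - 2 * loglik
        print(f"{name:12s} AIC = {aic:8.1f}")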

  19. Method for factor analysis of GC/MS data

    DOEpatents

    Van Benthem, Mark H; Kotula, Paul G; Keenan, Michael R

    2012-09-11

    The method of the present invention provides a fast, robust, and automated multivariate statistical analysis of gas chromatography/mass spectroscopy (GC/MS) data sets. The method can involve systematic elimination of undesired, saturated peak masses to yield data that follow a linear, additive model. The cleaned data can then be subjected to a combination of PCA and orthogonal factor rotation followed by refinement with MCR-ALS to yield highly interpretable results.

  20. Random Responding as a Threat to the Validity of Effect Size Estimates in Correlational Research

    ERIC Educational Resources Information Center

    Crede, Marcus

    2010-01-01

    Random responding to psychological inventories is a long-standing concern among clinical practitioners and researchers interested in interpreting idiographic data, but it is typically viewed as having only a minor impact on the statistical inferences drawn from nomothetic data. This article explores the impact of random responding on the size and…

  1. Investigating Faculty Familiarity with Assessment Terminology by Applying Cluster Analysis to Interpret Survey Data

    ERIC Educational Resources Information Center

    Raker, Jeffrey R.; Holme, Thomas A.

    2014-01-01

    A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…

  2. The Statistical Interpretation of Classical Thermodynamic Heating and Expansion Processes

    ERIC Educational Resources Information Center

    Cartier, Stephen F.

    2011-01-01

    A statistical model has been developed and applied to interpret thermodynamic processes typically presented from the macroscopic, classical perspective. Through this model, students learn and apply the concepts of statistical mechanics, quantum mechanics, and classical thermodynamics in the analysis of the (i) constant volume heating, (ii)…

  3. A Preliminary Bayesian Analysis of Incomplete Longitudinal Data from a Small Sample: Methodological Advances in an International Comparative Study of Educational Inequality

    ERIC Educational Resources Information Center

    Hsieh, Chueh-An; Maier, Kimberly S.

    2009-01-01

    The capacity of Bayesian methods in estimating complex statistical models is undeniable. Bayesian data analysis is seen as having a range of advantages, such as an intuitive probabilistic interpretation of the parameters of interest, the efficient incorporation of prior information to empirical data analysis, model averaging and model selection.…

  4. Counting Penguins.

    ERIC Educational Resources Information Center

    Perry, Mike; Kader, Gary

    1998-01-01

    Presents an activity on the simplification of penguin counting by employing the basic ideas and principles of sampling to teach students to understand and recognize its role in statistical claims. Emphasizes estimation, data analysis and interpretation, and central limit theorem. Includes a list of items for classroom discussion. (ASK)

  5. Statistical ecology comes of age.

    PubMed

    Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-12-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.

  6. Statistical ecology comes of age

    PubMed Central

    Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-01-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nunes, Rafael C.; Abreu, Everton M.C.; Neto, Jorge Ananias

    Based on the relationship between thermodynamics and gravity we propose, with the aid of Verlinde's formalism, an alternative interpretation of the dynamical evolution of the Friedmann-Robertson-Walker Universe. This description takes into account the entropy and temperature intrinsic to the horizon of the universe due to the information holographically stored there, through the non-gaussian statistical theories proposed by Tsallis and Kaniadakis. The effect of these non-gaussian statistics in the cosmological context is to change the strength of the gravitational constant. In this paper, we consider the wCDM model modified by the non-gaussian statistics and investigate the compatibility of these non-gaussian modifications with the cosmological observations. In order to analyze to what extent the cosmological data constrain these non-extensive statistics, we use type Ia supernovae, baryon acoustic oscillations, the Hubble expansion rate function, and the linear growth of matter density perturbations data. We show that Tsallis' statistics is favored at the 1σ confidence level.

  8. Data Model Performance in Data Warehousing

    NASA Astrophysics Data System (ADS)

    Rorimpandey, G. C.; Sangkop, F. I.; Rantung, V. P.; Zwart, J. P.; Liando, O. E. S.; Mewengkang, A.

    2018-02-01

    Data warehouses have become increasingly important in organizations that have large amounts of data. A data warehouse is not a product but part of a solution for the decision support system in those organizations. The data model is the starting point for designing and developing data warehouse architectures; thus, the data model needs stable interfaces and must remain consistent over a long period of time. The aim of this research is to determine which data model in data warehousing has the best performance. The research method is descriptive analysis, which has three main tasks: data collection and organization, analysis of data, and interpretation of data. The results are assessed with a statistical analysis, which indicates that there is no statistically significant difference among the data models used in data warehousing. The organization can utilize any of the four proposed data models when designing and developing a data warehouse.

  9. Pitfalls and important issues in testing reliability using intraclass correlation coefficients in orthopaedic research.

    PubMed

    Lee, Kyoung Min; Lee, Jaebong; Chung, Chin Youb; Ahn, Soyeon; Sung, Ki Hyuk; Kim, Tae Won; Lee, Hui Jong; Park, Moon Seok

    2012-06-01

    Intra-class correlation coefficients (ICCs) provide a statistical means of testing reliability. However, their interpretation is not well documented in the orthopedic field. The purpose of this study was to investigate the use of ICCs in the orthopedic literature and to demonstrate pitfalls regarding their use. First, orthopedic articles that used ICCs were retrieved from the Pubmed database, and journal demography, ICC models, and concurrent statistics used were evaluated. Second, a reliability test was performed on three common physical examinations in cerebral palsy, namely, the Thomas test, the Staheli test, and popliteal angle measurement. Thirty patients were assessed by three orthopedic surgeons to explore the statistical methods for testing reliability. Third, the factors affecting the ICC values were examined by simulating data sets based on the physical examination data, in which the ranges, slopes, and interobserver variability were modified. Of the 92 orthopedic articles identified, 58 articles (63%) did not clarify the ICC model used, and only 5 articles (5%) described all models, types, and measures. In the reliability testing, although the popliteal angle showed a larger mean absolute difference than the Thomas test and the Staheli test, the ICC of the popliteal angle was higher, which was believed to be contrary to the context of measurement. In addition, the ICC values were affected by the model, type, and measures used. In the simulated data sets, the ICC showed higher values when the range of the data sets was larger, the slopes of the data sets were parallel, and the interobserver variability was smaller. Care should be taken when interpreting absolute ICC values, i.e., a higher ICC does not necessarily mean less variability, because ICC values can also be affected by various factors. The authors recommend that researchers clarify the ICC models used and that ICC values be interpreted in the context of measurement.
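
    For illustration only (hypothetical popliteal-angle measurements; the standard Shrout-Fleiss ICC(2,1) form is assumed, which may differ from the model a given study should report), the Python sketch below computes a two-way random-effects, absolute-agreement, single-measurement ICC from an n-subjects by k-raters matrix.

    # Illustrative only: ICC(2,1) from an n x k matrix of ratings.
    import numpy as np

    def icc_2_1(Y):
        Y = np.asarray(Y, dtype=float)
        n, k = Y.shape
        grand = Y.mean()
        row_means = Y.mean(axis=1)                       # per-subject means
        col_means = Y.mean(axis=0)                       # per-rater means
        ssr = k * np.sum((row_means - grand) ** 2)       # between-subjects SS
        ssc = n * np.sum((col_means - grand) ** 2)       # between-raters SS
        sse = np.sum((Y - grand) ** 2) - ssr - ssc       # residual SS
        msr = ssr / (n - 1)
        msc = ssc / (k - 1)
        mse = sse / ((n - 1) * (k - 1))
        return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

    # Hypothetical popliteal-angle measurements (degrees), 5 patients x 3 raters.
    angles = np.array([[40, 42, 45],
                       [55, 50, 52],
                       [35, 38, 36],
                       [60, 58, 61],
                       [48, 45, 47]])
    print(round(icc_2_1(angles), 3))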

  10. CAN'T MISS--conquer any number task by making important statistics simple. Part 1. Types of variables, mean, median, variance, and standard deviation.

    PubMed

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 1, presents basic information about data including a classification system that describes the four major types of variables: continuous quantitative variable, discrete quantitative variable, ordinal categorical variable (including the binomial variable), and nominal categorical variable. A histogram is a graph that displays the frequency distribution for a continuous variable. The article also demonstrates how to calculate the mean, median, standard deviation, and variance for a continuous variable.
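
    As a brief worked example in the spirit of this article (a hypothetical continuous variable, hospital length of stay in days), the Python snippet below computes the mean, median, sample variance, and standard deviation, and bins the values for a histogram.

    # Illustrative only: descriptive statistics for a continuous variable.
    import numpy as np

    length_of_stay = np.array([2.1, 3.4, 2.8, 5.9, 4.2, 3.1, 7.5, 2.4, 3.9, 4.8])

    print("mean     ", length_of_stay.mean())
    print("median   ", np.median(length_of_stay))
    print("variance ", length_of_stay.var(ddof=1))   # sample variance (divides by n - 1)
    print("std dev  ", length_of_stay.std(ddof=1))

    counts, edges = np.histogram(length_of_stay, bins=4)
    print("histogram bin edges", edges.round(2), "counts", counts)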

  11. Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques

    NASA Astrophysics Data System (ADS)

    Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein

    2017-10-01

    Groundwater samples from the Rapur area were collected from different sites to evaluate the major ion chemistry. The large volume of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness in classifying the samples and identifying the geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This resulted in two important clusters, viz. cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), whose constituents are released into the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of the water quality of a study area. From the PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of the similarity of their water quality.
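
    As an illustration of the HCA step only (synthetic measurements and an assumed variable list, not the Rapur data), the Python sketch below standardizes a samples-by-variables matrix and clusters the variables with Ward linkage, in the spirit of the variable groupings reported above.

    # Illustrative only: hierarchical clustering of water-quality variables.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    n_samples = 30
    variables = ["pH", "EC", "TDS", "Ca", "Mg", "Na", "Cl", "SO4", "HCO3"]
    X = rng.normal(size=(n_samples, len(variables)))
    X[:, 2] = X[:, 1] + 0.1 * rng.normal(size=n_samples)   # make TDS track EC

    Z = (X - X.mean(axis=0)) / X.std(axis=0)               # standardize each column
    link = linkage(Z.T, method="ward")                     # cluster the variables
    groups = fcluster(link, t=2, criterion="maxclust")     # cut into two groups
    for var, g in zip(variables, groups):
        print(var, "-> cluster", g)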

  12. The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research.

    PubMed

    Amrhein, Valentin; Korner-Nievergelt, Fränzi; Roth, Tobias

    2017-01-01

    The widespread use of 'statistical significance' as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into 'significant' and 'nonsignificant' contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values tell little about reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Also significance (p ≤ 0.05) is hardly replicable: at a good statistical power of 80%, two studies will be 'conflicting', meaning that one is significant and the other is not, in one third of the cases if there is a true effect. A replication can therefore not be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Also larger p-values offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis, falsely concluding that 'there is no effect'. Information on possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g., from a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should rather be more stringent, that sample sizes could decrease, or that p-values should better be completely abandoned. We conclude that whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment.
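
    The 'one third of the cases' figure follows from 2 × 0.8 × 0.2 = 0.32 when two independent studies each have 80% power. As a hedged illustration (assumed design: two-sample t-tests with effect size d = 0.5 and n = 64 per group, which gives roughly 80% power), the Python simulation below reproduces this rate.

    # Illustrative only: how often two replications of a true effect 'conflict'
    # (one significant at alpha = 0.05, the other not) at ~80% power.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    d, n, alpha, n_sim = 0.5, 64, 0.05, 10_000
    conflicting = 0
    for _ in range(n_sim):
        p = [stats.ttest_ind(rng.normal(d, 1, n), rng.normal(0, 1, n)).pvalue
             for _ in range(2)]
        conflicting += (p[0] <= alpha) != (p[1] <= alpha)
    print(conflicting / n_sim)   # close to one third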

  13. The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research

    PubMed Central

    Korner-Nievergelt, Fränzi; Roth, Tobias

    2017-01-01

    The widespread use of ‘statistical significance’ as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process (according to the American Statistical Association). We review why degrading p-values into ‘significant’ and ‘nonsignificant’ contributes to making studies irreproducible, or to making them seem irreproducible. A major problem is that we tend to take small p-values at face value, but mistrust results with larger p-values. In either case, p-values tell little about reliability of research, because they are hardly replicable even if an alternative hypothesis is true. Also significance (p ≤ 0.05) is hardly replicable: at a good statistical power of 80%, two studies will be ‘conflicting’, meaning that one is significant and the other is not, in one third of the cases if there is a true effect. A replication can therefore not be interpreted as having failed only because it is nonsignificant. Many apparent replication failures may thus reflect faulty judgment based on significance thresholds rather than a crisis of unreplicable research. Reliable conclusions on replicability and practical importance of a finding can only be drawn using cumulative evidence from multiple independent studies. However, applying significance thresholds makes cumulative knowledge unreliable. One reason is that with anything but ideal statistical power, significant effect sizes will be biased upwards. Interpreting inflated significant results while ignoring nonsignificant results will thus lead to wrong conclusions. But current incentives to hunt for significance lead to selective reporting and to publication bias against nonsignificant findings. Data dredging, p-hacking, and publication bias should be addressed by removing fixed significance thresholds. Consistent with the recommendations of the late Ronald Fisher, p-values should be interpreted as graded measures of the strength of evidence against the null hypothesis. Also larger p-values offer some evidence against the null hypothesis, and they cannot be interpreted as supporting the null hypothesis, falsely concluding that ‘there is no effect’. Information on possible true effect sizes that are compatible with the data must be obtained from the point estimate, e.g., from a sample average, and from the interval estimate, such as a confidence interval. We review how confusion about interpretation of larger p-values can be traced back to historical disputes among the founders of modern statistics. We further discuss potential arguments against removing significance thresholds, for example that decision rules should rather be more stringent, that sample sizes could decrease, or that p-values should better be completely abandoned. We conclude that whatever method of statistical inference we use, dichotomous threshold thinking must give way to non-automated informed judgment. PMID:28698825

  14. Land-Cover Change in the Central Irregular Plains, 1973-2000

    USGS Publications Warehouse

    Karstensen, Krista A.

    2009-01-01

    Spearheaded by the Geographic Analysis and Monitoring Program of the U.S. Geological Survey (USGS) in collaboration with the U.S. Environmental Protection Agency (EPA) and the National Aeronautics and Space Administration (NASA), Land Cover Trends is a research project focused on understanding the rates, trends, causes, and consequences of contemporary United States land-use and land-cover change. Using the EPA Level III ecoregions as the geographic framework, scientists process geospatial data collected between 1973 and 2000 to characterize ecosystem responses to land-use changes. The 27-year study period was divided into five temporal periods: 1973-1980, 1980-1986, 1986-1992, 1992-2000, and 1973-2000. General land-cover classes for these periods were interpreted from Landsat Multispectral Scanner, Thematic Mapper, and Enhanced Thematic Mapper Plus imagery, and land-cover change was categorized and evaluated using a modified Anderson Land Use Land Cover Classification System. The rates of land-cover change are estimated using a stratified, random sampling of 10-kilometer (km) by 10-km blocks allocated within each ecoregion. For each sample block, satellite images are used to interpret land-cover change. Additionally, historical aerial photographs from similar timeframes and other ancillary data such as census statistics and published literature are used. The sample block data are then incorporated into statistical analyses to generate an overall change matrix for the ecoregion. These change statistics are applicable for different levels of scale, including total change for the individual sample blocks and change estimates for the entire ecoregion. The results illustrate that there is no single profile of land-cover change but instead point to geographic variability that results from land uses within ecoregions continuously adapting to various environmental, technological, and socioeconomic factors.
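
    The block-sampling design described above lends itself to a simple stratified estimator: per-block change fractions are averaged within each ecoregion (stratum) and combined with weights proportional to the stratum sizes. The sketch below illustrates only that generic calculation with invented block values and stratum sizes; it is not the Land Cover Trends processing chain.

      # Minimal sketch of a stratified estimate of land-cover change from sampled
      # blocks. Block values and stratum sizes are hypothetical.
      import numpy as np

      # fraction of each sampled 10 km x 10 km block that changed, by ecoregion (stratum)
      blocks = {
          "ecoregion_A": np.array([0.02, 0.05, 0.01, 0.04]),
          "ecoregion_B": np.array([0.10, 0.07, 0.12]),
      }
      # total number of 10 km x 10 km blocks in each ecoregion (stratum weights)
      n_total = {"ecoregion_A": 400, "ecoregion_B": 250}

      w = {k: n_total[k] / sum(n_total.values()) for k in n_total}
      per_stratum = {k: v.mean() for k, v in blocks.items()}
      overall = sum(w[k] * per_stratum[k] for k in blocks)

      for k in blocks:
          se = blocks[k].std(ddof=1) / np.sqrt(len(blocks[k]))
          print(f"{k}: mean change {per_stratum[k]:.3f} (SE {se:.3f})")
      print(f"weighted overall change estimate: {overall:.3f}")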

  15. Which level of evidence does the US National Toxicology Program provide? Statistical considerations using the Technical Report 578 on Ginkgo biloba as an example.

    PubMed

    Gaus, Wilhelm

    2014-09-02

    The US National Toxicology Program (NTP) is assessed by a statistician. In the NTP program, groups of rodents are fed for a certain period of time with different doses of the substance that is being investigated. Then the animals are sacrificed and all organs are examined pathologically. Such an investigation facilitates many statistical tests. Technical Report TR 578 on Ginkgo biloba is used as an example. More than 4800 statistical tests are possible with the investigations performed. Based on a thought experiment, we expect >240 falsely significant tests. In actuality, 209 significant pathological findings were reported. The readers of Toxicology Letters should carefully distinguish between confirmative and explorative statistics. A confirmative interpretation of a significant test rejects the null-hypothesis and delivers "statistical proof". It is only allowed if (i) a precise hypothesis was established independently from the data used for the test and (ii) the computed p-values are adjusted for multiple testing if more than one test was performed. Otherwise an explorative interpretation generates a hypothesis. We conclude that NTP-reports - including TR 578 on Ginkgo biloba - deliver explorative statistics, i.e. they generate hypotheses, but do not prove them. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
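
    The expected count quoted above is simply the number of tests times the nominal level (4800 × 0.05 = 240). The sketch below shows that arithmetic together with a generic Holm-Bonferroni adjustment applied to simulated null p-values; it illustrates the multiplicity argument only and does not reproduce the NTP's actual decision process, which the critique reproduced later in this list argues is misrepresented here.

      # Expected chance findings when many tests are run at alpha = 0.05, plus the
      # effect of a simple multiplicity adjustment. Illustrative only.
      import numpy as np

      n_tests, alpha = 4800, 0.05
      print("expected false positives without adjustment:", n_tests * alpha)  # 240.0

      # Holm-Bonferroni on simulated p-values from pure noise: almost nothing survives.
      rng = np.random.default_rng(0)
      p = np.sort(rng.uniform(size=n_tests))
      thresholds = alpha / (n_tests - np.arange(n_tests))      # alpha/m, alpha/(m-1), ...
      n_rejected = int(np.argmax(p > thresholds)) if np.any(p > thresholds) else n_tests
      print("raw p < 0.05:", int((p < alpha).sum()))
      print("significant after Holm adjustment:", n_rejected)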

  16. An R package for analyzing and modeling ranking data

    PubMed Central

    2013-01-01

    Background In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty’s and Koczkodaj’s inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Results Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 is the most preferred item and item 3 is the least preferred item, and a significant difference was found between physicians’ preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference of the seven items (labeled as “internal/external”), and the second dimension can be interpreted as their overall variance (labeled as “push/pull factors”). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman’s footrule distance. Conclusions In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose the one most suitable to their specific situations. PMID:23672645
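
    Since pmr itself is an R package, the snippet below only sketches in Python what its most basic descriptive statistic, the mean rank, does: average the rank each item receives across respondents and sort. The rankings are invented; this is neither the pmr API nor the physician dataset from the paper.

      # Mean rank for a small, made-up set of rankings (1 = most preferred).
      import numpy as np

      items = ["competitive pressures", "increased savings", "government regulation",
               "improved efficiency", "improved quality care", "patient demand",
               "financial incentives"]
      # each row: one respondent's ranks for the 7 items (hypothetical data)
      rankings = np.array([
          [3, 5, 7, 1, 2, 6, 4],
          [2, 6, 7, 1, 3, 5, 4],
          [4, 5, 6, 2, 1, 7, 3],
          [3, 4, 7, 1, 2, 6, 5],
      ])
      mean_rank = rankings.mean(axis=0)
      for item, mr in sorted(zip(items, mean_rank), key=lambda t: t[1]):
          print(f"{item:25s} mean rank = {mr:.2f}")
      # lower mean rank = more preferred; in this toy data 'improved efficiency' comes out first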

  17. An R package for analyzing and modeling ranking data.

    PubMed

    Lee, Paul H; Yu, Philip L H

    2013-05-14

    In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty's and Koczkodaj's inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 is the most preferred item and item 3 is the least preferred item, and a significant difference was found between physicians' preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference of the seven items (labeled as "internal/external"), and the second dimension can be interpreted as their overall variance (labeled as "push/pull factors"). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman's footrule distance. In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose the one most suitable to their specific situations.

  18. The challenges for scientific publishing, 60 years on.

    PubMed

    Hausmann, Laura; Murphy, Sean P

    2016-10-01

    The most obvious difference in science publishing between 'then' and 'now' is the dramatic change in the communication of data and in their interpretation. The democratization of science via the Internet has brought not only benefits but also challenges to publishing, including fraudulent behavior and plagiarism, data and statistics reporting standards, authorship confirmation and other issues that affect authors, readers, and publishers in different ways. The wide accessibility of data on a global scale permits acquisition and meta-analysis to mine for novel synergies, and has created a highly commercialized and increasingly pressured environment. This article reviews these benefits and challenges, with the aim of providing readers with practical examples and hands-on guidelines. As we illustrate here, identifying unacceptable practices leads to changes in the standards for data reporting. This article is part of the 60th Anniversary special issue. © 2016 International Society for Neurochemistry.

  19. Proper interpretation of chronic toxicity studies and their statistics: A critique of "Which level of evidence does the US National Toxicology Program provide? Statistical considerations using the Technical Report 578 on Ginkgo biloba as an example".

    PubMed

    Kissling, Grace E; Haseman, Joseph K; Zeiger, Errol

    2015-09-02

    A recent article by Gaus (2014) demonstrates a serious misunderstanding of the NTP's statistical analysis and interpretation of rodent carcinogenicity data as reported in Technical Report 578 (Ginkgo biloba) (NTP, 2013), as well as a failure to acknowledge the abundant literature on false positive rates in rodent carcinogenicity studies. The NTP reported Ginkgo biloba extract to be carcinogenic in mice and rats. Gaus claims that, in this study, 4800 statistical comparisons were possible, and that 209 of them were statistically significant (p<0.05) compared with 240 (4800×0.05) expected by chance alone; thus, the carcinogenicity of Ginkgo biloba extract cannot be definitively established. However, his assumptions and calculations are flawed since he incorrectly assumes that the NTP uses no correction for multiple comparisons, and that significance tests for discrete data operate at exactly the nominal level. He also misrepresents the NTP's decision making process, overstates the number of statistical comparisons made, and ignores the fact that the mouse liver tumor effects were so striking (e.g., p<0.0000000000001) that it is virtually impossible that they could be false positive outcomes. Gaus' conclusion that such obvious responses merely "generate a hypothesis" rather than demonstrate a real carcinogenic effect has no scientific credibility. Moreover, his claims regarding the high frequency of false positive outcomes in carcinogenicity studies are misleading because of his methodological misconceptions and errors. Published by Elsevier Ireland Ltd.

  20. Proper interpretation of chronic toxicity studies and their statistics: A critique of “Which level of evidence does the US National Toxicology Program provide? Statistical considerations using the Technical Report 578 on Ginkgo biloba as an example”

    PubMed Central

    Kissling, Grace E.; Haseman, Joseph K.; Zeiger, Errol

    2014-01-01

    A recent article by Gaus (2014) demonstrates a serious misunderstanding of the NTP’s statistical analysis and interpretation of rodent carcinogenicity data as reported in Technical Report 578 (Ginkgo biloba) (NTP 2013), as well as a failure to acknowledge the abundant literature on false positive rates in rodent carcinogenicity studies. The NTP reported Ginkgo biloba extract to be carcinogenic in mice and rats. Gaus claims that, in this study, 4800 statistical comparisons were possible, and that 209 of them were statistically significant (p<0.05) compared with 240 (4800 × 0.05) expected by chance alone; thus, the carcinogenicity of Ginkgo biloba extract cannot be definitively established. However, his assumptions and calculations are flawed since he incorrectly assumes that the NTP uses no correction for multiple comparisons, and that significance tests for discrete data operate at exactly the nominal level. He also misrepresents the NTP’s decision making process, overstates the number of statistical comparisons made, and ignores the fact that the mouse liver tumor effects were so striking (e.g., p<0.0000000000001) that it is virtually impossible that they could be false positive outcomes. Gaus’ conclusion that such obvious responses merely “generate a hypothesis” rather than demonstrate a real carcinogenic effect has no scientific credibility. Moreover, his claims regarding the high frequency of false positive outcomes in carcinogenicity studies are misleading because of his methodological misconceptions and errors. PMID:25261588

  1. Evaluation of a statistics-based Ames mutagenicity QSAR model and interpretation of the results obtained.

    PubMed

    Barber, Chris; Cayley, Alex; Hanser, Thierry; Harding, Alex; Heghes, Crina; Vessey, Jonathan D; Werner, Stephane; Weiner, Sandy K; Wichard, Joerg; Giddings, Amanda; Glowienke, Susanne; Parenty, Alexis; Brigo, Alessandro; Spirkl, Hans-Peter; Amberg, Alexander; Kemper, Ray; Greene, Nigel

    2016-04-01

    The relative wealth of bacterial mutagenicity data available in the public literature means that in silico quantitative/qualitative structure activity relationship (QSAR) systems can readily be built for this endpoint. A good means of evaluating the performance of such systems is to use private unpublished data sets, which generally represent a more distinct chemical space than publicly available test sets and, as a result, provide a greater challenge to the model. However, raw performance metrics should not be the only factor considered when judging this type of software since expert interpretation of the results obtained may allow for further improvements in predictivity. Enough information should be provided by a QSAR to allow the user to make general, scientifically-based arguments in order to assess and overrule predictions when necessary. With all this in mind, we sought to validate the performance of the statistics-based in vitro bacterial mutagenicity prediction system Sarah Nexus (version 1.1) against private test data sets supplied by nine different pharmaceutical companies. The results of these evaluations were then analysed in order to identify findings presented by the model which would be useful for the user to take into consideration when interpreting the results and making their final decision about the mutagenic potential of a given compound. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Truth, models, model sets, AIC, and multimodel inference: a Bayesian perspective

    USGS Publications Warehouse

    Barker, Richard J.; Link, William A.

    2015-01-01

    Statistical inference begins with viewing data as realizations of stochastic processes. Mathematical models provide partial descriptions of these processes; inference is the process of using the data to obtain a more complete description of the stochastic processes. Wildlife and ecological scientists have become increasingly concerned with the conditional nature of model-based inference: what if the model is wrong? Over the last 2 decades, Akaike's Information Criterion (AIC) has been widely and increasingly used in wildlife statistics for 2 related purposes, first for model choice and second to quantify model uncertainty. We argue that for the second of these purposes, the Bayesian paradigm provides the natural framework for describing uncertainty associated with model choice and provides the most easily communicated basis for model weighting. Moreover, Bayesian arguments provide the sole justification for interpreting model weights (including AIC weights) as coherent (mathematically self consistent) model probabilities. This interpretation requires treating the model as an exact description of the data-generating mechanism. We discuss the implications of this assumption, and conclude that more emphasis is needed on model checking to provide confidence in the quality of inference.
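
    The model weights discussed above are usually computed as Akaike weights, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2) with Δ_i = AIC_i - min AIC. Below is a minimal sketch of that calculation with hypothetical AIC values; as the authors emphasize, reading the weights as model probabilities carries the assumption that the model set contains an exact description of the data-generating mechanism.

      # Standard Akaike weights from a set of AIC values (hypothetical numbers).
      # w_i = exp(-0.5 * delta_i) / sum_j exp(-0.5 * delta_j), delta_i = AIC_i - min AIC.
      import numpy as np

      aic = np.array([102.3, 104.1, 110.7])      # hypothetical AIC values for 3 models
      delta = aic - aic.min()
      w = np.exp(-0.5 * delta)
      w /= w.sum()
      print(dict(zip(["model_1", "model_2", "model_3"], np.round(w, 3))))
      # Treating these weights as coherent model probabilities is, as argued above,
      # a Bayesian move that assumes one of the candidate models is exactly true.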

  3. RepExplore: addressing technical replicate variance in proteomics and metabolomics data analysis.

    PubMed

    Glaab, Enrico; Schneider, Reinhard

    2015-07-01

    High-throughput omics datasets often contain technical replicates included to account for technical sources of noise in the measurement process. Although summarizing these replicate measurements by using robust averages may help to reduce the influence of noise on downstream data analysis, the information on the variance across the replicate measurements is lost in the averaging process and therefore typically disregarded in subsequent statistical analyses. We introduce RepExplore, a web-service dedicated to exploit the information captured in the technical replicate variance to provide more reliable and informative differential expression and abundance statistics for omics datasets. The software builds on previously published statistical methods, which have been applied successfully to biomedical omics data but are difficult to use without prior experience in programming or scripting. RepExplore facilitates the analysis by providing fully automated data processing and interactive ranking tables, whisker plot, heat map and principal component analysis visualizations to interpret omics data and derived statistics. RepExplore is freely available at http://www.repexplore.tk. Contact: enrico.glaab@uni.lu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  4. Statistical analysis of solid waste composition data: Arithmetic mean, standard deviation and correlation coefficients.

    PubMed

    Edjabou, Maklawe Essonanawe; Martín-Fernández, Josep Antoni; Scheutz, Charlotte; Astrup, Thomas Fruergaard

    2017-11-01

    Data for fractional solid waste composition provide relative magnitudes of individual waste fractions, the percentages of which always sum to 100, thereby connecting them intrinsically. Due to this sum constraint, waste composition data represent closed data, and their interpretation and analysis require statistical methods, other than classical statistics that are suitable only for non-constrained data such as absolute values. However, the closed characteristics of waste composition data are often ignored when analysed. The results of this study showed, for example, that unavoidable animal-derived food waste amounted to 2.21±3.12% with a confidence interval of (-4.03; 8.45), which highlights the problem of the biased negative proportions. A Pearson's correlation test, applied to waste fraction generation (kg mass), indicated a positive correlation between avoidable vegetable food waste and plastic packaging. However, correlation tests applied to waste fraction compositions (percentage values) showed a negative association in this regard, thus demonstrating that statistical analyses applied to compositional waste fraction data, without addressing the closed characteristics of these data, have the potential to generate spurious or misleading results. Therefore, compositional data should be transformed adequately prior to any statistical analysis, such as computing mean, standard deviation and correlation coefficients. Copyright © 2017 Elsevier Ltd. All rights reserved.
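
    A standard way to meet the transformation requirement stated above is a log-ratio transform such as the centered log-ratio (clr), clr(x) = ln x - mean(ln x) per sample, applied before computing correlations. The sketch below uses invented three-part compositions; it is not the study's dataset or pipeline, and an isometric log-ratio (ilr) representation would be needed for a full-rank covariance matrix.

      # Centered log-ratio (clr) transform before correlating compositional data.
      # Hypothetical waste-fraction percentages; not the study's data.
      import numpy as np

      comp = np.array([            # rows: samples; columns: waste fractions (sum to 100)
          [30.0, 45.0, 25.0],
          [28.0, 50.0, 22.0],
          [35.0, 40.0, 25.0],
          [25.0, 55.0, 20.0],
      ])
      comp = comp / comp.sum(axis=1, keepdims=True)          # close to proportions
      clr = np.log(comp) - np.log(comp).mean(axis=1, keepdims=True)

      print("correlation on raw percentages:\n", np.corrcoef(comp, rowvar=False).round(2))
      print("correlation after clr transform:\n", np.corrcoef(clr, rowvar=False).round(2))
      # Raw percentages are forced toward negative correlation by the 100% constraint;
      # log-ratio coordinates (clr here, or ilr for a full-rank representation) are the
      # usual remedy before computing means, SDs and correlations.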

  5. Quantifying Treatment Benefit in Molecular Subgroups to Assess a Predictive Biomarker

    PubMed Central

    Iasonos, Alexia; Chapman, Paul B.; Satagopan, Jaya M.

    2016-01-01

    There is an increased interest in finding predictive biomarkers that can guide treatment options for both mutation carriers and non-carriers. The statistical assessment of variation in treatment benefit (TB) according to the biomarker carrier status plays an important role in evaluating predictive biomarkers. For time to event endpoints, the hazard ratio (HR) for interaction between treatment and a biomarker from a Proportional Hazards regression model is commonly used as a measure of variation in treatment benefit. While this can be easily obtained using available statistical software packages, the interpretation of HR is not straightforward. In this article, we propose different summary measures of variation in TB on the scale of survival probabilities for evaluating a predictive biomarker. The proposed summary measures can be easily interpreted as quantifying differential in TB in terms of relative risk or excess absolute risk due to treatment in carriers versus non-carriers. We illustrate the use and interpretation of the proposed measures using data from completed clinical trials. We encourage clinical practitioners to interpret variation in TB in terms of measures based on survival probabilities, particularly in terms of excess absolute risk, as opposed to HR. PMID:27141007

  6. Unrealistic comparative optimism: An unsuccessful search for evidence of a genuinely motivational bias

    PubMed Central

    Harris, Adam J. L.; de Molière, Laura; Soh, Melinda; Hahn, Ulrike

    2017-01-01

    One of the most accepted findings across psychology is that people are unrealistically optimistic in their judgments of comparative risk concerning future life events—they judge negative events as less likely to happen to themselves than to the average person. Harris and Hahn (2011), however, demonstrated how unbiased (non-optimistic) responses can result in data patterns commonly interpreted as indicative of optimism due to statistical artifacts. In the current paper, we report the results of 5 studies that control for these statistical confounds and observe no evidence for residual unrealistic optimism, even observing a ‘severity effect’ whereby severe outcomes were overestimated relative to neutral ones (Studies 3 & 4). We conclude that there is no evidence supporting an optimism interpretation of previous results using the prevalent comparison method. PMID:28278200

  7. Unrealistic comparative optimism: An unsuccessful search for evidence of a genuinely motivational bias.

    PubMed

    Harris, Adam J L; de Molière, Laura; Soh, Melinda; Hahn, Ulrike

    2017-01-01

    One of the most accepted findings across psychology is that people are unrealistically optimistic in their judgments of comparative risk concerning future life events-they judge negative events as less likely to happen to themselves than to the average person. Harris and Hahn (2011), however, demonstrated how unbiased (non-optimistic) responses can result in data patterns commonly interpreted as indicative of optimism due to statistical artifacts. In the current paper, we report the results of 5 studies that control for these statistical confounds and observe no evidence for residual unrealistic optimism, even observing a 'severity effect' whereby severe outcomes were overestimated relative to neutral ones (Studies 3 & 4). We conclude that there is no evidence supporting an optimism interpretation of previous results using the prevalent comparison method.

  8. The challenging use and interpretation of circulating biomarkers of exposure to persistent organic pollutants in environmental health: Comparison of lipid adjustment approaches in a case study related to endometriosis.

    PubMed

    Cano-Sancho, German; Labrune, Léa; Ploteau, Stéphane; Marchand, Philippe; Le Bizec, Bruno; Antignac, Jean-Philippe

    2018-06-01

    The gold-standard matrix for measuring the internal levels of persistent organic pollutants (POPs) is adipose tissue; however, in epidemiological studies the use of serum is preferred due to its lower cost and higher accessibility. The interpretation of serum biomarkers is tightly related to the understanding of the underlying causal structure relating the POPs, serum lipids and the disease. Considering the extended benefits of using serum biomarkers, we aimed to further examine whether, through statistical modelling, we would be able to improve the use and interpretation of serum biomarkers in the study of endometriosis. Hence, we have conducted a systematic comparison of statistical approaches commonly used to lipid-adjust the circulating biomarkers of POPs based on existing methods, using data from a pilot case-control study focused on severe deep infiltrating endometriosis. The odds ratios (ORs) obtained from unconditional regression for those models with serum biomarkers were further compared to those obtained from adipose tissue. The results of this exploratory study did not support the use of blood biomarkers as proxy estimates of POPs in adipose tissue to implement in risk models for endometriosis with the available statistical approaches to correct for lipids. The current statistical approaches commonly used to lipid-adjust circulating POPs do not fully represent the underlying biological complexity between POPs, lipids and disease (especially those directly or indirectly affecting or affected by lipid metabolism). Hence, further investigations are warranted to improve the use and interpretation of blood biomarkers under complex scenarios of lipid dynamics. Copyright © 2018 Elsevier Ltd. All rights reserved.

  9. Statistical significance versus clinical relevance.

    PubMed

    van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G

    2017-04-01

    In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥ 0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the time upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
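
    As a concrete illustration of the point about confidence intervals, the sketch below computes an effect estimate, its 95% CI and the p-value for a simulated two-group comparison; all numbers are invented and are not taken from the studies discussed in the paper.

      # A p-value alone versus an effect estimate with a 95% confidence interval,
      # on simulated data (hypothetical numbers).
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)
      treated = rng.normal(1.8, 4.0, 120)   # e.g. change in some clinical measure
      control = rng.normal(0.0, 4.0, 120)

      diff = treated.mean() - control.mean()
      se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
      dof = len(treated) + len(control) - 2      # equal group sizes: pooled and unpooled SEs coincide
      t_crit = stats.t.ppf(0.975, dof)
      p = stats.ttest_ind(treated, control).pvalue

      print(f"difference = {diff:.2f}, 95% CI ({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f}), p = {p:.3f}")
      # The CI conveys the magnitude and precision of the effect; the p-value alone does not.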

  10. Medical cost analysis: application to colorectal cancer data from the SEER Medicare database.

    PubMed

    Bang, Heejung

    2005-10-01

    Incompleteness is a key feature of most survival data. Numerous well established statistical methodologies and algorithms exist for analyzing life or failure time data. However, induced censorship invalidates the use of those standard analytic tools for some survival-type data such as medical costs. In this paper, some valid methods currently available for analyzing censored medical cost data are reviewed. Some cautionary findings under different assumptions are envisioned through application to medical costs from colorectal cancer patients. Cost analysis should be suitably planned and carefully interpreted under various meaningful scenarios even with judiciously selected statistical methods. This approach would be greatly helpful to policy makers who seek to prioritize health care expenditures and to assess the elements of resource use.

  11. Maximum entropy models of ecosystem functioning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bertram, Jason, E-mail: jason.bertram@anu.edu.au

    2014-12-05

    Using organism-level traits to deduce community-level relationships is a fundamental problem in theoretical ecology. This problem parallels the physical one of using particle properties to deduce macroscopic thermodynamic laws, which was successfully achieved with the development of statistical physics. Drawing on this parallel, theoretical ecologists from Lotka onwards have attempted to construct statistical mechanistic theories of ecosystem functioning. Jaynes’ broader interpretation of statistical mechanics, which hinges on the entropy maximisation algorithm (MaxEnt), is of central importance here because the classical foundations of statistical physics do not have clear ecological analogues (e.g. phase space, dynamical invariants). However, models based on the information theoretic interpretation of MaxEnt are difficult to interpret ecologically. Here I give a broad discussion of statistical mechanical models of ecosystem functioning and the application of MaxEnt in these models. Emphasising the sample frequency interpretation of MaxEnt, I show that MaxEnt can be used to construct models of ecosystem functioning which are statistical mechanical in the traditional sense using a savanna plant ecology model as an example.
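
    The entropy maximisation algorithm referred to above has a simple generic form: over discrete states with a mean constraint, the MaxEnt solution is exponential, p_i ∝ exp(-λ x_i), with λ chosen so the constraint holds. The sketch below solves that toy case numerically; the states and target mean are arbitrary and this is not the savanna ecology model from the paper.

      # Maximum entropy distribution over discrete states subject to a mean constraint.
      # The solution has the form p_i ∝ exp(-lam * x_i); we solve for lam numerically.
      import numpy as np
      from scipy.optimize import brentq

      x = np.arange(1, 7)          # states (e.g. abundance classes); toy values
      target_mean = 2.5            # constrained average

      def mean_given(lam):
          w = np.exp(-lam * x)
          p = w / w.sum()
          return p @ x

      lam = brentq(lambda l: mean_given(l) - target_mean, -10, 10)
      p = np.exp(-lam * x); p /= p.sum()
      print("lambda =", round(lam, 4))
      print("MaxEnt probabilities:", p.round(4), "mean =", round(p @ x, 4))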

  12. A decision support system and rule-based algorithm to augment the human interpretation of the 12-lead electrocardiogram.

    PubMed

    Cairns, Andrew W; Bond, Raymond R; Finlay, Dewar D; Guldenring, Daniel; Badilini, Fabio; Libretti, Guido; Peace, Aaron J; Leslie, Stephen J

    The 12-lead Electrocardiogram (ECG) has been used to detect cardiac abnormalities in the same format for more than 70 years. However, due to the complex nature of 12-lead ECG interpretation, there is a significant cognitive workload required from the interpreter. This complexity in ECG interpretation often leads to errors in diagnosis and subsequent treatment. We have previously reported on the development of an ECG interpretation support system designed to augment the human interpretation process. This computerised decision support system has been named 'Interactive Progressive based Interpretation' (IPI). In this study, a decision support algorithm was built into the IPI system to suggest potential diagnoses based on the interpreter's annotations of the 12-lead ECG. We hypothesise that semi-automatic interpretation using a digital assistant can be an optimal man-machine model for ECG interpretation, improving interpretation accuracy and reducing missed co-abnormalities. The Differential Diagnoses Algorithm (DDA) was developed using web technologies where diagnostic ECG criteria are defined in an open storage format, JavaScript Object Notation (JSON), which is queried using a rule-based reasoning algorithm to suggest diagnoses. To test our hypothesis, a counterbalanced trial was designed where subjects interpreted ECGs using the conventional approach and using the IPI+DDA approach. A total of 375 interpretations were collected. The IPI+DDA approach was shown to improve diagnostic accuracy by 8.7% (although not statistically significant, p-value = 0.1852), and the IPI+DDA suggested the correct interpretation more often than the human interpreter in 7/10 cases (varying statistical significance). Human interpretation accuracy increased to 70% when seven suggestions were generated. Although results were not found to be statistically significant, we found: 1) our decision support tool increased the number of correct interpretations, 2) the DDA algorithm suggested the correct interpretation more often than humans, and 3) as many as 7 computerised diagnostic suggestions augmented human decision making in ECG interpretation. Statistical significance may be achieved by expanding sample size. Copyright © 2017 Elsevier Inc. All rights reserved.
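
    The abstract describes diagnostic criteria stored as JSON and queried by a rule-based reasoner. The sketch below shows only that general pattern; the criteria, annotation strings and matching rule are invented for illustration and are not the IPI/DDA implementation.

      # Toy rule-based matcher in the spirit of the DDA: diagnostic criteria held as
      # JSON are compared with the interpreter's annotations. All names are invented.
      import json

      criteria_json = """
      {
        "Anterior STEMI": ["ST elevation V2", "ST elevation V3"],
        "LBBB":           ["QRS > 120 ms", "broad notched R in V6"],
        "First-degree AV block": ["PR > 200 ms"]
      }
      """
      criteria = json.loads(criteria_json)

      def suggest(annotations, min_fraction=0.5):
          """Return diagnoses whose criteria are at least min_fraction satisfied."""
          hits = []
          for dx, required in criteria.items():
              matched = sum(1 for c in required if c in annotations)
              if matched / len(required) >= min_fraction:
                  hits.append((dx, matched, len(required)))
          return sorted(hits, key=lambda h: -h[1] / h[2])

      annotations = {"ST elevation V2", "ST elevation V3", "PR > 200 ms"}
      for dx, got, need in suggest(annotations):
          print(f"suggest {dx}: {got}/{need} criteria annotated")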

  13. Use of keyword hierarchies to interpret gene expression patterns.

    PubMed

    Masys, D R; Welsh, J B; Lynn Fink, J; Gribskov, M; Klacansky, I; Corbeil, J

    2001-04-01

    High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results. We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.

  14. Data User's Note: Apollo seismological investigations

    NASA Technical Reports Server (NTRS)

    Vostreys, R. W.

    1980-01-01

    Seismological objectives and equipment used in the passive seismic, active seismic, lunar seismic profiling, and the lunar gravimeter experiments conducted during Apollo 11, 12, 14, 15, 16, and 17 missions are described. The various formats in which the data from these investigations can be obtained are listed, and an index showing the NSSDC identification numbers is provided. Tables show manned lunar landing missions, lunar seismic network statistics, lunar impact coordinate statistics, detonation masses and times of EP's, the ALSEP (Apollo 14) operational history, compressed scale playout tape availability, LSPE coverage for one lunation, and experimenter-interpreted event types.

  15. Antecedents of obesity - analysis, interpretation, and use of longitudinal data.

    PubMed

    Gillman, Matthew W; Kleinman, Ken

    2007-07-01

    The obesity epidemic causes misery and death. Most epidemiologists accept the hypothesis that characteristics of the early stages of human development have lifelong influences on obesity-related health outcomes. Unfortunately, there is a dearth of data of sufficient scope and individual history to help unravel the associations of prenatal, postnatal, and childhood factors with adult obesity and health outcomes. Here the authors discuss analytic methods, the interpretation of models, and the use to which such rare and valuable data may be put in developing interventions to combat the epidemic. For example, analytic methods such as quantile and multinomial logistic regression can describe the effects on body mass index range rather than just its mean; structural equation models may allow comparison of the contributions of different factors at different periods in the life course. Interpretation of the data and model construction is complex, and it requires careful consideration of the biologic plausibility and statistical interpretation of putative causal factors. The goals of discovering modifiable determinants of obesity during the prenatal, postnatal, and childhood periods must be kept in sight, and analyses should be built to facilitate them. Ultimately, interventions in these factors may help prevent obesity-related adverse health outcomes for future generations.

  16. Physiological time-series analysis: what does regularity quantify?

    NASA Technical Reports Server (NTRS)

    Pincus, S. M.; Goldberger, A. L.

    1994-01-01

    Approximate entropy (ApEn) is a recently developed statistic quantifying regularity and complexity that appears to have potential application to a wide variety of physiological and clinical time-series data. The focus here is to provide a better understanding of ApEn to facilitate its proper utilization, application, and interpretation. After giving the formal mathematical description of ApEn, we provide a multistep description of the algorithm as applied to two contrasting clinical heart rate data sets. We discuss algorithm implementation and interpretation and introduce a general mathematical hypothesis of the dynamics of a wide class of diseases, indicating the utility of ApEn to test this hypothesis. We indicate the relationship of ApEn to variability measures, the Fourier spectrum, and algorithms motivated by study of chaotic dynamics. We discuss further mathematical properties of ApEn, including the choice of input parameters, statistical issues, and modeling considerations, and we conclude with a section on caveats to ensure correct ApEn utilization.
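
    For readers who want to see the statistic concretely, below is a compact sketch of the standard ApEn(m, r, N) formulation, Φ_m(r) - Φ_{m+1}(r) with self-matches included and the common tolerance r ≈ 0.2·SD. It is a generic illustration, not the authors' code, and the test series are arbitrary.

      # Approximate entropy (ApEn) in the standard formulation. A compact sketch.
      import numpy as np

      def approximate_entropy(x, m=2, r=None):
          x = np.asarray(x, dtype=float)
          n = len(x)
          if r is None:
              r = 0.2 * x.std()                     # common default tolerance

          def phi(m):
              templates = np.array([x[i:i + m] for i in range(n - m + 1)])
              # fraction of templates within tolerance r (Chebyshev distance) of each template
              counts = [
                  np.mean(np.max(np.abs(templates - t), axis=1) <= r) for t in templates
              ]
              return np.mean(np.log(counts))

          return phi(m) - phi(m + 1)

      rng = np.random.default_rng(0)
      regular = np.sin(np.linspace(0, 20 * np.pi, 500))
      irregular = rng.normal(size=500)
      print("ApEn, regular series:  ", round(approximate_entropy(regular), 3))
      print("ApEn, irregular series:", round(approximate_entropy(irregular), 3))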

  17. Understanding sexual orientation and health in Canada: Who are we capturing and who are we missing using the Statistics Canada sexual orientation question?

    PubMed

    Dharma, Christoffer; Bauer, Greta R

    2017-04-20

    Public health research on inequalities in Canada depends heavily on population data sets such as the Canadian Community Health Survey. While sexual orientation has three dimensions - identity, behaviour and attraction - Statistics Canada and public health agencies assess sexual orientation with a single questionnaire item on identity, defined behaviourally. This study aims to evaluate this item, to allow for clearer interpretation of sexual orientation frequencies and inequalities. Through an online convenience sample of Canadians ≥14 years of age, participants (n = 311) completed the Statistics Canada question and a second set of sexual orientation questions. The single-item question had an 85.8% sensitivity in capturing sexual minorities, broadly defined by their sexual identity, lifetime behaviour and attraction. The kappa statistic for agreement between the single item and sexual identity was 0.89; kappas for agreement with past-year behaviour, lifetime behaviour and attraction were 0.39, 0.48 and 0.57, respectively. The item captured 99.3% of those with a sexual minority identity, 84.2% of those with any lifetime same-sex partners, 98.4% with a past-year same-sex partner, and 97.8% who indicated at least equal attraction to same-sex persons. Findings from Statistics Canada surveys can be best interpreted as applying to those who identify as sexual minorities. Analyses using this measure will underidentify those with same-sex partners or attractions who do not identify as a sexual minority, and should be interpreted accordingly. To understand patterns of sexual minority health in Canada, there is a need to incorporate other dimensions of sexual orientation.
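
    The sensitivity and kappa figures above come from cross-classifying the single item against broader definitions; the sketch below shows how both statistics are computed from a 2×2 table, using invented counts rather than the study's data.

      # Sensitivity and Cohen's kappa from a 2x2 cross-classification of the
      # single-item measure against a broader definition. Counts are hypothetical.
      import numpy as np

      #                 broader definition: minority   non-minority
      table = np.array([[ 60,   4],    # single item: minority
                        [ 10, 237]])   # single item: non-minority

      sensitivity = table[0, 0] / table[:, 0].sum()
      n = table.sum()
      po = np.trace(table) / n                                  # observed agreement
      pe = (table.sum(1) @ table.sum(0)) / n**2                 # chance agreement
      kappa = (po - pe) / (1 - pe)
      print(f"sensitivity = {sensitivity:.3f}, kappa = {kappa:.2f}")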

  18. A guide to missing data for the pediatric nephrologist.

    PubMed

    Larkins, Nicholas G; Craig, Jonathan C; Teixeira-Pinto, Armando

    2018-03-13

    Missing data is an important and common source of bias in clinical research. Readers should be alert to and consider the impact of missing data when reading studies. Beyond preventing missing data in the first place, through good study design and conduct, there are different strategies available to handle data containing missing observations. Complete case analysis is often biased unless data are missing completely at random. Better methods of handling missing data include multiple imputation and models using likelihood-based estimation. With advancing computing power and modern statistical software, these methods are within the reach of clinician-researchers under guidance of a biostatistician. As clinicians reading papers, we need to continue to update our understanding of statistical methods, so that we understand the limitations of these techniques and can critically interpret literature.

  19. Characterizing multivariate decoding models based on correlated EEG spectral features

    PubMed Central

    McFarland, Dennis J.

    2013-01-01

    Objective Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Methods Data from sensorimotor rhythm-based cursor control experiments was analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order which produced predictors that varied in their degree of correlation (i.e., multicollinearity). Results The use of multivariate regression models resulted in much better prediction of target position as compared to univariate regression models. However, with lower order AR features interpretation of the spectral patterns of the weights was difficult. This is likely to be due to the high degree of multicollinearity present with lower order AR features. Conclusions Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. Significance While multivariate decoding algorithms are very useful for prediction their utility for interpretation may be limited when predictors are correlated. PMID:23466267

  20. Regulatory considerations in the design of comparative observational studies using propensity scores.

    PubMed

    Yue, Lilly Q

    2012-01-01

    In the evaluation of medical products, including drugs, biological products, and medical devices, comparative observational studies could play an important role when properly conducted randomized, well-controlled clinical trials are infeasible due to ethical or practical reasons. However, various biases could be introduced at every stage and into every aspect of the observational study, and consequently the interpretation of the resulting statistical inference would be of concern. While there do exist statistical techniques for addressing some of the challenging issues, often based on propensity score methodology, these statistical tools probably have not been as widely employed in prospectively designing observational studies as they should be. There are also times when they are implemented in an unscientific manner, such as selecting the propensity score model using a dataset that already contains the outcome data, so that the integrity of the observational study design and the interpretability of outcome analysis results could be compromised. In this paper, regulatory considerations on prospective study design using propensity scores are shared and illustrated with hypothetical examples.

  1. A flexible, interpretable framework for assessing sensitivity to unmeasured confounding.

    PubMed

    Dorie, Vincent; Harada, Masataka; Carnegie, Nicole Bohme; Hill, Jennifer

    2016-09-10

    When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates Bayesian Additive Regression Trees into a two-parameter sensitivity analysis strategy that assesses sensitivity of posterior distributions of treatment effects to choices of sensitivity parameters. This results in an easily interpretable framework for testing for the impact of an unmeasured confounder that also limits the number of modeling assumptions. We evaluate our approach in a large-scale simulation setting and with high blood pressure data taken from the Third National Health and Nutrition Examination Survey. The model is implemented as open-source software, integrated into the treatSens package for the R statistical programming language. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  2. Application of multivariate statistical techniques in microbial ecology

    PubMed Central

    Paliy, O.; Shankar, V.

    2016-01-01

    Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large scale ecological datasets. An especially noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amounts of data, powerful statistical techniques of multivariate analysis are well suited to analyze and interpret these datasets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular dataset. In this review we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and dataset structure. PMID:26786791

  3. Results of the Verification of the Statistical Distribution Model of Microseismicity Emission Characteristics

    NASA Astrophysics Data System (ADS)

    Cianciara, Aleksander

    2016-09-01

    The paper presents the results of research aimed at verifying the hypothesis that the Weibull distribution is an appropriate statistical distribution model of microseismicity emission characteristics, namely: energy of phenomena and inter-event time. It is understood that the emission under consideration is induced by the natural rock mass fracturing. Because the recorded emission contains noise, it is subjected to appropriate filtering. The study has been conducted using statistical verification of the null hypothesis that the Weibull distribution fits the empirical cumulative distribution function. As the model describing the cumulative distribution function is given in an analytical form, its verification may be performed using the Kolmogorov-Smirnov goodness-of-fit test. Interpretations by means of probabilistic methods require specifying the correct model describing the statistical distribution of the data, because in these methods the measurement data are not used directly but through their statistical distributions, e.g., in the method based on hazard analysis or in the one that uses maximum value statistics.
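
    A minimal version of the verification step described above, fitting a Weibull model and applying the one-sample Kolmogorov-Smirnov test, is sketched below with simulated inter-event times; the parameters are arbitrary and the data are not the mining catalogue used in the paper.

      # Fit a Weibull distribution and check goodness of fit with a one-sample
      # Kolmogorov-Smirnov test. Simulated inter-event times only.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(3)
      inter_event = stats.weibull_min.rvs(c=0.9, scale=120.0, size=400, random_state=rng)

      shape, loc, scale = stats.weibull_min.fit(inter_event, floc=0)   # fix location at 0
      ks = stats.kstest(inter_event, "weibull_min", args=(shape, loc, scale))
      print(f"shape = {shape:.3f}, scale = {scale:.1f}")
      print(f"KS statistic = {ks.statistic:.3f}, p-value = {ks.pvalue:.3f}")
      # Caveat: parameters estimated from the same data bias the plain KS p-value
      # upward; a parametric bootstrap (Lilliefors-style) correction is more honest.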

  4. Tools to Support Interpreting Multiple Regression in the Face of Multicollinearity

    PubMed Central

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses. PMID:22457655
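
    Two of the indices listed above, beta weights and structure coefficients, are easy to compute directly; the sketch below does so for simulated data with correlated predictors and is not tied to any particular statistical package named in the article.

      # Beta weights (standardized coefficients) and structure coefficients
      # (correlations of each predictor with the predicted scores). Simulated data.
      import numpy as np

      rng = np.random.default_rng(42)
      n = 300
      x1 = rng.normal(size=n)
      x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)          # x2 correlated with x1
      y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

      Z = lambda a: (a - a.mean()) / a.std(ddof=1)
      X = np.column_stack([Z(x1), Z(x2)])
      yz = Z(y)

      beta, *_ = np.linalg.lstsq(X, yz, rcond=None)      # standardized (beta) weights
      yhat = X @ beta
      structure = [np.corrcoef(X[:, j], yhat)[0, 1] for j in range(X.shape[1])]

      print("beta weights:          ", np.round(beta, 3))
      print("structure coefficients:", np.round(structure, 3))
      # With multicollinearity a predictor can carry a small beta weight yet a large
      # structure coefficient; reporting both helps avoid misreading its role.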

  5. Tools to support interpreting multiple regression in the face of multicollinearity.

    PubMed

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.

  6. Statistical techniques applied to aerial radiometric surveys (STAARS): principal components analysis user's manual. [NURE program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koch, C.D.; Pirkle, F.L.; Schmidt, J.S.

    1981-01-01

    A Principal Components Analysis (PCA) has been written to aid in the interpretation of multivariate aerial radiometric data collected by the US Department of Energy (DOE) under the National Uranium Resource Evaluation (NURE) program. The variations exhibited by these data have been reduced and classified into a number of linear combinations by using the PCA program. The PCA program then generates histograms and outlier maps of the individual variates. Black and white plots can be made on a Calcomp plotter by the application of follow-up programs. All programs referred to in this guide were written for a DEC-10. From this analysis a geologist may begin to interpret the data structure. Insight into geological processes underlying the data may be obtained.
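
    The core computation described above, reducing correlated variates to principal components and inspecting the variance each explains, is sketched below on random stand-in data; it is not the original STAARS code, which was written for a DEC-10.

      # Principal components of standardized multivariate data. Random numbers stand
      # in for the aerial radiometric channels.
      import numpy as np

      rng = np.random.default_rng(11)
      latent = rng.normal(size=(500, 2))
      data = np.column_stack([                      # 4 correlated "channels"
          latent[:, 0] + 0.1 * rng.normal(size=500),
          0.9 * latent[:, 0] + 0.2 * rng.normal(size=500),
          latent[:, 1] + 0.1 * rng.normal(size=500),
          0.5 * latent[:, 0] + 0.5 * latent[:, 1] + 0.2 * rng.normal(size=500),
      ])

      z = (data - data.mean(0)) / data.std(0, ddof=1)
      corr = np.corrcoef(z, rowvar=False)
      eigvals, eigvecs = np.linalg.eigh(corr)
      order = np.argsort(eigvals)[::-1]
      eigvals, eigvecs = eigvals[order], eigvecs[:, order]

      explained = eigvals / eigvals.sum()
      scores = z @ eigvecs                          # principal component scores
      print("variance explained:", explained.round(3))
      print("loadings (PC1):", eigvecs[:, 0].round(3))
      print("scores shape:", scores.shape)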

  7. Combinatorial interpretation of Haldane-Wu fractional exclusion statistics.

    PubMed

    Aringazin, A K; Mazhitov, M I

    2002-08-01

    Assuming that the maximal allowed number of identical particles in a state is an integer parameter, q, we derive the statistical weight and analyze the associated equation that defines the statistical distribution. The derived distribution covers Fermi-Dirac and Bose-Einstein ones in the particular cases q = 1 and q → ∞ (n_i/q → 1), respectively. We show that the derived statistical weight provides a natural combinatorial interpretation of Haldane-Wu fractional exclusion statistics, and present exact solutions of the distribution equation.

  8. “Magnitude-based Inference”: A Statistical Review

    PubMed Central

    Welsh, Alan H.; Knight, Emma J.

    2015-01-01

    Purpose We consider “magnitude-based inference” and its interpretation by examining in detail its use in the problem of comparing two means. Methods We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how “magnitude-based inference” is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. Results and Conclusions We show that “magnitude-based inference” is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with “magnitude-based inference” and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using “magnitude-based inference,” a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis. PMID:25051387

  9. GeoDash: Assisting Visual Image Interpretation in Collect Earth Online by Leveraging Big Data on Google Earth Engine

    NASA Technical Reports Server (NTRS)

    Markert, Kel; Ashmall, William; Johnson, Gary; Saah, David; Mollicone, Danilo; Diaz, Alfonso Sanchez-Paus; Anderson, Eric; Flores, Africa; Griffin, Robert

    2017-01-01

    Collect Earth Online (CEO) is a free and open online implementation of the FAO Collect Earth system for collaboratively collecting environmental data through the visual interpretation of Earth observation imagery. The primary collection mechanism in CEO is human interpretation of land surface characteristics in imagery served via Web Map Services (WMS). However, interpreters may not have enough contextual information to classify samples by only viewing the imagery served via WMS, be they high resolution or otherwise. To assist in the interpretation and collection processes in CEO, SERVIR, a joint NASA-USAID initiative that brings Earth observations to improve environmental decision making in developing countries, developed the GeoDash system, an embedded and critical component of CEO. GeoDash leverages Google Earth Engine (GEE) by allowing users to set up custom browser-based widgets that pull from GEE's massive public data catalog. These widgets can be quick looks of other satellite imagery, time series graphs of environmental variables, and statistics panels of the same. Users can customize widgets with any of GEE's image collections, such as the historical Landsat collection with data available since the 1970s, select date ranges, image stretch parameters, graph characteristics, and create custom layouts, all on-the-fly to support plot interpretation in CEO. This presentation focuses on the implementation and potential applications, including the back-end links to GEE and the user interface with custom widget building. GeoDash takes large data volumes and condenses them into meaningful, relevant information for interpreters. While designed initially with national and global forest resource assessments in mind, the system will complement disaster assessments, agriculture management, project monitoring and evaluation, and more.
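
    To give a sense of the kind of Earth Engine query a GeoDash widget wraps, the sketch below uses the Earth Engine Python API to pull a Landsat 8 surface-reflectance collection at a point and return a simple NDVI time series. The location, dates and band choices are arbitrary examples, band scaling factors are omitted for brevity, an authenticated Earth Engine account is assumed, and this is not the GeoDash source code.

      # Minimal Earth Engine sketch: per-image mean NDVI at a point over one year.
      import ee

      ee.Initialize()  # assumes prior `earthengine authenticate`

      point = ee.Geometry.Point([104.99, 12.57])        # arbitrary example location

      def add_ndvi_stat(img):
          # NDVI from surface-reflectance bands (scale/offset ignored for brevity)
          ndvi = img.normalizedDifference(["SR_B5", "SR_B4"]).rename("NDVI")
          mean = ndvi.reduceRegion(ee.Reducer.mean(), point, 30).get("NDVI")
          return img.set("ndvi", mean).set("date", img.date().format("YYYY-MM-dd"))

      col = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
             .filterBounds(point)
             .filterDate("2020-01-01", "2020-12-31")
             .map(add_ndvi_stat))

      dates = col.aggregate_array("date").getInfo()
      ndvi = col.aggregate_array("ndvi").getInfo()
      print(list(zip(dates, ndvi))[:5])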

  10. GeoDash: Assisting Visual Image Interpretation in Collect Earth Online by Leveraging Big Data on Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Markert, K. N.; Ashmall, W.; Johnson, G.; Saah, D. S.; Anderson, E.; Flores Cordova, A. I.; Díaz, A. S. P.; Mollicone, D.; Griffin, R.

    2017-12-01

    Collect Earth Online (CEO) is a free and open online implementation of the FAO Collect Earth system for collaboratively collecting environmental data through the visual interpretation of Earth observation imagery. The primary collection mechanism in CEO is human interpretation of land surface characteristics in imagery served via Web Map Services (WMS). However, interpreters may not have enough contextual information to classify samples by only viewing the imagery served via WMS, be they high resolution or otherwise. To assist in the interpretation and collection processes in CEO, SERVIR, a joint NASA-USAID initiative that brings Earth observations to improve environmental decision making in developing countries, developed the GeoDash system, an embedded and critical component of CEO. GeoDash leverages Google Earth Engine (GEE) by allowing users to set up custom browser-based widgets that pull from GEE's massive public data catalog. These widgets can be quick looks of other satellite imagery, time series graphs of environmental variables, and statistics panels of the same. Users can customize widgets with any of GEE's image collections, such as the historical Landsat collection with data available since the 1970s, select date ranges, image stretch parameters, graph characteristics, and create custom layouts, all on-the-fly to support plot interpretation in CEO. This presentation focuses on the implementation and potential applications, including the back-end links to GEE and the user interface with custom widget building. GeoDash takes large data volumes and condenses them into meaningful, relevant information for interpreters. While designed initially with national and global forest resource assessments in mind, the system will complement disaster assessments, agriculture management, project monitoring and evaluation, and more.

  11. Investigating Student Understanding of Histograms

    ERIC Educational Resources Information Center

    Kaplan, Jennifer J.; Gabrosek, John G.; Curtiss, Phyllis; Malone, Chris

    2014-01-01

    Histograms are adept at revealing the distribution of data values, especially the shape of the distribution and any outlier values. They are included in introductory statistics texts, research methods texts, and in the popular press, yet students often have difficulty interpreting the information conveyed by a histogram. This research identifies…

  12. School Readiness Factor Analyzed.

    ERIC Educational Resources Information Center

    Brenner, Anton; Scott, Leland H.

    This paper is an empirical statistical analysis and interpretation of data relating to school readiness previously examined and reported on a theoretical basis. A total of 118 white, middle class children from six consecutive kindergarten groups in Dearborn, Michigan were tested with seven instruments, evaluated in terms of achievement, ability,…

  13. Simpson's Paradox in the Interpretation of "Leaky Pipeline" Data

    ERIC Educational Resources Information Center

    Walton, Paul H.; Walton, Daniel J.

    2016-01-01

    The traditional "leaky pipeline" plots are widely used to inform gender equality policy and practice. Herein, we demonstrate how a statistical phenomenon known as Simpson's paradox can obscure trends in gender "leaky pipeline" plots. Our approach has been to use Excel spreadsheets to generate hypothetical "leaky…

  14. Maximum Likelihood Estimation of Spectra Information from Multiple Independent Astrophysics Data Sets

    NASA Technical Reports Server (NTRS)

    Howell, Leonard W., Jr.; Six, N. Frank (Technical Monitor)

    2002-01-01

    The Maximum Likelihood (ML) statistical theory required to estimate spectra information from an arbitrary number of astrophysics data sets produced by vastly different science instruments is developed in this paper. This theory and its successful implementation will facilitate the interpretation of spectral information from multiple astrophysics missions and thereby permit the derivation of superior spectral information based on the combination of data sets. The procedure is of significant value both to existing data sets and to those to be produced by future astrophysics missions consisting of two or more detectors, because it allows instrument developers to optimize each detector's design parameters through simulation studies in order to design and build complementary detectors that will maximize the precision with which the science objectives may be obtained. The benefits of this ML theory and its application are measured in terms of the reduction of the statistical errors (standard deviations) of the spectra information when the multiple data sets are used in concert, as compared to the statistical errors when the data sets are considered separately, as well as the reduction of any biases resulting from poor statistics in one or more of the individual data sets when the data sets are combined.
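
    As a rough illustration of the idea behind this record, and not the paper's actual formulation, the sketch below fits a power-law spectrum to two simulated data sets with Gaussian errors, first separately and then jointly, by minimizing a summed negative log-likelihood; the instrument labels, energy grids, and uncertainties are all invented.

```python
# Hedged sketch: joint maximum-likelihood fit of a power-law spectrum from two
# independent, simulated data sets with known Gaussian uncertainties.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def power_law(E, A, gamma):
    return A * E ** (-gamma)

# Simulated measurements from two hypothetical detectors with different ranges.
E1 = np.logspace(0.0, 1.0, 15)
E2 = np.logspace(0.5, 2.0, 20)
sig1 = 0.05 * power_law(E1, 10.0, 2.7)
sig2 = 0.10 * power_law(E2, 10.0, 2.7)
y1 = power_law(E1, 10.0, 2.7) + rng.normal(0.0, sig1)
y2 = power_law(E2, 10.0, 2.7) + rng.normal(0.0, sig2)

def neg_log_like(theta, datasets):
    """Gaussian negative log-likelihood (up to a constant) summed over data sets."""
    A, gamma = theta
    return sum(0.5 * np.sum(((y - power_law(E, A, gamma)) / sig) ** 2)
               for E, y, sig in datasets)

# Fit each data set alone, then jointly; the joint fit uses all the information.
for name, data in [("detector 1", [(E1, y1, sig1)]),
                   ("detector 2", [(E2, y2, sig2)]),
                   ("combined",   [(E1, y1, sig1), (E2, y2, sig2)])]:
    res = minimize(neg_log_like, x0=[5.0, 2.0], args=(data,), method="Nelder-Mead")
    print(name, "A = %.2f, gamma = %.2f" % tuple(res.x))
```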

  15. Common Scientific and Statistical Errors in Obesity Research

    PubMed Central

    George, Brandon J.; Beasley, T. Mark; Brown, Andrew W.; Dawson, John; Dimova, Rositsa; Divers, Jasmin; Goldsby, TaShauna U.; Heo, Moonseong; Kaiser, Kathryn A.; Keith, Scott; Kim, Mimi Y.; Li, Peng; Mehta, Tapan; Oakes, J. Michael; Skinner, Asheley; Stuart, Elizabeth; Allison, David B.

    2015-01-01

    We identify 10 common errors and problems in the statistical analysis, design, interpretation, and reporting of obesity research and discuss how they can be avoided. The 10 topics are: 1) misinterpretation of statistical significance, 2) inappropriate testing against baseline values, 3) excessive and undisclosed multiple testing and “p-value hacking,” 4) mishandling of clustering in cluster randomized trials, 5) misconceptions about nonparametric tests, 6) mishandling of missing data, 7) miscalculation of effect sizes, 8) ignoring regression to the mean, 9) ignoring confirmation bias, and 10) insufficient statistical reporting. We hope that discussion of these errors can improve the quality of obesity research by helping researchers to implement proper statistical practice and to know when to seek the help of a statistician. PMID:27028280

  16. [Notes on vital statistics for the study of perinatal health].

    PubMed

    Juárez, Sol Pía

    2014-01-01

    Vital statistics, published by the National Statistics Institute in Spain, are a highly important source for the study of perinatal health nationwide. However, the process of data collection is not well-known and has implications both for the quality and interpretation of the epidemiological results derived from this source. The aim of this study was to present how the information is collected and some of the associated problems. This study is the result of an analysis of the methodological notes from the National Statistics Institute and first-hand information obtained from hospitals, the Central Civil Registry of Madrid, and the Madrid Institute for Statistics. Greater integration between these institutions is required to improve the quality of birth and stillbirth statistics. Copyright © 2014 SESPAS. Published by Elsevier Espana. All rights reserved.

  17. Deciphering Sources of Variability in Clinical Pathology.

    PubMed

    Tripathi, Niraj K; Everds, Nancy E; Schultze, A Eric; Irizarry, Armando R; Hall, Robert L; Provencher, Anne; Aulbach, Adam

    2017-01-01

    The objectives of this session were to explore causes of variability in clinical pathology data due to preanalytical and analytical variables as well as study design and other procedures that occur in toxicity testing studies. The presenters highlighted challenges associated with such variability in differentiating test article-related effects from the effects of experimental procedures, and its impact on overall data interpretation. These presentations focused on preanalytical and analytical variables and study design-related factors and their influence on clinical pathology data, and on the importance of various factors that influence data interpretation, including statistical analysis and reference intervals. Overall, these presentations touched upon the potential effects of many variables on clinical pathology parameters (animal physiology, the sample collection process, specimen handling and analysis, and study design) and offered some discussion points on how to manage those variables to ensure accurate interpretation of clinical pathology data in toxicity studies. This article is a brief synopsis of presentations given in a session entitled "Deciphering Sources of Variability in Clinical Pathology-It's Not Just about the Numbers" that occurred at the 35th Annual Symposium of the Society of Toxicologic Pathology in San Diego, California.

  18. Statistical Model Selection for TID Hardness Assurance

    NASA Technical Reports Server (NTRS)

    Ladbury, R.; Gorelick, J. L.; McClure, S.

    2010-01-01

    Radiation Hardness Assurance (RHA) methodologies against Total Ionizing Dose (TID) degradation impose rigorous statistical treatments for data from a part's Radiation Lot Acceptance Test (RLAT) and/or its historical performance. However, no similar methods exist for using "similarity" data - that is, data for similar parts fabricated in the same process as the part under qualification. This is despite the greater difficulty and potential risk in interpreting similarity data. In this work, we develop methods to disentangle part-to-part, lot-to-lot and part-type-to-part-type variation. The methods we develop apply not just to qualification decisions, but also to quality control and detection of process changes and other "out-of-family" behavior. We begin by discussing the data used in the study and the challenges of developing a statistic providing a meaningful measure of degradation across multiple part types, each with its own performance specifications. We then develop analysis techniques and apply them to the different data sets.

  19. Solar-cell interconnect design for terrestrial photovoltaic modules

    NASA Technical Reports Server (NTRS)

    Mon, G. R.; Moore, D. M.; Ross, R. G., Jr.

    1984-01-01

    Useful solar cell interconnect reliability design and life prediction algorithms are presented, together with experimental data indicating that the classical strain cycle (fatigue) curve for the interconnect material does not account for the statistical scatter that is required in reliability predictions. This shortcoming is presently addressed by fitting a functional form to experimental cumulative interconnect failure rate data, which thereby yields statistical fatigue curves enabling not only the prediction of cumulative interconnect failures during the design life of an array field, but also the quantitative interpretation of data from accelerated thermal cycling tests. Optimal interconnect cost reliability design algorithms are also derived which may allow the minimization of energy cost over the design life of the array field.

  20. Solar-cell interconnect design for terrestrial photovoltaic modules

    NASA Astrophysics Data System (ADS)

    Mon, G. R.; Moore, D. M.; Ross, R. G., Jr.

    1984-11-01

    Useful solar cell interconnect reliability design and life prediction algorithms are presented, together with experimental data indicating that the classical strain cycle (fatigue) curve for the interconnect material does not account for the statistical scatter that is required in reliability predictions. This shortcoming is presently addressed by fitting a functional form to experimental cumulative interconnect failure rate data, which thereby yields statistical fatigue curves enabling not only the prediction of cumulative interconnect failures during the design life of an array field, but also the quantitative interpretation of data from accelerated thermal cycling tests. Optimal interconnect cost reliability design algorithms are also derived which may allow the minimization of energy cost over the design life of the array field.

  1. Interpreting comprehensive two-dimensional gas chromatography using peak topography maps with application to petroleum forensics.

    PubMed

    Ghasemi Damavandi, Hamidreza; Sen Gupta, Ananya; Nelson, Robert K; Reddy, Christopher M

    2016-01-01

    Comprehensive two-dimensional gas chromatography (GC×GC) provides high-resolution separations across hundreds of compounds in a complex mixture, thus unlocking unprecedented information for intricate quantitative interpretation. We exploit this compound diversity across the GC×GC topography to provide quantitative compound-cognizant interpretation beyond target compound analysis, with petroleum forensics as a practical application. We focus on the GC×GC topography of biomarker hydrocarbons, hopanes and steranes, as they are generally recalcitrant to weathering. We introduce peak topography maps (PTM) and topography partitioning techniques that consider a notably broader and more diverse range of target and non-target biomarker compounds compared to traditional approaches that consider approximately 20 biomarker ratios. Specifically, we consider a range of 33-154 target and non-target biomarkers, with highest-to-lowest peak ratio within an injection ranging from 4.86 to 19.6 (precise numbers depend on biomarker diversity of individual injections). We also provide a robust quantitative measure for directly determining "match" between samples, without necessitating training data sets. We validate our methods across 34 GC×GC injections from a diverse portfolio of petroleum sources, and provide quantitative comparison of performance against established statistical methods such as principal components analysis (PCA). Our data set includes a wide range of samples collected following the 2010 Deepwater Horizon disaster that released approximately 160 million gallons of crude oil from the Macondo well (MW). Samples that were clearly collected following this disaster exhibit a statistically significant match [Formula: see text] using PTM-based interpretation against other closely related sources. PTM-based interpretation also provides higher differentiation between closely correlated but distinct sources than obtained using PCA-based statistical comparisons. In addition to results based on this experimental field data, we also provide extensive perturbation analysis of the PTM method over numerical simulations that introduce random variability of peak locations over the GC×GC biomarker ROI image of the MW pre-spill sample (sample [Formula: see text] in Additional file 4: Table S1). We compare the robustness of the cross-PTM score against peak location variability in both dimensions and compare the results against PCA analysis over the same set of simulated images. A detailed description of the simulation experiment and discussion of results are provided in Additional file 1: Section S8. We provide a peak-cognizant informational framework for quantitative interpretation of GC×GC topography. The proposed topographic analysis enables GC×GC forensic interpretation across target petroleum biomarkers, while including the nuances of lesser-known non-target biomarkers clustered around the target peaks. This allows potential discovery of hitherto unknown connections between target and non-target biomarkers.
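
    The PTM algorithm itself is not reproduced here; the sketch below only illustrates the kind of PCA-based baseline that the paper compares against, projecting invented biomarker peak-area vectors into a reduced space and measuring pairwise distances between samples. All sample values, dimensions, and the distance measure are assumptions.

```python
# Illustrative PCA baseline only (not the authors' PTM method): compare oil
# samples by PCA of standardized, simulated biomarker peak areas.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical matrix: rows = oil samples, columns = biomarker peak areas.
peaks = rng.lognormal(mean=2.0, sigma=0.5, size=(10, 40))
peaks[5:] *= rng.uniform(0.8, 1.2, size=40)   # a loosely related second source group

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(peaks))

# Pairwise Euclidean distances between samples in the reduced PCA space.
dists = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=-1)
print(np.round(dists, 2))
```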

  2. On Allometry Relations

    NASA Astrophysics Data System (ADS)

    West, Damien; West, Bruce J.

    2012-07-01

    There are a substantial number of empirical relations that began with the identification of a pattern in data; were shown to have a terse power-law description; were interpreted using existing theory; reached the level of "law" and were given a name; only to subsequently fade away when it proved impossible to connect the "law" with a larger body of theory and/or data. Various forms of allometry relations (ARs) have followed this path. The ARs in biology are nearly two hundred years old and those in ecology, geophysics, physiology and other areas of investigation are not that much younger. In general, if X is a measure of the size of a complex host network and Y is a property of a complex subnetwork embedded within the host network, a theoretical AR exists between the two when Y = aX^b. We emphasize that the reductionistic models of AR interpret X and Y as dynamic variables, albeit the ARs themselves are explicitly time independent even though in some cases the parameter values change over time. On the other hand, the phenomenological models of AR are based on the statistical analysis of data and interpret X and Y as averages to yield the empirical AR: ⟨Y⟩ = a⟨X⟩^b. Modern explanations of AR begin with the application of fractal geometry and fractal statistics to scaling phenomena. The detailed application of fractal geometry to the explanation of theoretical ARs in living networks is slightly more than a decade old and, although well received, it has not been universally accepted. An alternate perspective is given by the empirical AR that is derived using linear regression analysis of fluctuating data sets. We emphasize that the theoretical and empirical ARs are not the same and review theories "explaining" AR from both the reductionist and statistical fractal perspectives. The probability calculus is used to systematically incorporate both views into a single modeling strategy. We conclude that the empirical AR is entailed by the scaling behavior of the probability density, which is derived using the probability calculus.
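
    A minimal sketch of the empirical AR described above, assuming simulated data: the exponent b and prefactor a are estimated by ordinary least squares on the log-transformed values.

```python
# Empirical allometry relation <Y> = a <X>^b estimated by log-log regression
# on simulated data (the "true" values a = 0.5, b = 0.75 are invented).
import numpy as np

rng = np.random.default_rng(2)
X = rng.lognormal(mean=3.0, sigma=1.0, size=200)          # host-network size
Y = 0.5 * X ** 0.75 * rng.lognormal(0.0, 0.2, size=200)   # subnetwork property

b, log_a = np.polyfit(np.log(X), np.log(Y), deg=1)        # slope, intercept
print("estimated exponent b = %.3f, prefactor a = %.3f" % (b, np.exp(log_a)))
```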

  3. Exploring the Relationship Between Eye Movements and Electrocardiogram Interpretation Accuracy

    NASA Astrophysics Data System (ADS)

    Davies, Alan; Brown, Gavin; Vigo, Markel; Harper, Simon; Horseman, Laura; Splendiani, Bruno; Hill, Elspeth; Jay, Caroline

    2016-12-01

    Interpretation of electrocardiograms (ECGs) is a complex task involving visual inspection. This paper aims to improve understanding of how practitioners perceive ECGs, and to determine whether visual behaviour can indicate differences in interpretation accuracy. A group of healthcare practitioners (n = 31) who interpret ECGs as part of their clinical role were shown 11 commonly encountered ECGs on a computer screen. The participants' eye movement data were recorded as they viewed the ECGs and attempted interpretation. The Jensen-Shannon distance was computed between two Markov chains, constructed from the transition matrices (visual shifts from and to ECG leads) of the correct and incorrect interpretation groups for each ECG. A permutation test was then used to compare this distance against 10,000 randomly shuffled groups made up of the same participants. The results showed a statistically significant (α = 0.05) difference in 5 of the 11 stimuli, demonstrating that the gaze shift between the ECG leads differs between the groups making correct and incorrect interpretations and is therefore a factor in interpretation accuracy. The results shed further light on the relationship between visual behaviour and ECG interpretation accuracy, providing information that can be used to improve both human and automated interpretation approaches.
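
    A hedged sketch of the general approach described above, not the authors' exact pipeline: simulated gaze sequences are turned into lead-to-lead transition matrices, the two groups are compared with a Jensen-Shannon distance on the flattened mean matrices, and a label-permutation test assesses significance. The gaze data, group sizes, and the flattening choice are all assumptions.

```python
# Hedged sketch: Jensen-Shannon distance between gaze-transition matrices of
# two groups, with a label-permutation test. All data here are simulated.
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(3)
n_leads = 12

def transition_matrix(seq, n_states):
    """Row-normalised counts of state-to-state transitions in a gaze sequence."""
    T = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-1], seq[1:]):
        T[a, b] += 1
    T += 1e-9                                   # avoid empty rows
    return T / T.sum(axis=1, keepdims=True)

def group_distance(mats, labels):
    """JS distance between the flattened mean transition matrices of two groups."""
    Ta = np.mean([m for m, lab in zip(mats, labels) if lab == 1], axis=0)
    Tb = np.mean([m for m, lab in zip(mats, labels) if lab == 0], axis=0)
    return jensenshannon(Ta.ravel() / Ta.sum(), Tb.ravel() / Tb.sum())

# Simulated gaze sequences: 15 'correct' and 16 'incorrect' interpreters.
sequences = [rng.integers(0, n_leads, 200) for _ in range(31)]
labels = np.array([1] * 15 + [0] * 16)
mats = [transition_matrix(s, n_leads) for s in sequences]

observed = group_distance(mats, labels)
perm = [group_distance(mats, rng.permutation(labels)) for _ in range(1000)]
p_value = np.mean(np.array(perm) >= observed)
print("observed JS distance %.4f, permutation p = %.3f" % (observed, p_value))
```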

  4. For a statistical interpretation of Helmholtz' thermal displacement

    NASA Astrophysics Data System (ADS)

    Podio-Guidugli, Paolo

    2016-11-01

    On moving from the classic papers by Einstein and Langevin on Brownian motion, two consistent statistical interpretations are given for the thermal displacement, a scalar field formally introduced by Helmholtz, whose time derivative is by definition the absolute temperature.

  5. Examination of two methods for statistical analysis of data with magnitude and direction emphasizing vestibular research applications

    NASA Technical Reports Server (NTRS)

    Calkins, D. S.

    1998-01-01

    When the dependent (or response) variable in an experiment has direction and magnitude, one approach that has been used for statistical analysis involves splitting magnitude and direction and applying univariate statistical techniques to the components. However, such treatment of quantities with direction and magnitude is not justifiable mathematically and can lead to incorrect conclusions about relationships among variables and, as a result, to flawed interpretations. This note discusses a problem with that practice and recommends mathematically correct procedures to be used with dependent variables that have direction and magnitude for 1) computation of mean values, 2) statistical contrasts of and confidence intervals for means, and 3) correlation methods.
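
    A minimal illustration of the recommended practice, using invented directional responses: the mean is computed from vector components rather than by averaging angles and magnitudes separately.

```python
# Vector (resultant) mean for data with magnitude and direction, contrasted
# with the naive univariate mean of the angles. Values are invented.
import numpy as np

angles_deg = np.array([350.0, 10.0, 5.0, 355.0])   # responses clustered near 0 deg
magnitudes = np.array([2.0, 2.2, 1.9, 2.1])

# Naive approach: averaging the angles alone gives a misleading 180 degrees.
print("naive mean angle:", angles_deg.mean())

# Correct approach: convert to components, average, convert back.
theta = np.deg2rad(angles_deg)
u = (magnitudes * np.cos(theta)).mean()
v = (magnitudes * np.sin(theta)).mean()
mean_angle = np.rad2deg(np.arctan2(v, u)) % 360
mean_magnitude = np.hypot(u, v)
print("vector mean: %.1f deg, magnitude %.2f" % (mean_angle, mean_magnitude))
```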

  6. Techniques in teaching statistics : linking research production and research use.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martinez-Moyano, I.; Smith, A. (Univ. of Massachusetts at Boston)

    In the spirit of closing the 'research-practice gap,' the authors extend evidence-based principles to statistics instruction in social science graduate education. The authors employ a Delphi method to survey experienced statistics instructors to identify teaching techniques to overcome the challenges inherent in teaching statistics to students enrolled in practitioner-oriented master's degree programs. Among the teaching techniques identified as essential are using real-life examples, requiring data collection exercises, and emphasizing interpretation rather than results. Building on existing research, preliminary interviews, and the findings from the study, the authors develop a model describing antecedents to the strength of the link between research and practice.

  7. Paleomagnetism.org - An online multi-platform and open source environment for paleomagnetic analysis.

    NASA Astrophysics Data System (ADS)

    Koymans, Mathijs; Langereis, Cor; Pastor-Galán, Daniel; van Hinsbergen, Douwe

    2017-04-01

    This contribution gives an overview of Paleomagnetism.org (Koymans et al., 2016), an online environment for paleomagnetic analysis. The application is developed in JavaScript and is fully open-sourced. It presents an interactive website in which paleomagnetic data can be interpreted, evaluated, visualized, and shared with others. The application has been available since late 2015 and has since evolved with the addition of a magnetostratigraphic tool, additional input formats, and features that emphasize the link between geomagnetism and tectonics. In the interpretation portal, principal component analysis (Kirschvink et al., 1981) can be applied to visualized demagnetization data (Zijderveld, 1967). Interpreted directions and great circles are combined using the iterative procedure described by McFadden and McElhinny (1988). The resulting directions can be further used in the statistics portal or exported as raw tabulated data and high-quality figures. The available tools in the statistics portal cover standard Fisher statistics for directional data and virtual geomagnetic poles (Fisher, 1953; Butler, 1992; Deenen et al., 2011). Other tools include the eigenvector approach foldtest (Tauxe and Watson, 1994), a bootstrapped reversal test (Tauxe et al., 2009), and the classical reversal test (McFadden and McElhinny, 1990). An implementation exists for the detection and correction of inclination shallowing in sediments (Tauxe and Kent, 2004; Tauxe et al., 2008), and a module to visualize apparent polar wander paths (Torsvik et al., 2012; Kent and Irving, 2010; Besse and Courtillot, 2002) for large continent-bearing plates. A miscellaneous portal exists for a set of tools that include a bootstrapped oroclinal test (Pastor-Galán et al., 2016) for assessing possible linear relationships between strike and declination. Another available tool completes a net tectonic rotation analysis (after Morris et al., 1999) that restores a dyke to its paleo-vertical and can be used to determine paleo-spreading directions fundamental to plate reconstructions. Paleomagnetism.org provides an integrated approach for researchers to export and share paleomagnetic data through a common interface. The portals create a custom exportable file that can be distributed and included in public databases. With a publication, this file can be appended and would contain all paleomagnetic data discussed in the publication. The appended file can then be imported into the application by other researchers for reviewing. The accessibility and simplicity with which paleomagnetic data can be interpreted, analyzed, visualized, and shared should make Paleomagnetism.org of interest to the paleomagnetic and tectonic communities.
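
    As an illustration of the standard Fisher (1953) statistics mentioned above, and not code taken from Paleomagnetism.org, the sketch below computes a mean direction, the precision parameter k, and the alpha-95 confidence cone for a handful of invented declination/inclination pairs.

```python
# Fisher (1953) statistics for paleomagnetic directions: mean declination and
# inclination, precision parameter k, and alpha-95. Directions are invented.
import numpy as np

dec = np.radians([350.0, 355.0, 2.0, 8.0, 353.0, 359.0])   # declinations (deg)
inc = np.radians([45.0, 50.0, 48.0, 52.0, 47.0, 49.0])     # inclinations (deg)

# Unit vectors (x north, y east, z down) summed to a resultant of length R.
x, y, z = np.cos(inc) * np.cos(dec), np.cos(inc) * np.sin(dec), np.sin(inc)
N = len(dec)
R = np.sqrt(x.sum() ** 2 + y.sum() ** 2 + z.sum() ** 2)

mean_dec = np.degrees(np.arctan2(y.sum(), x.sum())) % 360
mean_inc = np.degrees(np.arcsin(z.sum() / R))
k = (N - 1) / (N - R)                                       # precision parameter
a95 = np.degrees(np.arccos(1 - (N - R) / R * (20.0 ** (1.0 / (N - 1)) - 1)))

print("mean D = %.1f, I = %.1f, k = %.1f, alpha95 = %.1f" % (mean_dec, mean_inc, k, a95))
```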

  8. Evaluation of forensic DNA mixture evidence: protocol for evaluation, interpretation, and statistical calculations using the combined probability of inclusion.

    PubMed

    Bieber, Frederick R; Buckleton, John S; Budowle, Bruce; Butler, John M; Coble, Michael D

    2016-08-31

    The evaluation and interpretation of forensic DNA mixture evidence faces greater interpretational challenges due to increasingly complex mixture evidence. Such challenges include: casework involving low quantity or degraded evidence leading to allele and locus dropout; allele sharing of contributors leading to allele stacking; and differentiation of PCR stutter artifacts from true alleles. There is variation in statistical approaches used to evaluate the strength of the evidence when inclusion of a specific known individual(s) is determined, and the approaches used must be supportable. There are concerns that methods utilized for interpretation of complex forensic DNA mixtures may not be implemented properly in some casework. Similar questions are being raised in a number of U.S. jurisdictions, leading to some confusion about mixture interpretation for current and previous casework. Key elements necessary for the interpretation and statistical evaluation of forensic DNA mixtures are described. Given that the most common method for statistical evaluation of DNA mixtures in many parts of the world, including the USA, is the Combined Probability of Inclusion/Exclusion (CPI/CPE), exposition and elucidation of this method and a protocol for its use are the focus of this article. Formulae and other supporting materials are provided. Guidance and details of a DNA mixture interpretation protocol are provided for application of the CPI/CPE method in the analysis of more complex forensic DNA mixtures. This description, in turn, should help reduce the variability of interpretation with application of this methodology and thereby improve the quality of DNA mixture interpretation throughout the forensic community.
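
    A minimal sketch of the CPI/CPE arithmetic described above, using invented allele frequencies: the probability of inclusion at each locus is the squared sum of the frequencies of the alleles observed in the mixture, and the combined value is the product across loci.

```python
# Combined Probability of Inclusion (CPI) and Exclusion (CPE) from invented
# allele frequencies at three hypothetical loci.
loci = {
    "D8S1179": [0.11, 0.15, 0.30, 0.09],   # frequencies of alleles seen in the mixture
    "D21S11":  [0.18, 0.25, 0.07],
    "FGA":     [0.14, 0.08, 0.21, 0.16],
}

cpi = 1.0
for locus, freqs in loci.items():
    pi_locus = sum(freqs) ** 2              # probability of inclusion at this locus
    cpi *= pi_locus
    print(f"{locus}: PI = {pi_locus:.4f}")

print(f"Combined Probability of Inclusion = {cpi:.6f}")
print(f"Combined Probability of Exclusion = {1 - cpi:.6f}")
```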

  9. Whole-Range Assessment: A Simple Method for Analysing Allelopathic Dose-Response Data

    PubMed Central

    An, Min; Pratley, J. E.; Haig, T.; Liu, D.L.

    2005-01-01

    Based on the typical biological responses of an organism to allelochemicals (hormesis), concepts of whole-range assessment and inhibition index were developed for improved analysis of allelopathic data. Examples of their application are presented using data drawn from the literature. The method is concise and comprehensive, and makes data grouping and multiple comparisons simple, logical, and possible. It improves data interpretation, enhances research outcomes, and is a statistically efficient summary of the plant response profiles. PMID:19330165

  10. An evaluation of the Goddard Space Flight Center Library

    NASA Technical Reports Server (NTRS)

    Herner, S.; Lancaster, F. W.; Wright, N.; Ockerman, L.; Shearer, B.; Greenspan, S.; Mccartney, J.; Vellucci, M.

    1979-01-01

    The character and degree of coincidence between the current and future missions, programs, and projects of the Goddard Space Flight Center and the current and future collection, services, and facilities of its library were determined from structured interviews and discussions with various classes of facility personnel. In addition to the tabulation and interpretation of the data from the structured interview survey, five types of statistical analyses were performed to corroborate (or contradict) the survey results and to produce useful information not readily attainable through survey material. Conclusions reached regarding compatibility between needs and holdings, services and buildings, library hours of operation, methods of early detection and anticipation of changing holdings requirements, and the impact of near future programs are presented along with a list of statistics needing collection, organization, and interpretation on a continuing or longitudinal basis.

  11. When Machines Think: Radiology's Next Frontier.

    PubMed

    Dreyer, Keith J; Geis, J Raymond

    2017-12-01

    Artificial intelligence (AI), machine learning, and deep learning are terms now seen frequently, all of which refer to computer algorithms that change as they are exposed to more data. Many of these algorithms are surprisingly good at recognizing objects in images. The combination of large amounts of machine-consumable digital data, increased and cheaper computing power, and increasingly sophisticated statistical models combine to enable machines to find patterns in data in ways that are not only cost-effective but also potentially beyond humans' abilities. Building an AI algorithm can be surprisingly easy. Understanding the associated data structures and statistics, on the other hand, is often difficult and obscure. Converting the algorithm into a sophisticated product that works consistently in broad, general clinical use is complex and incompletely understood. To show how these AI products reduce costs and improve outcomes will require clinical translation and industrial-grade integration into routine workflow. Radiology has the chance to leverage AI to become a center of intelligently aggregated, quantitative, diagnostic information. Centaur radiologists, formed as a synergy of human plus computer, will provide interpretations using data extracted from images by humans and image-analysis computer algorithms, as well as the electronic health record, genomics, and other disparate sources. These interpretations will form the foundation of precision health care, or care customized to an individual patient. © RSNA, 2017.

  12. Modified two-sources quantum statistical model and multiplicity fluctuation in the finite rapidity region

    NASA Astrophysics Data System (ADS)

    Ghosh, Dipak; Sarkar, Sharmila; Sen, Sanjib; Roy, Jaya

    1995-06-01

    In this paper the behavior of factorial moments with rapidity window size, which is usually explained in terms of "intermittency," has been interpreted in terms of simple quantum statistical properties of the emitting system, using the concept of the "modified two-source model" as recently proposed by Ghosh and Sarkar [Phys. Lett. B 278, 465 (1992)]. The analysis has been performed using our own data of 16Ag/Br and 24Ag/Br interactions in the few tens of GeV energy regime.

  13. 78 FR 13508 - Interpreting Nondiscrimination Requirements of Executive Order 11246 With Respect to Systemic...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-28

    ...-2.17(b)-(d). Nevertheless, Bureau of Labor Statistics data and numerous research studies indicate... Affairs 193 (2011). Ultimately, the research literature still finds an unexplained gap exists even after... multiple regression as potential evidence of discrimination.\\22\\ Similarly, published research on...

  14. Evaluation of Teacher Candidates Writing Skills

    ERIC Educational Resources Information Center

    Bagci Ayranci, Bilge; Mete, Filiz

    2017-01-01

    In this study, 200 volunteer students in a Faculty of Education produced free essay compositions, and the compositions were evaluated by descriptive statistics and content analysis. Content analysis is used to interpret similar data within specific contexts and themes. The students were free to choose their own writing topic. The reason for this…

  15. Interpreting Bivariate Regression Coefficients: Going beyond the Average

    ERIC Educational Resources Information Center

    Halcoussis, Dennis; Phillips, G. Michael

    2010-01-01

    Statistics, econometrics, investment analysis, and data analysis classes often review the calculation of several types of averages, including the arithmetic mean, geometric mean, harmonic mean, and various weighted averages. This note shows how each of these can be computed using a basic regression framework. By recognizing when a regression model…

  16. Examining How Treatment Fidelity Is Supported, Measured, and Reported in K-3 Reading Intervention Research

    ERIC Educational Resources Information Center

    Capin, Philip; Walker, Melodee A.; Vaughn, Sharon; Wanzek, Jeanne

    2017-01-01

    Treatment fidelity data (descriptive and statistical) are critical to interpreting and generalizing outcomes of intervention research. Despite recommendations for treatment fidelity reporting from funding agencies and researchers, past syntheses have found treatment fidelity is frequently unreported (e.g., Swanson, "The Journal of Special…

  17. Mathematical and statistical approaches for interpreting biomarker compounds in exhaled human breath

    EPA Science Inventory

    The various instrumental techniques, human studies, and diagnostic tests that produce data from samples of exhaled breath have one thing in common: they all need to be put into a context wherein a posed question can actually be answered. Exhaled breath contains numerous compoun...

  18. Primer of statistics in dental research: part I.

    PubMed

    Shintani, Ayumi

    2014-01-01

    Statistics plays essential roles in evidence-based dentistry (EBD) practice and research, ranging widely from formulating scientific questions, designing studies, and collecting and analyzing data to interpreting, reporting, and presenting study findings. Mastering statistical concepts appears to be an unreachable goal among many dental researchers, in part because of statistical authorities' limited ability to explain statistical principles to health researchers without elaborating complex mathematical concepts. This series of 2 articles aims to introduce dental researchers to 9 essential topics in statistics for conducting EBD, with intuitive examples. Part I of the series covers the first 5 topics: (1) statistical graphs, (2) how to deal with outliers, (3) p-values and confidence intervals, (4) testing equivalence, and (5) multiplicity adjustment. Part II will follow to cover the remaining topics, including (6) selecting the proper statistical tests, (7) repeated measures analysis, (8) epidemiological considerations for causal association, and (9) analysis of agreement. Copyright © 2014. Published by Elsevier Ltd.

  19. Problems with the interpretation of paleomagnetic measurements due to studies of basalts from the Paleovolcano Vogelsberg in Hessen

    NASA Technical Reports Server (NTRS)

    Schenk, E.

    1979-01-01

    The palaeomagnetic parameters of more than 5,000 samples of cores taken from 33 drilling holes through innumerable basalt units of the Vogelsberg Paleovolcano in Hessen were measured. Measurements of specimens from thin and thick layers, taken without any gap, proved that inclination, natural remanence, susceptibility and the Königsberger factor depend on their distance from the surface of units, layers, lamellae, etc. Therefore, representative data for the evaluation of palaeomagnetic measurements can be expected only in the interior part of lava flows and intrusions. The statistical method that included all measured values gave significant data that were nevertheless not appropriate for the interpretation of palaeomagnetic and geological events.

  20. [Automated identification, interpretation and classification of focal changes in the lungs on the images obtained at computed tomography for lung cancer screening].

    PubMed

    Barchuk, A A; Podolsky, M D; Tarakanov, S A; Kotsyuba, I Yu; Gaidukov, V S; Kuznetsov, V I; Merabishvili, V M; Barchuk, A S; Levchenko, E V; Filochkina, A V; Arseniev, A I

    2015-01-01

    This review article analyzes the literature devoted to the description, interpretation and classification of focal (nodal) changes in the lungs detected by computed tomography of the chest cavity. Possible criteria are discussed for determining their most likely character: primary and metastatic tumor processes, inflammation, scarring, autoimmune changes, tuberculosis and others. Identification of the most characteristic, reliable and statistically significant signs of the various pathological processes in the lungs, including through the use of modern computer-aided detection and diagnosis systems, will optimize the diagnostic measures and ensure processing of a large volume of medical data in a short time.

  1. Statistical significance of trace evidence matches using independent physicochemical measurements

    NASA Astrophysics Data System (ADS)

    Almirall, Jose R.; Cole, Michael; Furton, Kenneth G.; Gettinby, George

    1997-02-01

    A statistical approach to the significance of glass evidence is proposed using independent physicochemical measurements and chemometrics. Traditional interpretation of the significance of trace evidence matches or exclusions relies on qualitative descriptors such as 'indistinguishable from,' 'consistent with,' 'similar to' etc. By performing physical and chemical measurements which are independent of one another, the significance of object exclusions or matches can be evaluated statistically. One of the problems with this approach is that the human brain is excellent at recognizing and classifying patterns and shapes but performs less well when that object is represented by a numerical list of attributes. Chemometrics can be employed to group similar objects using clustering algorithms and provide statistical significance in a quantitative manner. This approach is enhanced when population databases exist or can be created and the data in question can be evaluated given these databases. Since the selection of the variables used and their pre-processing can greatly influence the outcome, several different methods could be employed in order to obtain a more complete picture of the information contained in the data. Presently, we report on the analysis of glass samples using refractive index measurements and the quantitative analysis of the concentrations of the metals: Mg, Al, Ca, Fe, Mn, Ba, Sr, Ti and Zr. The extension of this general approach to fiber and paint comparisons also is discussed. This statistical approach should not replace the current interpretative approaches to trace evidence matches or exclusions but rather yields an additional quantitative measure. The lack of sufficient general population databases containing the needed physicochemical measurements and the potential for confusion arising from statistical analysis currently hamper this approach and ways of overcoming these obstacles are presented.
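
    As a hedged illustration of the chemometric grouping described above, and not the authors' procedure, the sketch below clusters a few invented glass measurements (refractive index plus elemental concentrations) after standardization.

```python
# Hierarchical clustering of standardized glass measurements; all values invented.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.preprocessing import StandardScaler

# Columns: refractive index, Mg, Al, Ca, Fe (arbitrary concentration units).
fragments = np.array([
    [1.5181, 3.45, 1.20, 8.10, 0.30],
    [1.5182, 3.47, 1.18, 8.05, 0.31],
    [1.5230, 2.10, 1.90, 7.20, 0.55],
    [1.5231, 2.08, 1.92, 7.25, 0.54],
])

X = StandardScaler().fit_transform(fragments)
Z = linkage(pdist(X), method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # the first two fragments should group together, as should the last two
```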

  2. Change detection with heterogeneous data using ecoregional stratification, statistical summaries and a land allocation algorithm

    Treesearch

    Kathleen M. Bergen; Daniel G. Brown; James F. Rutherford; Eric J. Gustafson

    2005-01-01

    A ca. 1980 national-scale land-cover classification based on aerial photo interpretation was combined with 2000 AVHRR satellite imagery to derive land cover and land-cover change information for forest, urban, and agriculture categories over a seven-state region in the U.S. To derive useful land-cover change data using a heterogeneous dataset and to validate our...

  3. Crux: Rapid Open Source Protein Tandem Mass Spectrometry Analysis

    PubMed Central

    2015-01-01

    Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data. PMID:25182276

  4. Are Assumptions of Well-Known Statistical Techniques Checked, and Why (Not)?

    PubMed Central

    Hoekstra, Rink; Kiers, Henk A. L.; Johnson, Addie

    2012-01-01

    A valid interpretation of most statistical techniques requires that one or more assumptions be met. In published articles, however, little information tends to be reported on whether the data satisfy the assumptions underlying the statistical techniques used. This could be due to self-selection: Only manuscripts with data fulfilling the assumptions are submitted. Another explanation could be that violations of assumptions are rarely checked for in the first place. We studied whether and how 30 researchers checked fictitious data for violations of assumptions in their own working environment. Participants were asked to analyze the data as they would their own data, for which often used and well-known techniques such as the t-procedure, ANOVA and regression (or non-parametric alternatives) were required. It was found that the assumptions of the techniques were rarely checked, and that if they were, it was regularly by means of a statistical test. Interviews afterward revealed a general lack of knowledge about assumptions, the robustness of the techniques with regards to the assumptions, and how (or whether) assumptions should be checked. These data suggest that checking for violations of assumptions is not a well-considered choice, and that the use of statistics can be described as opportunistic. PMID:22593746

  5. Can hospital episode statistics support appraisal and revalidation? Randomised study of physician attitudes.

    PubMed

    Croft, Giles P; Williams, John G; Mann, Robin Y; Cohen, David; Phillips, Ceri J

    2007-08-01

    Hospital episode statistics were originally designed to monitor activity and allocate resources in the NHS. Recently their uses have widened to include analysis of individuals' activity, to inform appraisal and revalidation, and monitor performance. This study investigated physician attitudes to the validity and usefulness of these data for such purposes, and the effect of supporting individuals in data interpretation. A randomised study was conducted with consultant physicians in England, Wales and Scotland. The intervention group was supported by a clinician and an information analyst in obtaining and analysing their own data. The control group was unsupported. Attitudes to the data and confidence in their ability to reflect clinical practice were examined before and after the intervention. It was concluded that hospital episode statistics are not presently fit for monitoring the performance of individual physicians. A more comprehensive description of activity is required for these purposes. Improvements in the quality of existing data through clinical engagement at a local level, however, are possible.

  6. Robust misinterpretation of confidence intervals.

    PubMed

    Hoekstra, Rink; Morey, Richard D; Rouder, Jeffrey N; Wagenmakers, Eric-Jan

    2014-10-01

    Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students-all in the field of psychology-were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers' performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding p-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.

  7. Homeostasis and Gauss statistics: barriers to understanding natural variability.

    PubMed

    West, Bruce J

    2010-06-01

    In this paper, the concept of knowledge is argued to be the top of a three-tiered system of science. The first tier is that of measurement and data, followed by information consisting of the patterns within the data, and ending with theory that interprets the patterns and yields knowledge. Thus, when a scientific theory ceases to be consistent with the database the knowledge based on that theory must be re-examined and potentially modified. Consequently, all knowledge, like glory, is transient. Herein we focus on the non-normal statistics of physiologic time series and conclude that the empirical inverse power-law statistics and long-time correlations are inconsistent with the theoretical notion of homeostasis. We suggest replacing the notion of homeostasis with that of Fractal Physiology.

  8. Critical Analysis of Primary Literature in a Master's-Level Class: Effects on Self-Efficacy and Science-Process Skills.

    PubMed

    Abdullah, Christopher; Parris, Julian; Lie, Richard; Guzdar, Amy; Tour, Ella

    2015-01-01

    The ability to think analytically and creatively is crucial for success in the modern workforce, particularly for graduate students, who often aim to become physicians or researchers. Analysis of the primary literature provides an excellent opportunity to practice these skills. We describe a course that includes a structured analysis of four research papers from diverse fields of biology and group exercises in proposing experiments that would follow up on these papers. To facilitate a critical approach to primary literature, we included a paper with questionable data interpretation and two papers investigating the same biological question yet reaching opposite conclusions. We report a significant increase in students' self-efficacy in analyzing data from research papers, evaluating authors' conclusions, and designing experiments. Using our science-process skills test, we observe a statistically significant increase in students' ability to propose an experiment that matches the goal of investigation. We also detect gains in interpretation of controls and quantitative analysis of data. No statistically significant changes were observed in questions that tested the skills of interpretation, inference, and evaluation. © 2015 C. Abdullah et al. CBE—Life Sciences Education © 2015 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).

  9. Using the Halstead-Reitan Battery to diagnose brain damage: a comparison of the predictive power of traditional techniques to Rohling's Interpretive Method.

    PubMed

    Rohling, Martin L; Williamson, David J; Miller, L Stephen; Adams, Russell L

    2003-11-01

    The aim of this project was to validate an alternative global measure of neurocognitive impairment (Rohling Interpretive Method, or RIM) that could be generated from data gathered from a flexible battery approach. A critical step in this process is to establish the utility of the technique against current standards in the field. In this paper, we compared results from the Rohling Interpretive Method to those obtained from the General Neuropsychological Deficit Scale (GNDS; Reitan & Wolfson, 1988) and the Halstead-Russell Average Impairment Rating (AIR; Russell, Neuringer & Goldstein, 1970) on a large previously published sample of patients assessed with the Halstead-Reitan Battery (HRB). Findings support the use of the Rohling Interpretive Method in producing summary statistics similar in diagnostic sensitivity and specificity to the traditional HRB indices.

  10. MWASTools: an R/bioconductor package for metabolome-wide association studies.

    PubMed

    Rodriguez-Martinez, Andrea; Posma, Joram M; Ayala, Rafael; Neves, Ana L; Anwar, Maryam; Petretto, Enrico; Emanueli, Costanza; Gauguier, Dominique; Nicholson, Jeremy K; Dumas, Marc-Emmanuel

    2018-03-01

    MWASTools is an R package designed to provide an integrated pipeline to analyse metabonomic data in large-scale epidemiological studies. Key functionalities of our package include: quality control analysis; metabolome-wide association analysis using various models (partial correlations, generalized linear models); visualization of statistical outcomes; metabolite assignment using statistical total correlation spectroscopy (STOCSY); and biological interpretation of metabolome-wide association studies results. The MWASTools R package is implemented in R (version >= 3.4) and is available from Bioconductor: https://bioconductor.org/packages/MWASTools/. m.dumas@imperial.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  11. Jordanian twelfth-grade science teachers' self-reported usage of science and engineering practices in the next generation science standards

    NASA Astrophysics Data System (ADS)

    Malkawi, Amal Reda; Rababah, Ebtesam Qassim

    2018-06-01

    This study investigated the degree that Science and Engineering Practices (SEPs) criteria from the Next Generation Science Standards (NGSS) were included in self-reported teaching practices of twelfth-grade science teachers in Jordan. This study sampled (n = 315) science teachers recruited from eight different public school directorates. The sample was surveyed using an instrument adapted from Kawasaki (2015). Results found that Jordanian science teachers incorporate (SEPs) in their classroom teaching at only a moderate level. SEPs applied most frequently included 'using the diagram, table or graphic through instructions to clarify the subject of a new science,' and to 'discuss with the students how to interpret the quantitative data from the experiment or investigation'. The practice with the lowest frequency was 'teach a lesson on interpreting statistics or quantitative data,' which was moderately applied. No statistically significant differences at (α = 0.05) were found among these Jordanian science teachers' self-estimations of (SEP) application into their own teaching according to the study's demographic variables (specialisation, educational qualification, teaching experience). However, a statistically significant difference at (α = 0.05) was found among Jordanian high school science teachers' practice means based on gender, with female teachers using SEPs at a higher rate than male teachers.

  12. An empirical evaluation of genetic distance statistics using microsatellite data from bear (Ursidae) populations.

    PubMed

    Paetkau, D; Waits, L P; Clarkson, P L; Craighead, L; Strobeck, C

    1997-12-01

    A large microsatellite data set from three species of bear (Ursidae) was used to empirically test the performance of six genetic distance measures in resolving relationships at a variety of scales ranging from adjacent areas in a continuous distribution to species that diverged several million years ago. At the finest scale, while some distance measures performed extremely well, statistics developed specifically to accommodate the mutational processes of microsatellites performed relatively poorly, presumably because of the relatively higher variance of these statistics. At the other extreme, no statistic was able to resolve the close sister relationship of polar bears and brown bears from more distantly related pairs of species. This failure is most likely due to constraints on allele distributions at microsatellite loci. At intermediate scales, both within continuous distributions and in comparisons to insular populations of late Pleistocene origin, it was not possible to define the point where linearity was lost for each of the statistics, except that it is clearly lost after relatively short periods of independent evolution. All of the statistics were affected by the amount of genetic diversity within the populations being compared, significantly complicating the interpretation of genetic distance data.

  13. An Empirical Evaluation of Genetic Distance Statistics Using Microsatellite Data from Bear (Ursidae) Populations

    PubMed Central

    Paetkau, D.; Waits, L. P.; Clarkson, P. L.; Craighead, L.; Strobeck, C.

    1997-01-01

    A large microsatellite data set from three species of bear (Ursidae) was used to empirically test the performance of six genetic distance measures in resolving relationships at a variety of scales ranging from adjacent areas in a continuous distribution to species that diverged several million years ago. At the finest scale, while some distance measures performed extremely well, statistics developed specifically to accommodate the mutational processes of microsatellites performed relatively poorly, presumably because of the relatively higher variance of these statistics. At the other extreme, no statistic was able to resolve the close sister relationship of polar bears and brown bears from more distantly related pairs of species. This failure is most likely due to constraints on allele distributions at microsatellite loci. At intermediate scales, both within continuous distributions and in comparisons to insular populations of late Pleistocene origin, it was not possible to define the point where linearity was lost for each of the statistics, except that it is clearly lost after relatively short periods of independent evolution. All of the statistics were affected by the amount of genetic diversity within the populations being compared, significantly complicating the interpretation of genetic distance data. PMID:9409849
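
    As one concrete example of the kind of distance statistic evaluated in this literature, the sketch below computes Nei's standard genetic distance from invented allele frequencies for two hypothetical populations; the six specific measures compared in the paper are not reproduced here.

```python
# Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)), with J terms
# averaged over loci. Allele frequencies are invented for two populations.
import numpy as np

# pop[locus] = allele-frequency vector (each vector sums to 1).
pop_x = [np.array([0.5, 0.3, 0.2]), np.array([0.6, 0.4])]
pop_y = [np.array([0.4, 0.4, 0.2]), np.array([0.3, 0.7])]

j_xy = np.mean([np.sum(x * y) for x, y in zip(pop_x, pop_y)])
j_x = np.mean([np.sum(x * x) for x in pop_x])
j_y = np.mean([np.sum(y * y) for y in pop_y])

nei_D = -np.log(j_xy / np.sqrt(j_x * j_y))
print(f"Nei's standard genetic distance D = {nei_D:.4f}")
```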

  14. Reproducibility-optimized test statistic for ranking genes in microarray studies.

    PubMed

    Elo, Laura L; Filén, Sanna; Lahesmaa, Riitta; Aittokallio, Tero

    2008-01-01

    A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows good performance consistently under various simulated conditions and on the Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.

  15. Evaluating the decision accuracy and speed of clinical data visualizations.

    PubMed

    Pieczkiewicz, David S; Finkelstein, Stanley M

    2010-01-01

    Clinicians face an increasing volume of biomedical data. Assessing the efficacy of systems that enable accurate and timely clinical decision making merits corresponding attention. This paper discusses the multiple-reader multiple-case (MRMC) experimental design and linear mixed models as means of assessing and comparing decision accuracy and latency (time) for decision tasks in which clinician readers must interpret visual displays of data. These experimental and statistical techniques, used extensively in radiology imaging studies, offer a number of practical and analytic advantages over more traditional quantitative methods such as percent-correct measurements and ANOVAs, and are recommended for their statistical efficiency and generalizability. An example analysis using readily available, free, and commercial statistical software is provided as an appendix. While these techniques are not appropriate for all evaluation questions, they can provide a valuable addition to the evaluative toolkit of medical informatics research.
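
    A simplified, hedged sketch of the kind of linear mixed model discussed above, fitted to a simulated reader-by-case data set with a fixed display effect and a random intercept per reader; a full MRMC analysis would also model case effects, and all variable names and values here are invented.

```python
# Mixed model for decision latency in a simulated multiple-reader design:
# fixed effect of display type, random intercept per reader.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
rows = []
for reader in range(8):
    reader_effect = rng.normal(0.0, 1.0)
    for case in range(20):
        for display in ("table", "graph"):
            latency = (10.0 + (-2.0 if display == "graph" else 0.0)
                       + reader_effect + rng.normal(0.0, 1.5))
            rows.append({"reader": reader, "case": case,
                         "display": display, "latency": latency})
df = pd.DataFrame(rows)

model = smf.mixedlm("latency ~ display", df, groups=df["reader"])
print(model.fit().summary())
```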

  16. Single-case research design in pediatric psychology: considerations regarding data analysis.

    PubMed

    Cohen, Lindsey L; Feinstein, Amanda; Masuda, Akihiko; Vowles, Kevin E

    2014-03-01

    Single-case research allows for an examination of behavior and can demonstrate the functional relation between intervention and outcome in pediatric psychology. This review highlights key assumptions, methodological and design considerations, and options for data analysis. Single-case methodology and guidelines are reviewed with an in-depth focus on visual and statistical analyses. Guidelines allow for the careful evaluation of design quality and visual analysis. A number of statistical techniques have been introduced to supplement visual analysis, but to date, there is no consensus on their recommended use in single-case research design. Single-case methodology is invaluable for advancing pediatric psychology science and practice, and guidelines have been introduced to enhance the consistency, validity, and reliability of these studies. Experts generally agree that visual inspection is the optimal method of analysis in single-case design; however, statistical approaches are becoming increasingly evaluated and used to augment data interpretation.

  17. Innovations in curriculum design: A multi-disciplinary approach to teaching statistics to undergraduate medical students

    PubMed Central

    Freeman, Jenny V; Collier, Steve; Staniforth, David; Smith, Kevin J

    2008-01-01

    Background Statistics is relevant to students and practitioners in medicine and health sciences and is increasingly taught as part of the medical curriculum. However, it is common for students to dislike and under-perform in statistics. We sought to address these issues by redesigning the way that statistics is taught. Methods The project brought together a statistician, clinician and educational experts to re-conceptualize the syllabus, and focused on developing different methods of delivery. New teaching materials, including videos, animations and contextualized workbooks were designed and produced, placing greater emphasis on applying statistics and interpreting data. Results Two cohorts of students were evaluated, one with old style and one with new style teaching. Both were similar with respect to age, gender and previous level of statistics. Students who were taught using the new approach could better define the key concepts of p-value and confidence interval (p < 0.001 for both). They were more likely to regard statistics as integral to medical practice (p = 0.03), and to expect to use it in their medical career (p = 0.003). There was no significant difference in the numbers who thought that statistics was essential to understand the literature (p = 0.28) and those who felt comfortable with the basics of statistics (p = 0.06). More than half the students in both cohorts felt that they were comfortable with the basics of medical statistics. Conclusion Using a variety of media, and placing emphasis on interpretation can help make teaching, learning and understanding of statistics more people-centred and relevant, resulting in better outcomes for students. PMID:18452599

  18. ANALYSIS OF A CLASSIFICATION ERROR MATRIX USING CATEGORICAL DATA TECHNIQUES.

    USGS Publications Warehouse

    Rosenfield, George H.; Fitzpatrick-Lins, Katherine

    1984-01-01

    Summary form only given. A classification error matrix typically contains the tabulated results of an accuracy evaluation of a thematic classification, such as a land use and land cover map. The diagonal elements of the matrix represent the correctly classified counts, and the usual measure of classification accuracy has been the total percent correct. The off-diagonal elements of the matrix have usually been neglected. In statistical terms, the classification error matrix is a contingency table of categorical data. As an example, an application of these methodologies to remotely sensed data involving two photointerpreters and four classification categories indicated that there is no significant difference in interpretation between the two photointerpreters, but that there are significant differences among the interpreted category classifications. However, two categories, oak and cottonwood, are not separable in this experiment at the 0.51 percent probability level. A coefficient of agreement is determined for the interpreted map as a whole, and individually for each of the interpreted categories. A conditional coefficient of agreement for the individual categories is compared to other methods for expressing category accuracy that have already been presented in the remote sensing literature.
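
    The core quantities are easy to compute once the error matrix is treated as a contingency table. A small numpy sketch with a made-up 4-category matrix gives the overall percent correct, an overall kappa coefficient of agreement, and per-category accuracies:

      import numpy as np

      # Hypothetical 4-category classification error matrix
      # (rows = interpreted class, columns = reference class).
      cm = np.array([[45,  4,  3,  2],
                     [ 5, 38,  6,  1],
                     [ 2,  7, 40,  5],
                     [ 1,  2,  4, 35]], dtype=float)

      n = cm.sum()
      p_o = np.trace(cm) / n                         # overall percent correct
      p_e = (cm.sum(1) * cm.sum(0)).sum() / n**2     # chance agreement
      kappa = (p_o - p_e) / (1 - p_e)                # coefficient of agreement

      users_acc = np.diag(cm) / cm.sum(axis=1)       # per interpreted category
      producers_acc = np.diag(cm) / cm.sum(axis=0)   # per reference category

      print(f"overall accuracy = {p_o:.3f}, kappa = {kappa:.3f}")
      print("user's accuracy:    ", np.round(users_acc, 3))
      print("producer's accuracy:", np.round(producers_acc, 3))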

  19. Evaluating Random Error in Clinician-Administered Surveys: Theoretical Considerations and Clinical Applications of Interobserver Reliability and Agreement.

    PubMed

    Bennett, Rebecca J; Taljaard, Dunay S; Olaithe, Michelle; Brennan-Jones, Chris; Eikelboom, Robert H

    2017-09-18

    The purpose of this study is to raise awareness of interobserver concordance and of the differences between interobserver reliability and agreement when evaluating the responsiveness of a clinician-administered survey, and specifically to demonstrate the clinical implications of the data type (nominal/categorical, ordinal, interval, or ratio) and of statistical index selection (for example, Cohen's kappa, Krippendorff's alpha, or intraclass correlation). In this prospective cohort study, 3 clinical audiologists, who were masked to each other's scores, administered the Practical Hearing Aid Skills Test-Revised to 18 adult owners of hearing aids. Interobserver concordance was examined using a range of reliability and agreement statistical indices. The importance of selecting appropriate statistical measures of concordance was demonstrated with a worked example, wherein the level of interobserver concordance achieved varied from "no agreement" to "almost perfect agreement" depending on the data type and statistical index selected. This study demonstrates that the methodology used to evaluate survey score concordance can influence the statistical results obtained and thus affect clinical interpretations.
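
    The point that the chosen index drives the conclusion can be reproduced in a few lines: on the same hypothetical ordinal ratings, raw percent agreement, unweighted kappa and linearly weighted kappa give different impressions of concordance.

      import numpy as np
      from sklearn.metrics import cohen_kappa_score

      # Hypothetical ordinal scores (0-4) from two observers on 18 clients.
      obs1 = np.array([0, 1, 2, 2, 3, 4, 4, 1, 2, 3, 0, 1, 2, 3, 4, 2, 1, 3])
      obs2 = np.array([0, 2, 2, 3, 3, 4, 3, 1, 1, 3, 1, 1, 2, 4, 4, 2, 2, 3])

      pct_agree = np.mean(obs1 == obs2)
      kappa_nominal = cohen_kappa_score(obs1, obs2)                     # treats scores as nominal
      kappa_weighted = cohen_kappa_score(obs1, obs2, weights="linear")  # credits near-misses

      print(f"percent agreement = {pct_agree:.2f}")
      print(f"unweighted kappa  = {kappa_nominal:.2f}")
      print(f"weighted kappa    = {kappa_weighted:.2f}")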

  20. 75 FR 22565 - Privacy Act of 1974; System of Records

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-29

    ... collect, assemble, interpret, analyze, report and publish surveys; research, study, statistical and... commercial entities, for surveys or research, where such releases are consistent with the mission of the...): To collect, assemble, interpret, analyze, report and publish surveys; research, study, statistical...

  1. Quantifying Treatment Benefit in Molecular Subgroups to Assess a Predictive Biomarker.

    PubMed

    Iasonos, Alexia; Chapman, Paul B; Satagopan, Jaya M

    2016-05-01

    There is increasing interest in finding predictive biomarkers that can guide treatment options for both mutation carriers and noncarriers. The statistical assessment of variation in treatment benefit (TB) according to biomarker carrier status plays an important role in evaluating predictive biomarkers. For time-to-event endpoints, the hazard ratio (HR) for the interaction between treatment and a biomarker from a proportional hazards regression model is commonly used as a measure of variation in TB. Although this can be easily obtained using available statistical software packages, the interpretation of the HR is not straightforward. In this article, we propose different summary measures of variation in TB on the scale of survival probabilities for evaluating a predictive biomarker. The proposed summary measures can be easily interpreted as quantifying the differential in TB in terms of relative risk or excess absolute risk due to treatment in carriers versus noncarriers. We illustrate the use and interpretation of the proposed measures with data from completed clinical trials. We encourage clinical practitioners to interpret variation in TB in terms of measures based on survival probabilities, particularly in terms of excess absolute risk, as opposed to the HR. Clin Cancer Res; 22(9); 2114-20. ©2016 AACR. ©2016 American Association for Cancer Research.
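
    A purely numeric illustration of the argument (not the authors' estimators): assuming exponential survival with hypothetical baseline hazards and treatment hazard ratios, the same HRs translate into very different absolute survival benefits for carriers and noncarriers.

      import numpy as np

      # Hypothetical setup: control-arm event hazards per year and treatment
      # hazard ratios, separately for biomarker carriers and noncarriers.
      h_control = {"carrier": 0.30, "noncarrier": 0.20}
      hr_treat = {"carrier": 0.50, "noncarrier": 0.90}
      t = 2.0  # evaluate survival at 2 years

      def surv(h, t):
          """Exponential survival probability S(t) = exp(-h t)."""
          return np.exp(-h * t)

      for g in ("carrier", "noncarrier"):
          s_ctrl = surv(h_control[g], t)
          s_trt = surv(h_control[g] * hr_treat[g], t)
          print(f"{g:10s}  S_control={s_ctrl:.3f}  S_treated={s_trt:.3f}  "
                f"absolute benefit={s_trt - s_ctrl:.3f}")

      # Differential treatment benefit on the absolute-risk scale
      # (carriers minus noncarriers), as advocated over the interaction HR.
      benefit = {g: surv(h_control[g] * hr_treat[g], t) - surv(h_control[g], t)
                 for g in h_control}
      print(f"excess absolute benefit in carriers = "
            f"{benefit['carrier'] - benefit['noncarrier']:.3f}")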

  2. Invited Commentary: Antecedents of Obesity—Analysis, Interpretation, and Use of Longitudinal Data

    PubMed Central

    Gillman, Matthew W.; Kleinman, Ken

    2007-01-01

    The obesity epidemic causes misery and death. Most epidemiologists accept the hypothesis that characteristics of the early stages of human development have lifelong influences on obesity-related health outcomes. Unfortunately, there is a dearth of data of sufficient scope and individual history to help unravel the associations of prenatal, postnatal, and childhood factors with adult obesity and health outcomes. Here the authors discuss analytic methods, the interpretation of models, and the use to which such rare and valuable data may be put in developing interventions to combat the epidemic. For example, analytic methods such as quantile and multinomial logistic regression can describe the effects on body mass index range rather than just its mean; structural equation models may allow comparison of the contributions of different factors at different periods in the life course. Interpretation of the data and model construction is complex, and it requires careful consideration of the biologic plausibility and statistical interpretation of putative causal factors. The goals of discovering modifiable determinants of obesity during the prenatal, postnatal, and childhood periods must be kept in sight, and analyses should be built to facilitate them. Ultimately, interventions in these factors may help prevent obesity-related adverse health outcomes for future generations. PMID:17490988
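
    As a sketch of one of the analytic methods mentioned, quantile regression in statsmodels can describe an exposure's association with the upper tail of the BMI distribution rather than only its mean; the data file and column names are hypothetical.

      import pandas as pd
      import statsmodels.formula.api as smf

      # Hypothetical longitudinal-cohort extract: adult BMI and an early-life exposure.
      df = pd.read_csv("cohort.csv")  # columns assumed: 'bmi', 'exposure', 'sex', 'age'

      # Ordinary least squares describes the effect on mean BMI...
      ols_fit = smf.ols("bmi ~ exposure + sex + age", df).fit()

      # ...while quantile regression describes effects elsewhere in the distribution,
      # e.g. the 90th percentile, where obesity-related risk concentrates.
      q90_fit = smf.quantreg("bmi ~ exposure + sex + age", df).fit(q=0.9)

      print(ols_fit.params["exposure"], q90_fit.params["exposure"])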

  3. Data-Intensive Science and Research Integrity.

    PubMed

    Resnik, David B; Elliott, Kevin C; Soranno, Patricia A; Smith, Elise M

    2017-01-01

    In this commentary, we consider questions related to research integrity in data-intensive science and argue that there is no need to create a distinct category of misconduct that applies to deception related to processing, analyzing, or interpreting data. The best way to promote integrity in data-intensive science is to maintain a firm commitment to epistemological and ethical values, such as honesty, openness, transparency, and objectivity, which apply to all types of research, and to promote education, policy development, and scholarly debate concerning appropriate uses of statistics.

  4. Dynamic changes of RNA-sequencing expression for precision medicine: N-of-1-pathways Mahalanobis distance within pathways of single subjects predicts breast cancer survival

    PubMed Central

    Piegorsch, Walter W.; Lussier, Yves A.

    2015-01-01

    Motivation: The conventional approach to personalized medicine relies on molecular data analytics across multiple patients. The path to precision medicine lies with molecular data analytics that can discover interpretable single-subject signals (N-of-1). We developed a global framework, N-of-1-pathways, for a mechanism-anchored approach to single-subject gene expression data analysis. We previously employed a metric that could prioritize the statistical significance of a deregulated pathway in single subjects; however, it lacked quantitative interpretability (e.g. the equivalent of a gene expression fold-change). Results: In this study, we extend our previous approach with the application of the statistical Mahalanobis distance (MD) to quantify personal pathway-level deregulation. We demonstrate that this approach, N-of-1-pathways Paired Samples MD (N-OF-1-PATHWAYS-MD), detects deregulated pathways (empirical simulations) while not inflating the false-positive rate in a study with biological replicates. Finally, we establish that N-OF-1-PATHWAYS-MD scores are biologically significant, clinically relevant and predictive of breast cancer survival (P < 0.05, n = 80 invasive carcinoma; TCGA RNA-sequences). Conclusion: N-of-1-pathways MD provides a practical approach towards precision medicine. The method generates the magnitude and the biological significance of personal deregulated-pathway results derived solely from the patient’s transcriptome. These pathways offer opportunities for deriving clinically actionable decisions that have the potential to complement the clinical interpretability of personal DNA variation, whether acquired or inherited polymorphisms and mutations. In addition, it offers an opportunity for applicability to diseases in which DNA changes may not be relevant, and thus expands the interpretable 'omics of single subjects (e.g. the personalome). Availability and implementation: http://www.lussierlab.net/publications/N-of-1-pathways. Contact: yves@email.arizona.edu or piegorsch@math.arizona.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26072495

  5. Searching for hidden unexpected features in the SnIa data

    NASA Astrophysics Data System (ADS)

    Shafieloo, A.; Perivolaropoulos, L.

    2010-06-01

    It is known that the χ2 statistic and likelihood analysis may not be sensitive to all features of the data. Although the χ2 statistic measures the overall goodness of fit of a model confronted with a data set, some specific features of the data can remain undetected. For instance, it has been pointed out that there is an unexpected brightness of the SnIa data at z > 1 in the Union compilation. We quantify this statement by constructing a new statistic, called the Binned Normalized Difference (BND) statistic, which is applicable directly to the Type Ia Supernova (SnIa) distance moduli. This statistic is designed to pick up systematic brightness trends of SnIa data points with respect to a best-fit cosmological model at high redshifts. According to this statistic, there is 2.2%, 5.3% and 12.6% consistency between the Gold06, Union08 and Constitution09 data, respectively, and a spatially flat ΛCDM model when the real data are compared with many Monte Carlo realizations of simulated datasets. The corresponding realization probability in the context of a (w0,w1) = (-1.4,2) model is more than 30% for all the mentioned datasets, indicating a much better consistency for this model with respect to the BND statistic. The unexpected high-z brightness of SnIa can be interpreted either as a trend towards more deceleration at high z than expected in the context of ΛCDM, or as a statistical fluctuation, or finally as a systematic effect, perhaps due to mild SnIa evolution at high z.
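
    The binning-and-simulation logic can be sketched as follows; this is a schematic of the approach, not the published BND definition, and the model, errors and redshift bins are invented for illustration.

      import numpy as np

      rng = np.random.default_rng(1)

      def bnd_like(z, mu_obs, mu_model, sigma, z_edges):
          """Mean normalized residual per redshift bin (schematic BND-style statistic)."""
          r = (mu_obs - mu_model) / sigma
          idx = np.digitize(z, z_edges) - 1
          return np.array([r[idx == b].mean() for b in range(len(z_edges) - 1)])

      # Hypothetical inputs: redshifts, observed/model distance moduli, errors.
      n = 300
      z = rng.uniform(0.02, 1.6, n)
      sigma = np.full(n, 0.2)
      mu_model = 5 * np.log10((1 + z) * z) + 43.0      # stand-in for a best-fit model
      mu_obs = mu_model + rng.normal(0, sigma)         # data consistent with the model here
      z_edges = np.array([0.0, 0.4, 0.8, 1.2, 1.7])

      stat_obs = bnd_like(z, mu_obs, mu_model, sigma, z_edges)[-1]  # focus on the high-z bin

      # Monte Carlo calibration: how often do simulated realizations of the model
      # produce a high-z bin at least as bright (as negative) as the data?
      sims = np.array([bnd_like(z, mu_model + rng.normal(0, sigma), mu_model, sigma, z_edges)[-1]
                       for _ in range(2000)])
      p = np.mean(sims <= stat_obs)
      print(f"high-z bin statistic = {stat_obs:.3f}, MC consistency probability = {p:.3f}")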

  6. Statistical Data Editing in Scientific Articles.

    PubMed

    Habibzadeh, Farrokh

    2017-07-01

    Scientific journals are important scholarly forums for sharing research findings. Editors have important roles in safeguarding standards of scientific publication and should be familiar with correct presentation of results, among other core competencies. Editors do not have access to the raw data and should thus rely on clues in the submitted manuscripts. To identify probable errors, they should look for inconsistencies in presented results. Common statistical problems that can be picked up by a knowledgeable manuscript editor are discussed in this article. Manuscripts should contain a detailed section on statistical analyses of the data. Numbers should be reported with appropriate precision. Standard error of the mean (SEM) should not be reported as an index of data dispersion. Mean (standard deviation [SD]) and median (interquartile range [IQR]) should be used for description of normally and non-normally distributed data, respectively. If possible, it is better to report 95% confidence intervals (CIs) for statistics, at least for the main outcome variables. And P values should be presented, and interpreted with caution, if there is a hypothesis. To advance the knowledge and skills of their members, associations of journal editors would do well to develop training courses on basic statistics and research methodology for non-experts. This would in turn improve research reporting and safeguard the body of scientific evidence. © 2017 The Korean Academy of Medical Sciences.
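
    The reporting advice translates into a few lines of code: check (approximate) normality, report mean (SD) or median (IQR) accordingly, and give a 95% CI for the main outcome; the data here are simulated.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)
      x = rng.lognormal(mean=1.0, sigma=0.6, size=60)   # hypothetical skewed outcome

      # Shapiro-Wilk as a rough normality check (inspecting a histogram is also advisable).
      w, p_norm = stats.shapiro(x)

      if p_norm > 0.05:
          print(f"mean (SD): {x.mean():.2f} ({x.std(ddof=1):.2f})")
      else:
          q1, med, q3 = np.percentile(x, [25, 50, 75])
          print(f"median (IQR): {med:.2f} ({q1:.2f}-{q3:.2f})")

      # 95% CI for the mean (t-based); note that SEM alone is not a dispersion measure.
      ci = stats.t.interval(0.95, df=len(x) - 1, loc=x.mean(), scale=stats.sem(x))
      print(f"95% CI for the mean: {ci[0]:.2f} to {ci[1]:.2f}")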

  7. Volume and Value of Big Healthcare Data.

    PubMed

    Dinov, Ivo D

    Modern scientific inquiries require significant data-driven evidence and trans-disciplinary expertise to extract valuable information and gain actionable knowledge about natural processes. Effective evidence-based decisions require collection, processing and interpretation of vast amounts of complex data. The Moore's and Kryder's laws of exponential increase of computational power and information storage, respectively, dictate the need for rapid trans-disciplinary advances, technological innovation and effective mechanisms for managing and interrogating Big Healthcare Data. In this article, we review important aspects of Big Data analytics and discuss important questions like: What are the challenges and opportunities associated with this biomedical, social, and healthcare data avalanche? Are there innovative statistical computing strategies to represent, model, analyze and interpret Big heterogeneous data? We present the foundation of a new compressive big data analytics (CBDA) framework for representation, modeling and inference of large, complex and heterogeneous datasets. Finally, we consider specific directions likely to impact the process of extracting information from Big healthcare data, translating that information to knowledge, and deriving appropriate actions.

  8. Volume and Value of Big Healthcare Data

    PubMed Central

    Dinov, Ivo D.

    2016-01-01

    Modern scientific inquiries require significant data-driven evidence and trans-disciplinary expertise to extract valuable information and gain actionable knowledge about natural processes. Effective evidence-based decisions require collection, processing and interpretation of vast amounts of complex data. The Moore's and Kryder's laws of exponential increase of computational power and information storage, respectively, dictate the need for rapid trans-disciplinary advances, technological innovation and effective mechanisms for managing and interrogating Big Healthcare Data. In this article, we review important aspects of Big Data analytics and discuss important questions like: What are the challenges and opportunities associated with this biomedical, social, and healthcare data avalanche? Are there innovative statistical computing strategies to represent, model, analyze and interpret Big heterogeneous data? We present the foundation of a new compressive big data analytics (CBDA) framework for representation, modeling and inference of large, complex and heterogeneous datasets. Finally, we consider specific directions likely to impact the process of extracting information from Big healthcare data, translating that information to knowledge, and deriving appropriate actions. PMID:26998309

  9. "Describing our whole experience": the statistical philosophies of W. F. R. Weldon and Karl Pearson.

    PubMed

    Pence, Charles H

    2011-12-01

    There are two motivations commonly ascribed to historical actors for taking up statistics: to reduce complicated data to a mean value (e.g., Quetelet), and to take account of diversity (e.g., Galton). Different motivations will, it is assumed, lead to different methodological decisions in the practice of the statistical sciences. Karl Pearson and W. F. R. Weldon are generally seen as following directly in Galton's footsteps. I argue for two related theses in light of this standard interpretation, based on a reading of several sources in which Weldon, independently of Pearson, reflects on his own motivations. First, while Pearson does approach statistics from this "Galtonian" perspective, he is, consistent with his positivist philosophy of science, utilizing statistics to simplify the highly variable data of biology. Weldon, on the other hand, is brought to statistics by a rich empiricism and a desire to preserve the diversity of biological data. Secondly, we have here a counterexample to the claim that divergence in motivation will lead to a corresponding separation in methodology. Pearson and Weldon, despite embracing biometry for different reasons, settled on precisely the same set of statistical tools for the investigation of evolution. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. Martian ages

    NASA Technical Reports Server (NTRS)

    Neukum, G.; Hiller, K.

    1981-01-01

    Four discussions are conducted: (1) the methodology of relative age determination by impact crater statistics, (2) a comparison of proposed Martian impact chronologies for the determination of absolute ages from crater frequencies, (3) a report on work dating Martian volcanoes and erosional features by impact crater statistics, and (4) an attempt to understand the main features of Martian history through a synthesis of crater frequency data. Two cratering chronology models are presented and used for inference of absolute ages from crater frequency data, and it is shown that the interpretation of all data available and tractable by the methodology presented leads to a global Martian geological history that is characterized by two epochs of activity. It is concluded that Mars is an ancient planet with respect to its surface features.

  11. Characterizing multivariate decoding models based on correlated EEG spectral features.

    PubMed

    McFarland, Dennis J

    2013-07-01

    Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Data from sensorimotor rhythm-based cursor control experiments were analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order, which produced predictors that varied in their degree of correlation (i.e., multicollinearity). The use of multivariate regression models resulted in much better prediction of target position than univariate regression models. However, with lower-order AR features, interpretation of the spectral patterns of the weights was difficult. This is likely due to the high degree of multicollinearity present with lower-order AR features. Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. While multivariate decoding algorithms are very useful for prediction, their utility for interpretation may be limited when predictors are correlated. Copyright © 2013 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
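
    The multicollinearity problem is easy to demonstrate with a small simulation: with two highly correlated predictors, bootstrap refits of a multivariate model give unstable weights even though predictive accuracy stays high, while univariate correlations remain stable. The data and predictor structure below are invented for illustration.

      import numpy as np

      rng = np.random.default_rng(3)
      n = 200

      # Two highly correlated "spectral" predictors (e.g. power in adjacent bands).
      f1 = rng.normal(size=n)
      f2 = 0.95 * f1 + 0.05 * rng.normal(size=n)       # multicollinear with f1
      y = 1.0 * f1 + 0.0 * f2 + rng.normal(0, 0.5, n)  # only f1 truly drives the output

      X = np.column_stack([np.ones(n), f1, f2])
      for trial in range(3):                            # refit on bootstrap resamples
          idx = rng.integers(0, n, n)
          w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
          r2 = 1 - np.var(y[idx] - X[idx] @ w) / np.var(y[idx])
          print(f"weights: f1={w[1]:+.2f} f2={w[2]:+.2f}   R^2={r2:.2f}")

      # Univariate correlations stay stable and are easier to interpret.
      print("corr(y, f1) =", np.round(np.corrcoef(y, f1)[0, 1], 2),
            " corr(y, f2) =", np.round(np.corrcoef(y, f2)[0, 1], 2))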

  12. Consistency of response and image recognition, pulmonary nodules

    PubMed Central

    Liu, M A Q; Galvan, E; Bassett, R; Murphy, W A; Matamoros, A; Marom, E M

    2014-01-01

    Objective: To investigate the effect of recognition of a previously encountered radiograph on consistency of response in localized pulmonary nodules. Methods: 13 radiologists interpreted 40 radiographs each to locate pulmonary nodules. A few days later, they again interpreted 40 radiographs. Half of the images in the second set were new. We asked the radiologists whether each image had been in the first set. We used Fisher's exact test and Kruskal–Wallis test to evaluate the correlation between recognition of an image and consistency in its interpretation. We evaluated the data using all possible recognition levels—definitely, probably or possibly included vs definitely, probably or possibly not included by collapsing the recognition levels into two and by eliminating the “possibly included” and “possibly not included” scores. Results: With all but one of six methods of looking at the data, there was no significant correlation between consistency in interpretation and recognition of the image. When the possibly included and possibly not included scores were eliminated, there was a borderline statistical significance (p = 0.04) with slightly greater consistency in interpretation of recognized than that of non-recognized images. Conclusion: We found no convincing evidence that radiologists' recognition of images in an observer performance study affects their interpretation on a second encounter. Advances in knowledge: Conscious recognition of chest radiographs did not result in a greater degree of consistency in the tested interpretation than that in the interpretation of images that were not recognized. PMID:24697724

  13. Response to Comment on "Estimating the reproducibility of psychological science".

    PubMed

    Anderson, Christopher J; Bahník, Štěpán; Barnett-Cowan, Michael; Bosco, Frank A; Chandler, Jesse; Chartier, Christopher R; Cheung, Felix; Christopherson, Cody D; Cordes, Andreas; Cremata, Edward J; Della Penna, Nicolas; Estel, Vivien; Fedor, Anna; Fitneva, Stanka A; Frank, Michael C; Grange, James A; Hartshorne, Joshua K; Hasselman, Fred; Henninger, Felix; van der Hulst, Marije; Jonas, Kai J; Lai, Calvin K; Levitan, Carmel A; Miller, Jeremy K; Moore, Katherine S; Meixner, Johannes M; Munafò, Marcus R; Neijenhuijs, Koen I; Nilsonne, Gustav; Nosek, Brian A; Plessow, Franziska; Prenoveau, Jason M; Ricker, Ashley A; Schmidt, Kathleen; Spies, Jeffrey R; Stieger, Stefan; Strohminger, Nina; Sullivan, Gavin B; van Aert, Robbie C M; van Assen, Marcel A L M; Vanpaemel, Wolf; Vianello, Michelangelo; Voracek, Martin; Zuni, Kellylynn

    2016-03-04

    Gilbert et al. conclude that evidence from the Open Science Collaboration's Reproducibility Project: Psychology indicates high reproducibility, given the study methodology. Their very optimistic assessment is limited by statistical misconceptions and by causal inferences from selectively interpreted, correlational data. Using the Reproducibility Project: Psychology data, both optimistic and pessimistic conclusions about reproducibility are possible, and neither are yet warranted. Copyright © 2016, American Association for the Advancement of Science.

  14. Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment

    ERIC Educational Resources Information Center

    Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos

    2013-01-01

    In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…

  15. Computers and Student Learning: Interpreting the Multivariate Analysis of PISA 2000

    ERIC Educational Resources Information Center

    Bielefeldt, Talbot

    2005-01-01

    In November 2004, economists Thomas Fuchs and Ludger Woessmann published a statistical analysis of the relationship between technology and student achievement using year 2000 data from the Programme for International Student Assessment (PISA). The 2000 PISA was the first in a series of triennial assessments of 15-year-olds conducted by the…

  16. The Inter-Temporal Aspect of Well-Being and Societal Progress

    ERIC Educational Resources Information Center

    Sicherl, Pavle

    2007-01-01

    The perceptions on well-being and societal progress are influenced also by the quantitative indicators and measures used in the measurement, presentation and semantics of discussing these issues. The article presents a novel generic statistical measure S-time-distance, with clear interpretability that delivers a broader concept to look at data, to…

  17. Downscaling Indicators of Forest Habitat Structure from National Assessments

    Treesearch

    Kurt H. Riitters

    2005-01-01

    Downscaling is an important problem because consistent large-area assessments of forest habitat structure are only feasible when using relatively coarse data and indicators. Techniques are needed to enable more detailed and local interpretations of the national statistics. Using the results of national assessments from land-cover maps, this paper...

  18. A Survey of Scholarly Literature Describing the Field of Bioinformatics Education and Bioinformatics Educational Research

    ERIC Educational Resources Information Center

    Magana, Alejandra J.; Taleyarkhan, Manaz; Alvarado, Daniela Rivera; Kane, Michael; Springer, John; Clase, Kari

    2014-01-01

    Bioinformatics education can be broadly defined as the teaching and learning of the use of computer and information technology, along with mathematical and statistical analysis for gathering, storing, analyzing, interpreting, and integrating data to solve biological problems. The recent surge of genomics, proteomics, and structural biology in the…

  19. Biometrics in the Medical School Curriculum: Making the Necessary Relevant.

    ERIC Educational Resources Information Center

    Murphy, James R.

    1980-01-01

    Because a student is more likely to learn and retain course content perceived as relevant, an attempt was made to change medical students' perceptions of a biometrics course by introducing statistical methods as a means of solving problems in the interpretation of clinical lab data. Retrospective analysis of student course evaluations indicates a…

  20. Statistical assessment of DNA extraction reagent lot variability in real-time quantitative PCR

    USGS Publications Warehouse

    Bushon, R.N.; Kephart, C.M.; Koltun, G.F.; Francy, D.S.; Schaefer, F. W.; Lindquist, H.D. Alan

    2010-01-01

    Aims: The aim of this study was to evaluate the variability in lots of a DNA extraction kit using real-time PCR assays for Bacillus anthracis, Francisella tularensis and Vibrio cholerae. Methods and Results: Replicate aliquots of three bacteria were processed in duplicate with three different lots of a commercial DNA extraction kit. This experiment was repeated in triplicate. Results showed that cycle threshold values were statistically different among the different lots. Conclusions: Differences in DNA extraction reagent lots were found to be a significant source of variability for qPCR results. Steps should be taken to ensure the quality and consistency of reagents. Minimally, we propose that standard curves should be constructed for each new lot of extraction reagents, so that lot-to-lot variation is accounted for in data interpretation. Significance and Impact of the Study: This study highlights the importance of evaluating variability in DNA extraction procedures, especially when different reagent lots are used. Consideration of this variability in data interpretation should be an integral part of studies investigating environmental samples with unknown concentrations of organisms. ?? 2010 The Society for Applied Microbiology.
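
    A hedged sketch of the kind of check the study implies (not the authors' analysis): a one-way ANOVA on replicate Ct values across reagent lots, plus a per-lot standard-curve fit of the sort recommended for each new lot; all values are hypothetical.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(4)

      # Hypothetical replicate Ct values for one organism, three extraction-kit lots.
      ct_lot = {
          "lot_A": 24.1 + rng.normal(0, 0.3, 6),
          "lot_B": 24.9 + rng.normal(0, 0.3, 6),   # shifted lot
          "lot_C": 24.2 + rng.normal(0, 0.3, 6),
      }
      f, p = stats.f_oneway(*ct_lot.values())
      print(f"one-way ANOVA across lots: F={f:.2f}, p={p:.4f}")

      # Per-lot standard curve: Ct vs log10(concentration), used to absorb
      # lot-to-lot differences in downstream quantification.
      log_conc = np.array([5, 4, 3, 2, 1], dtype=float)
      ct_std = 38.0 - 3.3 * log_conc + rng.normal(0, 0.2, 5)  # hypothetical dilution series
      slope, intercept, r, _, _ = stats.linregress(log_conc, ct_std)
      print(f"standard curve: Ct = {intercept:.1f} + ({slope:.2f})*log10(conc), r^2={r**2:.3f}")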

  1. Statistical Reform in School Psychology Research: A Synthesis

    ERIC Educational Resources Information Center

    Swaminathan, Hariharan; Rogers, H. Jane

    2007-01-01

    Statistical reform in school psychology research is discussed in terms of research designs, measurement issues, statistical modeling and analysis procedures, interpretation and reporting of statistical results, and finally statistics education.

  2. Interpretation of the results of statistical measurements. [search for basic probability model

    NASA Technical Reports Server (NTRS)

    Olshevskiy, V. V.

    1973-01-01

    For random processes, the calculated probability characteristic and the measured statistical estimate are used in a quality functional that defines the difference between the two functions. Based on the assumption that the statistical measurement procedure is organized so that the parameters of a selected model are optimized, it is shown that the interpretation of experimental research is a search for a basic probability model.

  3. Inference of median difference based on the Box-Cox model in randomized clinical trials.

    PubMed

    Maruo, K; Isogawa, N; Gosho, M

    2015-05-10

    In randomized clinical trials, many medical and biological measurements are not normally distributed and are often skewed. The Box-Cox transformation is a powerful procedure for comparing two treatment groups for skewed continuous variables in terms of a statistical test. However, it is difficult to directly estimate and interpret the location difference between the two groups on the original scale of the measurement. We propose a helpful method that infers the difference of the treatment effect on the original scale in a more easily interpretable form. We also provide statistical analysis packages that consistently include an estimate of the treatment effect, covariance adjustments, standard errors, and statistical hypothesis tests. The simulation study that focuses on randomized parallel group clinical trials with two treatment groups indicates that the performance of the proposed method is equivalent to or better than that of the existing non-parametric approaches in terms of the type-I error rate and power. We illustrate our method with cluster of differentiation 4 data in an acquired immune deficiency syndrome clinical trial. Copyright © 2015 John Wiley & Sons, Ltd.
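
    The general idea, though not the authors' exact inference procedure, can be sketched with scipy: estimate a common Box-Cox lambda, compare groups on the transformed scale, and back-transform to report a location difference on the original scale; the data are simulated.

      import numpy as np
      from scipy import stats
      from scipy.special import inv_boxcox

      rng = np.random.default_rng(5)

      # Hypothetical skewed outcomes (e.g. CD4 counts) in two treatment groups.
      grp_a = rng.lognormal(5.5, 0.5, 80)
      grp_b = rng.lognormal(5.8, 0.5, 80)

      # Estimate a common Box-Cox lambda from the pooled data, then transform.
      pooled = np.concatenate([grp_a, grp_b])
      _, lam = stats.boxcox(pooled)
      ta = stats.boxcox(grp_a, lmbda=lam)
      tb = stats.boxcox(grp_b, lmbda=lam)

      t_stat, p = stats.ttest_ind(ta, tb)            # comparison on the transformed scale

      # Back-transform the group means of the transformed data to the original scale
      # and report their difference as an interpretable location difference.
      loc_a = inv_boxcox(ta.mean(), lam)
      loc_b = inv_boxcox(tb.mean(), lam)
      print(f"lambda={lam:.2f}, p={p:.4f}, back-transformed difference={loc_b - loc_a:.1f}")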

  4. CRN5EXP: Expert system for statistical quality control

    NASA Technical Reports Server (NTRS)

    Hentea, Mariana

    1991-01-01

    The purpose of the Expert System CRN5EXP is to assist in checking the quality of the coils at two very important mills, Hot Rolling and Cold Rolling, in a steel plant. The system interprets the statistical quality control charts, and diagnoses and predicts the quality of the steel. Measurements of process control variables are recorded in a database, and sample statistics such as the mean and the range are computed and plotted on a control chart. The chart is analyzed for patterns using the C Language Integrated Production System (CLIPS) and a forward-chaining technique to reach a conclusion about the causes of defects and to take management measures for the improvement of quality control techniques. The Expert System combines the certainty factors associated with the process control variables to predict the quality of the steel. The paper presents the approach used to extract data from the database, the rationale for combining certainty factors, and the architecture and use of the Expert System. The interpretation of control chart patterns, however, requires the human expert's knowledge, which lends itself to expression as expert system rules.
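
    The chart construction step the system relies on is standard: X-bar and R control limits from subgroup statistics (using the usual constants for subgroups of size 5), plus a simple run rule of the kind such pattern rules encode. The sketch below is illustrative, not part of CRN5EXP, and the data are simulated.

      import numpy as np

      rng = np.random.default_rng(6)

      # Hypothetical coil measurements: 30 subgroups of 5 samples each.
      data = rng.normal(10.0, 0.2, size=(30, 5))
      data[25:] += 0.35                               # simulate a process shift

      xbar = data.mean(axis=1)
      rng_ = data.max(axis=1) - data.min(axis=1)
      xbar_bar, r_bar = xbar.mean(), rng_.mean()

      # Standard control-chart constants for subgroup size n = 5.
      A2, D3, D4 = 0.577, 0.0, 2.114
      ucl_x, lcl_x = xbar_bar + A2 * r_bar, xbar_bar - A2 * r_bar
      ucl_r, lcl_r = D4 * r_bar, D3 * r_bar

      out_of_limits = np.where((xbar > ucl_x) | (xbar < lcl_x))[0]
      # Simple pattern rule: 8 consecutive subgroup means on the same side of the centre line.
      above = xbar > xbar_bar
      runs = [i for i in range(len(above) - 7) if above[i:i + 8].all() or (~above[i:i + 8]).all()]

      print("X-bar limits:", round(lcl_x, 3), round(ucl_x, 3))
      print("points beyond limits:", out_of_limits, " run-rule violations start at:", runs)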

  5. A Comparative Evaluation of Mixed Dentition Analysis on Reliability of Cone Beam Computed Tomography Image Compared to Plaster Model.

    PubMed

    Gowd, Snigdha; Shankar, T; Dash, Samarendra; Sahoo, Nivedita; Chatterjee, Suravi; Mohanty, Pritam

    2017-01-01

    The aim of the study was to evaluate the reliability of cone beam computed tomography (CBCT) derived images against plaster models for mixed dentition analysis. Thirty CBCT-derived images and thirty plaster models were retrieved from the dental archives, and Moyer's and Tanaka-Johnston analyses were performed. The data obtained were interpreted and analyzed statistically using SPSS 10.0/PC (SPSS Inc., Chicago, IL, USA). Descriptive and analytical statistics, along with Student's t-test, were used to evaluate the data, and P < 0.05 was considered statistically significant. Statistically significant differences were found between the CBCT-derived images and the plaster models; the mean for Moyer's analysis in the left and right lower arch was 21.2 mm and 21.1 mm for CBCT and 22.5 mm and 22.5 mm for the plaster model, respectively. CBCT-derived images were less reliable than data obtained directly from plaster models for mixed dentition analysis.
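
    For reference, the commonly cited form of the Tanaka-Johnston prediction adds 10.5 mm (mandibular arch) or 11.0 mm (maxillary arch) to half the summed mesiodistal widths of the four mandibular incisors; treat these constants as the textbook convention, to be verified against local data before clinical use.

      def tanaka_johnston(sum_mand_incisors_mm: float) -> dict:
          """Predicted canine + premolar segment width per quadrant (commonly cited form)."""
          half = sum_mand_incisors_mm / 2.0
          return {"mandibular": half + 10.5, "maxillary": half + 11.0}

      # Example: four mandibular incisors measuring 23.0 mm in total
      # (whether measured on a plaster model or a calibrated CBCT-derived image).
      print(tanaka_johnston(23.0))   # {'mandibular': 22.0, 'maxillary': 22.5}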

  6. Crater-based dating of geological units on Mars: methods and application for the new global geological map

    USGS Publications Warehouse

    Platz, Thomas; Michael, Gregory; Tanaka, Kenneth L.; Skinner, James A.; Fortezzo, Corey M.

    2013-01-01

    The new, post-Viking generation of Mars orbital imaging and topographical data provide significant higher-resolution details of surface morphologies, which induced a new effort to photo-geologically map the surface of Mars at 1:20,000,000 scale. Although from unit superposition relations a relative stratigraphical framework can be compiled, it was the ambition of this mapping project to provide absolute unit age constraints through crater statistics. In this study, the crater counting method is described in detail, starting with the selection of image data, type locations (both from the mapper’s and crater counter’s perspectives) and the identification of impact craters. We describe the criteria used to validate and analyse measured crater populations, and to derive and interpret crater model ages. We provide examples of how geological information about the unit’s resurfacing history can be retrieved from crater size–frequency distributions. Three cases illustrate short-, intermediate, and long-term resurfacing histories. In addition, we introduce an interpretation-independent visualisation of the crater resurfacing history that uses the reduction of the crater population in a given size range relative to the expected population given the observed crater density at larger sizes. From a set of potential type locations, 48 areas from 22 globally mapped units were deemed suitable for crater counting. Because resurfacing ages were derived from crater statistics, these secondary ages were used to define the unit age rather than the base age. Using the methods described herein, we modelled ages that are consistent with the interpreted stratigraphy. Our derived model ages allow age assignments to be included in unit names. We discuss the limitations of using the crater dating technique for global-scale geological mapping. Finally, we present recommendations for the documentation and presentation of crater statistics in publications.

  7. Computer Administering of the Psychological Investigations: Set-Relational Representation

    NASA Astrophysics Data System (ADS)

    Yordzhev, Krasimir

    Computer administration of a psychological investigation is the computer-based representation of the entire procedure of psychological assessment: test construction, test implementation, results evaluation, storage and maintenance of the developed database, and its statistical processing, analysis and interpretation. A mathematical description of psychological assessment with the aid of personality tests is discussed in this article. Set theory and relational algebra are used in this description. A relational model of the data needed to design a computer system for the automation of certain psychological assessments is given. Some finite sets, and relations on them, which are necessary for creating a personality psychological test, are described. The described model could be used to develop real software for computer administration of any psychological test, with full automation of the whole process: test construction, test implementation, result evaluation, storage of the developed database, statistical processing, analysis and interpretation. A software project for the computer administration of personality psychological tests is suggested.

  8. Biomembrane Permeabilization: Statistics of Individual Leakage Events Harmonize the Interpretation of Vesicle Leakage.

    PubMed

    Braun, Stefan; Pokorná, Šárka; Šachl, Radek; Hof, Martin; Heerklotz, Heiko; Hoernke, Maria

    2018-01-23

    The mode of action of membrane-active molecules, such as antimicrobial, anticancer, cell penetrating, and fusion peptides and their synthetic mimics, transfection agents, drug permeation enhancers, and biological signaling molecules (e.g., quorum sensing), involves either the general or local destabilization of the target membrane or the formation of defined, rather stable pores. Some effects aim at killing the cell, while others need to be limited in space and time to avoid serious damage. Biological tests reveal translocation of compounds and cell death but do not provide a detailed, mechanistic, and quantitative understanding of the modes of action and their molecular basis. Model membrane studies of membrane leakage have been used for decades to tackle this issue, but their interpretation in terms of biology has remained challenging and often quite limited. Here we compare two recent, powerful protocols to study model membrane leakage: the microscopic detection of dye influx into giant liposomes and time-correlated single photon counting experiments to characterize dye efflux from large unilamellar vesicles. A statistical treatment of both data sets does not only harmonize apparent discrepancies but also makes us aware of principal issues that have been confusing the interpretation of model membrane leakage data so far. Moreover, our study reveals a fundamental difference between nano- and microscale systems that needs to be taken into account when conclusions about microscale objects, such as cells, are drawn from nanoscale models.

  9. Harnessing Big Data for Systems Pharmacology

    PubMed Central

    Xie, Lei; Draizen, Eli J.; Bourne, Philip E.

    2017-01-01

    Systems pharmacology aims to holistically understand mechanisms of drug actions to support drug discovery and clinical practice. Systems pharmacology modeling (SPM) is data driven. It integrates an exponentially growing amount of data at multiple scales (genetic, molecular, cellular, organismal, and environmental). The goal of SPM is to develop mechanistic or predictive multiscale models that are interpretable and actionable. The current explosions in genomics and other omics data, as well as the tremendous advances in big data technologies, have already enabled biologists to generate novel hypotheses and gain new knowledge through computational models of genome-wide, heterogeneous, and dynamic data sets. More work is needed to interpret and predict a drug response phenotype, which is dependent on many known and unknown factors. To gain a comprehensive understanding of drug actions, SPM requires close collaborations between domain experts from diverse fields and integration of heterogeneous models from biophysics, mathematics, statistics, machine learning, and semantic webs. This creates challenges in model management, model integration, model translation, and knowledge integration. In this review, we discuss several emergent issues in SPM and potential solutions using big data technology and analytics. The concurrent development of high-throughput techniques, cloud computing, data science, and the semantic web will likely allow SPM to be findable, accessible, interoperable, reusable, reliable, interpretable, and actionable. PMID:27814027

  10. Harnessing Big Data for Systems Pharmacology.

    PubMed

    Xie, Lei; Draizen, Eli J; Bourne, Philip E

    2017-01-06

    Systems pharmacology aims to holistically understand mechanisms of drug actions to support drug discovery and clinical practice. Systems pharmacology modeling (SPM) is data driven. It integrates an exponentially growing amount of data at multiple scales (genetic, molecular, cellular, organismal, and environmental). The goal of SPM is to develop mechanistic or predictive multiscale models that are interpretable and actionable. The current explosions in genomics and other omics data, as well as the tremendous advances in big data technologies, have already enabled biologists to generate novel hypotheses and gain new knowledge through computational models of genome-wide, heterogeneous, and dynamic data sets. More work is needed to interpret and predict a drug response phenotype, which is dependent on many known and unknown factors. To gain a comprehensive understanding of drug actions, SPM requires close collaborations between domain experts from diverse fields and integration of heterogeneous models from biophysics, mathematics, statistics, machine learning, and semantic webs. This creates challenges in model management, model integration, model translation, and knowledge integration. In this review, we discuss several emergent issues in SPM and potential solutions using big data technology and analytics. The concurrent development of high-throughput techniques, cloud computing, data science, and the semantic web will likely allow SPM to be findable, accessible, interoperable, reusable, reliable, interpretable, and actionable.

  11. A bird's eye view: the cognitive strategies of experts interpreting seismic profiles

    NASA Astrophysics Data System (ADS)

    Bond, C. E.; Butler, R.

    2012-12-01

    Geoscience is perhaps unique in its reliance on incomplete datasets and building knowledge from their interpretation. This interpretation basis for the science is fundamental at all levels; from creation of a geological map to interpretation of remotely sensed data. To teach and understand better the uncertainties in dealing with incomplete data we need to understand the strategies individual practitioners deploy that make them effective interpreters. The nature of interpretation is such that the interpreter needs to use their cognitive ability in the analysis of the data to propose a sensible solution in their final output that is both consistent not only with the original data but also with other knowledge and understanding. In a series of experiments Bond et al. (2007, 2008, 2011, 2012) investigated the strategies and pitfalls of expert and non-expert interpretation of seismic images. These studies focused on large numbers of participants to provide a statistically sound basis for analysis of the results. The outcome of these experiments showed that techniques and strategies are more important than expert knowledge per se in developing successful interpretations. Experts are successful because of their application of these techniques. In a new set of experiments we have focused on a small number of experts to determine how they use their cognitive and reasoning skills, in the interpretation of 2D seismic profiles. Live video and practitioner commentary were used to track the evolving interpretation and to gain insight on their decision processes. The outputs of the study allow us to create an educational resource of expert interpretation through online video footage and commentary with associated further interpretation and analysis of the techniques and strategies employed. This resource will be of use to undergraduate, post-graduate, industry and academic professionals seeking to improve their seismic interpretation skills, develop reasoning strategies for dealing with incomplete datasets, and for assessing the uncertainty in these interpretations. Bond, C.E. et al. (2012). 'What makes an expert effective at interpreting seismic images?' Geology, 40, 75-78. Bond, C. E. et al. (2011). 'When there isn't a right answer: interpretation and reasoning, key skills for 21st century geoscience'. International Journal of Science Education, 33, 629-652. Bond, C. E. et al. (2008). 'Structural models: Optimizing risk analysis by understanding conceptual uncertainty'. First Break, 26, 65-71. Bond, C. E. et al., (2007). 'What do you think this is?: "Conceptual uncertainty" In geoscience interpretation'. GSA Today, 17, 4-10.

  12. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high–throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
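
    The central enrichment step can be sketched with Fisher's exact test: weight each structural feature by the significance of its enrichment in toxic compounds and score compounds additively. This is a simplified stand-in for WFS, with an invented feature matrix and thresholds.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)

      # Hypothetical binary feature matrix: 500 compounds x 40 structural features,
      # plus a toxicity label; feature 0 is genuinely enriched in toxic compounds.
      n_cmpd, n_feat = 500, 40
      X = rng.integers(0, 2, size=(n_cmpd, n_feat))
      toxic = rng.random(n_cmpd) < 0.3
      X[toxic, 0] = (rng.random(toxic.sum()) < 0.8).astype(int)

      weights = np.zeros(n_feat)
      for j in range(n_feat):
          table = [[np.sum(X[toxic, j]),  np.sum(toxic)  - np.sum(X[toxic, j])],
                   [np.sum(X[~toxic, j]), np.sum(~toxic) - np.sum(X[~toxic, j])]]
          odds, p = stats.fisher_exact(table, alternative="greater")
          if p < 0.05:
              weights[j] = -np.log10(p)        # significant features get a weight

      scores = X @ weights                     # additive toxicity score per compound
      print("mean score, toxic vs nontoxic:", scores[toxic].mean().round(2),
            scores[~toxic].mean().round(2))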

  13. SERE: single-parameter quality control and sample comparison for RNA-Seq.

    PubMed

    Schulze, Stefan K; Kanwar, Rahul; Gölzenleuchter, Meike; Therneau, Terry M; Beutler, Andreas S

    2012-10-03

    Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
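
    A compact sketch of the single-parameter idea follows: expected per-bin counts are allocated in proportion to library sizes, and the statistic is the square root of the mean Pearson-type dispersion, so faithful replicates score near 1. The exact normalization should be checked against the publication; the counts below are simulated.

      import numpy as np

      def sere(counts):
          """
          Dispersion-style comparison of RNA-Seq count columns (one column per library),
          following the idea of the Simple Error Ratio Estimate: ~1 for Poisson-faithful
          replicates, >1 for global differences.
          """
          counts = np.asarray(counts, dtype=float)
          counts = counts[counts.sum(axis=1) > 0]              # drop empty bins
          lib = counts.sum(axis=0)
          expected = np.outer(counts.sum(axis=1), lib / lib.sum())
          disp = ((counts - expected) ** 2 / expected).sum()
          n_bins, m = counts.shape
          return np.sqrt(disp / (n_bins * (m - 1)))

      rng = np.random.default_rng(8)
      mu = rng.gamma(2.0, 50.0, size=5000)                       # per-gene expression levels
      rep1, rep2 = rng.poisson(mu), rng.poisson(1.2 * mu)        # faithful replicates (scaled library)
      diff = rng.poisson(mu * rng.lognormal(0, 0.4, size=5000))  # globally different sample

      print("replicates:", round(sere(np.column_stack([rep1, rep2])), 2))   # ~1
      print("different: ", round(sere(np.column_stack([rep1, diff])), 2))   # >1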

  14. Matching the Statistical Model to the Research Question for Dental Caries Indices with Many Zero Counts.

    PubMed

    Preisser, John S; Long, D Leann; Stamm, John W

    2017-01-01

    Marginalized zero-inflated count regression models have recently been introduced for the statistical analysis of dental caries indices and other zero-inflated count data as alternatives to traditional zero-inflated and hurdle models. Unlike the standard approaches, the marginalized models directly estimate overall exposure or treatment effects by relating covariates to the marginal mean count. This article discusses model interpretation and model class choice according to the research question being addressed in caries research. Two data sets, one consisting of fictional dmft counts in 2 groups and the other on DMFS among schoolchildren from a randomized clinical trial comparing 3 toothpaste formulations to prevent incident dental caries, are analyzed with negative binomial hurdle, zero-inflated negative binomial, and marginalized zero-inflated negative binomial models. In the first example, estimates of treatment effects vary according to the type of incidence rate ratio (IRR) estimated by the model. Estimates of IRRs in the analysis of the randomized clinical trial were similar despite their distinctive interpretations. The choice of statistical model class should match the study's purpose, while accounting for the broad decline in children's caries experience, such that dmft and DMFS indices more frequently generate zero counts. Marginalized (marginal mean) models for zero-inflated count data should be considered for direct assessment of exposure effects on the marginal mean dental caries count in the presence of high frequencies of zero counts. © 2017 S. Karger AG, Basel.
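
    The interpretation difference can be shown with plain arithmetic: in a zero-inflated model the count-part rate ratio is conditional on the susceptible subgroup, whereas the marginal mean E[Y] = (1 - π)μ also involves the zero-inflation probability, so the overall (marginal) IRR targeted by marginalized models can differ. The parameter values below are hypothetical.

      # Hypothetical zero-inflated parameters for two groups (e.g. toothpaste arms):
      # pi = probability of a structural zero (caries-free child),
      # mu = mean caries count among susceptible children.
      pi = {"control": 0.40, "treated": 0.55}
      mu = {"control": 3.0,  "treated": 2.4}

      # Conditional (count-part) rate ratio, as reported by a standard ZINB model:
      irr_conditional = mu["treated"] / mu["control"]

      # Marginal means E[Y] = (1 - pi) * mu, the target of marginalized ZINB models:
      m = {g: (1 - pi[g]) * mu[g] for g in pi}
      irr_marginal = m["treated"] / m["control"]

      print(f"conditional IRR = {irr_conditional:.2f}")   # 0.80
      print(f"marginal IRR    = {irr_marginal:.2f}")      # 0.60 (also reflects more zeros)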

  15. SERE: Single-parameter quality control and sample comparison for RNA-Seq

    PubMed Central

    2012-01-01

    Background Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson’s correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Results Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson’s r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen’s simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. Conclusions SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter. PMID:23033915

  16. The “Task B problem” and other considerations in developmental functional neuroimaging

    PubMed Central

    Church, Jessica A.; Petersen, Steven E.; Schlaggar, Bradley L.

    2012-01-01

    Functional neuroimaging provides a remarkable tool to allow us to study cognition across the lifespan and in special populations in a safe way. However, experimenters face a number of methodological issues, and these issues are particularly pertinent when imaging children. This brief article discusses assessing task performance, strategies for dealing with group performance differences, controlling for movement, statistical power, proper atlas registration, and data analysis strategies. In addition, there will be discussion of two other topics that have important implications for interpreting fMRI data: the question of whether functional neuroanatomical differences between adults and children are the consequence of putative developmental neurovascular differences, and the issue of interpreting negative blood oxygenation-level dependent (BOLD) signal change. PMID:20496376

  17. Interpretation of Confidence Interval Facing the Conflict

    ERIC Educational Resources Information Center

    Andrade, Luisa; Fernández, Felipe

    2016-01-01

    As literature has reported, it is usual that university students in statistics courses, and even statistics teachers, interpret the confidence level associated with a confidence interval as the probability that the parameter value will be between the lower and upper interval limits. To confront this misconception, class activities have been…

  18. [Age index and an interpretation of survivorship curves (author's transl)].

    PubMed

    Lohmann, W

    1977-01-01

    Clinical investigations showed that the age dependences of physiological functions do not show, as generally assumed, a linear increase with age, but an exponential one. Considering this result, one can easily interpret the survivorship curve of a population (Gompertz plot). The only requirement is that the probability of death (death rate) is proportional to a function of ageing given by μ(t) = μ0 exp(αt). Considering survivorship curves resulting from annual death statistics and fitting them with suitable parameters, the resulting α-values are in agreement with clinical data.
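
    The survivorship curve implied by the stated death rate follows directly from the standard hazard-survival relation:

      \mu(t) = \mu_0\, e^{\alpha t}, \qquad
      S(t) = \exp\!\Big(-\int_0^t \mu(u)\,du\Big)
           = \exp\!\Big[-\frac{\mu_0}{\alpha}\big(e^{\alpha t}-1\big)\Big],

    which is the Gompertz survivorship curve whose parameters (μ0, α) are fitted to the annual death statistics.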

  19. Translational Research for Occupational Therapy: Using SPRE in Hippotherapy for Children with Developmental Disabilities.

    PubMed

    Weissman-Miller, Deborah; Miller, Rosalie J; Shotwell, Mary P

    2017-01-01

    Translational research is redefined in this paper using a combination of methods in statistics and data science to enhance the understanding of outcomes and practice in occupational therapy. These new methods are applied, using larger data and smaller single-subject data, to a study in hippotherapy for children with developmental disabilities (DD). The Centers for Disease Control and Prevention estimates DD affects nearly 10 million children, aged 2-19, where diagnoses may be comorbid. Hippotherapy is defined here as a treatment strategy in occupational therapy using equine movement to achieve functional outcomes. Semiparametric ratio estimator (SPRE), a single-subject statistical and small data science model, is used to derive a "change point" indicating where the participant adapts to treatment, from which predictions are made. Data analyzed here is from an institutional review board approved pilot study using the Hippotherapy Evaluation and Assessment Tool measure, where outcomes are given separately for each of four measured domains and the total scores of each participant. Analysis with SPRE, using statistical methods to predict a "change point" and data science graphical interpretations of data, shows the translational comparisons between results from larger mean values and the very different results from smaller values for each HEAT domain in terms of relationships and statistical probabilities.

  20. Translational Research for Occupational Therapy: Using SPRE in Hippotherapy for Children with Developmental Disabilities

    PubMed Central

    Miller, Rosalie J.; Shotwell, Mary P.

    2017-01-01

    Translational research is redefined in this paper using a combination of methods in statistics and data science to enhance the understanding of outcomes and practice in occupational therapy. These new methods are applied, using larger data and smaller single-subject data, to a study in hippotherapy for children with developmental disabilities (DD). The Centers for Disease Control and Prevention estimates that DD affects nearly 10 million children, aged 2–19, in whom diagnoses may be comorbid. Hippotherapy is defined here as a treatment strategy in occupational therapy using equine movement to achieve functional outcomes. The semiparametric ratio estimator (SPRE), a single-subject statistical and small data science model, is used to derive a “change point” indicating where the participant adapts to treatment, from which predictions are made. Data analyzed here are from an institutional review board-approved pilot study using the Hippotherapy Evaluation and Assessment Tool (HEAT) measure, where outcomes are given separately for each of four measured domains and the total scores of each participant. Analysis with SPRE, using statistical methods to predict a “change point” and data science graphical interpretations of data, shows the translational comparisons between results from larger mean values and the very different results from smaller values for each HEAT domain in terms of relationships and statistical probabilities. PMID:29097962

  1. Solution x-ray scattering and structure formation in protein dynamics

    NASA Astrophysics Data System (ADS)

    Nasedkin, Alexandr; Davidsson, Jan; Niemi, Antti J.; Peng, Xubiao

    2017-12-01

    We propose a computationally effective approach that builds on Landau mean-field theory in combination with modern nonequilibrium statistical mechanics to model and interpret protein dynamics and structure formation in small- to wide-angle x-ray scattering (S/WAXS) experiments. We develop the methodology by analyzing experimental data for the Engrailed homeodomain protein as an example. We demonstrate how to interpret S/WAXS data qualitatively with good precision and over an extended temperature range. We explain experimental observations in terms of protein phase structure, and we make predictions for future experiments and for how to analyze data at different ambient temperature values. We conclude that the approach we propose has the potential to become a highly accurate, computationally effective, and predictive tool for analyzing S/WAXS data. To this end, we compare our results with those obtained previously in an all-atom molecular dynamics simulation.

  2. STATISTICAL ANALYSIS OF SPECTROPHOTOMETRIC DETERMINATIONS OF BORON; Estudo Estatistico de Determinacoes Espectrofotometricas de Boro

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lima, F.W.; Pagano, C.; Schneiderman, B.

    1959-07-01

    Boron can be determined quantitatively by absorption spectrophotometry of solutions of the red compound formed by the reaction of boric acid with curcumin. This reaction is affected by various factors, some of which are easy to detect during data interpretation, while others are more difficult. The application of modern statistical methods to the study of the influence of these factors on the quantitative determination of boron is presented. These methods provide objective ways of establishing the significant effects of the factors involved. (auth)

  3. Hydrological responses to dynamically and statistically downscaled climate model output

    USGS Publications Warehouse

    Wilby, R.L.; Hay, L.E.; Gutowski, W.J.; Arritt, R.W.; Takle, E.S.; Pan, Z.; Leavesley, G.H.; Clark, M.P.

    2000-01-01

    Daily rainfall and surface temperature series were simulated for the Animas River basin, Colorado, using dynamically and statistically downscaled output from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) re-analysis. A distributed hydrological model was then applied to the downscaled data. Relative to raw NCEP output, downscaled climate variables provided more realistic simulations of basin-scale hydrology. However, the results highlight the sensitivity of modeled processes to the choice of downscaling technique, and point to the need for caution when interpreting future hydrological scenarios.

  4. Comment on 'Imaging of prompt gamma rays emitted during delivery of clinical proton beams with a Compton camera: feasibility studies for range verification'.

    PubMed

    Sitek, Arkadiusz

    2016-12-21

    The origin ensemble (OE) algorithm is a new method used for image reconstruction from nuclear tomographic data. The main advantage of this algorithm is the ease of implementation for complex tomographic models and the sound statistical theory. In this comment, the author provides the basics of the statistical interpretation of OE and gives suggestions for the improvement of the algorithm in the application to prompt gamma imaging as described in Polf et al (2015 Phys. Med. Biol. 60 7085).

  5. Comment on ‘Imaging of prompt gamma rays emitted during delivery of clinical proton beams with a Compton camera: feasibility studies for range verification’

    NASA Astrophysics Data System (ADS)

    Sitek, Arkadiusz

    2016-12-01

    The origin ensemble (OE) algorithm is a new method used for image reconstruction from nuclear tomographic data. The main advantage of this algorithm is the ease of implementation for complex tomographic models and the sound statistical theory. In this comment, the author provides the basics of the statistical interpretation of OE and gives suggestions for the improvement of the algorithm in the application to prompt gamma imaging as described in Polf et al (2015 Phys. Med. Biol. 60 7085).

  6. Modified two-sources quantum statistical model and multiplicity fluctuation in the finite rapidity region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghosh, D.; Sarkar, S.; Sen, S.

    1995-06-01

    In this paper the behavior of factorial moments with rapidity window size, which is usually explained in terms of "intermittency," has been interpreted through simple quantum statistical properties of the emitting system using the concept of the "modified two-source model" recently proposed by Ghosh and Sarkar [Phys. Lett. B 278, 465 (1992)]. The analysis has been performed using our own data on ¹⁶O-Ag/Br and ²⁴Mg-Ag/Br interactions in the few tens of GeV energy regime.

  7. Multi-Reader ROC studies with Split-Plot Designs: A Comparison of Statistical Methods

    PubMed Central

    Obuchowski, Nancy A.; Gallas, Brandon D.; Hillis, Stephen L.

    2012-01-01

    Rationale and Objectives: Multi-reader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of the design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper we compare three methods of analysis for the split-plot design. Materials and Methods: Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean ANOVA approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. Results: The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% CIs falls close to the nominal coverage for small and large sample sizes. Conclusions: The split-plot MRMC study design can be statistically efficient compared with the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal CI coverage, are available for this study design. PMID:23122570

  8. Multi-reader ROC studies with split-plot designs: a comparison of statistical methods.

    PubMed

    Obuchowski, Nancy A; Gallas, Brandon D; Hillis, Stephen L

    2012-12-01

    Multireader imaging trials often use a factorial design, in which study patients undergo testing with all imaging modalities and readers interpret the results of all tests for all patients. A drawback of this design is the large number of interpretations required of each reader. Split-plot designs have been proposed as an alternative, in which one or a subset of readers interprets all images of a sample of patients, while other readers interpret the images of other samples of patients. In this paper, the authors compare three methods of analysis for the split-plot design. Three statistical methods are presented: the Obuchowski-Rockette method modified for the split-plot design, a newly proposed marginal-mean analysis-of-variance approach, and an extension of the three-sample U-statistic method. A simulation study using the Roe-Metz model was performed to compare the type I error rate, power, and confidence interval coverage of the three test statistics. The type I error rates for all three methods are close to the nominal level but tend to be slightly conservative. The statistical power is nearly identical for the three methods. The coverage of 95% confidence intervals falls close to the nominal coverage for small and large sample sizes. The split-plot multireader, multicase study design can be statistically efficient compared to the factorial design, reducing the number of interpretations required per reader. Three methods of analysis, shown to have nominal type I error rates, similar power, and nominal confidence interval coverage, are available for this study design. Copyright © 2012 AUR. All rights reserved.
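
    For readers unfamiliar with the simulation metrics compared here, the sketch below shows the generic recipe in miniature (a plain two-sample t-test under a true null, not the Roe-Metz MRMC model): the empirical type I error rate is simply the fraction of null simulations in which the test rejects at the nominal alpha.

    ```python
    import numpy as np
    from scipy import stats

    # Generic recipe (not the Roe-Metz MRMC simulation): simulate under the null,
    # apply the test, and report the rejection fraction at the nominal alpha.
    rng = np.random.default_rng(3)
    alpha, reps, n = 0.05, 20_000, 30

    rejections = 0
    for _ in range(reps):
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(0.0, 1.0, n)  # same distribution, so the null is true
        _, p = stats.ttest_ind(a, b)
        rejections += (p < alpha)

    print(f"empirical type I error rate: {rejections / reps:.3f}")  # close to 0.05
    ```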

  9. Potential errors and misuse of statistics in studies on leakage in endodontics.

    PubMed

    Lucena, C; Lopez, J M; Pulgar, R; Abalos, C; Valderrama, M J

    2013-04-01

    To assess the quality of the statistical methodology used in studies of leakage in Endodontics, and to compare the results found using appropriate versus inappropriate inferential statistical methods. The search strategy used the descriptors 'root filling', 'microleakage', 'dye penetration', 'dye leakage', 'polymicrobial leakage' and 'fluid filtration' for the time interval 2001-2010 in journals within the categories 'Dentistry, Oral Surgery and Medicine' and 'Materials Science, Biomaterials' of the Journal Citation Report. All retrieved articles were reviewed to find potential pitfalls in statistical methodology that may be encountered during study design, data management or data analysis. The database included 209 papers. In all the studies reviewed, the statistical methods used were appropriate for the category attributed to the outcome variable, but in 41% of the cases the chi-square test or parametric methods were subsequently applied inappropriately. In 2% of the papers, no statistical test was used. In 99% of cases, a statistically 'significant' or 'not significant' effect was reported as a main finding, whilst only 1% also presented an estimation of the magnitude of the effect. When the appropriate statistical methods were applied in the studies with originally inappropriate data analysis, the conclusions changed in 19% of the cases. Statistical deficiencies in leakage studies may affect their results and interpretation and might be one of the reasons for the poor agreement amongst the reported findings. Therefore, more effort should be made to standardize statistical methodology. © 2012 International Endodontic Journal.
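
    The two recurring recommendations, using rank-based tests for ordinal leakage scores and reporting an effect size alongside the p-value, can be sketched as follows with synthetic scores (not data from the reviewed studies).

    ```python
    import numpy as np
    from scipy import stats

    # Synthetic ordinal leakage scores (0-4 scale); not data from the reviewed studies.
    rng = np.random.default_rng(4)
    group_a = rng.integers(0, 4, 30)
    group_b = rng.integers(1, 5, 30)

    # Rank-based comparison appropriate for ordinal data.
    u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

    # Effect size to report alongside the p-value: probability that a random score
    # from group A exceeds one from group B (ties counted as one half).
    cles = u / (len(group_a) * len(group_b))
    print(f"U = {u:.0f}, p = {p:.3f}, effect size = {cles:.2f}")
    ```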

  10. Statistical analysis of aerosol species, trace gasses, and meteorology in Chicago.

    PubMed

    Binaku, Katrina; O'Brien, Timothy; Schmeling, Martina; Fosco, Tinamarie

    2013-09-01

    Both canonical correlation analysis (CCA) and principal component analysis (PCA) were applied to atmospheric aerosol and trace gas concentrations and meteorological data collected in Chicago during the summer months of 2002, 2003, and 2004. Concentrations of ammonium, calcium, nitrate, sulfate, and oxalate particulate matter, as well as the meteorological parameters temperature, wind speed, wind direction, and humidity, were subjected to CCA and PCA. Ozone and nitrogen oxide mixing ratios were also included in the data set. The purpose of the statistical analysis was to determine the extent of existing linear relationship(s), or lack thereof, between meteorological parameters and pollutant concentrations, in addition to reducing the dimensionality of the original data to determine sources of pollutants. In CCA, the first three canonical variate pairs derived were statistically significant at the 0.05 level. The canonical correlation of the first canonical variate pair was 0.821, while the correlations of the second and third canonical variate pairs were 0.562 and 0.461, respectively. The first canonical variate pair indicated that increasing temperatures resulted in high ozone mixing ratios, while the second canonical variate pair showed wind speed and humidity's influence on local ammonium concentrations. No new information was uncovered in the third variate pair. Canonical loadings were also interpreted for information regarding relationships between the data sets. Four principal components (PCs), expressing 77.0% of the original data variance, were derived in PCA. Interpretation of the PCs suggested significant production and/or transport of secondary aerosols in the region (PC1). Furthermore, photochemical production of ozone and wind speed's influence on pollutants were expressed (PC2), along with an overall measure of local meteorology (PC3). In summary, the combined CCA and PCA results were successful in uncovering linear relationships between meteorology and air pollutants in Chicago and aided in determining possible pollutant sources.
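
    A compact sketch of the two analyses with synthetic stand-ins for the Chicago measurements (the variable counts and values below are placeholders, not the study's data).

    ```python
    import numpy as np
    from sklearn.cross_decomposition import CCA
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-ins: 4 meteorological variables and 7 pollutant measures.
    rng = np.random.default_rng(5)
    n = 200
    met = rng.normal(size=(n, 4))
    chem = met @ rng.normal(size=(4, 7)) + rng.normal(size=(n, 7))

    met_s = StandardScaler().fit_transform(met)
    chem_s = StandardScaler().fit_transform(chem)

    # Canonical correlation analysis between the two variable sets.
    cca = CCA(n_components=3).fit(met_s, chem_s)
    u, v = cca.transform(met_s, chem_s)
    print("canonical correlations:",
          [round(float(np.corrcoef(u[:, i], v[:, i])[0, 1]), 2) for i in range(3)])

    # Principal component analysis of the pollutant block alone.
    pca = PCA(n_components=4).fit(chem_s)
    print("cumulative variance explained:",
          np.round(pca.explained_variance_ratio_.cumsum(), 2))
    ```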

  11. Visualization of the variability of 3D statistical shape models by animation.

    PubMed

    Lamecker, Hans; Seebass, Martin; Lange, Thomas; Hege, Hans-Christian; Deuflhard, Peter

    2004-01-01

    Models of the 3D shape of anatomical objects and knowledge about their statistical variability are of great benefit in many computer-assisted medical applications such as image analysis, therapy, or surgery planning. Statistical models of shape have successfully been applied to automate the task of image segmentation. The generation of 3D statistical shape models requires the identification of corresponding points on two shapes. This remains a difficult problem, especially for shapes of complicated topology. In order to interpret and validate the variations encoded in a statistical shape model, visual inspection is of great importance. This work describes the generation and interpretation of statistical shape models of the liver and the pelvic bone.

  12. On Improving the Quality and Interpretation of Environmental Assessments using Statistical Analysis and Geographic Information Systems

    NASA Astrophysics Data System (ADS)

    Karuppiah, R.; Faldi, A.; Laurenzi, I.; Usadi, A.; Venkatesh, A.

    2014-12-01

    An increasing number of studies are focused on assessing the environmental footprint of different products and processes, especially using life cycle assessment (LCA). This work shows how combining statistical methods and Geographic Information Systems (GIS) with environmental analyses can help improve the quality of results and their interpretation. Most environmental assessments in the literature yield single numbers that characterize the environmental impact of a process/product - typically global or country averages, often unchanging in time. In this work, we show how statistical analysis and GIS can help address these limitations. For example, we demonstrate a method to separately quantify uncertainty and variability in the result of LCA models using a power generation case study. This is important for rigorous comparisons between the impacts of different processes. Another challenge is the lack of data that can affect the rigor of LCAs. We have developed an approach to estimate environmental impacts of incompletely characterized processes using predictive statistical models. This method is applied to estimate unreported coal power plant emissions in several world regions. There is also a general lack of spatio-temporal characterization of the results in environmental analyses. For instance, studies that focus on water usage do not put in context where and when water is withdrawn. Through the use of hydrological modeling combined with GIS, we quantify water stress on a regional and seasonal basis to understand water supply and demand risks for multiple users. Another example where it is important to consider regional dependency of impacts is when characterizing how agricultural land occupation affects biodiversity in a region. We developed a data-driven methodology used in conjunction with GIS to determine if there is a statistically significant difference between the impacts of growing different crops on different species in various biomes of the world.

  13. An Efficient Data Partitioning to Improve Classification Performance While Keeping Parameters Interpretable

    PubMed Central

    Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul

    2016-01-01

    Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term “Cross-validation and cross-testing” improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393

  14. An Efficient Data Partitioning to Improve Classification Performance While Keeping Parameters Interpretable.

    PubMed

    Korjus, Kristjan; Hebart, Martin N; Vicente, Raul

    2016-01-01

    Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier's generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term "Cross-validation and cross-testing" improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do.
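
    For context, the baseline workflow that the proposed "cross-validation and cross-testing" scheme aims to improve on looks roughly like the sketch below (standard scikit-learn tooling on simulated data; this is not the authors' new method).

    ```python
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    # Baseline: tune parameters by cross-validation on the training split, then
    # estimate generalization once on a held-out test split.
    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5).fit(X_tr, y_tr)
    print("best C:", search.best_params_["C"])
    print("held-out accuracy:", round(search.score(X_te, y_te), 3))
    ```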

  15. Modified MODIS fluorescence line height data product to improve image interpretation for red tide monitoring in the eastern Gulf of Mexico

    NASA Astrophysics Data System (ADS)

    Hu, Chuanmin; Feng, Lian

    2017-01-01

    Several satellite-based methods have been used to detect and trace Karenia brevis red tide blooms in the eastern Gulf of Mexico. Some require data statistics and multiple data products, while others use a single data product. Of these, the MODIS normalized fluorescence line height (nFLH) has shown an advantage in detecting blooms in waters rich in colored dissolved organic matter and has therefore been used routinely to assess bloom conditions by the Florida Fish and Wildlife Conservation Commission (FWC), the official state agency of Florida responsible for red tide monitoring and mitigation. However, elevated sediment concentrations in the water column due to wind storms can also result in high nFLH values, leading to false-positive bloom interpretation. Here, a modified nFLH data product is developed to minimize such impacts through empirical adjustments of the nFLH values using MODIS-derived remote sensing reflectance in the green band at 547 nm. The new product is termed an algal bloom index (ABI), which has shown improved performance over the original nFLH in both retrospective evaluation statistics and near real-time applications. The ABI product has been made available in near real-time through a Web portal and has been used by the FWC on a routine basis to guide field sampling efforts and to prepare red tide bulletins distributed to many user groups.

  16. Statistics of Statisticians: Critical Mass of Statistics and Operational Research Groups

    NASA Astrophysics Data System (ADS)

    Kenna, Ralph; Berche, Bertrand

    Using a recently developed model, inspired by mean field theory in statistical physics, and data from the UK's Research Assessment Exercise, we analyse the relationship between the qualities of statistics and operational research groups and the quantities of researchers in them. Similar to other academic disciplines, we provide evidence for a linear dependency of quality on quantity up to an upper critical mass, which is interpreted as the average maximum number of colleagues with whom a researcher can communicate meaningfully within a research group. The model also predicts a lower critical mass, which research groups should strive to achieve to avoid extinction. For statistics and operational research, the lower critical mass is estimated to be 9 ± 3. The upper critical mass, beyond which research quality does not significantly depend on group size, is 17 ± 6.

  17. Time series, periodograms, and significance

    NASA Astrophysics Data System (ADS)

    Hernandez, G.

    1999-05-01

    The geophysical literature shows a wide and conflicting usage of methods employed to extract meaningful information on coherent oscillations from measurements. This makes it difficult, if not impossible, to relate the findings reported by different authors. Therefore, we have undertaken a critical investigation of the tests and methodology used for determining the presence of statistically significant coherent oscillations in periodograms derived from time series. Statistical significance tests are only valid when performed on the independent frequencies present in a measurement. Both the number of possible independent frequencies in a periodogram and the significance tests are determined by the number of degrees of freedom, which is the number of true independent measurements, present in the time series, rather than the number of sample points in the measurement. The number of degrees of freedom is an intrinsic property of the data, and it must be determined from the serial coherence of the time series. As part of this investigation, a detailed study has been performed which clearly illustrates the deleterious effects that the apparently innocent and commonly used processes of filtering, de-trending, and tapering of data have on periodogram analysis and the consequent difficulties in the interpretation of the statistical significance thus derived. For the sake of clarity, a specific example of actual field measurements containing unevenly-spaced measurements, gaps, etc., as well as synthetic examples, have been used to illustrate the periodogram approach, and pitfalls, leading to the (statistical) significance tests for the presence of coherent oscillations. Among the insights of this investigation are: (1) the concept of a time series being (statistically) band limited by its own serial coherence and thus having a critical sampling rate which defines one of the necessary requirements for the proper statistical design of an experiment; (2) the design of a critical test for the maximum number of significant frequencies which can be used to describe a time series, while retaining intact the variance of the test sample; (3) a demonstration of the unnecessary difficulties that manipulation of the data brings into the statistical significance interpretation of said data; and (4) the resolution and correction of the apparent discrepancy in significance results obtained by the use of the conventional Lomb-Scargle significance test, when compared with the long-standing Schuster-Walker and Fisher tests.
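
    A minimal sketch of a periodogram for unevenly spaced data (synthetic series, not the field measurements discussed above); the significance caveats in the abstract still apply, since the number of independent frequencies, not the number of sample points, governs any significance test.

    ```python
    import numpy as np
    from scipy.signal import lombscargle

    # Synthetic unevenly sampled series with a 12.5-unit period plus noise.
    rng = np.random.default_rng(6)
    t = np.sort(rng.uniform(0, 100, 300))
    y = np.sin(2 * np.pi * t / 12.5) + rng.normal(0, 0.5, t.size)
    y -= y.mean()

    # Lomb-Scargle periodogram over a grid of ordinary frequencies (cycles/unit);
    # lombscargle expects angular frequencies, hence the factor of 2*pi.
    freqs = np.linspace(0.01, 1.0, 2000)
    pgram = lombscargle(t, y, 2 * np.pi * freqs)
    print("peak period:", round(1.0 / freqs[pgram.argmax()], 2))  # about 12.5
    ```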

  18. Statistical methods and errors in family medicine articles between 2010 and 2014-Suez Canal University, Egypt: A cross-sectional study.

    PubMed

    Nour-Eldein, Hebatallah

    2016-01-01

    With the limited statistical knowledge of most physicians, it is not uncommon to find statistical errors in research articles. The aim was to determine the statistical methods used and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-tests 17/66 (25.8%) were the most commonly used inferential tests. Within the 60 FM articles with identified inferential statistics, the deficiencies were: no prior sample size calculation in 19/60 (31.7%), application of wrong statistical tests in 17/60 (28.3%), incomplete documentation of statistics in 59/60 (98.3%), reporting of P values without test statistics in 32/60 (53.3%), no confidence interval reported with effect size measures in 12/60 (20.0%), and use of the mean (standard deviation) to describe ordinal/nonnormal data in 8/60 (13.3%); errors related to interpretation were mainly conclusions unsupported by the study data, in 5/60 (8.3%). Inferential statistics were used in the majority of FM articles. Data analysis and reporting of statistics are areas for improvement in FM research articles.

  19. Statistical methods and errors in family medicine articles between 2010 and 2014-Suez Canal University, Egypt: A cross-sectional study

    PubMed Central

    Nour-Eldein, Hebatallah

    2016-01-01

    Background: With the limited statistical knowledge of most physicians, it is not uncommon to find statistical errors in research articles. Objectives: To determine the statistical methods used and to assess the statistical errors in family medicine (FM) research articles that were published between 2010 and 2014. Methods: This was a cross-sectional study. All 66 FM research articles that were published over 5 years by FM authors with affiliation to Suez Canal University were screened by the researcher between May and August 2015. Types and frequencies of statistical methods were reviewed in all 66 FM articles. All 60 articles with identified inferential statistics were examined for statistical errors and deficiencies. A comprehensive 58-item checklist based on statistical guidelines was used to evaluate the statistical quality of FM articles. Results: Inferential methods were recorded in 62/66 (93.9%) of FM articles. Advanced analyses were used in 29/66 (43.9%). Contingency tables 38/66 (57.6%), regression (logistic, linear) 26/66 (39.4%), and t-tests 17/66 (25.8%) were the most commonly used inferential tests. Within the 60 FM articles with identified inferential statistics, the deficiencies were: no prior sample size calculation in 19/60 (31.7%), application of wrong statistical tests in 17/60 (28.3%), incomplete documentation of statistics in 59/60 (98.3%), reporting of P values without test statistics in 32/60 (53.3%), no confidence interval reported with effect size measures in 12/60 (20.0%), and use of the mean (standard deviation) to describe ordinal/nonnormal data in 8/60 (13.3%); errors related to interpretation were mainly conclusions unsupported by the study data, in 5/60 (8.3%). Conclusion: Inferential statistics were used in the majority of FM articles. Data analysis and reporting of statistics are areas for improvement in FM research articles. PMID:27453839

  20. GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data.

    PubMed

    Carvalho, Paulo C; Fischer, Juliana Sg; Chen, Emily I; Domont, Gilberto B; Carvalho, Maria Gc; Degrave, Wim M; Yates, John R; Barbosa, Valmir C

    2009-02-24

    Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiforme cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple-to-use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.
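
    The core of a GO over-representation test is a hypergeometric tail probability; the sketch below shows that calculation with hypothetical counts (GOEx's exact statistics and multiple-testing corrections may differ).

    ```python
    from scipy.stats import hypergeom

    # Hypothetical counts: N proteins in the background, K annotated to a GO term,
    # n differentially expressed proteins, k of them carrying the term.
    N, K, n, k = 2600, 120, 150, 18

    # One-sided enrichment p-value: P(X >= k) under the hypergeometric null.
    p_value = hypergeom.sf(k - 1, N, K, n)
    print(f"enrichment p-value: {p_value:.2e}")
    ```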

  1. Statistical Compression of Wind Speed Data

    NASA Astrophysics Data System (ADS)

    Tagle, F.; Castruccio, S.; Crippa, P.; Genton, M.

    2017-12-01

    In this work we introduce a lossy compression approach that utilizes a stochastic wind generator based on a non-Gaussian distribution to reproduce the internal climate variability of daily wind speed as represented by the CESM Large Ensemble over Saudi Arabia. Stochastic wind generators, and stochastic weather generators more generally, are statistical models that aim to match certain statistical properties of the data on which they are trained. They have been used extensively in applications ranging from agricultural models to climate impact studies. In this novel context, the parameters of the fitted model can be interpreted as encoding the information contained in the original uncompressed data. The statistical model is fit to only 3 of the 30 ensemble members, and it adequately captures the variability of the ensemble in terms of the seasonal and interannual variability of daily wind speed. To deal with such a large spatial domain, it is partitioned into 9 regions, and the model is fit independently to each of these. We further discuss a recent refinement of the model, which relaxes this assumption of regional independence by introducing a large-scale component that interacts with the fine-scale regional effects.
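
    As a toy illustration of "parameters as compression": fitting a simple non-Gaussian distribution to daily wind speeds reduces thousands of values to a few parameters. The Weibull choice below is a common convention assumed here for illustration and is not necessarily the distribution used by the authors.

    ```python
    import numpy as np
    from scipy import stats

    # Stand-in daily wind speeds (~10 years); the Weibull form is an assumption
    # made purely for illustration, not the authors' generator.
    wind = stats.weibull_min.rvs(c=2.1, scale=6.5, size=3650, random_state=7)

    shape, loc, scale = stats.weibull_min.fit(wind, floc=0)
    print(f"fitted shape = {shape:.2f}, scale = {scale:.2f}")
    # The handful of fitted parameters acts as a compact "compression" of the series.
    ```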

  2. Land Cover Change in the Boston Mountains, 1973-2000

    USGS Publications Warehouse

    Karstensen, Krista A.

    2009-01-01

    The U.S. Geological Survey (USGS) Land Cover Trends project is focused on understanding the rates, trends, causes, and consequences of contemporary U.S. land-cover change. The objectives of the study are: (1) to develop a comprehensive methodology for using sampling and change analysis techniques and Landsat Multispectral Scanner (MSS), Thematic Mapper (TM), and Enhanced Thematic Mapper Plus (ETM+) data to measure regional land-cover change across the United States; (2) to characterize the types, rates, and temporal variability of change for a 30-year period; (3) to document regional driving forces and consequences of change; and (4) to prepare a national synthesis of land-cover change (Loveland and others, 1999). The 1999 Environmental Protection Agency (EPA) Level III ecoregions derived from Omernik (1987) provide the geographic framework for the geospatial data collected between 1973 and 2000. The 27-year study period was divided into five temporal periods: 1973-1980, 1980-1986, 1986-1992, 1992-2000, and 1973-2000, and the data are evaluated using a modified Anderson Land Use Land Cover Classification System (Anderson and others, 1976) for image interpretation. The rates of land-cover change are estimated using a stratified random sampling of 10-kilometer (km) by 10-km blocks allocated within each ecoregion. For each sample block, satellite images are used to interpret land-cover change for the five time periods previously mentioned. Additionally, historic aerial photographs from similar time frames and other ancillary data, such as census statistics and published literature, are used. The sample block data are then incorporated into statistical analyses to generate an overall change matrix for the ecoregion. Field data for the sample blocks include direct measurements of land cover, particularly ground-survey data collected for training and validation of image classifications (Loveland and others, 2002). The field experience allows for additional observations of the character and condition of the landscape, assistance in sample block interpretation, ground truthing of Landsat imagery, and determination of the driving forces of change identified in an ecoregion.

  3. Spit: saliva in nursing research, uses and methodological considerations in older adults.

    PubMed

    Woods, Diana Lynn; Mentes, Janet C

    2011-07-01

    Over the last 10 years, interest in the analysis of saliva as a biomarker for a variety of systemic diseases or for potential disease has soared. There are numerous advantages to using saliva as a biological fluid, particularly for nurse researchers working with vulnerable populations, such as frail older adults. Most notably, it is noninvasive and easier to collect than serum or urine. The authors describe their experiences with the use of saliva in research with older adults that examined (a) osmolality as an indicator of hydration status and (b) cortisol and behavioral symptoms of dementia. In particular, the authors discuss the timing of data collection along with data analysis and interpretation. For example, it is not enough to detect levels or rely solely on summary statistics; rather it is critical to characterize any rhythmicity inherent in the parameter of interest. Not accounting for rhythmicity in the analysis and interpretation of data can limit the interpretation of associations, thus impeding advances related to the contribution that an altered rhythm may make to individual vulnerability.
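
    One standard way to account for rhythmicity rather than relying on summary statistics alone is a single-component cosinor fit; the sketch below uses synthetic values, not the study's salivary data.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    # Single-component cosinor: mesor + amplitude * cos(2*pi*t/24 + acrophase).
    def cosinor(t_hours, mesor, amplitude, acrophase):
        return mesor + amplitude * np.cos(2 * np.pi * t_hours / 24.0 + acrophase)

    # Synthetic values sampled every 2 h over 5 days (not the study's saliva data).
    rng = np.random.default_rng(8)
    t = np.tile(np.arange(0.0, 24.0, 2.0), 5)
    values = cosinor(t, 10.0, 4.0, -1.0) + rng.normal(0, 0.8, t.size)

    (mesor, amp, phase), _ = curve_fit(cosinor, t, values, p0=[8.0, 2.0, 0.0])
    print(f"mesor = {mesor:.1f}, amplitude = {amp:.1f}, acrophase = {phase:.2f} rad")
    ```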

  4. Comment on "Satellites reveal contrasting responses of regional climate to the widespread greening of Earth".

    PubMed

    Li, Yue; Zeng, Zhenzhong; Huang, Ling; Lian, Xu; Piao, Shilong

    2018-06-15

    Forzieri et al (Reports, 16 June 2017, p. 1180) used satellite data to show that boreal greening caused regional warming. We show that this positive sensitivity of temperature to the greening can be derived from the positive response of vegetation to boreal warming, which indicates that results from a statistical regression with satellite data should be carefully interpreted. Copyright © 2018, American Association for the Advancement of Science.

  5. Interpreting “statistical hypothesis testing” results in clinical research

    PubMed Central

    Sarmukaddam, Sanjeev B.

    2012-01-01

    The difference between “clinical significance” and “statistical significance” should be kept in mind while interpreting “statistical hypothesis testing” results in clinical research. This is already known to many, but it is pointed out here again because the philosophy of “statistical hypothesis testing” is sometimes criticized unnecessarily, mainly owing to a failure to consider this distinction. Randomized controlled trials are similarly, and also wrongly, criticized. That a scientific method may not be applicable in some particular situation does not mean that the method is useless. Also remember that “statistical hypothesis testing” is not intended for decision making, and that the field of “decision analysis” is very much an integral part of the science of statistics. It is not correct to say that “confidence intervals have nothing to do with confidence” unless one understands the meaning of the word “confidence” as used in the context of a confidence interval. Interpretation of the results of every study should always consider all possible alternative explanations such as chance, bias, and confounding. Statistical tests in inferential statistics are, in general, designed to answer the question “How likely is it that the difference found in the random sample(s) is due to chance?”, and therefore the limitation of relying only on statistical significance when making clinical decisions should be avoided. PMID:22707861

  6. New dimensions from statistical graphics for GIS (geographic information system) analysis and interpretation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCord, R.A.; Olson, R.J.

    1988-01-01

    Environmental research and assessment activities at Oak Ridge National Laboratory (ORNL) include the analysis of spatial and temporal patterns of ecosystem response at a landscape scale. Analysis through use of a geographic information system (GIS) involves an interaction between the user and thematic data sets frequently expressed as maps. A portion of GIS analysis has a mathematical or statistical aspect, especially for the analysis of temporal patterns. ARC/INFO is an excellent tool for manipulating GIS data and producing the appropriate map graphics. INFO also has some limited ability to produce statistical tabulations. At ORNL we have extended our capabilities by graphically interfacing ARC/INFO and SAS/GRAPH to provide a combined mapping and statistical graphics environment. With the data management, statistical, and graphics capabilities of SAS added to ARC/INFO, we have expanded the analytical and graphical dimensions of the GIS environment. Pie or bar charts, frequency curves, hydrographs, or scatter plots as produced by SAS can be added to maps from attribute data associated with ARC/INFO coverages. Numerous small, simplified graphs can also become a source of complex map "symbols." These additions extend the dimensions of GIS graphics to include time, details of the thematic composition, distribution, and interrelationships. 7 refs., 3 figs.

  7. Ambiguities and conflicting results: the limitations of the kappa statistic in establishing the interrater reliability of the Irish nursing minimum data set for mental health: a discussion paper.

    PubMed

    Morris, Roisin; MacNeela, Padraig; Scott, Anne; Treacy, Pearl; Hyde, Abbey; O'Brien, Julian; Lehwaldt, Daniella; Byrne, Anne; Drennan, Jonathan

    2008-04-01

    In a study to establish the interrater reliability of the Irish Nursing Minimum Data Set (I-NMDS) for mental health, difficulties relating to the choice of reliability test statistic were encountered. The objective of this paper is to highlight the difficulties associated with testing interrater reliability for an ordinal scale using a relatively homogeneous sample and the recommended kw statistic. One pair of mental health nurses completed the I-NMDS for mental health for a total of 30 clients attending a mental health day centre over a two-week period. Data were analysed using the kw and percentage agreement statistics. A total of 34 of the 38 I-NMDS for mental health variables with lower-than-acceptable kw reliability scores achieved acceptable levels of reliability according to their percentage agreement scores. The study findings implied that, due to the homogeneity of the sample, low variability within the data resulted in the 'base rate problem' associated with the use of the kw statistic. Conclusions point to the interpretation of kw in tandem with percentage agreement scores. Suggestions that kw scores were low due to chance agreement and that one should strive to use a study sample with known variability are queried.
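
    The 'base rate problem' is easy to reproduce with toy ratings: when nearly all cases fall in one category, expected chance agreement is so high that kappa can be very low, or even negative, despite near-perfect percentage agreement (synthetic ratings below, not the I-NMDS data).

    ```python
    import numpy as np
    from sklearn.metrics import cohen_kappa_score

    # Toy ratings for 30 cases by two raters; 28 cases are rated 1 by both.
    rater_a = np.array([1] * 28 + [0, 1])
    rater_b = np.array([1] * 28 + [1, 0])

    percent_agreement = float(np.mean(rater_a == rater_b))
    kappa = cohen_kappa_score(rater_a, rater_b)
    print(f"percent agreement = {percent_agreement:.2f}, kappa = {kappa:.2f}")
    # High agreement (0.93) but negative kappa, because chance agreement is ~0.94.
    ```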

  8. Attribute classification for generating GPR facies models

    NASA Astrophysics Data System (ADS)

    Tronicke, Jens; Allroggen, Niklas

    2017-04-01

    Ground-penetrating radar (GPR) is an established geophysical tool to explore near-surface sedimentary environments. It has been successfully used, for example, to reconstruct past depositional environments, to investigate sedimentary processes, to aid hydrogeological investigations, and to assist in hydrocarbon reservoir analog studies. Interpreting such 2D/3D GPR data usually relies on the concept of GPR facies analysis, in which GPR facies are defined as units composed of characteristic reflection patterns (in terms of reflection amplitude, continuity, geometry, and internal configuration). The resulting facies models are then interpreted in terms of depositional processes, sedimentary environments, litho-, and hydrofacies. Typically, such GPR facies analyses are implemented in a manual workflow, which is laborious and rather inefficient, especially for 3D data sets. In addition, such a subjective strategy bears the potential for inconsistency because the outcome depends on the expertise and experience of the interpreter. In this presentation, we investigate the feasibility of delineating GPR facies in an objective and largely automated manner. Our proposed workflow relies on a three-step procedure. First, we calculate a variety of geometrical and physical attributes from processed 2D and 3D GPR data sets. Then, we analyze and evaluate this attribute database (e.g., using statistical tools such as principal component analysis) to reduce its dimensionality and to avoid redundant information. Finally, we integrate the reduced database using tools such as composite imaging, cluster analysis, and neural networks. Using field examples that have been acquired across different depositional environments, we demonstrate that the resulting 2D/3D facies models ease and improve the interpretation of GPR data. We conclude that our interpretation strategy allows GPR facies models to be generated in a consistent and largely automated manner and might be helpful in a variety of near-surface applications.

  9. New physicochemical interpretations for the adsorption of food dyes on chitosan films using statistical physics treatment.

    PubMed

    Dotto, G L; Pinto, L A A; Hachicha, M A; Knani, S

    2015-03-15

    In this work, a statistical physics treatment was employed to study the adsorption of food dyes onto chitosan films, in order to obtain new physicochemical interpretations at the molecular level. Experimental equilibrium curves were obtained for the adsorption of four dyes (FD&C red 2, FD&C yellow 5, FD&C blue 2, Acid Red 51) at different temperatures (298, 313 and 328 K). A statistical physics formula was used to interpret these curves, and parameters such as the number of adsorbed dye molecules per site (n), the anchorage number (n'), the receptor site density (NM), the adsorbed quantity at saturation (Nasat), the steric hindrance (τ), the concentration at half saturation (c1/2) and the molar adsorption energy (ΔEa) were estimated. The relation of the above-mentioned parameters to the chemical structure of the dyes and to temperature was evaluated and interpreted. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Interpreting the gamma statistic in phylogenetic diversification rate studies: a rate decrease does not necessarily indicate an early burst.

    PubMed

    Fordyce, James A

    2010-07-23

    Phylogenetic hypotheses are increasingly being used to elucidate historical patterns of diversification rate variation. Hypothesis testing is often conducted by comparing the observed vector of branching times to a null, pure-birth expectation. A popular method for inferring a decrease in speciation rate, which might suggest an early burst of diversification followed by a decrease in diversification rate, is the gamma statistic. Using simulations under varying conditions, I examine the sensitivity of gamma to the distribution of the most recent branching times. Using an exploratory data analysis tool for lineages-through-time plots, tree deviation, I identified trees with a significant gamma statistic that do not appear to have the characteristic early accumulation of lineages consistent with an early, rapid rate of cladogenesis. I further investigated the sensitivity of the gamma statistic to recent diversification by examining the consequences of failing to simulate the full time interval following the most recent cladogenic event. The power of gamma to detect a rate decrease at varying times was assessed for simulated trees with an initial high rate of diversification followed by a relatively low rate. The gamma statistic is extraordinarily sensitive to recent diversification rates and does not necessarily detect early bursts of diversification. This was true for trees of various sizes and completeness of taxon sampling. The gamma statistic had greater power to detect recent diversification rate decreases compared to early bursts of diversification. Caution should be exercised when interpreting the gamma statistic as an indication of early, rapid diversification.
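
    For reference, the gamma statistic of Pybus and Harvey (2000) can be computed directly from the internode intervals of an ultrametric tree; the sketch below follows that commonly cited formulation and evaluates it on intervals simulated under a constant-rate pure-birth process, under which gamma is approximately standard normal.

    ```python
    import numpy as np

    # Pybus & Harvey (2000) gamma from internode intervals g[k], the time during
    # which k = 2..n lineages exist on an ultrametric tree.
    def gamma_statistic(g):
        n = len(g) + 1                       # number of tips
        k = np.arange(2, n + 1)
        weighted = k * g
        total = weighted.sum()               # T = sum_{j=2..n} j * g_j
        partial = np.cumsum(weighted)[:-1]   # T_i for i = 2..n-1
        numerator = partial.mean() - total / 2.0
        return numerator / (total * np.sqrt(1.0 / (12.0 * (n - 2))))

    # Intervals simulated under a constant-rate pure-birth process: the waiting
    # time with k lineages is exponential with rate k (unit speciation rate).
    rng = np.random.default_rng(9)
    k = np.arange(2, 51)
    g = rng.exponential(1.0 / k)
    print(f"gamma = {gamma_statistic(g):.2f}")  # approximately standard normal
    ```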

  11. Statistics of Low-Mass Companions to Stars: Implications for Their Origin

    NASA Technical Reports Server (NTRS)

    Stepinski, T. F.; Black, D. C.

    2001-01-01

    One of the more significant results from observational astronomy over the past few years has been the detection, primarily via radial velocity studies, of low-mass companions (LMCs) to solar-like stars. The commonly held interpretation is that the majority are "extrasolar planets" whereas the rest are brown dwarfs, the distinction being made on the basis of an apparent discontinuity in the distribution of M sin i for LMCs as revealed by a histogram. We report here results from a statistical analysis of M sin i, as well as of the orbital elements data for available LMCs, to test the assertion that the LMC population is heterogeneous. The outcome is mixed. Solely on the basis of the distribution of M sin i, a heterogeneous model is preferable. Overall, we find that a definitive statement asserting that the LMC population is heterogeneous is, at present, unjustified. In addition, we compare the statistics of LMCs with a comparable sample of stellar binaries. We find a remarkable statistical similarity between these two populations. This similarity, coupled with the marked populational dissimilarity between LMCs and acknowledged planets, motivates us to suggest a common origin hypothesis for LMCs and stellar binaries as an alternative to the prevailing interpretation. We discuss the merits of such a hypothesis and indicate a possible scenario for the formation of LMCs.

  12. Does Training in Table Creation Enhance Table Interpretation? A Quasi-Experimental Study with Follow-Up

    ERIC Educational Resources Information Center

    Karazsia, Bryan T.; Wong, Kendal

    2016-01-01

    Quantitative and statistical literacy are core domains in the undergraduate psychology curriculum. An important component of such literacy includes interpretation of visual aids, such as tables containing results from statistical analyses. This article presents results of a quasi-experimental study with longitudinal follow-up that tested the…

  13. College Students' Interpretation of Research Reports on Group Differences: The Tall-Tale Effect

    ERIC Educational Resources Information Center

    Hogan, Thomas P.; Zaboski, Brian A.; Perry, Tiffany R.

    2015-01-01

    How does the student untrained in advanced statistics interpret results of research that reports a group difference? In two studies, statistically untrained college students were presented with abstracts or professional associations' reports and asked for estimates of scores obtained by the original participants in the studies. These estimates…

  14. A simulation-based evaluation of methods for inferring linear barriers to gene flow

    Treesearch

    Christopher Blair; Dana E. Weigel; Matthew Balazik; Annika T. H. Keeley; Faith M. Walker; Erin Landguth; Sam Cushman; Melanie Murphy; Lisette Waits; Niko Balkenhol

    2012-01-01

    Different analytical techniques used on the same data set may lead to different conclusions about the existence and strength of genetic structure. Therefore, reliable interpretation of the results from different methods depends on the efficacy and reliability of different statistical methods. In this paper, we evaluated the performance of multiple analytical methods to...

  15. DataTrack 6: Blacks and Hispanics in the United States.

    ERIC Educational Resources Information Center

    American Council of Life Insurance, Washington, DC.

    Sixth in a series of reports which compile and interpret statistical information of direct concern to life insurance executives, this report deals with Blacks and Hispanics in the United States. It can be used in the design of new products and services to meet changing consumer needs, the selection of new markets and marketing strategies, the…

  16. Lexicogrammar in the International Construction Industry: A Corpus-Based Case Study of Japanese-Hong-Kongese On-Site Interactions in English

    ERIC Educational Resources Information Center

    Handford, Michael; Matous, Petr

    2011-01-01

    The purpose of this research is to identify and interpret statistically significant lexicogrammatical items that are used in on-site spoken communication in the international construction industry, initially through comparisons with reference corpora of everyday spoken and business language. Several data sources, including audio and video…

  17. Exploring the Micro-Social Geography of Children's Interactions in Preschool: A Long-Term Observational Study and Analysis Using Geographic Information Technologies

    ERIC Educational Resources Information Center

    Torrens, Paul M.; Griffin, William A.

    2013-01-01

    The authors describe an observational and analytic methodology for recording and interpreting dynamic microprocesses that occur during social interaction, making use of space--time data collection techniques, spatial-statistical analysis, and visualization. The scheme has three investigative foci: Structure, Activity Composition, and Clustering.…

  18. Performing Contrast Analysis in Factorial Designs: From NHST to Confidence Intervals and Beyond

    ERIC Educational Resources Information Center

    Wiens, Stefan; Nilsson, Mats E.

    2017-01-01

    Because of the continuing debates about statistics, many researchers may feel confused about how to analyze and interpret data. Current guidelines in psychology advocate the use of effect sizes and confidence intervals (CIs). However, researchers may be unsure about how to extract effect sizes from factorial designs. Contrast analysis is helpful…

  19. Establishing Statistical Equivalence of Data from Different Sampling Approaches for Assessment of Bacterial Phenotypic Antimicrobial Resistance

    PubMed Central

    2018-01-01

    ABSTRACT To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli. These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether that present difference is practically relevant. PMID:29475868

  20. Establishing Statistical Equivalence of Data from Different Sampling Approaches for Assessment of Bacterial Phenotypic Antimicrobial Resistance.

    PubMed

    Shakeri, Heman; Volkova, Victoriya; Wen, Xuesong; Deters, Andrea; Cull, Charley; Drouillard, James; Müller, Christian; Moradijamei, Behnaz; Jaberi-Douraki, Majid

    2018-05-01

    To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the use of the fluoroquinolone ciprofloxacin to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli. These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. Data users (e.g., microbiologists and epidemiologists) may then interpret the practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether the observed difference is practically relevant. Copyright © 2018 Shakeri et al.
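
    The abstract does not give the exact test statistic, but equivalence of two MIC distributions is commonly assessed with two one-sided tests (TOST) on log2-transformed MICs. The sketch below is a minimal illustration of that generic approach; the equivalence margin `delta`, the Welch t-test formulation, and the simulated data are assumptions for illustration, not the authors' procedure.

```python
# Minimal TOST sketch for equivalence of mean log2(MIC) between two sample types.
# The margin delta (in log2 dilution steps) and the data are hypothetical.
import numpy as np
from scipy import stats

def tost_equivalence(x, y, delta):
    """TOST p-value for |mean(x) - mean(y)| < delta, using Welch t-tests."""
    diff = np.mean(x) - np.mean(y)
    vx, vy = np.var(x, ddof=1) / len(x), np.var(y, ddof=1) / len(y)
    se = np.sqrt(vx + vy)
    # Welch-Satterthwaite degrees of freedom
    df = se**4 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    p_lower = 1 - stats.t.cdf((diff + delta) / se, df)  # H0: diff <= -delta
    p_upper = stats.t.cdf((diff - delta) / se, df)      # H0: diff >= +delta
    return max(p_lower, p_upper)  # equivalence is declared if this is < alpha

rng = np.random.default_rng(0)
log2_mic_a = rng.normal(2.0, 1.0, 40)  # e.g., isolates from cecal samples
log2_mic_b = rng.normal(2.2, 1.0, 40)  # e.g., isolates from meat samples
print(tost_equivalence(log2_mic_a, log2_mic_b, delta=1.0))
```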

  1. Statistical evaluation of accelerated stability data obtained at a single temperature. I. Effect of experimental errors in evaluation of stability data obtained.

    PubMed

    Yoshioka, S; Aso, Y; Takeda, Y

    1990-06-01

    Accelerated stability data obtained at a single temperature are statistically evaluated, and the utility of such data for assessment of stability is discussed, focusing on the chemical stability of solution-state dosage forms. The probability that the drug content of a product is observed to be within the lower specification limit in the accelerated test is interpreted graphically. This probability depends on experimental errors in the assay and temperature control, as well as the true degradation rate and activation energy. Therefore, the observation that the drug content meets the specification in the accelerated testing can provide only limited information on the shelf-life of the drug, without knowledge of the activation energy and the accuracy and precision of the assay and temperature control.
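
    The abstract's point that accelerated data are uninformative without the activation energy can be seen from the standard Arrhenius extrapolation. The sketch below shows that dependence; the rate constant, activation energy, and first-order degradation assumption are illustrative values, not data from the paper.

```python
# Arrhenius extrapolation from an accelerated temperature (40 C) to storage (25 C),
# followed by a first-order shelf-life (t90) estimate. All inputs are assumed.
import math

R = 8.314      # J/(mol*K)
k_40 = 0.010   # assumed first-order degradation rate at 40 C, per month
Ea = 83_000    # assumed activation energy, J/mol

T_acc, T_store = 313.15, 298.15  # 40 C and 25 C in kelvin
k_25 = k_40 * math.exp(-(Ea / R) * (1.0 / T_store - 1.0 / T_acc))

# t90: time for first-order degradation to reach 90% of label claim.
t90 = math.log(10.0 / 9.0) / k_25
print(f"k(25 C) = {k_25:.5f} per month, estimated t90 = {t90:.1f} months")
```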

  2. Defining window-boundaries for genomic analyses using smoothing spline techniques

    DOE PAGES

    Beissinger, Timothy M.; Rosa, Guilherme J.M.; Kaeppler, Shawn M.; ...

    2015-04-17

    High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpretation of data grouped in windows versus at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since these are determined empirically and allowed to vary along the genome.
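
    A minimal sketch of the idea described in the abstract, not the authors' implementation: fit a cubic smoothing spline to a genome-ordered statistic and take the inflection points of the fitted curve as window boundaries. The positions, signal, and smoothing factor below are simulated/assumed.

```python
# Fit a cubic smoothing spline and use sign changes of its second derivative
# (inflection points) as window boundaries. Data are simulated.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
pos = np.arange(1000)                                    # marker positions
signal = np.sin(pos / 60.0) + rng.normal(0, 0.4, 1000)   # noisy test statistic

spline = UnivariateSpline(pos, signal, k=3, s=300)       # smoothing factor chosen for illustration
second = spline.derivative(n=2)(pos)                     # second derivative on the marker grid

# Inflection points: sign changes of the second derivative define window edges.
boundaries = pos[:-1][np.sign(second[:-1]) != np.sign(second[1:])]
print(len(boundaries), "window boundaries, e.g.", boundaries[:10])
```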

  3. Recommendations for Clinical Pathology Data Generation, Interpretation, and Reporting in Target Animal Safety Studies for Veterinary Drug Development.

    PubMed

    Siska, William; Gupta, Aradhana; Tomlinson, Lindsay; Tripathi, Niraj; von Beust, Barbara

    Clinical pathology testing is routinely performed in target animal safety studies in order to identify potential toxicity associated with administration of an investigational veterinary pharmaceutical product. Regulatory and other testing guidelines that address such studies provide recommendations for clinical pathology testing but occasionally contain outdated analytes and do not take into account interspecies physiologic differences that affect the practical selection of appropriate clinical pathology tests. Additionally, strong emphasis is often placed on statistical analysis and use of reference intervals for interpretation of test article-related clinical pathology changes, with limited attention given to the critical scientific review of clinically, toxicologically, or biologically relevant changes. The purpose of this communication from the Regulatory Affairs Committee of the American Society for Veterinary Clinical Pathology is to provide current recommendations for clinical pathology testing and data interpretation in target animal safety studies and thereby enhance the value of clinical pathology testing in these studies.

  4. Unlocking interpretation in near infrared multivariate calibrations by orthogonal partial least squares.

    PubMed

    Stenlund, Hans; Johansson, Erik; Gottfries, Johan; Trygg, Johan

    2009-01-01

    Near infrared spectroscopy (NIR) was developed primarily for applications such as the quantitative determination of nutrients in the agricultural and food industries. Examples include the determination of water, protein, and fat within complex samples such as grain and milk. Because of its useful properties, NIR analysis has spread to other areas such as chemistry and pharmaceutical production. NIR spectra consist of infrared overtones and combinations thereof, making interpretation of the results complicated. It can be very difficult to assign peaks to known constituents in the sample. Thus, multivariate analysis (MVA) has been crucial in translating spectral data into information, mainly for predictive purposes. Orthogonal partial least squares (OPLS), a new MVA method, has prediction and modeling properties similar to those of other MVA techniques, e.g., partial least squares (PLS), a method with a long history of use for the analysis of NIR data. OPLS provides an intrinsic algorithmic improvement for the interpretation of NIR data. In this report, four sets of NIR data were analyzed to demonstrate the improved interpretation provided by OPLS. The first two sets included simulated data to demonstrate the overall principles; the third set comprised a statistically replicated design of experiments (DoE), to demonstrate how instrumental difference could be accurately visualized and correctly attributed to Wood's anomaly phenomena; the fourth set was chosen to challenge the MVA by using data relating to powder mixing, a crucial step in the pharmaceutical industry prior to tabletting. Improved interpretation by OPLS was demonstrated for all four examples, as compared to alternative MVA approaches. It is expected that OPLS will be used mostly in applications where improved interpretation is crucial; one such area is process analytical technology (PAT). PAT involves fewer independent samples, i.e., batches, than would be associated with agricultural applications; in addition, the Food and Drug Administration (FDA) demands "process understanding" in PAT. Both these issues make OPLS the ideal tool for a multitude of NIR calibrations. In conclusion, OPLS leads to better interpretation of spectrometry data (e.g., NIR) and improved understanding facilitates cross-scientific communication. Such improved knowledge will decrease risk, with respect to both accuracy and precision, when using NIR for PAT applications.

  5. Data processing, multi-omic pathway mapping, and metabolite activity analysis using XCMS Online

    PubMed Central

    Forsberg, Erica M; Huan, Tao; Rinehart, Duane; Benton, H Paul; Warth, Benedikt; Hilmers, Brian; Siuzdak, Gary

    2018-01-01

    Systems biology is the study of complex living organisms, and as such, analysis on a systems-wide scale involves the collection of information-dense data sets that are representative of an entire phenotype. To uncover dynamic biological mechanisms, bioinformatics tools have become essential to facilitating data interpretation in large-scale analyses. Global metabolomics is one such method for performing systems biology, as metabolites represent the downstream functional products of ongoing biological processes. We have developed XCMS Online, a platform that enables online metabolomics data processing and interpretation. A systems biology workflow recently implemented within XCMS Online enables rapid metabolic pathway mapping using raw metabolomics data for investigating dysregulated metabolic processes. In addition, this platform supports integration of multi-omic (such as genomic and proteomic) data to garner further systems-wide mechanistic insight. Here, we provide an in-depth procedure showing how to effectively navigate and use the systems biology workflow within XCMS Online without a priori knowledge of the platform, including uploading liquid chromatography (LC)–mass spectrometry (MS) data from metabolite-extracted biological samples, defining the job parameters to identify features, correcting for retention time deviations, conducting statistical analysis of features between sample classes and performing predictive metabolic pathway analysis. Additional multi-omics data can be uploaded and overlaid with previously identified pathways to enhance systems-wide analysis of the observed dysregulations. We also describe unique visualization tools to assist in elucidation of statistically significant dysregulated metabolic pathways. Parameter input takes 5–10 min, depending on user experience; data processing typically takes 1–3 h, and data analysis takes ~30 min. PMID:29494574

  6. Could the clinical interpretability of subgroups detected using clustering methods be improved by using a novel two-stage approach?

    PubMed

    Kent, Peter; Stochkendahl, Mette Jensen; Christensen, Henrik Wulff; Kongsted, Alice

    2015-01-01

    Recognition of homogeneous subgroups of patients can usefully improve prediction of their outcomes and the targeting of treatment. There are a number of research approaches that have been used to recognise homogeneity in such subgroups and to test their implications. One approach is to use statistical clustering techniques, such as Cluster Analysis or Latent Class Analysis, to detect latent relationships between patient characteristics. Influential patient characteristics can come from diverse domains of health, such as pain, activity limitation, physical impairment, social role participation, psychological factors, biomarkers and imaging. However, such 'whole person' research may result in data-driven subgroups that are complex, difficult to interpret and challenging to recognise clinically. This paper describes a novel approach to applying statistical clustering techniques that may improve the clinical interpretability of derived subgroups and reduce sample size requirements. This approach involves clustering in two sequential stages. The first stage involves clustering within health domains and therefore requires creating as many clustering models as there are health domains in the available data. This first stage produces scoring patterns within each domain. The second stage involves clustering using the scoring patterns from each health domain (from the first stage) to identify subgroups across all domains. We illustrate this using chest pain data from the baseline presentation of 580 patients. The new two-stage clustering resulted in two subgroups that approximated the classic textbook descriptions of musculoskeletal chest pain and atypical angina chest pain. The traditional single-stage clustering resulted in five clusters that were also clinically recognisable but displayed less distinct differences. In this paper, a new approach to using clustering techniques to identify clinically useful subgroups of patients is suggested. Research designs, statistical methods and outcome metrics suitable for performing that testing are also described. This approach has potential benefits but requires broad testing, in multiple patient samples, to determine its clinical value. The usefulness of the approach is likely to be context-specific, depending on the characteristics of the available data and the research question being asked of it.
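
    A minimal sketch of the two-stage idea under simplifying assumptions: cluster within each health domain first, then cluster patients on their pattern of domain memberships. The paper used techniques such as latent class analysis; k-means and the simulated domain variables below are stand-ins for illustration only.

```python
# Two-stage clustering sketch: stage 1 clusters within each health domain,
# stage 2 clusters the resulting domain membership patterns. Data are simulated.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(2)
n = 580
domains = {                               # hypothetical domain-specific variables
    "pain": rng.normal(size=(n, 4)),
    "activity": rng.normal(size=(n, 6)),
    "psych": rng.normal(size=(n, 5)),
}

# Stage 1: one clustering model per health domain (scoring pattern per domain).
stage1 = np.column_stack([
    KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    for X in domains.values()
])

# Stage 2: cluster patients on their pattern of domain memberships.
pattern = OneHotEncoder().fit_transform(stage1).toarray()
subgroups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(pattern)
print(np.bincount(subgroups))
```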

  7. Presentation approaches for enhancing interpretability of patient-reported outcomes (PROs) in meta-analysis: a protocol for a systematic survey of Cochrane reviews.

    PubMed

    Devji, Tahira; Johnston, Bradley C; Patrick, Donald L; Bhandari, Mohit; Thabane, Lehana; Guyatt, Gordon H

    2017-09-27

    Meta-analyses of clinical trials often provide sufficient information for decision-makers to evaluate whether chance can explain apparent differences between interventions. Interpretation of the magnitude and importance of treatment effects beyond statistical significance can, however, be challenging, particularly for patient-reported outcomes (PROs) measured using questionnaires with which clinicians have limited familiarity. The objectives of our study are to systematically evaluate Cochrane systematic review authors' approaches to calculation, reporting and interpretation of pooled estimates of patient-reported outcome measures (PROMs) in meta-analyses. We will conduct a methodological survey of a random sample of Cochrane systematic reviews published from 1 January 2015 to 1 April 2017 that report at least one statistically significant pooled result for at least one PRO in the abstract. Author pairs will independently review all titles, abstracts and full texts identified by the literature search, and they will extract data using a standardised data extraction form. We will extract the following: year of publication, number of included trials, number of included participants, clinical area, type of intervention(s) and control(s), type of meta-analysis and use of the Grading of Recommendations, Assessment, Development and Evaluation approach to rate the quality of evidence, as well as information regarding the characteristics of PROMs, calculation and presentation of PROM effect estimates and interpretation of PROM effect estimates. We will document and summarise the methods used for the analysis, reporting and interpretation of each summary effect measure. We will summarise categorical variables with frequencies and percentages and continuous outcomes as means and/or medians and associated measures of dispersion. Ethics approval for this study is not required. We will disseminate the results of this review in peer-reviewed publications and conference presentations. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  8. Occupational exposure decisions: can limited data interpretation training help improve accuracy?

    PubMed

    Logan, Perry; Ramachandran, Gurumurthy; Mulhausen, John; Hewett, Paul

    2009-06-01

    Accurate exposure assessments are critical for ensuring that potentially hazardous exposures are properly identified and controlled. The availability and accuracy of exposure assessments can determine whether resources are appropriately allocated to engineering and administrative controls, medical surveillance, personal protective equipment and other programs designed to protect workers. A desktop study was performed using videos, task information and sampling data to evaluate the accuracy and potential bias of participants' exposure judgments. Desktop exposure judgments were obtained from occupational hygienists for material handling jobs with small air sampling data sets (0-8 samples) and without the aid of computers. In addition, data interpretation tests (DITs) were administered to participants where they were asked to estimate the 95th percentile of an underlying log-normal exposure distribution from small data sets. Participants were presented with an exposure data interpretation or rule of thumb training which included a simple set of rules for estimating 95th percentiles for small data sets from a log-normal population. DIT was given to each participant before and after the rule of thumb training. Results of each DIT and qualitative and quantitative exposure judgments were compared with a reference judgment obtained through a Bayesian probabilistic analysis of the sampling data to investigate overall judgment accuracy and bias. There were a total of 4386 participant-task-chemical judgments for all data collections: 552 qualitative judgments made without sampling data and 3834 quantitative judgments with sampling data. The DITs and quantitative judgments were significantly better than random chance and much improved by the rule of thumb training. In addition, the rule of thumb training reduced the amount of bias in the DITs and quantitative judgments. The mean DIT % correct scores increased from 47 to 64% after the rule of thumb training (P < 0.001). The accuracy for quantitative desktop judgments increased from 43 to 63% correct after the rule of thumb training (P < 0.001). The rule of thumb training did not significantly impact accuracy for qualitative desktop judgments. The finding that even some simple statistical rules of thumb improve judgment accuracy significantly suggests that hygienists need to routinely use statistical tools while making exposure judgments using monitoring data.
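
    The abstract describes estimating the 95th percentile of a log-normal exposure distribution from a small data set. One common point estimate is exp(mean of the logs + 1.645 × SD of the logs); the sketch below shows that calculation, though the paper's "rule of thumb" training may have used a simpler heuristic, and the sample results are hypothetical.

```python
# Point estimate of the 95th percentile of a log-normal exposure distribution
# from a small sample of measurements. Sample values are hypothetical.
import numpy as np

def lognormal_p95(samples):
    logs = np.log(samples)
    return np.exp(logs.mean() + 1.645 * logs.std(ddof=1))

shift_twa = np.array([0.12, 0.35, 0.08, 0.22, 0.41])  # hypothetical results, mg/m^3
print(f"Estimated 95th percentile: {lognormal_p95(shift_twa):.2f} mg/m^3")
```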

  9. Seasonal assessment and apportionment of surface water pollution using multivariate statistical methods: Sinos River, southern Brazil.

    PubMed

    Alves, Darlan Daniel; Riegel, Roberta Plangg; de Quevedo, Daniela Müller; Osório, Daniela Montanari Migliavacca; da Costa, Gustavo Marques; do Nascimento, Carlos Augusto; Telöken, Franko

    2018-06-08

    Assessment of surface water quality is an issue of high current importance, especially in polluted rivers that provide water for treatment and distribution as drinking water, as is the case of the Sinos River, southern Brazil. Multivariate statistical techniques allow a better understanding of the seasonal variations in water quality, as well as the source identification and source apportionment of water pollution. In this study, the multivariate statistical techniques of cluster analysis (CA), principal component analysis (PCA), and positive matrix factorization (PMF) were used, along with the Kruskal-Wallis test and Spearman's correlation analysis, in order to interpret a water quality data set resulting from a monitoring program conducted over a period of almost two years (May 2013 to April 2015). The water samples were collected from the raw water inlet of the municipal water treatment plant (WTP) operated by the Water and Sewage Services of Novo Hamburgo (COMUSA). CA allowed the data to be grouped into three periods (autumn and summer (AUT-SUM); winter (WIN); spring (SPR)). PCA identified the parameters contributing most to water quality variations: total coliforms (TCOLI) in AUT-SUM; water level (WL), water temperature (WT), and electrical conductivity (EC) in WIN; and color (COLOR) and turbidity (TURB) in SPR. PMF was applied to the complete data set and enabled the source apportionment of water pollution through three factors, which are related to anthropogenic sources such as the discharge of domestic sewage (mostly represented by Escherichia coli (ECOLI)), industrial wastewaters, and agricultural runoff. The results demonstrate the contribution of integrated statistical techniques to the interpretation and understanding of large water quality data sets, and show that this approach can be used as an efficient methodology to optimize indicators for water quality assessment.

  10. A Practical Guide to Check the Consistency of Item Response Patterns in Clinical Research Through Person-Fit Statistics: Examples and a Computer Program.

    PubMed

    Meijer, Rob R; Niessen, A Susan M; Tendeiro, Jorge N

    2016-02-01

    Although there are many studies devoted to person-fit statistics to detect inconsistent item score patterns, most studies are difficult to understand for nonspecialists. The aim of this tutorial is to explain the principles of these statistics for researchers and clinicians who are interested in applying these statistics. In particular, we first explain how invalid test scores can be detected using person-fit statistics; second, we provide the reader practical examples of existing studies that used person-fit statistics to detect and to interpret inconsistent item score patterns; and third, we discuss a new R-package that can be used to identify and interpret inconsistent score patterns. © The Author(s) 2015.

  11. [The principal components analysis--method to classify the statistical variables with applications in medicine].

    PubMed

    Dascălu, Cristina Gena; Antohe, Magda Ecaterina

    2009-01-01

    Based on eigenvalue and eigenvector analysis, principal component analysis identifies, from a set of parameters, the subspace of main components that is sufficient to characterize the whole set. Interpreting the data as a cloud of points, we find through geometrical transformations the directions in which the cloud's dispersion is maximal: the lines that pass through the cloud's center of gravity and have a maximal density of points around them (found by defining an appropriate criterion function and minimizing it). This method can be used to simplify the statistical analysis of questionnaires, because it helps select from a set of items only the most relevant ones, which cover the variation of the whole data set. For instance, in the presented sample we started from a questionnaire with 28 items and, applying principal component analysis, identified 7 principal components, or main items, which significantly simplifies further statistical analysis of the data.
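
    A minimal sketch of the eigen-decomposition view of PCA described above. The simulated questionnaire matrix, the Kaiser (eigenvalue > 1) retention rule, and the use of the correlation matrix are assumptions for illustration, not details from the paper.

```python
# PCA by eigen-decomposition of the item correlation matrix; data are simulated.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 28))                      # 120 respondents, 28 items

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize items
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)             # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
n_keep = int(np.sum(eigvals > 1.0))                 # Kaiser criterion, one common rule
scores = Z @ eigvecs[:, :n_keep]                    # component scores per respondent
print(f"{n_keep} components retained, explaining {explained[:n_keep].sum():.2f} of the variance")
```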

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kreuzer-Martin, Helen W.; Hegg, Eric L.

    The use of isotopic signatures for forensic analysis of biological materials is well-established, and the same general principles that apply to interpretation of stable isotope content of C, N, O, and H apply to the analysis of microorganisms. Heterotrophic microorganisms derive their isotopic content from their growth substrates, which are largely plant and animal products, and the water in their culture medium. Thus the isotope signatures of microbes are tied to their growth environment. The C, N, O, and H isotope ratios of spores have been demonstrated to constitute highly discriminating signatures for sample matching. They can rule out specific samples of media and/or water as possible production media, and can predict isotope ratio ranges of the culture media and water used to produce a given sample. These applications have been developed and tested through analyses of approximately 250 samples of Bacillus subtilis spores and over 500 samples of culture media, providing a strong statistical basis for data interpretation. A Bayesian statistical framework for integrating stable isotope data with other types of signatures derived from microorganisms has been able to characterize the culture medium used to produce spores of various Bacillus species, leveraging isotopic differences in different medium types and demonstrating the power of data integration for forensic investigations.

  13. Application of Linear Mixed-Effects Models in Human Neuroscience Research: A Comparison with Pearson Correlation in Two Auditory Electrophysiology Studies.

    PubMed

    Koerner, Tess K; Zhang, Yang

    2017-02-27

    Neurophysiological studies are often designed to examine relationships between measures from different testing conditions, time points, or analysis techniques within the same group of participants. Appropriate statistical techniques that can take into account repeated measures and multivariate predictor variables are essential to successful data analysis and interpretation. This work implements and compares conventional Pearson correlations and linear mixed-effects (LME) regression models using data from two recently published auditory electrophysiology studies. For the specific research questions in both studies, the Pearson correlation test is inappropriate for determining the strength of the relationships between the behavioral responses for speech-in-noise recognition and the multiple neurophysiological measures, because the neural responses across listening conditions were simply treated as independent measures. In contrast, the LME models allow a systematic approach to incorporate both fixed-effect and random-effect terms to deal with the categorical grouping factor of listening conditions, between-subject baseline differences in the multiple measures, and the correlational structure among the predictor variables. Together, the comparative data demonstrate the advantages of, and the necessity of, applying mixed-effects models to properly account for the built-in relationships among the multiple predictor variables, which has important implications for proper statistical modeling and interpretation of human behavior in terms of neural correlates and biomarkers.
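
    A minimal sketch of the contrast drawn in the abstract, using simulated data in place of the published electrophysiology measures: a Pearson correlation that ignores repeated measures versus a linear mixed-effects model with a random intercept per subject and a fixed effect for listening condition.

```python
# Pearson correlation vs. a subject-level random-intercept LME; data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
n_subj, n_cond = 20, 4
subject = np.repeat(np.arange(n_subj), n_cond)
condition = np.tile(np.arange(n_cond), n_subj)
neural = rng.normal(size=n_subj)[subject] + 0.3 * condition + rng.normal(0, 0.5, n_subj * n_cond)
behavior = 0.6 * neural + rng.normal(0, 0.5, n_subj * n_cond)
df = pd.DataFrame({"subject": subject, "condition": condition,
                   "neural": neural, "behavior": behavior})

print(pearsonr(df["neural"], df["behavior"]))      # treats all rows as independent
lme = smf.mixedlm("behavior ~ neural + C(condition)", df, groups=df["subject"]).fit()
print(lme.params["neural"], lme.pvalues["neural"])  # slope adjusted for repeated measures
```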

  14. US Geological Survey nutrient preservation experiment : experimental design, statistical analysis, and interpretation of analytical results

    USGS Publications Warehouse

    Patton, Charles J.; Gilroy, Edward J.

    1999-01-01

    Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.

  15. The use of analysis of variance procedures in biological studies

    USGS Publications Warehouse

    Williams, B.K.

    1987-01-01

    The analysis of variance (ANOVA) is widely used in biological studies, yet there remains considerable confusion among researchers about the interpretation of hypotheses being tested. Ambiguities arise when statistical designs are unbalanced, and in particular when not all combinations of design factors are represented in the data. This paper clarifies the relationship among hypothesis testing, statistical modelling and computing procedures in ANOVA for unbalanced data. A simple two-factor fixed effects design is used to illustrate three common parametrizations for ANOVA models, and some associations among these parametrizations are developed. Biologically meaningful hypotheses for main effects and interactions are given in terms of each parametrization, and procedures for testing the hypotheses are described. The standard statistical computing procedures in ANOVA are given along with their corresponding hypotheses. Throughout the development unbalanced designs are assumed and attention is given to problems that arise with missing cells.
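
    A minimal sketch of the issue the paper addresses, using simulated data rather than the paper's examples: in an unbalanced two-factor fixed-effects design, different sums-of-squares conventions (here Type I versus Type II in statsmodels) correspond to different hypotheses about the main effects.

```python
# Unbalanced two-factor ANOVA: Type I (sequential) vs. Type II sums of squares.
# Factor levels, cell sizes, and responses are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(5)
rows = []
for a in ("A1", "A2"):
    for b in ("B1", "B2", "B3"):
        n = rng.integers(3, 9)                 # unequal cell sizes -> unbalanced design
        y = rng.normal(loc=(a == "A2") * 1.0 + (b == "B3") * 0.5, size=n)
        rows += [{"A": a, "B": b, "y": v} for v in y]
df = pd.DataFrame(rows)

model = smf.ols("y ~ C(A) * C(B)", data=df).fit()
print(anova_lm(model, typ=1))   # sequential sums of squares (order-dependent)
print(anova_lm(model, typ=2))   # each main effect adjusted for the other
```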

  16. Application of multivariate statistical techniques for differentiation of ripe banana flour based on the composition of elements.

    PubMed

    Alkarkhi, Abbas F M; Ramli, Saifullah Bin; Easa, Azhar Mat

    2009-01-01

    Major (sodium, potassium, calcium, magnesium) and minor elements (iron, copper, zinc, manganese) and one heavy metal (lead) of Cavendish banana flour and Dream banana flour were determined, and data were analyzed using multivariate statistical techniques of factor analysis and discriminant analysis. Factor analysis yielded four factors explaining more than 81% of the total variance: the first factor explained 28.73%, comprising magnesium, sodium, and iron; the second factor explained 21.47%, comprising only manganese and copper; the third factor explained 15.66%, comprising zinc and lead; while the fourth factor explained 15.50%, comprising potassium. Discriminant analysis showed that magnesium and sodium exhibited a strong contribution in discriminating the two types of banana flour, affording 100% correct assignation. This study presents the usefulness of multivariate statistical techniques for analysis and interpretation of complex mineral content data from banana flour of different varieties.

  17. Cancer survival: an overview of measures, uses, and interpretation.

    PubMed

    Mariotto, Angela B; Noone, Anne-Michelle; Howlader, Nadia; Cho, Hyunsoon; Keel, Gretchen E; Garshell, Jessica; Woloshin, Steven; Schwartz, Lisa M

    2014-11-01

    Survival statistics are of great interest to patients, clinicians, researchers, and policy makers. Although seemingly simple, survival can be confusing: there are many different survival measures with a plethora of names and statistical methods developed to answer different questions. This paper aims to describe and disseminate different survival measures and their interpretation in less technical language. In addition, we introduce templates to summarize cancer survival statistics, organized by their specific purpose: research and policy versus prognosis and clinical decision making. Published by Oxford University Press 2014.

  18. Cancer Survival: An Overview of Measures, Uses, and Interpretation

    PubMed Central

    Noone, Anne-Michelle; Howlader, Nadia; Cho, Hyunsoon; Keel, Gretchen E.; Garshell, Jessica; Woloshin, Steven; Schwartz, Lisa M.

    2014-01-01

    Survival statistics are of great interest to patients, clinicians, researchers, and policy makers. Although seemingly simple, survival can be confusing: there are many different survival measures with a plethora of names and statistical methods developed to answer different questions. This paper aims to describe and disseminate different survival measures and their interpretation in less technical language. In addition, we introduce templates to summarize cancer survival statistics, organized by their specific purpose: research and policy versus prognosis and clinical decision making. PMID:25417231

  19. Material Phase Causality or a Dynamics-Statistical Interpretation of Quantum Mechanics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koprinkov, I. G.

    2010-11-25

    The internal phase dynamics of a quantum system interacting with an electromagnetic field is revealed in detail. Theoretical and experimental evidence of a causal relation of the phase of the wave function to the dynamics of the quantum system is presented systematically for the first time. A dynamics-statistical interpretation of quantum mechanics is introduced.

  20. Uses and Misuses of Student Evaluations of Teaching: The Interpretation of Differences in Teaching Evaluation Means Irrespective of Statistical Information

    ERIC Educational Resources Information Center

    Boysen, Guy A.

    2015-01-01

    Student evaluations of teaching are among the most accepted and important indicators of college teachers' performance. However, faculty and administrators can overinterpret small variations in mean teaching evaluations. The current research examined the effect of including statistical information on the interpretation of teaching evaluations.…

  1. A tutorial in displaying mass spectrometry-based proteomic data using heat maps.

    PubMed

    Key, Melissa

    2012-01-01

    Data visualization plays a critical role in interpreting the results of proteomic experiments. Heat maps are particularly useful for this task, as they allow us to find quantitative patterns across proteins and biological samples simultaneously. The quality of a heat map can be vastly improved by understanding the options available to display and organize the data in the heat map. This tutorial illustrates how to optimize heat maps for proteomics data by incorporating known characteristics of the data into the image. First, the concepts used to guide the creation of heat maps are demonstrated. Then, these concepts are applied to two types of analysis: visualizing spectral features across biological samples, and presenting the results of tests of statistical significance. For all examples we provide details of computer code in the open-source statistical programming language R, which can be used by biologists and clinicians with little statistical background. Heat maps are a useful tool for presenting quantitative proteomic data organized in a matrix format. Understanding and optimizing the parameters used to create the heat map can vastly improve both the appearance and the interpretation of heat map data.
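
    The tutorial's code is in R; the sketch below is a comparable minimal example of the same basic steps in Python (row-scale a protein-by-sample matrix, order rows by hierarchical clustering, and draw the heat map). The matrix is simulated, not proteomic data from the paper.

```python
# Row-scaled, clustered heat map of a simulated protein-by-sample matrix.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import leaves_list, linkage

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 8))                       # 40 proteins x 8 samples
X[:10, 4:] += 2.0                                  # a block of up-regulated proteins

Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)  # row z-scores
order = leaves_list(linkage(Z, method="average"))  # hierarchical ordering of proteins

plt.imshow(Z[order], aspect="auto", cmap="RdBu_r", vmin=-3, vmax=3)
plt.xlabel("sample")
plt.ylabel("protein (clustered order)")
plt.colorbar(label="row z-score")
plt.savefig("heatmap.png", dpi=150)
```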

  2. A study of statistics anxiety levels of graduate dental hygiene students.

    PubMed

    Welch, Paul S; Jacks, Mary E; Smiley, Lynn A; Walden, Carolyn E; Clark, William D; Nguyen, Carol A

    2015-02-01

    In light of increased emphasis on evidence-based practice in the profession of dental hygiene, it is important that today's dental hygienist comprehend statistical measures to fully understand research articles, and thereby apply scientific evidence to practice. Therefore, the purpose of this study was to investigate statistics anxiety among graduate dental hygiene students in the U.S. A web-based self-report, anonymous survey was emailed to directors of 17 MSDH programs in the U.S. with a request to distribute to graduate students. The survey collected data on statistics anxiety, sociodemographic characteristics and evidence-based practice. Statistics anxiety was assessed using the Statistical Anxiety Rating Scale. The study significance level was α=0.05. Only 8 of the 17 invited programs participated in the study. Statistical Anxiety Rating Scale data revealed graduate dental hygiene students experience low to moderate levels of statistics anxiety. Specifically, the level of anxiety on the Interpretation Anxiety factor indicated this population could struggle with making sense of scientific research. A decisive majority (92%) of students indicated statistics is essential for evidence-based practice and should be a required course for all dental hygienists. This study served to identify statistics anxiety in a previously unexplored population. The findings should be useful in both theory building and in practical applications. Furthermore, the results can be used to direct future research. Copyright © 2015 The American Dental Hygienists’ Association.

  3. A consistent framework for Horton regression statistics that leads to a modified Hack's law

    USGS Publications Warehouse

    Furey, P.R.; Troutman, B.M.

    2008-01-01

    A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for the Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order. Data show that Strahler order plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.
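
    Hack's law relates mainstream length to drainage area as L = c A^h, which becomes a linear regression after log transformation; the modified form adds Strahler order as a second predictor. The sketch below fits both forms by ordinary least squares on simulated log-transformed data; the variable names and coefficients are assumptions for illustration, not the paper's estimates.

```python
# Classic Hack's law (logL ~ logA) and a modified form with Strahler order,
# fitted by OLS on simulated basin data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 60
order = rng.integers(1, 6, n)                              # Strahler order
logA = rng.normal(3.0 + 0.8 * order, 0.5)                  # ln drainage area
logL = 0.6 * logA + 0.05 * order + rng.normal(0, 0.2, n)   # ln mainstream length
df = pd.DataFrame({"logA": logA, "logL": logL, "strahler": order})

print(smf.ols("logL ~ logA", df).fit().params)             # slope is the Hack exponent h
print(smf.ols("logL ~ logA + strahler", df).fit().params)  # modified Hack's law
```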

  4. Spatio-temporal Outlier Detection in Precipitation Data

    NASA Astrophysics Data System (ADS)

    Wu, Elizabeth; Liu, Wei; Chawla, Sanjay

    The detection of outliers from spatio-temporal data is an important task due to the increasing amount of spatio-temporal data available and the need to understand and interpret it. Due to the limitations of current data mining techniques, new techniques to handle this data need to be developed. We propose a spatio-temporal outlier detection algorithm called Outstretch, which discovers the outlier movement patterns of the top-k spatial outliers over several time periods. The top-k spatial outliers are found using the Exact-Grid Top-k and Approx-Grid Top-k algorithms, which are an extension of algorithms developed by Agarwal et al. [1]. Since they use the Kulldorff spatial scan statistic, they are capable of discovering all outliers, unaffected by neighbouring regions that may contain missing values. After generating the outlier sequences, we show one way they can be interpreted, by comparing them to the phases of the El Niño Southern Oscillation (ENSO) weather phenomenon to provide a meaningful analysis of the results.

  5. Interpreting forest and grassland biome productivity utilizing nested scales of image resolution and biogeographical analysis

    NASA Technical Reports Server (NTRS)

    Iverson, L. R.; Olson, J. S.; Risser, P. G.; Treworgy, C.; Frank, T.; Cook, E.; Ke, Y.

    1986-01-01

    Data acquisition, initial site characterization, available image and geographic information methods, and brief first-year evaluations for NASA's Thematic Mapper (TM) working group are presented. The TM and other spectral data are examined in order to relate local, intensive ecosystem research findings to estimates of carbon cycling rates over wide geographic regions. The effort is to span environments ranging from dry to moist climates and from good to poor site quality using the TM capability, with and without the inclusion of geographic information system (GIS) data, and thus to interpret the local spatial pattern of factors conditioning biomass or productivity. Twenty-eight TM data sets were acquired, archived, and evaluated. The ERDAS image processing and GIS system was installed on the microcomputer (PC-AT) and its capabilities are being investigated. The TM coverage of seven study areas was exported via ELAS software on the Prime to the ERDAS system. Statistical analysis procedures to be used on the spectral data are being identified.

  6. Successful water quality monitoring: The right combination of intent, measurement, interpretation, and a cooperating ecosystem

    USGS Publications Warehouse

    Soballe, D.M.

    1998-01-01

    Water quality monitoring is invaluable to ensure compliance with regulations, detect trends or patterns, and advance ecological understanding. However, monitoring typically measures only a few characteristics in a small fraction of a large and complex system, and thus the information contained in monitoring data depends upon which features of the ecosystem are actually captured by the measurements. Difficulties arise when these data contain something other than intended, but this can be minimized if the purpose of the sampling is clear, and the sampling design, measurements, and data interpretations are all compatible with this purpose. The monitoring program and data interpretation must also be properly matched to the structure and functioning of the system. Obtaining this match is sometimes an iterative process that demands a close link between research and monitoring. This paper focuses on water quality monitoring that is intended to track trends in aquatic resources and advance ecological understanding. It includes examples from three monitoring programs and a simulation exercise that illustrate problems that arise when the information content of monitoring data differs from expectation. The examples show (1) how inconsistencies among, or lack of information about, the basic elements of a monitoring program (intent, design, measurement, interpretation, and the monitored system) can produce a systematic difference (bias) between monitoring measurements and sampling intent or interpretation, and (2) that bias is not just a statistical consideration, but an insidious problem that can undermine the scientific integrity of a monitoring program. Some general suggestions are provided and hopefully these examples will help those engaged in water quality monitoring to enhance and protect the value of their monitoring investment.

  7. Arkansas StreamStats: a U.S. Geological Survey web map application for basin characteristics and streamflow statistics

    USGS Publications Warehouse

    Pugh, Aaron L.

    2014-01-01

    Users of streamflow information often require streamflow statistics and basin characteristics at various locations along a stream. The USGS periodically calculates and publishes streamflow statistics and basin characteristics for streamflow-gaging stations and partial-record stations, but these data commonly are scattered among many reports that may or may not be readily available to the public. The USGS also provides and periodically updates regional analyses of streamflow statistics that include regression equations and other prediction methods for estimating statistics for ungaged and unregulated streams across the State. Use of these regional predictions for a stream can be complex and often requires the user to determine a number of basin characteristics that may require interpretation. Basin characteristics may include drainage area, classifiers for physical properties, climatic characteristics, and other inputs. Obtaining these input values for gaged and ungaged locations has traditionally been time consuming and subjective, and can lead to inconsistent results.

  8. Land-cover change in the Lower Mississippi Valley, 1973-2000

    USGS Publications Warehouse

    Karstensen, Krista A.; Sayler, Kristi L.

    2009-01-01

    Land Cover Trends is a research project focused on understanding the rates, trends, causes, and consequences of contemporary United States land-use and land-cover change. The project is coordinated by the Geographic Analysis and Monitoring Program of the U.S. Geological Survey (USGS) in conjunction with the U.S. Environmental Protection Agency (EPA) and the National Aeronautics and Space Administration (NASA). Using the EPA Level III ecoregions as the geographic framework, scientists processed geospatial data collected between 1973 and 2000 to characterize ecosystem responses to land-use changes. The 27-year study period was divided into four temporal periods: 1973 to 1980, 1980 to 1986, 1986 to 1992, and 1992 to 2000, with overall change assessed from 1973 to 2000. General land-cover classes for these periods were interpreted from Landsat Multispectral Scanner, Thematic Mapper, and Enhanced Thematic Mapper Plus imagery to categorize and evaluate land-cover change using a modified Anderson Land Use Land Cover Classification System (Anderson and others, 1976) for image interpretation. The rates of land-cover change were estimated using a stratified, random sampling of 10-kilometer (km) by 10-km blocks allocated within each ecoregion. For each sample block, satellite images were used to interpret land-cover change. The sample block data then were incorporated into statistical analyses to generate an overall change matrix for the ecoregion. These change statistics are applicable for different levels of scale, including total change for the individual sample blocks and change estimates for the entire ecoregion.

  9. Impact of Immediate Interpretation of Screening Tomosynthesis Mammography on Performance Metrics.

    PubMed

    Winkler, Nicole S; Freer, Phoebe; Anzai, Yoshimi; Hu, Nan; Stein, Matthew

    2018-05-07

    This study aimed to compare performance metrics for immediate and delayed batch interpretation of screening tomosynthesis mammograms. This HIPAA-compliant study was approved by the institutional review board with a waiver of consent. A retrospective analysis of screening performance metrics for tomosynthesis mammograms interpreted in 2015, when mammograms were read immediately, was compared to historical controls from 2013 to 2014, when mammograms were batch interpreted after the patient had departed. A total of 5518 screening tomosynthesis mammograms (n = 1212 for batch interpretation and n = 4306 for immediate interpretation) were evaluated. The larger sample size for the latter group reflects a group practice shift to performing tomosynthesis for the majority of patients. Age, breast density, comparison examinations, and high-risk status were compared. An asymptotic proportion test and multivariable analysis were used to compare performance metrics. There was no statistically significant difference in recall or cancer detection rates for the batch interpretation group compared to the immediate interpretation group, with respective recall rates of 6.5% vs 5.3% = +1.2% (95% confidence interval -0.3 to 2.7%; P = .101) and cancer detection rates of 6.6 vs 7.2 per thousand = -0.6 (95% confidence interval -5.9 to 4.6; P = .825). There was no statistically significant difference in positive predictive values (PPVs) including PPV1 (screening recall), PPV2 (biopsy recommendation), or PPV3 (biopsy performed) with batch interpretation (10.1%, 42.1%, and 40.0%, respectively) and immediate interpretation (13.6%, 39.2%, and 39.7%, respectively). After adjusting for age, breast density, high-risk status, and comparison mammogram, there was no difference in the odds of being recalled or cancer detection between the two groups. There is no statistically significant difference in interpretation performance metrics for screening tomosynthesis mammograms interpreted immediately compared to those interpreted in a delayed fashion. Copyright © 2018. Published by Elsevier Inc.
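
    A minimal sketch of the kind of asymptotic two-proportion comparison the abstract reports for recall rates (6.5% of 1212 vs 5.3% of 4306). The statsmodels z-test and the Wald confidence interval below are stand-ins for the authors' exact procedure.

```python
# Two-proportion z-test and Wald CI for the difference in recall rates.
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

n = np.array([1212, 4306])                                  # batch, immediate
recalled = np.round(np.array([0.065, 0.053]) * n).astype(int)

z, p = proportions_ztest(recalled, n)
p1, p2 = recalled / n
se = np.sqrt(p1 * (1 - p1) / n[0] + p2 * (1 - p2) / n[1])   # Wald standard error
print(f"z = {z:.2f}, p = {p:.3f}, "
      f"95% CI for difference: {p1 - p2 - 1.96*se:.3f} to {p1 - p2 + 1.96*se:.3f}")
```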

  10. Regional tectonic evaluation of the Tuscan Apenine, vulcanism, thermal anomalies and the relation to structural units

    NASA Technical Reports Server (NTRS)

    Bodechtel, J. (Principal Investigator)

    1975-01-01

    The author has identified the following significant results. The geological interpretation of data covering the Italian peninsula led to the recognition of tectonic features that are explained by a clockwise rotation of various blocks along left-handed transform faults. These faults can be interpreted as resulting from shear due to a main stress directed north-eastwards. A land use map of the mountainous regions of Italy was produced on a scale of 1:250,000. For the digital treatment of MSS-CCTs, image-processing software was written in FORTRAN 4. The software package includes descriptive statistics as well as classification algorithms.

  11. Anima: Modular Workflow System for Comprehensive Image Data Analysis

    PubMed Central

    Rantanen, Ville; Valori, Miko; Hautaniemi, Sampsa

    2014-01-01

    Modern microscopes produce vast amounts of image data, and computational methods are needed to analyze and interpret these data. Furthermore, a single image analysis project may require tens or hundreds of analysis steps, starting from data import and pre-processing, moving to segmentation and statistical analysis, and ending with visualization and reporting. To manage such large-scale image data analysis projects, we present here a modular workflow system called Anima. Anima is designed for comprehensive and efficient image data analysis development, and it contains several features that are crucial in high-throughput image data analysis: programming language independence, batch processing, easily customized data processing, interoperability with other software via application programming interfaces, and advanced multivariate statistical analysis. The utility of Anima is shown with two case studies, focusing on testing different algorithms developed on different imaging platforms and on automated prediction of alive/dead C. elegans worms by integrating several analysis environments. Anima is fully open source and available with documentation at www.anduril.org/anima. PMID:25126541

  12. Interpreting biomarker data from the COPHES/DEMOCOPHES twin projects: Using external exposure data to understand biomarker differences among countries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smolders, R., E-mail: roel.smolders@vito.be; Den Hond, E.; Koppen, G.

    In 2011 and 2012, the COPHES/DEMOCOPHES twin projects performed the first ever harmonized human biomonitoring survey in 17 European countries. In more than 1800 mother–child pairs, individual lifestyle data were collected and cadmium, cotinine and certain phthalate metabolites were measured in urine. Total mercury was determined in hair samples. While the main goal of the COPHES/DEMOCOPHES twin projects was to develop and test harmonized protocols and procedures, the goal of the current paper is to investigate whether the observed differences in biomarker values among the countries implementing DEMOCOPHES can be interpreted using information from external databases on environmental quality and lifestyle. In general, 13 countries having implemented DEMOCOPHES provided high-quality data from external sources that were relevant for interpretation purposes. However, some data were not available for reporting or were not in line with predefined specifications. Therefore, only part of the external information could be included in the statistical analyses. Nonetheless, there was a highly significant correlation between national levels of fish consumption and mercury in hair, the strength of antismoking legislation was significantly related to urinary cotinine levels, and we were able to show indications that also urinary cadmium levels were associated with environmental quality and food quality. These results again show the potential of biomonitoring data to provide added value for (the evaluation of) evidence-informed policy making. - Highlights: • External data was collected to interpret HBM data from DEMOCOPHES. • Hg in hair could be related to fish consumption across different countries. • Urinary cotinine was related to strictness of anti-smoking legislation. • Urinary Cd was borderline significantly related to air and food quality. • Lack of comparable data among countries hampered the analysis.

  13. A Laboratory Course for Teaching Laboratory Techniques, Experimental Design, Statistical Analysis, and Peer Review Process to Undergraduate Science Students

    ERIC Educational Resources Information Center

    Gliddon, C. M.; Rosengren, R. J.

    2012-01-01

    This article describes a 13-week laboratory course called Human Toxicology taught at the University of Otago, New Zealand. This course used a guided inquiry based laboratory coupled with formative assessment and collaborative learning to develop in undergraduate students the skills of problem solving/critical thinking, data interpretation and…

  14. Descriptive statistics of tree crown condition in the Southern United States and impacts on data analysis and interpretation

    Treesearch

    KaDonna C. Randolph

    2006-01-01

    The U.S. Department of Agriculture Forest Service, Forest Inventory and Analysis Program (FIA) utilizes visual assessments of tree crown condition to monitor changes and trends in forest health. This report describes and discusses distributions of three FIA crown condition indicators (crown density, crown dieback, and foliage transparency) for trees in the Southern...

  15. An Alternative to the 3PL: Using Asymmetric Item Characteristic Curves to Address Guessing Effects

    ERIC Educational Resources Information Center

    Lee, Sora; Bolt, Daniel M.

    2018-01-01

    Both the statistical and interpretational shortcomings of the three-parameter logistic (3PL) model in accommodating guessing effects on multiple-choice items are well documented. We consider the use of a residual heteroscedasticity (RH) model as an alternative, and compare its performance to the 3PL with real test data sets and through simulation…

  16. 20 Facts on Women Workers. Fact Sheet No. 88-2.

    ERIC Educational Resources Information Center

    Women's Bureau (DOL), Washington, DC.

    This fact sheet lists 20 interpreted statistics on women workers. The facts cover the following data: number of women workers and their percentage in the labor force; length of time women are expected to stay in the labor force; racial and ethnic groups in the labor force; part-time and full-time employment; types of occupations in which women are…

  17. The p-Value You Can't Buy.

    PubMed

    Demidenko, Eugene

    2016-01-02

    There is growing frustration with the concept of the p-value. Besides having an ambiguous interpretation, the p-value can be made as small as desired by increasing the sample size, n. The p-value is outdated and does not make sense with big data: everything becomes statistically significant. The root of the problem with the p-value is in the mean comparison. We argue that statistical uncertainty should be measured on the individual, not the group, level. Consequently, standard deviation (SD), not standard error (SE), error bars should be used to graphically present the data on two groups. We introduce a new measure based on the discrimination of individuals/objects from two groups, and call it the D-value. The D-value can be viewed as the n-of-1 p-value because it is computed in the same way as p while letting n equal 1. We show how the D-value is related to discrimination probability and the area above the receiver operating characteristic (ROC) curve. The D-value has a clear interpretation as the proportion of patients who get worse after the treatment, and as such facilitates weighing up the likelihood of events under different scenarios.
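
    The paper ties the D-value to the probability of discriminating individuals from two groups and to the area above the ROC curve. The sketch below computes that discrimination probability for two normal samples, both in closed form and empirically; it is an illustration of the quantity the D-value is related to, not the paper's exact D-value formula, and the group parameters are assumed.

```python
# Discrimination probability P(treated > control) under a normal model, plus the
# equivalent empirical estimate (the ROC AUC of a one-variable classifier).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
control = rng.normal(0.0, 1.0, 500)
treated = rng.normal(0.7, 1.0, 500)

# Closed form under normality.
d = (treated.mean() - control.mean()) / np.sqrt(treated.var(ddof=1) + control.var(ddof=1))
p_discriminate = norm.cdf(d)

# Empirical version: fraction of treated/control pairs ordered correctly.
auc = (treated[:, None] > control[None, :]).mean()
print(p_discriminate, auc)   # roughly 0.69 for these assumed parameters
```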

  18. Use of statistical analysis to validate ecogenotoxicology findings arising from various comet assay components.

    PubMed

    Hussain, Bilal; Sultana, Tayyaba; Sultana, Salma; Al-Ghanim, Khalid Abdullah; Masoud, Muhammad Shahreef; Mahboob, Shahid

    2018-04-01

    Cirrhinus mrigala, Labeo rohita, and Catla catla are economically important fish for human consumption in Pakistan, but industrial and sewage pollution has drastically reduced their population in the River Chenab. Statistics are an important tool to analyze and interpret comet assay results. The specific aims of the study were to determine the DNA damage in Cirrhinus mrigala, Labeo rohita, and Catla catla due to chemical pollution and to assess the validity of statistical analyses to determine the viability of the comet assay for a possible use with these freshwater fish species as a good indicator of pollution load and habitat degradation. Comet assay results indicated a significant (P < 0.05) degree of DNA fragmentation in Cirrhinus mrigala followed by Labeo rohita and Catla catla in respect to comet head diameter, comet tail length, and % DNA damage. Regression analysis and correlation matrices conducted among the parameters of the comet assay affirmed the precision and the legitimacy of the results. The present study, therefore, strongly recommends that genotoxicological studies conduct appropriate analysis of the various components of comet assays to offer better interpretation of the assay data.

  19. Targeting change: Assessing a faculty learning community focused on increasing statistics content in life science curricula.

    PubMed

    Parker, Loran Carleton; Gleichsner, Alyssa M; Adedokun, Omolola A; Forney, James

    2016-11-12

    Transformation of research in all biological fields necessitates the design, analysis, and interpretation of large data sets. Preparing students with the requisite skills in experimental design, statistical analysis and interpretation, and mathematical reasoning will require both curricular reform and faculty who are willing and able to integrate mathematical and statistical concepts into their life science courses. A new Faculty Learning Community (FLC) was constituted each year for four years to assist in the transformation of the life sciences curriculum and faculty at a large, Midwestern research university. Participants were interviewed after participation and surveyed before and after participation to assess the impact of the FLC on their attitudes toward teaching, perceived pedagogical skills, and planned teaching practice. Overall, the FLC had a meaningful positive impact on participants' attitudes toward teaching, knowledge about teaching, and perceived pedagogical skills. Interestingly, confidence for viewing the classroom as a site for research about teaching declined. Implications for the creation and development of FLCs for science faculty are discussed. © 2016 The International Union of Biochemistry and Molecular Biology, 44(6):517-525, 2016.

  20. Toward improved analysis of concentration data: Embracing nondetects.

    PubMed

    Shoari, Niloofar; Dubé, Jean-Sébastien

    2018-03-01

    Various statistical tests on concentration data serve to support decision-making regarding characterization and monitoring of contaminated media, assessing exposure to a chemical, and quantifying the associated risks. However, the routine statistical protocols cannot be directly applied because of challenges arising from nondetects or left-censored observations, which are concentration measurements below the detection limit of measuring instruments. Despite the existence of techniques based on survival analysis that can adjust for nondetects, these are seldom taken into account properly. A comprehensive review of the literature showed that managing policies regarding analysis of censored data do not always agree and that guidance from regulatory agencies may be outdated. Therefore, researchers and practitioners commonly resort to the most convenient way of tackling the censored data problem by substituting nondetects with arbitrary constants prior to data analysis, although this is generally regarded as a bias-prone approach. Hoping to improve the interpretation of concentration data, the present article aims to familiarize researchers in different disciplines with the significance of left-censored observations and provides theoretical and computational recommendations (under both frequentist and Bayesian frameworks) for adequate analysis of censored data. In particular, the present article synthesizes key findings from previous research with respect to 3 noteworthy aspects of inferential statistics: estimation of descriptive statistics, hypothesis testing, and regression analysis. Environ Toxicol Chem 2018;37:643-656. © 2017 SETAC.
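
    To illustrate the kind of censoring-aware analysis the review recommends over substitution, the sketch below fits a lognormal distribution by maximum likelihood, letting each nondetect contribute the probability of lying below the detection limit rather than an arbitrary replacement value. The data, the single detection limit, and the parameter values are simulated assumptions, not taken from the article.

```python
# A minimal sketch (assumptions: lognormal concentrations, one detection limit)
# of maximum-likelihood estimation with left-censored data.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
true_mu, true_sigma, detection_limit = 1.0, 0.8, 2.0
concentrations = rng.lognormal(true_mu, true_sigma, size=200)
detected = concentrations >= detection_limit            # observed values
n_censored = np.sum(concentrations < detection_limit)   # only the count is known

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                  # keep sigma positive
    x = np.log(concentrations[detected])
    ll_detects = stats.norm.logpdf(x, mu, sigma).sum()
    # Each nondetect contributes P(X < detection limit).
    ll_censored = n_censored * stats.norm.logcdf(np.log(detection_limit), mu, sigma)
    return -(ll_detects + ll_censored)

fit = optimize.minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(f"estimated log-scale mean {mu_hat:.2f}, sd {sigma_hat:.2f}")
```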

  1. A Comparative Evaluation of Mixed Dentition Analysis on Reliability of Cone Beam Computed Tomography Image Compared to Plaster Model

    PubMed Central

    Gowd, Snigdha; Shankar, T; Dash, Samarendra; Sahoo, Nivedita; Chatterjee, Suravi; Mohanty, Pritam

    2017-01-01

    Aims and Objective: The aim of the study was to evaluate the reliability of cone beam computed tomography (CBCT)-obtained images over plaster models for the assessment of mixed dentition analysis. Materials and Methods: Thirty CBCT-derived images and thirty plaster models were derived from the dental archives, and Moyer's and Tanaka-Johnston analyses were performed. The data obtained were interpreted and analyzed statistically using SPSS 10.0/PC (SPSS Inc., Chicago, IL, USA). Descriptive and analytical analysis along with Student's t-test was performed to qualitatively evaluate the data, and P < 0.05 was considered statistically significant. Results: Statistically significant results were obtained on data comparison between CBCT-derived images and plaster models; the mean for Moyer's analysis in the left and right lower arch for CBCT and plaster model was 21.2 mm, 21.1 mm and 22.5 mm, 22.5 mm, respectively. Conclusion: CBCT-derived images were less reliable than data obtained directly from plaster models for mixed dentition analysis. PMID:28852639

  2. Antibiotics in Animal Products

    NASA Astrophysics Data System (ADS)

    Falcão, Amílcar C.

    The administration of antibiotics to animals to prevent or treat diseases led us to be concerned about the impact of these antibiotics on human health. In fact, animal products could be a potential vehicle to transfer drugs to humans. Using appropriate mathematical and statistical models, one can predict the kinetic profile of drugs and their metabolites and, consequently, develop preventive procedures regarding drug transmission (i.e., determination of appropriate withdrawal periods). Nevertheless, in the present chapter the mathematical and statistical concepts for data interpretation are strictly given to allow understanding of some basic pharmacokinetic principles and to illustrate the determination of withdrawal periods.

  3. Interpretation of correlations in clinical research.

    PubMed

    Hung, Man; Bounsanga, Jerry; Voss, Maren Wright

    2017-11-01

    Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.

  4. [How to fit and interpret multilevel models using SPSS].

    PubMed

    Pardo, Antonio; Ruiz, Miguel A; San Martín, Rafael

    2007-05-01

    Hierarchic or multilevel models are used to analyse data when cases belong to known groups and sample units are selected both from the individual level and from the group level. In this work, the multilevel models most commonly discussed in the statistical literature are described, explaining how to fit these models using the SPSS program (any version from the 11th onward) and how to interpret the outcomes of the analysis. Five particular models are described, fitted, and interpreted: (1) one-way analysis of variance with random effects, (2) regression analysis with means-as-outcomes, (3) one-way analysis of covariance with random effects, (4) regression analysis with random coefficients, and (5) regression analysis with means- and slopes-as-outcomes. All models are explained, trying to make them understandable to researchers in health and behaviour sciences.
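
    For readers working outside SPSS, the sketch below shows how two of the five models listed above — (1) one-way analysis of variance with random effects and (4) regression with random coefficients — might be fitted in Python with statsmodels' MixedLM. The variables (score, hours, school) and the simulated data are illustrative assumptions, not part of the original article.

```python
# A minimal sketch (assumed variable names: score, hours, school) of fitting
# two of the multilevel models described above with statsmodels instead of SPSS.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_schools, n_students = 20, 30
school = np.repeat(np.arange(n_schools), n_students)
hours = rng.uniform(0, 10, size=school.size)
school_intercept = rng.normal(0, 2, size=n_schools)[school]   # group-level shift
school_slope = rng.normal(0, 0.5, size=n_schools)[school]     # group-level slope
score = 50 + (3 + school_slope) * hours + school_intercept + rng.normal(0, 5, size=school.size)
data = pd.DataFrame({"score": score, "hours": hours, "school": school})

# (1) Random intercept only: total variance is split between schools and students.
m1 = smf.mixedlm("score ~ 1", data, groups=data["school"]).fit()

# (4) Random coefficients: both intercept and slope vary across schools.
m4 = smf.mixedlm("score ~ hours", data, groups=data["school"], re_formula="~hours").fit()

print(m1.summary())
print(m4.summary())
```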

  5. Remote Sensing of Landuse Changes and Implications for Landuse Policy

    NASA Technical Reports Server (NTRS)

    Kennedy, Ken

    1996-01-01

    This final report describes grant activities under which students were to study landuse changes by comparing planning and zoning documents with remotely sensed data analyzed and interpreted in the laboratory. Students were recruited through mathematics, political science, and engineering classes and clubs. Work protocols were then organized for research on the county's growth patterns over the last three decades. Students and investigators made plans to identify specific scenes in Landsat and other data which would satisfy the research parameters. Finally, statistical and imaging software was identified and some was acquired.

  6. Perception in statistical graphics

    NASA Astrophysics Data System (ADS)

    VanderPlas, Susan Ruth

    There has been quite a bit of research on statistical graphics and visualization, generally focused on new types of graphics, new software to create graphics, interactivity, and usability studies. Our ability to interpret and use statistical graphics hinges on the interface between the graph itself and the brain that perceives and interprets it, and there is substantially less research on the interplay between graph, eye, brain, and mind than is sufficient to understand the nature of these relationships. The goal of the work presented here is to further explore the interplay between a static graph, the translation of that graph from paper to mental representation (the journey from eye to brain), and the mental processes that operate on that graph once it is transferred into memory (mind). Understanding the perception of statistical graphics should allow researchers to create more effective graphs which produce fewer distortions and viewer errors while reducing the cognitive load necessary to understand the information presented in the graph. Taken together, these experiments should lay a foundation for exploring the perception of statistical graphics. There has been considerable research into the accuracy of numerical judgments viewers make from graphs, and these studies are useful, but it is more effective to understand how errors in these judgments occur so that the root cause of the error can be addressed directly. Understanding how visual reasoning relates to the ability to make judgments from graphs allows us to tailor graphics to particular target audiences. In addition, understanding the hierarchy of salient features in statistical graphics allows us to clearly communicate the important message from data or statistical models by constructing graphics which are designed specifically for the perceptual system.

  7. Interpreting Association from Graphical Displays

    ERIC Educational Resources Information Center

    Fitzallen, Noleine

    2016-01-01

    Research that has explored students' interpretations of graphical representations has not extended to include how students apply understanding of particular statistical concepts related to one graphical representation to interpret different representations. This paper reports on the way in which students' understanding of covariation, evidenced…

  8. Statistical trends of episiotomy around the world: Comparative systematic review of changing practices.

    PubMed

    Clesse, Christophe; Lighezzolo-Alnot, Joëlle; De Lavergne, Sylvie; Hamlin, Sandrine; Scheffler, Michèle

    2018-06-01

    The authors' purpose for this article is to identify, review, and interpret all publications about episiotomy rates worldwide. Based on the criteria from the PRISMA guidelines, twenty databases were scrutinized. All studies which include national statistics related to episiotomy were selected, as well as studies presenting estimated data. Sixty-one papers were selected with publication dates between 1995 and 2016. A static and dynamic analysis of all the results was carried out. The assumption of a decline in the number of episiotomies is discussed and confirmed, while noting that high rates of episiotomy remain today in less industrialized countries and East Asia. Finally, our analysis aims to investigate the potential determinants which influence the apparent statistical disparities.

  9. Problems Associated with Statistical Pattern Recognition of Acoustic Emission Signals in a Compact Tension Fatigue Specimen

    NASA Technical Reports Server (NTRS)

    Hinton, Yolanda L.

    1999-01-01

    Acoustic emission (AE) data were acquired during fatigue testing of an aluminum 2024-T4 compact tension specimen using a commercially available AE system. AE signals from crack extension were identified and separated from noise spikes, signals that reflected from the specimen edges, and signals that saturated the instrumentation. A commercially available software package was used to train a statistical pattern recognition system to classify the signals. The software trained a network to recognize signals with a 91-percent accuracy when compared with the researcher's interpretation of the data. Reasons for the discrepancies are examined and it is postulated that additional preprocessing of the AE data to focus on the extensional wave mode and eliminate other effects before training the pattern recognition system will result in increased accuracy.

  10. Interpretation of digital breast tomosynthesis: preliminary study on comparison with picture archiving and communication system (PACS) and dedicated workstation.

    PubMed

    Kim, Young Seon; Chang, Jung Min; Yi, Ann; Shin, Sung Ui; Lee, Myung Eun; Kim, Won Hwa; Cho, Nariya; Moon, Woo Kyung

    2017-08-01

    To compare the diagnostic accuracy and efficiency in the interpretation of digital breast tomosynthesis (DBT) images using a picture archiving and communication system (PACS) and a dedicated workstation. 97 DBT images obtained for screening or diagnostic purposes were stored in both a workstation and a PACS and evaluated in combination with digital mammography by three independent radiologists retrospectively. Breast Imaging-Reporting and Data System final assessments and likelihood of malignancy (%) were assigned and the interpretation time when using the workstation and PACS was recorded. Receiver operating characteristic curve analysis, sensitivities and specificities were compared with histopathological examination and follow-up data as a reference standard. Area under the receiver operating characteristic curve values for cancer detection (0.839 vs 0.815, p = 0.6375) and sensitivity (81.8% vs 75.8%, p = 0.2188) showed no statistically significant differences between the workstation and PACS. However, specificity was significantly higher when analysing on the workstation than when using PACS (83.7% vs 76.9%, p = 0.009). When evaluating DBT images using PACS, only one case was deemed necessary to be reanalysed using the workstation. The mean time to interpret DBT images on PACS (1.68 min/case) was significantly longer than that to interpret on the workstation (1.35 min/case) (p < 0.0001). Interpretation of DBT images using PACS showed comparable diagnostic performance to a dedicated workstation, even though it required a longer reading time. Advances in knowledge: Interpretation of DBT images using PACS is an alternative to evaluate the images when a dedicated workstation is not available.

  11. Fully Bayesian tests of neutrality using genealogical summary statistics.

    PubMed

    Drummond, Alexei J; Suchard, Marc A

    2008-10-31

    Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously neglected because of the limited availability of theory and methods.
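
    The posterior predictive machinery described above can be sketched generically: draw parameters from the posterior, simulate replicate data sets, and compare a summary statistic of the replicates with its observed value. In the toy example below a Poisson model with a conjugate Gamma prior stands in for the far richer coalescent models used in the paper, and the data and summary statistic are invented for illustration.

```python
# A minimal, model-agnostic sketch of a posterior predictive goodness-of-fit
# check using a summary statistic (here, the variance/mean dispersion ratio).
import numpy as np

rng = np.random.default_rng(3)
observed = rng.negative_binomial(n=2, p=0.2, size=100)   # overdispersed "data"

# Posterior for a Poisson rate under a Gamma(a0, b0) prior is Gamma(a0 + sum, b0 + n).
a0, b0 = 1.0, 1.0
post_a, post_b = a0 + observed.sum(), b0 + observed.size

def dispersion(x):                       # summary statistic
    return x.var(ddof=1) / x.mean()

obs_stat = dispersion(observed)
replicate_stats = []
for _ in range(2000):
    lam = rng.gamma(post_a, 1.0 / post_b)                # posterior draw
    replicate = rng.poisson(lam, size=observed.size)     # posterior predictive data
    replicate_stats.append(dispersion(replicate))

ppp = np.mean(np.array(replicate_stats) >= obs_stat)     # posterior predictive p-value
print(f"observed dispersion {obs_stat:.2f}, posterior predictive p-value {ppp:.3f}")
```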

  12. Land-Cover Change in the East Central Texas Plains, 1973-2000

    USGS Publications Warehouse

    Karstensen, Krista A.

    2009-01-01

    Project Background: The Geographic Analysis and Monitoring (GAM) Program of the U.S. Geological Survey (USGS) Land Cover Trends project is focused on understanding the rates, trends, causes, and consequences of contemporary U.S. land-use and land-cover change. The objectives of the study are to: (1) develop a comprehensive methodology for using sampling and change analysis techniques and Landsat Multispectral Scanner (MSS) and Thematic Mapper (TM) data for measuring regional land-cover change across the United States, (2) characterize the types, rates, and temporal variability of change for a 30-year period, (3) document regional driving forces and consequences of change, and (4) prepare a national synthesis of land-cover change (Loveland and others, 1999). Using the 1999 Environmental Protection Agency (EPA) Level III ecoregions derived from Omernik (1987) as the geographic framework, geospatial data collected between 1973 and 2000 were processed and analyzed to characterize ecosystem responses to land-use changes. The 27-year study period was divided into five temporal periods: 1973-1980, 1980-1986, 1986-1992, 1992-2000, and 1973-2000. General land-cover classes such as water, developed, grassland/shrubland, and agriculture for these periods were interpreted from Landsat MSS, TM, and Enhanced Thematic Mapper Plus imagery, and land-cover change was categorized and evaluated using a modified Anderson Land-Use Land-Cover Classification System for image interpretation. The interpretation of these land-cover classes complements the program objective of looking at land-use change, with cover serving as a surrogate for land use. The land-cover change rates are estimated using a stratified, random sampling of 10-kilometer (km) by 10-km blocks allocated within each ecoregion. For each sample block, satellite images are used to interpret land-cover change for the five time periods previously mentioned. Additionally, historical aerial photographs from similar timeframes and other ancillary data such as census statistics and published literature are used. The sample block data are then incorporated into statistical analyses to generate an overall change matrix for the ecoregion. For example, the scalar statistics can show the spatial extent of change per cover type over time, as well as the land-cover transformations from one land-cover type to another occurring over time. Field data for the sample blocks include direct measurements of land cover, particularly ground-survey data collected for training and validation of image classifications (Loveland and others, 2002). The field experience allows for additional observations of the character and condition of the landscape, assists in sample block interpretation and ground truthing of Landsat imagery, and helps determine the driving forces of change identified in an ecoregion. Management and maintenance of field data, beyond their initial use for training and validation of image classifications, is important as improved methods for image classification are developed, and as present-day data become part of the historical legacy on which future studies of land-cover change will depend (Loveland and others, 2002). The results illustrate that there is no single profile of land-cover change; instead, there is significant geographic variability that results from land uses within ecoregions continuously adapting to the resource potential created by various environmental, technological, and socioeconomic factors.

  13. Kernel-PCA data integration with enhanced interpretability

    PubMed Central

    2014-01-01

    Background Nowadays, combining the different sources of information to improve the biological knowledge available is a challenge in bioinformatics. Among the most powerful methods for integrating heterogeneous data types are kernel-based methods. Kernel-based data integration approaches consist of two basic steps: first, the right kernel is chosen for each data set; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables that belong to any dataset. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify those samples with higher/lower values of the variables analyzed. Conclusions The integration of different datasets and the simultaneous representation of samples and variables together give us a better understanding of biological knowledge. PMID:25032747

  14. Interpreting carnivore scent-station surveys

    USGS Publications Warehouse

    Sargeant, G.A.; Johnson, D.H.; Berg, W.E.

    1998-01-01

    The scent-station survey method has been widely used to estimate trends in carnivore abundance. However, statistical properties of scent-station data are poorly understood, and the relation between scent-station indices and carnivore abundance has not been adequately evaluated. We assessed properties of scent-station indices by analyzing data collected in Minnesota during 1986-93. Visits to stations separated by <2 km were correlated for all species because individual carnivores sometimes visited several stations in succession. Thus, visits to stations had an intractable statistical distribution. Dichotomizing results for lines of 10 stations (0 or ≥1 visits) produced binomially distributed data that were robust to multiple visits by individuals. We abandoned 2-way comparisons among years in favor of tests for population trend, which are less susceptible to bias, and analyzed results separately for biogeographic sections of Minnesota because trends differed among sections. Before drawing inferences about carnivore population trends, we reevaluated published validation experiments. Results implicated low statistical power and confounding as possible explanations for equivocal or conflicting results of validation efforts. Long-term trends in visitation rates probably reflect real changes in populations, but poor spatial and temporal resolution, susceptibility to confounding, and low statistical power limit the usefulness of this survey method.

  15. Lognormal Distribution of Cellular Uptake of Radioactivity: Statistical Analysis of α-Particle Track Autoradiography

    PubMed Central

    Neti, Prasad V.S.V.; Howell, Roger W.

    2010-01-01

    Recently, the distribution of radioactivity among a population of cells labeled with 210Po was shown to be well described by a log-normal (LN) distribution function (J Nucl Med. 2006;47:1049–1058) with the aid of autoradiography. To ascertain the influence of Poisson statistics on the interpretation of the autoradiographic data, the present work reports on a detailed statistical analysis of these earlier data. Methods The measured distributions of α-particle tracks per cell were subjected to statistical tests with Poisson, LN, and Poisson-lognormal (P-LN) models. Results The LN distribution function best describes the distribution of radioactivity among cell populations exposed to 0.52 and 3.8 kBq/mL of 210Po-citrate. When cells were exposed to 67 kBq/mL, the P-LN distribution function gave a better fit; however, the underlying activity distribution remained log-normal. Conclusion The present analysis generally provides further support for the use of LN distributions to describe the cellular uptake of radioactivity. Care should be exercised when analyzing autoradiographic data on activity distributions to ensure that Poisson processes do not distort the underlying LN distribution. PMID:18483086

  16. Log Normal Distribution of Cellular Uptake of Radioactivity: Statistical Analysis of Alpha Particle Track Autoradiography

    PubMed Central

    Neti, Prasad V.S.V.; Howell, Roger W.

    2008-01-01

    Recently, the distribution of radioactivity among a population of cells labeled with 210Po was shown to be well described by a log normal distribution function (J Nucl Med 47, 6 (2006) 1049-1058) with the aid of an autoradiographic approach. To ascertain the influence of Poisson statistics on the interpretation of the autoradiographic data, the present work reports on a detailed statistical analysis of these data. Methods The measured distributions of alpha particle tracks per cell were subjected to statistical tests with Poisson (P), log normal (LN), and Poisson – log normal (P – LN) models. Results The LN distribution function best describes the distribution of radioactivity among cell populations exposed to 0.52 and 3.8 kBq/mL 210Po-citrate. When cells were exposed to 67 kBq/mL, the P – LN distribution function gave a better fit; however, the underlying activity distribution remained log normal. Conclusions The present analysis generally provides further support for the use of LN distributions to describe the cellular uptake of radioactivity. Care should be exercised when analyzing autoradiographic data on activity distributions to ensure that Poisson processes do not distort the underlying LN distribution. PMID:16741316

  17. Phylogeography Takes a Relaxed Random Walk in Continuous Space and Time

    PubMed Central

    Lemey, Philippe; Rambaut, Andrew; Welch, John J.; Suchard, Marc A.

    2010-01-01

    Research aimed at understanding the geographic context of evolutionary histories is burgeoning across biological disciplines. Recent endeavors attempt to interpret contemporaneous genetic variation in the light of increasingly detailed geographical and environmental observations. Such interest has promoted the development of phylogeographic inference techniques that explicitly aim to integrate such heterogeneous data. One promising development involves reconstructing phylogeographic history on a continuous landscape. Here, we present a Bayesian statistical approach to infer continuous phylogeographic diffusion using random walk models while simultaneously reconstructing the evolutionary history in time from molecular sequence data. Moreover, by accommodating branch-specific variation in dispersal rates, we relax the most restrictive assumption of the standard Brownian diffusion process and demonstrate increased statistical efficiency in spatial reconstructions of overdispersed random walks by analyzing both simulated and real viral genetic data. We further illustrate how drawing inference about summary statistics from a fully specified stochastic process over both sequence evolution and spatial movement reveals important characteristics of a rabies epidemic. Together with recent advances in discrete phylogeographic inference, the continuous model developments furnish a flexible statistical framework for biogeographical reconstructions that is easily expanded upon to accommodate various landscape genetic features. PMID:20203288

  18. [ADP system for automatic production of physician's letters as well as statistics analyses in the area of obstetrics and perinatology].

    PubMed

    Heinrich, J; Putzar, H; Seidenschnur, G

    1977-01-01

    This is a report on an electronic data processing system for obstetrics and perinatology. The data document design and data flowchart that were developed are shown. The continuously accumulated data allowed a detailed interpretation record at any time. For all clinically treated patients, the computer printed out a final obstetrical report (physician's report). This simple system improves the flow of information and reduces the typewriting workload of the medical staff. Cost and labor expenditure for the system are limited compared with our earlier manual procedure.

  19. The chi-square test of independence.

    PubMed

    McHugh, Mary L

    2013-01-01

    The Chi-square statistic is a non-parametric (distribution-free) tool designed to analyze group differences when the dependent variable is measured at a nominal level. Like all non-parametric statistics, the Chi-square is robust with respect to the distribution of the data. Specifically, it does not require equality of variances among the study groups or homoscedasticity in the data. It permits evaluation of both dichotomous independent variables and of multiple group studies. Unlike many other non-parametric and some parametric statistics, the calculations needed to compute the Chi-square provide considerable information about how each of the groups performed in the study. This richness of detail allows the researcher to understand the results and thus to derive more detailed information from this statistic than from many others. The Chi-square is a significance statistic, and should be followed with a strength statistic. Cramer's V is the most common strength test used when a significant Chi-square result has been obtained. Advantages of the Chi-square include its robustness with respect to distribution of the data, its ease of computation, the detailed information that can be derived from the test, its use in studies for which parametric assumptions cannot be met, and its flexibility in handling data from both two-group and multiple-group studies. Limitations include its sample size requirements, difficulty of interpretation when there are large numbers of categories (20 or more) in the independent or dependent variables, and the tendency of Cramer's V to produce relatively low correlation measures, even for highly significant results.
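
    A minimal sketch of the recommended workflow — a Chi-square test of independence followed by Cramer's V as the strength statistic — is shown below; the 2 x 3 contingency table is invented for illustration.

```python
# A minimal sketch: Chi-square test of independence plus Cramer's V.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 45, 25],
                  [60, 30, 10]])          # rows: groups, columns: outcome categories

chi2, p, dof, expected = chi2_contingency(table)

n = table.sum()
k = min(table.shape) - 1                  # smaller of (rows - 1, cols - 1)
cramers_v = np.sqrt(chi2 / (n * k))

print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}, Cramer's V = {cramers_v:.2f}")
```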

  20. Rapidly locating and characterizing pollutant releases in buildings.

    PubMed

    Sohn, Michael D; Reynolds, Pamela; Singh, Navtej; Gadgil, Ashok J

    2002-12-01

    Releases of airborne contaminants in or near a building can lead to significant human exposures unless prompt response measures are taken. However, possible responses can include conflicting strategies, such as shutting the ventilation system off versus running it in a purge mode or having occupants evacuate versus sheltering in place. The proper choice depends in part on knowing the source locations, the amounts released, and the likely future dispersion routes of the pollutants. We present an approach that estimates this information in real time. It applies Bayesian statistics to interpret measurements of airborne pollutant concentrations from multiple sensors placed in the building and computes best estimates and uncertainties of the release conditions. The algorithm is fast, capable of continuously updating the estimates as measurements stream in from sensors. We demonstrate the approach using a hypothetical pollutant release in a five-room building. Unknowns to the interpretation algorithm include location, duration, and strength of the source, and some building and weather conditions. Two sensor sampling plans and three levels of data quality are examined. Data interpretation in all examples is rapid; however, locating and characterizing the source with high probability depends on the amount and quality of data and the sampling plan.

  1. A statistical assessment of differences and equivalences between genetically modified and reference plant varieties

    PubMed Central

    2011-01-01

    Background Safety assessment of genetically modified organisms is currently often performed by comparative evaluation. However, natural variation of plant characteristics between commercial varieties is usually not considered explicitly in the statistical computations underlying the assessment. Results Statistical methods are described for the assessment of the difference between a genetically modified (GM) plant variety and a conventional non-GM counterpart, and for the assessment of the equivalence between the GM variety and a group of reference plant varieties which have a history of safe use. It is proposed to present the results of both difference and equivalence testing for all relevant plant characteristics simultaneously in one or a few graphs, as an aid for further interpretation in safety assessment. A procedure is suggested to derive equivalence limits from the observed results for the reference plant varieties using a specific implementation of the linear mixed model. Three different equivalence tests are defined to classify any result in one of four equivalence classes. The performance of the proposed methods is investigated by a simulation study, and the methods are illustrated on compositional data from a field study on maize grain. Conclusions A clear distinction of practical relevance is shown between difference and equivalence testing. The proposed tests are shown to have appropriate performance characteristics by simulation, and the proposed simultaneous graphical representation of results was found to be helpful for the interpretation of results from a practical field trial data set. PMID:21324199
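
    The pairing of a difference test with an equivalence test can be sketched with two one-sided tests (TOST) against fixed limits, as below. This is not the authors' linear mixed-model procedure: the equivalence limits here are assumed constants, whereas the paper derives them from the reference varieties, and the data are simulated.

```python
# A minimal sketch of difference testing alongside equivalence testing (TOST),
# with assumed equivalence limits and simulated measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
gm = rng.normal(10.2, 1.0, size=40)            # hypothetical GM-variety measurements
conventional = rng.normal(10.0, 1.0, size=40)  # hypothetical conventional counterpart
lower, upper = -1.0, 1.0                       # assumed equivalence limits

# Difference test: can the mean difference be distinguished from zero?
t_diff, p_diff = stats.ttest_ind(gm, conventional)

# Equivalence (TOST): reject non-equivalence only if the difference is
# significantly above the lower limit AND significantly below the upper limit.
n1, n2 = gm.size, conventional.size
diff = gm.mean() - conventional.mean()
pooled_var = ((n1 - 1) * gm.var(ddof=1) + (n2 - 1) * conventional.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
p_lower = stats.t.sf((diff - lower) / se, df)   # H0: diff <= lower limit
p_upper = stats.t.cdf((diff - upper) / se, df)  # H0: diff >= upper limit
p_equivalence = max(p_lower, p_upper)

print(f"difference test p = {p_diff:.3f}, equivalence (TOST) p = {p_equivalence:.4f}")
```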

  2. Matching the Statistical Model to the Research Question for Dental Caries Indices with Many Zero Counts

    PubMed Central

    Preisser, John S.; Long, D. Leann; Stamm, John W.

    2017-01-01

    Marginalized zero-inflated count regression models have recently been introduced for the statistical analysis of dental caries indices and other zero-inflated count data as alternatives to traditional zero-inflated and hurdle models. Unlike the standard approaches, the marginalized models directly estimate overall exposure or treatment effects by relating covariates to the marginal mean count. This article discusses model interpretation and model class choice according to the research question being addressed in caries research. Two datasets, one consisting of fictional dmft counts in two groups and the other on DMFS among schoolchildren from a randomized clinical trial (RCT) comparing three toothpaste formulations to prevent incident dental caries, are analysed with negative binomial hurdle (NBH), zero-inflated negative binomial (ZINB), and marginalized zero-inflated negative binomial (MZINB) models. In the first example, estimates of treatment effects vary according to the type of incidence rate ratio (IRR) estimated by the model. Estimates of IRRs in the analysis of the RCT were similar despite their distinctive interpretations. Choice of statistical model class should match the study’s purpose, while accounting for the broad decline in children’s caries experience, such that dmft and DMFS indices more frequently generate zero counts. Marginalized (marginal mean) models for zero-inflated count data should be considered for direct assessment of exposure effects on the marginal mean dental caries count in the presence of high frequencies of zero counts. PMID:28291962
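
    The zero-inflation idea behind the models being compared can be sketched with a zero-inflated Poisson (ZIP) likelihood, simpler than the negative binomial variants used in the paper: a dmft-like count is zero either because the child belongs to a structural "caries-free" class or by chance. The simulated data and parameter values below are illustrative assumptions.

```python
# A minimal sketch of a zero-inflated Poisson fit by maximum likelihood,
# standing in for the ZINB/MZINB models discussed in the abstract.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)
n = 500
structural_zero = rng.random(n) < 0.4            # simulated caries-free class
counts = np.where(structural_zero, 0, rng.poisson(2.5, size=n))

def zip_neg_log_lik(params):
    logit_pi, log_lam = params
    pi, lam = 1 / (1 + np.exp(-logit_pi)), np.exp(log_lam)
    p_zero = pi + (1 - pi) * np.exp(-lam)                  # mixture mass at zero
    ll_zero = np.log(p_zero) * np.sum(counts == 0)
    pos = counts[counts > 0]
    ll_pos = (np.log(1 - pi) + stats.poisson.logpmf(pos, lam)).sum()
    return -(ll_zero + ll_pos)

fit = optimize.minimize(zip_neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
pi_hat = 1 / (1 + np.exp(-fit.x[0]))
lam_hat = np.exp(fit.x[1])
# The ZIP marginal mean is (1 - pi) * lambda, the quantity that marginalized
# models parameterize directly.
print(f"pi = {pi_hat:.2f}, lambda = {lam_hat:.2f}, marginal mean = {(1 - pi_hat) * lam_hat:.2f}")
```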

  3. Compositionality and Statistics in Adjective Acquisition: 4-Year-Olds Interpret "Tall" and "Short" Based on the Size Distributions of Novel Noun Referents

    ERIC Educational Resources Information Center

    Barner, David; Snedeker, Jesse

    2008-01-01

    Four experiments investigated 4-year-olds' understanding of adjective-noun compositionality and their sensitivity to statistics when interpreting scalar adjectives. In Experiments 1 and 2, children selected "tall" and "short" items from 9 novel objects called "pimwits" (1-9 in. in height) or from this array plus 4 taller or shorter distractor…

  4. On the interpretation of weight vectors of linear models in multivariate neuroimaging.

    PubMed

    Haufe, Stefan; Meinecke, Frank; Görgen, Kai; Dähne, Sven; Haynes, John-Dylan; Blankertz, Benjamin; Bießmann, Felix

    2014-02-15

    The increase in spatiotemporal resolution of neuroimaging devices is accompanied by a trend towards more powerful multivariate analysis methods. Often it is desired to interpret the outcome of these methods with respect to the cognitive processes under study. Here we discuss which methods allow for such interpretations, and provide guidelines for choosing an appropriate analysis for a given experimental goal: For a surgeon who needs to decide where to remove brain tissue it is most important to determine the origin of cognitive functions and associated neural processes. In contrast, when communicating with paralyzed or comatose patients via brain-computer interfaces, it is most important to accurately extract the neural processes specific to a certain mental state. These equally important but complementary objectives require different analysis methods. Determining the origin of neural processes in time or space from the parameters of a data-driven model requires what we call a forward model of the data; such a model explains how the measured data was generated from the neural sources. Examples are general linear models (GLMs). Methods for the extraction of neural information from data can be considered as backward models, as they attempt to reverse the data generating process. Examples are multivariate classifiers. Here we demonstrate that the parameters of forward models are neurophysiologically interpretable in the sense that significant nonzero weights are only observed at channels the activity of which is related to the brain process under study. In contrast, the interpretation of backward model parameters can lead to wrong conclusions regarding the spatial or temporal origin of the neural signals of interest, since significant nonzero weights may also be observed at channels the activity of which is statistically independent of the brain process under study. As a remedy for the linear case, we propose a procedure for transforming backward models into forward models. This procedure enables the neurophysiological interpretation of the parameters of linear backward models. We hope that this work raises awareness for an often encountered problem and provides a theoretical basis for conducting better interpretable multivariate neuroimaging analyses. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
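
    For the linear case, the proposed transformation of a backward model into a forward model can be sketched as follows: given extraction filters W (for example, the weights of a linear decoder), the corresponding activation patterns are obtained as A = Cov(X) W Cov(X W)^-1. The simulated channel data, the least-squares decoder, and the array shapes below are illustrative assumptions.

```python
# A minimal sketch of turning backward-model filters into forward-model
# activation patterns for linear models, on simulated channel data.
import numpy as np

rng = np.random.default_rng(6)
n_samples, n_channels = 1000, 8
true_pattern = rng.normal(size=(n_channels, 1))         # how the source projects
source = rng.normal(size=(n_samples, 1))                # latent neural source
noise = rng.normal(scale=2.0, size=(n_samples, n_channels))
X = source @ true_pattern.T + noise                     # measured channels

# A crude backward model: least-squares filter recovering the source from X.
W, *_ = np.linalg.lstsq(X, source, rcond=None)          # shape (n_channels, 1)

# Transform filters (backward) into activation patterns (forward).
cov_X = np.cov(X, rowvar=False)
cov_s = np.atleast_2d(np.cov(X @ W, rowvar=False))
A = cov_X @ W @ np.linalg.inv(cov_s)

# The recovered pattern should align with the true one, unlike W itself.
print(np.corrcoef(A.ravel(), true_pattern.ravel())[0, 1])
```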

  5. Design, implementation and reporting strategies to reduce the instance and impact of missing patient-reported outcome (PRO) data: a systematic review

    PubMed Central

    Mercieca-Bebber, Rebecca; Palmer, Michael J; Brundage, Michael; Stockler, Martin R; King, Madeleine T

    2016-01-01

    Objectives Patient-reported outcomes (PROs) provide important information about the impact of treatment from the patients' perspective. However, missing PRO data may compromise the interpretability and value of the findings. We aimed to report: (1) a non-technical summary of problems caused by missing PRO data; and (2) a systematic review by collating strategies to: (A) minimise rates of missing PRO data, and (B) facilitate transparent interpretation and reporting of missing PRO data in clinical research. Our systematic review does not address statistical handling of missing PRO data. Data sources MEDLINE and Cumulative Index to Nursing and Allied Health Literature (CINAHL) databases (inception to 31 March 2015), and citing articles and reference lists from relevant sources. Eligibility criteria English articles providing recommendations for reducing missing PRO data rates, or strategies to facilitate transparent interpretation and reporting of missing PRO data were included. Methods 2 reviewers independently screened articles against eligibility criteria. Discrepancies were resolved with the research team. Recommendations were extracted and coded according to framework synthesis. Results 117 sources (55% discussion papers, 26% original research) met the eligibility criteria. Design and methodological strategies for reducing rates of missing PRO data included: incorporating PRO-specific information into the protocol; carefully designing PRO assessment schedules and defining termination rules; minimising patient burden; appointing a PRO coordinator; PRO-specific training for staff; ensuring PRO studies are adequately resourced; and continuous quality assurance. Strategies for transparent interpretation and reporting of missing PRO data include utilising auxiliary data to inform analysis; transparently reporting baseline PRO scores, rates and reasons for missing data; and methods for handling missing PRO data. Conclusions The instance of missing PRO data and its potential to bias clinical research can be minimised by implementing thoughtful design, rigorous methodology and transparent reporting strategies. All members of the research team have a responsibility in implementing such strategies. PMID:27311907

  6. Admixture, Population Structure, and F-Statistics.

    PubMed

    Peter, Benjamin M

    2016-04-01

    Many questions about human genetic history can be addressed by examining the patterns of shared genetic variation between sets of populations. A useful methodological framework for this purpose is F-statistics, which measure shared genetic drift between sets of two, three, and four populations and can be used to test simple and complex hypotheses about admixture between populations. This article provides context from phylogenetic and population genetic theory. I review how F-statistics can be interpreted as branch lengths or paths and derive new interpretations, using coalescent theory. I further show that the admixture tests can be interpreted as testing general properties of phylogenies, allowing extension of some applications to arbitrary phylogenetic trees. The new results are used to investigate the behavior of the statistics under different models of population structure and show how population substructure complicates inference. The results lead to simplified estimators in many cases, and I recommend replacing F3 with the average number of pairwise differences for estimating population divergence. Copyright © 2016 by the Genetics Society of America.
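
    As a small illustration of the admixture use of these statistics, the sketch below computes F3(C; A, B) as the average over loci of (c - a)(c - b) from allele-frequency tables; a markedly negative value is the classic admixture signal for C. The simulated frequencies are assumptions, and real analyses add small-sample (heterozygosity) corrections omitted here.

```python
# A minimal sketch of F3(C; A, B) from allele frequencies, without the
# finite-sample corrections used in practice.
import numpy as np

rng = np.random.default_rng(7)
n_loci = 5000
ancestral = rng.uniform(0.05, 0.95, size=n_loci)
freq_a = np.clip(ancestral + rng.normal(0, 0.08, n_loci), 0, 1)  # drifted pop A
freq_b = np.clip(ancestral + rng.normal(0, 0.08, n_loci), 0, 1)  # drifted pop B
freq_c = 0.5 * freq_a + 0.5 * freq_b                             # C: 50/50 admixture

def f3(c, a, b):
    return np.mean((c - a) * (c - b))

print(f"F3(C; A, B) = {f3(freq_c, freq_a, freq_b):.4f}")   # negative under admixture
```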

  7. Using Statistical Mechanics and Entropy Principles to Interpret Variability in Power Law Models of the Streamflow Recession

    NASA Astrophysics Data System (ADS)

    Dralle, D.; Karst, N.; Thompson, S. E.

    2015-12-01

    Multiple competing theories suggest that power law behavior governs the observed first-order dynamics of streamflow recessions - the important process by which catchments dry-out via the stream network, altering the availability of surface water resources and in-stream habitat. Frequently modeled as: dq/dt = -aq^b, recessions typically exhibit a high degree of variability, even within a single catchment, as revealed by significant shifts in the values of "a" and "b" across recession events. One potential source of this variability lies in underlying, hard-to-observe fluctuations in how catchment water storage is partitioned amongst distinct storage elements, each having different discharge behaviors. Testing this and competing hypotheses with widely available streamflow timeseries, however, has been hindered by a power law scaling artifact that obscures meaningful covariation between the recession parameters, "a" and "b". Here we briefly outline a technique that removes this artifact, revealing intriguing new patterns in the joint distribution of recession parameters. Using long-term flow data from catchments in Northern California, we explore temporal variations, and find that the "a" parameter varies strongly with catchment wetness. Then we explore how the "b" parameter changes with "a", and find that measures of its variation are maximized at intermediate "a" values. We propose an interpretation of this pattern based on statistical mechanics, meaning "b" can be viewed as an indicator of the catchment "microstate" - i.e. the partitioning of storage - and "a" as a measure of the catchment macrostate (i.e. the total storage). In statistical mechanics, entropy (i.e. microstate variance, that is the variance of "b") is maximized for intermediate values of extensive variables (i.e. wetness, "a"), as observed in the recession data. This interpretation of "a" and "b" was supported by model runs using a multiple-reservoir catchment toy model, and lends support to the hypothesis that power law streamflow recession dynamics, and their variations, have their origin in the multiple modalities of storage partitioning.
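
    A common way to obtain the recession parameters discussed above is to fit log(-dq/dt) = log(a) + b log(q) to an individual recession event; the sketch below does this for a synthetic hydrograph generated with known parameters. The daily time step and parameter values are illustrative assumptions, not taken from the study catchments.

```python
# A minimal sketch of estimating "a" and "b" in dq/dt = -a*q^b by log-log
# regression over a single synthetic recession event.
import numpy as np

true_a, true_b = 0.05, 1.5
q = [10.0]
for _ in range(60):                          # 60 days of recession, daily step
    q.append(q[-1] - true_a * q[-1] ** true_b)
q = np.array(q)

dq_dt = np.diff(q)                           # per-day change (negative)
q_mid = 0.5 * (q[1:] + q[:-1])               # flow at interval midpoints

# Straight line in log-log space: slope -> b, intercept -> log(a).
slope, intercept = np.polyfit(np.log(q_mid), np.log(-dq_dt), 1)
print(f"fitted b = {slope:.2f}, fitted a = {np.exp(intercept):.3f}")
```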

  8. GHEP-ISFG collaborative exercise on mixture profiles (GHEP-MIX06). Reporting conclusions: Results and evaluation.

    PubMed

    Barrio, P A; Crespillo, M; Luque, J A; Aler, M; Baeza-Richer, C; Baldassarri, L; Carnevali, E; Coufalova, P; Flores, I; García, O; García, M A; González, R; Hernández, A; Inglés, V; Luque, G M; Mosquera-Miguel, A; Pedrosa, S; Pontes, M L; Porto, M J; Posada, Y; Ramella, M I; Ribeiro, T; Riego, E; Sala, A; Saragoni, V G; Serrano, A; Vannelli, S

    2018-07-01

    One of the main goals of the Spanish and Portuguese-Speaking Group of the International Society for Forensic Genetics (GHEP-ISFG) is to promote and contribute to the development and dissemination of scientific knowledge in the field of forensic genetics. To this end, GHEP-ISFG holds different working commissions that are set up to develop activities in scientific aspects of general interest. One of them, the Mixture Commission of GHEP-ISFG, has organized annually, since 2009, a collaborative exercise on analysis and interpretation of autosomal short tandem repeat (STR) mixture profiles. Until now, six exercises have been organized. In the present edition (GHEP-MIX06), with 25 participating laboratories, the main aim of the exercise was to assess mixture profile results reported from a proposed complex mock case. One of the conclusions obtained from this exercise is the increasing tendency of participating laboratories to validate DNA mixture profile analysis following international recommendations. However, the results have shown some differences among them regarding the editing and also the interpretation of mixture profiles. Besides, although the last revision of ISO/IEC 17025:2017 gives indications of how results should be reported, not all laboratories strictly follow its recommendations. Regarding the statistical aspect, all those laboratories that have performed statistical evaluation of the data have employed the likelihood ratio (LR) as a parameter to evaluate the statistical compatibility. However, LR values obtained show a wide range of variation. This fact could not be attributed to the software employed, since the vast majority of laboratories that performed LR calculation employed the same software (LRmixStudio). Thus, the final allelic composition of the edited mixture profile and the parameters employed in the software could explain this data dispersion. This highlights the need for each laboratory to define, through internal validations, its criteria for editing and interpreting mixtures, and to train continuously in software handling. Copyright © 2018 Elsevier B.V. All rights reserved.

  9. Interpreting the Results of Weighted Least-Squares Regression: Caveats for the Statistical Consumer.

    ERIC Educational Resources Information Center

    Willett, John B.; Singer, Judith D.

    In research, data sets often occur in which the variance of the distribution of the dependent variable at given levels of the predictors is a function of the values of the predictors. In this situation, the use of weighted least-squares (WLS) regression techniques is required. Weights suitable for use in a WLS regression analysis must be estimated. A…

  10. Quantifying Confidence in Model Predictions for Hypersonic Aircraft Structures

    DTIC Science & Technology

    2015-03-01

    of isolating calibrations of models in the network, segmented and simultaneous calibration are compared using the Kullback-Leibler ... value of θ. While not all test-statistics are as simple as measuring goodness or badness of fit, their directional interpretations tend to remain ... data quite well, qualitatively. Quantitative goodness-of-fit tests are problematic because they assume a true empirical CDF is being tested or

  11. Power, effects, confidence, and significance: an investigation of statistical practices in nursing research.

    PubMed

    Gaskin, Cadeyrn J; Happell, Brenda

    2014-05-01

    To (a) assess the statistical power of nursing research to detect small, medium, and large effect sizes; (b) estimate the experiment-wise Type I error rate in these studies; and (c) assess the extent to which (i) a priori power analyses, (ii) effect sizes (and interpretations thereof), and (iii) confidence intervals were reported. Statistical review. Papers published in the 2011 volumes of the 10 highest ranked nursing journals, based on their 5-year impact factors. Papers were assessed for statistical power, control of experiment-wise Type I error, reporting of a priori power analyses, reporting and interpretation of effect sizes, and reporting of confidence intervals. The analyses were based on 333 papers, from which 10,337 inferential statistics were identified. The median power to detect small, medium, and large effect sizes was .40 (interquartile range [IQR]=.24-.71), .98 (IQR=.85-1.00), and 1.00 (IQR=1.00-1.00), respectively. The median experiment-wise Type I error rate was .54 (IQR=.26-.80). A priori power analyses were reported in 28% of papers. Effect sizes were routinely reported for Spearman's rank correlations (100% of papers in which this test was used), Poisson regressions (100%), odds ratios (100%), Kendall's tau correlations (100%), Pearson's correlations (99%), logistic regressions (98%), structural equation modelling/confirmatory factor analyses/path analyses (97%), and linear regressions (83%), but were reported less often for two-proportion z tests (50%), analyses of variance/analyses of covariance/multivariate analyses of variance (18%), t tests (8%), Wilcoxon's tests (8%), Chi-squared tests (8%), and Fisher's exact tests (7%), and not reported for sign tests, Friedman's tests, McNemar's tests, multi-level models, and Kruskal-Wallis tests. Effect sizes were infrequently interpreted. Confidence intervals were reported in 28% of papers. The use, reporting, and interpretation of inferential statistics in nursing research need substantial improvement. Most importantly, researchers should abandon the misleading practice of interpreting the results from inferential tests based solely on whether they are statistically significant (or not) and, instead, focus on reporting and interpreting effect sizes, confidence intervals, and significance levels. Nursing researchers also need to conduct and report a priori power analyses, and to address the issue of Type I experiment-wise error inflation in their studies. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
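
    The a priori power analysis the authors call for can be sketched directly from the noncentral t distribution; the example below computes the power of a two-sided, two-sample t-test for Cohen's small, medium, and large effect sizes, assuming equal groups of n = 50 and alpha = 0.05 (values chosen only for illustration).

```python
# A minimal sketch of a priori power computation for a two-sample t-test.
import numpy as np
from scipy import stats

def t_test_power(effect_size, n_per_group, alpha=0.05):
    df = 2 * n_per_group - 2
    ncp = effect_size * np.sqrt(n_per_group / 2)       # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Power = P(|T| > t_crit) when T follows a noncentral t distribution.
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

for d in (0.2, 0.5, 0.8):                              # Cohen's small/medium/large
    print(f"d = {d}: power = {t_test_power(d, n_per_group=50):.2f}")
```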

  12. Context dependent anti-aliasing image reconstruction

    NASA Technical Reports Server (NTRS)

    Beaudet, Paul R.; Hunt, A.; Arlia, N.

    1989-01-01

    Image Reconstruction has been mostly confined to context-free linear processes; the traditional continuum interpretation of digital array data uses a linear interpolator with or without an enhancement filter. Here, anti-aliasing context dependent interpretation techniques are investigated for image reconstruction. Pattern classification is applied to each neighborhood to assign it a context class; a different interpolation/filter is applied to neighborhoods of differing context. It is shown how the context dependent interpolation is computed through ensemble average statistics using high resolution training imagery from which the lower resolution image array data is obtained (simulation). A quadratic least squares (LS) context-free image quality model is described from which the context dependent interpolation coefficients are derived. It is shown how ensembles of high-resolution images can be used to capture the a priori special character of different context classes. As a consequence, a priori information such as the translational invariance of edges along the edge direction, edge discontinuity, and the character of corners is captured and can be used to interpret image array data with greater spatial resolution than would be expected by the Nyquist limit. A Gibbs-like artifact associated with this super-resolution is discussed. More realistic context dependent image quality models are needed and a suggestion is made for using a quality model which now is finding application in data compression.

  13. Scalable privacy-preserving data sharing methodology for genome-wide association studies.

    PubMed

    Yu, Fei; Fienberg, Stephen E; Slavković, Aleksandra B; Uhler, Caroline

    2014-08-01

    The protection of privacy of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers following the publication of "an attack" on GWAS data by Homer et al. (2008). Traditional statistical methods for confidentiality and privacy protection of statistical databases do not scale well to deal with GWAS data, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach that provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, Uhler et al. (2013) proposed new methods to release aggregate GWAS data without compromising an individual's privacy. We extend the methods developed in Uhler et al. (2013) for releasing differentially-private χ(2)-statistics by allowing for arbitrary number of cases and controls, and for releasing differentially-private allelic test statistics. We also provide a new interpretation by assuming the controls' data are known, which is a realistic assumption because some GWAS use publicly available data as controls. We assess the performance of the proposed methods through a risk-utility analysis on a real data set consisting of DNA samples collected by the Wellcome Trust Case Control Consortium and compare the methods with the differentially-private release mechanism proposed by Johnson and Shmatikov (2013). Copyright © 2014 Elsevier Inc. All rights reserved.
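
    The differential-privacy idea underlying these methods can be sketched with the generic Laplace mechanism: release a statistic plus Laplace noise scaled to its sensitivity and the privacy budget epsilon. This is only an illustration of the concept; the paper's mechanisms for chi-square and allelic test statistics calibrate the release to GWAS tables in ways not reproduced here, and the statistic value and sensitivity bound below are assumed.

```python
# A minimal sketch of the Laplace mechanism for differentially private release
# of a single statistic; the sensitivity bound is an assumed placeholder.
import numpy as np

rng = np.random.default_rng(8)

def laplace_release(true_value, sensitivity, epsilon):
    """Return a noisy, differentially private version of true_value."""
    scale = sensitivity / epsilon          # noise scale grows as epsilon shrinks
    return true_value + rng.laplace(loc=0.0, scale=scale)

chi2_statistic = 7.3                       # hypothetical chi-square for one SNP
sensitivity = 4.0                          # assumed bound on one person's influence

for eps in (0.1, 1.0, 10.0):
    released = laplace_release(chi2_statistic, sensitivity, eps)
    print(f"epsilon = {eps:>4}: released value = {released:.2f}")
```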

  14. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.

    PubMed

    Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan

    2017-08-28

    The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.
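
    An illustrative scikit-learn comparison of Random Forest against two linear learners (linear SVR and PLS) on synthetic regression data; this is not the rfFC/HeatMapWrapper workflow used in the paper, only a sketch of the kind of benchmark it reports.

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.model_selection import cross_val_score
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.svm import LinearSVR
        from sklearn.cross_decomposition import PLSRegression

        X, y = make_regression(n_samples=300, n_features=50, noise=10.0, random_state=0)
        models = {
            "RandomForest": RandomForestRegressor(n_estimators=200, random_state=0),
            "LinearSVR": LinearSVR(C=1.0, max_iter=10000, random_state=0),
            "PLS": PLSRegression(n_components=5),
        }
        for name, model in models.items():
            scores = cross_val_score(model, X, y, cv=5, scoring="r2")
            print(f"{name:12s} mean cross-validated R^2 = {scores.mean():.3f}")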

  15. Analyzing and interpreting genome data at the network level with ConsensusPathDB.

    PubMed

    Herwig, Ralf; Hardt, Christopher; Lienhard, Matthias; Kamburov, Atanas

    2016-10-01

    ConsensusPathDB consists of a comprehensive collection of human (as well as mouse and yeast) molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. This protocol describes the use of ConsensusPathDB (http://consensuspathdb.org) with respect to the functional and network-based characterization of biomolecules (genes, proteins and metabolites) that are submitted to the system either as a priority list or together with associated experimental data such as RNA-seq. The tool reports interaction network modules, biochemical pathways and functional information that are significantly enriched by the user's input, applying computational methods for statistical over-representation, enrichment and graph analysis. The results of this protocol can be observed within a few minutes, even with genome-wide data. The resulting network associations can be used to interpret high-throughput data mechanistically, to characterize and prioritize biomarkers, to integrate different omics levels, to design follow-up functional assay experiments and to generate topology for kinetic models at different scales.
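
    A minimal sketch of the over-representation statistic that such enrichment tools typically apply, assuming a hypergeometric test of the overlap between a user gene list and a pathway against a background gene set (the counts below are hypothetical).

        from scipy.stats import hypergeom

        def overrepresentation_p(n_background, n_pathway, n_userlist, n_overlap):
            """P(overlap >= observed) when the user list is drawn at random from the background."""
            return hypergeom.sf(n_overlap - 1, n_background, n_pathway, n_userlist)

        # Hypothetical numbers: 20,000 background genes, a 150-gene pathway,
        # a 400-gene input list, and 12 genes in common.
        print(overrepresentation_p(20000, 150, 400, 12))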

  16. A Statistical Study of EISCAT Electron and Ion Temperature Measurements in the E-region

    NASA Astrophysics Data System (ADS)

    Hussey, G.; Haldoupis, C.; Schlegel, K.; Bösinger, T.

    Motivated by the large EISCAT database, which covers over 15 years of common programme operation, and by previous statistical work with EISCAT data (e.g., C. Haldoupis, K. Schlegel, and G. Hussey, Auroral E-region electron density gradients measured with EISCAT, Ann. Geophysicae, 18, 1172-1181, 2000), a detailed statistical analysis of EISCAT electron and ion temperature measurements has been undertaken. This study was specifically concerned with the statistical dependence of heating events on other ambient parameters such as the electric field and electron density. The results showed previously reported dependences, such as the electron temperature being directly correlated with the ambient electric field and inversely related to the electron density. However, these correlations were found to be also dependent upon altitude. There was also evidence of the so-called "Schlegel effect" (K. Schlegel, Reduced effective recombination coefficient in the disturbed polar E-region, J. Atmos. Terr. Phys., 44, 183-185, 1982); that is, the heated electron gas leads to increases in electron density through a reduction in the recombination rate. This paper will present the statistical heating results and attempt to offer physical explanations and interpretations of the findings.

  17. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
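
    A short sketch of the general approach, assuming the common "flipping" trick for left-censored concentrations (subtract from a constant larger than the maximum so standard right-censoring Kaplan-Meier software applies); the package used here (lifelines, Python) is a stand-in for the authors' S-language implementation.

        import numpy as np
        from lifelines import KaplanMeierFitter

        conc = np.array([0.5, 0.5, 1.0, 1.2, 2.0, 2.0, 3.5, 4.1])   # concentrations
        detected = np.array([0, 0, 1, 1, 0, 1, 1, 1], dtype=bool)   # False = below detection limit

        flip = conc.max() + 1.0
        kmf = KaplanMeierFitter()
        kmf.fit(durations=flip - conc, event_observed=detected)     # K-M on the flipped scale

        # Median on the original concentration scale (where estimable):
        print(flip - kmf.median_survival_time_)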

  18. Evaluation of PDA Technical Report No 33. Statistical Testing Recommendations for a Rapid Microbiological Method Case Study.

    PubMed

    Murphy, Thomas; Schwedock, Julie; Nguyen, Kham; Mills, Anna; Jones, David

    2015-01-01

    New recommendations for the validation of rapid microbiological methods have been included in the revised Technical Report 33 release from the PDA. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This case study applies those statistical methods to accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological methods system being evaluated for water bioburden testing. Results presented demonstrate that the statistical methods described in the PDA Technical Report 33 chapter can all be successfully applied to the rapid microbiological method data sets and gave the same interpretation for equivalence to the standard method. The rapid microbiological method was in general able to pass the requirements of PDA Technical Report 33, though the study shows that there can be occasional outlying results and that caution should be used when applying statistical methods to low average colony-forming unit values. Prior to use in a quality-controlled environment, any new method or technology has to be shown to work as designed by the manufacturer for the purpose required. For new rapid microbiological methods that detect and enumerate contaminating microorganisms, additional recommendations have been provided in the revised PDA Technical Report No. 33. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This paper applies those statistical methods to analyze accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological method system being validated for water bioburden testing. The case study demonstrates that the statistical methods described in the PDA Technical Report No. 33 chapter can be successfully applied to rapid microbiological method data sets and give the same comparability results for similarity or difference as the standard method. © PDA, Inc. 2015.

  19. Exploring Rating Quality in Rater-Mediated Assessments Using Mokken Scale Analysis

    PubMed Central

    Wind, Stefanie A.; Engelhard, George

    2015-01-01

    Mokken scale analysis is a probabilistic nonparametric approach that offers statistical and graphical tools for evaluating the quality of social science measurement without placing potentially inappropriate restrictions on the structure of a data set. In particular, Mokken scaling provides a useful method for evaluating important measurement properties, such as invariance, in contexts where response processes are not well understood. Because rater-mediated assessments involve complex interactions among many variables, including assessment contexts, student artifacts, rubrics, individual rater characteristics, and others, rater-assigned scores are suitable candidates for Mokken scale analysis. The purposes of this study are to describe a suite of indices that can be used to explore the psychometric quality of data from rater-mediated assessments and to illustrate the substantive interpretation of Mokken-based statistics and displays in this context. Techniques that are commonly used in polytomous applications of Mokken scaling are adapted for use with rater-mediated assessments, with a focus on the substantive interpretation related to individual raters. Overall, the findings suggest that indices of rater monotonicity, rater scalability, and invariant rater ordering based on Mokken scaling provide diagnostic information at the level of individual raters related to the requirements for invariant measurement. These Mokken-based indices serve as an additional suite of diagnostic tools for exploring the quality of data from rater-mediated assessments that can supplement rating quality indices based on parametric models. PMID:29795883

  20. Application of Linear Mixed-Effects Models in Human Neuroscience Research: A Comparison with Pearson Correlation in Two Auditory Electrophysiology Studies

    PubMed Central

    Koerner, Tess K.; Zhang, Yang

    2017-01-01

    Neurophysiological studies are often designed to examine relationships between measures from different testing conditions, time points, or analysis techniques within the same group of participants. Appropriate statistical techniques that can take into account repeated measures and multivariate predictor variables are integral and essential to successful data analysis and interpretation. This work implements and compares conventional Pearson correlations and linear mixed-effects (LME) regression models using data from two recently published auditory electrophysiology studies. For the specific research questions in both studies, the Pearson correlation test is inappropriate for determining strengths between the behavioral responses for speech-in-noise recognition and the multiple neurophysiological measures as the neural responses across listening conditions were simply treated as independent measures. In contrast, the LME models allow a systematic approach to incorporate both fixed-effect and random-effect terms to deal with the categorical grouping factor of listening conditions, between-subject baseline differences in the multiple measures, and the correlational structure among the predictor variables. Together, the comparative data demonstrate the advantages as well as the necessity to apply mixed-effects models to properly account for the built-in relationships among the multiple predictor variables, which has important implications for proper statistical modeling and interpretation of human behavior in terms of neural correlates and biomarkers. PMID:28264422
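
    A compact sketch of the contrast drawn in the paper: a naive Pearson correlation that treats repeated observations as independent versus a linear mixed-effects model with a random intercept per subject (statsmodels); the simulated variables and their names are hypothetical.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf
        from scipy.stats import pearsonr

        rng = np.random.default_rng(1)
        n_subj, n_cond = 20, 4
        df = pd.DataFrame({
            "subject": np.repeat(np.arange(n_subj), n_cond),
            "condition": np.tile(np.arange(n_cond), n_subj),
        })
        subj_offset = rng.normal(0, 2, n_subj)[df["subject"]]       # between-subject baseline
        df["neural"] = rng.normal(0, 1, len(df)) + subj_offset
        df["behavior"] = 0.5 * df["neural"] + subj_offset + rng.normal(0, 1, len(df))

        r, p = pearsonr(df["neural"], df["behavior"])               # ignores repeated measures
        lme = smf.mixedlm("behavior ~ neural", df, groups=df["subject"]).fit()
        print(f"Pearson r = {r:.2f} (p = {p:.3g})")
        print(lme.summary())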

  1. Common pitfalls in statistical analysis: Clinical versus statistical significance

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2015-01-01

    In clinical research, study results that are statistically significant are often interpreted as being clinically important. While statistical significance indicates the reliability of the study results, clinical significance reflects their impact on clinical practice. The third article in this series exploring pitfalls in statistical analysis clarifies the importance of differentiating between statistical significance and clinical significance. PMID:26229754

  2. Landscape pattern metrics and regional assessment

    USGS Publications Warehouse

    O'Neill, R. V.; Riitters, K.H.; Wickham, J.D.; Jones, K.B.

    1999-01-01

    The combination of remote imagery data, geographic information systems software, and landscape ecology theory provides a unique basis for monitoring and assessing large-scale ecological systems. The unique feature of the work has been the need to develop and interpret quantitative measures of spatial pattern, the landscape indices. This article reviews what is known about the statistical properties of these pattern metrics and suggests some additional metrics based on island biogeography, percolation theory, hierarchy theory, and economic geography. Assessment applications of this approach have required interpreting the pattern metrics in terms of specific environmental endpoints, such as wildlife and water quality, and research into how to represent synergistic effects of many overlapping sources of stress.

  3. Integrated environmental monitoring and multivariate data analysis-A case study.

    PubMed

    Eide, Ingvar; Westad, Frank; Nilssen, Ingunn; de Freitas, Felipe Sales; Dos Santos, Natalia Gomes; Dos Santos, Francisco; Cabral, Marcelo Montenegro; Bicego, Marcia Caruso; Figueira, Rubens; Johnsen, Ståle

    2017-03-01

    The present article describes integration of environmental monitoring and discharge data and interpretation using multivariate statistics, principal component analysis (PCA), and partial least squares (PLS) regression. The monitoring was carried out at the Peregrino oil field off the coast of Brazil. One sensor platform and 3 sediment traps were placed on the seabed. The sensors measured current speed and direction, turbidity, temperature, and conductivity. The sediment trap samples were used to determine suspended particulate matter that was characterized with respect to a number of chemical parameters (26 alkanes, 16 PAHs, N, C, calcium carbonate, and Ba). Data on discharges of drill cuttings and water-based drilling fluid were provided on a daily basis. The monitoring was carried out during 7 campaigns from June 2010 to October 2012, each lasting 2 to 3 months due to the capacity of the sediment traps. The data from the campaigns were preprocessed, combined, and interpreted using multivariate statistics. No systematic difference could be observed between campaigns or traps despite the fact that the first campaign was carried out before drilling, and 1 of 3 sediment traps was located in an area not expected to be influenced by the discharges. There was a strong covariation between suspended particulate matter and total N and organic C suggesting that the majority of the sediment samples had a natural and biogenic origin. Furthermore, the multivariate regression showed no correlation between discharges of drill cuttings and sediment trap or turbidity data taking current speed and direction into consideration. Because of this lack of correlation with discharges from the drilling location, a more detailed evaluation of chemical indicators providing information about origin was carried out in addition to numerical modeling of dispersion and deposition. The chemical indicators and the modeling of dispersion and deposition support the conclusions from the multivariate statistics. Integr Environ Assess Manag 2017;13:387-395. © 2016 SETAC.
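
    A generic sketch of the two multivariate steps described: PCA of the (scaled) chemical data followed by PLS regression of a response on the same variables. The synthetic matrix and the choice of response are illustrative assumptions only.

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(0)
        X = rng.normal(size=(80, 12))                         # stand-in chemical parameters
        y = 0.8 * X[:, 0] + rng.normal(scale=0.5, size=80)    # stand-in response (e.g. turbidity)

        Xs = StandardScaler().fit_transform(X)
        pca = PCA(n_components=3).fit(Xs)
        print("explained variance ratios:", pca.explained_variance_ratio_.round(2))

        pls = PLSRegression(n_components=2).fit(Xs, y)
        print("PLS R^2:", round(pls.score(Xs, y), 2))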

  4. Signatures of criticality arise from random subsampling in simple population models.

    PubMed

    Nonnenmacher, Marcel; Behrens, Christian; Berens, Philipp; Bethge, Matthias; Macke, Jakob H

    2017-10-01

    The rise of large-scale recordings of neuronal activity has fueled the hope to gain new insights into the collective activity of neural ensembles. How can one link the statistics of neural population activity to underlying principles and theories? One attempt to interpret such data builds upon analogies to the behaviour of collective systems in statistical physics. Divergence of the specific heat, a measure of population statistics derived from thermodynamics, has been used to suggest that neural populations are optimized to operate at a "critical point". However, these findings have been challenged by theoretical studies which have shown that common inputs can lead to diverging specific heat. Here, we connect "signatures of criticality", and in particular the divergence of specific heat, back to statistics of neural population activity commonly studied in neural coding: firing rates and pairwise correlations. We show that the specific heat diverges whenever the average correlation strength does not depend on population size. This is necessarily true when data with correlations are randomly subsampled during the analysis process, irrespective of the detailed structure or origin of correlations. We also show how the characteristic shape of specific heat capacity curves depends on firing rates and correlations, using both analytically tractable models and numerical simulations of a canonical feed-forward population model. To analyze these simulations, we develop efficient methods for characterizing large-scale neural population activity with maximum entropy models. We find that, consistent with experimental findings, increases in firing rates and correlation directly lead to more pronounced signatures. Thus, previous reports of thermodynamical criticality in neural populations based on the analysis of specific heat can be explained by average firing rates and correlations, and are not indicative of an optimized coding strategy. We conclude that a reliable interpretation of statistical tests for theories of neural coding is possible only in reference to relevant ground-truth models.

  5. A qualitative assessment of a random process proposed as an atmospheric turbulence model

    NASA Technical Reports Server (NTRS)

    Sidwell, K.

    1977-01-01

    A random process is formed by the product of two Gaussian processes and the sum of that product with a third Gaussian process. The resulting total random process is interpreted as the sum of an amplitude modulated process and a slowly varying, random mean value. The properties of the process are examined, including an interpretation of the process in terms of the physical structure of atmospheric motions. The inclusion of the mean value variation gives an improved representation of the properties of atmospheric motions, since the resulting process can account for the differences in the statistical properties of atmospheric velocity components and their gradients. The application of the process to atmospheric turbulence problems, including the response of aircraft dynamic systems, is examined. The effects of the mean value variation upon aircraft loads are small in most cases, but can be important in the measurement and interpretation of atmospheric turbulence data.
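
    A minimal simulation of the proposed model: the product of two Gaussian processes plus a third, slowly varying Gaussian process acting as a random mean value. The moving-average smoothing used to make one factor and the mean value slowly varying is an assumption chosen only for illustration.

        import numpy as np

        def smoothed_gaussian(n, window, rng):
            """Gaussian white noise passed through a moving-average filter."""
            w = np.ones(window) / window
            return np.convolve(rng.normal(size=n + window), w, mode="valid")[:n]

        rng = np.random.default_rng(42)
        n = 5000
        a = smoothed_gaussian(n, 20, rng)      # modulating envelope
        b = rng.normal(size=n)                 # rapidly varying Gaussian process
        c = smoothed_gaussian(n, 500, rng)     # slowly varying random mean value

        x = a * b + c                          # total process
        excess_kurtosis = ((x - x.mean())**4).mean() / x.var()**2 - 3
        print("variance:", round(x.var(), 3), " excess kurtosis:", round(excess_kurtosis, 3))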

  6. Comparative effectiveness research methodology using secondary data: A starting user's guide.

    PubMed

    Sun, Maxine; Lipsitz, Stuart R

    2018-04-01

    The use of secondary data, such as claims or administrative data, in comparative effectiveness research has grown tremendously in recent years. We believe that the current review can help investigators relying on secondary data to (1) gain insight into both the methodologies and statistical methods, (2) better understand the necessity of rigorous planning before initiating a comparative effectiveness investigation, and (3) optimize the quality of their investigations. Specifically, we review concepts of adjusted analyses and confounders, methods of propensity score analyses and instrumental variable analyses, risk prediction models (logistic and time-to-event), decision-curve analysis, as well as the interpretation of the P value and hypothesis testing. Overall, we hope that the current review article can help research investigators relying on secondary data to perform comparative effectiveness research, better understand the necessity of rigorous planning before study start, and gain better insight into the choice of statistical methods so as to optimize the quality of the research study. Copyright © 2017 Elsevier Inc. All rights reserved.
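
    A small sketch of one of the methods reviewed, a propensity-score analysis with inverse-probability-of-treatment weighting (IPTW); the covariates, effect sizes, and simulated data are illustrative only.

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n = 2000
        age = rng.normal(60, 10, n)
        comorbidity = rng.binomial(1, 0.3, n)
        treated = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.04 * age + 0.8 * comorbidity))))
        outcome = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.03 * age + 0.5 * comorbidity - 0.4 * treated))))

        X = sm.add_constant(pd.DataFrame({"age": age, "comorbidity": comorbidity}))
        ps = sm.Logit(treated, X).fit(disp=0).predict(X)          # propensity scores
        w = np.where(treated == 1, 1 / ps, 1 / (1 - ps))          # IPTW weights

        Xo = sm.add_constant(pd.DataFrame({"treated": treated}))
        iptw_fit = sm.GLM(outcome, Xo, family=sm.families.Binomial(), freq_weights=w).fit()
        print("weighted treatment log-odds ratio:", round(iptw_fit.params["treated"], 3))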

  7. Association between Stereotactic Radiotherapy and Death from Brain Metastases of Epithelial Ovarian Cancer: a Gliwice Data Re-Analysis with Penalization

    PubMed

    Tukiendorf, Andrzej; Mansournia, Mohammad Ali; Wydmański, Jerzy; Wolny-Rokicka, Edyta

    2017-04-01

    Background: Clinical datasets for patients with brain metastases of epithelial ovarian cancer are usually small. When adequate case numbers are lacking, the resulting estimates of regression coefficients may be biased. One direct approach to reducing such sparse-data bias is penalized estimation. Methods: A re-analysis of formerly reported hazard ratios in diagnosed patients was performed using penalized Cox regression with a popular SAS package, and additional software code for the statistical computational procedure is provided. Results: It was found that the penalized approach can readily diminish sparse-data artefacts and radically reduce the magnitude of the estimated regression coefficients. Conclusions: It was confirmed that classical statistical approaches may exaggerate regression estimates or distort study interpretations and conclusions. The results support the thesis that penalization via weakly informative priors and data augmentation are the safest approaches to shrinking the sparse-data artefacts that frequently occur in epidemiological research.
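
    A Python analogue of the penalized approach described (the authors worked in SAS): a ridge-penalized Cox model via the lifelines package, using one of its bundled example data sets; the penalty value is an illustrative assumption.

        import pandas as pd
        from lifelines import CoxPHFitter
        from lifelines.datasets import load_rossi

        df = load_rossi()                                  # bundled example survival data
        cph_plain = CoxPHFitter().fit(df, duration_col="week", event_col="arrest")
        cph_ridge = CoxPHFitter(penalizer=0.5, l1_ratio=0.0).fit(
            df, duration_col="week", event_col="arrest")

        comparison = pd.DataFrame({"unpenalized": cph_plain.params_,
                                   "penalized": cph_ridge.params_})
        print(comparison)    # penalization shrinks the coefficients toward zero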

  8. The impact of loss to follow-up on hypothesis tests of the treatment effect for several statistical methods in substance abuse clinical trials.

    PubMed

    Hedden, Sarra L; Woolson, Robert F; Carter, Rickey E; Palesch, Yuko; Upadhyaya, Himanshu P; Malcolm, Robert J

    2009-07-01

    "Loss to follow-up" can be substantial in substance abuse clinical trials. When extensive losses to follow-up occur, one must cautiously analyze and interpret the findings of a research study. Aims of this project were to introduce the types of missing data mechanisms and describe several methods for analyzing data with loss to follow-up. Furthermore, a simulation study compared Type I error and power of several methods when missing data amount and mechanism varies. Methods compared were the following: Last observation carried forward (LOCF), multiple imputation (MI), modified stratified summary statistics (SSS), and mixed effects models. Results demonstrated nominal Type I error for all methods; power was high for all methods except LOCF. Mixed effect model, modified SSS, and MI are generally recommended for use; however, many methods require that the data are missing at random or missing completely at random (i.e., "ignorable"). If the missing data are presumed to be nonignorable, a sensitivity analysis is recommended.

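    A brief sketch contrasting two of the approaches compared, last observation carried forward (LOCF) versus a mixed-effects model fit to the available data; the simulated trial, its dropout mechanism, and the variable names are illustrative assumptions.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(3)
        n, visits = 100, 5
        df = pd.DataFrame({
            "subject": np.repeat(np.arange(n), visits),
            "visit": np.tile(np.arange(visits), n),
            "treat": np.repeat(rng.binomial(1, 0.5, n), visits),
        })
        df["y"] = 10 - 0.8 * df["visit"] * df["treat"] + rng.normal(0, 2, len(df))
        dropout = rng.random(len(df)) < 0.06 * df["visit"]          # more missingness at later visits
        df.loc[dropout, "y"] = np.nan

        df["y_locf"] = df.groupby("subject")["y"].ffill()           # carry last observation forward

        locf_fit = smf.ols("y_locf ~ visit * treat", df).fit()
        mixed_fit = smf.mixedlm("y ~ visit * treat", df.dropna(subset=["y"]),
                                groups="subject").fit()
        print("LOCF treatment-by-visit estimate:       ", round(locf_fit.params["visit:treat"], 3))
        print("Mixed-model treatment-by-visit estimate:", round(mixed_fit.params["visit:treat"], 3))
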
  9. Improving Visualization and Interpretation of Metabolome-Wide Association Studies: An Application in a Population-Based Cohort Using Untargeted 1H NMR Metabolic Profiling.

    PubMed

    Castagné, Raphaële; Boulangé, Claire Laurence; Karaman, Ibrahim; Campanella, Gianluca; Santos Ferreira, Diana L; Kaluarachchi, Manuja R; Lehne, Benjamin; Moayyeri, Alireza; Lewis, Matthew R; Spagou, Konstantina; Dona, Anthony C; Evangelos, Vangelis; Tracy, Russell; Greenland, Philip; Lindon, John C; Herrington, David; Ebbels, Timothy M D; Elliott, Paul; Tzoulaki, Ioanna; Chadeau-Hyam, Marc

    2017-10-06

    1H NMR spectroscopy of biofluids generates reproducible data allowing detection and quantification of small molecules in large population cohorts. Statistical models to analyze such data are now well-established, and the use of univariate metabolome-wide association studies (MWAS) investigating the spectral features separately has emerged as a computationally efficient and interpretable alternative to multivariate models. The MWAS rely on the accurate estimation of a metabolome-wide significance level (MWSL) to be applied to control the family-wise error rate. Subsequent interpretation requires efficient visualization and formal feature annotation, which, in turn, call for efficient prioritization of spectral variables of interest. Using human serum 1H NMR spectroscopic profiles from 3948 participants from the Multi-Ethnic Study of Atherosclerosis (MESA), we have performed a series of MWAS for serum levels of glucose. We first propose an extension of the conventional MWSL that yields stable estimates of the MWSL across the different model parameterizations and distributional features of the outcome. We propose both efficient visualization methods and a strategy based on subsampling and internal validation to prioritize the associations. Our work proposes and illustrates practical and scalable solutions to facilitate the implementation of the MWAS approach and improve interpretation in large cohort studies.
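
    A schematic illustration of a permutation-based estimate of the metabolome-wide significance level: permute the outcome, rerun all univariate tests, keep the minimum p-value from each permutation, and take a low quantile of those minima as the per-test threshold. This shows the general idea only, not the specific extension proposed in the paper.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        n, p = 200, 500
        spectra = rng.normal(size=(n, p))          # stand-in spectral variables
        glucose = rng.normal(size=n)               # stand-in outcome

        def min_pvalue(y, X):
            """Smallest p-value across per-variable association tests (Pearson correlation here)."""
            return min(stats.pearsonr(X[:, j], y)[1] for j in range(X.shape[1]))

        n_perm, alpha = 100, 0.05
        min_ps = [min_pvalue(rng.permutation(glucose), spectra) for _ in range(n_perm)]
        mwsl = np.quantile(min_ps, alpha)          # per-test threshold controlling the family-wise error rate
        print(f"estimated MWSL = {mwsl:.2e}")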

  10. Improving Visualization and Interpretation of Metabolome-Wide Association Studies: An Application in a Population-Based Cohort Using Untargeted 1H NMR Metabolic Profiling

    PubMed Central

    2017-01-01

    1H NMR spectroscopy of biofluids generates reproducible data allowing detection and quantification of small molecules in large population cohorts. Statistical models to analyze such data are now well-established, and the use of univariate metabolome-wide association studies (MWAS) investigating the spectral features separately has emerged as a computationally efficient and interpretable alternative to multivariate models. The MWAS rely on the accurate estimation of a metabolome-wide significance level (MWSL) to be applied to control the family-wise error rate. Subsequent interpretation requires efficient visualization and formal feature annotation, which, in turn, call for efficient prioritization of spectral variables of interest. Using human serum 1H NMR spectroscopic profiles from 3948 participants from the Multi-Ethnic Study of Atherosclerosis (MESA), we have performed a series of MWAS for serum levels of glucose. We first propose an extension of the conventional MWSL that yields stable estimates of the MWSL across the different model parameterizations and distributional features of the outcome. We propose both efficient visualization methods and a strategy based on subsampling and internal validation to prioritize the associations. Our work proposes and illustrates practical and scalable solutions to facilitate the implementation of the MWAS approach and improve interpretation in large cohort studies. PMID:28823158

  11. Geochemistry and the understanding of ground-water systems

    USGS Publications Warehouse

    Glynn, Pierre D.; Plummer, Niel

    2005-01-01

    Geochemistry has contributed significantly to the understanding of ground-water systems over the last 50 years. Historic advances include development of the hydrochemical facies concept, application of equilibrium theory, investigation of redox processes, and radiocarbon dating. Other hydrochemical concepts, tools, and techniques have helped elucidate mechanisms of flow and transport in ground-water systems, and have helped unlock an archive of paleoenvironmental information. Hydrochemical and isotopic information can be used to interpret the origin and mode of ground-water recharge, refine estimates of time scales of recharge and ground-water flow, decipher reactive processes, provide paleohydrological information, and calibrate ground-water flow models. Progress needs to be made in obtaining representative samples. Improvements are needed in the interpretation of the information obtained, and in the construction and interpretation of numerical models utilizing hydrochemical data. The best approach will ensure an optimized iterative process between field data collection and analysis, interpretation, and the application of forward, inverse, and statistical modeling tools. Advances are anticipated from microbiological investigations, the characterization of natural organics, isotopic fingerprinting, applications of dissolved gas measurements, and the fields of reaction kinetics and coupled processes. A thermodynamic perspective is offered that could facilitate the comparison and understanding of the multiple physical, chemical, and biological processes affecting ground-water systems.

  12. Statistical Significance Testing from Three Perspectives and Interpreting Statistical Significance and Nonsignificance and the Role of Statistics in Research.

    ERIC Educational Resources Information Center

    Levin, Joel R.; And Others

    1993-01-01

    Journal editors respond to criticisms of reliance on statistical significance in research reporting. Joel R. Levin ("Journal of Educational Psychology") defends its use, whereas William D. Schafer ("Measurement and Evaluation in Counseling and Development") emphasizes the distinction between statistically significant and important. William Asher…

  13. Acoustic Seafloor Classification near the Duanqiao hydrothermal field at the Southwest Indian Ridge from Multibeam Backscatter Data

    NASA Astrophysics Data System (ADS)

    Wang, A.; Tao, C.; Xu, Y.; Zhang, G.; Liao, S.

    2016-12-01

    The inactive Duanqiao hydrothermal field is located on the 50.5°E SWIR axial high at a shallow depth of about 1700 meters. The seafloor morphology of the area surrounding the field is relatively flat, which exerts less influence on multibeam backscatter data than rugged terrain does. It is therefore an ideal experimental area for seafloor classification with a multibeam sonar. This paper presents a backscatter analysis of Simrad EM120 multibeam sonar data acquired during the Chinese DY115-34 cruise near the Duanqiao hydrothermal field and comprehensively studies the types and distribution characteristics of the seafloor substrate by combining the backscatter data with visual interpretations and TV-grab samples. First, a mosaic was built to analyze the backscatter distribution after the multibeam backscatter data were fully processed using the Geocoder engine in CARIS HIPS&SIPS software. Prior information was gained by analyzing the link between the processed backscatter data and the visual interpretations of two deep-tow video survey lines; one line corresponds to sediment-dominated seafloor and the other to pillow basalt-dominated seafloor. The backscatter data of the mosaic were then classified statistically to identify three types of seafloor: soft, medium-hard and hard substrate. Compared with the visual interpretations and TV-grab samples, these three seafloor types were interpreted as sediment, breccia and pillow basalt, respectively. Finally, a seafloor classification map was generated. From the results we identified two distinct distribution characteristics of the seafloor substrate: 1. there is a transition from pillow basalt-dominated to sediment-dominated seafloor away from the SWIR axis; 2. the Duanqiao hydrothermal field is mostly floored by pillow basalts and locally covered by breccias and sediments, probably because the field is a relatively recent volcanic area.

  14. Quantifying nonstationary radioactivity concentration fluctuations near Chernobyl: A complete statistical description

    NASA Astrophysics Data System (ADS)

    Viswanathan, G. M.; Buldyrev, S. V.; Garger, E. K.; Kashpur, V. A.; Lucena, L. S.; Shlyakhter, A.; Stanley, H. E.; Tschiersch, J.

    2000-09-01

    We analyze nonstationary 137Cs atmospheric activity concentration fluctuations measured near Chernobyl after the 1986 disaster and find three new results: (i) the histogram of fluctuations is well described by a log-normal distribution; (ii) there is a pronounced spectral component with period T=1yr, and (iii) the fluctuations are long-range correlated. These findings allow us to quantify two fundamental statistical properties of the data: the probability distribution and the correlation properties of the time series. We interpret our findings as evidence that the atmospheric radionuclide resuspension processes are tightly coupled to the surrounding ecosystems and to large time scale weather patterns.

  15. Interconnect fatigue design for terrestrial photovoltaic modules

    NASA Technical Reports Server (NTRS)

    Mon, G. R.; Moore, D. M.; Ross, R. G., Jr.

    1982-01-01

    The results of a comprehensive investigation of interconnect fatigue that has led to the definition of useful reliability-design and life-prediction algorithms are presented. Experimental data indicate that the classical strain-cycle (fatigue) curve for the interconnect material is a good model of mean interconnect fatigue performance, but it fails to account for the broad statistical scatter, which is critical to reliability prediction. To address this shortcoming, the classical fatigue curve is combined with experimental cumulative interconnect failure rate data to yield statistical fatigue curves (having failure probability as a parameter) which enable (1) the prediction of cumulative interconnect failures during the design life of an array field, and (2) the unambiguous, i.e., quantitative, interpretation of data from field-service qualification (accelerated thermal cycling) tests. Optimal interconnect cost-reliability design algorithms are derived based on minimizing the cost of energy over the design life of the array field.

  16. Interconnect fatigue design for terrestrial photovoltaic modules

    NASA Astrophysics Data System (ADS)

    Mon, G. R.; Moore, D. M.; Ross, R. G., Jr.

    1982-03-01

    The results of a comprehensive investigation of interconnect fatigue that has led to the definition of useful reliability-design and life-prediction algorithms are presented. Experimental data indicate that the classical strain-cycle (fatigue) curve for the interconnect material is a good model of mean interconnect fatigue performance, but it fails to account for the broad statistical scatter, which is critical to reliability prediction. To address this shortcoming, the classical fatigue curve is combined with experimental cumulative interconnect failure rate data to yield statistical fatigue curves (having failure probability as a parameter) which enable (1) the prediction of cumulative interconnect failures during the design life of an array field, and (2) the unambiguous, i.e., quantitative, interpretation of data from field-service qualification (accelerated thermal cycling) tests. Optimal interconnect cost-reliability design algorithms are derived based on minimizing the cost of energy over the design life of the array field.

  17. Precision Cosmology: The First Half Million Years

    NASA Astrophysics Data System (ADS)

    Jones, Bernard J. T.

    2017-06-01

    Cosmology seeks to characterise our Universe in terms of models based on well-understood and tested physics. Today we know our Universe with a precision that once would have been unthinkable. This book develops the entire mathematical, physical and statistical framework within which this has been achieved. It tells the story of how we arrive at our profound conclusions, starting from the early twentieth century and following developments up to the latest data analysis of big astronomical datasets. It provides an enlightening description of the mathematical, physical and statistical basis for understanding and interpreting the results of key space- and ground-based data. Subjects covered include general relativity, cosmological models, the inhomogeneous Universe, physics of the cosmic background radiation, and methods and results of data analysis. Extensive online supplementary notes, exercises, teaching materials, and exercises in Python make this the perfect companion for researchers, teachers and students in physics, mathematics, and astrophysics.

  18. Sources of Safety Data and Statistical Strategies for Design and Analysis: Postmarket Surveillance.

    PubMed

    Izem, Rima; Sanchez-Kam, Matilde; Ma, Haijun; Zink, Richard; Zhao, Yueqin

    2018-03-01

    Safety data are continuously evaluated throughout the life cycle of a medical product to accurately assess and characterize the risks associated with the product. The knowledge about a medical product's safety profile continually evolves as safety data accumulate. This paper discusses data sources and analysis considerations for safety signal detection after a medical product is approved for marketing. This manuscript is the second in a series of papers from the American Statistical Association Biopharmaceutical Section Safety Working Group. We share our recommendations for the statistical and graphical methodologies necessary to appropriately analyze, report, and interpret safety outcomes, and we discuss the advantages and disadvantages of safety data obtained from passive postmarketing surveillance systems compared to other sources. Signal detection has traditionally relied on spontaneous reporting databases that have been available worldwide for decades. However, current regulatory guidelines and ease of reporting have increased the size of these databases exponentially over the last few years. With such large databases, data-mining tools using disproportionality analysis and helpful graphics are often used to detect potential signals. Although the data sources have many limitations, analyses of these data have been successful at identifying safety signals postmarketing. Experience analyzing these dynamic data is useful in understanding the potential and limitations of analyses with new data sources such as social media, claims, or electronic medical records data.
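
    A minimal sketch of one standard disproportionality statistic used with spontaneous-report databases, the proportional reporting ratio (PRR) with an approximate 95% confidence interval; the 2x2 counts are hypothetical, and the paper discusses this family of data-mining tools more broadly.

        import numpy as np

        def prr(a, b, c, d):
            """
            a: reports with the drug of interest and the event of interest
            b: reports with the drug of interest and other events
            c: reports with other drugs and the event of interest
            d: reports with other drugs and other events
            """
            ratio = (a / (a + b)) / (c / (c + d))
            se_log = np.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
            ci = np.exp(np.log(ratio) + np.array([-1.96, 1.96]) * se_log)
            return ratio, ci

        print(prr(a=30, b=970, c=200, d=98800))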

  19. Robust statistical methods for impulse noise suppressing of spread spectrum induced polarization data, with application to a mine site, Gansu province, China

    NASA Astrophysics Data System (ADS)

    Liu, Weiqiang; Chen, Rujun; Cai, Hongzhu; Luo, Weibin

    2016-12-01

    In this paper, we investigated the robust processing of noisy spread spectrum induced polarization (SSIP) data. SSIP is a new frequency-domain induced polarization method that transmits a pseudo-random m-sequence as the source current, where the m-sequence is a broadband signal. Potential information at multiple frequencies can be obtained through measurement. Removing the noise is a crucial problem in SSIP data processing. If the ordinary mean stack and digital filter cannot reduce the impulse noise effectively, its impact will remain in the complex resistivity spectrum and will affect the interpretation of profile anomalies. We applied a robust statistical method to SSIP data processing. Robust least-squares regression is used to fit and remove the linear trend from the original data before stacking. A robust M-estimate is used to stack the data of all periods. A robust smoothing filter is used to suppress the residual noise after stacking. For the robust statistical scheme, the most appropriate influence function and iterative algorithm are chosen by testing on simulated data to suppress the influence of outliers. We tested the benefits of the robust SSIP data processing using examples of SSIP data recorded at a test site beside a mine in Gansu province, China.
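
    A small sketch of the robust stacking step, replacing the ordinary mean over periods with a Huber M-estimate of location computed by iteratively reweighted averaging; the tuning constant and convergence settings are typical textbook choices, not necessarily those selected by the authors.

        import numpy as np

        def huber_location(x, k=1.345, tol=1e-6, max_iter=50):
            """Huber M-estimate of location, using the MAD as a robust scale."""
            x = np.asarray(x, dtype=float)
            mu = np.median(x)
            scale = 1.4826 * np.median(np.abs(x - mu))
            if scale == 0:
                scale = x.std() or 1.0
            for _ in range(max_iter):
                r = (x - mu) / scale
                w = np.ones_like(r)
                big = np.abs(r) > k
                w[big] = k / np.abs(r[big])        # down-weight large residuals
                mu_new = np.sum(w * x) / np.sum(w)
                if abs(mu_new - mu) < tol * scale:
                    break
                mu = mu_new
            return mu_new

        rng = np.random.default_rng(0)
        periods = rng.normal(1.0, 0.05, 200)       # repeated measurements over m-sequence periods
        periods[:5] += 10.0                        # impulsive outliers in a few periods
        print("ordinary mean:", round(periods.mean(), 3),
              " Huber stack:", round(huber_location(periods), 3))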

  20. Improving interpretation of publically reported statistics on health and healthcare: the Figure Interpretation Assessment Tool (FIAT-Health).

    PubMed

    Gerrits, Reinie G; Kringos, Dionne S; van den Berg, Michael J; Klazinga, Niek S

    2018-03-07

    Policy-makers, managers, scientists, patients and the general public are confronted daily with figures on health and healthcare through public reporting in newspapers, webpages and press releases. However, information on the key characteristics of these figures necessary for their correct interpretation is often not adequately communicated, which can lead to misinterpretation and misinformed decision-making. The objective of this research was to map the key characteristics relevant to the interpretation of figures on health and healthcare, and to develop a Figure Interpretation Assessment Tool-Health (FIAT-Health) through which figures on health and healthcare can be systematically assessed, allowing for a better interpretation of these figures. The abovementioned key characteristics of figures on health and healthcare were identified through systematic expert consultations in the Netherlands on four topic categories of figures, namely morbidity, healthcare expenditure, healthcare outcomes and lifestyle. The identified characteristics were used as a frame for the development of the FIAT-Health. Development of the tool and its content was supported and validated through regular review by a sounding board of potential users. Identified characteristics relevant for the interpretation of figures in the four categories relate to the figures' origin, credibility, expression, subject matter, population and geographical focus, time period, and underlying data collection methods. The characteristics were translated into a set of 13 dichotomous and 4-point Likert scale questions constituting the FIAT-Health, and two final assessment statements. Users of the FIAT-Health were provided with a summary overview of their answers to support a final assessment of the correctness of a figure and the appropriateness of its reporting. FIAT-Health can support policy-makers, managers, scientists, patients and the general public to systematically assess the quality of publicly reported figures on health and healthcare. It also has the potential to support the producers of health and healthcare data in clearly communicating their data to different audiences. Future research should focus on the further validation of the tool in practice.

  1. Statistical total correlation spectroscopy scaling for enhancement of metabolic information recovery in biological NMR spectra.

    PubMed

    Maher, Anthony D; Fonville, Judith M; Coen, Muireann; Lindon, John C; Rae, Caroline D; Nicholson, Jeremy K

    2012-01-17

    The high level of complexity in nuclear magnetic resonance (NMR) metabolic spectroscopic data sets has fueled the development of experimental and mathematical techniques that enhance latent biomarker recovery and improve model interpretability. We previously showed that statistical total correlation spectroscopy (STOCSY) can be used to edit NMR spectra to remove drug metabolite signatures that obscure metabolic variation of diagnostic interest. Here, we extend this "STOCSY editing" concept to a generalized scaling procedure for NMR data that enhances recovery of latent biochemical information and improves biological classification and interpretation. We call this new procedure STOCSY-scaling (STOCSY(S)). STOCSY(S) exploits the fixed proportionality in a set of NMR spectra between resonances from the same molecule to suppress or enhance features correlated with a resonance of interest. We demonstrate this new approach using two exemplar data sets: (a) a streptozotocin rat model (n = 30) of type 1 diabetes and (b) a human epidemiological study utilizing plasma NMR spectra of patients with metabolic syndrome (n = 67). In both cases significant biomarker discovery improvement was observed by using STOCSY(S): the approach successfully suppressed interfering NMR signals from glucose and lactate that otherwise dominate the variation in the streptozotocin study, which then allowed recovery of biomarkers such as glycine, which were otherwise obscured. In the metabolic syndrome study, we used STOCSY(S) to enhance variation from the high-density lipoprotein cholesterol peak, improving the prediction of individuals with metabolic syndrome from controls in orthogonal projections to latent structures discriminant analysis models and facilitating the biological interpretation of the results. Thus, STOCSY(S) is a versatile technique that is applicable in any situation in which variation, either biological or otherwise, dominates a data set at the expense of more interesting or important features. This approach is generally appropriate for many types of NMR-based complex mixture analyses and hence for wider applications in bioanalytical science.
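
    A structural sketch of the scaling idea: compute the correlation of every spectral variable with a chosen driver resonance and use it to re-weight the variables, suppressing (or, alternatively, enhancing) features tied to that resonance. The weighting functions below are simple placeholders; the exact STOCSY-scaling transform is defined in the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        n_spectra, n_vars, driver_idx = 60, 500, 120
        X = rng.normal(size=(n_spectra, n_vars))
        X[:, 100:140] += rng.normal(size=(n_spectra, 1))     # a block of co-varying signals

        Xc = X - X.mean(axis=0)
        r = (Xc.T @ Xc[:, driver_idx]) / (
            np.linalg.norm(Xc, axis=0) * np.linalg.norm(Xc[:, driver_idx]))

        X_suppressed = X * (1.0 - r**2)     # down-weight variables correlated with the driver peak
        X_enhanced = X * r**2               # or emphasise them instead
        print(r[110:130].round(2))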

  2. Patient movement characteristics and the impact on CBCT image quality and interpretability.

    PubMed

    Spin-Neto, Rubens; Costa, Cláudio; Salgado, Daniela Mra; Zambrana, Nataly Rm; Gotfredsen, Erik; Wenzel, Ann

    2018-01-01

    To assess the impact of patient movement characteristics and metal/radiopaque materials in the field-of-view (FOV) on CBCT image quality and interpretability. 162 CBCT examinations were performed in 134 consecutive patients (i.e. prospective data collection; mean age: 27.2 years; range: 9-73). An accelerometer-gyroscope system registered the patient's head position during the examination. The threshold for movement definition was set at ≥0.5-mm movement distance based on the accelerometer-gyroscope recording. Movement complexity was defined as uniplanar/multiplanar. Three observers scored independently: presence of stripe (i.e. streak) artefacts (absent/"enamel stripes"/"metal stripes"/"movement stripes"), overall unsharpness (absent/present) and image interpretability (interpretable/not interpretable). Kappa statistics assessed interobserver agreement. χ² tests analysed whether movement distance, movement complexity and metal/radiopaque material in the FOV affected image quality and image interpretability. Relevant risk factors (p ≤ 0.20) were entered into a multivariate logistic regression analysis with "not interpretable" as the outcome. Interobserver agreement for image interpretability was good (average = 0.65). Movement distance and presence of metal/radiopaque materials significantly affected image quality and interpretability. There were 22-28 cases in which the observers stated that the image was not interpretable. Small movements (i.e. <3 mm) did not significantly affect image interpretability. For movements ≥ 3 mm, the risk that a case was scored as "not interpretable" was significantly (p ≤ 0.05) increased [OR 3.2-11.3; 95% CI (0.70-65.47)]. Metal/radiopaque material was also a significant (p ≤ 0.05) risk factor (OR 3.61-5.05). Patient movement ≥3 mm and metal/radiopaque material in the FOV significantly affected CBCT image quality and interpretability.
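
    A small sketch of the kind of multivariate logistic regression reported, with odds ratios obtained by exponentiating the coefficients; the predictor coding and the simulated data are hypothetical.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        n = 162
        df = pd.DataFrame({
            "move_ge_3mm": rng.binomial(1, 0.2, n),
            "metal_in_fov": rng.binomial(1, 0.4, n),
        })
        logit = -3 + 1.8 * df["move_ge_3mm"] + 1.4 * df["metal_in_fov"]
        df["not_interpretable"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

        fit = smf.logit("not_interpretable ~ move_ge_3mm + metal_in_fov", df).fit(disp=0)
        print(np.exp(fit.params))        # odds ratios
        print(np.exp(fit.conf_int()))    # 95% confidence intervals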

  3. Clustering, randomness and regularity in cloud fields. I - Theoretical considerations. II - Cumulus cloud fields

    NASA Technical Reports Server (NTRS)

    Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.

    1992-01-01

    The current controversy regarding regularity vs. clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that pseudorandom-number-generated models bias the observed nearest-neighbor statistics towards regularity. The interpretation of these nearest-neighbor statistics is discussed for many cases of superposed clustering, randomness, and regularity. A detailed analysis of cumulus cloud field spatial distributions is carried out based upon Landsat, AVHRR, and Skylab data, showing that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.
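
    A compact sketch of the nearest-neighbor cumulative-distribution diagnostic: compare the empirical distribution of nearest-neighbor distances in a point pattern with the expectation for a homogeneous Poisson process, 1 - exp(-lambda*pi*r^2). Edge effects and the simulated pattern are simplifications.

        import numpy as np
        from scipy.spatial import cKDTree

        rng = np.random.default_rng(0)
        n, side = 500, 1.0
        pts = rng.random((n, 2)) * side                     # stand-in "cloud centres"

        d, _ = cKDTree(pts).query(pts, k=2)                 # k=2: the first neighbour is the point itself
        r = np.sort(d[:, 1])
        empirical_cdf = np.arange(1, n + 1) / n
        lam = n / side**2
        poisson_cdf = 1.0 - np.exp(-lam * np.pi * r**2)

        # Positive differences suggest clustering, negative ones regularity.
        print("mean(empirical - Poisson):", np.mean(empirical_cdf - poisson_cdf).round(3))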

  4. Classification image analysis: estimation and statistical inference for two-alternative forced-choice experiments

    NASA Technical Reports Server (NTRS)

    Abbey, Craig K.; Eckstein, Miguel P.

    2002-01-01

    We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.
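
    A brief simulation sketch of classification-image estimation for a two-alternative forced-choice task with a linear-template observer: average the noise fields of chosen intervals minus those of unchosen intervals. The paper develops the estimator and the associated hypothesis tests far more formally; the stimulus, template, and internal noise below are assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        n_trials, n_pix = 5000, 64
        signal = np.zeros(n_pix); signal[28:36] = 0.5           # bump-like target profile
        template = signal.copy()                                # observer's internal linear filter

        noise1 = rng.normal(size=(n_trials, n_pix))             # interval 1 (signal present)
        noise2 = rng.normal(size=(n_trials, n_pix))             # interval 2 (noise only)
        resp1 = (noise1 + signal) @ template + rng.normal(0, 2, n_trials)
        resp2 = noise2 @ template + rng.normal(0, 2, n_trials)
        chose1 = resp1 > resp2

        chosen = np.where(chose1[:, None], noise1, noise2)
        unchosen = np.where(chose1[:, None], noise2, noise1)
        classification_image = chosen.mean(axis=0) - unchosen.mean(axis=0)
        print(classification_image[26:38].round(3))             # elevated where the template is non-zero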

  5. Developments in FT-ICR MS instrumentation, ionization techniques, and data interpretation methods for petroleomics.

    PubMed

    Cho, Yunju; Ahmed, Arif; Islam, Annana; Kim, Sunghwan

    2015-01-01

    Because of the increasing importance of heavy and unconventional crude oil as an energy source, there is a growing need for petroleomics: the pursuit of more complete and detailed knowledge of the chemical compositions of crude oil. Crude oil has an extremely complex nature; hence, techniques with ultra-high resolving capabilities, such as Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), are necessary. FT-ICR MS has been successfully applied to the study of heavy and unconventional crude oils such as bitumen and shale oil. However, the analysis of crude oil with FT-ICR MS is not trivial, and it has pushed analysis to the limits of instrumental and methodological capabilities. For example, high-resolution mass spectra of crude oils may contain over 100,000 peaks that require interpretation. To visualize large data sets more effectively, data processing methods such as Kendrick mass defect analysis and statistical analyses have been developed. The successful application of FT-ICR MS to the study of crude oil has been critically dependent on key developments in FT-ICR MS instrumentation and data processing methods. This review offers an introduction to the basic principles, FT-ICR MS instrumentation development, ionization techniques, and data interpretation methods for petroleomics and is intended for readers having no prior experience in this field of study. © 2014 Wiley Periodicals, Inc.
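
    A short sketch of the Kendrick mass defect (KMD) transformation commonly used to organize ultra-high-resolution petroleomic peak lists: members of a CH2 homologous series share (nearly) the same KMD. The peak masses are illustrative, and conventions for the nominal Kendrick mass (rounding vs. truncation) vary.

        import numpy as np

        def kendrick_mass_defect(mz):
            """KMD based on the CH2 repeat unit (14.01565 u rescaled to exactly 14)."""
            kendrick_mass = np.asarray(mz) * (14.00000 / 14.01565)
            return np.round(kendrick_mass) - kendrick_mass

        peaks = np.array([310.2897, 324.3053, 338.3210, 352.3366])   # a CH2 series
        print(kendrick_mass_defect(peaks).round(4))                  # nearly identical KMDs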

  6. Statistical results on restorative dentistry experiments: effect of the interaction between main variables

    PubMed Central

    CAVALCANTI, Andrea Nóbrega; MARCHI, Giselle Maria; AMBROSANO, Gláucia Maria Bovi

    2010-01-01

    The interpretation of statistical analyses is a critical part of scientific research. When more than one main variable is studied, the effect of the interaction between those variables is fundamental to the discussion of the experiments. However, doubts can arise when the p-value of the interaction is greater than the significance level. Objective: To determine the most adequate interpretation for factorial experiments with interaction p-values slightly higher than the significance level. Materials and methods: The p-values of the interactions found in two restorative dentistry experiments (0.053 and 0.068) were interpreted in two distinct ways: treating the interaction as not significant and as significant. Results: Different findings were observed between the two analyses, and the results of the studies became more coherent when the interaction was treated as significant. Conclusion: The p-value of the interaction between main variables must be analyzed with caution because it can change the outcomes of research studies. Researchers are strongly advised to interpret the results of their statistical analyses carefully in order to discuss the findings of their experiments properly. PMID:20857003

  7. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

    PubMed

    Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

    2015-05-27

    Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.

  8. Data to inform a social media component for professional development and practices: A design-based research study.

    PubMed

    Novakovich, Jeanette; Shaw, Steven; Miah, Sophia

    2017-02-01

    This DIB article includes the course artefacts, instruments, survey data, and descriptive statistics, along with in-depth correlational analysis for the first iteration of a design-based research study on designing curriculum for developing online professional identity and social media practices for a multi-major advanced professional writing course. Raw data was entered into SPSS software. For interpretation and discussion, please see the original article entitled, "Designing curriculum to shape professional social media skills and identity in virtual communities of practice" (J. Novakovich, S. Miah, S. Shaw, 2017) [1].

  9. A stochastic approach to uncertainty quantification in residual moveout analysis

    NASA Astrophysics Data System (ADS)

    Johng-Ay, T.; Landa, E.; Dossou-Gbété, S.; Bordes, L.

    2015-06-01

    Oil and gas exploration and production usually rely on the interpretation of a single seismic image obtained from observed data. However, the statistical nature of seismic data and the various approximations and assumptions are sources of uncertainty that may corrupt the evaluation of parameters. The quantification of these uncertainties is a major issue, intended to support decisions that have important social and commercial implications. Residual moveout analysis, an important step in seismic data processing, is usually performed by a deterministic approach. In this paper we discuss a Bayesian approach to the uncertainty analysis.

  10. A search for anisotropy in the cosmic microwave background on intermediate angular scales

    NASA Technical Reports Server (NTRS)

    Alsop, D. C.; Cheng, E. S.; Clapp, A. C.; Cottingham, D. A.; Fischer, M. L.; Gundersen, J. O.; Kreysa, E.; Lange, A. E.; Lubin, P. M.; Meinhold, P. R.

    1992-01-01

    The results of a search for anisotropy in the cosmic microwave background on angular scales near 1 deg are presented. Observations were simultaneously performed in bands centered at frequencies of 6, 9, and 12 per cm with a multifrequency bolometric receiver mounted on a balloon-borne telescope. The statistical sensitivity of the data is the highest reported to date at this angular scale, which is of critical importance for understanding the formation of structure in the universe. Signals in excess of random were observed in the data. The experiment, data analysis, and interpretation are described.

  11. Acquisition and extinction in autoshaping.

    PubMed

    Kakade, Sham; Dayan, Peter

    2002-07-01

    C. R. Gallistel and J. Gibbon (2000) presented quantitative data on the speed with which animals acquire behavioral responses during autoshaping, together with a statistical model of learning intended to account for them. Although this model captures the form of the dependencies among critical variables, its detailed predictions are substantially at variance with the data. In the present article, further key data on the speed of acquisition are used to motivate an alternative model of learning, in which animals can be interpreted as paying different amounts of attention to stimuli according to estimates of their differential reliabilities as predictors.

  12. Which statistics should tropical biologists learn?

    PubMed

    Loaiza Velásquez, Natalia; González Lutz, María Isabel; Monge-Nájera, Julián

    2011-09-01

    Tropical biologists study the richest and most endangered biodiversity on the planet, and in these times of climate change and mega-extinctions, the need for efficient, good-quality research is more pressing than in the past. However, the statistical component in research published by tropical authors sometimes suffers from poor quality in data collection, mediocre or bad experimental design, and a rigid and outdated view of data analysis. To suggest improvements in their statistical education, we listed all the statistical tests and other quantitative analyses used in two leading tropical journals, the Revista de Biología Tropical and Biotropica, during one year. The 12 most frequent tests in the articles were: Analysis of Variance (ANOVA), Chi-Square Test, Student's T Test, Linear Regression, Pearson's Correlation Coefficient, Mann-Whitney U Test, Kruskal-Wallis Test, Shannon's Diversity Index, Tukey's Test, Cluster Analysis, Spearman's Rank Correlation Test and Principal Component Analysis. We conclude that statistical education for tropical biologists must abandon the old syllabus based on the mathematical side of statistics and concentrate on the correct selection of these and other procedures and tests, on their biological interpretation, and on the use of reliable and friendly freeware. We think that their time will be better spent understanding and protecting tropical ecosystems than trying to learn the mathematical foundations of statistics: in most cases, a well-designed one-semester course should be enough for their basic requirements.
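
    Most of the tests listed above are one-liners in free software, which supports the authors' point that selection and interpretation, not computation, deserve the teaching time. The sketch below runs several of them in Python (scipy.stats) on made-up data.

      # Sketch: several of the listed tests as one-liners on made-up data,
      # to illustrate that selection and interpretation, not the mathematics,
      # is the hard part.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)
      a, b, c = (rng.normal(loc, 1.0, 30) for loc in (5.0, 5.5, 6.0))

      print(stats.f_oneway(a, b, c))            # one-way ANOVA
      print(stats.ttest_ind(a, b))              # Student's t test
      print(stats.mannwhitneyu(a, b))           # Mann-Whitney U
      print(stats.kruskal(a, b, c))             # Kruskal-Wallis
      print(stats.pearsonr(a, b))               # Pearson correlation
      print(stats.spearmanr(a, b))              # Spearman rank correlation
      print(stats.chisquare([18, 22, 20]))      # chi-square goodness of fit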

  13. Use of missing data methods in longitudinal studies: the persistence of bad practices in developmental psychology.

    PubMed

    Jelicić, Helena; Phelps, Erin; Lerner, Richard M

    2009-07-01

    Developmental science rests on describing, explaining, and optimizing intraindividual changes and, hence, empirically requires longitudinal research. Problems of missing data arise in most longitudinal studies, thus creating challenges for interpreting the substance and structure of intraindividual change. Using a sample of reports of longitudinal studies obtained from three flagship developmental journals-Child Development, Developmental Psychology, and Journal of Research on Adolescence-we examined the number of longitudinal studies reporting missing data and the missing data techniques used. Of the 100 longitudinal studies sampled, 57 either reported having missing data or had discrepancies in sample sizes reported for different analyses. The majority of these studies (82%) used missing data techniques that are statistically problematic (either listwise deletion or pairwise deletion) and not among the methods recommended by statisticians (i.e., the direct maximum likelihood method and the multiple imputation method). Implications of these results for developmental theory and application, and the need for understanding the consequences of using statistically inappropriate missing data techniques with actual longitudinal data sets, are discussed.
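
    The contrast criticized above can be made concrete with a short sketch: listwise deletion versus a model-based imputation of simulated longitudinal data with dropout. A single pass of scikit-learn's IterativeImputer stands in for full multiple imputation here, and the data are invented.

      # Sketch: listwise deletion versus (a single pass of) model-based imputation
      # on synthetic longitudinal-style data with values missing at wave 3.
      import numpy as np
      import pandas as pd
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer

      rng = np.random.default_rng(3)
      n = 300
      wave1 = rng.normal(50, 10, n)
      wave2 = wave1 + rng.normal(2, 5, n)
      wave3 = wave2 + rng.normal(2, 5, n)
      df = pd.DataFrame({"t1": wave1, "t2": wave2, "t3": wave3})
      df.loc[rng.random(n) < 0.3, "t3"] = np.nan          # 30% dropout at wave 3

      listwise = df.dropna()                               # what many studies do
      imputed = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(df),
                             columns=df.columns)           # closer to recommended practice

      print(len(df), len(listwise))                        # sample size lost to deletion
      print(df["t3"].mean(), listwise["t3"].mean(), imputed["t3"].mean())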

  14. On the relationship between income, fertility rates and the state of democracy in society

    NASA Astrophysics Data System (ADS)

    Hutzler, S.; Sommer, C.; Richmond, P.

    2016-06-01

    Empirical data for 145 countries shows a strong correlation between the gross national income per capita and the political form of their governance, as specified by the so-called democracy index. We interpret this relationship in analogy to phase transitions between different states of matter, using concepts of statistical physics. Fertility rates play the role of binding energy in solid state physics.

  15. The ratio of profile peak separations as a probe of pulsar radio-beam structure

    NASA Astrophysics Data System (ADS)

    Dyks, J.; Pierbattista, M.

    2015-12-01

    The known population of pulsars contains objects with four- and five-component profiles, for which the peak-to-peak separations between the inner and outer components can be measured. These Q- and M-type profiles can be interpreted as a result of sightline cut through a nested-cone beam, or through a set of azimuthal fan beams. We show that the ratio RW of the components' separations provides a useful measure of the beam shape, which is mostly independent of parameters that determine the beam scale and complicate interpretation of simpler profiles. In particular, the method does not depend on the emission altitude and the dipole tilt distribution. The different structures of the radio beam imply manifestly different statistical distributions of RW, with the conal model being several orders of magnitude less consistent with data than the fan-beam model. To bring the conal model into consistency with data, strong effects of observational selection need to be called for, with 80 per cent of Q and M profiles assumed to be undetected because of intrinsic blending effects. It is concluded that the statistical properties of Q and M profiles are more consistent with the fan-shaped beams, than with the traditional nested-cone geometry.
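
    Taking the ratio as the inner peak separation divided by the outer one (an assumption for illustration; the paper defines RW precisely), a toy calculation looks like this:

      # Sketch: the separation ratio for a four-component (Q-type) profile,
      # taken here as R_W = (inner peak separation) / (outer peak separation).
      # Peak phases below are illustrative numbers, not measured pulsar data.
      import numpy as np

      peak_phases = np.array([-12.0, -4.5, 4.0, 11.5])   # degrees of pulse phase
      outer = peak_phases[3] - peak_phases[0]
      inner = peak_phases[2] - peak_phases[1]
      R_W = inner / outer
      print(f"R_W = {R_W:.3f}")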

  16. Interpreting Meta-Analyses of Genome-Wide Association Studies

    PubMed Central

    Han, Buhm; Eskin, Eleazar

    2012-01-01

    Meta-analysis is an increasingly popular tool for combining multiple genome-wide association studies in a single analysis to identify associations with small effect sizes. The effect sizes between studies in a meta-analysis may differ and these differences, or heterogeneity, can be caused by many factors. If heterogeneity is observed in the results of a meta-analysis, interpreting the cause of heterogeneity is important because the correct interpretation can lead to a better understanding of the disease and a more effective design of a replication study. However, interpreting heterogeneous results is difficult. The standard approach of examining the association p-values of the studies does not effectively predict if the effect exists in each study. In this paper, we propose a framework facilitating the interpretation of the results of a meta-analysis. Our framework is based on a new statistic representing the posterior probability that the effect exists in each study, which is estimated utilizing cross-study information. Simulations and application to the real data show that our framework can effectively segregate the studies predicted to have an effect, the studies predicted to not have an effect, and the ambiguous studies that are underpowered. In addition to helping interpretation, the new framework also allows us to develop a new association testing procedure taking into account the existence of effect. PMID:22396665
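
    For orientation, the sketch below runs a standard inverse-variance fixed-effects meta-analysis with Cochran's Q heterogeneity test on invented effect sizes; it is not the posterior-probability statistic proposed in the paper, only the conventional analysis whose interpretation the paper aims to improve.

      # Sketch: inverse-variance fixed-effects meta-analysis and Cochran's Q
      # heterogeneity test (not the posterior-probability statistic proposed
      # in the paper). Effect sizes and standard errors are made up.
      import numpy as np
      from scipy import stats

      beta = np.array([0.20, 0.15, 0.02, 0.25, 0.18])   # per-study log odds ratios
      se   = np.array([0.05, 0.06, 0.05, 0.08, 0.07])

      w = 1.0 / se**2
      beta_fixed = np.sum(w * beta) / np.sum(w)
      Q = np.sum(w * (beta - beta_fixed) ** 2)
      p_het = stats.chi2.sf(Q, df=len(beta) - 1)

      print(f"pooled effect {beta_fixed:.3f}, Q = {Q:.2f}, heterogeneity p = {p_het:.3f}")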

  17. A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility.

    PubMed

    Moore, Jason H; Gilbert, Joshua C; Tsai, Chia-Ti; Chiang, Fu-Tien; Holden, Todd; Barney, Nate; White, Bill C

    2006-07-21

    Detecting, characterizing, and interpreting gene-gene interactions or epistasis in studies of human disease susceptibility is both a mathematical and a computational challenge. To address this problem, we have previously developed a multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension (i.e. constructive induction) thus permitting interactions to be detected in relatively small sample sizes. In this paper, we describe a comprehensive and flexible framework for detecting and interpreting gene-gene interactions that utilizes advances in information theory for selecting interesting single-nucleotide polymorphisms (SNPs), MDR for constructive induction, machine learning methods for classification, and finally graphical models for interpretation. We illustrate the usefulness of this strategy using artificial datasets simulated from several different two-locus and three-locus epistasis models. We show that the accuracy, sensitivity, specificity, and precision of a naïve Bayes classifier are significantly improved when SNPs are selected based on their information gain (i.e. class entropy removed) and reduced to a single attribute using MDR. We then apply this strategy to detecting, characterizing, and interpreting epistatic models in a genetic study (n = 500) of atrial fibrillation and show that both classification and model interpretation are significantly improved.
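
    The information-gain filtering step mentioned above can be sketched with mutual information between simulated genotypes and case-control status; this is not the MDR software itself, and the two-locus effect is invented.

      # Sketch: ranking SNPs by information gain (class entropy removed),
      # the filtering step mentioned in the abstract; genotypes are simulated.
      import numpy as np
      from sklearn.feature_selection import mutual_info_classif

      rng = np.random.default_rng(4)
      n, p = 500, 50
      snps = rng.integers(0, 3, size=(n, p))            # genotypes coded 0/1/2
      risk = (snps[:, 0] == 2) & (snps[:, 1] == 0)      # toy two-locus interaction
      y = (rng.random(n) < np.where(risk, 0.6, 0.3)).astype(int)

      gain = mutual_info_classif(snps, y, discrete_features=True, random_state=0)
      top = np.argsort(gain)[::-1][:5]
      print("top SNPs by information gain:", top, gain[top].round(3))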

  18. Ensemble Classifiers for Predicting HIV-1 Resistance from Three Rule-Based Genotypic Resistance Interpretation Systems.

    PubMed

    Raposo, Letícia M; Nobre, Flavio F

    2017-08-30

    Resistance to antiretrovirals (ARVs) is a major problem faced by HIV-infected individuals. Different rule-based algorithms were developed to infer HIV-1 susceptibility to antiretrovirals from genotypic data. However, there is discordance between them, resulting in difficulties for clinical decisions about which treatment to use. Here, we developed ensemble classifiers integrating three interpretation algorithms: Agence Nationale de Recherche sur le SIDA (ANRS), Rega, and the genotypic resistance interpretation system from the Stanford HIV Drug Resistance Database (HIVdb). Three approaches were applied to derive a single resistance profile: stacked generalization, a simple plurality vote scheme, and selection of the interpretation system with the best performance. The strategies were compared with Friedman's test, and the performance of the classifiers was evaluated using the F-measure, sensitivity and specificity values. We found that the three strategies had similar performances for the selected antiretrovirals. For some cases, the stacking technique with naïve Bayes as the learning algorithm showed a statistically superior F-measure. This study demonstrates that ensemble classifiers can be an alternative tool for clinical decision-making since they provide a single resistance profile from the most commonly used resistance interpretation systems.
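
    A minimal sketch of the simple plurality-vote scheme is shown below; the per-sample calls are made up, and the real ANRS, Rega and HIVdb systems are not queried.

      # Sketch: a simple plurality (majority) vote over three rule-based
      # resistance calls per sample; the example calls are made up, and the
      # real systems (ANRS, Rega, HIVdb) are not queried here.
      from collections import Counter

      calls = [
          {"ANRS": "resistant",   "Rega": "resistant",   "HIVdb": "susceptible"},
          {"ANRS": "susceptible", "Rega": "susceptible", "HIVdb": "susceptible"},
          {"ANRS": "resistant",   "Rega": "susceptible", "HIVdb": "resistant"},
      ]

      for sample in calls:
          vote = Counter(sample.values()).most_common(1)[0][0]
          print(sample, "->", vote)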

  19. The Statistics of wood assays for preservative retention

    Treesearch

    Patricia K. Lebow; Scott W. Conklin

    2011-01-01

    This paper covers general statistical concepts that apply to interpreting wood assay retention values. In particular, since wood assays are typically obtained from a single composited sample, the statistical aspects, including advantages and disadvantages, of simple compositing are covered.

  20. Continuum radiation from active galactic nuclei: A statistical study

    NASA Technical Reports Server (NTRS)

    Isobe, T.; Feigelson, E. D.; Singh, K. P.; Kembhavi, A.

    1986-01-01

    The physics of the continuum spectrum of active galactic nuclei (AGNs) was examined using a large data set and rigorous statistical methods. A data base was constructed for 469 objects which include radio selected quasars, optically selected quasars, X-ray selected AGNs, BL Lac objects, and optically unidentified compact radio sources. Each object has measurements of its radio, optical, and X-ray core continuum luminosities, though many of them are upper limits. Since many radio sources have extended components, the core component was carefully separated from the total radio luminosity. With survival analysis statistical methods, which can treat upper limits correctly, these data can yield better statistical results than those previously obtained. A variety of statistical tests are performed, such as the comparison of the luminosity functions in different subsamples, and linear regressions of luminosities in different bands. Interpretation of the results leads to the following tentative conclusions: the main emission mechanism of optically selected quasars and X-ray selected AGNs is thermal, while that of BL Lac objects is synchrotron; radio selected quasars may have two different emission mechanisms in the X-ray band; BL Lac objects appear to be special cases of the radio selected quasars; some compact radio sources show the possibility of synchrotron self-Compton (SSC) in the optical band; and the spectral index between the optical and the X-ray bands depends on the optical luminosity.

  1. Constraints on Fault Damage Zone Properties and Normal Modes from a Dense Linear Array Deployment along the San Jacinto Fault Zone

    NASA Astrophysics Data System (ADS)

    Allam, A. A.; Lin, F. C.; Share, P. E.; Ben-Zion, Y.; Vernon, F.; Schuster, G. T.; Karplus, M. S.

    2016-12-01

    We present earthquake data and statistical analyses from a month-long deployment of a linear array of 134 Fairfield three-component 5 Hz seismometers along the Clark strand of the San Jacinto fault zone in Southern California. With a total aperture of 2.4km and mean station spacing of 20m, the array locally spans the entire fault zone from the most intensely fractured core to relatively undamaged host rock on the outer edges. We recorded 36 days of continuous seismic data at 1000Hz sampling rate, capturing waveforms from 751 local events with Mw>0.5 and 43 teleseismic events with M>5.5, including two 600km deep M7.5 events along the Andean subduction zone. For any single local event on the San Jacinto fault, the central stations of the array recorded both higher amplitude and longer duration waveforms, which we interpret as the result of damage-related low-velocity structure acting as a broad waveguide. Using 271 San Jacinto events, we compute the distributions of three quantities for each station: maximum amplitude, mean amplitude, and total energy (the integral of the envelope). All three values become statistically lower with increasing distance from the fault, but in addition show a nonrandom zigzag pattern which we interpret as normal mode oscillations. This interpretation is supported by polarization analysis which demonstrates that the high-amplitude late-arriving energy is strongly vertically polarized in the central part of the array, consistent with Love-type trapped waves. These results, comprising nearly 30,000 separate coseismic waveforms, support the consistent interpretation of a 450m wide asymmetric damage zone, with the lowest velocities offset to the northeast of the mapped surface trace by 100m. This asymmetric damage zone has important implications for the earthquake dynamics of the San Jacinto and especially its ability to generate damaging multi-segment ruptures.
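
    The three per-station quantities named above can be sketched for a single synthetic waveform as follows, using a Hilbert-transform envelope; the trace and sampling rate are illustrative, not the array data.

      # Sketch: the three per-station quantities named in the abstract --
      # maximum amplitude, mean amplitude, and total energy (integral of the
      # envelope) -- computed for one synthetic waveform via a Hilbert envelope.
      import numpy as np
      from scipy.signal import hilbert

      fs = 1000.0                                  # Hz, matching the array sampling
      t = np.arange(0, 10.0, 1.0 / fs)
      wave = np.exp(-0.5 * (t - 3.0) ** 2) * np.sin(2 * np.pi * 8.0 * t)  # toy event

      envelope = np.abs(hilbert(wave))
      max_amp = np.max(np.abs(wave))
      mean_amp = np.mean(np.abs(wave))
      total_energy = np.sum(envelope) * (1.0 / fs)  # rectangle-rule integral of envelope

      print(max_amp, mean_amp, total_energy)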

  2. Statistical Models for the Analysis of Zero-Inflated Pain Intensity Numeric Rating Scale Data.

    PubMed

    Goulet, Joseph L; Buta, Eugenia; Bathulapalli, Harini; Gueorguieva, Ralitza; Brandt, Cynthia A

    2017-03-01

    Pain intensity is often measured in clinical and research settings using the 0 to 10 numeric rating scale (NRS). NRS scores are recorded as discrete values, and in some samples they may display a high proportion of zeroes and a right-skewed distribution. Despite this, statistical methods for normally distributed data are frequently used in the analysis of NRS data. We present results from an observational cross-sectional study examining the association of NRS scores with patient characteristics using data collected from a large cohort of 18,935 veterans in Department of Veterans Affairs care diagnosed with a potentially painful musculoskeletal disorder. The mean (variance) NRS pain was 3.0 (7.5), and 34% of patients reported no pain (NRS = 0). We compared the following statistical models for analyzing NRS scores: linear regression, generalized linear models (Poisson and negative binomial), zero-inflated and hurdle models for data with an excess of zeroes, and a cumulative logit model for ordinal data. We examined model fit, interpretability of results, and whether conclusions about the predictor effects changed across models. In this study, models that accommodate zero inflation provided a better fit than the other models. These models should be considered for the analysis of NRS data with a large proportion of zeroes. We examined and analyzed pain data from a large cohort of veterans with musculoskeletal disorders. We found that many reported no current pain on the NRS on the diagnosis date. We present several alternative statistical methods for the analysis of pain intensity data with a large proportion of zeroes. Published by Elsevier Inc.
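
    A hedged sketch of the comparison is given below: an ordinary Poisson model versus a zero-inflated Poisson fitted with statsmodels to simulated scores with roughly a third zeroes. The data only mimic the shape described, not the actual cohort, and the single covariate is invented.

      # Sketch: ordinary Poisson versus zero-inflated Poisson for 0-10 scores
      # with excess zeroes; the simulated data only mimic the shape described
      # (about a third zeroes, right-skewed), not the actual cohort.
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(5)
      n = 2000
      is_zero = rng.random(n) < 0.34
      y = np.clip(np.where(is_zero, 0, rng.poisson(4.0, n)), 0, 10)  # NRS capped at 10
      X = sm.add_constant(rng.normal(size=(n, 1)))                   # one illustrative covariate

      poisson_fit = sm.Poisson(y, X).fit(disp=0)
      zip_fit = sm.ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1))).fit(disp=0, maxiter=200)

      print("Poisson AIC:", round(poisson_fit.aic, 1))
      print("ZIP AIC:    ", round(zip_fit.aic, 1))   # lower AIC -> better fit here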

  3. Surveys Assessing Students' Attitudes toward Statistics: A Systematic Review of Validity and Reliability

    ERIC Educational Resources Information Center

    Nolan, Meaghan M.; Beran, Tanya; Hecker, Kent G.

    2012-01-01

    Students with positive attitudes toward statistics are likely to show strong academic performance in statistics courses. Multiple surveys measuring students' attitudes toward statistics exist; however, a comparison of the validity and reliability of interpretations based on their scores is needed. A systematic review of relevant electronic…

  4. Interpreting the Weibull fitting parameters for diffusion-controlled release data

    NASA Astrophysics Data System (ADS)

    Ignacio, Maxime; Chubynsky, Mykyta V.; Slater, Gary W.

    2017-11-01

    We examine the diffusion-controlled release of molecules from passive delivery systems using both analytical solutions of the diffusion equation and numerically exact Lattice Monte Carlo data. For very short times, the release process follows a √t power law, typical of diffusion processes, while the long-time asymptotic behavior is exponential. The crossover time between these two regimes is determined by the boundary conditions and initial loading of the system. We show that while the widely used Weibull function provides a reasonable fit (in terms of statistical error), it has two major drawbacks: (i) it does not capture the correct limits and (ii) there is no direct connection between the fitting parameters and the properties of the system. Using a physically motivated interpolating fitting function that correctly includes both time regimes, we are able to predict the values of the Weibull parameters which allows us to propose a physical interpretation.
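
    For reference, fitting the Weibull release profile F(t) = 1 - exp(-(t/τ)^b) to a synthetic release curve takes only a few lines with scipy; the physically motivated interpolating function proposed by the authors is not reproduced here.

      # Sketch: fitting the Weibull release profile F(t) = 1 - exp(-(t/tau)**b)
      # to a synthetic diffusion-like release curve; the authors' physically
      # motivated interpolating function is not reproduced here.
      import numpy as np
      from scipy.optimize import curve_fit

      def weibull(t, tau, b):
          return 1.0 - np.exp(-(t / tau) ** b)

      t = np.linspace(0.01, 10.0, 200)
      release = 1.0 - np.exp(-t / 2.0) + 0.01 * np.random.default_rng(6).normal(size=t.size)

      (tau, b), _ = curve_fit(weibull, t, release, p0=[1.0, 1.0])
      print(f"tau = {tau:.2f}, b = {b:.2f}")   # b near 1 -> nearly exponential regime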

  5. Pesticides, Neurodevelopmental Disagreement, and Bradford Hill's Guidelines.

    PubMed

    Shrader-Frechette, Kristin; ChoGlueck, Christopher

    2016-06-27

    Neurodevelopmental disorders such as autism affect one-eighth of all U.S. newborns. Yet scientists, accessing the same data and using Bradford-Hill guidelines, draw different conclusions about the causes of these disorders. They disagree about the pesticide-harm hypothesis, that typical United States prenatal pesticide exposure can cause neurodevelopmental damage. This article aims to discover whether apparent scientific disagreement about this hypothesis might be partly attributable to questionable interpretations of the Bradford-Hill causal guidelines. Key scientists, who claim to employ Bradford-Hill causal guidelines, yet fail to accept the pesticide-harm hypothesis, fall into errors of trimming the guidelines, requiring statistically-significant data, and ignoring semi-experimental evidence. However, the main scientists who accept the hypothesis appear to commit none of these errors. Although settling disagreement over the pesticide-harm hypothesis requires extensive analysis, this article suggests that at least some conflicts may arise because of questionable interpretations of the guidelines.

  6. Effects of using the developing nurses' thinking model on nursing students' diagnostic accuracy.

    PubMed

    Tesoro, Mary Gay

    2012-08-01

    This quasi-experimental study tested the effectiveness of an educational model, Developing Nurses' Thinking (DNT), on nursing students' clinical reasoning to achieve patient safety. Teaching nursing students to develop effective thinking habits that promote positive patient outcomes and patient safety is a challenging endeavor. Positive patient outcomes and safety are achieved when nurses accurately interpret data and subsequently implement appropriate plans of care. This study's pretest-posttest design determined whether use of the DNT model during 2 weeks of clinical postconferences improved nursing students' (N = 83) diagnostic accuracy. The DNT model helps students to integrate four constructs-patient safety, domain knowledge, critical thinking processes, and repeated practice-to guide their thinking when interpreting patient data and developing effective plans of care. The posttest scores of students from the intervention group showed statistically significant improvement in accuracy. Copyright 2012, SLACK Incorporated.

  7. A Genealogical Interpretation of Principal Components Analysis

    PubMed Central

    McVean, Gil

    2009-01-01

    Principal components analysis, PCA, is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's fst and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference. PMID:19834557
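
    The projection discussed here can be sketched on simulated genotypes from two diverged populations; the code below is a generic PCA of a centred genotype matrix, not the paper's coalescent calculation.

      # Sketch: projecting a simulated genotype matrix (two populations) onto
      # its first two principal components -- the projection the paper relates
      # to average pairwise coalescent times.
      import numpy as np
      from sklearn.decomposition import PCA

      rng = np.random.default_rng(7)
      n_per_pop, n_snps = 50, 500
      f1 = rng.uniform(0.1, 0.9, n_snps)                          # allele frequencies, pop 1
      f2 = np.clip(f1 + rng.normal(0, 0.1, n_snps), 0.05, 0.95)   # drifted pop 2
      G1 = rng.binomial(2, f1, size=(n_per_pop, n_snps))
      G2 = rng.binomial(2, f2, size=(n_per_pop, n_snps))
      G = np.vstack([G1, G2]).astype(float)
      G -= G.mean(axis=0)                                         # centre each SNP

      pcs = PCA(n_components=2).fit_transform(G)
      print(pcs[:3], pcs[-3:])                                    # two clusters along PC1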

  8. Search for neutral MSSM Higgs bosons at LEP

    NASA Astrophysics Data System (ADS)

    Schael, S.; Barate, R.; Brunelière, R.; de Bonis, I.; Decamp, D.; Goy, C.; Jézéquel, S.; Lees, J.-P.; Martin, F.; Merle, E.; Minard, M.-N.; Pietrzyk, B.; Trocmé, B.; Bravo, S.; Casado, M. P.; Chmeissani, M.; Crespo, J. M.; Fernandez, E.; Fernandez-Bosman, M.; Garrido, L.; Martinez, M.; Pacheco, A.; Ruiz, H.; Colaleo, A.; Creanza, D.; de Filippis, N.; de Palma, M.; Iaselli, G.; Maggi, G.; Maggi, M.; Nuzzo, S.; Ranieri, A.; Raso, G.; Ruggieri, F.; Selvaggi, G.; Silvestris, L.; Tempesta, P.; Tricomi, A.; Zito, G.; Huang, X.; Lin, J.; Ouyang, Q.; Wang, T.; Xie, Y.; Xu, R.; Xue, S.; Zhang, J.; Zhang, L.; Zhao, W.; Abbaneo, D.; Barklow, T.; Buchmüller, O.; Cattaneo, M.; Clerbaux, B.; Drevermann, H.; Forty, R. W.; Frank, M.; Gianotti, F.; Hansen, J. B.; Harvey, J.; Hutchcroft, D. E.; Janot, P.; Jost, B.; Kado, M.; Mato, P.; Moutoussi, A.; Ranjard, F.; Rolandi, L.; Schlatter, D.; Teubert, F.; Valassi, A.; Videau, I.; Badaud, F.; Dessagne, S.; Falvard, A.; Fayolle, D.; Gay, P.; Jousset, J.; Michel, B.; Monteil, S.; Pallin, D.; Pascolo, J. M.; Perret, P.; Hansen, J. D.; Hansen, J. R.; Hansen, P. H.; Kraan, A. C.; Nilsson, B. S.; Kyriakis, A.; Markou, C.; Simopoulou, E.; Vayaki, A.; Zachariadou, K.; Blondel, A.; Brient, J.-C.; Machefert, F.; Rougé, A.; Videau, H.; Ciulli, V.; Focardi, E.; Parrini, G.; Antonelli, A.; Antonelli, M.; Bencivenni, G.; Bossi, F.; Capon, G.; Cerutti, F.; Chiarella, V.; Mannocchi, G.; Laurelli, P.; Mannocchi, G.; Murtas, G. P.; Passalacqua, L.; Kennedy, J.; Lynch, J. G.; Negus, P.; O'Shea, V.; Thompson, A. S.; Wasserbaech, S.; Cavanaugh, R.; Dhamotharan, S.; Geweniger, C.; Hanke, P.; Hepp, V.; Kluge, E. E.; Putzer, A.; Stenzel, H.; Tittel, K.; Wunsch, M.; Beuselinck, R.; Cameron, W.; Davies, G.; Dornan, P. J.; Girone, M.; Marinelli, N.; Nowell, J.; Rutherford, S. A.; Sedgbeer, J. K.; Thompson, J. C.; White, R.; Ghete, V. M.; Girtler, P.; Kneringer, E.; Kuhn, D.; Rudolph, G.; Bouhova-Thacker, E.; Bowdery, C. K.; Clarke, D. P.; Ellis, G.; Finch, A. J.; Foster, F.; Hughes, G.; Jones, R. W. L.; Pearson, M. R.; Robertson, N. A.; Smizanska, M.; van der Aa, O.; Delaere, C.; Leibenguth, G.; Lemaitre, V.; Blumenschein, U.; Hölldorfer, F.; Jakobs, K.; Kayser, F.; Müller, A.-S.; Renk, B.; Sander, H.-G.; Schmeling, S.; Wachsmuth, H.; Zeitnitz, C.; Ziegler, T.; Bonissent, A.; Coyle, P.; Curtil, C.; Ealet, A.; Fouchez, D.; Payre, P.; Tilquin, A.; Ragusa, F.; David, A.; Dietl, H.; Ganis, G.; Hüttmann, K.; Lütjens, G.; Männer, W.; Moser, H.-G.; Settles, R.; Villegas, M.; Wolf, G.; Boucrot, J.; Callot, O.; Davier, M.; Duflot, L.; Grivaz, J.-F.; Heusse, P.; Jacholkowska, A.; Serin, L.; Veillet, J.-J.; Azzurri, P.; Bagliesi, G.; Boccali, T.; Foà, L.; Giammanco, A.; Giassi, A.; Ligabue, F.; Messineo, A.; Palla, F.; Sanguinetti, G.; Sciabà, A.; Sguazzoni, G.; Spagnolo, P.; Tenchini, R.; Venturi, A.; Verdini, P. G.; Awunor, O.; Blair, G. A.; Cowan, G.; Garcia-Bellido, A.; Green, M. G.; Medcalf, T.; Misiejuk, A.; Strong, J. A.; Teixeira-Dias, P.; Clifft, R. W.; Edgecock, T. R.; Norton, P. R.; Tomalin, I. R.; Ward, J. J.; Bloch-Devaux, B.; Boumediene, D.; Colas, P.; Fabbro, B.; Lançon, E.; Lemaire, M.-C.; Locci, E.; Perez, P.; Rander, J.; Tuchming, B.; Vallage, B.; Litke, A. M.; Taylor, G.; Booth, C. N.; Cartwright, S.; Combley, F.; Hodgson, P. N.; Lehto, M.; Thompson, L. F.; Böhrer, A.; Brandt, S.; Grupen, C.; Hess, J.; Ngac, A.; Prange, G.; Borean, C.; Giannini, G.; He, H.; Putz, J.; Rothberg, J.; Armstrong, S. R.; Berkelman, K.; Cranmer, K.; Ferguson, D. P. 
S.; Gao, Y.; González, S.; Hayes, O. J.; Hu, H.; Jin, S.; Kile, J.; McNamara, P. A., III; Nielsen, J.; Pan, Y. B.; von Wimmersperg-Toeller, J. H.; Wiedenmann, W.; Wu, J.; Wu, S. L.; Wu, X.; Zobernig, G.; Dissertori, G.; Abdallah, J.; Abreu, P.; Adam, W.; Adzic, P.; Albrecht, T.; Alderweireld, T.; Alemany-Fernandez, R.; Allmendinger, T.; Allport, P. P.; Amaldi, U.; Amapane, N.; Amato, S.; Anashkin, E.; Andreazza, A.; Andringa, S.; Anjos, N.; Antilogus, P.; Apel, W.-D.; Arnoud, Y.; Ask, S.; Asman, B.; Augustin, J. E.; Augustinus, A.; Baillon, P.; Ballestrero, A.; Bambade, P.; Barbier, R.; Bardin, D.; Barker, G. J.; Baroncelli, A.; Battaglia, M.; Baubillier, M.; Becks, K.-H.; Begalli, M.; Behrmann, A.; Ben-Haim, E.; Benekos, N.; Benvenuti, A.; Berat, C.; Berggren, M.; Berntzon, L.; Bertrand, D.; Besancon, M.; Besson, N.; Bloch, D.; Blom, M.; Bluj, M.; Bonesini, M.; Boonekamp, M.; Booth, P. S. L.; Borisov, G.; Botner, O.; Bouquet, B.; Bowcock, T. J. V.; Boyko, I.; Bracko, M.; Brenner, R.; Brodet, E.; Bruckman, P.; Brunet, J. M.; Buschbeck, B.; Buschmann, P.; Calvi, M.; Camporesi, T.; Canale, V.; Carena, F.; Castro, N.; Cavallo, F.; Chapkin, M.; Charpentier, P.; Checchia, P.; Chierici, R.; Chliapnikov, P.; Chudoba, J.; Chung, S. U.; Cieslik, K.; Collins, P.; Contri, R.; Cosme, G.; Cossutti, F.; Costa, M. J.; Crennell, D.; Cuevas, J.; D'Hondt, J.; Dalmau, J.; da Silva, T.; da Silva, W.; Della Ricca, G.; de Angelis, A.; de Boer, W.; de Clercq, C.; de Lotto, B.; de Maria, N.; de Min, A.; de Paula, L.; di Ciaccio, L.; di Simone, A.; Doroba, K.; Drees, J.; Eigen, G.; Ekelof, T.; Ellert, M.; Elsing, M.; Espirito Santo, M. C.; Fanourakis, G.; Fassouliotis, D.; Feindt, M.; Fernandez, J.; Ferrer, A.; Ferro, F.; Flagmeyer, U.; Foeth, H.; Fokitis, E.; Fulda-Quenzer, F.; Fuster, J.; Gandelman, M.; Garcia, C.; Gavillet, P.; Gazis, E.; Gokieli, R.; Golob, B.; Gomez-Ceballos, G.; Goncalves, P.; Graziani, E.; Grosdidier, G.; Grzelak, K.; Guy, J.; Haag, C.; Hallgren, A.; Hamacher, K.; Hamilton, K.; Haug, S.; Hauler, F.; Hedberg, V.; Hennecke, M.; Herr, H.; Hoffman, J.; Holmgren, S.-O.; Holt, P. J.; Houlden, M. A.; Hultqvist, K.; Jackson, J. N.; Jarlskog, G.; Jarry, P.; Jeans, D.; Johansson, E. K.; Johansson, P. D.; Jonsson, P.; Joram, C.; Jungermann, L.; Kapusta, F.; Katsanevas, S.; Katsoufis, E.; Kernel, G.; Kersevan, B. P.; Kerzel, U.; King, B. T.; Kjaer, N. J.; Kluit, P.; Kokkinias, P.; Kourkoumelis, C.; Kouznetsov, O.; Krumstein, Z.; Kucharczyk, M.; Lamsa, J.; Leder, G.; Ledroit, F.; Leinonen, L.; Leitner, R.; Lemonne, J.; Lepeltier, V.; Lesiak, T.; Liebig, W.; Liko, D.; Lipniacka, A.; Lopes, J. H.; Lopez, J. M.; Loukas, D.; Lutz, P.; Lyons, L.; MacNaughton, J.; Malek, A.; Maltezos, S.; Mandl, F.; Marco, J.; Marco, R.; Marechal, B.; Margoni, M.; Marin, J.-C.; Mariotti, C.; Markou, A.; Martinez-Rivero, C.; Masik, J.; Mastroyiannopoulos, N.; Matorras, F.; Matteuzzi, C.; Mazzucato, F.; Mazzucato, M.; Mc Nulty, R.; Meroni, C.; Migliore, E.; Mitaroff, W.; Mjoernmark, U.; Moa, T.; Moch, M.; Moenig, K.; Monge, R.; Montenegro, J.; Moraes, D.; Moreno, S.; Morettini, P.; Mueller, U.; Muenich, K.; Mulders, M.; Mundim, L.; Murray, W.; Muryn, B.; Myatt, G.; Myklebust, T.; Nassiakou, M.; Navarria, F.; Nawrocki, K.; Nicolaidou, R.; Nikolenko, M.; Oblakowska-Mucha, A.; Obraztsov, V.; Olshevski, A.; Onofre, A.; Orava, R.; Osterberg, K.; Ouraou, A.; Oyanguren, A.; Paganoni, M.; Paiano, S.; Palacios, J. P.; Palka, H.; Papadopoulou, T. 
D.; Pape, L.; Parkes, C.; Parodi, F.; Parzefall, U.; Passeri, A.; Passon, O.; Peralta, L.; Perepelitsa, V.; Perrotta, A.; Petrolini, A.; Piedra, J.; Pieri, L.; Pierre, F.; Pimenta, M.; Piotto, E.; Podobnik, T.; Poireau, V.; Pol, M. E.; Polok, G.; Pozdniakov, V.; Pukhaeva, N.; Pullia, A.; Rames, J.; Read, A.; Rebecchi, P.; Rehn, J.; Reid, D.; Reinhardt, R.; Renton, P.; Richard, F.; Ridky, J.; Rivero, M.; Rodriguez, D.; Romero, A.; Ronchese, P.; Roudeau, P.; Rovelli, T.; Ruhlmann-Kleider, V.; Ryabtchikov, D.; Sadovsky, A.; Salmi, L.; Salt, J.; Sander, C.; Savoy-Navarro, A.; Schwickerath, U.; Segar, A.; Sekulin, R.; Siebel, M.; Sisakian, A.; Smadja, G.; Smirnova, O.; Sokolov, A.; Sopczak, A.; Sosnowski, R.; Spassov, T.; Stanitzki, M.; Stocchi, A.; Strauss, J.; Stugu, B.; Szczekowski, M.; Szeptycka, M.; Szumlak, T.; Tabarelli, T.; Taffard, A. C.; Tegenfeldt, F.; Timmermans, J.; Tkatchev, L.; Tobin, M.; Todorovova, S.; Tome, B.; Tonazzo, A.; Tortosa, P.; Travnicek, P.; Treille, D.; Tristram, G.; Trochimczuk, M.; Troncon, C.; Turluer, M.-L.; Tyapkin, I. A.; Tyapkin, P.; Tzamarias, S.; Uvarov, V.; Valenti, G.; van Dam, P.; van Eldik, J.; van Remortel, N.; van Vulpen, I.; Vegni, G.; Veloso, F.; Venus, W.; Verdier, P.; Verzi, V.; Vilanova, D.; Vitale, L.; Vrba, V.; Wahlen, H.; Washbrook, A. J.; Weiser, C.; Wicke, D.; Wickens, J.; Wilkinson, G.; Winter, M.; Witek, M.; Yushchenko, O.; Zalewska, A.; Zalewski, P.; Zavrtanik, D.; Zhuravlov, V.; Zimin, N. I.; Zintchenko, A.; Zupan, M.; Achard, P.; Adriani, O.; Aguilar-Benitez, M.; Alcaraz, J.; Alemanni, G.; Allaby, J.; Aloisio, A.; Alviggi, M. G.; Anderhub, H.; Andreev, V. P.; Anselmo, F.; Arefiev, A.; Azemoon, T.; Aziz, T.; Bagnaia, P.; Bajo, A.; Baksay, G.; Baksay, L.; Baldew, S. V.; Banerjee, S.; Banerjee, Sw.; Barczyk, A.; Barillère, R.; Bartalini, P.; Basile, M.; Batalova, N.; Battiston, R.; Bay, A.; Becattini, F.; Becker, U.; Behner, F.; Bellucci, L.; Berbeco, R.; Berdugo, J.; Berges, P.; Bertucci, B.; Betev, B. L.; Biasini, M.; Biglietti, M.; Biland, A.; Blaising, J. J.; Blyth, S. C.; Bobbink, G. J.; Böhm, A.; Boldizsar, L.; Borgia, B.; Bottai, S.; Bourilkov, D.; Bourquin, M.; Braccini, S.; Branson, J. G.; Brochu, F.; Burger, J. D.; Burger, W. J.; Cai, X. D.; Capell, M.; Cara Romeo, G.; Carlino, G.; Cartacci, A.; Casaus, J.; Cavallari, F.; Cavallo, N.; Cecchi, C.; Cerrada, M.; Chamizo, M.; Chang, Y. H.; Chemarin, M.; Chen, A.; Chen, G.; Chen, G. M.; Chen, H. F.; Chen, H. S.; Chiefari, G.; Cifarelli, L.; Cindolo, F.; Clare, I.; Clare, R.; Coignet, G.; Colino, N.; Costantini, S.; de La Cruz, B.; Cucciarelli, S.; de Asmundis, R.; Déglon, P.; Debreczeni, J.; Degré, A.; Dehmelt, K.; Deiters, K.; Della Volpe, D.; Delmeire, E.; Denes, P.; Denotaristefani, F.; de Salvo, A.; Diemoz, M.; Dierckxsens, M.; Dionisi, C.; Dittmar, M.; Doria, A.; Dova, M. T.; Duchesneau, D.; Duda, M.; Echenard, B.; Eline, A.; El Hage, A.; El Mamouni, H.; Engler, A.; Eppling, F. J.; Extermann, P.; Falagan, M. A.; Falciano, S.; Favara, A.; Fay, J.; Fedin, O.; Felcini, M.; Ferguson, T.; Fesefeldt, H.; Fiandrini, E.; Field, J. H.; Filthaut, F.; Fisher, P. H.; Fisher, W.; Forconi, G.; Freudenreich, K.; Furetta, C.; Galaktionov, Yu.; Ganguli, S. N.; Garcia-Abia, P.; Gataullin, M.; Gentile, S.; Giagu, S.; Gong, Z. F.; Grenier, G.; Grimm, O.; Gruenewald, M. W.; Guida, M.; Gupta, V. K.; Gurtu, A.; Gutay, L. J.; Haas, D.; Hatzifotiadou, D.; Hebbeker, T.; Hervé, A.; Hirschfelder, J.; Hofer, H.; Hohlmann, M.; Holzner, G.; Hou, S. R.; Hu, J.; Jin, B. N.; Jindal, P.; Jones, L. 
W.; de Jong, P.; Josa-Mutuberría, I.; Kaur, M.; Kienzle-Focacci, M. N.; Kim, J. K.; Kirkby, J.; Kittel, W.; Klimentov, A.; König, A. C.; Kopal, M.; Koutsenko, V.; Kräber, M.; Kraemer, R. W.; Krüger, A.; Kunin, A.; Ladron de Guevara, P.; Laktineh, I.; Landi, G.; Lebeau, M.; Lebedev, A.; Lebrun, P.; Lecomte, P.; Lecoq, P.; Le Coultre, P.; Le Goff, J. M.; Leiste, R.; Levtchenko, M.; Levtchenko, P.; Li, C.; Likhoded, S.; Lin, C. H.; Lin, W. T.; Linde, F. L.; Lista, L.; Liu, Z. A.; Lohmann, W.; Longo, E.; Lu, Y. S.; Luci, C.; Luminari, L.; Lustermann, W.; Ma, W. G.; Malgeri, L.; Malinin, A.; Ma Na, C.; Mans, J.; Martin, J. P.; Marzano, F.; Mazumdar, K.; McNeil, R. R.; Mele, S.; Merola, L.; Meschini, M.; Metzger, W. J.; Mihul, A.; Milcent, H.; Mirabelli, G.; Mnich, J.; Mohanty, G. B.; Muanza, G. S.; Muijs, A. J. M.; Musicar, B.; Musy, M.; Nagy, S.; Natale, S.; Napolitano, M.; Nessi-Tedaldi, F.; Newman, H.; Nisati, A.; Novak, T.; Nowak, H.; Ofierzynski, R.; Organtini, G.; Pal, I.; Palomares, C.; Paolucci, P.; Paramatti, R.; Passaleva, G.; Patricelli, S.; Paul, T.; Pauluzzi, M.; Paus, C.; Pauss, F.; Pedace, M.; Pensotti, S.; Perret-Gallix, D.; Piccolo, D.; Pierella, F.; Pieri, M.; Pioppi, M.; Piroué, P. A.; Pistolesi, E.; Plyaskin, V.; Pohl, M.; Pojidaev, V.; Pothier, J.; Prokofiev, D.; Rahal-Callot, G.; Rahaman, M. A.; Raics, P.; Raja, N.; Ramelli, R.; Rancoita, P. G.; Ranieri, R.; Raspereza, A.; Razis, P.; Rembeczki, S.; Ren, D.; Rescigno, M.; Reucroft, S.; Riemann, S.; Riles, K.; Roe, B. P.; Romero, L.; Rosca, A.; Rosemann, C.; Rosenbleck, C.; Rosier-Lees, S.; Roth, S.; Rubio, J. A.; Ruggiero, G.; Rykaczewski, H.; Sakharov, A.; Saremi, S.; Sarkar, S.; Salicio, J.; Sanchez, E.; Schäfer, C.; Schegelsky, V.; Schopper, H.; Schotanus, D. J.; Sciacca, C.; Servoli, L.; Shevchenko, S.; Shivarov, N.; Shoutko, V.; Shumilov, E.; Shvorob, A.; Son, D.; Souga, C.; Spillantini, P.; Steuer, M.; Stickland, D. P.; Stoyanov, B.; Straessner, A.; Sudhakar, K.; Sultanov, G.; Sun, L. Z.; Sushkov, S.; Suter, H.; Swain, J. D.; Szillasi, Z.; Tang, X. W.; Tarjan, P.; Tauscher, L.; Taylor, L.; Tellili, B.; Teyssier, D.; Timmermans, C.; Ting, S. C. C.; Ting, S. M.; Tonwar, S. C.; Tóth, J.; Tully, C.; Tung, K. L.; Ulbricht, J.; Valente, E.; van de Walle, R. T.; Vasquez, R.; Vesztergombi, G.; Vetlitsky, I.; Viertel, G.; Vivargent, M.; Vlachos, S.; Vodopianov, I.; Vogel, H.; Vogt, H.; Vorobiev, I.; Vorobyov, A. A.; Wadhwa, M.; Wang, Q.; Wang, X. L.; Wang, Z. M.; Weber, M.; Wynhoff, S.; Xia, L.; Xu, Z. Z.; Yamamoto, J.; Yang, B. Z.; Yang, C. G.; Yang, H. J.; Yang, M.; Yeh, S. C.; Zalite, An.; Zalite, Yu.; Zhang, Z. P.; Zhao, J.; Zhu, G. Y.; Zhu, R. Y.; Zhuang, H. L.; Zichichi, A.; Zimmermann, B.; Zöller, M.; Abbiendi, G.; Ainsley, C.; Åkesson, P. F.; Alexander, G.; Allison, J.; Amaral, P.; Anagnostou, G.; Anderson, K. J.; Asai, S.; Axen, D.; Azuelos, G.; Bailey, I.; Barberio, E.; Barillari, T.; Barlow, R. J.; Batley, R. J.; Bechtle, P.; Behnke, T.; Bell, K. W.; Bell, P. J.; Bella, G.; Bellerive, A.; Benelli, G.; Bethke, S.; Biebel, O.; Boeriu, O.; Bock, P.; Boutemeur, M.; Braibant, S.; Brigliadori, L.; Brown, R. M.; Buesser, K.; Burckhart, H. J.; Campana, S.; Carnegie, R. K.; Carter, A. A.; Carter, J. R.; Chang, C. Y.; Charlton, D. G.; Ciocca, C.; Csilling, A.; Cuffiani, M.; Dado, S.; de Jong, S.; de Roeck, A.; de Wolf, E. A.; Desch, K.; Dienes, B.; Donkers, M.; Dubbert, J.; Duchovni, E.; Duckeck, G.; Duerdoth, I. 
P.; Etzion, E.; Fabbri, F.; Feld, L.; Ferrari, P.; Fiedler, F.; Fleck, I.; Ford, M.; Frey, A.; Gagnon, P.; Gary, J. W.; Gascon-Shotkin, S. M.; Gaycken, G.; Geich-Gimbel, C.; Giacomelli, G.; Giacomelli, P.; Giunta, M.; Goldberg, J.; Gross, E.; Grunhaus, J.; Gruwé, M.; Günther, P. O.; Gupta, A.; Hajdu, C.; Hamann, M.; Hanson, G. G.; Harel, A.; Hauschild, M.; Hawkes, C. M.; Hawkings, R.; Hemingway, R. J.; Herten, G.; Heuer, R. D.; Hill, J. C.; Hoffman, K.; Horváth, D.; Igo-Kemenes, P.; Ishii, K.; Jeremie, H.; Jost, U.; Jovanovic, P.; Junk, T. R.; Kanaya, N.; Kanzaki, J.; Karlen, D.; Kawagoe, K.; Kawamoto, T.; Keeler, R. K.; Kellogg, R. G.; Kennedy, B. W.; Kluth, S.; Kobayashi, T.; Kobel, M.; Komamiya, S.; Krämer, T.; Krieger, P.; von Krogh, J.; Kruger, K.; Kuhl, T.; Kupper, M.; Lafferty, G. D.; Landsman, H.; Lanske, D.; Layter, J. G.; Lellouch, D.; Letts, J.; Levinson, L.; Lillich, J.; Lloyd, S. L.; Loebinger, F. K.; Lu, J.; Ludwig, A.; Ludwig, J.; Mader, W.; Marcellini, S.; Martin, A. J.; Masetti, G.; Mashimo, T.; Mättig, P.; McKenna, J.; McPherson, R. A.; Meijers, F.; Menges, W.; Merritt, F. S.; Mes, H.; Meyer, N.; Michelini, A.; Mihara, S.; Mikenberg, G.; Miller, D. J.; Moed, S.; Mohr, W.; Mori, T.; Mutter, A.; Nagai, K.; Nakamura, I.; Nanjo, H.; Neal, H. A.; Nisius, R.; O'Neale, S. W.; Oh, A.; Oreglia, M. J.; Orito, S.; Pahl, C.; Pásztor, G.; Pater, J. R.; Pilcher, J. E.; Pinfold, J.; Plane, D. E.; Poli, B.; Pooth, O.; Przybycień, M.; Quadt, A.; Rabbertz, K.; Rembser, C.; Renkel, P.; Roney, J. M.; Rozen, Y.; Runge, K.; Sachs, K.; Saeki, T.; Sarkisyan, E. K. G.; Schaile, A. D.; Schaile, O.; Scharff-Hansen, P.; Schieck, J.; Schörner-Sadenius, T.; Schröder, M.; Schumacher, M.; Scott, W. G.; Seuster, R.; Shears, T. G.; Shen, B. C.; Sherwood, P.; Skuja, A.; Smith, A. M.; Sobie, R.; Söldner-Rembold, S.; Spano, F.; Stahl, A.; Strom, D.; Ströhmer, R.; Tarem, S.; Tasevsky, M.; Teuscher, R.; Thomson, M. A.; Torrence, E.; Toya, D.; Tran, P.; Trigger, I.; Trócsányi, Z.; Tsur, E.; Turner-Watson, M. F.; Ueda, I.; Ujvári, B.; Vollmer, C. F.; Vannerem, P.; Vértesi, R.; Verzocchi, M.; Voss, H.; Vossebeld, J.; Ward, C. P.; Ward, D. R.; Watkins, P. M.; Watson, A. T.; Watson, N. K.; Wells, P. S.; Wengler, T.; Wermes, N.; Wilson, G. W.; Wilson, J. A.; Wolf, G.; Wyatt, T. R.; Yamashita, S.; Zer-Zion, D.; Zivkovic, L.; Heinemeyer, S.; Pilaftsis, A.; Weiglein, G.

    2006-09-01

    The four LEP collaborations, ALEPH, DELPHI, L3 and OPAL, have searched for the neutral Higgs bosons which are predicted by the Minimal Supersymmetric Standard Model (MSSM). The data of the four collaborations are statistically combined and examined for their consistency with the background hypothesis and with a possible Higgs boson signal. The combined LEP data show no significant excess of events which would indicate the production of Higgs bosons. The search results are used to set upper bounds on the cross-sections of various Higgs-like event topologies. The results are interpreted within the MSSM in a number of “benchmark” models, including CP-conserving and CP-violating scenarios. These interpretations lead in all cases to large exclusions in the MSSM parameter space. Absolute limits are set on the parameter tan β and, in some scenarios, on the masses of neutral Higgs bosons.

  9. Interpretation of statistical results.

    PubMed

    García Garmendia, J L; Maroto Monserrat, F

    2018-02-21

    The appropriate interpretation of statistical results is crucial to understanding the advances in medical science. Statistical tools allow us to transform the uncertainty and apparent chaos in nature into measurable parameters which are applicable to our clinical practice. Understanding the meaning and actual scope of these instruments is essential for researchers, for the funders of research, and for professionals who require a permanent update based on good evidence and support for decision making. Various aspects of study designs, results and statistical analyses are reviewed, trying to facilitate their comprehension from the basics to those concepts that are most common but not well understood, and offering a constructive, non-exhaustive but realistic view. Copyright © 2018 Elsevier España, S.L.U. y SEMICYUC. All rights reserved.

  10. Water-quality trends in the nation’s rivers and streams, 1972–2012—Data preparation, statistical methods, and trend results

    USGS Publications Warehouse

    Oelsner, Gretchen P.; Sprague, Lori A.; Murphy, Jennifer C.; Zuellig, Robert E.; Johnson, Henry M.; Ryberg, Karen R.; Falcone, James A.; Stets, Edward G.; Vecchia, Aldo V.; Riskin, Melissa L.; De Cicco, Laura A.; Mills, Taylor J.; Farmer, William H.

    2017-04-04

    Since passage of the Clean Water Act in 1972, Federal, State, and local governments have invested billions of dollars to reduce pollution entering rivers and streams. To understand the return on these investments and to effectively manage and protect the Nation’s water resources in the future, we need to know how and why water quality has been changing over time. As part of the National Water-Quality Assessment Project, of the U.S. Geological Survey’s National Water-Quality Program, data from the U.S. Geological Survey, along with multiple other Federal, State, Tribal, regional, and local agencies, have been used to support the most comprehensive assessment conducted to date of surface-water-quality trends in the United States. This report documents the methods used to determine trends in water quality and ecology because these methods are vital to ensuring the quality of the results. Specific objectives are to document (1) the data compilation and processing steps used to identify river and stream sites throughout the Nation suitable for water-quality, pesticide, and ecology trend analysis, (2) the statistical methods used to determine trends in target parameters, (3) considerations for water-quality, pesticide, and ecology data and streamflow data when modeling trends, (4) sensitivity analyses for selecting data and interpreting trend results with the Weighted Regressions on Time, Discharge, and Season method, and (5) the final trend results at each site. The scope of this study includes trends in water-quality concentrations and loads (nutrient, sediment, major ion, salinity, and carbon), pesticide concentrations and loads, and metrics for aquatic ecology (fish, invertebrates, and algae) for four time periods: (1) 1972–2012, (2) 1982–2012, (3) 1992–2012, and (4) 2002–12. In total, nearly 12,000 trends in concentration, load, and ecology metrics were evaluated in this study; there were 11,893 combinations of sites, parameters, and trend periods. The final trend results are presented with examples of how to interpret the results from each trend model. Interpretation of the trend results, such as causal analysis, is not included.

  11. CSAM Metrology Software Tool

    NASA Technical Reports Server (NTRS)

    Vu, Duc; Sandor, Michael; Agarwal, Shri

    2005-01-01

    CSAM Metrology Software Tool (CMeST) is a computer program for analysis of false-color CSAM images of plastic-encapsulated microcircuits. (CSAM signifies C-mode scanning acoustic microscopy.) The colors in the images indicate areas of delamination within the plastic packages. Heretofore, the images have been interpreted by human examiners. Hence, interpretations have not been entirely consistent and objective. CMeST processes the color information in image-data files to detect areas of delamination without incurring inconsistencies of subjective judgement. CMeST can be used to create a database of baseline images of packages acquired at given times for comparison with images of the same packages acquired at later times. Any area within an image can be selected for analysis, which can include examination of different delamination types by location. CMeST can also be used to perform statistical analyses of image data. Results of analyses are available in a spreadsheet format for further processing. The results can be exported to any data-base-processing software.

  12. Clustering change patterns using Fourier transformation with time-course gene expression data.

    PubMed

    Kim, Jaehee

    2011-01-01

    To understand the behavior of genes, it is important to explore how the patterns of gene expression change over a period of time, because biologically related gene groups can share the same change patterns. In this study, the problem of finding similar change patterns is reduced to clustering with the derivative Fourier coefficients. This work is aimed at discovering gene groups with similar change patterns which share similar biological properties. We developed a statistical model using derivative Fourier coefficients to identify similar change patterns of gene expression. We used a model-based method to cluster the Fourier series estimation of derivatives. We applied our model to cluster change patterns of yeast cell cycle microarray expression data with alpha-factor synchronization. It showed that, as the method clusters data with neighboring probabilities, the model-based clustering with our proposed model yielded biologically interpretable results. We expect that our proposed Fourier analysis with suitably chosen smoothing parameters could serve as a useful tool in classifying genes and interpreting possible biological change patterns.
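
    In the same spirit (though not the derivative-based model of the paper), the sketch below clusters simulated time-course profiles by a few of their Fourier coefficients using a Gaussian mixture.

      # Sketch: cluster genes by a few Fourier coefficients of their time courses
      # (the paper works with derivative Fourier coefficients and a model-based
      # clustering; this is a simplified stand-in on simulated expression data).
      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(8)
      t = np.linspace(0, 2 * np.pi, 18)                    # 18 time points
      genes = np.vstack([np.sin(t + ph) + 0.3 * rng.normal(size=t.size)
                         for ph in rng.choice([0.0, np.pi / 2], size=200)])

      coeffs = np.fft.rfft(genes, axis=1)[:, 1:4]          # first few harmonics
      features = np.hstack([coeffs.real, coeffs.imag])

      labels = GaussianMixture(n_components=2, random_state=0).fit_predict(features)
      print(np.bincount(labels))                           # two change-pattern groups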

  13. Spectroscopic and Statistical Techniques for Information Recovery in Metabonomics and Metabolomics

    NASA Astrophysics Data System (ADS)

    Lindon, John C.; Nicholson, Jeremy K.

    2008-07-01

    Methods for generating and interpreting metabolic profiles based on nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and chemometric analysis methods are summarized and the relative strengths and weaknesses of NMR and chromatography-coupled MS approaches are discussed. Given that all data sets measured to date only probe subsets of complex metabolic profiles, we describe recent developments for enhanced information recovery from the resulting complex data sets, including integration of NMR- and MS-based metabonomic results and combination of metabonomic data with data from proteomics, transcriptomics, and genomics. We summarize the breadth of applications, highlight some current activities, discuss the issues relating to metabonomics, and identify future trends.

  14. Spectroscopic and statistical techniques for information recovery in metabonomics and metabolomics.

    PubMed

    Lindon, John C; Nicholson, Jeremy K

    2008-01-01

    Methods for generating and interpreting metabolic profiles based on nuclear magnetic resonance (NMR) spectroscopy, mass spectrometry (MS), and chemometric analysis methods are summarized and the relative strengths and weaknesses of NMR and chromatography-coupled MS approaches are discussed. Given that all data sets measured to date only probe subsets of complex metabolic profiles, we describe recent developments for enhanced information recovery from the resulting complex data sets, including integration of NMR- and MS-based metabonomic results and combination of metabonomic data with data from proteomics, transcriptomics, and genomics. We summarize the breadth of applications, highlight some current activities, discuss the issues relating to metabonomics, and identify future trends.

  15. Dilute Russell Viper Venom Time analysis in a Haematology Laboratory: An audit.

    PubMed

    Kruger, W; Meyer, P W A; Nel, J G

    2018-04-17

    To determine whether the current set of evaluation criteria used for dilute Russell Viper Venom Time (dRVVT) investigations in the routine laboratory meets expectations, and to identify possible shortcomings. All dRVVT assays requested from January 2015 to December 2015 were appraised in this cross-sectional study. The raw data panels were compared with the new reference interval, established in 2016, to determine the sequence of assays that should have been performed. The interpretive comments were audited, and false-negative reports were identified. Interpretive comments according to three interpretation guidelines were compared. The reagent cost per assay was determined, and reagent cost wastage due to redundant tests was calculated. Only ~9% of dRVVT results authorized during 2015 had an interpretive comment included in the report, and ~15% of these results were false-negative interpretations. There is a statistically significant difference in interpretive comments between the three interpretation methods. Redundant mixing tests resulted in R 7477.91 (~11%) of reagent cost wastage in 2015. We demonstrated clear deficiencies in our own practice and established a standardized workflow that will potentially render our service more efficient and cost-effective, aiding clinicians in making improved treatment decisions and diagnoses. Furthermore, it is essential that standard operating procedures be kept up to date and executed by all staff in the laboratory. © 2018 John Wiley & Sons Ltd.

  16. Chesapeake bay watershed land cover data series

    USGS Publications Warehouse

    Irani, Frederick M.; Claggett, Peter

    2010-01-01

    To better understand how the land is changing and to relate those changes to water quality trends, the USGS EGSC funded the production of a Chesapeake Bay Watershed Land Cover Data Series (CBLCD) representing four dates: 1984, 1992, 2001, and 2006. EGSC will publish land change forecasts based on observed trends in the CBLCD over the coming year. They are in the process of interpreting and publishing statistics on the extent, type and patterns of land cover change for 1984-2006 in the Bay watershed, major tributaries and counties.

  17. A Novel Statistical Analysis and Interpretation of Flow Cytometry Data

    DTIC Science & Technology

    2013-07-05

    …ing parameters of the mathematical model are determined by using least squares to fit the data in Figures 1 and 2 (see Section 4). The second method is … variance (for all k and j). Then the AIC, which is the expected value of the relative Kullback–Leibler distance for a given model [16], is Kn = n log(J … emphasized that the fit of the model is quite good for both donors and cell types. As such, we proceed to analyse the dynamic responsiveness of the …

  18. Breakdown of Benford's law for birth data

    NASA Astrophysics Data System (ADS)

    Ausloos, M.; Herteliu, C.; Ileanu, B.

    2015-02-01

    Long birth time series for Romania are investigated from Benford's law point of view, distinguishing between families with a religious (Orthodox and Non-Orthodox) affiliation. The data extend from Jan. 01, 1905 till Dec. 31, 2001, i.e. over 97 years or 35 429 days. The results point to a drastic breakdown of Benford's law. Some interpretation is proposed, based on the statistical aspects due to population sizes, rather than on human thought constraints when the law breakdown is usually expected. Benford's law breakdown clearly points to natural causes.
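
    A sketch of the basic check is shown below: first-digit frequencies of a positive series compared with Benford's expectation P(d) = log10(1 + 1/d) via a chi-square test. The series is a random stand-in, not the Romanian birth counts.

      # Sketch: first-digit frequencies of a positive series compared with
      # Benford's expectation P(d) = log10(1 + 1/d); the series here is random,
      # not the Romanian daily birth counts analysed in the paper.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(9)
      x = rng.lognormal(mean=5, sigma=1.5, size=35429)        # stand-in daily counts

      first_digit = np.array([int(str(v)[0]) for v in x.astype(int) if v >= 1])
      observed = np.bincount(first_digit, minlength=10)[1:10]
      benford = np.log10(1 + 1 / np.arange(1, 10))
      expected = benford * observed.sum()

      chi2, p = stats.chisquare(observed, expected)
      print(observed, expected.round(1), f"chi2 = {chi2:.1f}, p = {p:.3g}")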

  19. The application of satellite data in monitoring strip mines

    NASA Technical Reports Server (NTRS)

    Sharber, L. A.; Shahrokhi, F.

    1977-01-01

    Strip mines in the New River Drainage Basin of Tennessee were studied through use of Landsat-1 imagery and aircraft photography. A multilevel analysis, involving conventional photo interpretation techniques, densitometric methods, multispectral analysis and statistical testing, was applied to the data. The Landsat imagery proved adequate for monitoring large-scale change resulting from active mining and land-reclamation projects. However, the spatial resolution of the satellite imagery rendered it inadequate for assessment of many of the smaller strip mines in the region, which may be as small as a few hectares.

  20. A comparison of five serological tests for bovine brucellosis.

    PubMed Central

    Dohoo, I R; Wright, P F; Ruckerbauer, G M; Samagh, B S; Robertson, F J; Forbes, L B

    1986-01-01

    Five serological assays: the buffered plate antigen test, the standard tube agglutination test, the complement fixation test, the hemolysis-in-gel test and the indirect enzyme immunoassay were diagnostically evaluated. Test data consisted of results from 1208 cattle in brucellosis-free herds, 1578 cattle in reactor herds of unknown infection status and 174 cattle from which Brucella abortus had been cultured. The complement fixation test had the highest specificity in both nonvaccinated and vaccinated cattle. The indirect enzyme immunoassay, if interpreted at a high threshold, also exhibited a high specificity in both groups of cattle. The hemolysis-in-gel test had a very high specificity when used in nonvaccinated cattle but quite a low specificity among vaccinates. With the exception of the complement fixation test, all tests had high sensitivities if interpreted at the minimum threshold. However, the sensitivities of the standard tube agglutination test and indirect enzyme immunoassay, when interpreted at high thresholds were comparable to that of the complement fixation test. A kappa statistic was used to measure the agreement between the various tests. In general the kappa statistics were quite low, suggesting that the various tests may detect different antibody isotypes. There was however, good agreement between the buffered plate antigen test and standard tube agglutination test (the two agglutination tests evaluated) and between the complement fixation test and the indirect enzyme immunoassay when interpreted at a high threshold. With the exception of the buffered plate antigen test, all tests were evaluated as confirmatory tests by estimating their specificity and sensitivity on screening-test positive samples.(ABSTRACT TRUNCATED AT 250 WORDS) PMID:3539295
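
    The quantities compared in this study can be sketched from 2x2 tables as below; the counts are illustrative, not the study's data.

      # Sketch: sensitivity, specificity, and Cohen's kappa between two tests,
      # computed from illustrative 2x2 counts (not the study's data).
      import numpy as np

      # Test result vs. true infection status (rows: test +/-, cols: infected yes/no)
      table = np.array([[160, 30],
                        [ 14, 1178]])
      sensitivity = table[0, 0] / table[:, 0].sum()
      specificity = table[1, 1] / table[:, 1].sum()

      # Agreement between two tests on the same animals (rows: test A +/-, cols: test B +/-)
      agree = np.array([[150, 40],
                        [ 35, 1157]])
      n = agree.sum()
      po = np.trace(agree) / n                              # observed agreement
      pe = (agree.sum(1) * agree.sum(0)).sum() / n**2       # chance agreement
      kappa = (po - pe) / (1 - pe)
      print(f"sens {sensitivity:.2f}, spec {specificity:.2f}, kappa {kappa:.2f}")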
