Sample records for data interpretation statistical

  1. Equivalent statistics and data interpretation.

    PubMed

    Francis, Gregory

    2017-08-01

    Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
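
    The paper's equivalence claim can be illustrated with a short sketch: given only the t statistic and the two sample sizes, one can recover the p value, Cohen's d, and an approximate large-sample confidence interval for d; the JZS Bayes factor is likewise a function of (t, n1, n2) alone, though it is not computed here. This is a generic illustration, not code from the paper, and the numbers passed to the function are made up.

    ```python
    # Minimal sketch: every quantity below is a function of (t, n1, n2) only,
    # which is the sense of "mathematical equivalence" described in the abstract.
    import numpy as np
    from scipy import stats

    def equivalent_summaries(t, n1, n2, alpha=0.05):
        nu = n1 + n2 - 2                              # degrees of freedom
        p = 2 * stats.t.sf(abs(t), nu)                # two-sided p value
        d = t * np.sqrt(1 / n1 + 1 / n2)              # Cohen's d recovered from t
        # Large-sample standard error of d (Hedges-Olkin approximation)
        se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
        z = stats.norm.ppf(1 - alpha / 2)
        return {"p": p, "d": d, "d_ci": (d - z * se_d, d + z * se_d)}

    print(equivalent_summaries(t=2.5, n1=30, n2=30))   # hypothetical t test result
    ```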

  2. Interpretation of Statistical Data: The Importance of Affective Expressions

    ERIC Educational Resources Information Center

    Queiroz, Tamires; Monteiro, Carlos; Carvalho, Liliane; François, Karen

    2017-01-01

    In recent years, research on teaching and learning of statistics has emphasized that the interpretation of data is a complex process that involves cognitive and technical aspects. However, it is also a human activity that involves contextual and affective aspects. This view is in line with research on affectivity and cognition. While the affective…

  3. Statistical transformation and the interpretation of inpatient glucose control data.

    PubMed

    Saulnier, George E; Castro, Janna C; Cook, Curtiss B

    2014-03-01

    To introduce a statistical method of assessing hospital-based non-intensive care unit (non-ICU) inpatient glucose control. Point-of-care blood glucose (POC-BG) data from hospital non-ICUs were extracted for January 1 through December 31, 2011. Glucose data distribution was examined before and after Box-Cox transformations and compared to normality. Different subsets of data were used to establish upper and lower control limits, and exponentially weighted moving average (EWMA) control charts were constructed from June, July, and October data as examples to determine if out-of-control events were identified differently in nontransformed versus transformed data. A total of 36,381 POC-BG values were analyzed. In all 3 monthly test samples, glucose distributions in nontransformed data were skewed but approached a normal distribution once transformed. Interpretation of out-of-control events from EWMA control chart analyses also revealed differences. In the June test data, an out-of-control process was identified at sample 53 with nontransformed data, whereas the transformed data remained in control for the duration of the observed period. Analysis of July data demonstrated an out-of-control process sooner in the transformed (sample 55) than nontransformed (sample 111) data, whereas for October, transformed data remained in control longer than nontransformed data. Statistical transformations increase the normal behavior of inpatient non-ICU glycemic data sets. The decision to transform glucose data could influence the interpretation and conclusions about the status of inpatient glycemic control. Further study is required to determine whether transformed versus nontransformed data influence clinical decisions or evaluation of interventions.
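
    A hedged sketch of the general workflow described above is shown below: Box-Cox transformation of a skewed glucose series followed by an EWMA control chart. The simulated glucose values, the EWMA weight, and the control-limit constants are illustrative assumptions, not the study's actual data or protocol.

    ```python
    # Sketch of the workflow above: Box-Cox transform skewed glucose values, then
    # flag out-of-control points on an EWMA chart. Data and constants are invented.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    glucose = rng.lognormal(mean=np.log(150), sigma=0.35, size=500)   # skewed POC-BG-like values

    transformed, lam = stats.boxcox(glucose)     # scipy chooses lambda by maximum likelihood

    def ewma(x, w=0.2):
        """Exponentially weighted moving average with weight w."""
        z = np.empty(len(x))
        z[0] = x[0]
        for i in range(1, len(x)):
            z[i] = w * x[i] + (1 - w) * z[i - 1]
        return z

    def control_limits(x, w=0.2, L=3.0):
        """Asymptotic EWMA limits: mean +/- L * sigma * sqrt(w / (2 - w))."""
        half_width = L * x.std(ddof=1) * np.sqrt(w / (2 - w))
        return x.mean() - half_width, x.mean() + half_width

    z = ewma(transformed)
    lcl, ucl = control_limits(transformed)
    out = np.flatnonzero((z < lcl) | (z > ucl))
    print(f"Box-Cox lambda = {lam:.2f}; first out-of-control sample: "
          f"{out[0] if out.size else 'none'}")
    ```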

  4. Data Analysis and Statistical Methods for the Assessment and Interpretation of Geochronologic Data

    NASA Astrophysics Data System (ADS)

    Reno, B. L.; Brown, M.; Piccoli, P. M.

    2007-12-01

    Ages are traditionally reported as a weighted mean with an uncertainty based on least squares analysis of analytical error on individual dates. This method does not take into account geological uncertainties, and cannot accommodate asymmetries in the data. In most instances, this method will understate uncertainty on a given age, which may lead to overinterpretation of age data. Geologic uncertainty is difficult to quantify, but is typically greater than analytical uncertainty. These factors make traditional statistical approaches inadequate to fully evaluate geochronologic data. We propose a protocol to assess populations within multi-event datasets and to calculate age and uncertainty from each population of dates interpreted to represent a single geologic event using robust and resistant statistical methods. To assess whether populations thought to represent different events are statistically separate, exploratory data analysis is undertaken using a box plot, where the range of the data is represented by a 'box' of length given by the interquartile range, divided at the median of the data, with 'whiskers' that extend to the furthest datapoint that lies within 1.5 times the interquartile range beyond the box. If the boxes representing the populations do not overlap, they are interpreted to represent statistically different sets of dates. Ages are calculated from statistically distinct populations using a robust tool such as the tanh method of Kelsey et al. (2003, CMP, 146, 326-340), which is insensitive to any assumptions about the underlying probability distribution from which the data are drawn. Therefore, this method takes into account the full range of data, and is not drastically affected by outliers. The interquartile range of each population of dates gives a first pass at expressing uncertainty, which accommodates asymmetry in the dataset; outliers have a minor effect on the uncertainty. To better quantify the uncertainty, a
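
    The box-plot screening step described above can be sketched in a few lines: compute the quartiles and 1.5×IQR whiskers for each population of dates and check whether the boxes overlap. The date values below are invented for illustration, and the robust tanh age calculation of Kelsey et al. is not reproduced here.

    ```python
    # Sketch of the box-plot screening step: quartiles and 1.5*IQR whiskers for two
    # populations of dates, then a simple overlap check. Dates (Ma) are invented.
    import numpy as np

    def box_summary(dates):
        q1, med, q3 = np.percentile(dates, [25, 50, 75])
        iqr = q3 - q1
        lo = dates[dates >= q1 - 1.5 * iqr].min()    # whiskers: furthest datapoints
        hi = dates[dates <= q3 + 1.5 * iqr].max()    # within 1.5 * IQR of the box
        return {"q1": q1, "median": med, "q3": q3, "whiskers": (lo, hi)}

    pop_a = np.array([512.1, 514.3, 515.0, 515.8, 516.2, 517.9, 520.4])   # event A
    pop_b = np.array([538.7, 540.2, 541.1, 541.9, 543.0, 544.5])          # event B

    a, b = box_summary(pop_a), box_summary(pop_b)
    separate = a["q3"] < b["q1"] or b["q3"] < a["q1"]    # boxes do not overlap
    print(a, b, "statistically separate populations:", separate)
    ```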

  5. Statistical Literacy: High School Students in Reading, Interpreting and Presenting Data

    NASA Astrophysics Data System (ADS)

    Hafiyusholeh, M.; Budayasa, K.; Siswono, T. Y. E.

    2018-01-01

    One of the foundations for high school students in statistics is being able to read data, present data in the form of tables and diagrams, and interpret them. The purpose of this study is to describe high school students' competencies in reading, interpreting and presenting data. Subjects consisted of male and female students who had high levels of mathematical ability. Data were collected through a task formulation and analyzed by reducing, presenting and verifying the data. Results showed that the students read the data based on explicit explanations on the diagram, such as explaining the points in the diagram as the relation between the x- and y-axes and determining the simple trend of a graph, including the maximum and minimum points. In interpreting and summarizing the data, both subjects paid attention to general data trends and used them to predict increases or decreases in the data. The male student estimated the (n+1)th value of the weight data by using the mode of the data, while the female student estimated the weight by using the average. The male student tended not to consider the characteristics of the data, while the female student considered them more carefully.

  6. Statistical Approaches to Interpretation of Local, Regional, and National Highway-Runoff and Urban-Stormwater Data

    USGS Publications Warehouse

    Tasker, Gary D.; Granato, Gregory E.

    2000-01-01

    Decision makers need viable methods for the interpretation of local, regional, and national highway-runoff and urban-stormwater data including flows, concentrations and loads of chemical constituents and sediment, potential effects on receiving waters, and the potential effectiveness of various best management practices (BMPs). Valid (useful for intended purposes), current, and technically defensible stormwater-runoff models are needed to interpret data collected in field studies, to support existing highway and urban-runoff planning processes, to meet National Pollutant Discharge Elimination System (NPDES) requirements, and to provide methods for computation of Total Maximum Daily Loads (TMDLs) systematically and economically. Historically, conceptual, simulation, empirical, and statistical models of varying levels of detail, complexity, and uncertainty have been used to meet various data-quality objectives in the decision-making processes necessary for the planning, design, construction, and maintenance of highways and for other land-use applications. Water-quality simulation models attempt a detailed representation of the physical processes and mechanisms at a given site. Empirical and statistical regional water-quality assessment models provide a more general picture of water quality or changes in water quality over a region. All these modeling techniques share one common aspect: their predictive ability is poor without suitable site-specific data for calibration. To properly apply the correct model, one must understand the classification of variables, the unique characteristics of water-resources data, and the concept of population structure and analysis. Classifying variables being used to analyze data may determine which statistical methods are appropriate for data analysis. An understanding of the characteristics of water-resources data is necessary to evaluate the applicability of different statistical methods, to interpret the results of these techniques

  7. Statistical transformation and the interpretation of inpatient glucose control data from the intensive care unit.

    PubMed

    Saulnier, George E; Castro, Janna C; Cook, Curtiss B

    2014-05-01

    Glucose control can be problematic in critically ill patients. We evaluated the impact of statistical transformation on interpretation of intensive care unit inpatient glucose control data. Point-of-care blood glucose (POC-BG) data derived from patients in the intensive care unit for 2011 was obtained. Box-Cox transformation of POC-BG measurements was performed, and distribution of data was determined before and after transformation. Different data subsets were used to establish statistical upper and lower control limits. Exponentially weighted moving average (EWMA) control charts constructed from April, October, and November data determined whether out-of-control events could be identified differently in transformed versus nontransformed data. A total of 8679 POC-BG values were analyzed. POC-BG distributions in nontransformed data were skewed but approached normality after transformation. EWMA control charts revealed differences in projected detection of out-of-control events. In April, an out-of-control process resulting in the lower control limit being exceeded was identified at sample 116 in nontransformed data but not in transformed data. October transformed data detected an out-of-control process exceeding the upper control limit at sample 27 that was not detected in nontransformed data. Nontransformed November results remained in control, but transformation identified an out-of-control event less than 10 samples into the observation period. Using statistical methods to assess population-based glucose control in the intensive care unit could alter conclusions about the effectiveness of care processes for managing hyperglycemia. Further study is required to determine whether transformed versus nontransformed data change clinical decisions about the interpretation of care or intervention results. © 2014 Diabetes Technology Society.

  8. Statistical Transformation and the Interpretation of Inpatient Glucose Control Data From the Intensive Care Unit

    PubMed Central

    Saulnier, George E.; Castro, Janna C.

    2014-01-01

    Glucose control can be problematic in critically ill patients. We evaluated the impact of statistical transformation on interpretation of intensive care unit inpatient glucose control data. Point-of-care blood glucose (POC-BG) data derived from patients in the intensive care unit for 2011 was obtained. Box–Cox transformation of POC-BG measurements was performed, and distribution of data was determined before and after transformation. Different data subsets were used to establish statistical upper and lower control limits. Exponentially weighted moving average (EWMA) control charts constructed from April, October, and November data determined whether out-of-control events could be identified differently in transformed versus nontransformed data. A total of 8679 POC-BG values were analyzed. POC-BG distributions in nontransformed data were skewed but approached normality after transformation. EWMA control charts revealed differences in projected detection of out-of-control events. In April, an out-of-control process resulting in the lower control limit being exceeded was identified at sample 116 in nontransformed data but not in transformed data. October transformed data detected an out-of-control process exceeding the upper control limit at sample 27 that was not detected in nontransformed data. Nontransformed November results remained in control, but transformation identified an out-of-control event less than 10 samples into the observation period. Using statistical methods to assess population-based glucose control in the intensive care unit could alter conclusions about the effectiveness of care processes for managing hyperglycemia. Further study is required to determine whether transformed versus nontransformed data change clinical decisions about the interpretation of care or intervention results. PMID:24876620

  9. How to interpret the results of medical time series data analysis: Classical statistical approaches versus dynamic Bayesian network modeling.

    PubMed

    Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall

    2016-01-01

    Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) the classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis also discusses several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach (1) is much more flexible in terms of modeling effort, and (2) offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
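
    As a point of reference for the classical side of the comparison, here is a minimal, self-contained Kaplan-Meier estimator; the follow-up times are hypothetical, and the dynamic Bayesian network side of the paper is not sketched.

    ```python
    # Minimal Kaplan-Meier estimator on hypothetical follow-up data; the dynamic
    # Bayesian network side of the comparison is not sketched here.
    import numpy as np

    def kaplan_meier(times, events):
        """Return (event time, survival estimate) pairs; events: 1=event, 0=censored."""
        times = np.asarray(times, dtype=float)
        events = np.asarray(events, dtype=int)
        surv, s = [], 1.0
        for t in np.unique(times[events == 1]):
            at_risk = np.sum(times >= t)
            d = np.sum((times == t) & (events == 1))
            s *= 1.0 - d / at_risk
            surv.append((t, s))
        return surv

    follow_up = [12, 30, 30, 44, 60, 60, 75, 90, 90, 120]   # months, hypothetical
    observed  = [ 1,  1,  0,  1,  0,  1,  0,  1,  0,   0]   # 1 = event observed
    for t, s in kaplan_meier(follow_up, observed):
        print(f"t = {t:5.0f} months   S(t) = {s:.3f}")
    ```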

  10. Student's Conceptions in Statistical Graph's Interpretation

    ERIC Educational Resources Information Center

    Kukliansky, Ida

    2016-01-01

    Histograms, box plots and cumulative distribution graphs are popular graphic representations for statistical distributions. The main research question that this study focuses on is how college students deal with interpretation of these statistical graphs when translating graphical representations into analytical concepts in descriptive statistics.…

  11. Statistics and Data Interpretation for Social Work

    ERIC Educational Resources Information Center

    Rosenthal, James A.

    2011-01-01

    Written by a social worker for social work students, this is a nuts and bolts guide to statistics that presents complex calculations and concepts in clear, easy-to-understand language. It includes numerous examples, data sets, and issues that students will encounter in social work practice. The first section introduces basic concepts and terms to…

  12. Interpretation of statistical results.

    PubMed

    García Garmendia, J L; Maroto Monserrat, F

    2018-02-21

    The appropriate interpretation of statistical results is crucial to understanding advances in medical science. Statistical tools allow us to transform the uncertainty and apparent chaos in nature into measurable parameters that are applicable to our clinical practice. Understanding the meaning and actual scope of these instruments is essential for researchers, for the funders of research, and for professionals who require continual updating based on good evidence and support for decision making. Various aspects of study designs, results and statistical analysis are reviewed, aiming to facilitate their comprehension from the basics to what is most common but not well understood, and offering a constructive, non-exhaustive but realistic overview. Copyright © 2018 Elsevier España, S.L.U. y SEMICYUC. All rights reserved.

  13. Workplace statistical literacy for teachers: interpreting box plots

    NASA Astrophysics Data System (ADS)

    Pierce, Robyn; Chick, Helen

    2013-06-01

    As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and yet it is not clear that they always have the appropriate knowledge and experience to interpret the graphs, tables and other data that they receive. This study examined the statistical literacy demands placed on teachers, with a particular focus on box plot representations. Although box plots summarise the data in a way that makes visual comparisons possible across sets of data, this study showed that teachers do not always have the necessary fluency with the representation to describe correctly how the data are distributed in the representation. In particular, a significant number perceived the size of the regions of the box plot to be depicting frequencies rather than density, and there were misconceptions associated with outlying data that were not displayed on the plot. As well, teachers' perceptions of box plots were found to relate to three themes: attitudes, perceived value and misconceptions.
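
    The density-versus-frequency misconception mentioned above can be demonstrated numerically: for skewed data, each region of the box plot spans a very different width yet contains roughly a quarter of the observations. The gamma-distributed scores below are simulated purely for illustration.

    ```python
    # Each region of a box plot holds about a quarter of the data, regardless of
    # its visual width; the skewed scores below are simulated for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    scores = rng.gamma(shape=2.0, scale=10.0, size=1000)

    q1, med, q3 = np.percentile(scores, [25, 50, 75])
    edges = [scores.min(), q1, med, q3, scores.max()]
    for lo, hi in zip(edges[:-1], edges[1:]):
        share = np.mean((scores >= lo) & (scores <= hi))
        print(f"region [{lo:6.1f}, {hi:6.1f}]  width = {hi - lo:6.1f}  share of data = {share:.2f}")
    ```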

  14. Combinatorial interpretation of Haldane-Wu fractional exclusion statistics.

    PubMed

    Aringazin, A K; Mazhitov, M I

    2002-08-01

    Assuming that the maximal allowed number of identical particles in a state is an integer parameter, q, we derive the statistical weight and analyze the associated equation that defines the statistical distribution. The derived distribution covers Fermi-Dirac and Bose-Einstein ones in the particular cases q = 1 and q → ∞ (n_i/q → 1), respectively. We show that the derived statistical weight provides a natural combinatorial interpretation of Haldane-Wu fractional exclusion statistics, and present exact solutions of the distribution equation.
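
    For reference, the limiting behaviour quoted above matches the textbook Gentile-type occupation number obtained when a state may hold at most q identical particles; this standard form is given here only as a reference point and is not necessarily the exact distribution derived in the paper:

    \[
    \langle n_i \rangle \;=\; \frac{1}{e^{\beta(\varepsilon_i-\mu)}-1} \;-\; \frac{q+1}{e^{(q+1)\beta(\varepsilon_i-\mu)}-1}, \qquad \beta = \frac{1}{k_B T},
    \]

    which reduces to the Fermi-Dirac form \(1/(e^{\beta(\varepsilon_i-\mu)}+1)\) for q = 1 and to the Bose-Einstein form \(1/(e^{\beta(\varepsilon_i-\mu)}-1)\) as q → ∞.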

  15. Can patients interpret health information? An assessment of the medical data interpretation test.

    PubMed

    Schwartz, Lisa M; Woloshin, Steven; Welch, H Gilbert

    2005-01-01

    To establish the reliability/validity of an 18-item test of patients' medical data interpretation skills. Survey with retest after 2 weeks. Subjects: 178 people recruited from advertisements in local newspapers, an outpatient clinic, and a hospital open house. The percentage of correct answers to individual items ranged from 20% to 87%, and medical data interpretation test scores (on a 0-100 scale) were normally distributed (median 61.1, mean 61.0, range 6-94). Reliability was good (test-retest correlation = 0.67, Cronbach's alpha = 0.71). Construct validity was supported in several ways. Higher scores were found among people with the highest versus lowest numeracy (71 v. 36, P<0.001), highest quantitative literacy (65 v. 28, P<0.001), and highest education (69 v. 42, P=0.004). Scores for the 15 physician experts who also completed the survey were significantly higher than those of participants with other postgraduate degrees (mean score 89 v. 69, P<0.001). The medical data interpretation test is a reliable and valid measure of the ability to interpret medical statistics.

  16. Statistical Knowledge and the Over-Interpretation of Student Evaluations of Teaching

    ERIC Educational Resources Information Center

    Boysen, Guy A.

    2017-01-01

    Research shows that teachers interpret small differences in student evaluations of teaching as meaningful even when available statistical information indicates that the differences are not reliable. The current research explored the effect of statistical training on college teachers' tendency to over-interpret student evaluation differences. A…

  17. Structural interpretation of seismic data and inherent uncertainties

    NASA Astrophysics Data System (ADS)

    Bond, Clare

    2013-04-01

    Geoscience is perhaps unique in its reliance on incomplete datasets and on building knowledge from their interpretation. This interpretive basis for the science is fundamental at all levels, from the creation of a geological map to the interpretation of remotely sensed data. To teach and better understand the uncertainties in dealing with incomplete data, we need to understand the strategies individual practitioners deploy that make them effective interpreters. The nature of interpretation is such that the interpreter needs to use their cognitive ability in the analysis of the data to propose a sensible solution in their final output, one that is consistent not only with the original data but also with other knowledge and understanding. In a series of experiments, Bond et al. (2007, 2008, 2011, 2012) investigated the strategies and pitfalls of expert and non-expert interpretation of seismic images. These studies focused on large numbers of participants to provide a statistically sound basis for analysis of the results. The outcome of these experiments showed that a wide variety of conceptual models were applied to single seismic datasets, highlighting not only spatial variations in fault placements but also whether interpreters thought faults existed at all or agreed on their sense of movement. Further, statistical analysis suggests that the strategies an interpreter employs are more important than expert knowledge per se in developing successful interpretations. Experts are successful because of their application of these techniques. In a new set of experiments, a small number of experts were observed to determine how they use their cognitive and reasoning skills in the interpretation of 2D seismic profiles. Live video and practitioner commentary were used to track the evolving interpretation and to gain insight into their decision processes. The outputs of the study allow us to create an educational resource of expert interpretation through online video footage and commentary with

  18. Interpretingstatistical hypothesis testing” results in clinical research

    PubMed Central

    Sarmukaddam, Sanjeev B.

    2012-01-01

    The difference between “Clinical Significance and Statistical Significance” should be kept in mind while interpreting “statistical hypothesis testing” results in clinical research. This fact is already known to many, but it is pointed out again here because the philosophy of “statistical hypothesis testing” is sometimes unnecessarily criticized, mainly owing to a failure to consider this distinction. Randomized controlled trials are often wrongly criticized in a similar way. That a scientific method may not be applicable in some particular situation does not mean that the method is useless. Also remember that “statistical hypothesis testing” is not for decision making, and that the field of “decision analysis” is very much an integral part of the science of statistics. It is not correct to say that “confidence intervals have nothing to do with confidence” unless one understands the meaning of the word “confidence” as used in the context of a confidence interval. Interpretation of the results of every study should always consider all possible alternative explanations such as chance, bias, and confounding. Statistical tests in inferential statistics are, in general, designed to answer the question “How likely is it that the difference found in the random sample(s) is due to chance?”, and therefore the limitation of relying only on statistical significance in making clinical decisions should be avoided. PMID:22707861

  19. Vocational students' learning preferences: the interpretability of ipsative data.

    PubMed

    Smith, P J

    2000-02-01

    A number of researchers have argued that ipsative data are not suitable for statistical procedures designed for normative data. Others have argued that the interpretability of such analyses of ipsative data is little affected where the number of variables and the sample size are sufficiently large. The research reported here represents a factor analysis of the scores on the Canfield Learning Styles Inventory for 1,252 students in vocational education. The results of the factor analysis of these ipsative data were examined in the context of existing theory and research on vocational students and lend support to the argument that the factor analysis of ipsative data can provide sensibly interpretable results.

  20. Statistics Refresher for Molecular Imaging Technologists, Part 2: Accuracy of Interpretation, Significance, and Variance.

    PubMed

    Farrell, Mary Beth

    2018-06-01

    This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic, the higher the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability (P) value. Calculation of the P value is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around the mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, 68% of the data will fall within 1 SD, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. The confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being
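
    Two of the quantities discussed above are easy to compute directly, as the short sketch below shows: Cohen's κ for two readers, and the 68-95-99.7 coverage of a normal distribution. The reader calls are invented labels, not data from any published study.

    ```python
    # Cohen's kappa for two (invented) readers, plus normal-distribution coverage.
    import numpy as np
    from scipy import stats

    def cohens_kappa(r1, r2):
        r1, r2 = np.asarray(r1), np.asarray(r2)
        p_obs = np.mean(r1 == r2)                                # observed agreement
        p_exp = sum(np.mean(r1 == c) * np.mean(r2 == c)          # agreement expected
                    for c in np.union1d(r1, r2))                 # by chance alone
        return (p_obs - p_exp) / (1 - p_exp)

    reader_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
    reader_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "neg"]
    print(f"kappa = {cohens_kappa(reader_1, reader_2):.2f}")

    for k in (1, 2, 3):                                          # the 68-95-99.7 rule
        print(f"share of a normal distribution within {k} SD: "
              f"{stats.norm.cdf(k) - stats.norm.cdf(-k):.3f}")
    ```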

  1. For a statistical interpretation of Helmholtz' thermal displacement

    NASA Astrophysics Data System (ADS)

    Podio-Guidugli, Paolo

    2016-11-01

    Starting from the classic papers by Einstein and Langevin on Brownian motion, two consistent statistical interpretations are given for the thermal displacement, a scalar field formally introduced by Helmholtz whose time derivative is, by definition, the absolute temperature.

  2. Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Udey, Ruth Norma

    Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.

  3. Interpreting statistics of small lunar craters

    NASA Technical Reports Server (NTRS)

    Schultz, P. H.; Gault, D.; Greeley, R.

    1977-01-01

    Some of the wide variations in the crater-size distributions in lunar photography and in the resulting statistics were interpreted as different degradation rates on different surfaces, different scaling laws in different targets, and a possible population of endogenic craters. These possibilities are reexamined for statistics of 26 different regions. In contrast to most other studies, crater diameters as small as 5 m were measured from enlarged Lunar Orbiter framelets. According to the results of the reported analysis, the different crater distribution types appear to be most consistent with the hypotheses of differential degradation and a superposed crater population. Differential degradation can account for the low level of equilibrium in incompetent materials such as ejecta deposits, mantle deposits, and deep regoliths where scaling law changes and catastrophic processes introduce contradictions with other observations.

  4. Using Statistics to Lie, Distort, and Abuse Data

    ERIC Educational Resources Information Center

    Bintz, William; Moore, Sara; Adams, Cheryll; Pierce, Rebecca

    2009-01-01

    Statistics is a branch of mathematics that involves organization, presentation, and interpretation of data, both quantitative and qualitative. Data do not lie, but people do. On the surface, quantitative data are basically inanimate objects, nothing more than lifeless and meaningless symbols that appear on a page, calculator, computer, or in one's…

  5. The Statistical Interpretation of Classical Thermodynamic Heating and Expansion Processes

    ERIC Educational Resources Information Center

    Cartier, Stephen F.

    2011-01-01

    A statistical model has been developed and applied to interpret thermodynamic processes typically presented from the macroscopic, classical perspective. Through this model, students learn and apply the concepts of statistical mechanics, quantum mechanics, and classical thermodynamics in the analysis of the (i) constant volume heating, (ii)…

  6. Statistical analysis of arthroplasty data

    PubMed Central

    2011-01-01

    It is envisaged that guidelines for statistical analysis and presentation of results will improve the quality and value of research. The Nordic Arthroplasty Register Association (NARA) has therefore developed guidelines for the statistical analysis of arthroplasty register data. The guidelines are divided into two parts, one with an introduction and a discussion of the background to the guidelines (Ranstam et al. 2011a, see pages x-y in this issue), and this one with a more technical statistical discussion on how specific problems can be handled. This second part contains (1) recommendations for the interpretation of methods used to calculate survival, (2) recommendations on how to deal with bilateral observations, and (3) a discussion of problems and pitfalls associated with analysis of factors that influence survival or comparisons between outcomes extracted from different hospitals. PMID:21619500

  7. A statistical model for interpreting computerized dynamic posturography data

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.

    2002-01-01

    Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
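
    The flavor of a latent score that is unobserved when a fall occurs can be conveyed with a much simplified, Tobit-style censored likelihood (censoring at zero) on invented data. This is only an analogy, not the authors' model, which additionally lets the probability of a fall depend on the realized latent ES and respects the 0-100 bounds.

    ```python
    # Tobit-style analogue only: a latent score censored at zero, fit by maximum
    # likelihood on simulated data. The authors' model additionally lets the
    # probability of a fall depend on the realized latent ES.
    import numpy as np
    from scipy import stats, optimize

    def neg_loglik(params, y, X):
        beta, sigma = params[:-1], np.exp(params[-1])   # log-sigma keeps sigma > 0
        mu = X @ beta
        fell = y <= 0                                   # trials recorded as zero
        ll_obs = stats.norm.logpdf(y[~fell], mu[~fell], sigma).sum()
        ll_cens = stats.norm.logcdf(0.0, mu[fell], sigma).sum()   # latent score <= 0
        return -(ll_obs + ll_cens)

    rng = np.random.default_rng(2)
    X = np.column_stack([np.ones(200), rng.normal(size=200)])     # intercept + covariate
    latent = X @ np.array([30.0, -25.0]) + rng.normal(0, 20, 200)
    y = np.where(latent <= 0, 0.0, latent)                        # falls scored as zero

    fit = optimize.minimize(neg_loglik, x0=[20.0, 0.0, np.log(15.0)], args=(y, X))
    print("coefficients:", fit.x[:-1].round(2), " sigma:", round(float(np.exp(fit.x[-1])), 2))
    ```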

  8. Design, analysis, and interpretation of field quality-control data for water-sampling projects

    USGS Publications Warehouse

    Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.

    2015-01-01

    The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.

  9. Statistical interpretation of machine learning-based feature importance scores for biomarker discovery.

    PubMed

    Huynh-Thu, Vân Anh; Saeys, Yvan; Wehenkel, Louis; Geurts, Pierre

    2012-07-01

    Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple, fast and their output is easily interpretable by biologists but they can only identify variables that provide a significant amount of information in isolation from the other variables. As biological processes are expected to involve complex interactions between variables, univariate methods thus potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practitioners. We evaluated several existing and novel procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, false discovery rates, or family wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the trade-off they achieve between false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest to biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. Python source codes of all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the Supplementary Material.
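
    One generic way to attach a statistical interpretation to importance scores, sketched below on simulated data, is to build a null distribution by refitting the model on permuted outcomes and then control the false discovery rate with Benjamini-Hochberg; this is an illustration of the general idea, not the specific procedures evaluated in the paper.

    ```python
    # Generic recipe: permutation null for random-forest importances, then
    # Benjamini-Hochberg FDR control. Simulated data, not the paper's benchmarks.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)
    observed = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y).feature_importances_

    rng = np.random.default_rng(0)
    n_perm = 100
    null = np.empty((n_perm, X.shape[1]))
    for b in range(n_perm):                      # importances under permuted labels
        rf_null = RandomForestClassifier(n_estimators=100, random_state=b)
        null[b] = rf_null.fit(X, rng.permutation(y)).feature_importances_

    pvals = (1 + (null >= observed).sum(axis=0)) / (n_perm + 1)   # empirical p-values
    order = np.argsort(pvals)
    m = len(pvals)
    ranked = pvals[order] * m / np.arange(1, m + 1)               # Benjamini-Hochberg
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    print("features kept at FDR 10%:", sorted(order[adjusted <= 0.10]))
    ```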

  10. Report: New analytical and statistical approaches for interpreting the relationships among environmental stressors and biomarkers

    EPA Science Inventory

    The broad topic of biomarker research has an often-overlooked component: the documentation and interpretation of the surrounding chemical environment and other meta-data, especially from visualization, analytical, and statistical perspectives (Pleil et al. 2014; Sobus et al. 2011...

  11. Localized Smart-Interpretation

    NASA Astrophysics Data System (ADS)

    Lundh Gulbrandsen, Mats; Mejer Hansen, Thomas; Bach, Torben; Pallesen, Tom

    2014-05-01

    The complex task of setting up a geological model consists not only of combining available geological information into a conceptually plausible model, but also requires consistency with available data, e.g. geophysical data. However, in many cases the direct geological information, e.g. borehole samples, is very sparse, so in order to create a geological model the geologist needs to rely on the geophysical data. The problem, however, is that the amount of geophysical data is in many cases so vast that it is practically impossible to integrate all of it in the manual interpretation process. This means that much of the information available from the geophysical surveys goes unexploited, which is a problem because the resulting geological model does not fulfill its full potential and hence is less trustworthy. We suggest an approach to geological modeling that (1) allows all geophysical data to be considered when building the geological model, (2) is fast, and (3) allows quantification of the geological modeling. The method is constructed to build a statistical model, f(d,m), describing the relation between what the geologist interprets, d, and what the geologist knows, m. The parameter m reflects any available information that can be quantified, such as geophysical data, the result of a geophysical inversion, elevation maps, etc. The parameter d reflects an actual interpretation, such as, for example, the depth to the base of a groundwater reservoir. First we infer a statistical model f(d,m) by examining sets of actual interpretations made by a geological expert, [d1, d2, ...], and the information used to perform the interpretation, [m1, m2, ...]. This makes it possible to quantify how the geological expert performs interpolation through f(d,m). As the geological expert proceeds with the interpretation, the number of interpreted datapoints from which the statistical model is inferred increases, and therefore the accuracy of the statistical model increases. When a model f
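
    The idea of a statistical model f(d,m) can be sketched with an off-the-shelf regressor standing in for the learned relation: features m at interpreted locations (here two hypothetical geophysical attributes) are mapped to the expert's picks d, and the refit model proposes values at new locations. This is purely illustrative and not the implementation developed in the project.

    ```python
    # Illustration only: an off-the-shelf regressor standing in for f(d, m).
    # Features m (two hypothetical geophysical attributes per location) are mapped
    # to the expert's interpreted depths d; the model then proposes depths at new
    # locations. Values are invented.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    m_train = np.array([[32.0, 54.1], [28.5, 60.3], [41.2, 47.8], [35.7, 52.0], [30.1, 58.9]])
    d_train = np.array([18.2, 22.5, 12.9, 16.4, 20.7])    # interpreted depths (m)

    f = GradientBoostingRegressor(random_state=0).fit(m_train, d_train)

    m_new = np.array([[33.4, 53.0], [29.9, 59.5]])        # locations not yet interpreted
    print("suggested depths:", f.predict(m_new).round(1))
    ```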

  12. Statistical analysis and interpretation of prenatal diagnostic imaging studies, Part 2: descriptive and inferential statistical methods.

    PubMed

    Tuuli, Methodius G; Odibo, Anthony O

    2011-08-01

    The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.
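
    A compact sketch of the two regression families mentioned above, run on simulated data with statsmodels; the variable names (such as gestational age) are invented for illustration.

    ```python
    # OLS for a continuous outcome and logistic regression for a binary outcome,
    # on simulated data; variable names are invented for illustration.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 120
    gest_age = rng.normal(28, 4, n)                     # hypothetical predictor (weeks)
    X = sm.add_constant(gest_age)

    y_cont = 5 + 1.8 * gest_age + rng.normal(0, 4, n)   # continuous outcome (e.g. mm)
    print(sm.OLS(y_cont, X).fit().params)

    p = 1 / (1 + np.exp(-(-6 + 0.2 * gest_age)))        # binary outcome (e.g. yes/no)
    y_bin = rng.binomial(1, p)
    print(sm.Logit(y_bin, X).fit(disp=0).params)
    ```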

  13. Statistical Data Editing in Scientific Articles.

    PubMed

    Habibzadeh, Farrokh

    2017-07-01

    Scientific journals are important scholarly forums for sharing research findings. Editors have important roles in safeguarding standards of scientific publication and should be familiar with correct presentation of results, among other core competencies. Editors do not have access to the raw data and should thus rely on clues in the submitted manuscripts. To identify probable errors, they should look for inconsistencies in presented results. Common statistical problems that can be picked up by a knowledgeable manuscript editor are discussed in this article. Manuscripts should contain a detailed section on statistical analyses of the data. Numbers should be reported with appropriate precision. Standard error of the mean (SEM) should not be reported as an index of data dispersion. Mean (standard deviation [SD]) and median (interquartile range [IQR]) should be used for description of normally and non-normally distributed data, respectively. If possible, it is better to report the 95% confidence interval (CI) for statistics, at least for main outcome variables. P values should be presented, and interpreted with caution, if there is a hypothesis. To advance the knowledge and skills of their members, associations of journal editors should develop training courses on basic statistics and research methodology for non-experts. This would in turn improve research reporting and safeguard the body of scientific evidence. © 2017 The Korean Academy of Medical Sciences.
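
    The reporting conventions singled out above can be made concrete with a few lines of code: mean (SD) alongside SEM to show the difference, median (IQR) for skewed data, and a 95% CI for the mean. The ten values below are invented and include one outlier to make the contrast visible.

    ```python
    # Mean (SD) vs median (IQR), SEM shown only for contrast, and a 95% CI for the
    # mean. The ten values are invented and include one outlier.
    import numpy as np
    from scipy import stats

    x = np.array([4.1, 5.3, 4.8, 6.0, 5.1, 4.4, 5.7, 12.9, 5.0, 4.6])

    mean, sd = x.mean(), x.std(ddof=1)
    sem = sd / np.sqrt(len(x))
    q1, q3 = np.percentile(x, [25, 75])
    ci_low, ci_high = stats.t.interval(0.95, df=len(x) - 1, loc=mean, scale=sem)

    print(f"mean (SD): {mean:.1f} ({sd:.1f})    SEM: {sem:.1f}")
    print(f"median (IQR): {np.median(x):.1f} ({q1:.1f}-{q3:.1f})")
    print(f"95% CI for the mean: {ci_low:.1f} to {ci_high:.1f}")
    ```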

  14. Smart Interpretation - Application of Machine Learning in Geological Interpretation of AEM Data

    NASA Astrophysics Data System (ADS)

    Bach, T.; Gulbrandsen, M. L.; Jacobsen, R.; Pallesen, T. M.; Jørgensen, F.; Høyer, A. S.; Hansen, T. M.

    2015-12-01

    When using airborne geophysical measurements in, e.g., groundwater mapping, an overwhelming amount of data is collected. Increasingly large survey areas, denser data collection and limited resources combine into a growing problem of building geological models that use all the available data in a manner consistent with the geologist's knowledge about the geology of the survey area. In the ERGO project, funded by The Danish National Advanced Technology Foundation, we address this problem by developing new, usable tools that enable the geologist to utilize her geological knowledge directly in the interpretation of the AEM data and thereby handle the large amount of data. In the project we have developed the mathematical basis for capturing geological expertise in a statistical model. Based on this, we have implemented new algorithms that have been operationalized and embedded in user-friendly software. In this software, the machine learning algorithm, Smart Interpretation, enables the geologist to use the system as an assistant in the geological modelling process. As the software 'learns' the geology from the geologist, the system suggests new modelling features in the data. In this presentation we demonstrate the application of the results from the ERGO project, including the proposed modelling workflow applied to a variety of data examples.

  15. Critical Views of 8th Grade Students toward Statistical Data in Newspaper Articles: Analysis in Light of Statistical Literacy

    ERIC Educational Resources Information Center

    Guler, Mustafa; Gursoy, Kadir; Guven, Bulent

    2016-01-01

    Understanding and interpreting biased data, decision-making in accordance with the data, and critically evaluating situations involving data are among the fundamental skills necessary in the modern world. To develop these required skills, emphasis on statistical literacy in school mathematics has been gradually increased in recent years. The…

  16. Statistical Compression of Wind Speed Data

    NASA Astrophysics Data System (ADS)

    Tagle, F.; Castruccio, S.; Crippa, P.; Genton, M.

    2017-12-01

    In this work we introduce a lossy compression approach that utilizes a stochastic wind generator based on a non-Gaussian distribution to reproduce the internal climate variability of daily wind speed as represented by the CESM Large Ensemble over Saudi Arabia. Stochastic wind generators, and stochastic weather generators more generally, are statistical models that aim to match certain statistical properties of the data on which they are trained. They have been used extensively in applications ranging from agricultural models to climate impact studies. In this novel context, the parameters of the fitted model can be interpreted as encoding the information contained in the original uncompressed data. The statistical model is fit to only 3 of the 30 ensemble members, and it adequately captures the variability of the ensemble in terms of seasonal and interannual variability of daily wind speed. To deal with such a large spatial domain, it is partitioned into 9 regions, and the model is fit independently to each of these. We further discuss a recent refinement of the model, which relaxes this assumption of regional independence, by introducing a large-scale component that interacts with the fine-scale regional effects.

  17. Statistical methods of fracture characterization using acoustic borehole televiewer log interpretation

    NASA Astrophysics Data System (ADS)

    Massiot, Cécile; Townend, John; Nicol, Andrew; McNamara, David D.

    2017-08-01

    Acoustic borehole televiewer (BHTV) logs provide measurements of fracture attributes (orientations, thickness, and spacing) at depth. Orientation, censoring, and truncation sampling biases similar to those described for one-dimensional outcrop scanlines, and other logging or drilling artifacts specific to BHTV logs, can affect the interpretation of fracture attributes from BHTV logs. K-means, fuzzy K-means, and agglomerative clustering methods provide transparent means of separating fracture groups on the basis of their orientation. Fracture spacing is calculated for each of these fracture sets. Maximum likelihood estimation using truncated distributions permits the fitting of several probability distributions to the fracture attribute data sets within truncation limits, which can then be extrapolated over the entire range where they naturally occur. Akaike Information Criterion (AIC) and Schwarz Bayesian Criterion (SBC) statistical information criteria rank the distributions by how well they fit the data. We demonstrate these attribute analysis methods with a data set derived from three BHTV logs acquired from the high-temperature Rotokawa geothermal field, New Zealand. Varying BHTV log quality reduces the number of input data points, but careful selection of the quality levels where fractures are deemed fully sampled increases the reliability of the analysis. Spacing data comprising up to 300 data points and spanning three orders of magnitude can be approximated similarly well (similar AIC rankings) with several distributions. Several clustering configurations and probability distributions can often characterize the data at similar levels of statistical criteria. Thus, several scenarios should be considered when using BHTV log data to constrain numerical fracture models.
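
    The distribution-ranking step can be sketched as below: fit several candidate distributions to synthetic fracture-spacing data by maximum likelihood and rank them by AIC. For brevity the truncation limits are ignored; the approach described above fits truncated distributions within the observable range, and the spacing values here are simulated rather than taken from the Rotokawa logs.

    ```python
    # Fit candidate distributions to synthetic spacing data and rank by AIC;
    # truncation limits are ignored in this simplified sketch.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    spacing = rng.lognormal(mean=0.0, sigma=1.0, size=250)     # synthetic spacings (m)

    candidates = {"lognormal": stats.lognorm, "exponential": stats.expon, "gamma": stats.gamma}

    for name, dist in candidates.items():
        params = dist.fit(spacing, floc=0)                     # MLE with location fixed at 0
        loglik = dist.logpdf(spacing, *params).sum()
        k = len(params) - 1                                    # free parameters (loc is fixed)
        aic = 2 * k - 2 * loglik
        print(f"{name:12s} AIC = {aic:9.1f}")
    ```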

  18. Evaluating Structural Equation Models for Categorical Outcomes: A New Test Statistic and a Practical Challenge of Interpretation.

    PubMed

    Monroe, Scott; Cai, Li

    2015-01-01

    This research is concerned with two topics in assessing model fit for categorical data analysis. The first topic involves the application of a limited-information overall test, introduced in the item response theory literature, to structural equation modeling (SEM) of categorical outcome variables. Most popular SEM test statistics assess how well the model reproduces estimated polychoric correlations. In contrast, limited-information test statistics assess how well the underlying categorical data are reproduced. Here, the recently introduced C2 statistic of Cai and Monroe (2014) is applied. The second topic concerns how the root mean square error of approximation (RMSEA) fit index can be affected by the number of categories in the outcome variable. This relationship creates challenges for interpreting RMSEA. While the two topics initially appear unrelated, they may conveniently be studied in tandem since RMSEA is based on an overall test statistic, such as C2. The results are illustrated with an empirical application to data from a large-scale educational survey.
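
    For reference, one common definition of RMSEA built from an overall fit statistic T (such as the C2 statistic discussed above) with df degrees of freedom and sample size N is

    \[
    \mathrm{RMSEA} \;=\; \sqrt{\max\!\left(\frac{T - df}{df\,(N-1)},\; 0\right)},
    \]

    with the caveat that software conventions differ slightly (some use N in place of N-1). Writing it this way makes explicit that RMSEA is a function of the chosen overall statistic, which is why the two topics of the abstract can be studied in tandem.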

  19. Distributed data collection for a database of radiological image interpretations

    NASA Astrophysics Data System (ADS)

    Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.

    1997-01-01

    The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window-based 2048 × 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.

  20. Statistics: The Shape of the Data. Used Numbers: Real Data in the Classroom. Grades 4-6.

    ERIC Educational Resources Information Center

    Russell, Susan Jo; Corwin, Rebecca B.

    A unit of study that introduces collecting, representing, describing, and interpreting data is presented. Suitable for students in grades 4 through 6, it provides a foundation for further work in statistics and data analysis. The investigations may extend from one to four class sessions and are grouped into three parts: "Introduction to Data…

  1. Interpretation of the results of statistical measurements. [search for basic probability model

    NASA Technical Reports Server (NTRS)

    Olshevskiy, V. V.

    1973-01-01

    For random processes, the calculated probability characteristic and the measured statistical estimate are used in a quality functional, which defines the difference between the two functions. Based on the assumption that the statistical measurement procedure is organized so that the parameters for a selected model are optimized, it is shown that the interpretation of experimental research is a search for a basic probability model.

  2. Combining data visualization and statistical approaches for interpreting measurements and meta-data: Integrating heatmaps, variable clustering, and mixed regression models

    EPA Science Inventory

    The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...

  3. Analysis and interpretation of cost data in randomised controlled trials: review of published studies

    PubMed Central

    Barber, Julie A; Thompson, Simon G

    1998-01-01

    Objective: To review critically the statistical methods used for health economic evaluations in randomised controlled trials where an estimate of cost is available for each patient in the study. Design: Survey of published randomised trials including an economic evaluation with cost values suitable for statistical analysis; 45 such trials published in 1995 were identified from Medline. Main outcome measures: The use of statistical methods for cost data was assessed in terms of the descriptive statistics reported, use of statistical inference, and whether the reported conclusions were justified. Results: Although all 45 trials reviewed apparently had cost data for each patient, only 9 (20%) reported adequate measures of variability for these data and only 25 (56%) gave results of statistical tests or a measure of precision for the comparison of costs between the randomised groups. Only 16 (36%) of the articles gave conclusions which were justified on the basis of results presented in the paper. No paper reported sample size calculations for costs. Conclusions: The analysis and interpretation of cost data from published trials reveal a lack of statistical awareness. Strong and potentially misleading conclusions about the relative costs of alternative therapies have often been reported in the absence of supporting statistical evidence. Improvements in the analysis and reporting of health economic assessments are urgently required. Health economic guidelines need to be revised to incorporate more detailed statistical advice. Key messages: Health economic evaluations required for important healthcare policy decisions are often carried out in randomised controlled trials. A review of such published economic evaluations assessed whether statistical methods for cost outcomes have been appropriately used and interpreted. Few publications presented adequate descriptive information for costs or performed appropriate statistical analyses. In at least two thirds of the papers, the main

  4. Using assemblage data in ecological indicators: A comparison and evaluation of commonly available statistical tools

    USGS Publications Warehouse

    Smith, Joseph M.; Mather, Martha E.

    2012-01-01

    Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, and then interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the

  5. Antecedents of obesity - analysis, interpretation, and use of longitudinal data.

    PubMed

    Gillman, Matthew W; Kleinman, Ken

    2007-07-01

    The obesity epidemic causes misery and death. Most epidemiologists accept the hypothesis that characteristics of the early stages of human development have lifelong influences on obesity-related health outcomes. Unfortunately, there is a dearth of data of sufficient scope and individual history to help unravel the associations of prenatal, postnatal, and childhood factors with adult obesity and health outcomes. Here the authors discuss analytic methods, the interpretation of models, and the use to which such rare and valuable data may be put in developing interventions to combat the epidemic. For example, analytic methods such as quantile and multinomial logistic regression can describe the effects on body mass index range rather than just its mean; structural equation models may allow comparison of the contributions of different factors at different periods in the life course. Interpretation of the data and model construction is complex, and it requires careful consideration of the biologic plausibility and statistical interpretation of putative causal factors. The goals of discovering modifiable determinants of obesity during the prenatal, postnatal, and childhood periods must be kept in sight, and analyses should be built to facilitate them. Ultimately, interventions in these factors may help prevent obesity-related adverse health outcomes for future generations.
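
    As a rough illustration of the quantile-regression idea mentioned above (describing effects across the body mass index distribution rather than only at its mean), here is a minimal sketch using statsmodels on simulated data; the variable names and effect sizes are purely illustrative and are not taken from the commentary:

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(1)
        n = 500
        # Simulated example: an early-life exposure with a larger effect in the upper BMI tail.
        exposure = rng.normal(size=n)
        bmi = 25 + 1.0 * exposure + (1 + 0.5 * exposure) * rng.gumbel(size=n)
        df = pd.DataFrame({"bmi": bmi, "exposure": exposure})

        for q in (0.25, 0.50, 0.90):
            fit = smf.quantreg("bmi ~ exposure", df).fit(q=q)
            print(f"quantile {q:.2f}: exposure effect = {fit.params['exposure']:.2f}")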

  6. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
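
    The tools described above are written for R, but the core ROS idea can be sketched in a few lines. Below is a simplified, single-detection-limit version in Python (the actual package handles multiple detection limits and other refinements); the numbers are hypothetical:

        import numpy as np
        from scipy import stats

        def simple_ros(detected, n_censored):
            """Simplified regression on order statistics for one detection limit:
            censored values occupy the lowest ranks; detected values are regressed
            on normal quantiles of their plotting positions (log scale), and the
            fitted line is used to impute the censored observations."""
            detected = np.sort(np.asarray(detected, dtype=float))
            n = detected.size + n_censored
            pp = np.arange(1, n + 1) / (n + 1.0)            # Weibull plotting positions
            z = stats.norm.ppf(pp)                          # corresponding normal quantiles
            z_cens, z_det = z[:n_censored], z[n_censored:]
            slope, intercept, *_ = stats.linregress(z_det, np.log(detected))
            imputed = np.exp(intercept + slope * z_cens)    # modeled values for the nondetects
            return np.concatenate([imputed, detected])

        # Six detected concentrations plus four values reported as below the detection limit.
        values = simple_ros(detected=[0.8, 1.2, 2.5, 3.1, 4.8, 7.9], n_censored=4)
        print(f"ROS mean = {values.mean():.2f}, std = {values.std(ddof=1):.2f}")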

  7. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models.

    PubMed

    Ding, Jiarui; Condon, Anne; Shah, Sohrab P

    2018-05-21

    Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.

  8. Workplace Statistical Literacy for Teachers: Interpreting Box Plots

    ERIC Educational Resources Information Center

    Pierce, Robyn; Chick, Helen

    2013-01-01

    As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and yet it is not clear that they always have the…

  9. Material Phase Causality or a Dynamics-Statistical Interpretation of Quantum Mechanics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koprinkov, I. G.

    2010-11-25

    The internal phase dynamics of a quantum system interacting with an electromagnetic field is revealed in detail. Theoretical and experimental evidence of a causal relation between the phase of the wave function and the dynamics of the quantum system is presented systematically for the first time. A dynamics-statistical interpretation of quantum mechanics is introduced.

  10. Kernel-PCA data integration with enhanced interpretability

    PubMed Central

    2014-01-01

    Background Nowadays, combining the different sources of information to improve the biological knowledge available is a challenge in bioinformatics. Among the most powerful methods for integrating heterogeneous data types are kernel-based methods. Kernel-based data integration approaches consist of two basic steps: firstly the right kernel is chosen for each data set; secondly the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables that belong to any dataset. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify those samples with higher/lower values of the variables analyzed. Conclusions The integration of different datasets and the simultaneous representation of samples and variables together give us a better understanding of biological knowledge. PMID:25032747
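
    A minimal sketch of the two steps described above (one kernel per data source, then a combined kernel fed to kernel PCA), using scikit-learn on random placeholder data; the equal kernel weights and the RBF kernel choice are assumptions for illustration, and the variable-projection enhancement proposed in the paper is not reproduced here:

        import numpy as np
        from sklearn.metrics.pairwise import rbf_kernel
        from sklearn.decomposition import KernelPCA

        rng = np.random.default_rng(2)
        # Two hypothetical data sources measured on the same 100 samples.
        expr = rng.normal(size=(100, 50))      # e.g. expression-like features
        clin = rng.normal(size=(100, 8))       # e.g. clinical-like features

        # Step 1: one kernel per source; step 2: combine (here, an unweighted average).
        K = 0.5 * rbf_kernel(expr) + 0.5 * rbf_kernel(clin)

        # Kernel PCA on the combined kernel for dimensionality reduction.
        embedding = KernelPCA(n_components=2, kernel="precomputed").fit_transform(K)
        print(embedding.shape)   # (100, 2)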

  11. Transit Spectroscopy: new data analysis techniques and interpretation

    NASA Astrophysics Data System (ADS)

    Tinetti, Giovanna; Waldmann, Ingo P.; Morello, Giuseppe; Tessenyi, Marcell; Varley, Ryan; Barton, Emma; Yurchenko, Sergey; Tennyson, Jonathan; Hollis, Morgan

    2014-11-01

    Planetary science beyond the boundaries of our Solar System is today in its infancy. Until a couple of decades ago, the detailed investigation of planetary properties was restricted to objects orbiting inside the Kuiper Belt. Today, we cannot ignore that the number of known planets has increased by two orders of magnitude nor that most of these planets bear little resemblance to the objects present in our own Solar System. A key observable for planets is the chemical composition and state of their atmosphere. To date, two methods can be used to sound exoplanetary atmospheres: transit and eclipse spectroscopy, and direct imaging spectroscopy. Although the field of exoplanet spectroscopy has been very successful in past years, there are a few serious hurdles that need to be overcome to progress in this area: in particular, instrument systematics are often difficult to disentangle from the signal, and data are sparse and often not recorded simultaneously, causing degeneracies in interpretation. We will present here new data analysis and interpretation techniques developed by the “ExoLights” team at UCL to address the above-mentioned issues. These techniques include statistical tools, non-parametric machine-learning algorithms, optimized radiative transfer models and spectroscopic line lists. These new tools have been successfully applied to existing data recorded with space and ground instruments, shedding new light on our knowledge and understanding of these alien worlds.

  12. Data Interpretation in the Digital Age

    PubMed Central

    Leonelli, Sabina

    2014-01-01

    The consultation of internet databases and the related use of computer software to retrieve, visualise and model data have become key components of many areas of scientific research. This paper focuses on the relation of these developments to understanding the biology of organisms, and examines the conditions under which the evidential value of data posted online is assessed and interpreted by the researchers who access them, in ways that underpin and guide the use of those data to foster discovery. I consider the types of knowledge required to interpret data as evidence for claims about organisms, and in particular the relevance of knowledge acquired through physical interaction with actual organisms to assessing the evidential value of data found online. I conclude that familiarity with research in vivo is crucial to assessing the quality and significance of data visualised in silico; and that studying how biological data are disseminated, visualised, assessed and interpreted in the digital age provides a strong rationale for viewing scientific understanding as a social and distributed, rather than individual and localised, achievement. PMID:25729262

  13. Analysis of Visual Interpretation of Satellite Data

    NASA Astrophysics Data System (ADS)

    Svatonova, H.

    2016-06-01

    Millions of people of all ages and levels of expertise are using satellite and aerial data as an important input for their work in many different fields. Satellite data are also gradually finding a new place in education, especially in the fields of geography and in environmental issues. The article presents the results of extensive research in the area of visual interpretation of image data carried out in the years 2013-2015 in the Czech Republic. The research was aimed at comparing the success rate of the interpretation of satellite data in relation to (a) the substrates (the selected colourfulness, the type of depicted landscape or special elements in the landscape) and (b) selected characteristics of users (expertise, gender, age). The results of the research showed that (1) false colour images have a slightly higher percentage of successful interpretation than natural colour images, (2) colourfulness of an element expected or rehearsed by the user (regardless of the real natural colour) increases the success rate of identifying the element, (3) experts are faster in interpreting visual data than non-experts, with the same degree of accuracy of solving the task, and (4) men and women are equally successful in the interpretation of visual image data.

  14. Trends in Fertility in the United States. Vital and Health Statistics, Data from the National Vital Statistics System. Series 21, Number 28.

    ERIC Educational Resources Information Center

    Taffel, Selma

    This report presents and interprets birth statistics for the United States with particular emphasis on changes that took place during the period 1970-73. Data for the report were based on information entered on birth certificates collected from all states. The majority of the document comprises graphs and tables of data, but there are four short…

  15. A data-management system for detailed areal interpretive data

    USGS Publications Warehouse

    Ferrigno, C.F.

    1986-01-01

    A data storage and retrieval system has been developed to organize and preserve areal interpretive data. This system can be used by any study where there is a need to store areal interpretive data that generally is presented in map form. This system provides the capability to grid areal interpretive data for input to groundwater flow models at any spacing and orientation. The data storage and retrieval system is designed to be used for studies that cover small areas such as counties. The system is built around a hierarchically structured data base consisting of related latitude-longitude blocks. The information in the data base can be stored at different levels of detail, with the finest detail being a block of 6 sec of latitude by 6 sec of longitude (approximately 0.01 sq mi). This system was implemented on a mainframe computer using a hierarchical data base management system. The computer programs are written in Fortran IV and PL/1. The design and capabilities of the data storage and retrieval system, and the computer programs that are used to implement the system are described. Supplemental sections contain the data dictionary, user documentation of the data-system software, changes that would need to be made to use this system for other studies, and information on the computer software tape. (Lantz-PTT)

  16. Sources of Safety Data and Statistical Strategies for Design and Analysis: Clinical Trials.

    PubMed

    Zink, Richard C; Marchenko, Olga; Sanchez-Kam, Matilde; Ma, Haijun; Jiang, Qi

    2018-03-01

    There has been an increased emphasis on the proactive and comprehensive evaluation of safety endpoints to ensure patient well-being throughout the medical product life cycle. In fact, depending on the severity of the underlying disease, it is important to plan for a comprehensive safety evaluation at the start of any development program. Statisticians should be intimately involved in this process and contribute their expertise to study design, safety data collection, analysis, reporting (including data visualization), and interpretation. In this manuscript, we review the challenges associated with the analysis of safety endpoints and describe the safety data that are available to influence the design and analysis of premarket clinical trials. We share our recommendations for the statistical and graphical methodologies necessary to appropriately analyze, report, and interpret safety outcomes, and we discuss the advantages and disadvantages of safety data obtained from clinical trials compared to other sources. Clinical trials are an important source of safety data that contribute to the totality of safety information available to generate evidence for regulators, sponsors, payers, physicians, and patients. This work is a result of the efforts of the American Statistical Association Biopharmaceutical Section Safety Working Group.

  17. Data and graph interpretation practices among preservice science teachers

    NASA Astrophysics Data System (ADS)

    Bowen, G. Michael; Roth, Wolff-Michael

    2005-12-01

    The interpretation of data and construction and interpretation of graphs are central practices in science, which, according to recent reform documents, science and mathematics teachers are expected to foster in their classrooms. However, are (preservice) science teachers prepared to teach inquiry with the purpose of transforming and analyzing data, and interpreting graphical representations? That is, are preservice science teachers prepared to teach data analysis and graph interpretation practices that scientists use by default in their everyday work? The present study was designed to answer these and related questions. We investigated the responses of preservice elementary and secondary science teachers to data and graph interpretation tasks. Our investigation shows that, despite considerable preparation, and for many, despite bachelor of science degrees, preservice teachers do not enact the (authentic) practices that scientists routinely do when asked to interpret data or graphs. Detailed analyses are provided of what data and graph interpretation practices actually were enacted. We conclude that traditional schooling emphasizes particular beliefs in the mathematical nature of the universe that make it difficult for many individuals to deal with data possessing the random variation found in measurements of natural phenomena. The results suggest that preservice teachers need more experience in engaging in data and graph interpretation practices originating in activities that provide the degree of variation in and complexity of data present in realistic investigations.

  18. Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization

    NASA Astrophysics Data System (ADS)

    Eroglu, Sertac

    2014-10-01

    The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed the statistical mechanical Menzerath-Altmann model. The derived model allows the model parameters to be interpreted in terms of physical concepts. We also propose that many organizations presenting Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy-associated states.
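
    For reference, the classical Menzerath-Altmann model that the paper transforms is commonly written as below, with y the mean size of the constituents, x the size of the construct, and a, b, c fitted parameters; this is the standard textbook form, not the generalized statistical mechanical form derived in the paper:

        % Classical Menzerath-Altmann model (a, b, c fitted parameters)
        y(x) = a \, x^{b} \, e^{-c x}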

  19. Invited Commentary: Antecedents of Obesity—Analysis, Interpretation, and Use of Longitudinal Data

    PubMed Central

    Gillman, Matthew W.; Kleinman, Ken

    2007-01-01

    The obesity epidemic causes misery and death. Most epidemiologists accept the hypothesis that characteristics of the early stages of human development have lifelong influences on obesity-related health outcomes. Unfortunately, there is a dearth of data of sufficient scope and individual history to help unravel the associations of prenatal, postnatal, and childhood factors with adult obesity and health outcomes. Here the authors discuss analytic methods, the interpretation of models, and the use to which such rare and valuable data may be put in developing interventions to combat the epidemic. For example, analytic methods such as quantile and multinomial logistic regression can describe the effects on body mass index range rather than just its mean; structural equation models may allow comparison of the contributions of different factors at different periods in the life course. Interpretation of the data and model construction is complex, and it requires careful consideration of the biologic plausibility and statistical interpretation of putative causal factors. The goals of discovering modifiable determinants of obesity during the prenatal, postnatal, and childhood periods must be kept in sight, and analyses should be built to facilitate them. Ultimately, interventions in these factors may help prevent obesity-related adverse health outcomes for future generations. PMID:17490988

  20. [Big data in official statistics].

    PubMed

    Zwick, Markus

    2015-08-01

    The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.

  1. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
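
    As a toy illustration of the general scan-statistic idea (a sliding window over genes ordered along a pathway, a maximum window statistic, and permutation-based significance), here is a Python sketch; it is not the authors' method, and it uses a crude sign-flip permutation that is only reasonable because the simulated case and control groups are the same size:

        import numpy as np

        rng = np.random.default_rng(3)

        # Toy data: per-gene counts of rare CNV carriers in cases and controls for
        # 200 genes ordered along a pathway; genes 40-49 carry an added case signal.
        n_genes, n_cases, n_controls = 200, 500, 500
        case_hits = rng.binomial(n_cases, 0.01, size=n_genes)
        case_hits[40:50] += rng.binomial(n_cases, 0.02, size=10)
        ctrl_hits = rng.binomial(n_controls, 0.01, size=n_genes)

        def max_window_excess(diff_per_gene, width=10):
            """Largest summed case-minus-control excess over all sliding windows."""
            window_sums = np.convolve(diff_per_gene, np.ones(width), mode="valid")
            return window_sums.max(), int(window_sums.argmax())

        diff = case_hits - ctrl_hits
        obs, start = max_window_excess(diff)

        # Sign-flip permutation null for the maximum window statistic.
        n_perm = 2000
        null = np.empty(n_perm)
        for i in range(n_perm):
            null[i], _ = max_window_excess(rng.choice([-1, 1], size=n_genes) * diff)
        p_value = (1 + np.sum(null >= obs)) / (n_perm + 1)
        print(f"best window starts at gene {start}, excess = {obs:.0f}, p ≈ {p_value:.3f}")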

  2. Interpreting Data: The Hybrid Mind

    ERIC Educational Resources Information Center

    Heisterkamp, Kimberly; Talanquer, Vicente

    2015-01-01

    The central goal of this study was to characterize major patterns of reasoning exhibited by college chemistry students when analyzing and interpreting chemical data. Using a case study approach, we investigated how a representative student used chemical models to explain patterns in the data based on structure-property relationships. Our results…

  3. Knowledge-base for interpretation of cerebrospinal fluid data patterns. Essentials in neurology and psychiatry.

    PubMed

    Reiber, Hansotto

    2016-06-01

    The physiological and biophysical knowledge base for interpretations of cerebrospinal fluid (CSF) data and reference ranges are essential for the clinical pathologist and neurochemist. With the popular description of the CSF flow-dependent barrier function, the dynamics and concentration gradients of blood-derived, brain-derived and leptomeningeal proteins in CSF, and the specificity-independent functions of B-lymphocytes in the brain, the neurologist, psychiatrist and neurosurgeon, as well as the neuropharmacologist, may also find essentials for diagnosis, research or the development of therapies. This review may help to replace outdated ideas such as "leakage" models of the barriers, linear immunoglobulin index interpretations or CSF electrophoresis. Calculations, interpretations and analytical pitfalls are described for albumin quotients, quantitation of immunoglobulin synthesis in Reibergrams, oligoclonal IgG, IgM analysis, the polyspecific (MRZ) antibody reaction, the statistical treatment of CSF data and general quality assessment in the CSF laboratory. The diagnostic relevance is documented in an accompanying review.

  4. New physicochemical interpretations for the adsorption of food dyes on chitosan films using statistical physics treatment.

    PubMed

    Dotto, G L; Pinto, L A A; Hachicha, M A; Knani, S

    2015-03-15

    In this work, a statistical physics treatment was employed to study the adsorption of food dyes onto chitosan films, in order to obtain new physicochemical interpretations at the molecular level. Experimental equilibrium curves were obtained for the adsorption of four dyes (FD&C red 2, FD&C yellow 5, FD&C blue 2, Acid Red 51) at different temperatures (298, 313 and 328 K). A statistical physics formula was used to interpret these curves, and parameters such as the number of adsorbed dye molecules per site (n), the anchorage number (n'), the receptor site density (NM), the adsorbed quantity at saturation (Nasat), the steric hindrance (τ), the concentration at half saturation (c1/2) and the molar adsorption energy (ΔEa) were estimated. The relation of the above-mentioned parameters to the chemical structure of the dyes and to temperature was evaluated and interpreted. Copyright © 2014 Elsevier Ltd. All rights reserved.

  5. Biomembrane Permeabilization: Statistics of Individual Leakage Events Harmonize the Interpretation of Vesicle Leakage.

    PubMed

    Braun, Stefan; Pokorná, Šárka; Šachl, Radek; Hof, Martin; Heerklotz, Heiko; Hoernke, Maria

    2018-01-23

    The mode of action of membrane-active molecules, such as antimicrobial, anticancer, cell penetrating, and fusion peptides and their synthetic mimics, transfection agents, drug permeation enhancers, and biological signaling molecules (e.g., quorum sensing), involves either the general or local destabilization of the target membrane or the formation of defined, rather stable pores. Some effects aim at killing the cell, while others need to be limited in space and time to avoid serious damage. Biological tests reveal translocation of compounds and cell death but do not provide a detailed, mechanistic, and quantitative understanding of the modes of action and their molecular basis. Model membrane studies of membrane leakage have been used for decades to tackle this issue, but their interpretation in terms of biology has remained challenging and often quite limited. Here we compare two recent, powerful protocols to study model membrane leakage: the microscopic detection of dye influx into giant liposomes and time-correlated single photon counting experiments to characterize dye efflux from large unilamellar vesicles. A statistical treatment of both data sets does not only harmonize apparent discrepancies but also makes us aware of principal issues that have been confusing the interpretation of model membrane leakage data so far. Moreover, our study reveals a fundamental difference between nano- and microscale systems that needs to be taken into account when conclusions about microscale objects, such as cells, are drawn from nanoscale models.

  6. Evaluation of forensic DNA mixture evidence: protocol for evaluation, interpretation, and statistical calculations using the combined probability of inclusion.

    PubMed

    Bieber, Frederick R; Buckleton, John S; Budowle, Bruce; Butler, John M; Coble, Michael D

    2016-08-31

    The evaluation and interpretation of forensic DNA mixture evidence faces greater interpretational challenges due to increasingly complex mixture evidence. Such challenges include: casework involving low quantity or degraded evidence leading to allele and locus dropout; allele sharing of contributors leading to allele stacking; and differentiation of PCR stutter artifacts from true alleles. There is variation in statistical approaches used to evaluate the strength of the evidence when inclusion of a specific known individual(s) is determined, and the approaches used must be supportable. There are concerns that methods utilized for interpretation of complex forensic DNA mixtures may not be implemented properly in some casework. Similar questions are being raised in a number of U.S. jurisdictions, leading to some confusion about mixture interpretation for current and previous casework. Key elements necessary for the interpretation and statistical evaluation of forensic DNA mixtures are described. Given that the most common method for statistical evaluation of DNA mixtures in many parts of the world, including the USA, is the Combined Probability of Inclusion/Exclusion (CPI/CPE), exposition and elucidation of this method and a protocol for its use are the focus of this article. Formulae and other supporting materials are provided. Guidance and details of a DNA mixture interpretation protocol are provided for application of the CPI/CPE method in the analysis of more complex forensic DNA mixtures. This description, in turn, should help reduce the variability of interpretation with application of this methodology and thereby improve the quality of DNA mixture interpretation throughout the forensic community.
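
    The arithmetic at the heart of the CPI/CPE statistic is simple: for each locus, sum the population frequencies of the alleles observed in the mixture and square that sum, then multiply across loci. A minimal sketch with made-up allele frequencies (the protocol in the article concerns the interpretational safeguards, such as handling possible dropout and stutter, that this sketch ignores):

        from math import prod

        # Hypothetical allele frequencies for the alleles observed in a mixture,
        # per locus (illustrative numbers, not real population data).
        mixture_allele_freqs = {
            "D8S1179": [0.10, 0.15, 0.22, 0.08],
            "D21S11":  [0.12, 0.18, 0.25],
            "TH01":    [0.20, 0.30],
        }

        def combined_probability_of_inclusion(loci):
            """CPI = product over loci of (sum of observed-allele frequencies)^2."""
            return prod(sum(freqs) ** 2 for freqs in loci.values())

        cpi = combined_probability_of_inclusion(mixture_allele_freqs)
        print(f"CPI = {cpi:.4g}, CPE = {1 - cpi:.4g}")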

  7. Evaluation of a statistics-based Ames mutagenicity QSAR model and interpretation of the results obtained.

    PubMed

    Barber, Chris; Cayley, Alex; Hanser, Thierry; Harding, Alex; Heghes, Crina; Vessey, Jonathan D; Werner, Stephane; Weiner, Sandy K; Wichard, Joerg; Giddings, Amanda; Glowienke, Susanne; Parenty, Alexis; Brigo, Alessandro; Spirkl, Hans-Peter; Amberg, Alexander; Kemper, Ray; Greene, Nigel

    2016-04-01

    The relative wealth of bacterial mutagenicity data available in the public literature means that in silico quantitative/qualitative structure activity relationship (QSAR) systems can readily be built for this endpoint. A good means of evaluating the performance of such systems is to use private unpublished data sets, which generally represent a more distinct chemical space than publicly available test sets and, as a result, provide a greater challenge to the model. However, raw performance metrics should not be the only factor considered when judging this type of software since expert interpretation of the results obtained may allow for further improvements in predictivity. Enough information should be provided by a QSAR to allow the user to make general, scientifically-based arguments in order to assess and overrule predictions when necessary. With all this in mind, we sought to validate the performance of the statistics-based in vitro bacterial mutagenicity prediction system Sarah Nexus (version 1.1) against private test data sets supplied by nine different pharmaceutical companies. The results of these evaluations were then analysed in order to identify findings presented by the model which would be useful for the user to take into consideration when interpreting the results and making their final decision about the mutagenic potential of a given compound. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Interpretive focus groups: a participatory method for interpreting and extending secondary analysis of qualitative data.

    PubMed

    Redman-MacLaren, Michelle; Mills, Jane; Tommbe, Rachael

    2014-01-01

    Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. To facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or 'chunks' of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. New understandings of the data were evoked when women in interpretive focus groups analysed the data 'chunks'. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Interpretive focus groups offer opportunity to enhance trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action.

  9. Interpretive focus groups: a participatory method for interpreting and extending secondary analysis of qualitative data

    PubMed Central

    Redman-MacLaren, Michelle; Mills, Jane; Tommbe, Rachael

    2014-01-01

    Background Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. Objective To facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. Design A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or ‘chunks’ of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. Results New understandings of the data were evoked when women in interpretive focus groups analysed the data ‘chunks’. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Conclusions Interpretive focus groups offer opportunity to enhance trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action. PMID:25138532

  10. An Efficient Data Partitioning to Improve Classification Performance While Keeping Parameters Interpretable.

    PubMed

    Korjus, Kristjan; Hebart, Martin N; Vicente, Raul

    2016-01-01

    Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier's generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term "Cross-validation and cross-testing" improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do.
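
    For context, the standard scheme that the proposed "cross-validation and cross-testing" approach improves upon looks roughly like the following scikit-learn sketch (a held-out test set plus cross-validated parameter tuning on the remainder); the proposed re-use of test data is not shown here:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import GridSearchCV, train_test_split
        from sklearn.svm import SVC

        # Hold out a test set, tune parameters by cross-validation on the rest,
        # then apply the tuned classifier once to the test set.
        X, y = make_classification(n_samples=200, n_features=20, random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=0, stratify=y)

        search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}, cv=5)
        search.fit(X_train, y_train)
        print("chosen parameters:", search.best_params_)
        print("generalization estimate:", search.score(X_test, y_test))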

  11. Occupational exposure decisions: can limited data interpretation training help improve accuracy?

    PubMed

    Logan, Perry; Ramachandran, Gurumurthy; Mulhausen, John; Hewett, Paul

    2009-06-01

    Accurate exposure assessments are critical for ensuring that potentially hazardous exposures are properly identified and controlled. The availability and accuracy of exposure assessments can determine whether resources are appropriately allocated to engineering and administrative controls, medical surveillance, personal protective equipment and other programs designed to protect workers. A desktop study was performed using videos, task information and sampling data to evaluate the accuracy and potential bias of participants' exposure judgments. Desktop exposure judgments were obtained from occupational hygienists for material handling jobs with small air sampling data sets (0-8 samples) and without the aid of computers. In addition, data interpretation tests (DITs) were administered to participants where they were asked to estimate the 95th percentile of an underlying log-normal exposure distribution from small data sets. Participants were presented with an exposure data interpretation or rule of thumb training which included a simple set of rules for estimating 95th percentiles for small data sets from a log-normal population. DIT was given to each participant before and after the rule of thumb training. Results of each DIT and qualitative and quantitative exposure judgments were compared with a reference judgment obtained through a Bayesian probabilistic analysis of the sampling data to investigate overall judgment accuracy and bias. There were a total of 4386 participant-task-chemical judgments for all data collections: 552 qualitative judgments made without sampling data and 3834 quantitative judgments with sampling data. The DITs and quantitative judgments were significantly better than random chance and much improved by the rule of thumb training. In addition, the rule of thumb training reduced the amount of bias in the DITs and quantitative judgments. The mean DIT % correct scores increased from 47 to 64% after the rule of thumb training (P < 0.001). The
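
    The specific rule of thumb taught in the study is not reproduced in the abstract, but the quantity participants were asked to estimate, the 95th percentile of a lognormal exposure distribution, has a standard parametric estimate, exp(mean(ln x) + z_0.95 * sd(ln x)). A minimal sketch with a hypothetical small air-sampling data set:

        import numpy as np
        from scipy import stats

        # Hypothetical small air-sampling data set (mg/m^3).
        samples = np.array([0.12, 0.25, 0.08, 0.40, 0.18])

        log_x = np.log(samples)
        gm = np.exp(log_x.mean())                       # geometric mean
        gsd = np.exp(log_x.std(ddof=1))                 # geometric standard deviation
        p95 = np.exp(log_x.mean() + stats.norm.ppf(0.95) * log_x.std(ddof=1))
        print(f"GM = {gm:.2f}, GSD = {gsd:.2f}, estimated 95th percentile = {p95:.2f}")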

  12. Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data

    PubMed Central

    Hu, Ming; Deng, Ke; Qin, Zhaohui; Liu, Jun S.

    2015-01-01

    Understanding how chromosomes fold provides insights into transcription regulation and, hence, the functional state of the cell. Using next-generation sequencing technology, the recently developed Hi-C approach enables a global view of spatial chromatin organization in the nucleus, which substantially expands our knowledge about genome organization and function. However, due to multiple layers of bias, noise and uncertainty buried in the protocol of Hi-C experiments, analyzing and interpreting Hi-C data poses great challenges and requires novel statistical methods to be developed. This article provides an overview of recent Hi-C studies and their impacts on biomedical research, describes major challenges in the statistical analysis of Hi-C data, and discusses some perspectives for future research. PMID:26124977

  13. Muscular Dystrophy: Data and Statistics

    MedlinePlus

    The following data and statistics come from MD STARnet. Data from the MD ...

  14. Interpreting Qualitative Data: A Methodological Inquiry.

    ERIC Educational Resources Information Center

    Newman, Isadore; MacDonald, Suzanne

    The methodology of interpretation of qualitative data was explored using a grounded theory approach to the synthesis of data, examining the construction of categories in particular. The focus is on ways of organizing data and attaching meaning, as research problems embedded in cultural context are explored. A qualitative research training task…

  15. Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?

    PubMed

    Zhu, Yeyi; Hernandez, Ladia M; Mueller, Peter; Dong, Yongquan; Forman, Michele R

    2013-01-01

    The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study.

  16. Improving interpretation of publically reported statistics on health and healthcare: the Figure Interpretation Assessment Tool (FIAT-Health).

    PubMed

    Gerrits, Reinie G; Kringos, Dionne S; van den Berg, Michael J; Klazinga, Niek S

    2018-03-07

    Policy-makers, managers, scientists, patients and the general public are confronted daily with figures on health and healthcare through public reporting in newspapers, webpages and press releases. However, information on the key characteristics of these figures necessary for their correct interpretation is often not adequately communicated, which can lead to misinterpretation and misinformed decision-making. The objective of this research was to map the key characteristics relevant to the interpretation of figures on health and healthcare, and to develop a Figure Interpretation Assessment Tool-Health (FIAT-Health) through which figures on health and healthcare can be systematically assessed, allowing for a better interpretation of these figures. The abovementioned key characteristics of figures on health and healthcare were identified through systematic expert consultations in the Netherlands on four topic categories of figures, namely morbidity, healthcare expenditure, healthcare outcomes and lifestyle. The identified characteristics were used as a frame for the development of the FIAT-Health. Development of the tool and its content was supported and validated through regular review by a sounding board of potential users. Identified characteristics relevant for the interpretation of figures in the four categories relate to the figures' origin, credibility, expression, subject matter, population and geographical focus, time period, and underlying data collection methods. The characteristics were translated into a set of 13 dichotomous and 4-point Likert scale questions constituting the FIAT-Health, and two final assessment statements. Users of the FIAT-Health were provided with a summary overview of their answers to support a final assessment of the correctness of a figure and the appropriateness of its reporting. FIAT-Health can support policy-makers, managers, scientists, patients and the general public to systematically assess the quality of publicly reported

  17. Data Systems and Reports as Active Participants in Data Interpretation

    ERIC Educational Resources Information Center

    Rankin, Jenny Grant

    2016-01-01

    Most data-informed decision-making in education is undermined by flawed interpretations. Educator-driven interventions to improve data use are beneficial but not omnipotent, as data misunderstandings persist at schools and school districts commended for ideal data use support. Meanwhile, most data systems and reports display figures without…

  18. Interpretive Reporting of Protein Electrophoresis Data by Microcomputer

    PubMed Central

    Talamo, Thomas S.; Losos, Frank J.; Kessler, G. Frederick

    1982-01-01

    A microcomputer-based system for interpretive reporting of protein electrophoretic data has been developed. Data for serum, urine and cerebrospinal fluid protein electrophoreses as well as immunoelectrophoresis can be entered. Patient demographic information is entered through the keyboard, followed by manual entry of total and fractionated protein levels obtained after densitometer scanning of the electrophoretic strip. The patterns are then coded, interpreted, and final reports generated. In most cases, interpretation time is less than one second. Misinterpretation by computer is uncommon and can be corrected by edit functions within the system. These discrepancies between computer and pathologist interpretation are automatically stored in a data file for later review and possible program modification. Any or all previous tests on a patient may be reviewed with graphic display of the electrophoretic pattern. The system has been in use for several months and is presently well accepted by both laboratory and clinical staff. It also allows rapid storage, retrieval and analysis of protein electrophoretic data.

  19. An Efficient Data Partitioning to Improve Classification Performance While Keeping Parameters Interpretable

    PubMed Central

    Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul

    2016-01-01

    Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term “Cross-validation and cross-testing” improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393

  20. Statistical analysis of solid waste composition data: Arithmetic mean, standard deviation and correlation coefficients.

    PubMed

    Edjabou, Maklawe Essonanawe; Martín-Fernández, Josep Antoni; Scheutz, Charlotte; Astrup, Thomas Fruergaard

    2017-11-01

    Data for fractional solid waste composition provide relative magnitudes of individual waste fractions, the percentages of which always sum to 100, thereby connecting them intrinsically. Due to this sum constraint, waste composition data represent closed data, and their interpretation and analysis require statistical methods other than classical statistics, which is suitable only for non-constrained data such as absolute values. However, the closed characteristics of waste composition data are often ignored when analysed. The results of this study showed, for example, that unavoidable animal-derived food waste amounted to 2.21±3.12% with a confidence interval of (-4.03; 8.45), which highlights the problem of biased negative proportions. A Pearson's correlation test, applied to waste fraction generation (kg mass), indicated a positive correlation between avoidable vegetable food waste and plastic packaging. However, correlation tests applied to waste fraction compositions (percentage values) showed a negative association in this regard, thus demonstrating that statistical analyses applied to compositional waste fraction data, without addressing the closed characteristics of these data, have the potential to generate spurious or misleading results. Therefore, compositional data should be transformed adequately prior to any statistical analysis, such as computing the mean, standard deviation and correlation coefficients. Copyright © 2017 Elsevier Ltd. All rights reserved.
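
    One standard way to "transform adequately" before computing means, standard deviations or correlations on closed compositional data is the centred log-ratio (clr) transform; a minimal sketch with hypothetical waste-fraction percentages (not the study's data):

        import numpy as np

        # Hypothetical waste composition (% of total mass) for five samples,
        # three fractions per sample; each row sums to roughly 100.
        comp = np.array([
            [62.0, 28.0, 10.0],
            [55.0, 35.0, 10.0],
            [70.0, 20.0, 10.0],
            [48.0, 40.0, 12.0],
            [66.0, 25.0,  9.0],
        ])

        def clr(x):
            """Centred log-ratio transform: log(x) minus the row-wise mean of log(x)."""
            logx = np.log(x)
            return logx - logx.mean(axis=1, keepdims=True)

        z = clr(comp)
        # Statistics such as means and correlations are then computed on the clr scale.
        print(np.corrcoef(z, rowvar=False))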

  1. Data Literacy is Statistical Literacy

    ERIC Educational Resources Information Center

    Gould, Robert

    2017-01-01

    Past definitions of statistical literacy should be updated in order to account for the greatly amplified role that data now play in our lives. Experience working with high-school students in an innovative data science curriculum has shown that teaching statistical literacy, augmented by data literacy, can begin early.

  2. Statistics Clinic

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James

    2014-01-01

    Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.

  3. Statistical Interpretation of Natural and Technological Hazards in China

    NASA Astrophysics Data System (ADS)

    Borthwick, Alistair, ,, Prof.; Ni, Jinren, ,, Prof.

    2010-05-01

    China is prone to catastrophic natural hazards from floods, droughts, earthquakes, storms, cyclones, landslides, epidemics, extreme temperatures, forest fires, avalanches, and even tsunami. This paper will list statistics related to the six worst natural disasters in China over the past 100 or so years, ranked according to number of fatalities. The corresponding data for the six worst natural disasters in China over the past decade will also be considered. [The data are abstracted from the International Disaster Database, Centre for Research on the Epidemiology of Disasters (CRED), Université Catholique de Louvain, Brussels, Belgium, http://www.cred.be/ where a disaster is defined as occurring if one of the following criteria is fulfilled: 10 or more people reported killed; 100 or more people reported affected; a call for international assistance; or declaration of a state of emergency.] The statistics include the number of occurrences of each type of natural disaster, the number of deaths, the number of people affected, and the cost in billions of US dollars. Over the past hundred years, the largest disasters may be related to the overabundance or scarcity of water, and to earthquake damage. However, there has been a substantial relative reduction in fatalities due to water related disasters over the past decade, even though the overall numbers of people affected remain huge, as does the economic damage. This change is largely due to the efforts put in by China's water authorities to establish effective early warning systems, the construction of engineering countermeasures for flood protection, the implementation of water pricing and other measures for reducing excessive consumption during times of drought. It should be noted that the dreadful death toll due to the Sichuan Earthquake dominates recent data. Joint research has been undertaken between the Department of Environmental Engineering at Peking University and the Department of Engineering Science at Oxford

  4. [How to Interpret and Use Routine Laboratory Data--Our Methods to Interpret Routine Laboratory Data--Chairmen's Introductory Remarks].

    PubMed

    Honda, Takayuki; Tozuka, Minoru

    2015-09-01

    In the reversed clinicopathological conference (R-CPC), three specialists in laboratory medicine interpreted routine laboratory data independently in order to understand the detailed state of a patient. R-CPC is an educational method for using laboratory data appropriately, and it is also important for selecting differential diagnoses in the process of clinical reasoning, in addition to the present illness and physical examination. Routine laboratory tests can be performed repeatedly at relatively low cost, and their results can be analysed as time series. Interpretation of routine laboratory data is much like taking physical findings: general findings are checked first, and then the state of each organ is examined. Although routine laboratory tests cost little, we can gain much more information about the patient from them than from physical examination alone.

  5. Method Designed to Respect Molecular Heterogeneity Can Profoundly Correct Present Data Interpretations for Genome-Wide Expression Analysis

    PubMed Central

    Chen, Chih-Hao; Hsu, Chueh-Lin; Huang, Shih-Hao; Chen, Shih-Yuan; Hung, Yi-Lin; Chen, Hsiao-Rong; Wu, Yu-Chung

    2015-01-01

    Although genome-wide expression analysis has become a routine tool for gaining insight into molecular mechanisms, extraction of information remains a major challenge. It has been unclear why standard statistical methods, such as the t-test and ANOVA, often lead to low levels of reproducibility, how likely applying fold-change cutoffs to enhance reproducibility is to miss key signals, and how adversely using such methods has affected data interpretations. We broadly examined expression data to investigate the reproducibility problem and discovered that molecular heterogeneity, a biological property of genetically different samples, has been improperly handled by the statistical methods. Here we give a mathematical description of the discovery and report the development of a statistical method, named HTA, for better handling molecular heterogeneity. We broadly demonstrate the improved sensitivity and specificity of HTA over the conventional methods and show that using fold-change cutoffs has lost much information. We illustrate the especial usefulness of HTA for heterogeneous diseases, by applying it to existing data sets of schizophrenia, bipolar disorder and Parkinson’s disease, and show it can abundantly and reproducibly uncover disease signatures not previously detectable. Based on 156 biological data sets, we estimate that the methodological issue has affected over 96% of expression studies and that HTA can profoundly correct 86% of the affected data interpretations. The methodological advancement can better facilitate systems understandings of biological processes, render biological inferences that are more reliable than they have hitherto been and engender translational medical applications, such as identifying diagnostic biomarkers and drug prediction, which are more robust. PMID:25793610

  6. Validity threats: overcoming interference with proposed interpretations of assessment data.

    PubMed

    Downing, Steven M; Haladyna, Thomas M

    2004-03-01

    Factors that interfere with the ability to interpret assessment scores or ratings in the proposed manner threaten validity. To be interpreted in a meaningful manner, all assessments in medical education require sound, scientific evidence of validity. The purpose of this essay is to discuss 2 major threats to validity: construct under-representation (CU) and construct-irrelevant variance (CIV). Examples of each type of threat for written, performance and clinical performance examinations are provided. The CU threat to validity refers to undersampling the content domain. Using too few items, cases or clinical performance observations to adequately generalise to the domain represents CU. Variables that systematically (rather than randomly) interfere with the ability to meaningfully interpret scores or ratings represent CIV. Issues such as flawed test items written at inappropriate reading levels or statistically biased questions represent CIV in written tests. For performance examinations, such as standardised patient examinations, flawed cases or cases that are too difficult for student ability contribute CIV to the assessment. For clinical performance data, systematic rater error, such as halo or central tendency error, represents CIV. The term face validity is rejected as representative of any type of legitimate validity evidence, although the fact that the appearance of the assessment may be an important characteristic other than validity is acknowledged. There are multiple threats to validity in all types of assessment in medical education. Methods to eliminate or control validity threats are suggested.

  7. Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark studies.

    PubMed

    Boulesteix, Anne-Laure; Wilson, Rory; Hapfelmeier, Alexander

    2017-09-09

    The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments. We suggest that benchmark studies, a method of assessment of statistical methods using real-world datasets, might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.

  8. Applied statistics in ecology: common pitfalls and simple solutions

    Treesearch

    E. Ashley Steel; Maureen C. Kennedy; Patrick G. Cunningham; John S. Stanovick

    2013-01-01

    The most common statistical pitfalls in ecological research are those associated with data exploration, the logic of sampling and design, and the interpretation of statistical results. Although one can find published errors in calculations, the majority of statistical pitfalls result from incorrect logic or interpretation despite correct numerical calculations. There...

  9. Statistics Translated: A Step-by-Step Guide to Analyzing and Interpreting Data

    ERIC Educational Resources Information Center

    Terrell, Steven R.

    2012-01-01

    Written in a humorous and encouraging style, this text shows how the most common statistical tools can be used to answer interesting real-world questions, presented as mysteries to be solved. Engaging research examples lead the reader through a series of six steps, from identifying a researchable problem to stating a hypothesis, identifying…

  10. A Closer Look at Data Independence: Comment on “Lies, Damned Lies, and Statistics (in Geology)”

    NASA Astrophysics Data System (ADS)

    Kravtsov, Sergey; Saunders, Rolando Olivas

    2011-02-01

    In his Forum (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009), P. Vermeesch suggests that statistical tests are not fit to interpret long data records. He asserts that for large enough data sets any true null hypothesis will always be rejected. This is certainly not the case! Here we revisit this author's example of weekly distribution of earthquakes and show that statistical results support the commonsense expectation that seismic activity does not depend on weekday (see the online supplement to this Eos issue for details (http://www.agu.org/eos_elec/)).
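
    A minimal sketch of the kind of test at issue here: a chi-square goodness-of-fit test of earthquake counts against a uniform weekday distribution. The counts below are made up; the point is simply that a large sample need not reject a true null hypothesis.

```python
# Illustrative sketch: testing whether earthquake counts depend on weekday
# with a chi-square goodness-of-fit test. The counts are hypothetical.
import numpy as np
from scipy import stats

counts = np.array([1421, 1380, 1402, 1395, 1417, 1388, 1407])  # Mon..Sun (made up)
expected = np.full(7, counts.sum() / 7.0)                      # uniform null: no weekday effect

chi2, p = stats.chisquare(counts, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")   # large p: no evidence of a weekday effect
```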

  11. Interpretation of remotely sensed data and its applications in oceanography

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Tanaka, K.; Inostroza, H. M.; Verdesio, J. J.

    1982-01-01

    The methodology of interpretation of remote sensing data and its oceanographic applications are described. The elements of image interpretation for different types of sensors are discussed. The sensors utilized are the multispectral scanner of LANDSAT, and the thermal infrared of NOAA and geostationary satellites. Visual and automatic data interpretation in studies of pollution, the Brazil current system, and upwelling along the southeastern Brazilian coast are compared.

  12. On Improving the Quality and Interpretation of Environmental Assessments using Statistical Analysis and Geographic Information Systems

    NASA Astrophysics Data System (ADS)

    Karuppiah, R.; Faldi, A.; Laurenzi, I.; Usadi, A.; Venkatesh, A.

    2014-12-01

    An increasing number of studies are focused on assessing the environmental footprint of different products and processes, especially using life cycle assessment (LCA). This work shows how combining statistical methods and Geographic Information Systems (GIS) with environmental analyses can help improve the quality of results and their interpretation. Most environmental assessments in the literature yield single numbers that characterize the environmental impact of a process/product - typically global or country averages, often unchanging in time. In this work, we show how statistical analysis and GIS can help address these limitations. For example, we demonstrate a method to separately quantify uncertainty and variability in the result of LCA models using a power generation case study. This is important for rigorous comparisons between the impacts of different processes. Another challenge is the lack of data that can affect the rigor of LCAs. We have developed an approach to estimate environmental impacts of incompletely characterized processes using predictive statistical models. This method is applied to estimate unreported coal power plant emissions in several world regions. There is also a general lack of spatio-temporal characterization of the results in environmental analyses. For instance, studies that focus on water usage do not put in context where and when water is withdrawn. Through the use of hydrological modeling combined with GIS, we quantify water stress on a regional and seasonal basis to understand water supply and demand risks for multiple users. Another example where it is important to consider regional dependency of impacts is when characterizing how agricultural land occupation affects biodiversity in a region. We developed a data-driven methodology used in conjunction with GIS to determine if there is a statistically significant difference between the impacts of growing different crops on different species in various biomes of the world.

  13. Interpreting biomarker data from the COPHES/DEMOCOPHES twin projects: Using external exposure data to understand biomarker differences among countries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smolders, R., E-mail: roel.smolders@vito.be; Den Hond, E.; Koppen, G.

    In 2011 and 2012, the COPHES/DEMOCOPHES twin projects performed the first ever harmonized human biomonitoring survey in 17 European countries. In more than 1800 mother–child pairs, individual lifestyle data were collected and cadmium, cotinine and certain phthalate metabolites were measured in urine. Total mercury was determined in hair samples. While the main goal of the COPHES/DEMOCOPHES twin projects was to develop and test harmonized protocols and procedures, the goal of the current paper is to investigate whether the observed differences in biomarker values among the countries implementing DEMOCOPHES can be interpreted using information from external databases on environmental quality and lifestyle. In general, 13 countries having implemented DEMOCOPHES provided high-quality data from external sources that were relevant for interpretation purposes. However, some data were not available for reporting or were not in line with predefined specifications. Therefore, only part of the external information could be included in the statistical analyses. Nonetheless, there was a highly significant correlation between national levels of fish consumption and mercury in hair, the strength of antismoking legislation was significantly related to urinary cotinine levels, and we were able to show indications that also urinary cadmium levels were associated with environmental quality and food quality. These results again show the potential of biomonitoring data to provide added value for (the evaluation of) evidence-informed policy making. - Highlights: • External data was collected to interpret HBM data from DEMOCOPHES. • Hg in hair could be related to fish consumption across different countries. • Urinary cotinine was related to strictness of anti-smoking legislation. • Urinary Cd was borderline significantly related to air and food quality. • Lack of comparable data among countries hampered the analysis.

  14. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics

    USGS Publications Warehouse

    Antweiler, Ronald C.; Taylor, Howard E.

    2008-01-01

    The main classes of statistical treatment of below-detection limit (left-censored) environmental data for the determination of basic statistics that have been used in the literature are substitution methods, maximum likelihood, regression on order statistics (ROS), and nonparametric techniques. These treatments, along with using all instrument-generated data (even those below detection), were evaluated by examining data sets in which the true values of the censored data were known. It was found that for data sets with less than 70% censored data, the best technique overall for determination of summary statistics was the nonparametric Kaplan-Meier technique. ROS and the two substitution methods of assigning one-half the detection limit value to censored data or assigning a random number between zero and the detection limit to censored data were adequate alternatives. The use of these two substitution methods, however, requires a thorough understanding of how the laboratory censored the data. The technique of employing all instrument-generated data - including numbers below the detection limit - was found to be less adequate than the above techniques. At high degrees of censoring (greater than 70% censored data), no technique provided good estimates of summary statistics. Maximum likelihood techniques were found to be far inferior to all other treatments except substituting zero or the detection limit value to censored data.
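
    A minimal sketch, under assumed synthetic data, of the two substitution treatments mentioned above (DL/2 and a uniform random value below the detection limit) compared against the known uncensored values; the Kaplan-Meier and ROS treatments are omitted here.

```python
# Sketch: comparing two substitution treatments of left-censored data against
# the known "true" values, in the spirit of the evaluation described above.
import numpy as np

rng = np.random.default_rng(1)
true_vals = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # synthetic concentrations
detection_limit = np.quantile(true_vals, 0.4)              # roughly 40% censoring
censored = true_vals < detection_limit

half_dl = true_vals.copy()
half_dl[censored] = detection_limit / 2.0                  # substitute DL/2

random_sub = true_vals.copy()
random_sub[censored] = rng.uniform(0.0, detection_limit, censored.sum())  # substitute U(0, DL)

for name, data in [("true", true_vals), ("DL/2", half_dl), ("U(0,DL)", random_sub)]:
    print(f"{name:8s} mean = {data.mean():.3f}  median = {np.median(data):.3f}")
```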

  15. Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics

    PubMed Central

    Dowding, Irene; Haufe, Stefan

    2018-01-01

    Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
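
    A simplified sketch of the idea on synthetic nested data: a "naive" one-sample t-test on subject means versus a generic inverse-variance-weighted (fixed-effects-style) combination of subject-level effects. This is an illustration under assumed data, not necessarily the exact sufficient-summary-statistic estimator of the paper.

```python
# Simplified sketch: weighting subject-level effects by their inverse variance
# versus a naive t-test on subject means, for nested (trials-within-subjects) data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subjects = 20
subject_effects, subject_vars = [], []
for _ in range(n_subjects):
    n_trials = rng.integers(20, 200)                  # unequal trial counts per subject
    trials = rng.normal(0.3, 2.0, size=n_trials)      # within-subject samples, true effect 0.3
    subject_effects.append(trials.mean())
    subject_vars.append(trials.var(ddof=1) / n_trials)  # variance of the subject mean

d = np.array(subject_effects)
v = np.array(subject_vars)

t_naive, p_naive = stats.ttest_1samp(d, 0.0)          # ignores within-subject variance
w = 1.0 / v
z_weighted = (w * d).sum() / np.sqrt(w.sum())         # inverse-variance weighted z
p_weighted = 2 * stats.norm.sf(abs(z_weighted))

print(f"naive t-test:   t = {t_naive:.2f}, p = {p_naive:.4f}")
print(f"weighted combo: z = {z_weighted:.2f}, p = {p_weighted:.4f}")
```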

  16. GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data.

    PubMed

    Carvalho, Paulo C; Fischer, Juliana Sg; Chen, Emily I; Domont, Gilberto B; Carvalho, Maria Gc; Degrave, Wim M; Yates, John R; Barbosa, Valmir C

    2009-02-24

    Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiform cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple to use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.

  17. An empirical evaluation of genetic distance statistics using microsatellite data from bear (Ursidae) populations.

    PubMed

    Paetkau, D; Waits, L P; Clarkson, P L; Craighead, L; Strobeck, C

    1997-12-01

    A large microsatellite data set from three species of bear (Ursidae) was used to empirically test the performance of six genetic distance measures in resolving relationships at a variety of scales ranging from adjacent areas in a continuous distribution to species that diverged several million years ago. At the finest scale, while some distance measures performed extremely well, statistics developed specifically to accommodate the mutational processes of microsatellites performed relatively poorly, presumably because of the relatively higher variance of these statistics. At the other extreme, no statistic was able to resolve the close sister relationship of polar bears and brown bears from more distantly related pairs of species. This failure is most likely due to constraints on allele distributions at microsatellite loci. At intermediate scales, both within continuous distributions and in comparisons to insular populations of late Pleistocene origin, it was not possible to define the point where linearity was lost for each of the statistics, except that it is clearly lost after relatively short periods of independent evolution. All of the statistics were affected by the amount of genetic diversity within the populations being compared, significantly complicating the interpretation of genetic distance data.

  18. An Empirical Evaluation of Genetic Distance Statistics Using Microsatellite Data from Bear (Ursidae) Populations

    PubMed Central

    Paetkau, D.; Waits, L. P.; Clarkson, P. L.; Craighead, L.; Strobeck, C.

    1997-01-01

    A large microsatellite data set from three species of bear (Ursidae) was used to empirically test the performance of six genetic distance measures in resolving relationships at a variety of scales ranging from adjacent areas in a continuous distribution to species that diverged several million years ago. At the finest scale, while some distance measures performed extremely well, statistics developed specifically to accommodate the mutational processes of microsatellites performed relatively poorly, presumably because of the relatively higher variance of these statistics. At the other extreme, no statistic was able to resolve the close sister relationship of polar bears and brown bears from more distantly related pairs of species. This failure is most likely due to constraints on allele distributions at microsatellite loci. At intermediate scales, both within continuous distributions and in comparisons to insular populations of late Pleistocene origin, it was not possible to define the point where linearity was lost for each of the statistics, except that it is clearly lost after relatively short periods of independent evolution. All of the statistics were affected by the amount of genetic diversity within the populations being compared, significantly complicating the interpretation of genetic distance data. PMID:9409849

  19. Data Interpretation: Using Probability

    ERIC Educational Resources Information Center

    Drummond, Gordon B.; Vowler, Sarah L.

    2011-01-01

    Experimental data are analysed statistically to allow researchers to draw conclusions from a limited set of measurements. The hard fact is that researchers can never be certain that measurements from a sample will exactly reflect the properties of the entire group of possible candidates available to be studied (although using a sample is often the…

  20. STATISTICAL DATA ON CHEMICAL COMPOUNDS.

    DTIC Science & Technology

    DATA STORAGE SYSTEMS, FEASIBILITY STUDIES, COMPUTERS, STATISTICAL DATA, DOCUMENTS, ARMY...CHEMICAL COMPOUNDS, INFORMATION RETRIEVAL), (*INFORMATION RETRIEVAL, CHEMICAL COMPOUNDS), MOLECULAR STRUCTURE, BIBLIOGRAPHIES, DATA PROCESSING

  1. Imputing historical statistics, soils information, and other land-use data to crop area

    NASA Technical Reports Server (NTRS)

    Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.

    1982-01-01

    In foreign crop condition monitoring, satellite-acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map representing, at 60 deg latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.

  2. Teaching Business Statistics with Real Data to Undergraduates and the Use of Technology in the Class Room

    ERIC Educational Resources Information Center

    Singamsetti, Rao

    2007-01-01

    In this paper an attempt is made to highlight some issues of interpretation of statistical concepts and interpretation of results as taught in undergraduate Business statistics courses. The use of modern technology in the class room is shown to have increased the efficiency and the ease of learning and teaching in statistics. The importance of…

  3. Statistical Literacy as a Function of Online versus Hybrid Course Delivery Format for an Introductory Graduate Statistics Course

    ERIC Educational Resources Information Center

    Hahs-Vaughn, Debbie L.; Acquaye, Hannah; Griffith, Matthew D.; Jo, Hang; Matthews, Ken; Acharya, Parul

    2017-01-01

    Statistical literacy refers to understanding fundamental statistical concepts. Assessment of statistical literacy can take the forms of tasks that require students to identify, translate, compute, read, and interpret data. In addition, statistical instruction can take many forms encompassing course delivery format such as face-to-face, hybrid,…

  4. Data and Statistics of DVT/PE

    MedlinePlus

    CDC page providing data and statistics on venous thromboembolism (blood clots), including HA-VTE data and statistics, HA-VTE resources, blood clots and travel, and research and treatment centers.

  5. Data-adaptive test statistics for microarray data.

    PubMed

    Mukherjee, Sach; Roberts, Stephen J; van der Laan, Mark J

    2005-09-01

    An important task in microarray data analysis is the selection of genes that are differentially expressed between different tissue samples, such as healthy and diseased. However, microarray data contain an enormous number of dimensions (genes) and very few samples (arrays), a mismatch which poses fundamental statistical problems for the selection process that have defied easy resolution. In this paper, we present a novel approach to the selection of differentially expressed genes in which test statistics are learned from data using a simple notion of reproducibility in selection results as the learning criterion. Reproducibility, as we define it, can be computed without any knowledge of the 'ground-truth', but takes advantage of certain properties of microarray data to provide an asymptotically valid guide to expected loss under the true data-generating distribution. We are therefore able to indirectly minimize expected loss, and obtain results substantially more robust than conventional methods. We apply our method to simulated and oligonucleotide array data. By request to the corresponding author.

  6. Statistical Policy Working Paper 24. Electronic Dissemination of Statistical Data

    DOT National Transportation Integrated Search

    1995-11-01

    The report, Statistical Policy Working Paper 24, Electronic Dissemination of Statistical Data, includes several topics, such as Options and Best Uses for Different Media Operation of Electronic Dissemination Service, Customer Service Programs, Cost a...

  7. GeoDash: Assisting Visual Image Interpretation in Collect Earth Online by Leveraging Big Data on Google Earth Engine

    NASA Technical Reports Server (NTRS)

    Markert, Kel; Ashmall, William; Johnson, Gary; Saah, David; Mollicone, Danilo; Diaz, Alfonso Sanchez-Paus; Anderson, Eric; Flores, Africa; Griffin, Robert

    2017-01-01

    Collect Earth Online (CEO) is a free and open online implementation of the FAO Collect Earth system for collaboratively collecting environmental data through the visual interpretation of Earth observation imagery. The primary collection mechanism in CEO is human interpretation of land surface characteristics in imagery served via Web Map Services (WMS). However, interpreters may not have enough contextual information to classify samples by only viewing the imagery served via WMS, be they high resolution or otherwise. To assist in the interpretation and collection processes in CEO, SERVIR, a joint NASA-USAID initiative that brings Earth observations to improve environmental decision making in developing countries, developed the GeoDash system, an embedded and critical component of CEO. GeoDash leverages Google Earth Engine (GEE) by allowing users to set up custom browser-based widgets that pull from GEE's massive public data catalog. These widgets can be quick looks of other satellite imagery, time series graphs of environmental variables, and statistics panels of the same. Users can customize widgets with any of GEE's image collections, such as the historical Landsat collection with data available since the 1970s, select date ranges, image stretch parameters, graph characteristics, and create custom layouts, all on-the-fly to support plot interpretation in CEO. This presentation focuses on the implementation and potential applications, including the back-end links to GEE and the user interface with custom widget building. GeoDash takes large data volumes and condenses them into meaningful, relevant information for interpreters. While designed initially with national and global forest resource assessments in mind, the system will complement disaster assessments, agriculture management, project monitoring and evaluation, and more.
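
    A minimal sketch of pulling contextual imagery metadata from the Google Earth Engine catalog with the Python API, in the spirit of the widgets described above. It assumes an authenticated Earth Engine account; the collection ID, date range, and coordinates are illustrative assumptions, not taken from GeoDash itself.

```python
# Minimal sketch: query a Landsat collection around a hypothetical interpretation
# plot with the Earth Engine Python API (requires prior ee.Authenticate()).
import ee

ee.Initialize()

plot_point = ee.Geometry.Point([102.0, 18.5])          # hypothetical plot location
landsat = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
           .filterBounds(plot_point)
           .filterDate("2017-01-01", "2018-01-01"))

print("scenes over the plot in 2017:", landsat.size().getInfo())
print("first acquisition timestamps (ms):",
      landsat.aggregate_array("system:time_start").getInfo()[:5])
```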

  8. GeoDash: Assisting Visual Image Interpretation in Collect Earth Online by Leveraging Big Data on Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Markert, K. N.; Ashmall, W.; Johnson, G.; Saah, D. S.; Anderson, E.; Flores Cordova, A. I.; Díaz, A. S. P.; Mollicone, D.; Griffin, R.

    2017-12-01

    Collect Earth Online (CEO) is a free and open online implementation of the FAO Collect Earth system for collaboratively collecting environmental data through the visual interpretation of Earth observation imagery. The primary collection mechanism in CEO is human interpretation of land surface characteristics in imagery served via Web Map Services (WMS). However, interpreters may not have enough contextual information to classify samples by only viewing the imagery served via WMS, be they high resolution or otherwise. To assist in the interpretation and collection processes in CEO, SERVIR, a joint NASA-USAID initiative that brings Earth observations to improve environmental decision making in developing countries, developed the GeoDash system, an embedded and critical component of CEO. GeoDash leverages Google Earth Engine (GEE) by allowing users to set up custom browser-based widgets that pull from GEE's massive public data catalog. These widgets can be quick looks of other satellite imagery, time series graphs of environmental variables, and statistics panels of the same. Users can customize widgets with any of GEE's image collections, such as the historical Landsat collection with data available since the 1970s, select date ranges, image stretch parameters, graph characteristics, and create custom layouts, all on-the-fly to support plot interpretation in CEO. This presentation focuses on the implementation and potential applications, including the back-end links to GEE and the user interface with custom widget building. GeoDash takes large data volumes and condenses them into meaningful, relevant information for interpreters. While designed initially with national and global forest resource assessments in mind, the system will complement disaster assessments, agriculture management, project monitoring and evaluation, and more.

  9. Interpretable Categorization of Heterogeneous Time Series Data

    NASA Technical Reports Server (NTRS)

    Lee, Ritchie; Kochenderfer, Mykel J.; Mengshoel, Ole J.; Silbermann, Joshua

    2017-01-01

    We analyze data from simulated aircraft encounters to validate and inform the development of a prototype aircraft collision avoidance system. The high-dimensional and heterogeneous time series dataset is analyzed to discover properties of near mid-air collisions (NMACs) and categorize the NMAC encounters. Domain experts use these properties to better organize and understand NMAC occurrences. Existing solutions either are not capable of handling high-dimensional and heterogeneous time series datasets or do not provide explanations that are interpretable by a domain expert. The latter is critical to the acceptance and deployment of safety-critical systems. To address this gap, we propose grammar-based decision trees along with a learning algorithm. Our approach extends decision trees with a grammar framework for classifying heterogeneous time series data. A context-free grammar is used to derive decision expressions that are interpretable, application-specific, and support heterogeneous data types. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to a simulated aircraft encounter dataset and evaluate the performance of four variants of our learning algorithm. The best algorithm is used to analyze and categorize near mid-air collisions in the aircraft encounter dataset. We describe each discovered category in detail and discuss its relevance to aircraft collision avoidance.

  10. Data-driven inference for the spatial scan statistic.

    PubMed

    Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C

    2011-08-02

    Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is done, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based in this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
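
    A heavily simplified sketch of the standard inference the paper refines: a one-dimensional Poisson scan over contiguous windows with a Monte Carlo p-value. The data, window limit, and number of replications are assumptions; this is neither SaTScan nor the proposed size-conditional modification.

```python
# Simplified one-dimensional Poisson scan statistic with Monte Carlo inference.
import numpy as np

rng = np.random.default_rng(3)

def poisson_llr(c, e, C):
    """Kulldorff-style Poisson log-likelihood ratio for a window with
    c observed cases, e expected cases, and C total cases."""
    if c <= e:
        return 0.0
    if c >= C:
        return c * np.log(c / e)
    return c * np.log(c / e) + (C - c) * np.log((C - c) / (C - e))

def max_llr(cases, pop, max_width=10):
    """Log-likelihood ratio of the most likely contiguous cluster."""
    C, P = cases.sum(), pop.sum()
    best = 0.0
    for width in range(1, max_width + 1):
        for start in range(len(cases) - width + 1):
            c = cases[start:start + width].sum()
            e = C * pop[start:start + width].sum() / P
            best = max(best, poisson_llr(c, e, C))
    return best

pop = rng.integers(500, 2000, size=50).astype(float)     # synthetic populations per area
cases = rng.poisson(pop * 0.01)                          # baseline risk 1%
cases[20:24] = rng.poisson(pop[20:24] * 0.02)            # planted cluster with doubled risk

observed = max_llr(cases, pop)
null_stats = [max_llr(rng.multinomial(cases.sum(), pop / pop.sum()), pop)
              for _ in range(499)]                       # simulate under null, conditioning on total cases
p_value = (1 + sum(s >= observed for s in null_stats)) / (1 + len(null_stats))
print(f"max LLR = {observed:.2f}, Monte Carlo p = {p_value:.3f}")
```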

  11. Interpreting the Weibull fitting parameters for diffusion-controlled release data

    NASA Astrophysics Data System (ADS)

    Ignacio, Maxime; Chubynsky, Mykyta V.; Slater, Gary W.

    2017-11-01

    We examine the diffusion-controlled release of molecules from passive delivery systems using both analytical solutions of the diffusion equation and numerically exact Lattice Monte Carlo data. For very short times, the release process follows a √t power law, typical of diffusion processes, while the long-time asymptotic behavior is exponential. The crossover time between these two regimes is determined by the boundary conditions and initial loading of the system. We show that while the widely used Weibull function provides a reasonable fit (in terms of statistical error), it has two major drawbacks: (i) it does not capture the correct limits and (ii) there is no direct connection between the fitting parameters and the properties of the system. Using a physically motivated interpolating fitting function that correctly includes both time regimes, we are able to predict the values of the Weibull parameters which allows us to propose a physical interpretation.
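
    A short sketch of the Weibull fit under discussion, F(t) = 1 - exp(-(t/tau)^beta), applied to synthetic fractional-release data. The "true" curve and noise level are assumptions chosen only to mimic diffusion-like release.

```python
# Sketch: fitting the Weibull release function to synthetic fractional-release data.
import numpy as np
from scipy.optimize import curve_fit

def weibull_release(t, tau, beta):
    return 1.0 - np.exp(-(t / tau) ** beta)

t = np.linspace(0.05, 10.0, 60)
true_fraction = 1.0 - np.exp(-(t / 2.5) ** 0.7)            # assumed "true" release curve
observed = np.clip(true_fraction + np.random.default_rng(4).normal(0, 0.01, t.size), 0, 1)

(tau_hat, beta_hat), _ = curve_fit(weibull_release, t, observed, p0=(1.0, 1.0))
print(f"tau = {tau_hat:.2f}, beta = {beta_hat:.2f}")       # beta < 1 is typical of diffusion-like release
```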

  12. The Malpractice of Statistical Interpretation

    ERIC Educational Resources Information Center

    Fraas, John W.; Newman, Isadore

    1978-01-01

    Problems associated with the use of gain scores, analysis of covariance, multicollinearity, part and partial correlation, and the lack of rectilinearity in regression are discussed. Particular attention is paid to the misuse of statistical techniques. (JKS)

  13. Geovisual analytics to enhance spatial scan statistic interpretation: an analysis of U.S. cervical cancer mortality.

    PubMed

    Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; Maceachren, Alan M

    2008-11-07

    Kulldorff's spatial scan statistic and its software implementation - SaTScan - are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. Finally, we propose a logical approach to proceed through the analysis of

  14. Geovisual analytics to enhance spatial scan statistic interpretation: an analysis of U.S. cervical cancer mortality

    PubMed Central

    Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; MacEachren, Alan M

    2008-01-01

    Background Kulldorff's spatial scan statistic and its software implementation – SaTScan – are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. Results We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. Finally, we propose a logical approach to proceed

  15. The Lure of Statistics in Data Mining

    ERIC Educational Resources Information Center

    Grover, Lovleen Kumar; Mehra, Rajni

    2008-01-01

    The field of Data Mining like Statistics concerns itself with "learning from data" or "turning data into information". For statisticians the term "Data mining" has a pejorative meaning. Instead of finding useful patterns in large volumes of data as in the case of Statistics, data mining has the connotation of searching for data to fit preconceived…

  16. Data selection techniques in the interpretation of MAGSAT data over Australia

    NASA Technical Reports Server (NTRS)

    Johnson, B. D.; Dampney, C. N. G.

    1983-01-01

    The MAGSAT data require critical selection in order to produce a self-consistent data set suitable for map construction and subsequent interpretation. Interactive data selection techniques are described which involve the use of a special-purpose profile-oriented data base and a colour graphics display. The careful application of these data selection techniques permits validation every data value and ensures that the best possible self-consistent data set is being used to construct the maps of the magnetic field measured at satellite altitudes over Australia.

  17. Statistical Inference for Data Adaptive Target Parameters.

    PubMed

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Suppose one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample into V equal-size sub-samples, and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference in problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
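
    A toy sketch of the sample-splitting idea under assumed synthetic data: each parameter-generating sample adaptively picks the covariate most correlated with the outcome, the held-out estimation sample estimates that correlation, and the V fold-specific estimates are averaged. It illustrates the structure of the approach, not the paper's exact estimator or inference.

```python
# Toy sketch of a sample-split data-adaptive target parameter.
import numpy as np

rng = np.random.default_rng(5)
n, p, V = 500, 20, 5
X = rng.normal(size=(n, p))
y = 0.4 * X[:, 3] + rng.normal(size=n)        # covariate 3 is truly relevant

folds = np.array_split(rng.permutation(n), V)
estimates = []
for v in range(V):
    est_idx = folds[v]                                                # estimation sample
    gen_idx = np.concatenate([folds[u] for u in range(V) if u != v])  # parameter-generating sample

    # data-adaptively define the target on the parameter-generating sample
    cors = [np.corrcoef(X[gen_idx, j], y[gen_idx])[0, 1] for j in range(p)]
    j_star = int(np.argmax(np.abs(cors)))

    # estimate that fold-specific parameter on the estimation sample
    estimates.append(np.corrcoef(X[est_idx, j_star], y[est_idx])[0, 1])

estimates = np.array(estimates)
print("averaged data-adaptive estimate:", estimates.mean().round(3))
print("rough standard error:           ", (estimates.std(ddof=1) / np.sqrt(V)).round(3))
```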

  18. Using Statistical Mechanics and Entropy Principles to Interpret Variability in Power Law Models of the Streamflow Recession

    NASA Astrophysics Data System (ADS)

    Dralle, D.; Karst, N.; Thompson, S. E.

    2015-12-01

    Multiple competing theories suggest that power law behavior governs the observed first-order dynamics of streamflow recessions, the important process by which catchments dry out via the stream network, altering the availability of surface water resources and in-stream habitat. Frequently modeled as dq/dt = -aq^b, recessions typically exhibit a high degree of variability, even within a single catchment, as revealed by significant shifts in the values of "a" and "b" across recession events. One potential source of this variability lies in underlying, hard-to-observe fluctuations in how catchment water storage is partitioned amongst distinct storage elements, each having different discharge behaviors. Testing this and competing hypotheses with widely available streamflow time series, however, has been hindered by a power law scaling artifact that obscures meaningful covariation between the recession parameters, "a" and "b". Here we briefly outline a technique that removes this artifact, revealing intriguing new patterns in the joint distribution of recession parameters. Using long-term flow data from catchments in Northern California, we explore temporal variations, and find that the "a" parameter varies strongly with catchment wetness. Then we explore how the "b" parameter changes with "a", and find that measures of its variation are maximized at intermediate "a" values. We propose an interpretation of this pattern based on statistical mechanics, meaning "b" can be viewed as an indicator of the catchment "microstate" (i.e. the partitioning of storage) and "a" as a measure of the catchment macrostate (i.e. the total storage). In statistical mechanics, entropy (i.e. microstate variance, that is the variance of "b") is maximized for intermediate values of extensive variables (i.e. wetness, "a"), as observed in the recession data. This interpretation of "a" and "b" was supported by model runs using a multiple-reservoir catchment toy model, and lends support to the
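
    For concreteness, a sketch of recovering the recession parameters a and b from a single synthetic recession by regressing log(-dq/dt) on log(q), one common way of fitting dq/dt = -aq^b to a recession limb; the true parameter values and time step are assumptions.

```python
# Sketch: estimate a and b in dq/dt = -a*q^b from a synthetic recession limb.
import numpy as np

a_true, b_true, dt = 0.05, 1.5, 1.0
q = [10.0]
for _ in range(60):                          # explicit-Euler synthetic recession
    q.append(q[-1] - dt * a_true * q[-1] ** b_true)
q = np.array(q)

dq_dt = np.diff(q) / dt
q_mid = 0.5 * (q[:-1] + q[1:])               # evaluate q at interval midpoints
b_hat, log_a_hat = np.polyfit(np.log(q_mid), np.log(-dq_dt), 1)
print(f"estimated b = {b_hat:.2f}, a = {np.exp(log_a_hat):.3f}")
```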

  19. Accessing seismic data through geological interpretation: Challenges and solutions

    NASA Astrophysics Data System (ADS)

    Butler, R. W.; Clayton, S.; McCaffrey, B.

    2008-12-01

    Between them, the world's research programs, national institutions and corporations, especially oil and gas companies, have acquired substantial volumes of seismic reflection data. Although the vast majority are proprietary and confidential, significant data are released and available for research, including those in public data libraries. The challenge now is to maximise the use of these data by providing routes to seismic data not simply on the basis of acquisition or processing attributes but via the geology they image. The Virtual Seismic Atlas (VSA: www.seismicatlas.org) meets this challenge by providing an independent, free-to-use, community-based internet resource that captures and shares the geological interpretation of seismic data globally. Images and associated documents are explicitly indexed by extensive metadata trees, using not only existing survey and geographical data but also the geology they portray. The solution uses a Documentum database interrogated through Endeca Guided Navigation to search, discover and retrieve images. The VSA allows users to compare contrasting interpretations of clean data, thereby exploring the ranges of uncertainty in the geometric interpretation of subsurface structure. The metadata structures can be used to link reports and published research together with other data types such as wells. And the VSA can link to existing data libraries. Searches can take different paths, revealing arrays of geological analogues and new datasets while providing entirely novel insights and genuine surprises. This can then drive new creative opportunities for research and training, and expose the contents of seismic data libraries to the world.

  20. Statistical Literacy: Data Tell a Story

    ERIC Educational Resources Information Center

    Sole, Marla A.

    2016-01-01

    Every day, students collect, organize, and analyze data to make decisions. In this data-driven world, people need to assess how much trust they can place in summary statistics. The results of every survey and the safety of every drug that undergoes a clinical trial depend on the correct application of appropriate statistics. Recognizing the…

  1. Developments in FT-ICR MS instrumentation, ionization techniques, and data interpretation methods for petroleomics.

    PubMed

    Cho, Yunju; Ahmed, Arif; Islam, Annana; Kim, Sunghwan

    2015-01-01

    Because of the increasing importance of heavy and unconventional crude oil as an energy source, there is a growing need for petroleomics: the pursuit of more complete and detailed knowledge of the chemical compositions of crude oil. Crude oil has an extremely complex nature; hence, techniques with ultra-high resolving capabilities, such as Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), are necessary. FT-ICR MS has been successfully applied to the study of heavy and unconventional crude oils such as bitumen and shale oil. However, the analysis of crude oil with FT-ICR MS is not trivial, and it has pushed analysis to the limits of instrumental and methodological capabilities. For example, high-resolution mass spectra of crude oils may contain over 100,000 peaks that require interpretation. To visualize large data sets more effectively, data processing methods such as Kendrick mass defect analysis and statistical analyses have been developed. The successful application of FT-ICR MS to the study of crude oil has been critically dependent on key developments in FT-ICR MS instrumentation and data processing methods. This review offers an introduction to the basic principles, FT-ICR MS instrumentation development, ionization techniques, and data interpretation methods for petroleomics and is intended for readers having no prior experience in this field of study. © 2014 Wiley Periodicals, Inc.
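
    A short sketch of the Kendrick mass defect (KMD) calculation mentioned above, using the common CH2-based rescaling so that members of a homologous series share approximately the same KMD; the m/z values are hypothetical and the sign convention is one of several in use.

```python
# Sketch of a CH2-based Kendrick mass defect calculation for illustrative peaks.
import numpy as np

mz = np.array([282.2792, 296.2948, 310.3105, 338.3418])   # hypothetical singly charged peaks

kendrick_mass = mz * (14.0 / 14.01565)       # rescale so that CH2 has Kendrick mass exactly 14
nominal_km = np.round(kendrick_mass)
kmd = nominal_km - kendrick_mass             # one common sign convention

for m, d in zip(mz, kmd):
    print(f"m/z {m:9.4f}  KMD {d:+.4f}")     # the CH2 homologues line up at nearly equal KMD
```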

  2. SOCR: Statistics Online Computational Resource

    ERIC Educational Resources Information Center

    Dinov, Ivo D.

    2006-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an…

  3. DATA ON YOUTH, 1967, A STATISTICAL DOCUMENT.

    ERIC Educational Resources Information Center

    SCHEIDER, GEORGE

    THE DATA IN THIS REPORT ARE STATISTICS ON YOUTH THROUGHOUT THE UNITED STATES AND IN NEW YORK STATE. INCLUDED ARE DATA ON POPULATION, SCHOOL STATISTICS, EMPLOYMENT, FAMILY INCOME, JUVENILE DELINQUENCY AND YOUTH CRIME (INCLUDING NEW YORK CITY FIGURES), AND TRAFFIC ACCIDENTS. THE STATISTICS ARE PRESENTED IN THE TEXT AND IN TABLES AND CHARTS. (NH)

  4. Hold My Calls: An Activity for Introducing the Statistical Process

    ERIC Educational Resources Information Center

    Abel, Todd; Poling, Lisa

    2015-01-01

    Working with practicing teachers, this article demonstrates, through the facilitation of a statistical activity, how to introduce and investigate the unique qualities of the statistical process including: formulate a question, collect data, analyze data, and interpret data.

  5. ArraySolver: an algorithm for colour-coded graphical display and Wilcoxon signed-rank statistics for comparing microarray gene expression data.

    PubMed

    Khan, Haseeb Ahmad

    2004-01-01

    The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-groups comparison of microarray data is still lacking and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to another. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann-Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-groups comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets, whereas the former program appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform.
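
    A minimal sketch of the underlying test: a Wilcoxon signed-rank comparison of paired two-condition expression values for a single gene, as could be applied gene-by-gene in a matched two-group comparison. The values are illustrative, and this uses SciPy rather than the ArraySolver tool itself.

```python
# Sketch: Wilcoxon signed-rank test on paired expression values for one gene.
from scipy import stats

control = [7.9, 8.1, 8.4, 7.7, 8.0, 8.2, 7.8, 8.3]
treated = [8.6, 8.9, 8.8, 8.2, 8.7, 9.0, 8.4, 8.8]

stat, p = stats.wilcoxon(control, treated)
print(f"W = {stat}, p = {p:.4f}")
```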

  6. ArraySolver: An Algorithm for Colour-Coded Graphical Display and Wilcoxon Signed-Rank Statistics for Comparing Microarray Gene Expression Data

    PubMed Central

    2004-01-01

    The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-groups comparison of microarray data is still lacking and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to another. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann–Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-groups comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets, whereas the former program appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform. PMID:18629036

  7. On the statistical assessment of classifiers using DNA microarray data

    PubMed Central

    Ancona, N; Maglietta, R; Piepoli, A; D'Addabbo, A; Cotugno, R; Savino, M; Liuni, S; Carella, M; Pesole, G; Perri, F

    2006-01-01

    Background In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia – Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer, such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate decreases as the training set
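
    A sketch of the kind of permutation-based significance assessment described above, using scikit-learn on synthetic data with a microarray-like shape (few samples, many genes). The sample sizes, effect size, and cross-validation scheme are assumptions; this is not the paper's data or exact procedure.

```python
# Sketch: permutation test of cross-validated classifier accuracy on synthetic
# microarray-like data (47 specimens, 2000 genes).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import permutation_test_score, StratifiedKFold

rng = np.random.default_rng(6)
X = rng.normal(size=(47, 2000))              # 47 specimens, 2000 genes
y = np.array([0] * 22 + [1] * 25)            # "normal" vs "tumor" labels
X[y == 1, :30] += 0.8                        # 30 informative genes (assumed effect)

score, perm_scores, p_value = permutation_test_score(
    SVC(kernel="linear"), X, y,
    cv=StratifiedKFold(n_splits=5), n_permutations=200, random_state=0)

print(f"cross-validated accuracy = {score:.2f}")
print(f"permutation p-value      = {p_value:.3f}")
```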

  8. Statistical Considerations of Data Processing in Giovanni Online Tool

    NASA Technical Reports Server (NTRS)

    Suhung, Shen; Leptoukh, G.; Acker, J.; Berrick, S.

    2005-01-01

    The GES DISC Interactive Online Visualization and Analysis Infrastructure (Giovanni) is a web-based interface for the rapid visualization and analysis of gridded data from a number of remote sensing instruments. The GES DISC currently employs several Giovanni instances to analyze various products, such as Ocean-Giovanni for ocean products from SeaWiFS and MODIS-Aqua; TOMS & OMI Giovanni for atmospheric chemical trace gases from TOMS and OMI, and MOVAS for aerosols from MODIS, etc. (http://giovanni.gsfc.nasa.gov). Foremost among the Giovanni statistical functions is data averaging. Two aspects of this function are addressed here. The first deals with the accuracy of averaging gridded mapped products vs. averaging from the ungridded Level 2 data. Some mapped products contain mean values only; others contain additional statistics, such as number of pixels (NP) for each grid, standard deviation, etc. Since NP varies spatially and temporally, averaging with or without weighting by NP will be different. In this paper, we address differences among various weighting algorithms for some datasets utilized in Giovanni. The second aspect is related to different averaging methods affecting data quality and interpretation for data with non-normal distribution. The present study demonstrates results of different spatial averaging methods using gridded SeaWiFS Level 3 mapped monthly chlorophyll a data. Spatial averages were calculated using three different methods: arithmetic mean (AVG), geometric mean (GEO), and maximum likelihood estimator (MLE). Biogeochemical data, such as chlorophyll a, are usually considered to have a log-normal distribution. The study determined that differences between methods tend to increase with increasing size of a selected coastal area, with no significant differences in most open oceans. The GEO method consistently produces values lower than AVG and MLE. The AVG method produces values larger than MLE in some cases, but smaller in other cases. Further
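
    To illustrate the three averaging estimators compared above on log-normally distributed data, here is a minimal sketch on synthetic chlorophyll-like values; the distribution parameters are assumptions, not SeaWiFS data.

```python
# Sketch: arithmetic mean (AVG), geometric mean (GEO), and lognormal MLE of the
# mean, exp(mu + sigma^2/2), on synthetic log-normally distributed values.
import numpy as np

rng = np.random.default_rng(7)
chl = rng.lognormal(mean=-1.0, sigma=1.2, size=2000)   # synthetic chlorophyll-a-like values

avg = chl.mean()
geo = np.exp(np.log(chl).mean())
mu, sigma = np.log(chl).mean(), np.log(chl).std(ddof=1)
mle = np.exp(mu + sigma ** 2 / 2.0)

print(f"AVG = {avg:.3f}, GEO = {geo:.3f}, MLE = {mle:.3f}")   # GEO falls below AVG and MLE
```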

  9. Sources of Safety Data and Statistical Strategies for Design and Analysis: Postmarket Surveillance.

    PubMed

    Izem, Rima; Sanchez-Kam, Matilde; Ma, Haijun; Zink, Richard; Zhao, Yueqin

    2018-03-01

    Safety data are continuously evaluated throughout the life cycle of a medical product to accurately assess and characterize the risks associated with the product. The knowledge about a medical product's safety profile continually evolves as safety data accumulate. This paper discusses data sources and analysis considerations for safety signal detection after a medical product is approved for marketing. This manuscript is the second in a series of papers from the American Statistical Association Biopharmaceutical Section Safety Working Group. We share our recommendations for the statistical and graphical methodologies necessary to appropriately analyze, report, and interpret safety outcomes, and we discuss the advantages and disadvantages of safety data obtained from passive postmarketing surveillance systems compared to other sources. Signal detection has traditionally relied on spontaneous reporting databases that have been available worldwide for decades. However, current regulatory guidelines and ease of reporting have increased the size of these databases exponentially over the last few years. With such large databases, data-mining tools using disproportionality analysis and helpful graphics are often used to detect potential signals. Although the data sources have many limitations, analyses of these data have been successful at identifying safety signals postmarketing. Experience analyzing these dynamic data is useful in understanding the potential and limitations of analyses with new data sources such as social media, claims, or electronic medical records data.
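
    A short sketch of two standard disproportionality measures used in spontaneous-report signal detection (the proportional reporting ratio and the reporting odds ratio), computed from a 2x2 table of drug-of-interest versus all other drugs against event-of-interest versus all other events. The counts are hypothetical.

```python
# Sketch: PRR and ROR (with a 95% CI on the ROR) from a hypothetical 2x2 table.
import math

a, b = 40, 960        # drug of interest: event of interest, other events
c, d = 200, 98800     # all other drugs:  event of interest, other events

prr = (a / (a + b)) / (c / (c + d))
ror = (a * d) / (b * c)
se_log_ror = math.sqrt(1/a + 1/b + 1/c + 1/d)
ci = (math.exp(math.log(ror) - 1.96 * se_log_ror),
      math.exp(math.log(ror) + 1.96 * se_log_ror))

print(f"PRR = {prr:.1f}, ROR = {ror:.1f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```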

  10. Transportation Statistical Data and Information

    DOT National Transportation Integrated Search

    1976-12-01

    The document contains an extensive review of internal and external sources of transportation data and statistics especially created for data administrators. Organized around the transportation industry and around the elements of the U.S. Department o...

  11. Interpreting the gamma statistic in phylogenetic diversification rate studies: a rate decrease does not necessarily indicate an early burst.

    PubMed

    Fordyce, James A

    2010-07-23

    Phylogenetic hypotheses are increasingly being used to elucidate historical patterns of diversification rate variation. Hypothesis testing is often conducted by comparing the observed vector of branching times to a null, pure-birth expectation. A popular method for inferring a decrease in speciation rate, which might suggest an early burst of diversification followed by a decrease in diversification rate, is the gamma statistic. Using simulations under varying conditions, I examine the sensitivity of gamma to the distribution of the most recent branching times. Using an exploratory data analysis tool for lineages through time plots, tree deviation, I identified trees with a significant gamma statistic that do not appear to have the characteristic early accumulation of lineages consistent with an early, rapid rate of cladogenesis. I further investigated the sensitivity of the gamma statistic to recent diversification by examining the consequences of failing to simulate the full time interval following the most recent cladogenic event. The power of gamma to detect rate decrease at varying times was assessed for simulated trees with an initial high rate of diversification followed by a relatively low rate. The gamma statistic is extraordinarily sensitive to recent diversification rates, and does not necessarily detect early bursts of diversification. This was true for trees of various sizes and completeness of taxon sampling. The gamma statistic had greater power to detect recent diversification rate decreases compared to early bursts of diversification. Caution should be exercised when interpreting the gamma statistic as an indication of early, rapid diversification.
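
    For readers unfamiliar with the statistic, the following is a minimal sketch of the Pybus-Harvey gamma computed from internode intervals (the waiting times g_k during which the reconstructed tree has exactly k lineages); it is an illustration of the formula, not the simulation code used in the study.

```python
import math

def gamma_statistic(g):
    """Pybus & Harvey gamma from internode intervals.

    g : list [g_2, g_3, ..., g_n], where g_k is the time during which
        the reconstructed phylogeny contains exactly k lineages.
    """
    n = len(g) + 1                       # number of tips
    weighted = [k * gk for k, gk in zip(range(2, n + 1), g)]
    T = sum(weighted)                    # total weighted waiting time
    partial, running = 0.0, 0.0
    for term in weighted[:-1]:           # cumulative sums T_2 ... T_(n-1)
        running += term
        partial += running
    mean_partial = partial / (n - 2)
    return (mean_partial - T / 2.0) / (T * math.sqrt(1.0 / (12 * (n - 2))))

# Under a constant-rate pure-birth null, gamma is approximately N(0, 1);
# strongly negative values are commonly read as early bursts of diversification.
print(gamma_statistic([0.5, 0.4, 0.3, 0.25, 0.2, 0.15, 0.1]))
```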

  12. Research on the Construction of Remote Sensing Automatic Interpretation Symbol Big Data

    NASA Astrophysics Data System (ADS)

    Gao, Y.; Liu, R.; Liu, J.; Cheng, T.

    2018-04-01

    Remote sensing automatic interpretation symbol (RSAIS) data provide an inexpensive and fast means of supplying precise in-situ information for image interpretation and accuracy assessment. This study designed a scientific and precise RSAIS data characterization method, as well as a distributed, cloud-architecture massive data storage method. Additionally, it introduced an offline and online data update mode and a dynamic data evaluation mechanism, with the aim of creating an efficient approach for RSAIS big data construction. Finally, a national RSAIS database with more than 3 million samples covering 86 land types was constructed during 2013-2015 based on the National Geographic Conditions Monitoring Project of China and has been updated annually since 2016. The RSAIS big data have proven to be a valuable resource for large-scale image interpretation and field validation. It is also notable that they have the potential to support automatic image interpretation with the assistance of deep learning technology in the remote sensing big data era.

  13. Integrated Analysis of Pharmacologic, Clinical, and SNP Microarray Data using Projection onto the Most Interesting Statistical Evidence with Adaptive Permutation Testing

    PubMed Central

    Pounds, Stan; Cao, Xueyuan; Cheng, Cheng; Yang, Jun; Campana, Dario; Evans, William E.; Pui, Ching-Hon; Relling, Mary V.

    2010-01-01

    Powerful methods for integrated analysis of multiple biological data sets are needed to maximize interpretation capacity and acquire meaningful knowledge. We recently developed Projection Onto the Most Interesting Statistical Evidence (PROMISE). PROMISE is a statistical procedure that incorporates prior knowledge about the biological relationships among endpoint variables into an integrated analysis of microarray gene expression data with multiple biological and clinical endpoints. Here, PROMISE is adapted to the integrated analysis of pharmacologic, clinical, and genome-wide genotype data, incorporating knowledge about the biological relationships among pharmacologic and clinical response data. An efficient permutation-testing algorithm is introduced so that statistical calculations are computationally feasible in this higher-dimensional setting. The new method is applied to a pediatric leukemia data set. The results clearly indicate that PROMISE is a powerful statistical tool for identifying genomic features that exhibit a biologically meaningful pattern of association with multiple endpoint variables. PMID:21516175
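
    The efficiency gain from adaptive permutation testing can be sketched generically: stop permuting a feature once enough permuted statistics have exceeded the observed one, because its p-value is then clearly far from significance. The snippet below is a bare-bones illustration under that assumption, not the PROMISE implementation.

```python
import numpy as np

def adaptive_perm_pvalue(stat_fn, data, labels, max_perm=10000, exceed_cap=20, seed=0):
    """Estimate a permutation p-value, stopping early once `exceed_cap`
    permuted statistics reach or exceed the observed statistic."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(data, labels)
    exceed, done = 0, 0
    for _ in range(max_perm):
        done += 1
        if stat_fn(data, rng.permutation(labels)) >= observed:
            exceed += 1
            if exceed >= exceed_cap:      # p-value is clearly large; stop early
                break
    return (exceed + 1) / (done + 1)       # add-one correction

# Example: absolute difference in group means as the test statistic
diff_means = lambda x, lab: abs(x[lab == 1].mean() - x[lab == 0].mean())
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 50), rng.normal(0.8, 1, 50)])
lab = np.repeat([0, 1], 50)
print(adaptive_perm_pvalue(diff_means, x, lab))
```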

  14. Analyzing and interpreting genome data at the network level with ConsensusPathDB.

    PubMed

    Herwig, Ralf; Hardt, Christopher; Lienhard, Matthias; Kamburov, Atanas

    2016-10-01

    ConsensusPathDB consists of a comprehensive collection of human (as well as mouse and yeast) molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. This protocol describes the use of ConsensusPathDB (http://consensuspathdb.org) with respect to the functional and network-based characterization of biomolecules (genes, proteins and metabolites) that are submitted to the system either as a priority list or together with associated experimental data such as RNA-seq. The tool reports interaction network modules, biochemical pathways and functional information that are significantly enriched by the user's input, applying computational methods for statistical over-representation, enrichment and graph analysis. The results of this protocol can be observed within a few minutes, even with genome-wide data. The resulting network associations can be used to interpret high-throughput data mechanistically, to characterize and prioritize biomarkers, to integrate different omics levels, to design follow-up functional assay experiments and to generate topology for kinetic models at different scales.
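
    The statistical over-representation step that such tools apply can be sketched with a one-sided hypergeometric test: given a user gene list of size n drawn from a background of size M, it gives the probability of observing at least k hits in a pathway containing K background genes. This minimal sketch uses scipy and invented counts; it is not the ConsensusPathDB implementation.

```python
from scipy.stats import hypergeom

def overrepresentation_pvalue(k, M, K, n):
    """P(X >= k) for X ~ Hypergeometric(M, K, n):
       k = overlap between user list and pathway,
       M = background size, K = pathway size, n = user list size."""
    return hypergeom.sf(k - 1, M, K, n)

# Example: 12 of a 300-gene list fall in a 40-gene pathway (20,000-gene background)
print(overrepresentation_pvalue(k=12, M=20000, K=40, n=300))
```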

  15. Hemophilia Data and Statistics

    MedlinePlus

    ... genetic testing is done to diagnose hemophilia before birth. For the one-third ... rates and hospitalization rates for bleeding complications from hemophilia ...

  16. Establishing Statistical Equivalence of Data from Different Sampling Approaches for Assessment of Bacterial Phenotypic Antimicrobial Resistance

    PubMed Central

    2018-01-01

    ABSTRACT To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli. These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance

  17. Establishing Statistical Equivalence of Data from Different Sampling Approaches for Assessment of Bacterial Phenotypic Antimicrobial Resistance.

    PubMed

    Shakeri, Heman; Volkova, Victoriya; Wen, Xuesong; Deters, Andrea; Cull, Charley; Drouillard, James; Müller, Christian; Moradijamei, Behnaz; Jaberi-Douraki, Majid

    2018-05-01

    To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli. These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the
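
    A minimal sketch of the two one-sided tests (TOST) approach to equivalence on which such analyses are typically based is given below, applied to hypothetical log2 MIC values from two sample types with a user-chosen equivalence margin; it illustrates the general technique rather than the authors' exact procedure.

```python
import numpy as np
from scipy import stats

def tost_equivalence(x, y, margin):
    """Two one-sided t-tests for equivalence of means within +/- margin
    (pooled-variance version). Equivalence is declared at level alpha
    if the returned p-value is below alpha."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = len(x), len(y)
    diff = x.mean() - y.mean()
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    p_lower = stats.t.sf((diff + margin) / se, df)   # H0: diff <= -margin
    p_upper = stats.t.cdf((diff - margin) / se, df)  # H0: diff >= +margin
    return max(p_lower, p_upper)

# Hypothetical log2 MIC values from two sample types, margin of one dilution step
rng = np.random.default_rng(2)
a = rng.normal(3.0, 1.0, 60)
b = rng.normal(3.2, 1.0, 60)
print(tost_equivalence(a, b, margin=1.0))
```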

  18. A nonparametric spatial scan statistic for continuous data.

    PubMed

    Jung, Inkyung; Cho, Ho Jin

    2015-10-20

    Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compared the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
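
    To make the idea concrete, here is a rough, simplified sketch of a rank-based spatial scan in the spirit described above: circular windows around each location are scanned, the rank-sum statistic compares values inside versus outside each window, and significance of the most extreme window is assessed by Monte Carlo permutation. It is a toy illustration under those assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import rankdata

def max_ranksum_z(coords, values, radius):
    """Max absolute standardized rank-sum statistic over circular windows."""
    ranks = rankdata(values)
    N = len(values)
    best = 0.0
    for center in coords:
        inside = np.linalg.norm(coords - center, axis=1) <= radius
        n1 = inside.sum()
        if n1 == 0 or n1 == N:
            continue
        W = ranks[inside].sum()
        mu = n1 * (N + 1) / 2.0
        var = n1 * (N - n1) * (N + 1) / 12.0
        best = max(best, abs(W - mu) / np.sqrt(var))
    return best

def scan_pvalue(coords, values, radius, n_sim=199, seed=0):
    rng = np.random.default_rng(seed)
    observed = max_ranksum_z(coords, values, radius)
    null = [max_ranksum_z(coords, rng.permutation(values), radius)
            for _ in range(n_sim)]
    return (1 + sum(s >= observed for s in null)) / (n_sim + 1)

rng = np.random.default_rng(3)
xy = rng.uniform(0, 10, size=(80, 2))
vals = rng.lognormal(0, 1, 80)
vals[np.linalg.norm(xy - [5, 5], axis=1) < 1.5] *= 3   # planted high-value cluster
print(scan_pvalue(xy, vals, radius=1.5))
```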

  19. Recent statistical methods for orientation data

    NASA Technical Reports Server (NTRS)

    Batschelet, E.

    1972-01-01

    The application of statistical methods for determining the areas of animal orientation and navigation is discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations, and tables of data are developed to show the value of information obtained by statistical analysis.
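
    One of the standard tests in this setting is the Rayleigh test for uniformity of directions. The sketch below computes the mean resultant length and the large-sample approximation p ~ exp(-Z); the angles are hypothetical, and the approximation is only one of several available corrections.

```python
import numpy as np

def rayleigh_test(angles_rad):
    """Rayleigh test of uniformity for circular data (angles in radians)."""
    n = len(angles_rad)
    C, S = np.cos(angles_rad).sum(), np.sin(angles_rad).sum()
    r_bar = np.sqrt(C**2 + S**2) / n      # mean resultant length, between 0 and 1
    Z = n * r_bar**2
    p_approx = np.exp(-Z)                 # large-sample approximation
    mean_dir = np.arctan2(S, C)           # mean direction (radians)
    return r_bar, mean_dir, p_approx

# Homing directions (degrees) clustered around roughly 45 degrees
degs = np.array([30, 40, 45, 50, 55, 60, 35, 48, 52, 44])
print(rayleigh_test(np.deg2rad(degs)))
```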

  20. Interpreting New Data from the High Energy Frontier

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thaler, Jesse

    2016-09-26

    This is the final technical report for DOE grant DE-SC0006389, "Interpreting New Data from the High Energy Frontier", describing research accomplishments by the PI in the field of theoretical high energy physics.

  1. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high–throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
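
    In the same spirit (though not the authors' exact weighting scheme), a feature-enrichment score can be sketched as follows: each structural feature gets a weight from a one-sided Fisher's exact test of its enrichment in toxic versus nontoxic training compounds, and a new compound is scored by summing the weights of the features it contains. Feature names and counts here are hypothetical.

```python
from scipy.stats import fisher_exact
import math

def feature_weights(feature_counts):
    """feature_counts: {feature: (tox_with, tox_without, nontox_with, nontox_without)}
    Returns -log10 p-value weights for features enriched in toxic compounds."""
    weights = {}
    for feat, (a, b, c, d) in feature_counts.items():
        _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
        weights[feat] = -math.log10(p)
    return weights

def score_compound(features_present, weights):
    """Simple additive toxicity score: sum of weights of the features present."""
    return sum(weights.get(f, 0.0) for f in features_present)

counts = {"nitroaromatic": (40, 160, 10, 790),
          "aromatic_amine": (25, 175, 30, 770),
          "phenol":         (15, 185, 60, 740)}
w = feature_weights(counts)
print(w, score_compound({"nitroaromatic", "phenol"}, w))
```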

  2. Normative Data for Interpreting the BREAST-Q: Augmentation

    PubMed Central

    Mundy, Lily R.; Homa, Karen; Klassen, Anne F.; Pusic, Andrea L.; Kerrigan, Carolyn L.

    2016-01-01

    Background The BREAST-Q is a rigorously developed, well-validated, patient-reported outcome (PRO) instrument with a module designed for evaluating breast augmentation outcomes. However, there are no published normative BREAST-Q scores, limiting interpretation. Methods Normative data were generated for the BREAST-Q Augmentation Module via the Army of Women (AOW), an online community of women (with and without breast cancer) engaged in breast-cancer related research. Members were recruited via email, with women 18 years or older without a history of breast cancer or breast surgery invited to participate. Descriptive statistics and a linear multivariate regression were performed. A separate analysis compared normative scores to findings from previously published BREAST-Q augmentation studies. Results The preoperative BREAST-Q Augmentation Module was completed by 1,211 women. Mean age was 54 ±24 years, mean body mass index (BMI) was 27 ±6, and 39% (n=467) had a bra cup size ≥D. Mean scores were Satisfaction with Breasts (54 ±19), Psychosocial Well-being (66 ±20), Sexual Well-being (49 ±20), and Physical Well-being (86 ±15). Women with a BMI of 30 or greater and bra cup size D or greater had lower scores. In comparison to AOW scores, published BREAST-Q augmentation scores were lower before and higher after surgery for all scales except Physical Well-being. Conclusions The AOW normative data represent breast-related satisfaction and well-being in women not actively seeking breast augmentation. These data may be used as normative comparison values for those seeking and undergoing surgery, as we did here, demonstrating the value of breast augmentation in this patient population. PMID:28350657

  3. Application of machine learning and expert systems to Statistical Process Control (SPC) chart interpretation

    NASA Technical Reports Server (NTRS)

    Shewhart, Mark

    1991-01-01

    Statistical Process Control (SPC) charts are one of several tools used in quality control. Other tools include flow charts, histograms, cause and effect diagrams, check sheets, Pareto diagrams, graphs, and scatter diagrams. A control chart is simply a graph which indicates process variation over time. The purpose of drawing a control chart is to detect any changes in the process signalled by abnormal points or patterns on the graph. The Artificial Intelligence Support Center (AISC) of the Acquisition Logistics Division has developed a hybrid machine learning expert system prototype which automates the process of constructing and interpreting control charts.
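
    A minimal sketch of the kind of rule checking such a system automates is shown below: it flags points outside the mean plus or minus 3 sigma and runs of eight consecutive points on one side of the center line (one common run rule; real SPC systems check several more, and control limits are normally set from a baseline period rather than the series itself). The data are synthetic.

```python
import numpy as np

def control_chart_flags(x, run_length=8):
    """Flag out-of-control samples on an individuals chart:
    rule 1 - point beyond mean +/- 3*sigma,
    rule 2 - `run_length` consecutive points on one side of the mean."""
    x = np.asarray(x, float)
    mu, sigma = x.mean(), x.std(ddof=1)
    beyond = np.where(np.abs(x - mu) > 3 * sigma)[0]

    side = np.sign(x - mu)
    runs, count = [], 0
    for i in range(1, len(x)):
        count = count + 1 if side[i] == side[i - 1] and side[i] != 0 else 0
        if count >= run_length - 1:
            runs.append(i)                 # index where a run of `run_length` completes
    return {"beyond_3sigma": beyond.tolist(), "run_rule": runs}

rng = np.random.default_rng(4)
series = np.concatenate([rng.normal(10, 1, 30), rng.normal(11.5, 1, 12)])
print(control_chart_flags(series))
```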

  4. Regional interpretation of water-quality monitoring data

    USGS Publications Warehouse

    Smith, Richard A.; Schwarz, Gregory E.; Alexander, Richard B.

    1997-01-01

    We describe a method for using spatially referenced regressions of contaminant transport on watershed attributes (SPARROW) in regional water-quality assessment. The method is designed to reduce the problems of data interpretation caused by sparse sampling, network bias, and basin heterogeneity. The regression equation relates measured transport rates in streams to spatially referenced descriptors of pollution sources and land-surface and stream-channel characteristics. Regression models of total phosphorus (TP) and total nitrogen (TN) transport are constructed for a region defined as the nontidal conterminous United States. Observed TN and TP transport rates are derived from water-quality records for 414 stations in the National Stream Quality Accounting Network. Nutrient sources identified in the equations include point sources, applied fertilizer, livestock waste, nonagricultural land, and atmospheric deposition (TN only). Surface characteristics found to be significant predictors of land-water delivery include soil permeability, stream density, and temperature (TN only). Estimated instream decay coefficients for the two contaminants decrease monotonically with increasing stream size. TP transport is found to be significantly reduced by reservoir retention. Spatial referencing of basin attributes in relation to the stream channel network greatly increases their statistical significance and model accuracy. The method is used to estimate the proportion of watersheds in the conterminous United States (i.e., hydrologic cataloging units) with outflow TP concentrations less than the criterion of 0.1 mg/L, and to classify cataloging units according to local TN yield (kg/km2/yr).

  5. Variation in benthic long-term data of transitional waters: Is interpretation more than speculation?

    PubMed Central

    Zettler, Michael Lothar; Friedland, René; Gogina, Mayya; Darr, Alexander

    2017-01-01

    Biological long-term data series in marine habitats are often used to identify anthropogenic impacts on the environment or climate-induced regime shifts. However, particularly in transitional waters, environmental properties like water mass dynamics, salinity variability and the occurrence of oxygen minima not necessarily caused by either human activities or climate change can attenuate or mask apparent signals. At first glance it very often seems impossible to interpret the strong fluctuations of e.g. abundances or species richness, since abiotic variables like salinity and oxygen content vary simultaneously as well as in apparently erratic ways. The long-term development of major macrozoobenthic parameters (abundance, biomass, species numbers) and derivative macrozoobenthic indices (Shannon diversity, Margalef, Pielou's evenness and Hurlbert) has been successfully interpreted and related to the long-term fluctuations of salinity and oxygen, incorporating the North Atlantic Oscillation index (NAO index) and relying on the statistical analysis of modelled and measured data during 35 years of observation at three stations in the south-western Baltic Sea. Our results suggest that even at a restricted spatial scale the benthic system does not appear to be tightly controlled by any single environmental driver and highlight the complexity of spatially varying temporal response. PMID:28422974

  6. Spatial Statistical Data Fusion (SSDF)

    NASA Technical Reports Server (NTRS)

    Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel

    2013-01-01

    As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, technological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect are different. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field, and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy, spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values, over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions, and is also computationally much faster. This method is
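
    For orientation, here is a stripped-down sketch of the kriging step that underlies such approaches: with a known mean and an assumed exponential covariance, the predictor at a new location is a covariance-weighted combination of the observations, and the kriging variance quantifies its uncertainty. It is a generic simple-kriging illustration, not the fixed-rank, spatial-mixed-effects machinery described above.

```python
import numpy as np

def exp_cov(h, sill=1.0, rng_len=2.0):
    """Exponential covariance as a function of distance h."""
    return sill * np.exp(-h / rng_len)

def simple_krige(obs_xy, obs_z, pred_xy, mean, sill=1.0, rng_len=2.0):
    d_obs = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    C = exp_cov(d_obs, sill, rng_len) + 1e-8 * np.eye(len(obs_z))  # small jitter
    c0 = exp_cov(np.linalg.norm(obs_xy - pred_xy, axis=1), sill, rng_len)
    w = np.linalg.solve(C, c0)                      # kriging weights
    pred = mean + w @ (obs_z - mean)                # simple-kriging predictor
    krig_var = sill - w @ c0                        # prediction variance
    return pred, krig_var

rng = np.random.default_rng(5)
xy = rng.uniform(0, 10, (40, 2))
z = np.sin(xy[:, 0] / 2) + rng.normal(0, 0.1, 40)
print(simple_krige(xy, z, pred_xy=np.array([5.0, 5.0]), mean=z.mean()))
```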

  7. ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Antcheva, I.; /CERN; Ballintijn, M.

    2009-01-01

    ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can choose out of a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. A central piece in these analysis tools is the set of histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like Postscript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally

  8. Mobile Collection and Automated Interpretation of EEG Data

    NASA Technical Reports Server (NTRS)

    Mintz, Frederick; Moynihan, Philip

    2007-01-01

    A system that would comprise mobile and stationary electronic hardware and software subsystems has been proposed for collection and automated interpretation of electroencephalographic (EEG) data from subjects in everyday activities in a variety of environments. By enabling collection of EEG data from mobile subjects engaged in ordinary activities (in contradistinction to collection from immobilized subjects in clinical settings), the system would expand the range of options and capabilities for performing diagnoses. Each subject would be equipped with one of the mobile subsystems, which would include a helmet that would hold floating electrodes (see figure) in those positions on the patient's head that are required in classical EEG data-collection techniques. A bundle of wires would couple the EEG signals from the electrodes to a multi-channel transmitter also located in the helmet. Electronic circuitry in the helmet transmitter would digitize the EEG signals and transmit the resulting data via a multidirectional RF patch antenna to a remote location. At the remote location, the subject's EEG data would be processed and stored in a database that would be auto-administered by a newly designed relational database management system (RDBMS). In this RDBMS, in nearly real time, the newly stored data would be subjected to automated interpretation that would involve comparison with other EEG data and concomitant peer-reviewed diagnoses stored in international brain data bases administered by other similar RDBMSs.

  9. Statistics for characterizing data on the periphery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Theiler, James P; Hush, Donald R

    2010-01-01

    We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
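
    A minimal example of the Gaussian baseline such statistics improve on: score samples by their squared Mahalanobis distance under the sample covariance and flag the most peripheral ones. The alternative covariance estimators proposed in the report, tuned to the periphery rather than the bulk, are not shown here.

```python
import numpy as np

def mahalanobis_scores(X):
    """Squared Mahalanobis distances of the rows of X from the sample mean."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    prec = np.linalg.inv(cov)
    centered = X - mu
    return np.einsum("ij,jk,ik->i", centered, prec, centered)

rng = np.random.default_rng(6)
background = rng.multivariate_normal([0, 0, 0], np.diag([1, 2, 0.5]), 500)
targets = rng.multivariate_normal([4, 4, 2], np.eye(3) * 0.2, 5)
X = np.vstack([background, targets])
scores = mahalanobis_scores(X)
print("Most anomalous rows:", np.argsort(scores)[-5:])
```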

  10. Statistical Challenges in "Big Data" Human Neuroimaging.

    PubMed

    Smith, Stephen M; Nichols, Thomas E

    2018-01-17

    Smith and Nichols discuss "big data" human neuroimaging studies, with very large subject numbers and amounts of data. These studies provide great opportunities for making new discoveries about the brain but raise many new analytical challenges and interpretational risks. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. 78 FR 10166 - Access Interpreting; Transfer of Data

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-13

    ... ENVIRONMENTAL PROTECTION AGENCY [EPA-HQ-OPP-2013-0036; FRL-9376-6] Access Interpreting; Transfer of Data AGENCY: Environmental Protection Agency (EPA). ACTION: Notice. SUMMARY: This notice announces that pesticide related information submitted to EPA's Office of Pesticide Programs (OPP) pursuant to...

  12. BIG DATA AND STATISTICS

    PubMed Central

    Rossell, David

    2016-01-01

    Big Data brings unprecedented power to address scientific, economic and societal issues, but also amplifies the possibility of certain pitfalls. These include using purely data-driven approaches that disregard understanding the phenomenon under study, aiming at a dynamically moving target, ignoring critical data collection issues, summarizing or preprocessing the data inadequately and mistaking noise for signal. We review some success stories and illustrate how statistical principles can help obtain more reliable information from data. We also touch upon current challenges that require active methodological research, such as strategies for efficient computation, integration of heterogeneous data, extending the underlying theory to increasingly complex questions and, perhaps most importantly, training a new generation of scientists to develop and deploy these strategies. PMID:27722040

  13. Interpreting Assessment Scores of Nonliterate Learners with Ethnographic Data.

    ERIC Educational Resources Information Center

    Griffin, Suzanne M.

    Research findings are reported that suggest that valid interpretation of assessment scores on illiterate and preliterate learners requires the use of ethnographic data. Data from observation notes, photos, and audiotapes indicated that learners' understanding of their tasks affected their performance in assessment situations. Previous findings…

  14. Statistical Analysis of Research Data | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview on the general principles of statistical analysis of research data.  The first day will feature univariate data analysis, including descriptive statistics, probability distributions, one- and two-sample inferential statistics.

  15. Equivalent circuit models for interpreting impedance perturbation spectroscopy data

    NASA Astrophysics Data System (ADS)

    Smith, R. Lowell

    2004-07-01

    As in-situ structural integrity monitoring disciplines mature, there is a growing need to process sensor/actuator data efficiently in real time. Although smaller, faster embedded processors will contribute to this, it is also important to develop straightforward, robust methods to reduce the overall computational burden for practical applications of interest. This paper addresses the use of equivalent circuit modeling techniques for inferring structure attributes monitored using impedance perturbation spectroscopy. In pioneering work about ten years ago significant progress was associated with the development of simple impedance models derived from the piezoelectric equations. Using mathematical modeling tools currently available from research in ultrasonics and impedance spectroscopy is expected to provide additional synergistic benefits. For purposes of structural health monitoring the objective is to use impedance spectroscopy data to infer the physical condition of structures to which small piezoelectric actuators are bonded. Features of interest include stiffness changes, mass loading, and damping or mechanical losses. Equivalent circuit models are typically simple enough to facilitate the development of practical analytical models of the actuator-structure interaction. This type of parametric structure model allows raw impedance/admittance data to be interpreted optimally using standard multiple, nonlinear regression analysis. One potential long-term outcome is the possibility of cataloging measured viscoelastic properties of the mechanical subsystems of interest as simple lists of attributes and their statistical uncertainties, whose evolution can be followed in time. Equivalent circuit models are well suited for addressing calibration and self-consistency issues such as temperature corrections, Poisson mode coupling, and distributed relaxation processes.

  16. Confidentiality of Research and Statistical Data.

    ERIC Educational Resources Information Center

    Law Enforcement Assistance Administration (Dept. of Justice), Washington, DC.

    This document was prepared by the Privacy and Security Staff, National Criminal Justice Information and Statistics Service, in conjunction with the Law Enforcement Assistance Administration (LEAA) Office of General Counsel, to explain and discuss the requirements of the LEAA regulations governing confidentiality of research and statistical data.…

  17. Method of interpretation of remotely sensed data and applications to land use

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Dossantos, A. P.; Foresti, C.; Demoraesnovo, E. M. L.; Niero, M.; Lombardo, M. A.

    1981-01-01

    Instructional material describing a methodology of remote sensing data interpretation and examples of applications to land use surveys are presented. The image interpretation elements are discussed for different types of sensor systems: aerial photographs, radar, and MSS/LANDSAT. Visual and automatic LANDSAT image interpretation is emphasized.

  18. New dimensions from statistical graphics for GIS (geographic information system) analysis and interpretation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McCord, R.A.; Olson, R.J.

    1988-01-01

    Environmental research and assessment activities at Oak Ridge National Laboratory (ORNL) include the analysis of spatial and temporal patterns of ecosystem response at a landscape scale. Analysis through use of geographic information system (GIS) involves an interaction between the user and thematic data sets frequently expressed as maps. A portion of GIS analysis has a mathematical or statistical aspect, especially for the analysis of temporal patterns. ARC/INFO is an excellent tool for manipulating GIS data and producing the appropriate map graphics. INFO also has some limited ability to produce statistical tabulation. At ORNL we have extended our capabilities by graphically interfacing ARC/INFO and SAS/GRAPH to provide a combined mapping and statistical graphics environment. With the data management, statistical, and graphics capabilities of SAS added to ARC/INFO, we have expanded the analytical and graphical dimensions of the GIS environment. Pie or bar charts, frequency curves, hydrographs, or scatter plots as produced by SAS can be added to maps from attribute data associated with ARC/INFO coverages. Numerous, small, simplified graphs can also become a source of complex map "symbols." These additions extend the dimensions of GIS graphics to include time, details of the thematic composition, distribution, and interrelationships. 7 refs., 3 figs.

  19. A proposed framework for the interpretation of biomonitoring data

    PubMed Central

    Boogaard, Peter J; Money, Chris D

    2008-01-01

    Biomonitoring, the determination of chemical substances in human body fluids or tissues, is more and more frequently applied. At the same time, detection limits are decreasing steadily. As a consequence, many data with potential relevance for public health are generated although they need not necessarily allow interpretation in terms of health relevance. The European Centre of Ecotoxicology and Toxicology of Chemicals (ECETOC) formed a dedicated task force to build a framework for the interpretation of biomonitoring data. The framework that was developed evaluates biomonitoring data based on their analytical integrity, their ability to describe dose (toxicokinetics), their ability to relate to effects, and an overall evaluation and weight of evidence analysis. This framework was subsequently evaluated with a number of case studies and was shown to provide a rational basis to advance discussions on human biomonitoring, allowing better use and application of this type of data in human health risk assessment. PMID:18541066

  20. Statistical Models for the Analysis of Zero-Inflated Pain Intensity Numeric Rating Scale Data.

    PubMed

    Goulet, Joseph L; Buta, Eugenia; Bathulapalli, Harini; Gueorguieva, Ralitza; Brandt, Cynthia A

    2017-03-01

    Pain intensity is often measured in clinical and research settings using the 0 to 10 numeric rating scale (NRS). NRS scores are recorded as discrete values, and in some samples they may display a high proportion of zeroes and a right-skewed distribution. Despite this, statistical methods for normally distributed data are frequently used in the analysis of NRS data. We present results from an observational cross-sectional study examining the association of NRS scores with patient characteristics using data collected from a large cohort of 18,935 veterans in Department of Veterans Affairs care diagnosed with a potentially painful musculoskeletal disorder. The mean (variance) NRS pain was 3.0 (7.5), and 34% of patients reported no pain (NRS = 0). We compared the following statistical models for analyzing NRS scores: linear regression, generalized linear models (Poisson and negative binomial), zero-inflated and hurdle models for data with an excess of zeroes, and a cumulative logit model for ordinal data. We examined model fit, interpretability of results, and whether conclusions about the predictor effects changed across models. In this study, models that accommodate zero inflation provided a better fit than the other models. These models should be considered for the analysis of NRS data with a large proportion of zeroes. We examined and analyzed pain data from a large cohort of veterans with musculoskeletal disorders. We found that many reported no current pain on the NRS on the diagnosis date. We present several alternative statistical methods for the analysis of pain intensity data with a large proportion of zeroes. Published by Elsevier Inc.
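
    A bare-bones illustration of the zero-inflated idea: a zero-inflated Poisson log-likelihood with a mixing probability pi for structural zeros, fitted by direct optimization. Real analyses like the one above would use regression versions with covariates (and negative binomial counts); this intercept-only sketch with synthetic data only shows the likelihood structure.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

def zip_negloglik(params, y):
    """Negative log-likelihood of an intercept-only zero-inflated Poisson."""
    pi = expit(params[0])            # probability of a structural zero
    lam = np.exp(params[1])          # Poisson rate for the count component
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))
    ll_pos = np.log(1 - pi) + y * np.log(lam) - lam - gammaln(y + 1)
    return -np.where(y == 0, ll_zero, ll_pos).sum()

# Synthetic counts with excess zeros (mimicking a zero-heavy score distribution)
rng = np.random.default_rng(7)
y = np.where(rng.random(2000) < 0.34, 0, rng.poisson(4.0, 2000))
fit = minimize(zip_negloglik, x0=np.array([0.0, 0.0]), args=(y,))
print("pi_hat =", expit(fit.x[0]), "lambda_hat =", np.exp(fit.x[1]))
```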

  1. Data Mining: Going beyond Traditional Statistics

    ERIC Educational Resources Information Center

    Zhao, Chun-Mei; Luan, Jing

    2006-01-01

    The authors provide an overview of data mining, giving special attention to the relationship between data mining and statistics to unravel some misunderstandings about the two techniques. (Contains 1 figure.)

  2. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905

  3. Comparison of the Predictive Performance and Interpretability of Random Forest and Linear Models on Benchmark Data Sets.

    PubMed

    Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan

    2017-08-28

    The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical
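
    A compact sketch of the kind of predictivity comparison described above: cross-validated performance of Random Forest versus linear SVR and PLS on the same descriptor matrix. The data are random placeholders, and nothing here reproduces the interpretability scoring developed in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 50))                       # placeholder descriptors
y = X[:, 0] * 2 - X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(0, 0.5, 300)

models = {
    "RandomForest": RandomForestRegressor(n_estimators=200, random_state=0),
    "linear SVR": SVR(kernel="linear", C=1.0),
    "PLS (5 comp)": PLSRegression(n_components=5),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:14s} mean CV R^2 = {r2.mean():.3f}")
```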

  4. Internet Data Analysis for the Undergraduate Statistics Curriculum

    ERIC Educational Resources Information Center

    Sanchez, Juana; He, Yan

    2005-01-01

    Statistics textbooks for undergraduates have not caught up with the enormous amount of analysis of Internet data that is taking place these days. Case studies that use Web server log data or Internet network traffic data are rare in undergraduate Statistics education. And yet these data provide numerous examples of skewed and bimodal…

  5. Statistics Poster Challenge for Schools

    ERIC Educational Resources Information Center

    Payne, Brad; Freeman, Jenny; Stillman, Eleanor

    2013-01-01

    The analysis and interpretation of data are important life skills. A poster challenge for schoolchildren provides an innovative outlet for these skills and demonstrates their relevance to daily life. We discuss our Statistics Poster Challenge and the lessons we have learned.

  6. Data flow language and interpreter for a reconfigurable distributed data processor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hurt, A.D.; Heath, J.R.

    1982-01-01

    An analytic language and an interpreter whereby an applications data flow graph may serve as an input to a reconfigurable distributed data processor are proposed. The architecture considered consists of a number of loosely coupled computing elements (CEs) which may be linked to data and file memories through fully nonblocking interconnect networks. The real-time performance of such an architecture depends upon its ability to alter its topology in response to changes in application, asynchronous data rates and faults. Such a data flow language enhances the versatility of a reconfigurable architecture by allowing the user to specify the machine's topology at a very high level. 11 references.

  7. Interpretations of Boxplots: Helping Middle School Students to Think outside the Box

    ERIC Educational Resources Information Center

    Edwards, Thomas G.; Özgün-Koca, Asli; Barr, John

    2017-01-01

    Boxplots are statistical representations for organizing and displaying data that are relatively easy to create with a five-number summary. However, boxplots are not as easy to understand, interpret, or connect with other statistical representations of the same data. We worked at two different schools with 259 middle school students who constructed…

  8. Critical analysis of adsorption data statistically

    NASA Astrophysics Data System (ADS)

    Kaushal, Achla; Singh, S. K.

    2017-10-01

    Experimental data can be presented, computed, and critically analysed in a different way using statistics. A variety of statistical tests are used to make decisions about the significance and validity of the experimental data. In the present study, adsorption was carried out to remove zinc ions from contaminated aqueous solution using mango leaf powder. The experimental data was analysed statistically by hypothesis testing applying t test, paired t test and Chi-square test to (a) test the optimum value of the process pH, (b) verify the success of experiment and (c) study the effect of adsorbent dose in zinc ion removal from aqueous solutions. Comparison of calculated and tabulated values of t and χ² showed the results in favour of the data collected from the experiment and this has been shown on probability charts. K value for Langmuir isotherm was 0.8582 and m value for Freundlich adsorption isotherm obtained was 0.725, both are <1, indicating favourable isotherms. Karl Pearson's correlation coefficient values for Langmuir and Freundlich adsorption isotherms were obtained as 0.99 and 0.95 respectively, which show higher degree of correlation between the variables. This validates the data obtained for adsorption of zinc ions from the contaminated aqueous solution with the help of mango leaf powder.
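
    The tests named above are standard and easy to reproduce; the sketch below runs a paired t test and a chi-square goodness-of-fit test with scipy on made-up removal-efficiency data, purely to show the mechanics (the study's actual measurements are not reproduced here).

```python
import numpy as np
from scipy import stats

# Hypothetical % zinc removal for the same batches at two adsorbent doses
dose_1g = np.array([62.1, 64.5, 63.8, 61.9, 65.2, 63.0])
dose_2g = np.array([71.4, 73.0, 72.2, 70.8, 74.1, 72.5])

t_stat, p_paired = stats.ttest_rel(dose_2g, dose_1g)
print(f"paired t = {t_stat:.2f}, p = {p_paired:.4f}")

# Chi-square goodness-of-fit: observed vs. expected counts (totals must match)
observed = np.array([18, 22, 20, 25])
expected = np.array([20, 20, 20, 25])
chi2, p_chi2 = stats.chisquare(observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p_chi2:.4f}")
```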

  9. Statistically significant relational data mining :

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  10. Combining Shapley value and statistics to the analysis of gene expression data in children exposed to air pollution

    PubMed Central

    Moretti, Stefano; van Leeuwen, Danitsja; Gmuender, Hans; Bonassi, Stefano; van Delft, Joost; Kleinjans, Jos; Patrone, Fioravante; Merlo, Domenico Franco

    2008-01-01

    Background In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, in the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. Results In this paper we introduce a Bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, which is called Comparative Analysis of Shapley value (shortly, CASh), is applied to data concerning the gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for testing differential gene expression. Both lists of genes provided by CASh and t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, it turns out that the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than t-test for the detection of differential gene expression variability. Conclusion CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between
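
    The Shapley value at the core of CASh can be approximated generically by sampling permutations of the players (here, genes) and averaging each player's marginal contribution to a characteristic function v. The sketch below shows that estimator for an arbitrary user-supplied v; it is not the microarray game or the bootstrap test defined in the paper.

```python
import numpy as np

def shapley_monte_carlo(players, v, n_perm=2000, seed=0):
    """Estimate Shapley values of a coalitional game v: set -> float
    by averaging marginal contributions over random player orderings."""
    rng = np.random.default_rng(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_perm):
        order = rng.permutation(players)
        coalition, prev_value = set(), v(set())
        for p in order:
            coalition.add(p)
            value = v(coalition)
            phi[p] += value - prev_value      # marginal contribution of p
            prev_value = value
    return {p: s / n_perm for p, s in phi.items()}

# Toy game: a coalition's value is the (capped) sum of individual weights
weights = {"geneA": 3.0, "geneB": 1.0, "geneC": 1.0}
v = lambda S: min(sum(weights[p] for p in S), 4.0)
print(shapley_monte_carlo(list(weights), v))
```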

  11. Visualization and modeling of sub-populations of compositional data: statistical methods illustrated by means of geochemical data from fumarolic fluids

    NASA Astrophysics Data System (ADS)

    Pawlowsky-Glahn, Vera; Buccianti, Antonella

    In the investigation of fluid samples of a volcanic system, collected during a given period of time, one of the main goals is to discover cause-effect relationships that allow us to explain changes in the chemical composition. They might be caused by physicochemical factors, such as temperature, pressure, or non-conservative behavior of some chemical constituents (addition or subtraction of material), among others. The presence of subgroups of observations showing different behavior is evidence of unusually complex situations, which might render even more difficult the analysis and interpretation of observed phenomena. These cases require appropriate statistical techniques as well as sound a priori hypothesis concerning underlying geological processes. The purpose of this article is to present the state of the art in the methodology for a better visualization of compositional data, as well as for detecting statistically significant sub-populations. The scheme of this article is to present first the application, and then the underlying methodology, with the aim of the first motivating the second. Thus, the first part has the goal to illustrate how to understand and interpret results, whereas the second is devoted to expose how to perform a study of this kind. The case study is related to the chemical composition of a fumarole of Vulcano Island (southern Italy), called F14. The volcanic activity at Vulcano Island is subject to a continuous program of geochemical surveillance from 1978 up to now and the large data set of observations contains the main chemical composition of volcanic gases as well as trace element concentrations in the condensates of fumarolic gases. Out of the complete set of measured components, the variables H2S, HF and As, determined in samples collected from 1978 to 1993 (As is not available in recent samples) are used to characterize two groups in the original population, which proved to be statistically distinct. The choice of the variables is
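
    Work of this kind typically starts from log-ratio transformations of the compositional vectors. As a minimal, generic illustration (not the authors' analysis), the centred log-ratio (clr) transform maps each closed composition into real space, where ordinary multivariate statistics and sub-population tests can be applied; the example values are hypothetical.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so each composition (row) sums to 1."""
    x = np.asarray(x, float)
    return x / x.sum(axis=1, keepdims=True)

def clr(x):
    """Centred log-ratio transform: log(x_i / geometric mean of the row)."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=1, keepdims=True)

# Hypothetical (H2S, HF, As) sub-compositions from two fumarole samples
samples = np.array([[120.0, 30.0, 0.4],
                    [ 95.0, 45.0, 0.9]])
print(clr(samples))
```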

  12. Radiologist Uncertainty and the Interpretation of Screening

    PubMed Central

    Carney, Patricia A.; Elmore, Joann G.; Abraham, Linn A.; Gerrity, Martha S.; Hendrick, R. Edward; Taplin, Stephen H.; Barlow, William E.; Cutter, Gary R.; Poplack, Steven P.; D’Orsi, Carl J.

    2011-01-01

    Objective To determine radiologists’ reactions to uncertainty when interpreting mammography and the extent to which radiologist uncertainty explains variability in interpretive performance. Methods The authors used a mailed survey to assess demographic and clinical characteristics of radiologists and reactions to uncertainty associated with practice. Responses were linked to radiologists’ actual interpretive performance data obtained from 3 regionally located mammography registries. Results More than 180 radiologists were eligible to participate, and 139 consented for a response rate of 76.8%. Radiologist gender, more years interpreting, and higher volume were associated with lower uncertainty scores. Positive predictive value, recall rates, and specificity were more affected by reactions to uncertainty than sensitivity or negative predictive value; however, none of these relationships was statistically significant. Conclusion Certain practice factors, such as gender and years of interpretive experience, affect uncertainty scores. Radiologists’ reactions to uncertainty do not appear to affect interpretive performance. PMID:15155014

  13. HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bauerdick, Lothar

    At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.

  14. Autoadaptivity and optimization in distributed ECG interpretation.

    PubMed

    Augustyniak, Piotr

    2010-03-01

    This paper addresses principal issues of ECG interpretation adaptivity in a distributed surveillance network. In the age of pervasive access to wireless digital communication, distributed biosignal interpretation networks may not only optimally solve difficult medical cases, but also adapt the data acquisition, interpretation, and transmission to the patient's variable status and the availability of technical resources. The background of such adaptivity is the innovative use of results from automatic ECG analysis for seamless remote modification of the interpreting software. Since the medical relevance of issued diagnostic data depends on the patient's status, the interpretation adaptivity implies flexibility of report content and frequency. Proposed solutions are based on research on human experts' behavior, procedure reliability, and usage statistics. Despite the limited scale of our prototype client-server application, the tests yielded very promising results: transmission channel occupation was reduced by a factor of 2.6 to 5.6 compared with the rigid reporting mode, and the remotely computed diagnostic outcome improved in over 80% of software adaptation attempts.

  15. Uses and Misuses of Student Evaluations of Teaching: The Interpretation of Differences in Teaching Evaluation Means Irrespective of Statistical Information

    ERIC Educational Resources Information Center

    Boysen, Guy A.

    2015-01-01

    Student evaluations of teaching are among the most accepted and important indicators of college teachers' performance. However, faculty and administrators can overinterpret small variations in mean teaching evaluations. The current research examined the effect of including statistical information on the interpretation of teaching evaluations.…

  16. Geologic interpretation of HCMM and aircraft thermal data

    NASA Technical Reports Server (NTRS)

    1982-01-01

    Progress on the Heat Capacity Mapping Mission (HCMM) follow-on study is reported. Numerous image products for geologic interpretation of both HCMM and aircraft thermal data were produced. These include, among others, various combinations of the thermal data with LANDSAT and SEASAT data. The combined data sets were displayed using simple color composites, principal component color composites and black and white images, and hue, saturation, intensity color composites. Algorithms for incorporating both atmospheric and elevation data simultaneously into the digital processing for creation of quantitatively correct thermal inertia images are in the final development stage. A field trip to Death Valley was undertaken to field check the aircraft and HCMM data.

  17. A Novel Statistical Analysis and Interpretation of Flow Cytometry Data

    DTIC Science & Technology

    2013-07-05

    …ing parameters of the mathematical model are determined by using least squares to fit the data in Figures 1 and 2 (see Section 4). The second method is … variance (for all k and j). Then the AIC, which is the expected value of the relative Kullback–Leibler distance for a given model [16], is Kn = n log(J… emphasized that the fit of the model is quite good for both donors and cell types. As such, we proceed to analyse the dynamic responsiveness of the

  18. The disagreeable behaviour of the kappa statistic.

    PubMed

    Flight, Laura; Julious, Steven A

    2015-01-01

    It is often of interest to measure the agreement between a number of raters when an outcome is nominal or ordinal. The kappa statistic is used as a measure of agreement. The statistic is highly sensitive to the distribution of the marginal totals and can produce unreliable results. Other statistics such as the proportion of concordance, maximum attainable kappa and prevalence and bias adjusted kappa should be considered to indicate how well the kappa statistic represents agreement in the data. Each kappa should be considered and interpreted based on the context of the data being analysed. Copyright © 2014 John Wiley & Sons, Ltd.
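
    Each of the measures named in this record can be computed directly from a two-rater contingency table. The sketch below is a minimal illustration with invented counts for a binary outcome, using the usual definitions of observed and chance agreement; it is not drawn from the cited paper.

```python
import numpy as np

def agreement_measures(table):
    """Agreement measures for a 2x2 table of two raters' binary ratings.

    table[i][j] = count of subjects rated i by rater 1 and j by rater 2.
    """
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_o = np.trace(t) / n                    # observed agreement
    row = t.sum(axis=1) / n
    col = t.sum(axis=0) / n
    p_e = np.sum(row * col)                  # chance agreement from the marginals
    kappa = (p_o - p_e) / (1 - p_e)
    # maximum agreement attainable with these fixed marginal totals
    p_max = np.sum(np.minimum(row, col))
    kappa_max = (p_max - p_e) / (1 - p_e)
    pabak = 2 * p_o - 1                      # prevalence- and bias-adjusted kappa
    return {"p_o": p_o, "kappa": kappa, "kappa_max": kappa_max, "PABAK": pabak}

# Highly imbalanced marginals: raw agreement is high, yet kappa is only modest.
print(agreement_measures([[85, 5], [5, 5]]))
```

    With these invented counts the observed agreement is 0.90 but kappa is about 0.44, which is exactly the sensitivity to marginal totals that the abstract warns about.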

  19. Interpreting Survey Data to Inform Solid-Waste Education Programs

    ERIC Educational Resources Information Center

    McKeown, Rosalyn

    2006-01-01

    Few examples exist on how to use survey data to inform public environmental education programs. I suggest a process for interpreting statewide survey data with the four questions that give insights into local context and make it possible to gain insight into potential target audiences and community priorities. The four questions are: What…

  20. DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data.

    PubMed

    Lee, Alexandra J; Chang, Ivan; Burel, Julie G; Lindestam Arlehamn, Cecilia S; Mandava, Aishwarya; Weiskopf, Daniela; Peters, Bjoern; Sette, Alessandro; Scheuermann, Richard H; Qian, Yu

    2018-04-17

    Computational methods for identification of cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpretation of the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both are roadblocks before a cytometry lab can adopt the data clustering approach for cell population identification in routine use. We found that combining recursive data filtering and clustering with constraints converted from the user manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. Design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists through mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helped identify small data clusters which are otherwise difficult to resolve by a single run of the data clustering method due to the statistical interference of the irrelevant major clusters. Our experiment results showed that the proportions of the cell populations identified by DAFi, while being consistent with those by expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and the nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoided the abrupt cut-offs on the boundaries. DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and
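
    The recursive filter-then-cluster idea summarised above can be illustrated with a toy sketch. This is not the DAFi implementation; the rectangular "gate", marker values, and cluster counts below are invented stand-ins for a real gating-hierarchy constraint.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic 2-marker "cytometry" data: one large irrelevant population and
# two small populations of interest inside a user-defined gate.
major = rng.normal([0, 0], 1.0, size=(5000, 2))
small_a = rng.normal([4, 4], 0.3, size=(150, 2))
small_b = rng.normal([5, 3], 0.3, size=(150, 2))
events = np.vstack([major, small_a, small_b])

def rect_gate(x, lo, hi):
    """Boolean mask for events inside a rectangular 2-D gate."""
    return np.all((x >= lo) & (x <= hi), axis=1)

# Step 1: filter with the user-defined gate (a stand-in for one level of the
# manual gating hierarchy).
inside = events[rect_gate(events, lo=[2.5, 1.5], hi=[6.5, 5.5])]

# Step 2: cluster only the filtered events, so the two small populations are
# resolved instead of being absorbed by the dominant cluster.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(inside)
for lab in np.unique(labels):
    print(f"cluster {lab}: {np.sum(labels == lab)} events, "
          f"centre = {inside[labels == lab].mean(axis=0).round(2)}")
```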

  1. STATISTICAL SAMPLING AND DATA ANALYSIS

    EPA Science Inventory

    Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...

  2. Testing the statistical compatibility of independent data sets

    NASA Astrophysics Data System (ADS)

    Maltoni, M.; Schwetz, T.

    2003-08-01

    We discuss a goodness-of-fit method which tests the compatibility between statistically independent data sets. The method gives sensible results even in cases where the χ2 minima of the individual data sets are very low or when several parameters are fitted to a large number of data points. In particular, it avoids the problem that a possible disagreement between data sets becomes diluted by data points which are insensitive to the crucial parameters. A formal derivation of the probability distribution function for the proposed test statistics is given, based on standard theorems of statistics. The application of the method is illustrated on data from neutrino oscillation experiments, and its complementarity to the standard goodness-of-fit is discussed.
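
    The statistic described here compares the chi-squared at the joint best fit with the sum of the individual best-fit chi-squared values, referring the difference to a chi-squared distribution whose degrees of freedom count only the shared parameters. Below is a hedged toy illustration with two Gaussian data sets constraining a single common parameter; the numbers are invented and it does not reproduce the paper's neutrino analysis.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2

# Two independent data sets, each a set of measurements of a common parameter mu.
data = {
    "set1": (np.array([1.0, 1.2, 0.9]), 0.2),   # (measurements, sigma)
    "set2": (np.array([2.0, 2.1, 1.9]), 0.2),   # deliberately discrepant
}

def chi2_set(mu, measurements, sigma):
    return np.sum(((measurements - mu) / sigma) ** 2)

def chi2_total(mu):
    return sum(chi2_set(mu, m, s) for m, s in data.values())

# Individual minima and the joint (global) minimum.
chi2_min_individual = sum(
    minimize_scalar(lambda mu, m=m, s=s: chi2_set(mu, m, s)).fun
    for m, s in data.values()
)
chi2_min_global = minimize_scalar(chi2_total).fun

# One shared parameter appears in both data sets, so the compatibility
# statistic is compared against a chi-squared with 2*1 - 1 = 1 degree of freedom.
chi2_pg = chi2_min_global - chi2_min_individual
p_value = chi2.sf(chi2_pg, df=1)
print(f"compatibility statistic = {chi2_pg:.1f}, p = {p_value:.2g}")
```

    Because each data set on its own fits its best-fit value well, the individual minima stay small even though the two sets disagree strongly; the difference statistic exposes the tension, which is the dilution problem the abstract describes.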

  3. Basin Characterisation by Means of Joint Inversion of Electromagnetic Geophysical Data, Borehole Data and Multivariate Statistical Methods: The Loop Head Peninsula, Western Ireland, Case Study

    NASA Astrophysics Data System (ADS)

    Campanya, J. L.; Ogaya, X.; Jones, A. G.; Rath, V.; McConnell, B.; Haughton, P.; Prada, M.

    2016-12-01

    The Science Foundation Ireland-funded IRECCSEM project (www.ireccsem.ie) aims to evaluate Ireland's potential for onshore carbon sequestration in saline aquifers by integrating new electromagnetic geophysical data with existing geophysical and geological data. One of the objectives of this component of IRECCSEM is to characterise the subsurface beneath the Loop Head Peninsula (part of the Clare Basin, Co. Clare, Ireland), and identify major electrical resistivity structures that can guide an interpretation of the carbon sequestration potential of this area. During the summer of 2014, a magnetotelluric (MT) survey was carried out on the Loop Head Peninsula, and data from a total of 140 sites were acquired, including audio-magnetotelluric (AMT) and broadband magnetotelluric (BBMT). The dataset was used to generate shallow three-dimensional (3-D) electrical resistivity models constraining the subsurface to depths of up to 3.5 km. The 3-D joint inversions were performed using three different types of electromagnetic data: the MT impedance tensor (Z), geomagnetic transfer functions (T), and inter-station horizontal magnetic transfer functions (H). The interpretation of the results was complemented with second-derivative models of the resulting electrical resistivity models, and a quantitative comparison with borehole data using multivariate statistical methods. Second-derivative models were used to define the main interfaces between the geoelectrical structures, facilitating superior comparison with geological and seismic results, and also reducing the influence of the colour scale when interpreting the results. Specific analysis was performed to compare the extant borehole data with the electrical resistivity model, identifying those structures that are better characterised by the resistivity model. Finally, the electrical resistivity model was also used to propagate some of the physical properties measured in the borehole, when a good relation was

  4. How the Mastery Rubric for Statistical Literacy Can Generate Actionable Evidence about Statistical and Quantitative Learning Outcomes

    ERIC Educational Resources Information Center

    Tractenberg, Rochelle E.

    2017-01-01

    Statistical literacy is essential to an informed citizenry; and two emerging trends highlight a growing need for training that achieves this literacy. The first trend is towards "big" data: while automated analyses can exploit massive amounts of data, the interpretation--and possibly more importantly, the replication--of results are…

  5. Translation of EPA Research: Data Interpretation and Communication Strategies

    EPA Science Inventory

    Symposium Title: Social Determinants of Health, Environmental Exposures, and Disproportionately Impacted Communities: What We Know and How We Tell Others Topic 3: Community Engagement and Research Translation Title: Translation of EPA Research: Data Interpretation and Communicati...

  6. Interdisciplinary application and interpretation of EREP data within the Susquehanna River Basin

    NASA Technical Reports Server (NTRS)

    Mcmurtry, G. J.; Petersen, G. W. (Principal Investigator)

    1975-01-01

    The author has identified the following significant results. It has become apparent that lineaments seen on Skylab and ERTS images are not equally well defined, and that the clarity of definition of a particular lineament is recorded somewhat differently by different interpreters. In an effort to determine the extent of these variations, a semi-quantitative classification scheme was devised. In the field, along the crest of Bald Eagle Mountain in central Pennsylvania, statistical techniques borrowed from sedimentary petrography (point counting) were used to determine the existence and location of intensely fractured float rock. Verification of Skylab- and ERTS-detected lineaments on aerial photography at different scales indicated that the brecciated zones appear to occur at one margin of the 1 km zone of brecciation defined as a lineament. In the Lock Haven area, comparison of the film types from the SL4 S190A sensor revealed the black and white Pan X photography to be superior in quality for general interpretation to the black and white IR film. Also, the color positive film is better for interpretation than the color IR film.

  7. Kappa statistic for clustered matched-pair data.

    PubMed

    Yang, Zhao; Zhou, Ming

    2014-07-10

    Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.

  8. Interpretation of commonly used statistical regression models.

    PubMed

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
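
    As a companion to this record's theme, the sketch below (synthetic respiratory-style data, statsmodels) shows the two most common coefficient readings: a linear-regression slope is the expected change in the outcome per unit change in a predictor, and the exponential of a logistic-regression coefficient is an odds ratio. The variable names and effect sizes are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
age = rng.uniform(20, 70, n)
smoking = rng.integers(0, 2, n)                          # binary exposure
X = sm.add_constant(np.column_stack([age, smoking]))     # columns: const, age, smoking

# Continuous outcome (synthetic lung function): each linear-regression
# coefficient is the expected change in outcome per one-unit change in
# that predictor, holding the others fixed.
fev1 = 4.5 - 0.03 * age - 0.4 * smoking + rng.normal(0, 0.4, n)
print(sm.OLS(fev1, X).fit().params)

# Binary outcome (synthetic wheeze): exponentiating a logistic-regression
# coefficient gives the odds ratio for a one-unit change in that predictor.
p = 1 / (1 + np.exp(-(-2.0 + 0.02 * (age - 45) + 1.0 * smoking)))
wheeze = rng.binomial(1, p)
print(np.exp(sm.Logit(wheeze, X).fit(disp=False).params))
```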

  9. Redshift data and statistical inference

    NASA Technical Reports Server (NTRS)

    Newman, William I.; Haynes, Martha P.; Terzian, Yervant

    1994-01-01

    Frequency histograms and the 'power spectrum analysis' (PSA) method, the latter developed by Yu & Peebles (1969), have been widely employed as techniques for establishing the existence of periodicities. We provide a formal analysis of these two classes of methods, including controlled numerical experiments, to better understand their proper use and application. In particular, we note that typical published applications of frequency histograms commonly employ far greater numbers of class intervals or bins than is advisable by statistical theory sometimes giving rise to the appearance of spurious patterns. The PSA method generates a sequence of random numbers from observational data which, it is claimed, is exponentially distributed with unit mean and variance, essentially independent of the distribution of the original data. We show that the derived random processes is nonstationary and produces a small but systematic bias in the usual estimate of the mean and variance. Although the derived variable may be reasonably described by an exponential distribution, the tail of the distribution is far removed from that of an exponential, thereby rendering statistical inference and confidence testing based on the tail of the distribution completely unreliable. Finally, we examine a number of astronomical examples wherein these methods have been used giving rise to widespread acceptance of statistically unconfirmed conclusions.
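
    The caution above about using too many histogram bins is easy to demonstrate. The sketch below contrasts an over-binned histogram of featureless synthetic values with one whose bin width follows the Freedman-Diaconis rule; it illustrates only the bin-count point, not the power spectrum analysis method itself.

```python
import numpy as np

rng = np.random.default_rng(42)
z = rng.uniform(0.0, 0.1, size=200)   # featureless synthetic "redshift-like" values

# Over-binned: with roughly 3 values per bin, Poisson noise alone produces
# apparent peaks and gaps that can be mistaken for periodic structure.
counts_fine, _ = np.histogram(z, bins=64)

# The Freedman-Diaconis rule chooses bin width ~ 2 * IQR * n**(-1/3).
edges_fd = np.histogram_bin_edges(z, bins="fd")
counts_fd, _ = np.histogram(z, bins=edges_fd)

print("64 bins:          counts range from", counts_fine.min(), "to", counts_fine.max())
print(f"{len(edges_fd) - 1} FD bins:        counts range from", counts_fd.min(), "to", counts_fd.max())
```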

  10. Technical Note: The Initial Stages of Statistical Data Analysis

    PubMed Central

    Tandy, Richard D.

    1998-01-01

    Objective: To provide an overview of several important data-related considerations in the design stage of a research project and to review the levels of measurement and their relationship to the statistical technique chosen for the data analysis. Background: When planning a study, the researcher must clearly define the research problem and narrow it down to specific, testable questions. The next steps are to identify the variables in the study, decide how to group and treat subjects, and determine how to measure, and the underlying level of measurement of, the dependent variables. Then the appropriate statistical technique can be selected for data analysis. Description: The four levels of measurement in increasing complexity are nominal, ordinal, interval, and ratio. Nominal data are categorical or “count” data, and the numbers are treated as labels. Ordinal data can be ranked in a meaningful order by magnitude. Interval data possess the characteristics of ordinal data and also have equal distances between levels. Ratio data have a natural zero point. Nominal and ordinal data are analyzed with nonparametric statistical techniques and interval and ratio data with parametric statistical techniques. Advantages: Understanding the four levels of measurement and when it is appropriate to use each is important in determining which statistical technique to use when analyzing data. PMID:16558489
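
    The mapping described in this note, from level of measurement to broad class of technique, can be written down as a simple lookup. The sketch below is deliberately simplified (two-group comparisons only) and glosses over the usual distributional caveats.

```python
# Simplified guide: level of measurement -> typical two-group comparison.
TECHNIQUE_BY_LEVEL = {
    "nominal":  "nonparametric (e.g., chi-square test of independence)",
    "ordinal":  "nonparametric (e.g., Mann-Whitney U test)",
    "interval": "parametric (e.g., independent-samples t test)",
    "ratio":    "parametric (e.g., independent-samples t test)",
}

def suggest_technique(level_of_measurement: str) -> str:
    """Return a broad analysis suggestion for a dependent variable's level."""
    try:
        return TECHNIQUE_BY_LEVEL[level_of_measurement.lower()]
    except KeyError:
        raise ValueError(f"unknown level of measurement: {level_of_measurement!r}")

print(suggest_technique("ordinal"))
```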

  11. A statistical model and national data set for partitioning fish-tissue mercury concentration variation between spatiotemporal and sample characteristic effects

    USGS Publications Warehouse

    Wente, Stephen P.

    2004-01-01

    Many Federal, Tribal, State, and local agencies monitor mercury in fish-tissue samples to identify sites with elevated fish-tissue mercury (fish-mercury) concentrations, track changes in fish-mercury concentrations over time, and produce fish-consumption advisories. Interpretation of such monitoring data commonly is impeded by difficulties in separating the effects of sample characteristics (species, tissues sampled, and sizes of fish) from the effects of spatial and temporal trends on fish-mercury concentrations. Without such a separation, variation in fish-mercury concentrations due to differences in the characteristics of samples collected over time or across space can be misattributed to temporal or spatial trends; and/or actual trends in fish-mercury concentration can be misattributed to differences in sample characteristics. This report describes a statistical model that can separate spatiotemporal and sample-characteristic effects in fish-mercury concentration data, and a national data set (31,813 samples) for calibrating the model. This model could be useful for evaluating spatial and temporal trends in fish-mercury concentrations and developing fish-consumption advisories. The observed fish-mercury concentration data and model predictions can be accessed, displayed geospatially, and downloaded via the World Wide Web (http://emmma.usgs.gov). This report and the associated web site may assist in the interpretation of large amounts of data from widespread fish-mercury monitoring efforts.

  12. The Sport Students’ Ability of Literacy and Statistical Reasoning

    NASA Astrophysics Data System (ADS)

    Hidayah, N.

    2017-03-01

    Literacy and statistical reasoning are very important abilities for students of a sport education college, because materials for statistical learning can be drawn from their many activities, such as sport competitions, test and measurement results, predicting achievement based on training, finding connections among variables, and others. This research describes sport education college students' literacy and statistical reasoning related to identification of data type, probability, table interpretation, description and explanation using bar or pie graphs, explanation of variability, interpretation, and the calculation and explanation of mean, median, and mode, assessed through an instrument. The instrument was administered to 50 college students majoring in sport; only 26% of all students scored above 30%, while the others remained below 30%. Across all subjects, 56% of students could identify data classification, 49% could read, display, and interpret tables through graphs, 27% showed ability in probability, 33% could describe variability, and 16.32% could read, calculate, and describe mean, median, and mode. The results show that the sport students' literacy and statistical reasoning are not yet adequate and that their statistical study has not reached concept comprehension, literacy ability training, and statistical reasoning, so it is critical to improve the sport students' ability of literacy and statistical reasoning.

  13. Experimental statistics for biological sciences.

    PubMed

    Bang, Heejung; Davidian, Marie

    2010-01-01

    In this chapter, we cover basic and fundamental principles and methods in statistics - from "What are Data and Statistics?" to "ANOVA and linear regression" - which are the basis of any statistical thinking and undertaking. Readers can easily find the selected topics in most introductory statistics textbooks, but we have tried to assemble and structure them in a succinct and reader-friendly manner in a stand-alone chapter. This text has long been used in real classroom settings for both undergraduate and graduate students who do or do not major in statistical sciences. We hope that from this chapter, readers will understand the key statistical concepts and terminologies, how to design a study (experimental or observational), how to analyze the data (e.g., describe the data and/or estimate the parameter(s) and make inference), and how to interpret the results. This text would be most useful as supplemental material while readers take their own statistics courses, or as a reference text accompanying a manual for any statistical software, serving as a self-teaching guide.

  14. Notes on interpretation of geophysical data over areas of mineralization in Afghanistan

    USGS Publications Warehouse

    Drenth, Benjamin J.

    2011-01-01

    Afghanistan has the potential to contain substantial metallic mineral resources. Although valuable mineral deposits have been identified, much of the country's potential remains unknown. Geophysical surveys, particularly those conducted from airborne platforms, are a well-accepted and cost-effective method for obtaining information on the geological setting of a given area. This report summarizes interpretive findings from various geophysical surveys over selected mineral targets in Afghanistan, highlighting what existing data tell us. These interpretations are mainly qualitative in nature, because of the low resolution of available geophysical data. Geophysical data and simple interpretations are included for these six areas and deposit types: (1) Aynak: Sedimentary-hosted copper; (2) Zarkashan: Porphyry copper; (3) Kundalan: Porphyry copper; (4) Dusar Shaida: Volcanic-hosted massive sulphide; (5) Khanneshin: Carbonatite-hosted rare earth element; and (6) Chagai Hills: Porphyry copper.

  15. Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1.

    PubMed

    Deutsch, Eric W; Overall, Christopher M; Van Eyk, Jennifer E; Baker, Mark S; Paik, Young-Ki; Weintraub, Susan T; Lane, Lydie; Martens, Lennart; Vandenbrouck, Yves; Kusebauch, Ulrike; Hancock, William S; Hermjakob, Henning; Aebersold, Ruedi; Moritz, Robert L; Omenn, Gilbert S

    2016-11-04

    Every data-rich community research effort requires a clear plan for ensuring the quality of the data interpretation and comparability of analyses. To address this need within the Human Proteome Project (HPP) of the Human Proteome Organization (HUPO), we have developed through broad consultation a set of mass spectrometry data interpretation guidelines that should be applied to all HPP data contributions. For submission of manuscripts reporting HPP protein identification results, the guidelines are presented as a one-page checklist containing 15 essential points followed by two pages of expanded description of each. Here we present an overview of the guidelines and provide an in-depth description of each of the 15 elements to facilitate understanding of the intentions and rationale behind the guidelines, for both authors and reviewers. Broadly, these guidelines provide specific directions regarding how HPP data are to be submitted to mass spectrometry data repositories, how error analysis should be presented, and how detection of novel proteins should be supported with additional confirmatory evidence. These guidelines, developed by the HPP community, are presented to the broader scientific community for further discussion.

  16. Analysis of filament statistics in fast camera data on MAST

    NASA Astrophysics Data System (ADS)

    Farley, Tom; Militello, Fulvio; Walkden, Nick; Harrison, James; Silburn, Scott; Bradley, James

    2017-10-01

    Coherent filamentary structures have been shown to play a dominant role in turbulent cross-field particle transport [D'Ippolito 2011]. An improved understanding of filaments is vital in order to control scrape off layer (SOL) density profiles and thus control first wall erosion, impurity flushing and coupling of radio frequency heating in future devices. The Elzar code [T. Farley, 2017 in prep.] is applied to MAST data. The code uses information about the magnetic equilibrium to calculate the intensity of light emission along field lines as seen in the camera images, as a function of the field lines' radial and toroidal locations at the mid-plane. In this way a `pseudo-inversion' of the intensity profiles in the camera images is achieved from which filaments can be identified and measured. In this work, a statistical analysis of the intensity fluctuations along field lines in the camera field of view is performed using techniques similar to those typically applied in standard Langmuir probe analyses. These filament statistics are interpreted in terms of the theoretical ergodic framework presented by F. Militello & J.T. Omotani, 2016, in order to better understand how time averaged filament dynamics produce the more familiar SOL density profiles. This work has received funding from the RCUK Energy programme (Grant Number EP/P012450/1), from Euratom (Grant Agreement No. 633053) and from the EUROfusion consortium.

  17. Bayesian statistical inference enhances the interpretation of contemporary randomized controlled trials.

    PubMed

    Wijeysundera, Duminda N; Austin, Peter C; Hux, Janet E; Beattie, W Scott; Laupacis, Andreas

    2009-01-01

    Randomized trials generally use "frequentist" statistics based on P-values and 95% confidence intervals. Frequentist methods have limitations that might be overcome, in part, by Bayesian inference. To illustrate these advantages, we re-analyzed randomized trials published in four general medical journals during 2004. We used Medline to identify randomized superiority trials with two parallel arms, individual-level randomization and dichotomous or time-to-event primary outcomes. Studies with P<0.05 in favor of the intervention were deemed "positive"; otherwise, they were "negative." We used several prior distributions and exact conjugate analyses to calculate Bayesian posterior probabilities for clinically relevant effects. Of 88 included studies, 39 were positive using a frequentist analysis. Although the Bayesian posterior probabilities of any benefit (relative risk or hazard ratio<1) were high in positive studies, these probabilities were lower and variable for larger benefits. The positive studies had only moderate probabilities for exceeding the effects that were assumed for calculating the sample size. By comparison, there were moderate probabilities of any benefit in negative studies. Bayesian and frequentist analyses complement each other when interpreting the results of randomized trials. Future reports of randomized trials should include both.
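
    The exact conjugate analysis mentioned here is straightforward for a dichotomous outcome: with Beta priors, each arm's event risk has a Beta posterior, and the posterior probability of any benefit (relative risk below 1) can be evaluated by Monte Carlo. The trial counts and the uniform Beta(1, 1) prior below are invented purely for illustration.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(7)

# Hypothetical two-arm trial with a dichotomous adverse outcome.
events_treat, n_treat = 30, 500
events_ctrl,  n_ctrl  = 45, 500

# Beta(1, 1) priors give conjugate Beta posteriors for each arm's event risk.
post_treat = beta(1 + events_treat, 1 + n_treat - events_treat)
post_ctrl  = beta(1 + events_ctrl,  1 + n_ctrl  - events_ctrl)

# Monte Carlo draws from the posteriors give the posterior distribution of RR.
draws = 200_000
rr = post_treat.rvs(draws, random_state=rng) / post_ctrl.rvs(draws, random_state=rng)

print("P(RR < 1)    =", np.mean(rr < 1))       # posterior probability of any benefit
print("P(RR < 0.75) =", np.mean(rr < 0.75))    # probability of a larger benefit
```

    The contrast between the two printed probabilities mirrors the paper's observation that "positive" trials can have a high probability of some benefit but only a moderate probability of the larger effect the trial was powered for.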

  18. Application of Ontology Technology in Health Statistic Data Analysis.

    PubMed

    Guo, Minjiang; Hu, Hongpu; Lei, Xingyun

    2017-01-01

    Research Purpose: to establish a health management ontology for the analysis of health statistic data. Proposed Methods: this paper established a health management ontology based on the analysis of the concepts in the China Health Statistics Yearbook, and used Protégé to define the syntactic and semantic structure of health statistical data. Six classes of top-level ontology concepts and their subclasses were extracted, and the object properties and data properties were defined to establish the construction of these classes. By ontology instantiation, we can integrate multi-source heterogeneous data and enable administrators to have an overall understanding and analysis of the health statistic data. Ontology technology provides a comprehensive and unified information integration structure for the health management domain and lays a foundation for the efficient analysis of multi-source and heterogeneous health system management data and enhancement of management efficiency.

  19. Performance Data Gathering and Representation from Fixed-Size Statistical Data

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Jin, Haoqiang H.; Schmidt, Melisa A.; Kutler, Paul (Technical Monitor)

    1997-01-01

    The two commonly used performance data types in the super-computing community, statistics and event traces, are discussed and compared. Statistical data are much more compact but lack the probative power event traces offer. Event traces, on the other hand, are unbounded and can easily fill up the entire file system during program execution. In this paper, we propose an innovative methodology for performance data gathering and representation that offers a middle ground. Two basic ideas are employed: the use of averages to replace recording data for each instance, and 'formulae' to represent sequences associated with communication and control flow. The user can trade off tracing overhead and trace data size against data quality incrementally. In other words, the user will be able to limit the amount of trace data collected and, at the same time, carry out some of the analysis event traces offer using space-time views. With the help of a few simple examples, we illustrate the use of these techniques in performance tuning and compare the quality of the traces we collected with event traces. We found that the trace files thus obtained are, indeed, small, bounded and predictable before program execution, and that the quality of the space-time views generated from these statistical data is excellent. Furthermore, experimental results showed that the formulae proposed were able to capture all the sequences associated with 11 of the 15 applications tested. The performance of the formulae can be incrementally improved by allocating more memory at runtime to learn longer sequences.

  20. A Critique of Divorce Statistics and Their Interpretation.

    ERIC Educational Resources Information Center

    Crosby, John F.

    1980-01-01

    Increasingly, appeals to the divorce statistic are employed to substantiate claims that the family is in a state of breakdown and marriage is passe. This article contains a consideration of reasons why the divorce statistics are invalid and/or unreliable as indicators of the present state of marriage and family. (Author)

  1. Customizable tool for ecological data entry, assessment, monitoring, and interpretation

    USDA-ARS?s Scientific Manuscript database

    The Database for Inventory, Monitoring and Assessment (DIMA) is a highly customizable tool for data entry, assessment, monitoring, and interpretation. DIMA is a Microsoft Access database that can easily be used without Access knowledge and is available at no cost. Data can be entered for common, nat...

  2. Fuzzy logic and image processing techniques for the interpretation of seismic data

    NASA Astrophysics Data System (ADS)

    Orozco-del-Castillo, M. G.; Ortiz-Alemán, C.; Urrutia-Fucugauchi, J.; Rodríguez-Castellanos, A.

    2011-06-01

    Since interpretation of seismic data is usually a tedious and repetitive task, the ability to do so automatically or semi-automatically has become an important objective of recent research. We believe that the vagueness and uncertainty in the interpretation process makes fuzzy logic an appropriate tool to deal with seismic data. In this work we developed a semi-automated fuzzy inference system to detect the internal architecture of a mass transport complex (MTC) in seismic images. We propose that the observed characteristics of a MTC can be expressed as fuzzy if-then rules consisting of linguistic values associated with fuzzy membership functions. The constructions of the fuzzy inference system and various image processing techniques are presented. We conclude that this is a well-suited problem for fuzzy logic since the application of the proposed methodology yields a semi-automatically interpreted MTC which closely resembles the MTC from expert manual interpretation.
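
    A minimal Mamdani-style sketch of the fuzzy if-then idea described above, not the authors' system: two generic seismic attributes (an invented "chaos" and "amplitude" pair, normalised to convenient ranges) are passed through triangular membership functions and a single rule to yield a degree of belief that a sample belongs to the mass transport complex.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def mtc_membership(chaos, amplitude):
    """Degree to which a sample looks like mass-transport-complex fill.

    Rule: IF chaos is high AND amplitude is low THEN MTC membership is high
    (min implements the AND, as in a basic Mamdani system).
    """
    chaos_high = tri(chaos, 0.4, 1.0, 1.6)          # saturates near chaos = 1
    amplitude_low = tri(amplitude, -0.6, 0.0, 0.6)  # peaks at zero amplitude
    return min(chaos_high, amplitude_low)

# Normalised attribute values for two hypothetical samples.
print(mtc_membership(chaos=0.9, amplitude=0.1))   # strongly MTC-like
print(mtc_membership(chaos=0.5, amplitude=0.5))   # only weakly MTC-like
```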

  3. Common pitfalls in statistical analysis: Clinical versus statistical significance

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2015-01-01

    In clinical research, study results, which are statistically significant are often interpreted as being clinically important. While statistical significance indicates the reliability of the study results, clinical significance reflects its impact on clinical practice. The third article in this series exploring pitfalls in statistical analysis clarifies the importance of differentiating between statistical significance and clinical significance. PMID:26229754

  4. Rasch fit statistics and sample size considerations for polytomous data.

    PubMed

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-05-29

    Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire - 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges.

  5. Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments

    DTIC Science & Technology

    2015-09-30

    statistical inference methodologies for ocean-acoustic problems by investigating and applying statistical methods to data collected from scale-model … to begin planning experiments for statistical inference applications. APPROACH In the ocean acoustics community over the past two decades … solutions for waveguide parameters. With the introduction of statistical inference to the field of ocean acoustics came the desire to interpret marginal

  6. How conduit models can be used to interpret volcano monitoring data

    NASA Astrophysics Data System (ADS)

    Thomas, M. E.; Neuberg, J. W.; Karl, S.; Collinson, A.; Pascal, K.

    2012-04-01

    During the last decade there have been major advances in the field of volcano monitoring, but to be able to take full advantage of these advances it is vital to link the monitoring data with the physical processes that give rise to the recorded signals. To obtain a better understanding of these physical processes it is necessary to understand the conditions of the system at depth. This can be achieved through numerical modelling. We present the results of conduit models representative of a silicic volcanic system and demonstrate how processes identified and interpreted from these models may manifest in the recorded monitoring data. Links are drawn to seismicity, deformation, and gas emissions. A key point is how these data complement each other, and through utilising conduit models we are able to interpret how these different data may be recorded in response to a particular process. This is an invaluable tool as it is far easier to draw firm conclusions on what is happening at a volcano if there are several different data sets that suggest the same processes are occurring. Some of these interpretations appear useful in forecasting potentially catastrophic changes in eruptive behaviour, such as a dome collapse leading to violent explosive behaviour, and the role of monitoring data in this capacity will also be addressed.

  7. Estimation of Global Network Statistics from Incomplete Data

    PubMed Central

    Bliss, Catherine A.; Danforth, Christopher M.; Dodds, Peter Sheridan

    2014-01-01

    Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week. PMID:25338183

  8. Aerosol backscatter lidar calibration and data interpretation

    NASA Technical Reports Server (NTRS)

    Kavaya, M. J.; Menzies, R. T.

    1984-01-01

    A treatment of the various factors involved in lidar data acquisition and analysis is presented. This treatment highlights sources of fundamental, systematic, modeling, and calibration errors that may affect the accurate interpretation and calibration of lidar aerosol backscatter data. The discussion primarily pertains to ground based, pulsed CO2 lidars that probe the troposphere and are calibrated using large, hard calibration targets. However, a large part of the analysis is relevant to other types of lidar systems such as lidars operating at other wavelengths; continuous wave (CW) lidars; lidars operating in other regions of the atmosphere; lidars measuring nonaerosol elastic or inelastic backscatter; airborne or Earth-orbiting lidar platforms; and lidars employing combinations of the above characteristics.

  9. Recommendations for Clinical Pathology Data Generation, Interpretation, and Reporting in Target Animal Safety Studies for Veterinary Drug Development.

    PubMed

    Siska, William; Gupta, Aradhana; Tomlinson, Lindsay; Tripathi, Niraj; von Beust, Barbara

    Clinical pathology testing is routinely performed in target animal safety studies in order to identify potential toxicity associated with administration of an investigational veterinary pharmaceutical product. Regulatory and other testing guidelines that address such studies provide recommendations for clinical pathology testing but occasionally contain outdated analytes and do not take into account interspecies physiologic differences that affect the practical selection of appropriate clinical pathology tests. Additionally, strong emphasis is often placed on statistical analysis and use of reference intervals for interpretation of test article-related clinical pathology changes, with limited attention given to the critical scientific review of clinically, toxicologically, or biologically relevant changes. The purpose of this communication from the Regulatory Affairs Committee of the American Society for Veterinary Clinical Pathology is to provide current recommendations for clinical pathology testing and data interpretation in target animal safety studies and thereby enhance the value of clinical pathology testing in these studies.

  10. Commentary to Library Statistical Data Base.

    ERIC Educational Resources Information Center

    Jones, Dennis; And Others

    The National Center for Higher Education Management Systems (NCHEMS) has developed a library statistical data base which concentrates on the management information needs of administrators of public and academic libraries. This document provides an overview of the framework and conceptual approach employed in the design of the data base. The data…

  11. Quasi-probabilities in conditioned quantum measurement and a geometric/statistical interpretation of Aharonov's weak value

    NASA Astrophysics Data System (ADS)

    Lee, Jaeha; Tsutsui, Izumi

    2017-05-01

    We show that the joint behavior of an arbitrary pair of (generally noncommuting) quantum observables can be described by quasi-probabilities, which are an extended version of the standard probabilities used for describing the outcome of measurement for a single observable. The physical situations that require these quasi-probabilities arise when one considers quantum measurement of an observable conditioned by some other variable, with the notable example being the weak measurement employed to obtain Aharonov's weak value. Specifically, we present a general prescription for the construction of quasi-joint probability (QJP) distributions associated with a given combination of observables. These QJP distributions are introduced in two complementary approaches: one from a bottom-up, strictly operational construction realized by examining the mathematical framework of the conditioned measurement scheme, and the other from a top-down viewpoint realized by applying the results of the spectral theorem for normal operators and their Fourier transforms. It is then revealed that, for a pair of simultaneously measurable observables, the QJP distribution reduces to the unique standard joint probability distribution of the pair, whereas for a noncommuting pair there exists an inherent indefiniteness in the choice of such QJP distributions, admitting a multitude of candidates that may equally be used for describing the joint behavior of the pair. In the course of our argument, we find that the QJP distributions furnish the space of operators in the underlying Hilbert space with their characteristic geometric structures such that the orthogonal projections and inner products of observables can be given statistical interpretations as, respectively, “conditionings” and “correlations”. The weak value Aw for an observable A is then given a geometric/statistical interpretation as either the orthogonal projection of A onto the subspace generated by another observable B, or
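
    For reference, the textbook expression behind the weak value discussed here is A_w = ⟨φ|A|ψ⟩ / ⟨φ|ψ⟩ for a preselected state |ψ⟩ and postselected state |φ⟩. The sketch below evaluates it numerically for an invented two-level example and shows the familiar amplification when the two states are nearly orthogonal.

```python
import numpy as np

def weak_value(A, pre, post):
    """Weak value <post|A|pre> / <post|pre> (in general a complex number)."""
    pre = np.asarray(pre, dtype=complex)
    post = np.asarray(post, dtype=complex)
    return (post.conj() @ A @ pre) / (post.conj() @ pre)

# Pauli-z observable on a qubit; its eigenvalues are +1 and -1.
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

# Preselect |+x> and postselect a state nearly orthogonal to it: the weak
# value grows to roughly -cot(theta), far outside the eigenvalue range.
theta = 0.05
pre = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
post = np.array([np.cos(np.pi / 4 + theta), -np.sin(np.pi / 4 + theta)])

print(weak_value(sigma_z, pre, post))   # approximately -20 + 0j
```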

  12. Rasch fit statistics and sample size considerations for polytomous data

    PubMed Central

    Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael

    2008-01-01

    Background Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID:18510722

  13. Statistical Literacy in the Data Science Workplace

    ERIC Educational Resources Information Center

    Grant, Robert

    2017-01-01

    Statistical literacy, the ability to understand and make use of statistical information including methods, has particular relevance in the age of data science, when complex analyses are undertaken by teams from diverse backgrounds. Not only is it essential to communicate to the consumers of information but also within the team. Writing from the…

  14. Children's and Adults' Interpretation of Covariation Data: Does Symmetry of Variables Matter?

    ERIC Educational Resources Information Center

    Saffran, Andrea; Barchfeld, Petra; Sodian, Beate; Alibali, Martha W.

    2016-01-01

    In a series of 3 experiments, the authors investigated the influence of symmetry of variables on children's and adults' data interpretation. They hypothesized that symmetrical (i.e., present/present) variables would support correct interpretations more than asymmetrical (i.e., present/absent) variables. Participants were asked to judge covariation…

  15. Proper interpretation of chronic toxicity studies and their statistics: A critique of "Which level of evidence does the US National Toxicology Program provide? Statistical considerations using the Technical Report 578 on Ginkgo biloba as an example".

    PubMed

    Kissling, Grace E; Haseman, Joseph K; Zeiger, Errol

    2015-09-02

    A recent article by Gaus (2014) demonstrates a serious misunderstanding of the NTP's statistical analysis and interpretation of rodent carcinogenicity data as reported in Technical Report 578 (Ginkgo biloba) (NTP, 2013), as well as a failure to acknowledge the abundant literature on false positive rates in rodent carcinogenicity studies. The NTP reported Ginkgo biloba extract to be carcinogenic in mice and rats. Gaus claims that, in this study, 4800 statistical comparisons were possible, and that 209 of them were statistically significant (p<0.05) compared with 240 (4800×0.05) expected by chance alone; thus, the carcinogenicity of Ginkgo biloba extract cannot be definitively established. However, his assumptions and calculations are flawed since he incorrectly assumes that the NTP uses no correction for multiple comparisons, and that significance tests for discrete data operate at exactly the nominal level. He also misrepresents the NTP's decision making process, overstates the number of statistical comparisons made, and ignores the fact that the mouse liver tumor effects were so striking (e.g., p<0.0000000000001) that it is virtually impossible that they could be false positive outcomes. Gaus' conclusion that such obvious responses merely "generate a hypothesis" rather than demonstrate a real carcinogenic effect has no scientific credibility. Moreover, his claims regarding the high frequency of false positive outcomes in carcinogenicity studies are misleading because of his methodological misconceptions and errors. Published by Elsevier Ireland Ltd.
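
    The multiple-comparisons arithmetic at issue can be reproduced in outline. The sketch below computes, under an idealised independence assumption, the expected number of false positives among many tests at alpha = 0.05, the chance probability of seeing at least the quoted number of significant results, and a Bonferroni-style per-test threshold; it uses the counts quoted in the abstract only as inputs and takes no position on the underlying dispute.

```python
from scipy.stats import binom

n_tests = 4800      # number of comparisons claimed in the critique
alpha = 0.05
n_signif = 209      # comparisons reported as significant at p < 0.05

# Expected false positives if every null hypothesis were true and the tests
# were independent (both assumptions are idealisations).
expected = n_tests * alpha
print(f"expected by chance: {expected:.0f}")

# Probability of observing at least n_signif significant results by chance
# under those same idealised assumptions.
p_at_least = binom.sf(n_signif - 1, n_tests, alpha)
print(f"P(X >= {n_signif}) = {p_at_least:.3f}")

# A Bonferroni-style family-wise threshold, for comparison with the very
# small tumour-effect p-values mentioned in the abstract.
print(f"Bonferroni per-test threshold: {alpha / n_tests:.2e}")
```

    The last line makes the abstract's closing point concrete: even a crude family-wise correction leaves the reported p-values many orders of magnitude below any plausible threshold.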

  16. Children's and adults' interpretation of covariation data: Does symmetry of variables matter?

    PubMed

    Saffran, Andrea; Barchfeld, Petra; Sodian, Beate; Alibali, Martha W

    2016-10-01

    In a series of 3 experiments, the authors investigated the influence of symmetry of variables on children's and adults' data interpretation. They hypothesized that symmetrical (i.e., present/present) variables would support correct interpretations more than asymmetrical (i.e., present/absent) variables. Participants were asked to judge covariation in a series of data sets presented in contingency tables and to justify their judgments. Participants in Experiments 1 and 2 were elementary school children (Experiment 1: n = 52 second graders, n = 44 fourth graders; Experiment 2: n = 50 second graders). Participants in Experiment 3 were adults (n = 62). In Experiment 1, children in the symmetrical variables condition performed better than those in the asymmetrical variables condition. Children in the symmetrical variables condition judged more data patterns correctly and they more frequently justified their choices by referring to the complete table. Experiment 2 ruled out the possibility that this effect was caused by differences in question format. Even when question format was held constant, second graders performed better with symmetrical variables. Experiment 3 showed that adults' data interpretation is also affected by symmetry of variables. Collectively, these results indicate that symmetry of variables affects interpretation of covariation data. The authors argue that symmetrical variables provide a context for meaningful comparison. With asymmetrical variables, the importance of the comparison is less salient. Thus, the symmetry of variables should be considered by researchers as well as educators. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  17. Library Statistical Data Base Formats and Definitions.

    ERIC Educational Resources Information Center

    Jones, Dennis; And Others

    Represented are the detailed set of data structures relevant to the categorization of information, terminology, and definitions employed in the design of the library statistical data base. The data base, or management information system, provides administrators with a framework of information and standardized data for library management, planning,…

  18. Statistical evaluation of accelerated stability data obtained at a single temperature. I. Effect of experimental errors in evaluation of stability data obtained.

    PubMed

    Yoshioka, S; Aso, Y; Takeda, Y

    1990-06-01

    Accelerated stability data obtained at a single temperature is statistically evaluated, and the utility of such data for assessment of stability is discussed focussing on the chemical stability of solution-state dosage forms. The probability that the drug content of a product is observed to be within the lower specification limit in the accelerated test is interpreted graphically. This probability depends on experimental errors in the assay and temperature control, as well as the true degradation rate and activation energy. Therefore, the observation that the drug content meets the specification in the accelerated testing can provide only limited information on the shelf-life of the drug, without the knowledge of the activation energy and the accuracy and precision of the assay and temperature control.
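
    A single-temperature accelerated result only becomes a shelf-life estimate once a degradation model and an activation energy are assumed. The sketch below makes those assumptions explicit (first-order loss, Arrhenius behaviour, an assumed activation energy of 83 kJ/mol); every number in it is invented for illustration.

```python
import numpy as np

R = 8.314          # gas constant, J/(mol*K)

def arrhenius_rate(k_ref, T_ref, T_new, Ea):
    """Scale a first-order rate constant from T_ref to T_new (temperatures in K)."""
    return k_ref * np.exp(-(Ea / R) * (1.0 / T_new - 1.0 / T_ref))

# Hypothetical accelerated result: 5% loss after 90 days at 40 degC implies
# k = -ln(0.95)/90 per day for first-order degradation.
k_40 = -np.log(0.95) / 90.0

# Extrapolate to 25 degC with an *assumed* activation energy.
Ea = 83_000.0       # J/mol, assumed for illustration
k_25 = arrhenius_rate(k_40, T_ref=313.15, T_new=298.15, Ea=Ea)

# Shelf life defined as time to reach 90% of label claim (first-order kinetics).
t90_days = np.log(1.0 / 0.9) / k_25
print(f"k(25 degC) = {k_25:.2e} per day, estimated t90 = {t90_days:.0f} days")
```

    The estimate moves substantially if the assumed activation energy or the assay precision changes, which is precisely the limitation of single-temperature data that the abstract emphasises.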

  19. Interpretation of electrical resistivity data acquired at the Aurora plant site

    DOT National Transportation Integrated Search

    2008-02-01

    MST proposes to acquire high-resolution reflection seismic data at the Knight Hawk Coal Company construction site. These geophysical data will be processed, analyzed and interpreted with the objective of locating and mapping any subsurface voids that...

  20. Interpretation of CHAMP Magnetic Anomaly Data over the Pannonian Basin Region Using Lower Altitude Horizontal Gradient Data

    NASA Technical Reports Server (NTRS)

    Taylor, P. T.; Kis, K. I.; Wittmann, G.

    2013-01-01

    The ESA SWARM mission will have three Earth-orbiting, magnetometer-bearing satellites: one in a high orbit and two side-by-side in lower orbits. These latter satellites will record a horizontal magnetic gradient. In order to determine how we can use these gradient measurements for the interpretation of large geologic units, we used ten years of CHAMP data to compute a horizontal gradient map over a section of southeastern Europe, with our goal being to interpret these data over the Pannonian Basin of Hungary.
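
    In map form, the horizontal gradient referred to here is simply the magnitude of the two horizontal derivatives of the anomaly field. The sketch below computes it on a synthetic grid with a centred finite-difference approximation; the grid spacing and anomaly shape are invented, and nothing here is specific to the CHAMP or Swarm processing.

```python
import numpy as np

# Synthetic total-field magnetic anomaly (nT) on a regular grid in metres.
x = np.linspace(-200e3, 200e3, 81)
y = np.linspace(-200e3, 200e3, 81)
X, Y = np.meshgrid(x, y)
anomaly = 120.0 * np.exp(-((X - 30e3) ** 2 + Y ** 2) / (2 * (60e3) ** 2))

# Horizontal gradient magnitude via centred finite differences.
dT_dy, dT_dx = np.gradient(anomaly, y, x)          # nT per metre, per axis
hgrad = np.hypot(dT_dx, dT_dy)

print(f"maximum horizontal gradient: {hgrad.max() * 1e3:.3f} nT/km")
```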

  1. SOCR: Statistics Online Computational Resource

    PubMed Central

    Dinov, Ivo D.

    2011-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result, a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students' intuition and enhance their learning. PMID:21451741

  2. Fetal Alcohol Spectrum Disorders (FASDs): Data and Statistics

    MedlinePlus

    ... alcohol screening and counseling for all women. Data & Statistics: Prevalence of ... conducted annually by the National Center for Health Statistics (NCHS), CDC, to produce national estimates for a ...

  3. Ergodic Theory, Interpretations of Probability and the Foundations of Statistical Mechanics

    NASA Astrophysics Data System (ADS)

    van Lith, Janneke

    The traditional use of ergodic theory in the foundations of equilibrium statistical mechanics is that it provides a link between thermodynamic observables and microcanonical probabilities. First of all, the ergodic theorem demonstrates the equality of microcanonical phase averages and infinite time averages (albeit for a special class of systems, and up to a measure zero set of exceptions). Secondly, one argues that actual measurements of thermodynamic quantities yield time averaged quantities, since measurements take a long time. The combination of these two points is held to be an explanation why calculating microcanonical phase averages is a successful algorithm for predicting the values of thermodynamic observables. It is also well known that this account is problematic. This survey intends to show that ergodic theory nevertheless may have important roles to play, and it explores three other uses of ergodic theory. Particular attention is paid, firstly, to the relevance of specific interpretations of probability, and secondly, to the way in which the concern with systems in thermal equilibrium is translated into probabilistic language. With respect to the latter point, it is argued that equilibrium should not be represented as a stationary probability distribution as is standardly done; instead, a weaker definition is presented.
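
    For readers who want the equality being appealed to, a standard statement of the Birkhoff ergodic theorem, specialised to an ergodic Hamiltonian flow on the energy surface, reads as follows (the notation for the flow, the energy surface, and the microcanonical measure is ours):

```latex
% Birkhoff ergodic theorem for an ergodic Hamiltonian flow \phi_t on the
% energy surface \Gamma_E with microcanonical measure \mu_{\mathrm{mc}}:
\lim_{T \to \infty} \frac{1}{T} \int_0^T f\bigl(\phi_t(x)\bigr)\,\mathrm{d}t
  \;=\; \int_{\Gamma_E} f \,\mathrm{d}\mu_{\mathrm{mc}}
\qquad \text{for } \mu_{\mathrm{mc}}\text{-almost every } x .
```

    The "special class of systems" caveat in the abstract is the ergodicity hypothesis, and the "measure zero set of exceptions" is the almost-everywhere qualification above.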

  4. Human biomonitoring data interpretation and ethics; obstacles or surmountable challenges?

    PubMed Central

    Sepai, Ovnair; Collier, Clare; Van Tongelen, Birgit; Casteleyn, Ludwine

    2008-01-01

    The use of human samples to assess environmental exposure and uptake of chemicals is more than an analytical exercise and requires consideration of the utility and interpretation of data as well as due consideration of ethical issues. These aspects are inextricably linked. In 2004 the EC expressed its commitment to the development of a harmonised approach to human biomonitoring (HBM) by including an action in the EU Environment and Health Strategy to develop a Human Biomonitoring Pilot Study. This further underlined the need for interpretation strategies as well as guidance on ethical issues. A workshop held in December 2006 brought together stakeholders from academia, policy makers as well as non-governmental organisations and chemical industry associations to a two day workshop built a mutual understanding of the issues in an open and frank discussion forum. This paper describes the discussion and recommendations from the workshop. The workshop developed key recommendations for a Pan-European HBM Study: 1. A strategy for the interpretation of human biomonitoring data should be developed. 2. The pilot study should include the development of a strategy to integrate health data and environmental monitoring with human biomonitoring data at national and international levels. 3. Communication strategies should be developed when designing the study and evolve as the study continues. 4. Early communication with stakeholders is essential to achieve maximum efficacy of policy developments and facilitate subsequent monitoring. 5. Member states will have to apply individually for project approval from their National Research Ethics Committees. 6. The study population needs to have sufficient information on the way data will be gathered, interpreted and disseminated and how samples will be stored and used in the future (if biobanking) before they can give informed consent. 7. The participants must be given the option of anonymity. This has an impact on follow-up. 8. The pilot

  5. Data exploration systems for databases

    NASA Technical Reports Server (NTRS)

    Greene, Richard J.; Hield, Christopher

    1992-01-01

    Data exploration systems apply machine learning techniques, multivariate statistical methods, information theory, and database theory to databases to identify significant relationships among the data and summarize information. The result of applying data exploration systems should be a better understanding of the structure of the data and a perspective of the data enabling an analyst to form hypotheses for interpreting the data. This paper argues that data exploration systems need a minimum amount of domain knowledge to guide both the statistical strategy and the interpretation of the resulting patterns discovered by these systems.

  6. Statistical ecology comes of age

    PubMed Central

    Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-01-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151

  8. A probabilistic framework for microarray data analysis: fundamental probability models and statistical inference.

    PubMed

    Ogunnaike, Babatunde A; Gelmi, Claudio A; Edwards, Jeremy S

    2010-05-21

    Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble, and developing theoretical distributions for these ensembles, provides a framework that plays instead to the strength of microarrays. We present theoretical results showing that, under reasonable assumptions, the distribution of microarray intensities follows a Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
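
    The ensemble view described above can be illustrated with a minimal sketch: all spot intensities from one (simulated) array are treated as a single sample and a Gamma distribution is fitted by maximum likelihood with SciPy. The simulated intensities, the true parameter values, and the fixed zero location are illustrative assumptions, not the authors' estimation procedure.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        # Stand-in for one array's spot intensities (true shape=2, scale=500).
        intensities = rng.gamma(shape=2.0, scale=500.0, size=10_000)

        # Ensemble fit: one Gamma model for the whole array rather than a
        # spot-by-spot analysis.  floc=0 fixes the location parameter at zero.
        shape_hat, loc_hat, scale_hat = stats.gamma.fit(intensities, floc=0)
        print(shape_hat, scale_hat)

        # Fractional intensities (each spot's share of the array total) are the
        # quantities the paper goes on to model with Beta mixtures.
        fractions = intensities / intensities.sum()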

  9. Sensitivity of goodness-of-fit statistics to rainfall data rounding off

    NASA Astrophysics Data System (ADS)

    Deidda, Roberto; Puliga, Michelangelo

    An analysis based on the L-moments theory suggests adopting the generalized Pareto distribution to interpret daily rainfall depths recorded by the rain-gauge network of the Hydrological Survey of the Sardinia Region. Nevertheless, a substantial and not yet fully resolved problem arises in estimating a left-censoring threshold that assures a good fit of rainfall data to the generalized Pareto distribution. In order to detect an optimal threshold, keeping the largest possible number of data, we chose to apply a “failure-to-reject” method based on goodness-of-fit tests, as proposed by Choulakian and Stephens [Choulakian, V., Stephens, M.A., 2001. Goodness-of-fit tests for the generalized Pareto distribution. Technometrics 43, 478-484]. Unfortunately, the application of the test, using percentage points provided by Choulakian and Stephens (2001), did not succeed in detecting a useful threshold value in most analyzed time series. A deeper analysis revealed that these failures are mainly due to the presence of large quantities of rounded-off values among the sample data, affecting the distribution of the goodness-of-fit statistics and leading to significant departures from the percentage points expected for continuous random variables. A procedure based on Monte Carlo simulations is thus proposed to overcome these problems.
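
    A rough sketch of this kind of Monte Carlo correction is given below: generalized Pareto samples are simulated, rounded to the gauge resolution, and the null distribution of a goodness-of-fit statistic is rebuilt from the rounded samples instead of relying on the published continuous-case percentage points. The shape and scale values, the rounding resolution, and the choice of the Anderson-Darling statistic are illustrative assumptions, not the paper's settings.

        import numpy as np
        from scipy import stats

        def anderson_darling(sample, dist):
            """A^2 statistic of `sample` against the fitted distribution `dist`."""
            x = np.sort(sample)
            n = len(x)
            u = np.clip(dist.cdf(x), 1e-12, 1 - 1e-12)
            i = np.arange(1, n + 1)
            return -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))

        rng = np.random.default_rng(1)
        n, resolution = 200, 0.2          # sample size and rain-gauge rounding step (mm)
        null_stats = []
        for _ in range(2000):
            x = stats.genpareto.rvs(c=0.1, scale=10.0, size=n, random_state=rng)
            x = np.round(x / resolution) * resolution      # mimic rounded-off records
            c, loc, scale = stats.genpareto.fit(x, floc=0)
            null_stats.append(anderson_darling(x, stats.genpareto(c, loc, scale)))

        # Empirical 95% critical value under rounding, to use in place of the
        # continuous-case percentage point.
        print(np.quantile(null_stats, 0.95))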

  10. Preliminary Interpretation of the MSL REMS Pressure Data

    NASA Astrophysics Data System (ADS)

    Haberle, Robert; Gómez-Elvira, Javier; de la Torre Juárez, Manuel; Harri, Ari-Matti; Hollingsworth, Jeffery; Kahanpää, Henrik; Kahre, Melinda; Martin-Torres, Javier; Mischna, Michael; Newman, Claire; Rafkin, Scot; Rennó, Nilton; Richardson, Mark; Rodríguez-Manfredi, Jose; Vasavada, Ashwin; Zorzano, Maria-Paz; REMS/MSL Science Teams

    2013-04-01

    The Rover Environmental Monitoring Station (REMS) on the Mars Science Laboratory (MSL) Curiosity rover consists of a suite of meteorological instruments that measure pressure, temperature (air and ground), wind (speed and direction), relative humidity, and the UV flux. A detailed description of the REMS sensors and their performance can be found in Gómez-Elvira et al. [2012, Space Science Reviews, 170(1-4), 583-640]. Here we focus on interpreting the first 100 sols of REMS operations with a particular emphasis on the pressure data. A unique feature of pressure data is that they reveal information on meteorological phenomena with time scales from seconds to years and spatial scales from local to global. From a single station we can learn about dust devils, regional circulations, thermal tides, synoptic weather systems, the CO2 cycle, dust storms, and interannual variability. Thus far MSL's REMS pressure sensor, provided by the Finnish Meteorological Institute and integrated into the REMS payload by Centro de Astrobiología, is performing flawlessly and our preliminary interpretation of its data includes the discovery of relatively dust-free convective vortices; a regional circulation system significantly modified by Gale crater and its central mound; the strongest thermal tides yet measured from the surface of Mars whose amplitudes and phases are very sensitive to fluctuations in global dust loading; and the classical signature of the seasonal cycling of carbon dioxide into and out of the polar caps.

  11. Methods of collecting and interpreting ground-water data

    USGS Publications Warehouse

    Bentall, Ray

    1963-01-01

    Because ground water is hidden from view, ancient man could only theorize as to its sources of replenishment and its behavior. His theories held sway until the latter part of the 17th century, which marked the first experimental work to determine the source and movement of ground water. Thus founded, the science of ground-water hydrology grew slowly, and not until the 19th century was there substantial evidence of conclusions having been based on observational data. The 20th century has witnessed tremendous advances in the science: in the methods of field investigation and interpretation of collected data, in the methods of determining the hydrologic characteristics of water-bearing material, and in the methods of inventorying ground-water supplies. Now, as is true of many other disciplines, the science of ground-water hydrology is characterized by frequent advancement of new ideas and techniques, refinement of old techniques, and an increasing wealth of data awaiting interpretation. So that its widely scattered staff of professional hydrologists could keep abreast of new ideas and advances in the techniques of ground-water investigation, it has been the practice in the U.S. Geological Survey to distribute such information for immediate internal use. As the methods become better established and developed, they are described in formal publications. Six papers pertaining to widely different phases of ground-water investigation comprise this particular contribution. For the sake of clarity and conformity, the original papers have been revised and edited by the compiler.

  12. Proper interpretation of chronic toxicity studies and their statistics: A critique of “Which level of evidence does the US National Toxicology Program provide? Statistical considerations using the Technical Report 578 on Ginkgo biloba as an example”

    PubMed Central

    Kissling, Grace E.; Haseman, Joseph K.; Zeiger, Errol

    2014-01-01

    A recent article by Gaus (2014) demonstrates a serious misunderstanding of the NTP’s statistical analysis and interpretation of rodent carcinogenicity data as reported in Technical Report 578 (Ginkgo biloba) (NTP 2013), as well as a failure to acknowledge the abundant literature on false positive rates in rodent carcinogenicity studies. The NTP reported Ginkgo biloba extract to be carcinogenic in mice and rats. Gaus claims that, in this study, 4800 statistical comparisons were possible, and that 209 of them were statistically significant (p<0.05) compared with 240 (4800 × 0.05) expected by chance alone; thus, the carcinogenicity of Ginkgo biloba extract cannot be definitively established. However, his assumptions and calculations are flawed, since he incorrectly assumes that the NTP uses no correction for multiple comparisons and that significance tests for discrete data operate at exactly the nominal level. He also misrepresents the NTP’s decision-making process, overstates the number of statistical comparisons made, and ignores the fact that the mouse liver tumor effects were so striking (e.g., p<0.0000000000001) that it is virtually impossible that they could be false positive outcomes. Gaus’ conclusion that such obvious responses merely “generate a hypothesis” rather than demonstrate a real carcinogenic effect has no scientific credibility. Moreover, his claims regarding the high frequency of false positive outcomes in carcinogenicity studies are misleading because of his methodological misconceptions and errors. PMID:25261588
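
    One of the statistical points at issue, that significance tests for discrete data do not operate at exactly the nominal level, can be seen with a small simulation: Fisher's exact test applied to null 2x2 tables with group sizes typical of rodent bioassays rejects far less often than 5%, so multiplying the number of tests by 0.05 overstates the expected number of false positives. The group size, background tumour rate, and test choice below are illustrative assumptions, not values from the NTP report.

        import numpy as np
        from scipy.stats import fisher_exact

        rng = np.random.default_rng(2)
        n_per_group, p_tumour, n_sims = 50, 0.05, 5000   # illustrative values
        rejections = 0
        for _ in range(n_sims):
            a = rng.binomial(n_per_group, p_tumour)      # tumours in control animals
            b = rng.binomial(n_per_group, p_tumour)      # tumours in treated (same true rate)
            table = [[a, n_per_group - a], [b, n_per_group - b]]
            _, p = fisher_exact(table, alternative="greater")
            rejections += p < 0.05

        # The observed rejection rate is well below the nominal 0.05.
        print(rejections / n_sims)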

  13. Methods, applications, interpretations and challenges of interrupted time series (ITS) data: protocol for a scoping review.

    PubMed

    Ewusie, Joycelyne E; Blondal, Erik; Soobiah, Charlene; Beyene, Joseph; Thabane, Lehana; Straus, Sharon E; Hamid, Jemila S

    2017-07-02

    Interrupted time series (ITS) design involves collecting data across multiple time points before and after the implementation of an intervention to assess the effect of the intervention on an outcome. ITS designs have become increasingly common and are frequently used to assess the impact of evidence-implementation interventions. Several statistical methods are currently available for analysing data from ITS designs; however, there is a lack of guidance on which methods are optimal for different data types and on their implications for interpreting results. Our objective is to conduct a scoping review of existing methods for analysing ITS data, to summarise their characteristics and properties, and to examine how the results are reported. We also aim to identify gaps and methodological deficiencies. We will search electronic databases from inception until August 2016 (eg, MEDLINE and JSTOR). Two reviewers will independently screen titles, abstracts and full-text articles and complete the data abstraction. The anticipated outcome will be a summarised description of all the methods that have been used in analysing ITS data in health research, how those methods were applied, their strengths and limitations, and the transparency of interpretation/reporting of the results. We will provide summary tables of the characteristics of the included studies. We will also describe the similarities and differences of the various methods. Ethical approval is not required for this study since we consider only the methods used in the analysis and no identifiable patient data will be used. Results will be disseminated through open access peer-reviewed publications. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
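
    One commonly used method that a review of this kind would cover is segmented (interrupted) regression with a level-change and a slope-change term; a minimal sketch with statsmodels on simulated monthly data is shown below. The variable names, intervention time, and simulated effect sizes are illustrative assumptions, not part of the protocol.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(3)
        t = np.arange(48)                          # 48 monthly time points
        post = (t >= 24).astype(int)               # intervention introduced at month 24
        time_since = np.where(post == 1, t - 24, 0)
        y = 10 + 0.2 * t - 3 * post - 0.1 * time_since + rng.normal(0, 1, t.size)
        df = pd.DataFrame({"y": y, "time": t, "post": post, "time_since": time_since})

        # Segmented regression: pre-intervention trend, level change, slope change.
        # For autocorrelated errors one could instead call
        # .fit(cov_type="HAC", cov_kwds={"maxlags": 3}).
        model = smf.ols("y ~ time + post + time_since", data=df).fit()
        print(model.params)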

  14. Applications of spatial statistical network models to stream data

    USGS Publications Warehouse

    Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal

    2014-01-01

    Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.
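
    A much-simplified sketch of the underlying idea, building a valid covariance from pairwise distances and using it in estimation, is given below with an exponential covariance and generalized least squares. Real stream-network models use tail-up/tail-down constructions with flow weights (as implemented in the free software mentioned above), which are not reproduced here; the distance matrix, covariate, and parameter values are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(4)
        n = 30
        # Hypothetical along-network (hydrologic) distances between n sites.
        coords = np.sort(rng.uniform(0, 100, n))
        D = np.abs(coords[:, None] - coords[None, :])

        sigma2, corr_range, nugget = 1.0, 20.0, 0.1
        C = sigma2 * np.exp(-D / corr_range) + nugget * np.eye(n)   # exponential covariance

        X = np.column_stack([np.ones(n), coords])                   # intercept + covariate
        y = X @ np.array([2.0, 0.05]) + rng.multivariate_normal(np.zeros(n), C)

        # Generalized least squares: accounts for spatial autocorrelation through C.
        Ci = np.linalg.inv(C)
        beta_gls = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)
        print(beta_gls)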

  15. Interpreting Association from Graphical Displays

    ERIC Educational Resources Information Center

    Fitzallen, Noleine

    2016-01-01

    Research that has explored students' interpretations of graphical representations has not extended to include how students apply understanding of particular statistical concepts related to one graphical representation to interpret different representations. This paper reports on the way in which students' understanding of covariation, evidenced…

  16. Heat-Passing Framework for Robust Interpretation of Data in Networks

    PubMed Central

    Fang, Yi; Sun, Mengtian; Ramani, Karthik

    2015-01-01

    Researchers are regularly interested in interpreting the multipartite structure of data entities according to their functional relationships. Data is often heterogeneous with intricately hidden inner structure. With limited prior knowledge, researchers are likely to confront the problem of transforming this data into knowledge. We develop a new framework, called heat-passing, which exploits intrinsic similarity relationships within noisy and incomplete raw data, and constructs a meaningful map of the data. The proposed framework is able to rank, cluster, and visualize the data all at once. The novelty of this framework is derived from an analogy between the process of data interpretation and that of heat transfer, in which all data points contribute simultaneously and globally to reveal intrinsic similarities between regions of data, meaningful coordinates for embedding the data, and exemplar data points that lie at optimal positions for heat transfer. We demonstrate the effectiveness of the heat-passing framework for robustly partitioning the complex networks, analyzing the globin family of proteins and determining conformational states of macromolecules in the presence of high levels of noise. The results indicate that the methodology is able to reveal functionally consistent relationships in a robust fashion with no reference to prior knowledge. The heat-passing framework is very general and has the potential for applications to a broad range of research fields, for example, biological networks, social networks and semantic analysis of documents. PMID:25668316
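
    The heat-transfer analogy can be illustrated in a generic way (this is not the authors' heat-passing algorithm) by diffusing heat on a similarity graph through the graph Laplacian: the heat kernel exp(-tL) yields a smoothed, global notion of affinity usable for ranking or clustering. The toy data, kernel bandwidth, and diffusion time are illustrative assumptions.

        import numpy as np
        from scipy.linalg import expm
        from scipy.spatial.distance import cdist

        rng = np.random.default_rng(5)
        # Two noisy clusters of points standing in for raw data entities.
        X = np.vstack([rng.normal(0, 0.3, (15, 2)), rng.normal(3, 0.3, (15, 2))])

        # Gaussian similarity graph and its (unnormalised) Laplacian.
        W = np.exp(-cdist(X, X) ** 2 / (2 * 0.5 ** 2))
        np.fill_diagonal(W, 0.0)
        L = np.diag(W.sum(axis=1)) - W

        # Heat kernel: entry (i, j) is the heat at j after time t from a unit
        # source at i; large values indicate strong, possibly indirect, affinity.
        t = 0.5
        H = expm(-t * L)

        # Rank all points by their affinity to point 0.
        print(np.argsort(-H[0])[:5])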

  17. An interpretation model of GPR point data in tunnel geological prediction

    NASA Astrophysics Data System (ADS)

    He, Yu-yao; Li, Bao-qi; Guo, Yuan-shu; Wang, Teng-na; Zhu, Ya

    2017-02-01

    GPR (Ground Penetrating Radar) point data plays an absolutely necessary role in tunnel geological prediction. However, little research has been done on GPR point data, and the existing results do not meet the actual requirements of such projects. In this paper, a GPR point data interpretation model based on the WD (Wigner distribution) and a deep CNN (convolutional neural network) is proposed. Firstly, the GPR point data is transformed by the WD to obtain a map of the joint time-frequency distribution; secondly, the joint distribution maps are classified by the deep CNN, and the approximate location of the geological target is determined by inspecting the time-frequency maps in parallel; finally, the GPR point data is interpreted according to the classification results and the position information from the map. The simulation results show that the classification accuracy on the test dataset (comprising 1200 GPR point data samples) is 91.83% after 200 iterations. Our model has the advantages of high accuracy and fast training speed, and can provide a scientific basis for the development of tunnel construction and excavation plans.
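
    A minimal sketch of the first step, a discrete Wigner-Ville style time-frequency map of a single trace, is given below; the trace is synthetic, the subsequent CNN classification stage is not shown, and this follows a textbook pseudo-WVD construction rather than necessarily the exact transform used in the paper.

        import numpy as np
        from scipy.signal import hilbert

        def wigner_ville(trace):
            """Discrete (pseudo) Wigner-Ville distribution of a 1-D signal."""
            z = hilbert(trace)                     # analytic signal
            n = len(z)
            wvd = np.zeros((n, n))
            for t in range(n):
                kmax = min(t, n - 1 - t)           # largest admissible lag at time t
                lags = np.arange(-kmax, kmax + 1)
                r = z[t + lags] * np.conj(z[t - lags])
                buf = np.zeros(n, dtype=complex)
                buf[lags % n] = r                  # circular placement for the FFT
                wvd[t] = np.real(np.fft.fft(buf))  # FFT over lag gives the frequency axis
            return wvd

        # Synthetic GPR-like trace: a short chirp buried in noise (illustrative only).
        rng = np.random.default_rng(6)
        fs, dur = 1000.0, 0.256
        t = np.arange(0, dur, 1 / fs)
        trace = np.sin(2 * np.pi * (50 + 200 * t) * t) + 0.2 * rng.normal(size=t.size)
        tf_map = wigner_ville(trace)               # image that would be passed to the CNN
        print(tf_map.shape)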

  18. The Systematic Interpretation of Cosmic Ray Data (The Transport Project)

    NASA Technical Reports Server (NTRS)

    Guzik, T. Gregory

    1997-01-01

    The Transport project's primary goals were to: (1) provide measurements of critical fragmentation cross sections; (2) study the cross section systematics; (3) improve the galactic cosmic ray propagation methodology; and (4) use the new cross section measurements to improve the interpretation of cosmic ray data. To accomplish these goals a collaboration was formed consisting of researchers in the US at Louisiana State University (LSU), Lawrence Berkeley Laboratory (LBL), Goddard Space Flight Center (GSFC), the University of Minnesota (UM), and New Mexico State University (NMSU), in France at the Centre d'Etudes de Saclay, and in Italy at the Universita di Catania. The US institutions, led by LSU, were responsible for measuring new cross sections using the LBL HISS facility, analysis of these measurements, and their application to interpreting cosmic ray data. France developed a liquid hydrogen target that was used in the HISS experiment and participated in the data interpretation. Italy developed a Multifunctional Neutron Spectrometer (MUFFINS) for the HISS runs to measure the energy spectra, angular distributions, and multiplicities of neutrons emitted during the high energy interactions. The Transport Project was originally proposed to NASA during Summer 1988, and funding began January 1989. Transport was renewed twice (1991, 1994) and finally concluded at LSU on September 30, 1997. During the more than 8 years of effort we had two major experiment runs at LBL, obtained data on the interaction of twenty different beams with a liquid hydrogen target, completed the analysis of fifteen of these datasets obtaining 590 new cross section measurements, published nine journal articles as well as eighteen conference proceedings papers, and presented more than thirty conference talks.

  19. The Empirical Nature and Statistical Treatment of Missing Data

    ERIC Educational Resources Information Center

    Tannenbaum, Christyn E.

    2009-01-01

    Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…

  20. PANDA: pathway and annotation explorer for visualizing and interpreting gene-centric data.

    PubMed

    Hart, Steven N; Moore, Raymond M; Zimmermann, Michael T; Oliver, Gavin R; Egan, Jan B; Bryce, Alan H; Kocher, Jean-Pierre A

    2015-01-01

    Objective. Bringing together genomics, transcriptomics, proteomics, and other -omics technologies is an important step towards developing highly personalized medicine. However, instrumentation has advanced far beyond expectations, and we are now able to generate data faster than they can be interpreted. Materials and Methods. We have developed PANDA (Pathway AND Annotation) Explorer, a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways like KEGG, BioCarta, and PharmGKB. PANDA represents data/annotations as icons in the graph while maintaining the other data elements (i.e., other columns for the table of annotations). Custom pathways from underrepresented diseases can be imported when existing data sources are inadequate. PANDA also allows sharing annotations among collaborators. Results. In our first use case, we show how easy it is to view supplemental data from a manuscript in the context of a user's own data. A second use case describes how PANDA was leveraged to design a treatment strategy from the somatic variants found in the tumor of a patient with metastatic sarcomatoid renal cell carcinoma. Conclusion. PANDA facilitates the interpretation of gene-centric annotations by visually integrating this information with the context of biological pathways. The application can be downloaded or used directly from our website: http://bioinformaticstools.mayo.edu/research/panda-viewer/.

  1. Data relay system specifications for ERTS image interpretation

    NASA Technical Reports Server (NTRS)

    Daniel, J. F.

    1970-01-01

    Experiments with the Data Collection System (DCS) of the Earth Resources Technology Satellites (ERTS) have been developed to stress ERTS applications in the Earth Resources Observation Systems (EROS) Program. Active pursuit of this policy has resulted in the design of eight specific experiments requiring a total of 98 DCS ground-data platforms. Of these eight experiments, six are intended to make use of DCS data as an aid in image interpretation, while two make use of the capability to relay data from remote locations. Preliminary discussions regarding additional experiments indicate a need for at least 150 DCS platforms within the EROS Program for ERTS experimentation. Results from the experiments will be used to assess the DCS suitability for satellites providing on-line, real-time, data relay capability. The rationale of the total DCS network of ground platforms and the relationship of each experiment to that rationale are discussed.

  2. A data-driven approach for quality assessment of radiologic interpretations.

    PubMed

    Hsu, William; Han, Simon X; Arnold, Corey W; Bui, Alex At; Enzmann, Dieter R

    2016-04-01

    Given the increasing emphasis on delivering high-quality, cost-efficient healthcare, improved methodologies are needed to measure the accuracy and utility of ordered diagnostic examinations in achieving the appropriate diagnosis. Here, we present a data-driven approach for performing automated quality assessment of radiologic interpretations using other clinical information (e.g., pathology) as a reference standard for individual radiologists, subspecialty sections, imaging modalities, and entire departments. Downstream diagnostic conclusions from the electronic medical record are utilized as "truth" to which upstream diagnoses generated by radiology are compared. The described system automatically extracts and compares patient medical data to characterize concordance between clinical sources. Initial results are presented in the context of breast imaging, matching 18 101 radiologic interpretations with 301 pathology diagnoses and achieving a precision and recall of 84% and 92%, respectively. The presented data-driven method highlights the challenges of integrating multiple data sources and the application of information extraction tools to facilitate healthcare quality improvement. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
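
    For reference, precision and recall in this setting reduce to simple counts of concordant and discordant radiology-pathology pairs; the counts in the sketch below are hypothetical stand-ins, not the study's data.

        # Hypothetical counts of radiology-pathology comparisons (not study data).
        true_positives = 840    # radiology and pathology agree on a positive finding
        false_positives = 160   # radiology positive, pathology negative
        false_negatives = 73    # radiology negative, pathology positive

        precision = true_positives / (true_positives + false_positives)
        recall = true_positives / (true_positives + false_negatives)
        print(f"precision={precision:.2f}, recall={recall:.2f}")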

  3. Using Data from Climate Science to Teach Introductory Statistics

    ERIC Educational Resources Information Center

    Witt, Gary

    2013-01-01

    This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…

  4. Feature-Based Statistical Analysis of Combustion Simulation Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, J; Krishnamoorthy, V; Liu, S

    2011-11-18

    We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence, fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Density Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for
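
    The per-feature statistical moments mentioned above can be accumulated in a single streaming pass; a minimal sketch using Welford's online algorithm is shown below. The feature assignment and the scalar field are random stand-ins, not the DNS pipeline or the paper's merge-tree data structures.

        import numpy as np

        class StreamingMoments:
            """Welford's online algorithm for the mean and variance of one feature."""
            def __init__(self):
                self.n, self.mean, self.m2 = 0, 0.0, 0.0

            def update(self, value):
                self.n += 1
                delta = value - self.mean
                self.mean += delta / self.n
                self.m2 += delta * (value - self.mean)

            @property
            def variance(self):
                return self.m2 / (self.n - 1) if self.n > 1 else 0.0

        # Stand-in: temperature samples streamed per flow feature (2 features here).
        rng = np.random.default_rng(7)
        accumulators = {0: StreamingMoments(), 1: StreamingMoments()}
        for feature_id, temperature in zip(rng.integers(0, 2, 10_000),
                                           rng.normal(1500.0, 40.0, 10_000)):
            accumulators[int(feature_id)].update(float(temperature))

        for fid, acc in accumulators.items():
            print(fid, acc.mean, acc.variance)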

  5. Application of multivariate statistical techniques in microbial ecology.

    PubMed

    Paliy, O; Shankar, V

    2016-03-01

    Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
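
    As one concrete example of the exploratory techniques such a review covers, a principal component ordination of a (simulated) sample-by-taxon abundance table takes only a few lines with scikit-learn. The simulated table, the standardisation step, and the choice of PCA over other ordinations are illustrative assumptions.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(8)
        # Simulated abundance table: 20 samples x 50 taxa, two community types.
        group_a = rng.poisson(lam=5.0, size=(10, 50))
        group_b = rng.poisson(lam=5.0, size=(10, 50)) + rng.poisson(3.0, (10, 50))
        abundances = np.vstack([group_a, group_b]).astype(float)

        # Standardise taxa and project the samples onto the first two components.
        scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(abundances))
        print(scores[:3])   # ordination coordinates used for plotting and interpretation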

  6. Statistics of Statisticians: Critical Mass of Statistics and Operational Research Groups

    NASA Astrophysics Data System (ADS)

    Kenna, Ralph; Berche, Bertrand

    Using a recently developed model, inspired by mean field theory in statistical physics, and data from the UK's Research Assessment Exercise, we analyse the relationship between the qualities of statistics and operational research groups and the quantities of researchers in them. Similar to other academic disciplines, we provide evidence for a linear dependency of quality on quantity up to an upper critical mass, which is interpreted as the average maximum number of colleagues with whom a researcher can communicate meaningfully within a research group. The model also predicts a lower critical mass, which research groups should strive to achieve to avoid extinction. For statistics and operational research, the lower critical mass is estimated to be 9 ± 3. The upper critical mass, beyond which research quality does not significantly depend on group size, is 17 ± 6.

  7. The Importance of Statistical Modeling in Data Analysis and Inference

    ERIC Educational Resources Information Center

    Rollins, Derrick, Sr.

    2017-01-01

    Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…

  8. The Statistical Interpretation of Entropy: An Activity

    ERIC Educational Resources Information Center

    Timmberlake, Todd

    2010-01-01

    The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the…
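
    The statistical reading of entropy described above can be made concrete with a short multiplicity calculation for a toy two-state system (N coins, n heads): S = k ln W, and W is overwhelmingly peaked at the even split. The system size is an arbitrary illustration, not part of the activity described in the record.

        from math import comb, log

        k_B = 1.380649e-23            # Boltzmann constant, J/K
        N = 100                       # number of two-state "particles" (illustrative)

        def entropy(n_up):
            """Boltzmann entropy S = k ln W for the macrostate with n_up particles up."""
            W = comb(N, n_up)         # number of microstates in the macrostate
            return k_B * log(W)

        for n_up in (0, 10, 25, 50):
            print(n_up, entropy(n_up))   # entropy is maximal at the even split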

  9. Using Pooled Data and Data Visualization to Introduce Statistical Concepts in the General Chemistry Laboratory

    ERIC Educational Resources Information Center

    Olsen, Robert J.

    2008-01-01

    I describe how data pooling and data visualization can be employed in the first-semester general chemistry laboratory to introduce core statistical concepts such as central tendency and dispersion of a data set. The pooled data are plotted as a 1-D scatterplot, a purpose-designed number line through which statistical features of the data are…
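
    A minimal sketch of the kind of pooled 1-D scatterplot described: class results are pooled into one array and plotted on a number line with the mean and one standard deviation marked. The values are simulated stand-ins for pooled student measurements, not data from the article.

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(9)
        pooled = rng.normal(1.015, 0.012, 60)      # e.g. pooled density measurements (g/mL)

        fig, ax = plt.subplots(figsize=(6, 1.5))
        ax.plot(pooled, np.zeros_like(pooled), "o", alpha=0.4)   # 1-D scatter (number line)
        mean, sd = pooled.mean(), pooled.std(ddof=1)
        ax.axvline(mean, color="k")                              # central tendency
        ax.axvspan(mean - sd, mean + sd, alpha=0.15)             # dispersion band
        ax.set_yticks([])
        ax.set_xlabel("measured value")
        plt.show()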

  10. Blinded interpretation of study results can feasibly and effectively diminish interpretation bias.

    PubMed

    Järvinen, Teppo L N; Sihvonen, Raine; Bhandari, Mohit; Sprague, Sheila; Malmivaara, Antti; Paavola, Mika; Schünemann, Holger J; Guyatt, Gordon H

    2014-07-01

    Controversial and misleading interpretation of data from randomized trials is common. How to avoid misleading interpretation has received little attention. Herein, we describe two applications of an approach that involves blinded interpretation of the results by study investigators. The approach involves developing two interpretations of the results on the basis of a blinded review of the primary outcome data (experimental treatment A compared with control treatment B). One interpretation assumes that A is the experimental intervention and another assumes that A is the control. After agreeing that there will be no further changes, the investigators record their decisions and sign the resulting document. The randomization code is then broken, the correct interpretation chosen, and the manuscript finalized. Review of the document by an external authority before finalization can provide another safeguard against interpretation bias. We found the blinded preparation of a summary of data interpretation described in this article practical, efficient, and useful. Blinded data interpretation may decrease the frequency of misleading data interpretation. Widespread adoption of blinded data interpretation would be greatly facilitated were it added to the minimum set of recommendations outlining proper conduct of randomized controlled trials (eg, the Consolidated Standards of Reporting Trials statement). Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  11. Animal movement: Statistical models for telemetry data

    USGS Publications Warehouse

    Hooten, Mevin B.; Johnson, Devin S.; McClintock, Brett T.; Morales, Juan M.

    2017-01-01

    The study of animal movement has always been a key element in ecological science, because it is inherently linked to critical processes that scale from individuals to populations and communities to ecosystems. Rapid improvements in biotelemetry data collection and processing technology have given rise to a variety of statistical methods for characterizing animal movement. The book serves as a comprehensive reference for the types of statistical models used to study individual-based animal movement. 
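
    The simplest statistical model in this family, a correlated random walk for discrete-time telemetry fixes, can be simulated in a few lines; the turning-angle concentration and step-length parameters below are arbitrary illustrations rather than anything taken from the book.

        import numpy as np

        rng = np.random.default_rng(10)
        n_steps, kappa, step_scale = 500, 4.0, 10.0   # illustrative parameters

        headings = np.cumsum(rng.vonmises(0.0, kappa, n_steps))   # persistent direction
        steps = rng.exponential(step_scale, n_steps)              # step lengths
        x = np.cumsum(steps * np.cos(headings))
        y = np.cumsum(steps * np.sin(headings))

        track = np.column_stack([x, y])   # simulated telemetry locations
        print(track[:5])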

  12. Statistical and population genetics issues of two Hungarian datasets from the aspect of DNA evidence interpretation.

    PubMed

    Szabolcsi, Zoltán; Farkas, Zsuzsa; Borbély, Andrea; Bárány, Gusztáv; Varga, Dániel; Heinrich, Attila; Völgyi, Antónia; Pamjav, Horolma

    2015-11-01

    When the DNA profile from a crime scene matches that of a suspect, the weight of the DNA evidence depends on the unbiased estimation of the match probability of the profiles. For this reason, it is necessary to establish and expand databases that reflect the actual allele frequencies in the population concerned. 21,473 complete DNA profiles from Databank samples were used to establish the allele frequency database to represent the population of Hungarian suspects. We used fifteen STR loci (PowerPlex ESI16), including five new ESS loci. The aim was to calculate the statistical, forensic efficiency parameters for the Databank samples and compare the newly detected data to the earlier report. The population substructure caused by relatedness may influence the frequency of profiles estimated. As our Databank profiles were considered non-random samples, possible relationships between the suspects can be assumed. Therefore, the population inbreeding effect was estimated using the FIS calculation. The overall inbreeding parameter was found to be 0.0106. Furthermore, we tested the impact of the two allele frequency datasets on 101 randomly chosen STR profiles, including full and partial profiles. The 95% confidence interval estimates for the profile frequencies (pM) resulted in a tighter range when we used the new dataset compared to the previously published ones. We found that the FIS had less effect on frequency values in the 21,473 samples than the application of minimum allele frequency. No genetic substructure was detected by STRUCTURE analysis. Due to the low level of inbreeding effect and the high number of samples, the new dataset provides unbiased and precise estimates of LR for statistical interpretation of forensic casework and allows us to use lower allele frequencies. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
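
    The role of the inbreeding parameter in such frequency estimates can be sketched with the standard single-locus genotype-frequency corrections (homozygote p² + p(1−p)F, heterozygote 2pq(1−F)) multiplied across loci. This is the basic Wright-style correction, not necessarily the exact formulae applied in the paper, and the allele frequencies and profile below are illustrative, not Hungarian database values.

        # Single-locus genotype frequencies with inbreeding coefficient F;
        # illustrative allele frequencies, not values from the Hungarian database.
        F = 0.0106                      # overall inbreeding parameter reported above

        def genotype_frequency(p, q=None, F=F):
            """Homozygote frequency if q is None, otherwise heterozygote frequency."""
            if q is None:
                return p ** 2 + p * (1 - p) * F
            return 2 * p * q * (1 - F)

        # A hypothetical 3-locus profile: two heterozygous loci, one homozygous.
        profile = [(0.12, 0.08), (0.21, 0.05), (0.15, None)]
        match_probability = 1.0
        for p, q in profile:
            match_probability *= genotype_frequency(p, q)

        print(match_probability)        # the smaller this is, the stronger the evidence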

  13. Compositionality and Statistics in Adjective Acquisition: 4-Year-Olds Interpret "Tall" and "Short" Based on the Size Distributions of Novel Noun Referents

    ERIC Educational Resources Information Center

    Barner, David; Snedeker, Jesse

    2008-01-01

    Four experiments investigated 4-year-olds' understanding of adjective-noun compositionality and their sensitivity to statistics when interpreting scalar adjectives. In Experiments 1 and 2, children selected "tall" and "short" items from 9 novel objects called "pimwits" (1-9 in. in height) or from this array plus 4 taller or shorter distractor…

  14. Compression technique for large statistical data bases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eggers, S.J.; Olken, F.; Shoshani, A.

    1981-03-01

    The compression of large statistical databases is explored, and techniques are proposed for organizing the compressed data such that the time required to access the data is logarithmic. The techniques exploit special characteristics of statistical databases, namely, variation in the space required for the natural encoding of integer attributes, a prevalence of a few repeating values or constants, and the clustering of both data of the same length and constants in long, separate series. The techniques are variations of run-length encoding, in which modified run-lengths for the series are extracted from the data stream and stored in a header, which is used to form the base level of a B-tree index into the database. The run-lengths are cumulative, and therefore the access time of the data is logarithmic in the size of the header. The details of the compression scheme and its implementation are discussed, several special cases are presented, and an analysis is given of the relative performance of the various versions.
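
    A minimal sketch of the access scheme described (cumulative run-lengths kept in a header and searched in logarithmic time to locate a logical record) is given below. The header is searched with bisection rather than a full B-tree, and the layout is a simplification of the paper's scheme for illustration only.

        from bisect import bisect_right
        from itertools import groupby

        def compress(column):
            """Run-length encode a column; return run values and cumulative run-lengths."""
            values, cumulative, total = [], [], 0
            for value, run in groupby(column):
                total += sum(1 for _ in run)
                values.append(value)
                cumulative.append(total)      # header of cumulative run-lengths
            return values, cumulative

        def lookup(values, cumulative, i):
            """Fetch logical record i in O(log runs) via bisection on the header."""
            return values[bisect_right(cumulative, i)]

        column = [0, 0, 0, 7, 7, 7, 7, 7, 3, 3, 0, 0, 0, 0]   # many repeats and constants
        values, cumulative = compress(column)
        assert all(lookup(values, cumulative, i) == column[i] for i in range(len(column)))
        print(values, cumulative)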

  15. Feature combination networks for the interpretation of statistical machine learning models: application to Ames mutagenicity.

    PubMed

    Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D

    2014-03-25

    A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to the learning algorithm and open to all structure-based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation, in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly from the model's behaviour for the specific query. Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced good performance in both internal and external validation, with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The interpretation revealed links that correspond closely with understood mechanisms for Ames mutagenicity. This methodology allows for greater utilisation of the predictions made by black box models and can expedite further study based on the output of a (quantitative) structure-activity model. Additionally, the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.

  16. Statistical significance versus clinical relevance.

    PubMed

    van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G

    2017-04-01

    In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the time upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
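
    The point about confidence intervals conveying magnitude and precision can be illustrated with a two-sample comparison in which the P-value alone hides how small the effect is; the simulated groups, sample sizes, and the normal-approximation 95% interval below are illustrative assumptions, not the studies discussed in the article.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(11)
        treated = rng.normal(0.30, 1.0, 2000)     # tiny true effect, large sample
        control = rng.normal(0.00, 1.0, 2000)

        t_stat, p_value = stats.ttest_ind(treated, control)

        diff = treated.mean() - control.mean()
        se = np.sqrt(treated.var(ddof=1) / treated.size + control.var(ddof=1) / control.size)
        ci = (diff - 1.96 * se, diff + 1.96 * se)   # approximate 95% CI for the difference

        # A very small P-value, yet the CI shows the effect size and its precision,
        # which is what clinical relevance has to be judged against.
        print(p_value, diff, ci)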

  17. COMPOSE: Using temporal patterns for interpreting wearable sensor data with computer interpretable guidelines.

    PubMed

    Urovi, V; Jimenez-Del-Toro, O; Dubosson, F; Ruiz Torres, A; Schumacher, M I

    2017-02-01

    This paper describes a novel temporal logic-based framework for reasoning with continuous data collected from wearable sensors. The work is motivated by the Metabolic Syndrome, a cluster of conditions which are linked to obesity and unhealthy lifestyle. We assume that, by interpreting the physiological parameters of continuous monitoring, we can identify which patients have a higher risk of Metabolic Syndrome. We define temporal patterns for reasoning with continuous data and specify the coordination mechanisms for combining different sets of clinical guidelines that relate to this condition. The proposed solution is tested with data provided by twenty subjects, which used sensors for four days of continuous monitoring. The results are compared to the gold standard. The novelty of the framework stands in extending a temporal logic formalism, namely the Event Calculus, with temporal patterns. These patterns are helpful to specify the rules for reasoning with continuous data and in combining new knowledge into one consistent outcome that is tailored to the patient's profile. The overall approach opens new possibilities for delivering patient-tailored interventions and educational material before the patients present the symptoms of the disease. Copyright © 2016 Elsevier Ltd. All rights reserved.
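
    A generic illustration of the kind of temporal pattern involved (not the Event Calculus formalisation used in the paper): flag an episode when a continuously monitored parameter stays above a threshold for a sustained window. The threshold, window length, and simulated glucose stream are assumptions for illustration only.

        import numpy as np

        def sustained_above(values, threshold, min_length):
            """Return (start, end) index pairs where values exceed threshold
            for at least min_length consecutive samples."""
            episodes, start = [], None
            for i, v in enumerate(values):
                if v > threshold and start is None:
                    start = i
                elif v <= threshold and start is not None:
                    if i - start >= min_length:
                        episodes.append((start, i))
                    start = None
            if start is not None and len(values) - start >= min_length:
                episodes.append((start, len(values)))
            return episodes

        rng = np.random.default_rng(12)
        glucose = rng.normal(5.5, 0.4, 288)          # one day of 5-minute readings (mmol/L)
        glucose[150:190] += 4.0                      # inject a sustained elevation
        print(sustained_above(glucose, threshold=7.8, min_length=24))  # >= 2 hours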

  18. Statistical Machine Learning for Structured and High Dimensional Data

    DTIC Science & Technology

    2014-09-17

    AFRL-OSR-VA-TR-2014-0234. Statistical Machine Learning for Structured and High Dimensional Data. Final report (December 2009 - August 2014), Carnegie Mellon University; principal investigator Larry Wasserman, responsible person John Lafferty. The report covers statistical machine learning for structured and high-dimensional data, including research in the area of resource-constrained statistical estimation. Subject terms: machine learning, high-dimensional statistics.

  19. Using Facebook Data to Turn Introductory Statistics Students into Consultants

    ERIC Educational Resources Information Center

    Childers, Adam F.

    2017-01-01

    Facebook provides businesses and organizations with copious data that describe how users are interacting with their page. This data affords an excellent opportunity to turn introductory statistics students into consultants to analyze the Facebook data using descriptive and inferential statistics. This paper details a semester-long project that…

  20. Quantitative interpretation of Great Lakes remote sensing data

    NASA Technical Reports Server (NTRS)

    Shook, D. F.; Salzman, J.; Svehla, R. A.; Gedney, R. T.

    1980-01-01

    The paper discusses the quantitative interpretation of Great Lakes remote sensing water quality data. Remote sensing using color information must take into account (1) the existence of many different organic and inorganic species throughout the Great Lakes, (2) the occurrence of a mixture of species in most locations, and (3) spatial variations in types and concentration of species. The radiative transfer model provides a potential method for an orderly analysis of remote sensing data and a physical basis for developing quantitative algorithms. Predictions and field measurements of volume reflectances are presented which show the advantage of using a radiative transfer model. Spectral absorptance and backscattering coefficients for two inorganic sediments are reported.
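
    A first-order relationship often used in this kind of interpretation (a simplification of full radiative transfer, not necessarily the exact model of the paper) links volume reflectance to the backscattering and absorption coefficients as R ≈ f·b_b/(a + b_b); the proportionality factor and coefficient values below are illustrative assumptions.

        # First-order volume reflectance approximation: R ~ f * b_b / (a + b_b).
        # Coefficient values are illustrative, not the paper's measured spectra.
        f = 0.33            # empirical proportionality factor (commonly around 0.3)

        def volume_reflectance(absorption, backscatter, f=f):
            return f * backscatter / (absorption + backscatter)

        # Two hypothetical cases: clear water versus sediment-laden water.
        print(volume_reflectance(absorption=0.30, backscatter=0.01))   # low reflectance
        print(volume_reflectance(absorption=0.35, backscatter=0.08))   # brighter, turbid case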

  1. Building software tools to help contextualize and interpret monitoring data

    USDA-ARS?s Scientific Manuscript database

    Even modest monitoring efforts at landscape scales produce large volumes of data. These are most useful if they can be interpreted relative to land potential or other similar sites. However, for many ecological systems reference conditions may not be defined or are poorly described, which hinders und...

  2. Amplitude interpretation and visualization of three-dimensional reflection data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Enachescu, M.E.

    1994-07-01

    Digital recording and processing of modern three-dimensional surveys allow for relatively good preservation and correct spatial positioning of seismic reflection amplitude. A four-dimensional seismic reflection field matrix R (x,y,t,A), which can be computer visualized (i.e., real-time interactively rendered, edited, and animated), is now available to the interpreter. The amplitude contains encoded geological information indirectly related to lithologies and reservoir properties. The magnitude of the amplitude depends not only on the acoustic impedance contrast across a boundary, but is also strongly affected by the shape of the reflective boundary. This allows the interpreter to image subtle tectonic and structural elements not obvious on time-structure maps. The use of modern workstations allows for appropriate color coding of the total available amplitude range, routine on-screen time/amplitude extraction, and late display of horizon amplitude maps (horizon slices) or complex amplitude-structure spatial visualization. Stratigraphic, structural, tectonic, fluid distribution, and paleogeographic information are commonly obtained by displaying the amplitude variation A = A(x,y,t) associated with a particular reflective surface or seismic interval. As illustrated with several case histories, traditional structural and stratigraphic interpretation combined with a detailed amplitude study generally greatly enhances extraction of subsurface geological information from a reflection data volume. In the context of three-dimensional seismic surveys, the horizon amplitude map (horizon slice), amplitude attachment to structure, and "bright clouds" displays are very powerful tools available to the interpreter.

  3. Issues affecting the interpretation of eastern hardwood resource statistics

    Treesearch

    William G. Luppold; William H. McWilliams

    2000-01-01

    Forest inventory statistics developed by the USDA Forest Service are used by customers ranging from forest industry to state and local economic development groups. In recent years, these statistics have been used increasingly to justify greater utilization of the eastern hardwood resource or to evaluate the sustainability of expanding demand for hardwood roundwood and...

  4. A Framework for Assessing High School Students' Statistical Reasoning.

    PubMed

    Chan, Shiau Wei; Ismail, Zaleha; Sumintono, Bambang

    2016-01-01

    Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students' statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students' statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework's cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments.

  5. Statistical Analysis of Time-Series from Monitoring of Active Volcanic Vents

    NASA Astrophysics Data System (ADS)

    Lachowycz, S.; Cosma, I.; Pyle, D. M.; Mather, T. A.; Rodgers, M.; Varley, N. R.

    2016-12-01

    Despite recent advances in the collection and analysis of time-series from volcano monitoring, and the resulting insights into volcanic processes, challenges remain in forecasting and interpreting activity from near real-time analysis of monitoring data. Statistical methods have potential to characterise the underlying structure and facilitate intercomparison of these time-series, and so inform interpretation of volcanic activity. We explore the utility of multiple statistical techniques that could be widely applicable to monitoring data, including Shannon entropy and detrended fluctuation analysis, by their application to various data streams from volcanic vents during periods of temporally variable activity. Each technique reveals changes through time in the structure of some of the data that were not apparent from conventional analysis. For example, we calculate the Shannon entropy (a measure of the randomness of a signal) of time-series from the recent dome-forming eruptions of Volcán de Colima (Mexico) and Soufrière Hills (Montserrat). The entropy of real-time seismic measurements and the count rate of certain volcano-seismic event types from both volcanoes is found to be temporally variable, with these data generally having higher entropy during periods of lava effusion and/or larger explosions. In some instances, the entropy shifts prior to or coincident with changes in seismic or eruptive activity, some of which were not clearly recognised by real-time monitoring. Comparison with other statistics demonstrates the sensitivity of the entropy to the data distribution, but that it is distinct from conventional statistical measures such as coefficient of variation. We conclude that each analysis technique examined could provide valuable insights for interpretation of diverse monitoring time-series.
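
    A minimal sketch of a windowed Shannon entropy calculation on a monitoring stream is shown below; the binning, window length, and synthetic amplitude series are illustrative choices, not those used in the study.

        import numpy as np

        def shannon_entropy(window, bins=16):
            """Shannon entropy (bits) of the histogram of values in a window."""
            counts, _ = np.histogram(window, bins=bins)
            p = counts / counts.sum()
            p = p[p > 0]
            return -np.sum(p * np.log2(p))

        rng = np.random.default_rng(13)
        # Synthetic amplitude series: Gaussian background, then a flatter,
        # more disordered episode standing in for a change in activity.
        quiet = rng.normal(0.0, 1.0, 5000)
        active = rng.uniform(-3.0, 3.0, 5000)
        series = np.concatenate([quiet, active])

        window = 500
        entropies = [shannon_entropy(series[i:i + window])
                     for i in range(0, series.size - window, window)]
        print(np.round(entropies, 2))    # entropy is higher in the second, disordered half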

  6. Statistical Techniques to Analyze Pesticide Data Program Food Residue Observations.

    PubMed

    Szarka, Arpad Z; Hayworth, Carol G; Ramanarayanan, Tharacad S; Joseph, Robert S I

    2018-06-26

    The U.S. EPA conducts dietary-risk assessments to ensure that levels of pesticides on food in the U.S. food supply are safe. Often these assessments utilize conservative residue estimates, maximum residue levels (MRLs), and a high-end estimate derived from registrant-generated field-trial data sets. A more realistic estimate of consumers' pesticide exposure from food may be obtained by utilizing residues from food-monitoring programs, such as the Pesticide Data Program (PDP) of the U.S. Department of Agriculture. A substantial portion of food-residue concentrations in PDP monitoring programs are below the limits of detection (left-censored), which makes the comparison of regulatory-field-trial and PDP residue levels difficult. In this paper, we present a novel adaption of established statistical techniques, the Kaplan-Meier estimator (K-M), the robust regression on ordered statistic (ROS), and the maximum-likelihood estimator (MLE), to quantify the pesticide-residue concentrations in the presence of heavily censored data sets. The examined statistical approaches include the most commonly used parametric and nonparametric methods for handling left-censored data that have been used in the fields of medical and environmental sciences. This work presents a case study in which data of thiamethoxam residue on bell pepper generated from registrant field trials were compared with PDP-monitoring residue values. The results from the statistical techniques were evaluated and compared with commonly used simple substitution methods for the determination of summary statistics. It was found that the maximum-likelihood estimator (MLE) is the most appropriate statistical method to analyze this residue data set. Using the MLE technique, the data analyses showed that the median and mean PDP bell pepper residue levels were approximately 19 and 7 times lower, respectively, than the corresponding statistics of the field-trial residues.
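
    A compact sketch of the censored-data MLE idea (detected values contribute density terms, non-detects contribute a CDF term at the detection limit), here with a lognormal model fitted via scipy.optimize. The simulated residues, detection limit, and lognormal choice are illustrative assumptions, not PDP data or the exact model of the paper.

        import numpy as np
        from scipy import optimize, stats

        rng = np.random.default_rng(14)
        true_residues = rng.lognormal(mean=-3.0, sigma=1.0, size=500)   # simulated residues (ppm)
        limit_of_detection = 0.02
        detected = true_residues[true_residues >= limit_of_detection]
        n_censored = np.sum(true_residues < limit_of_detection)

        def neg_log_likelihood(params):
            mu, log_sigma = params
            sigma = np.exp(log_sigma)
            # Detected values contribute log-densities; non-detects contribute the
            # log-probability of falling below the detection limit.
            ll_detected = stats.norm.logpdf(np.log(detected), mu, sigma) - np.log(detected)
            ll_censored = n_censored * stats.norm.logcdf(np.log(limit_of_detection), mu, sigma)
            return -(ll_detected.sum() + ll_censored)

        x0 = [np.mean(np.log(detected)), 0.0]
        result = optimize.minimize(neg_log_likelihood, x0=x0, method="Nelder-Mead")
        mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
        print(mu_hat, sigma_hat)          # log-scale parameter estimates
        print(np.exp(mu_hat))             # median residue implied by the fit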

  7. Perception in statistical graphics

    NASA Astrophysics Data System (ADS)

    VanderPlas, Susan Ruth

    There has been quite a bit of research on statistical graphics and visualization, generally focused on new types of graphics, new software to create graphics, interactivity, and usability studies. Our ability to interpret and use statistical graphics hinges on the interface between the graph itself and the brain that perceives and interprets it, yet there is substantially less research on the interplay between graph, eye, brain, and mind, and the existing work is not sufficient to understand the nature of these relationships. The goal of the work presented here is to further explore the interplay between a static graph, the translation of that graph from paper to mental representation (the journey from eye to brain), and the mental processes that operate on that graph once it is transferred into memory (mind). Understanding the perception of statistical graphics should allow researchers to create more effective graphs which produce fewer distortions and viewer errors while reducing the cognitive load necessary to understand the information presented in the graph. Taken together, these experiments should lay a foundation for exploring the perception of statistical graphics. There has been considerable research into the accuracy of numerical judgments viewers make from graphs, and these studies are useful, but it is more effective to understand how errors in these judgments occur so that the root cause of the error can be addressed directly. Understanding how visual reasoning relates to the ability to make judgments from graphs allows us to tailor graphics to particular target audiences. In addition, understanding the hierarchy of salient features in statistical graphics allows us to clearly communicate the important message from data or statistical models by constructing graphics which are designed specifically for the perceptual system.

  8. What defines an Expert? - Uncertainty in the interpretation of seismic data

    NASA Astrophysics Data System (ADS)

    Bond, C. E.

    2008-12-01

    Studies focusing on the elicitation of information from experts are concentrated primarily in economics and world markets, medical practice and expert witness testimonies. Expert elicitation theory has been applied in the natural sciences, most notably in the prediction of fluid flow in hydrological studies. In the geological sciences expert elicitation has been limited to theoretical analysis with studies focusing on the elicitation element, gaining expert opinion rather than necessarily understanding the basis behind the expert view. In these cases experts are defined in a traditional sense, based, for example, on standing in the field, number of years of experience, number of peer-reviewed publications, or the expert's position in a company hierarchy or academia. Here traditional indicators of expertise have been compared for their significance on effective seismic interpretation. Polytomous regression analysis has been used to assess the relative significance of length and type of experience on the outcome of a seismic interpretation exercise. Following the initial analysis the techniques used by participants to interpret the seismic image were added as additional variables to the analysis. Specific technical skills and techniques were found to be more important for the effective geological interpretation of seismic data than the traditional indicators of expertise. The results of a seismic interpretation exercise, the techniques used to interpret the seismic and the participant's prior experience have been combined and analysed to answer the question - who is and what defines an expert?

  9. A generalized K statistic for estimating phylogenetic signal from shape and other high-dimensional multivariate data.

    PubMed

    Adams, Dean C

    2014-09-01

    Phylogenetic signal is the tendency for closely related species to display similar trait values due to their common ancestry. Several methods have been developed for quantifying phylogenetic signal in univariate traits and for sets of traits treated simultaneously, and the statistical properties of these approaches have been extensively studied. However, methods for assessing phylogenetic signal in high-dimensional multivariate traits like shape are less well developed, and their statistical performance is not well characterized. In this article, I describe a generalization of the K statistic of Blomberg et al. that is useful for quantifying and evaluating phylogenetic signal in highly dimensional multivariate data. The method (K(mult)) is found from the equivalency between statistical methods based on covariance matrices and those based on distance matrices. Using computer simulations based on Brownian motion, I demonstrate that the expected value of K(mult) remains at 1.0 as trait variation among species is increased or decreased, and as the number of trait dimensions is increased. By contrast, estimates of phylogenetic signal found with a squared-change parsimony procedure for multivariate data change with increasing trait variation among species and with increasing numbers of trait dimensions, confounding biological interpretations. I also evaluate the statistical performance of hypothesis testing procedures based on K(mult) and find that the method displays appropriate Type I error and high statistical power for detecting phylogenetic signal in high-dimensional data. Statistical properties of K(mult) were consistent for simulations using bifurcating and random phylogenies, for simulations using different numbers of species, for simulations that varied the number of trait dimensions, and for different underlying models of trait covariance structure. Overall these findings demonstrate that K(mult) provides a useful means of evaluating phylogenetic signal in high

  10. Atomic-scale phase composition through multivariate statistical analysis of atom probe tomography data.

    PubMed

    Keenan, Michael R; Smentkowski, Vincent S; Ulfig, Robert M; Oltman, Edward; Larson, David J; Kelly, Thomas F

    2011-06-01

    We demonstrate for the first time that multivariate statistical analysis techniques can be applied to atom probe tomography data to estimate the chemical composition of a sample at the full spatial resolution of the atom probe in three dimensions. Whereas the raw atom probe data provide the specific identity of an atom at a precise location, the multivariate results can be interpreted in terms of the probabilities that an atom representing a particular chemical phase is situated there. When aggregated to the size scale of a single atom (∼0.2 nm), atom probe spectral-image datasets are huge and extremely sparse. In fact, the average spectrum will have somewhat less than one total count per spectrum due to imperfect detection efficiency. These conditions, under which the variance in the data is completely dominated by counting noise, test the limits of multivariate analysis, and an extensive discussion of how to extract the chemical information is presented. Efficient numerical approaches to performing principal component analysis (PCA) on these datasets, which may number hundreds of millions of individual spectra, are put forward, and it is shown that PCA can be computed in a few seconds on a typical laptop computer.
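
    As a rough sketch of how a principal-component decomposition can be run on such sparse spectral-image data, the example below builds a random sparse count matrix (voxels by mass-to-charge bins) and computes a truncated SVD. It is illustrative only: the matrix is synthetic, no Poisson-aware scaling or mean-centering is applied, and the dimensions are far smaller than the datasets described in the paper.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds

# Hypothetical sparse spectral image: rows = voxels, columns = mass-to-charge bins,
# entries = ion counts (mostly zero at near-atomic voxel sizes).
rng = np.random.default_rng(0)
n_voxels, n_bins = 100_000, 500
data = sparse.random(n_voxels, n_bins, density=0.002, random_state=0,
                     data_rvs=lambda k: (rng.poisson(1.0, k) + 1).astype(float)).tocsr()

# Truncated SVD of the (uncentered) count matrix; a handful of components
# separates dominant "phase-like" spectral signatures from counting noise.
k = 5
U, s, Vt = svds(data, k=k)
order = np.argsort(s)[::-1]                 # sort singular values descending
scores, loadings = U[:, order] * s[order], Vt[order]
print(scores.shape, loadings.shape)         # (n_voxels, k), (k, n_bins)
```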

  11. Statistical procedures for analyzing mental health services data.

    PubMed

    Elhai, Jon D; Calhoun, Patrick S; Ford, Julian D

    2008-08-15

    In mental health services research, analyzing service utilization data often poses serious problems, given the presence of substantially skewed data distributions. This article presents a non-technical introduction to statistical methods specifically designed to handle the complexly distributed datasets that represent mental health service use, including Poisson, negative binomial, zero-inflated, and zero-truncated regression models. A flowchart is provided to assist the investigator in selecting the most appropriate method. Finally, a dataset of mental health service use reported by medical patients is described, and a comparison of results across several different statistical methods is presented. Implications of matching data analytic techniques appropriately with the often complexly distributed datasets of mental health services utilization variables are discussed.
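
    To make the model families named above concrete, here is a small sketch using statsmodels on synthetic, hypothetical service-use counts with excess zeros (zero-truncated models are omitted for brevity). Comparing AIC across fits is one simple way to see which distribution best matches a skewed, zero-heavy outcome.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

# Hypothetical service-use data: visit counts with many zeros, plus two predictors.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"age": rng.normal(40, 12, n),
                   "ptsd": rng.binomial(1, 0.3, n)})
lam = np.exp(-1.0 + 0.02 * df["age"] + 0.8 * df["ptsd"])
df["visits"] = rng.poisson(lam) * rng.binomial(1, 0.7, n)   # inject extra zeros

X = sm.add_constant(df[["age", "ptsd"]])
y = df["visits"]

poisson_fit = sm.Poisson(y, X).fit(disp=0)
negbin_fit = sm.NegativeBinomial(y, X).fit(disp=0)
zip_fit = ZeroInflatedPoisson(y, X).fit(disp=0, maxiter=200)

# Lower AIC suggests a distribution that better matches the observed counts.
for name, fit in [("Poisson", poisson_fit), ("NegBin", negbin_fit), ("ZIP", zip_fit)]:
    print(f"{name:8s} AIC = {fit.aic:.1f}")
```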

  12. Statistical Analysis of Research Data | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview on the general

  13. Security of statistical data bases: invasion of privacy through attribute correlational modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palley, M.A.

    This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.

  14. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    PubMed

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g. a few hundreds or thousands). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  15. Interpretation methodology and analysis of in-flight lightning data

    NASA Technical Reports Server (NTRS)

    Rudolph, T.; Perala, R. A.

    1982-01-01

    A methodology is presented whereby electromagnetic measurements of inflight lightning stroke data can be understood and extended to other aircraft. Recent measurements made on the NASA F106B aircraft indicate that sophisticated numerical techniques and new developments in corona modeling are required to fully understand the data. Thus the problem is nontrivial and successful interpretation can lead to a significant understanding of the lightning/aircraft interaction event. This is of particular importance because of the problem of lightning induced transient upset of new technology low level microcircuitry which is being used in increasing quantities in modern and future avionics. Inflight lightning data is analyzed and lightning environments incident upon the F106B are determined.

  16. Kinetic Analysis of Dynamic Positron Emission Tomography Data using Open-Source Image Processing and Statistical Inference Tools.

    PubMed

    Hawe, David; Hernández Fernández, Francisco R; O'Suilleabháin, Liam; Huang, Jian; Wolsztynski, Eric; O'Sullivan, Finbarr

    2012-05-01

    In dynamic mode, positron emission tomography (PET) can be used to track the evolution of injected radio-labelled molecules in living tissue. This is a powerful diagnostic imaging technique that provides a unique opportunity to probe the status of healthy and pathological tissue by examining how it processes substrates. The spatial aspect of PET is well established in the computational statistics literature. This article focuses on its temporal aspect. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue. In statistical terms, the residue function is essentially a survival function - a familiar life-time data construct. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as flow, flux, volume of distribution and transit time summaries. This review emphasises a nonparametric approach to the estimation of the residue based on a piecewise linear form. Rapid implementation of this by quadratic programming is described. The approach provides a reference for statistical assessment of widely used one- and two-compartmental model forms. We illustrate the method with data from two of the most well-established PET radiotracers, (15)O-H(2)O and (18)F-fluorodeoxyglucose, used for assessment of blood perfusion and glucose metabolism respectively. The presentation illustrates the use of two open-source tools, AMIDE and R, for PET scan manipulation and model inference.
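
    The convolution relationship described above can be sketched numerically. The toy example below (not the authors' quadratic-programming implementation) simulates a tissue curve as the convolution of an assumed arterial input with a one-compartment residue, then recovers a nonparametric, non-negative residue by least squares; a full treatment would also impose the monotone, survival-like shape of the residue via quadratic programming, as the review describes. All parameter values are hypothetical.

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.optimize import nnls

# Time grid and a hypothetical arterial input function Ca(t).
dt = 2.0                                    # seconds per frame (assumed)
t = np.arange(0, 300, dt)
Ca = (t / 20.0) * np.exp(-t / 20.0)         # gamma-variate-like bolus shape

# Simulate a tissue curve as the convolution of Ca with a one-compartment
# residue R(t) = K1 * exp(-k2 t), plus measurement noise.
K1, k2 = 0.3, 0.05
R_true = K1 * np.exp(-k2 * t)
A = toeplitz(Ca, np.zeros_like(Ca)) * dt    # discrete convolution operator
Ct = A @ R_true + 0.002 * np.random.randn(t.size)

# Nonparametric recovery of the residue: solve Ct ~= A r subject to r >= 0.
R_hat, _ = nnls(A, Ct)
print("estimated flow-like parameter, R(0):", R_hat[0])
```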

  17. Teaching for Statistical Literacy: Utilising Affordances in Real-World Data

    ERIC Educational Resources Information Center

    Chick, Helen L.; Pierce, Robyn

    2012-01-01

    It is widely held that context is important in teaching mathematics and statistics. Consideration of context is central to statistical thinking, and any teaching of statistics must incorporate this aspect. Indeed, it has been advocated that real-world data sets can motivate the learning of statistical principles. It is not, however, a…

  18. Quantitative Seismic Interpretation: Applying Rock Physics Tools to Reduce Interpretation Risk

    NASA Astrophysics Data System (ADS)

    Sondergeld, Carl H.

    This book is divided into seven chapters that cover rock physics, statistical rock physics, seismic inversion techniques, case studies, and work flows. On balance, the emphasis is on rock physics. Included are 56 color figures that greatly help in the interpretation of more complicated plots and displays.The domain of rock physics falls between petrophysics and seismics. It is the basis for interpreting seismic observations and therefore is pivotal to the understanding of this book. The first two chapters are dedicated to this topic (109 pages).

  19. Research as Praxis: Perspectives on Interpreting Data from a Science and Indigenous Knowledge Systems Project

    ERIC Educational Resources Information Center

    Nhalevilo, Emilia Afonso; Ogunniyi, Meshach

    2014-01-01

    This article presents a reflection on an aspect of research methodology, particularly on the interpretation strategy of data from a Science and Indigenous Knowledge Systems Project (SIKSP) in a South African university. The data interpretation problem arose while we were analysing the effects of a series of SIKSP-based workshops on the views of a…

  20. 49 CFR Schedule G to Subpart B of... - Selected Statistical Data

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 49 Transportation 8 2010-10-01 false. Selected Statistical Data, Schedule G to Subpart B ... Selected Statistical Data [Dollars in thousands]: ( ) Greyhound Lines, Inc.; ( ) Trailways combined; ( ) All study carriers ... Sch. 9002, L. 9, col. (b). Other Statistics: 25. Number of regular-route intercity passenger miles, Sch. 9002 ...

  1. Statistical Approaches to Assess Biosimilarity from Analytical Data.

    PubMed

    Burdick, Richard; Coffey, Todd; Gutka, Hiten; Gratzl, Gyöngyi; Conlon, Hugh D; Huang, Chi-Ting; Boyne, Michael; Kuehne, Henriette

    2017-01-01

    Protein therapeutics have unique critical quality attributes (CQAs) that define their purity, potency, and safety. The analytical methods used to assess CQAs must be able to distinguish clinically meaningful differences in comparator products, and the most important CQAs should be evaluated with the most statistical rigor. High-risk CQA measurements assess the most important attributes that directly impact the clinical mechanism of action or have known implications for safety, while the moderate- to low-risk characteristics may have a lower direct impact and thereby may have a broader range to establish similarity. Statistical equivalence testing is applied for high-risk CQA measurements to establish the degree of similarity (e.g., highly similar fingerprint, highly similar, or similar) of selected attributes. Notably, some high-risk CQAs (e.g., primary sequence or disulfide bonding) are qualitative (e.g., the same as the originator or not the same) and therefore not amenable to equivalence testing. For biosimilars, an important step is the acquisition of a sufficient number of unique originator drug product lots to measure the variability in the originator drug manufacturing process and provide sufficient statistical power for the analytical data comparisons. Together, these analytical evaluations, along with PK/PD and safety data (immunogenicity), provide the data necessary to determine if the totality of the evidence warrants a designation of biosimilarity and subsequent licensure for marketing in the USA. In this paper, a case study approach is used to provide examples of analytical similarity exercises and the appropriateness of statistical approaches for the example data.
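
    As a hedged illustration of what statistical equivalence testing for a quantitative CQA can look like, the sketch below implements a basic two one-sided t-tests (TOST) procedure on hypothetical lot-release potency values. The margin of 1.5 reference standard deviations is an assumption used here for illustration, not a statement of the criterion applied in the paper's case studies.

```python
import numpy as np
from scipy import stats

def tost_equivalence(reference, test, margin, alpha=0.05):
    """Two one-sided t-tests (TOST): conclude equivalence of means if both
    one-sided null hypotheses (difference beyond +/- margin) are rejected."""
    n1, n2 = len(reference), len(test)
    diff = np.mean(test) - np.mean(reference)
    sp2 = ((n1 - 1) * np.var(reference, ddof=1) + (n2 - 1) * np.var(test, ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    t_lower = (diff + margin) / se              # H0: diff <= -margin
    t_upper = (diff - margin) / se              # H0: diff >= +margin
    p_lower = 1 - stats.t.cdf(t_lower, df)
    p_upper = stats.t.cdf(t_upper, df)
    return max(p_lower, p_upper) < alpha

# Hypothetical relative-potency measurements (%) for originator and biosimilar lots.
originator = np.array([98.1, 101.3, 99.5, 100.2, 97.8, 102.0, 99.9, 100.7])
biosimilar = np.array([99.0, 100.8, 98.7, 101.1, 99.6, 100.2])
margin = 1.5 * np.std(originator, ddof=1)       # margin tied to originator variability (assumed)
print("equivalent within margin:", tost_equivalence(originator, biosimilar, margin))
```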

  2. Statistical Significance Testing from Three Perspectives and Interpreting Statistical Significance and Nonsignificance and the Role of Statistics in Research.

    ERIC Educational Resources Information Center

    Levin, Joel R.; And Others

    1993-01-01

    Journal editors respond to criticisms of reliance on statistical significance in research reporting. Joel R. Levin ("Journal of Educational Psychology") defends its use, whereas William D. Schafer ("Measurement and Evaluation in Counseling and Development") emphasizes the distinction between statistically significant and important. William Asher…

  3. Improved interpretation of satellite altimeter data using genetic algorithms

    NASA Technical Reports Server (NTRS)

    Messa, Kenneth; Lybanon, Matthew

    1992-01-01

    Genetic algorithms (GAs) are optimization techniques that are based on the mechanics of evolution and natural selection. They take advantage of the power of cumulative selection, in which successive incremental improvements in a solution structure become the basis for continued development. A GA is an iterative procedure that maintains a 'population' of 'organisms' (candidate solutions). Through successive 'generations' (iterations) the population as a whole improves in simulation of Darwin's 'survival of the fittest'. GAs have been shown to be successful where noise significantly reduces the ability of other search techniques to work effectively. Satellite altimetry provides useful information about oceanographic phenomena. It provides rapid global coverage of the oceans and is not as severely hampered by cloud cover as infrared imagery. Despite these and other benefits, several factors lead to significant difficulty in interpretation. The GA approach to the improved interpretation of satellite data involves the representation of the ocean surface model as a string of parameters or coefficients from the model. The GA searches, in parallel, a population of such representations (organisms) to obtain the individual that is best suited to 'survive', that is, the fittest as measured with respect to some 'fitness' function. The fittest organism is the one that best represents the ocean surface model with respect to the altimeter data.
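
    A minimal, self-contained sketch of the GA idea described above is given below: a population of candidate coefficient vectors for a simple, hypothetical surface model evolves by selection, crossover, and mutation toward the best fit to noisy observations. It is not the authors' altimetry code; the model, fitness function, and GA settings are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "surface model": a low-order polynomial whose coefficients the GA must recover.
def model(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))

x = np.linspace(-1, 1, 200)
true = [0.5, -1.2, 2.0, 0.7]
data = model(true, x) + 0.3 * rng.normal(size=x.size)       # noisy "altimeter" observations

def fitness(pop):
    # Higher is better: negative sum of squared residuals against the data.
    return -np.array([np.sum((model(ind, x) - data) ** 2) for ind in pop])

pop = rng.uniform(-3, 3, size=(60, len(true)))              # initial population of organisms
for generation in range(200):
    f = fitness(pop)
    # Tournament selection: each parent is the fitter of two randomly chosen individuals.
    i, j = rng.integers(len(pop), size=(2, len(pop)))
    parents = np.where((f[i] > f[j])[:, None], pop[i], pop[j])
    # Single-point crossover between consecutive parents.
    cut = rng.integers(1, pop.shape[1], size=len(pop))
    mask = np.arange(pop.shape[1]) < cut[:, None]
    children = np.where(mask, parents, np.roll(parents, 1, axis=0))
    # Gaussian mutation, plus elitism: carry the current best individual forward unchanged.
    children += rng.normal(0, 0.05, size=children.shape)
    children[0] = pop[np.argmax(f)]
    pop = children

print("best coefficients:", pop[np.argmax(fitness(pop))].round(2))
```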

  4. Data series embedding and scale invariant statistics.

    PubMed

    Michieli, I; Medved, B; Ristov, S

    2010-06-01

    Data sequences acquired from bio-systems such as human gait data, heart rate interbeat data, or DNA sequences exhibit complex dynamics that are frequently described by a long-memory or power-law decay of the autocorrelation function. One way of characterizing that dynamics is through scale invariant statistics or "fractal-like" behavior. For quantifying scale invariant parameters of physiological signals several methods have been proposed. Among them the most common are detrended fluctuation analysis, sample mean variance analyses, power spectral density analysis, R/S analysis, and recently in the realm of the multifractal approach, wavelet analysis. In this paper it is demonstrated that embedding the time series data in the high-dimensional pseudo-phase space reveals scale invariant statistics in a simple fashion. The procedure is applied to different stride interval data sets from human gait measurement time series (Physio-Bank data library). Results show that the introduced mapping adequately separates long-memory from random behavior. Smaller gait data sets were analyzed and scale-free trends for limited scale intervals were successfully detected. The method was verified on artificially produced time series with known scaling behavior and with varying noise content. The possibility for the method to falsely detect long-range dependence in the artificially generated short range dependence series was investigated. (c) 2009 Elsevier B.V. All rights reserved.
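
    Of the scaling estimators listed in this abstract, detrended fluctuation analysis is the easiest to sketch compactly. The code below is a generic first-order DFA on synthetic data, not the embedding procedure the paper itself introduces; the window sizes and series length are chosen only for illustration.

```python
import numpy as np

def dfa(x, scales):
    """First-order detrended fluctuation analysis: returns F(n) for each window size n.
    A log-log slope near 0.5 indicates uncorrelated noise; slopes approaching 1.0
    indicate long-memory (1/f-like) behaviour."""
    y = np.cumsum(x - np.mean(x))               # integrated profile
    F = []
    for n in scales:
        n_windows = len(y) // n
        segments = y[:n_windows * n].reshape(n_windows, n)
        t = np.arange(n)
        mse = []
        for seg in segments:
            coef = np.polyfit(t, seg, 1)        # local linear detrending
            mse.append(np.mean((seg - np.polyval(coef, t)) ** 2))
        F.append(np.sqrt(np.mean(mse)))
    return np.array(F)

# White noise should give a scaling exponent close to 0.5.
rng = np.random.default_rng(3)
x = rng.normal(size=4096)
scales = np.unique(np.logspace(1, 2.5, 12).astype(int))
F = dfa(x, scales)
alpha = np.polyfit(np.log(scales), np.log(F), 1)[0]
print("estimated scaling exponent:", round(alpha, 2))
```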

  5. Statistics corner: A guide to appropriate use of correlation coefficient in medical research.

    PubMed

    Mukaka, M M

    2012-09-01

    Correlation is a statistical method used to assess a possible linear association between two continuous variables. It is simple both to calculate and to interpret. However, misuse of correlation is so common among researchers that some statisticians have wished that the method had never been devised at all. The aim of this article is to provide a guide to appropriate use of correlation in medical research and to highlight some misuse. Examples of the applications of the correlation coefficient have been provided using data from statistical simulations as well as real data. A rule of thumb for interpreting the size of a correlation coefficient is also provided.
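
    A small simulated example along the lines the article describes is sketched below (synthetic data only): it contrasts Pearson and Spearman coefficients for a genuinely linear association, for no association, and for a small sample distorted by a single outlier.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulated "clinical" variables: y depends linearly on x; z is unrelated to x.
x = rng.normal(50, 10, 200)
y = 0.5 * x + rng.normal(0, 8, 200)
z = rng.normal(0, 1, 200)

for label, other in [("x vs y (related)", y), ("x vs z (unrelated)", z)]:
    r, p = stats.pearsonr(x, other)
    rho, _ = stats.spearmanr(x, other)
    print(f"{label}: Pearson r = {r:+.2f} (p = {p:.3g}), Spearman rho = {rho:+.2f}")

# A single extreme point in a small sample can inflate Pearson's r,
# while the rank-based Spearman coefficient is more resistant.
x_small, z_small = np.append(x[:30], 120.0), np.append(z[:30], 40.0)
print("with outlier: Pearson =", round(float(stats.pearsonr(x_small, z_small)[0]), 2),
      "Spearman =", round(float(stats.spearmanr(x_small, z_small)[0]), 2))
```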

  6. Computational Approaches and Tools for Exposure Prioritization and Biomonitoring Data Interpretation

    EPA Science Inventory

    The ability to describe the source-environment-exposure-dose-response continuum is essential for identifying exposures of greater concern to prioritize chemicals for toxicity testing or risk assessment, as well as for interpreting biomarker data for better assessment of exposure ...

  7. Statistical Quality Control of Moisture Data in GEOS DAS

    NASA Technical Reports Server (NTRS)

    Dee, D. P.; Rukhovets, L.; Todling, R.

    1999-01-01

    A new statistical quality control algorithm was recently implemented in the Goddard Earth Observing System Data Assimilation System (GEOS DAS). The final step in the algorithm consists of an adaptive buddy check that either accepts or rejects outlier observations based on a local statistical analysis of nearby data. A basic assumption in any such test is that the observed field is spatially coherent, in the sense that nearby data can be expected to confirm each other. However, the buddy check resulted in excessive rejection of moisture data, especially during the Northern Hemisphere summer. The analysis moisture variable in GEOS DAS is water vapor mixing ratio. Observational evidence shows that the distribution of mixing ratio errors is far from normal. Furthermore, spatial correlations among mixing ratio errors are highly anisotropic and difficult to identify. Both factors contribute to the poor performance of the statistical quality control algorithm. To alleviate the problem, we applied the buddy check to relative humidity data instead. This variable explicitly depends on temperature and therefore exhibits a much greater spatial coherence. As a result, reject rates of moisture data are much more reasonable and homogeneous in time and space.

  8. Interdisciplinary applications and interpretations of ERTS data within the Susquehanna River basin

    NASA Technical Reports Server (NTRS)

    Mcmurtry, G. J.; Petersen, G. W. (Principal Investigator)

    1975-01-01

    The author has identified the following significant results. The full potential of high quality data is achieved only with the application of efficient and effective interpretation techniques. An excellent operating system for handling, processing, and interpreting ERTS-1 and other MSS data was achieved. Programs for processing digital data are implemented on a large nondedicated general purpose computer. Significant results were attained in mapping land use, agricultural croplands, forest resources, and vegetative cover. Categories of land use classified and mapped depend upon the geographic location, the detail required, and the types of land use of interest. Physiographic and structural provinces are spectacularly displayed on ERTS-1 MSS image mosaics. Geologic bedrock structures show up well and formation contacts can sometimes be traced for hundreds of kilometers. Large circular structures and regional features, previously obscured by the detail of higher resolution data, can be seen. Environmental monitoring was performed in three areas: coal strip mining, coal refuse problems, and damage to vegetation caused by insects and pollution.

  9. Some Dimensions of Data Quality in Statistical Systems

    DOT National Transportation Integrated Search

    1997-07-01

    An important objective of a statistical data system is to enable users of the data to recommend (and organizations to take) rational action for solving problems or for improving quality of service of manufactured product. With this view in mind, this ...

  10. Drug safety data mining with a tree-based scan statistic.

    PubMed

    Kulldorff, Martin; Dashevsky, Inna; Avery, Taliser R; Chan, Arnold K; Davis, Robert L; Graham, David; Platt, Richard; Andrade, Susan E; Boudreau, Denise; Gunter, Margaret J; Herrinton, Lisa J; Pawloski, Pamala A; Raebel, Marsha A; Roblin, Douglas; Brown, Jeffrey S

    2013-05-01

    In post-marketing drug safety surveillance, data mining can potentially detect rare but serious adverse events. Assessing an entire collection of drug-event pairs is traditionally performed on a predefined level of granularity. It is unknown a priori whether a drug causes a very specific or a set of related adverse events, such as mitral valve disorders, all valve disorders, or different types of heart disease. This methodological paper evaluates the tree-based scan statistic data mining method to enhance drug safety surveillance. We use a three-million-member electronic health records database from the HMO Research Network. Using the tree-based scan statistic, we assess the safety of selected antifungal and diabetes drugs, simultaneously evaluating overlapping diagnosis groups at different granularity levels, adjusting for multiple testing. Expected and observed adverse event counts were adjusted for age, sex, and health plan, producing a log likelihood ratio test statistic. Out of 732 evaluated disease groupings, 24 were statistically significant, divided among 10 non-overlapping disease categories. Five of the 10 signals are known adverse effects, four are likely due to confounding by indication, while one may warrant further investigation. The tree-based scan statistic can be successfully applied as a data mining tool in drug safety surveillance using observational data. The total number of statistical signals was modest and does not imply a causal relationship. Rather, data mining results should be used to generate candidate drug-event pairs for rigorous epidemiological studies to evaluate the individual and comparative safety profiles of drugs. Copyright © 2013 John Wiley & Sons, Ltd.

  11. A Framework for Assessing High School Students' Statistical Reasoning

    PubMed Central

    2016-01-01

    Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students’ statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students’ statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework’s cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments. PMID:27812091

  12. Teaching Real Data Interpretation with Models (TRIM): Analysis of Student Dialogue in a Large-Enrollment Cell and Developmental Biology Course

    PubMed Central

    Zagallo, Patricia; Meddleton, Shanice; Bolger, Molly S.

    2016-01-01

    We present our design for a cell biology course to integrate content with scientific practices, specifically data interpretation and model-based reasoning. A 2-yr research project within this course allowed us to understand how students interpret authentic biological data in this setting. Through analysis of written work, we measured the extent to which students’ data interpretations were valid and/or generative. By analyzing small-group audio recordings during in-class activities, we demonstrated how students used instructor-provided models to build and refine data interpretations. Often, students used models to broaden the scope of data interpretations, tying conclusions to a biological significance. Coding analysis revealed several strategies and challenges that were common among students in this collaborative setting. Spontaneous argumentation was present in 82% of transcripts, suggesting that data interpretation using models may be a way to elicit this important disciplinary practice. Argumentation dialogue included frequent co-construction of claims backed by evidence from data. Other common strategies included collaborative decoding of data representations and noticing data patterns before making interpretive claims. Focusing on irrelevant data patterns was the most common challenge. Our findings provide evidence to support the feasibility of supporting students’ data-interpretation skills within a large lecture course. PMID:27193288

  13. Graphic strategies for analyzing and interpreting curricular mapping data.

    PubMed

    Armayor, Graciela M; Leonard, Sean T

    2010-06-15

    To describe curricular mapping strategies used in analyzing and interpreting curricular mapping data and present findings on how these strategies were used to facilitate curricular development. Nova Southeastern University's doctor of pharmacy curriculum was mapped to the college's educational outcomes. The mapping process included development of educational outcomes followed by analysis of course material and semi-structured interviews with course faculty members. Data collected per course outcome included learning opportunities and assessment measures used. Nearly 1,000 variables and 10,000 discrete rows of curricular data were collected. Graphic representations of curricular data were created using bar charts and stacked area graphs relating the learning opportunities to the educational outcomes. Graphs were used in the curricular evaluation and development processes to facilitate the identification of curricular holes, sequencing misalignments, learning opportunities, and assessment measures. Mapping strategies that use graphic representations of curricular data serve as effective diagnostic and curricular development tools.

  14. Flexibility in data interpretation: effects of representational format.

    PubMed

    Braithwaite, David W; Goldstone, Robert L

    2013-01-01

    Graphs and tables differentially support performance on specific tasks. For tasks requiring reading off single data points, tables are as good as or better than graphs, while for tasks involving relationships among data points, graphs often yield better performance. However, the degree to which graphs and tables support flexibility across a range of tasks is not well-understood. In two experiments, participants detected main and interaction effects in line graphs and tables of bivariate data. Graphs led to more efficient performance, but also lower flexibility, as indicated by a larger discrepancy in performance across tasks. In particular, detection of main effects of variables represented in the graph legend was facilitated relative to detection of main effects of variables represented in the x-axis. Graphs may be a preferable representational format when the desired task or analytical perspective is known in advance, but may also induce greater interpretive bias than tables, necessitating greater care in their use and design.

  15. Flexibility in data interpretation: effects of representational format

    PubMed Central

    Braithwaite, David W.; Goldstone, Robert L.

    2013-01-01

    Graphs and tables differentially support performance on specific tasks. For tasks requiring reading off single data points, tables are as good as or better than graphs, while for tasks involving relationships among data points, graphs often yield better performance. However, the degree to which graphs and tables support flexibility across a range of tasks is not well-understood. In two experiments, participants detected main and interaction effects in line graphs and tables of bivariate data. Graphs led to more efficient performance, but also lower flexibility, as indicated by a larger discrepancy in performance across tasks. In particular, detection of main effects of variables represented in the graph legend was facilitated relative to detection of main effects of variables represented in the x-axis. Graphs may be a preferable representational format when the desired task or analytical perspective is known in advance, but may also induce greater interpretive bias than tables, necessitating greater care in their use and design. PMID:24427145

  16. Interpretation of correlations in clinical research.

    PubMed

    Hung, Man; Bounsanga, Jerry; Voss, Maren Wright

    2017-11-01

    Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.

  17. Basic Statistical Concepts and Methods for Earth Scientists

    USGS Publications Warehouse

    Olea, Ricardo A.

    2008-01-01

    Statistics is the science of collecting, analyzing, interpreting, modeling, and displaying masses of numerical data primarily for the characterization and understanding of incompletely known systems. Over the years, these objectives have led to a fair amount of analytical work to achieve, substantiate, and guide descriptions and inferences.

  18. Data Warehousing: How To Make Your Statistics Meaningful.

    ERIC Educational Resources Information Center

    Flaherty, William

    2001-01-01

    Examines how one school district found a way to turn data collection from a disparate mountain of statistics into more useful information by using their Instructional Decision Support System. System software is explained as is how the district solved some data management challenges. (GR)

  19. Statistical modeling of natural backgrounds in hyperspectral LWIR data

    NASA Astrophysics Data System (ADS)

    Truslow, Eric; Manolakis, Dimitris; Cooley, Thomas; Meola, Joseph

    2016-09-01

    Hyperspectral sensors operating in the long wave infrared (LWIR) have a wealth of applications including remote material identification and rare target detection. While statistical models for modeling surface reflectance in visible and near-infrared regimes have been well studied, models for the temperature and emissivity in the LWIR have not been rigorously investigated. In this paper, we investigate modeling hyperspectral LWIR data using a statistical mixture model for the emissivity and surface temperature. Statistical models for the surface parameters can be used to simulate surface radiances and at-sensor radiance which drives the variability of measured radiance and ultimately the performance of signal processing algorithms. Thus, having models that adequately capture data variation is extremely important for studying performance trades. The purpose of this paper is twofold. First, we study the validity of this model using real hyperspectral data, and compare the relative variability of hyperspectral data in the LWIR and visible and near-infrared (VNIR) regimes. Second, we illustrate how materials that are easily distinguished in the VNIR, may be difficult to separate when imaged in the LWIR.

  20. Theoretical analysis to interpret projected image data from in-situ 3-dimensional equiaxed nucleation and growth

    NASA Astrophysics Data System (ADS)

    Mooney, Robin P.; McFadden, Shaun

    2017-12-01

    In-situ observation of crystal growth in transparent media allows us to observe solidification phase change in real-time. These systems are analogous to opaque systems such as metals. The interpretation of transient 2-dimensional area projections from 3-dimensional phase change phenomena occurring in a bulky sample is problematic due to uncertainty of impingement and hidden nucleation events; in stereology this problem is known as over-projection. This manuscript describes and demonstrates a continuous model for nucleation and growth using the well-established Johnson-Mehl-Avrami-Kolmogorov model, and provides a method to relate 3-dimensional volumetric data (nucleation events, volume fraction) to observed data in a 2-dimensional projection (nucleation count, area fraction). A parametric analysis is performed; the projection phenomenon is shown to be significant in cases where nucleation is occurring continuously with a relatively large variance. In general, area fraction on a projection plane will overestimate the volume fraction within the sample and the nuclei count recorded on the projection plane will underestimate the number of real nucleation events. The statistical framework given in this manuscript provides a methodology to deal with the differences between the observed (projected) data and the real (volumetric) measures.
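
    The over-projection effect stated in the final sentences is easy to reproduce with a toy Monte Carlo experiment, sketched below with random spheres in a unit cube; it is not the paper's JMAK-based analytical framework, and the grain count and radius are arbitrary assumptions. The projected area fraction comes out well above the volume fraction within the sample.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(5)

# Hypothetical equiaxed grains: random spheres nucleated inside a unit-cube sample.
n_grains, radius = 60, 0.06
centres = rng.uniform(0, 1, size=(n_grains, 3))

# Monte Carlo estimate of the true 3-D volume fraction of transformed material.
pts3 = rng.uniform(0, 1, size=(100_000, 3))
volume_fraction = (cdist(pts3, centres) < radius).any(axis=1).mean()

# Monte Carlo estimate of the area fraction seen on a 2-D projection along z:
# a point (x, y) appears transformed if ANY grain covers it at some depth.
pts2 = rng.uniform(0, 1, size=(100_000, 2))
area_fraction = (cdist(pts2, centres[:, :2]) < radius).any(axis=1).mean()

print(f"volume fraction = {volume_fraction:.3f}, projected area fraction = {area_fraction:.3f}")
# The projected area fraction systematically exceeds the volume fraction (over-projection).
```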

  1. Interpretation of medical imaging data with a mobile application: a mobile digital imaging processing environment.

    PubMed

    Lin, Meng Kuan; Nicolini, Oliver; Waxenegger, Harald; Galloway, Graham J; Ullmann, Jeremy F P; Janke, Andrew L

    2013-01-01

    Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and DIP service, called M-DIP. The objective of the system is to (1) automate the direct data tiling, conversion, pre-tiling of brain images from Medical Imaging NetCDF (MINC), Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurement; and (3) display high-level of images with three dimensions in real world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation using web-based protocols. M-DIP implements three levels of architecture with a relational middle-layer database, a stand-alone DIP server, and a mobile application logic middle level realizing user interpretation for direct querying and communication. This imaging software has the ability to display biological imaging data at multiple zoom levels and to increase its quality to meet users' expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed by any network environment, such as a portable mobile or tablet device. In addition, this system and combination with mobile applications are establishing a virtualization tool in the neuroinformatics field to speed interpretation services.

  2. Interpretation of Medical Imaging Data with a Mobile Application: A Mobile Digital Imaging Processing Environment

    PubMed Central

    Lin, Meng Kuan; Nicolini, Oliver; Waxenegger, Harald; Galloway, Graham J.; Ullmann, Jeremy F. P.; Janke, Andrew L.

    2013-01-01

    Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and DIP service, called M-DIP. The objective of the system is to (1) automate the direct data tiling, conversion, pre-tiling of brain images from Medical Imaging NetCDF (MINC), Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurement; and (3) display high-level of images with three dimensions in real world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation using web-based protocols. M-DIP implements three levels of architecture with a relational middle-layer database, a stand-alone DIP server, and a mobile application logic middle level realizing user interpretation for direct querying and communication. This imaging software has the ability to display biological imaging data at multiple zoom levels and to increase its quality to meet users’ expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed by any network environment, such as a portable mobile or tablet device. In addition, this system and combination with mobile applications are establishing a virtualization tool in the neuroinformatics field to speed interpretation services. PMID:23847587

  3. Enhancing the Development of Statistical Literacy through the Robot Bioglyph

    ERIC Educational Resources Information Center

    Bragg, Leicha A.; Koch, Jessica; Willis, Ashley

    2017-01-01

    One way to heighten students' interest in the classroom is by personalising tasks. Through designing Robot Bioglyphs students are able to explore personalised data through a creative and engaging process. By understanding, producing and interpreting data, students are able to develop their statistical literacy, which is an essential skill in…

  4. Biosurveillance applying scan statistics with multiple, disparate data sources.

    PubMed

    Burkom, Howard S

    2003-06-01

    Researchers working on the Department of Defense Global Emerging Infections System (DoD-GEIS) pilot system, the Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE), have applied scan statistics for early outbreak detection using both traditional and nontraditional data sources. These sources include medical data indexed by International Classification of Disease, 9th Revision (ICD-9) diagnosis codes, as well as less-specific, but potentially timelier, indicators such as records of over-the-counter remedy sales and of school absenteeism. Early efforts employed the Kulldorff scan statistic as implemented in the SaTScan software of the National Cancer Institute. A key obstacle to this application is that the input data streams are typically based on time-varying factors, such as consumer behavior, rather than simply on the populations of the component subregions. We have used both modeling and recent historical data distributions to obtain background spatial distributions. Data analyses have provided guidance on how to condition and model input data to avoid excessive clustering. We have used this methodology in combining data sources for both retrospective studies of known outbreaks and surveillance of high-profile events of concern to local public health authorities. We have integrated the scan statistic capability into a Microsoft Access-based system in which we may include or exclude data sources, vary time windows separately for different data sources, censor data from subsets of individual providers or subregions, adjust the background computation method, and run retrospective or simulated studies.

  5. 49 CFR Schedule G to Subpart B of... - Selected Statistical Data

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 49 Transportation 8 2011-10-01 false. Selected Statistical Data, Schedule G to Subpart B ... Selected Statistical Data [Dollars in thousands]: ( ) Greyhound Lines, Inc.; ( ) Trailways combined; ( ) All study carriers ... The purpose of Schedule G is to develop selected property, labor and operational data for use in evaluating ...

  6. Statistically Correct Methodology for Compositional Data in New Discriminant Function Tectonomagmatic Diagrams and Application to Ophiolite Origin

    NASA Astrophysics Data System (ADS)

    Verma, Surendra P.; Pandarinath, Kailasa; Verma, Sanjeet K.

    2011-07-01

    In the lead presentation (invited talk) of Session SE05 (Frontiers in Geochemistry with Reference to Lithospheric Evolution and Metallogeny) of AOGS2010, we have highlighted the requirement of correct statistical treatment of geochemical data. In most diagrams used for interpreting compositional data, the basic statistical assumption of open space for all variables is violated. Among these graphic tools, discrimination diagrams have been in use for nearly 40 years to decipher tectonic setting. The newer set of five tectonomagmatic discrimination diagrams published in 2006 (based on major-elements) and two sets made available in 2008 and 2011 (both based on immobile elements) fulfill all statistical requirements for correct handling of compositional data, including the multivariate nature of compositional variables, representative sampling, and probability-based tectonic field boundaries. Additionally in the most recent proposal of 2011, samples having normally distributed, discordant-outlier free, log-ratio variables were used in linear discriminant analysis. In these three sets of five diagrams each, discrimination was successfully documented for four tectonic settings (island arc, continental rift, ocean-island, and mid-ocean ridge). The discrimination diagrams have been extensively evaluated for their performance by different workers. We exemplify these two sets of new diagrams (one set based on major-elements and the other on immobile elements) using ophiolites from Boso Peninsula, Japan. This example is included for illustration purposes only and is not meant for testing of these newer diagrams. Their evaluation and comparison with older, conventional bivariate or ternary diagrams have been reported in other papers.
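
    A schematic of the statistical workflow these diagrams rest on (log-ratio transformation of closed compositions followed by linear discriminant analysis) is sketched below with entirely synthetic, hypothetical compositions; it does not reproduce the published discriminant functions or probability-based field boundaries.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def alr(parts):
    """Additive log-ratio transform: maps compositions (rows summing to a constant)
    into unconstrained real space, using the last component as the divisor."""
    parts = np.asarray(parts, dtype=float)
    return np.log(parts[:, :-1] / parts[:, -1:])

# Hypothetical major-element compositions (wt%): SiO2, MgO, TiO2 for two settings.
rng = np.random.default_rng(11)
arc = np.abs(rng.normal([55, 4, 0.8], [3, 1, 0.2], size=(40, 3)))
morb = np.abs(rng.normal([50, 8, 1.5], [2, 1, 0.3], size=(40, 3)))
X = np.vstack([arc, morb])
X = X / X.sum(axis=1, keepdims=True) * 100          # close the compositions to 100%
y = np.array(["arc"] * 40 + ["MORB"] * 40)

# Linear discriminant analysis on log-ratio coordinates, the statistically
# correct treatment of closed compositional data.
lda = LinearDiscriminantAnalysis().fit(alr(X), y)
print("resubstitution accuracy:", lda.score(alr(X), y))
print("predicted setting for one new sample:", lda.predict(alr(np.array([[52.0, 7.5, 1.4]]))))
```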

  7. Using statistical correlation to compare geomagnetic data sets

    NASA Astrophysics Data System (ADS)

    Stanton, T.

    2009-04-01

    The major features of data curves are often matched, to a first order, by bump and wiggle matching to arrive at an offset between data sets. This poster describes a simple statistical correlation program that has proved useful during this stage by determining the optimal correlation between geomagnetic curves using a variety of fixed and floating windows. Its utility is suggested by the fact that it is simple to run, yet generates meaningful data comparisons, often when data noise precludes the obvious matching of curve features. Data sets can be scaled, smoothed, normalised and standardised, before all possible correlations are carried out between selected overlapping portions of each curve. Best-fit offset curves can then be displayed graphically. The program was used to cross-correlate directional and palaeointensity data from Holocene lake sediments (Stanton et al., submitted) and Holocene lava flows. Some example curve matches are shown, including some that illustrate the potential of this technique when examining particularly sparse data sets. Stanton, T., Snowball, I., Zillén, L. and Wastegård, S., submitted. Detecting potential errors in varve chronology and 14C ages using palaeosecular variation curves, lead pollution history and statistical correlation. Quaternary Geochronology.
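
    The core of such a correlation program can be sketched in a few lines: standardise the two curves, slide one past the other over a range of offsets, and report the offset with the highest Pearson correlation. The example below uses synthetic curves with a known 12-sample shift; the fixed and floating windows, smoothing, and normalisation options of the real program are omitted.

```python
import numpy as np

def best_offset(curve_a, curve_b, max_lag=50):
    """Pearson correlation of two standardised series over a range of integer
    offsets; returns (offset, correlation) for the best-matching alignment."""
    a = (curve_a - np.mean(curve_a)) / np.std(curve_a)
    b = (curve_b - np.mean(curve_b)) / np.std(curve_b)
    results = []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[lag:], b[:len(b) - lag]
        else:
            x, y = a[:len(a) + lag], b[-lag:]
        results.append((lag, np.corrcoef(x, y)[0, 1]))
    return max(results, key=lambda item: item[1])

# Synthetic palaeomagnetic-style curves: the second is a noisy copy of the first,
# shifted so that the best alignment should be found near an offset of +12 samples.
rng = np.random.default_rng(2)
t = np.arange(600)
a = np.sin(t / 40.0) + 0.2 * rng.normal(size=t.size)
b = np.roll(a, -12) + 0.2 * rng.normal(size=t.size)
print(best_offset(a, b))
```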

  8. Statistical summaries of fatigue data for design purposes

    NASA Technical Reports Server (NTRS)

    Wirsching, P. H.

    1983-01-01

    Two methods are discussed for constructing a design curve on the safe side of fatigue data. Both the tolerance interval and equivalent prediction interval (EPI) concepts provide such a curve while accounting for both the distribution of the estimators in small samples and the data scatter. The EPI is also useful as a mechanism for providing necessary statistics on S-N data for a full reliability analysis which includes uncertainty in all fatigue design factors. Examples of statistical analyses of the general strain-life relationship are presented. The tolerance limit and EPI techniques for defining a design curve are demonstrated. Examples using WASPALOY B and RQC-100 data demonstrate that a reliability model could be constructed by considering the fatigue strength and fatigue ductility coefficients as two independent random variables. A technique given for establishing the fatigue strength for high cycle lives relies on an extrapolation technique and also accounts for "runouts." A reliability model or design value can be specified.
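
    As one concrete, hedged illustration of the tolerance-interval idea for a design value, the sketch below computes a one-sided lower tolerance bound for normally distributed log-lives at a single strain amplitude, using the exact noncentral-t tolerance factor. The data are hypothetical, and the fuller S-N regression and EPI machinery discussed in the report are not reproduced here.

```python
import numpy as np
from scipy import stats

def lower_tolerance_limit(sample, coverage=0.95, confidence=0.90):
    """One-sided lower tolerance limit for a normal sample: a bound below which
    at most (1 - coverage) of the population lies, with the stated confidence.
    Uses the exact noncentral-t factor k = t'_{n-1, z_p*sqrt(n)}(confidence)/sqrt(n)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    delta = stats.norm.ppf(coverage) * np.sqrt(n)      # noncentrality parameter
    k = stats.nct.ppf(confidence, df=n - 1, nc=delta) / np.sqrt(n)
    return x.mean() - k * x.std(ddof=1)

# Hypothetical replicate fatigue lives (log10 cycles) at one strain amplitude.
log_lives = np.array([5.10, 5.32, 5.21, 5.05, 5.44, 5.18, 5.27, 5.36])
design_log_life = lower_tolerance_limit(log_lives)
print(f"design life ~ 10^{design_log_life:.2f} cycles (95/90 lower tolerance bound)")
```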

  9. Statistical Interpretation of the Local Field Inside Dielectrics.

    ERIC Educational Resources Information Center

    Berrera, Ruben G.; Mello, P. A.

    1982-01-01

    Compares several derivations of the Clausius-Mossotti relation to analyze consistently the nature of approximations used and their range of applicability. Also presents a statistical-mechanical calculation of the local field for classical system of harmonic oscillators interacting via the Coulomb potential. (Author/SK)

  10. Graphic Strategies for Analyzing and Interpreting Curricular Mapping Data

    PubMed Central

    Leonard, Sean T.

    2010-01-01

    Objective To describe curricular mapping strategies used in analyzing and interpreting curricular mapping data and present findings on how these strategies were used to facilitate curricular development. Design Nova Southeastern University's doctor of pharmacy curriculum was mapped to the college's educational outcomes. The mapping process included development of educational outcomes followed by analysis of course material and semi-structured interviews with course faculty members. Data collected per course outcome included learning opportunities and assessment measures used. Assessment Nearly 1,000 variables and 10,000 discrete rows of curricular data were collected. Graphic representations of curricular data were created using bar charts and stacked area graphs relating the learning opportunities to the educational outcomes. Graphs were used in the curricular evaluation and development processes to facilitate the identification of curricular holes, sequencing misalignments, learning opportunities, and assessment measures. Conclusion Mapping strategies that use graphic representations of curricular data serve as effective diagnostic and curricular development tools. PMID:20798804

  11. Laboratory study supporting the interpretation of Solar Dynamics Observatory data

    DOE PAGES

    Trabert, E.; Beiersdorfer, P.

    2015-01-29

    High-resolution extreme ultraviolet spectra of ions in an electron beam ion trap are investigated as a laboratory complement of the moderate-resolution observation bands of the AIA experiment on board the Solar Dynamics Observatory (SDO) spacecraft. Here, the latter observations depend on dominant iron lines of various charge states which in combination yield temperature information on the solar plasma. Our measurements suggest additions to the spectral models that are used in the SDO data interpretation. In the process, we also note a fair number of inconsistencies among the wavelength reference data bases.

  12. A Generalized Approach for the Interpretation of Geophysical Well Logs in Ground-Water Studies - Theory and Application

    USGS Publications Warehouse

    Paillet, Frederick L.; Crowder, R.E.

    1996-01-01

    Quantitative analysis of geophysical logs in ground-water studies often involves at least as broad a range of applications and variation in lithology as is typically encountered in petroleum exploration, making such logs difficult to calibrate and complicating inversion problem formulation. At the same time, data inversion and analysis depend on inversion model formulation and refinement, so that log interpretation cannot be deferred to a geophysical log specialist unless active involvement with interpretation can be maintained by such an expert over the lifetime of the project. We propose a generalized log-interpretation procedure designed to guide hydrogeologists in the interpretation of geophysical logs, and in the integration of log data into ground-water models that may be systematically refined and improved in an iterative way. The procedure is designed to maximize the effective use of three primary contributions from geophysical logs: (1) The continuous depth scale of the measurements along the well bore; (2) The in situ measurement of lithologic properties and the correlation with hydraulic properties of the formations over a finite sample volume; and (3) Multiple independent measurements that can potentially be inverted for multiple physical or hydraulic properties of interest. The approach is formulated in the context of geophysical inversion theory, and is designed to be interfaced with surface geophysical soundings and conventional hydraulic testing. The step-by-step procedures given in our generalized interpretation and inversion technique are based on both qualitative analysis designed to assist formulation of the interpretation model, and quantitative analysis used to assign numerical values to model parameters. The approach bases a decision as to whether quantitative inversion is statistically warranted by formulating an over-determined inversion. If no such inversion is consistent with the inversion model, quantitative inversion is judged not warranted.

  13. A log-Weibull spatial scan statistic for time to event data.

    PubMed

    Usman, Iram; Rosychuk, Rhonda J

    2018-06-13

    Spatial scan statistics have been used for the identification of geographic clusters of elevated numbers of cases of a condition such as disease outbreaks. These statistics, accompanied by the appropriate distribution, can also identify geographic areas with either longer or shorter time to events. Other authors have proposed spatial scan statistics based on the exponential and Weibull distributions. We propose the log-Weibull as an alternative distribution for the spatial scan statistic for time-to-event data and compare and contrast the log-Weibull and Weibull distributions through simulation studies. The effect of type I differential censoring and the power of the test have been investigated through simulated data. Methods are also illustrated on time to specialist visit data for discharged patients presenting to emergency departments for atrial fibrillation and flutter in Alberta during 2010-2011. We found northern regions of Alberta had longer times to specialist visit than other areas. We proposed the spatial scan statistic for the log-Weibull distribution as a new approach for detecting spatial clusters for time-to-event data. The simulation studies suggest that the test performs well for log-Weibull data.

  14. A special form of SPD covariance matrix for interpretation and visualization of data manipulated with Riemannian geometry

    NASA Astrophysics Data System (ADS)

    Congedo, Marco; Barachant, Alexandre

    2015-01-01

    Currently the Riemannian geometry of symmetric positive definite (SPD) matrices is gaining momentum as a powerful tool in a wide range of engineering applications such as image, radar and biomedical data signal processing. If the data is not natively represented in the form of SPD matrices, typically we may summarize them in such form by estimating covariance matrices of the data. However, once we manipulate such covariance matrices on the Riemannian manifold we lose the representation in the original data space. For instance, we can evaluate the geometric mean of a set of covariance matrices, but not the geometric mean of the data generating the covariance matrices, the space of interest in which the geometric mean can be interpreted. As a consequence, Riemannian information geometry is often perceived by non-experts as a "black-box" tool and this perception prevents a wider adoption in the scientific community. Hereby we show that we can overcome this limitation by constructing a special form of SPD matrix embedding both the covariance structure of the data and the data itself. Incidentally, whenever the original data can be represented in the form of a generic data matrix (not even square), this special SPD matrix enables an exhaustive and unique description of the data up to second-order statistics. This is achieved by embedding the covariance structure of both the rows and columns of the data matrix, naturally allowing a wide range of possible applications and taking us well beyond a mere interpretability issue. We demonstrate the method by manipulating satellite images (pansharpening) and event-related potentials (ERPs) of an electroencephalography brain-computer interface (BCI) study. The first example illustrates the effect of moving along geodesics in the original data space and the second provides a novel estimation of ERP average (geometric mean), showing that, in contrast to the usual arithmetic mean, this estimation is robust to outliers. In
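
    For readers unfamiliar with manipulating SPD matrices on the Riemannian manifold, the sketch below computes the affine-invariant (Karcher) geometric mean of a set of covariance matrices by a standard fixed-point iteration. It illustrates the kind of operation the abstract refers to; it is not the special SPD embedding proposed by the authors, and the data and names are illustrative.

        import numpy as np
        from scipy.linalg import sqrtm, logm, expm, inv

        def spd_geometric_mean(mats, tol=1e-10, max_iter=50):
            """Karcher (affine-invariant) mean of a list of SPD matrices
            via a standard fixed-point iteration."""
            X = np.mean(mats, axis=0)                       # arithmetic mean as starting point
            for _ in range(max_iter):
                Xh = np.real(sqrtm(X))
                Xh_inv = inv(Xh)
                # Average of the matrix logs in the tangent space at X
                T = np.mean([np.real(logm(Xh_inv @ C @ Xh_inv)) for C in mats], axis=0)
                X = Xh @ np.real(expm(T)) @ Xh
                if np.linalg.norm(T) < tol:                 # converged when the mean log vanishes
                    break
            return X

        # Example: geometric mean of random covariance-like SPD matrices
        rng = np.random.default_rng(0)
        covs = [np.cov(rng.standard_normal((5, 100))) for _ in range(10)]
        print(spd_geometric_mean(covs).shape)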

  15. Physics in Perspective Volume II, Part C, Statistical Data.

    ERIC Educational Resources Information Center

    National Academy of Sciences - National Research Council, Washington, DC. Physics Survey Committee.

    Statistical data relating to the sociology and economics of the physics enterprise are presented and explained. The data are divided into three sections: manpower data, data on funding and costs, and data on the literature of physics. Each section includes numerous studies, with notes on the sources and types of data, gathering procedures, and…

  16. Advances in Statistical Methods for Substance Abuse Prevention Research

    PubMed Central

    MacKinnon, David P.; Lockwood, Chondra M.

    2010-01-01

    The paper describes advances in statistical methods for prevention research with a particular focus on substance abuse prevention. Standard analysis methods are extended to the typical research designs and characteristics of the data collected in prevention research. Prevention research often includes longitudinal measurement, clustering of data in units such as schools or clinics, missing data, and categorical as well as continuous outcome variables. Statistical methods to handle these features of prevention data are outlined. Developments in mediation, moderation, and implementation analysis allow for the extraction of more detailed information from a prevention study. Advancements in the interpretation of prevention research results include more widespread calculation of effect size and statistical power, the use of confidence intervals as well as hypothesis testing, detailed causal analysis of research findings, and meta-analysis. The increased availability of statistical software has contributed greatly to the use of new methods in prevention research. It is likely that the Internet will continue to stimulate the development and application of new methods. PMID:12940467

  17. Tips and Tricks for Successful Application of Statistical Methods to Biological Data.

    PubMed

    Schlenker, Evelyn

    2016-01-01

    This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed, more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data, allowing use of parametric tests. Alternatively, with skewed data, nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance the risk of type 1 errors (false positives) against type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (randomized clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increases the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.
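
    The distinction drawn between relative risk ratios and odds ratios, both computed from a 2 × 2 table, can be made concrete with a toy calculation using hypothetical counts:

        # Hypothetical cohort counts for a 2 x 2 table (event / no event)
        exposed = dict(event=30, none=70)       # 100 exposed subjects
        unexposed = dict(event=10, none=90)     # 100 unexposed subjects

        risk_exposed = exposed['event'] / (exposed['event'] + exposed['none'])
        risk_unexposed = unexposed['event'] / (unexposed['event'] + unexposed['none'])
        relative_risk = risk_exposed / risk_unexposed        # used in cohort studies / RCTs

        odds_exposed = exposed['event'] / exposed['none']
        odds_unexposed = unexposed['event'] / unexposed['none']
        odds_ratio = odds_exposed / odds_unexposed           # used in case-control studies

        print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.2f}")   # RR = 3.00, OR = 3.86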

  18. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A general problem with using conventional multivariate statistical approaches for classification of data of multiple types is that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability, but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free, since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.

  19. Interpretation of aeromagnetic data in the Jameson Land Basin, central East Greenland: Structures and related mineralized systems

    NASA Astrophysics Data System (ADS)

    Brethes, Anaïs; Guarnieri, Pierpaolo; Rasmussen, Thorkild Maack; Bauer, Tobias Erich

    2018-01-01

    This paper provides a detailed interpretation of several aeromagnetic datasets over the Jameson Land Basin in central East Greenland. The interpretation is based on texture and lineament analysis of magnetic data and derivatives of these, in combination with geological field observations. Numerous faults and Cenozoic intrusions were identified and a chronological interpretation of the events responsible for the magnetic features is proposed built on crosscutting relationships and correlated with absolute ages. Lineaments identified in enhanced magnetic data are compared with structures controlling the mineralized systems occurring in the area and form the basis for the interpretations presented in this paper. Several structures associated with base metal mineralization systems that were known at a local scale are here delineated at a larger scale; allowing the identification of areas displaying favorable geological settings for mineralization. This study demonstrates the usefulness of high-resolution airborne magnetic data for detailed structural interpretation and mineral exploration in geological contexts such as the Jameson Land Basin.

  20. Novice Interpretations of Progress Monitoring Graphs: Extreme Values and Graphical Aids

    ERIC Educational Resources Information Center

    Newell, Kirsten W.; Christ, Theodore J.

    2017-01-01

    Curriculum-Based Measurement of Reading (CBM-R) is frequently used to monitor instructional effects and evaluate response to instruction. Educators often view the data graphically on a time-series graph that might include a variety of statistical and visual aids, which are intended to facilitate the interpretation. This study evaluated the effects…

  1. Testing independence of bivariate interval-censored data using modified Kendall's tau statistic.

    PubMed

    Kim, Yuneung; Lim, Johan; Park, DoHwan

    2015-11-01

    In this paper, we study a nonparametric procedure to test independence of bivariate interval-censored data, for both current status data (case 1 interval-censored data) and case 2 interval-censored data. To do so, we propose a score-based modification of the Kendall's tau statistic for bivariate interval-censored data. Our modification defines the Kendall's tau statistic with expected numbers of concordant and discordant pairs of data. The performance of the modified approach is illustrated by simulation studies and application to the AIDS study. We compare our method to alternative approaches such as the two-stage estimation method by Sun et al. (Scandinavian Journal of Statistics, 2006) and the multiple imputation method by Betensky and Finkelstein (Statistics in Medicine, 1999b). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
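
    For reference, the classic (uncensored) Kendall's tau is simply the normalised difference between concordant and discordant pair counts, as in the toy sketch below. The paper's modification replaces these observed counts with expected counts under interval censoring, which is not reproduced here; data and names are illustrative.

        import numpy as np
        from itertools import combinations

        def kendall_tau(x, y):
            """Classic Kendall's tau for completely observed pairs:
            (concordant - discordant) / total number of pairs."""
            concordant = discordant = 0
            for i, j in combinations(range(len(x)), 2):
                s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
                if s > 0:
                    concordant += 1
                elif s < 0:
                    discordant += 1
                # ties contribute to neither count in this simple version
            n_pairs = len(x) * (len(x) - 1) / 2
            return (concordant - discordant) / n_pairs

        x = [1.2, 3.4, 2.2, 5.1, 4.0]
        y = [2.0, 3.1, 2.5, 6.0, 3.9]
        print(kendall_tau(x, y))   # matches scipy.stats.kendalltau(x, y)[0] when there are no ties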

  2. Analysis of statistical misconception in terms of statistical reasoning

    NASA Astrophysics Data System (ADS)

    Maryati, I.; Priatna, N.

    2018-05-01

    Reasoning skills are needed by everyone in the globalization era, because each person must be able to manage and use information that can now be obtained easily from all over the world. Statistical reasoning skill is the ability to collect, group, process, and interpret information and to draw conclusions from it. Developing this skill can be done through various levels of education. However, the skill is often weak because many people, students included, assume that statistics is merely counting and applying formulas. Students also tend to have negative attitudes toward courses related to research. The purpose of this research is to analyze students' misconceptions in a descriptive statistics course in relation to statistical reasoning skill. The observation was done by analyzing the results of a misconception test and a statistical reasoning skill test, and by examining the effect of students' misconceptions on statistical reasoning skill. The sample consisted of 32 mathematics education students who had taken the descriptive statistics course. The mean score on the misconception test was 49.7 (standard deviation 10.6), whereas the mean score on the statistical reasoning skill test was 51.8 (standard deviation 8.5). Against a minimum score of 65 for standard achievement of course competence, both means fall below the standard. The misconception results highlight which subtopics should be given particular attention. Based on the assessment, students' misconceptions occurred in: 1) writing mathematical sentences and symbols correctly, 2) understanding basic definitions, and 3) determining which concept to use in solving a problem. For statistical reasoning skill, the assessment measured reasoning about: 1) data, 2) representation, 3) statistical measures, 4) probability, 5) samples, and 6) association.

  3. Laterally constrained inversion for CSAMT data interpretation

    NASA Astrophysics Data System (ADS)

    Wang, Ruo; Yin, Changchun; Wang, Miaoyue; Di, Qingyun

    2015-10-01

    Laterally constrained inversion (LCI) has been successfully applied to the inversion of dc resistivity, TEM and airborne EM data. However, it has not yet been applied to the interpretation of controlled-source audio-frequency magnetotelluric (CSAMT) data. In this paper, we apply the LCI method to CSAMT data inversion by preconditioning the Jacobian matrix. We apply a weighting matrix to the Jacobian to balance the sensitivity of the model parameters, so that the resolution with respect to different model parameters becomes more uniform. Numerical experiments confirm that this can improve the convergence of the inversion. We first invert a synthetic dataset with and without noise to investigate the effect of applying LCI to CSAMT data. For the noise-free data, the results show that the LCI method recovers the true model better than the traditional single-station inversion; for the noisy data, the true model is recovered even with a noise level of 8%, indicating that LCI inversions are to some extent insensitive to noise. Then, we re-invert two CSAMT datasets collected respectively in a watershed and a coal mine area in Northern China and compare our results with those from previous inversions. The comparison with the previous inversion in the coal mine shows that the LCI method delivers smoother layer interfaces that correlate well with seismic data, while the comparison with a global search algorithm of simulated annealing (SA) in the watershed shows that both methods deliver very similar, good results, but the LCI algorithm presented in this paper runs much faster. The inversion results for the coal mine CSAMT survey show that a conductive water-bearing zone that was not revealed by the previous inversions has been identified by the LCI. This further demonstrates that the method presented in this paper works for CSAMT data inversion.
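
    One common way to balance parameter sensitivities, in the spirit of the weighting described, is to scale each Jacobian column by its norm. The sketch below shows this column scaling only; the paper does not specify this exact weighting, so treat it as an illustrative assumption.

        import numpy as np

        def precondition_jacobian(J):
            """Scale each Jacobian column by its norm so every model parameter
            contributes with comparable sensitivity to the inversion."""
            col_norm = np.linalg.norm(J, axis=0)
            W = 1.0 / np.where(col_norm > 0, col_norm, 1.0)   # diagonal weighting
            return J * W, W                                   # equals J @ diag(W), plus the weights

        # Example: two parameters with very different sensitivities
        J = np.array([[1e3, 1e-2],
                      [2e3, 3e-2],
                      [1.5e3, 2e-2]])
        J_w, W = precondition_jacobian(J)
        print(np.linalg.norm(J_w, axis=0))    # columns now have unit norm
        # After solving the weighted system for dm_w, recover dm = W * dm_w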

  4. Exploring the Relationship Between Eye Movements and Electrocardiogram Interpretation Accuracy

    NASA Astrophysics Data System (ADS)

    Davies, Alan; Brown, Gavin; Vigo, Markel; Harper, Simon; Horseman, Laura; Splendiani, Bruno; Hill, Elspeth; Jay, Caroline

    2016-12-01

    Interpretation of electrocardiograms (ECGs) is a complex task involving visual inspection. This paper aims to improve understanding of how practitioners perceive ECGs, and determine whether visual behaviour can indicate differences in interpretation accuracy. A group of healthcare practitioners (n = 31) who interpret ECGs as part of their clinical role were shown 11 commonly encountered ECGs on a computer screen. The participants’ eye movement data were recorded as they viewed the ECGs and attempted interpretation. The Jensen-Shannon distance was computed for the distance between two Markov chains, constructed from the transition matrices (visual shifts from and to ECG leads) of the correct and incorrect interpretation groups for each ECG. A permutation test was then used to compare this distance against 10,000 randomly shuffled groups made up of the same participants. The analysis yielded a statistically significant (α = 0.05) result for 5 of the 11 stimuli, demonstrating that the gaze shifts between the ECG leads differ between the groups making correct and incorrect interpretations and are therefore a factor in interpretation accuracy. The results shed further light on the relationship between visual behaviour and ECG interpretation accuracy, providing information that can be used to improve both human and automated interpretation approaches.
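
    A minimal sketch of the analysis pipeline described, assuming gaze data have already been reduced to sequences of lead indices: estimate a transition matrix per group, take a row-averaged Jensen-Shannon distance between the two matrices, and compare it against a permutation distribution. Function names and the toy data are illustrative, not the authors' implementation.

        import numpy as np
        from scipy.spatial.distance import jensenshannon

        def transition_matrix(seqs, n_states):
            """Row-normalised transition counts pooled over a group of gaze sequences."""
            T = np.zeros((n_states, n_states))
            for seq in seqs:
                for a, b in zip(seq[:-1], seq[1:]):
                    T[a, b] += 1
            T += 1e-9                              # avoid rows with no transitions
            return T / T.sum(axis=1, keepdims=True)

        def js_distance(seqs_a, seqs_b, n_states):
            """Mean row-wise Jensen-Shannon distance between the two transition matrices."""
            Ta = transition_matrix(seqs_a, n_states)
            Tb = transition_matrix(seqs_b, n_states)
            return np.mean([jensenshannon(Ta[i], Tb[i]) for i in range(n_states)])

        def permutation_test(seqs_a, seqs_b, n_states, n_perm=10000, seed=0):
            """Compare the observed distance against distances from shuffled group labels."""
            rng = np.random.default_rng(seed)
            observed = js_distance(seqs_a, seqs_b, n_states)
            pooled, n_a = list(seqs_a) + list(seqs_b), len(seqs_a)
            idx = np.arange(len(pooled))
            count = 0
            for _ in range(n_perm):
                rng.shuffle(idx)
                a = [pooled[i] for i in idx[:n_a]]
                b = [pooled[i] for i in idx[n_a:]]
                if js_distance(a, b, n_states) >= observed:
                    count += 1
            return observed, count / n_perm        # distance and permutation p value

        # Toy example: gaze sequences over 12 ECG leads for two groups of readers
        rng = np.random.default_rng(1)
        correct = [rng.integers(0, 12, size=40) for _ in range(8)]
        incorrect = [rng.integers(0, 12, size=40) for _ in range(8)]
        print(permutation_test(correct, incorrect, n_states=12, n_perm=200))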

  5. U-interpreter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arvind; Gostelow, K.P.

    1982-02-01

    The author argues that by giving a unique name to every activity generated during a computation, the u-interpreter can provide greater concurrency in the interpretation of data flow graphs. 19 references.

  6. Geological Interpretation of PSInSAR Data at Regional Scale

    PubMed Central

    Meisina, Claudia; Zucca, Francesco; Notti, Davide; Colombo, Alessio; Cucchi, Anselmo; Savio, Giuliano; Giannico, Chiara; Bianchi, Marco

    2008-01-01

    Results of a PSInSAR™ project carried out by the Regional Agency for Environmental Protection (ARPA) in Piemonte Region (Northern Italy) are presented and discussed. A methodology is proposed for the interpretation of PSInSAR™ data at the regional scale that is easy for public administrations and civil protection authorities to use. The potential and limitations of the PSInSAR™ technique for regional-scale ground movement detection and monitoring are then assessed in relation to different geological processes and various geological environments. PMID:27873940

  7. Performing Inferential Statistics Prior to Data Collection

    ERIC Educational Resources Information Center

    Trafimow, David; MacDonald, Justin A.

    2017-01-01

    Typically, in education and psychology research, the investigator collects data and subsequently performs descriptive and inferential statistics. For example, a researcher might compute group means and use the null hypothesis significance testing procedure to draw conclusions about the populations from which the groups were drawn. We propose an…

  8. Enhancements to the Engine Data Interpretation System (EDIS)

    NASA Technical Reports Server (NTRS)

    Hofmann, Martin O.

    1993-01-01

    The Engine Data Interpretation System (EDIS) expert system project assists the data review personnel at NASA/MSFC in performing post-test data analysis and engine diagnosis of the Space Shuttle Main Engine (SSME). EDIS uses knowledge of the engine, its components, and simple thermodynamic principles instead of, and in addition to, heuristic rules gathered from the engine experts. EDIS reasons in cooperation with human experts, following roughly the pattern of logic exhibited by human experts. EDIS concentrates on steady-state static faults, such as small leaks, and component degradations, such as pump efficiencies. The objective of this contract was to complete the set of engine component models, integrate heuristic rules into EDIS, integrate the Power Balance Model into EDIS, and investigate modification of the qualitative reasoning mechanisms to allow 'fuzzy' value classification. The result of this contract is an operational version of EDIS. EDIS will become a module of the Post-Test Diagnostic System (PTDS) and will, in this context, provide system-level diagnostic capabilities which integrate component-specific findings provided by other modules.

  10. "Magnitude-based inference": a statistical review.

    PubMed

    Welsh, Alan H; Knight, Emma J

    2015-04-01

    We consider "magnitude-based inference" and its interpretation by examining in detail its use in the problem of comparing two means. We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how "magnitude-based inference" is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. We show that "magnitude-based inference" is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with "magnitude-based inference" and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using "magnitude-based inference," a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis.

  11. Data Flow Analysis and Visualization for Spatiotemporal Statistical Data without Trajectory Information.

    PubMed

    Kim, Seokyeon; Jeong, Seongmin; Woo, Insoo; Jang, Yun; Maciejewski, Ross; Ebert, David S

    2018-03-01

    Geographic visualization research has focused on a variety of techniques to represent and explore spatiotemporal data. The goal of those techniques is to enable users to explore events and interactions over space and time in order to facilitate the discovery of patterns, anomalies and relationships within the data. However, it is difficult to extract and visualize data flow patterns over time for non-directional statistical data without trajectory information. In this work, we develop a novel flow analysis technique to extract, represent, and analyze flow maps of non-directional spatiotemporal data unaccompanied by trajectory information. We estimate a continuous distribution of these events over space and time, and extract flow fields for spatial and temporal changes utilizing a gravity model. Then, we visualize the spatiotemporal patterns in the data by employing flow visualization techniques. The user is presented with temporal trends of geo-referenced discrete events on a map. As such, overall spatiotemporal data flow patterns help users analyze geo-referenced temporal events, such as disease outbreaks, crime patterns, etc. To validate our model, we discard the trajectory information in an origin-destination dataset, apply our technique to the data, and compare the derived trajectories with the originals. Finally, we present spatiotemporal trend analysis for statistical datasets including twitter data, maritime search and rescue events, and syndromic surveillance.

  12. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
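
    The K-M approach to left-censored concentration data can be sketched with the usual "flipping" trick: subtract every value from a constant larger than the maximum so nondetects become right-censored, run the ordinary product-limit estimator, and transform back. The snippet below is a bare-bones numpy illustration with made-up data, not the S-language software described.

        import numpy as np

        def km_survival(times, observed):
            """Product-limit (Kaplan-Meier) estimate of S(t) for right-censored data."""
            order = np.argsort(times)
            times, observed = np.asarray(times)[order], np.asarray(observed)[order]
            surv, s = [], 1.0
            for t in np.unique(times[observed]):
                at_risk = np.sum(times >= t)
                deaths = np.sum((times == t) & observed)
                s *= 1.0 - deaths / at_risk
                surv.append((t, s))
            return surv

        # Left-censored concentrations: each value is either a detect or a detection limit
        conc = np.array([0.5, 1.2, 0.8, 2.0, 0.3, 1.5, 0.7])
        detected = np.array([True, True, False, True, False, True, True])  # False = "<DL"

        flip = conc.max() + 1.0              # "flip" so nondetects become right-censored
        km = km_survival(flip - conc, detected)
        # Transform back: cumulative probabilities at the detected concentrations
        for t, s in km:
            print(f"P(X < {flip - t:.2f}) is about {s:.3f}")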

  13. Thoth: Software for data visualization & statistics

    NASA Astrophysics Data System (ADS)

    Laher, R. R.

    2016-10-01

    Thoth is a standalone software application with a graphical user interface for making it easy to query, display, visualize, and analyze tabular data stored in relational databases and data files. From imported data tables, it can create pie charts, bar charts, scatter plots, and many other kinds of data graphs with simple menus and mouse clicks (no programming required), by leveraging the open-source JFreeChart library. It also computes useful table-column data statistics. A mature tool, having undergone development and testing over several years, it is written in the Java computer language, and hence can be run on any computing platform that has a Java Virtual Machine and graphical-display capability. It can be downloaded and used by anyone free of charge, and has general applicability in science, engineering, medical, business, and other fields. Special tools and features for common tasks in astronomy and astrophysical research are included in the software.

  14. Beyond data collection in digital mapping: interpretation, sketching and thought process elements in geological map making

    NASA Astrophysics Data System (ADS)

    Watkins, Hannah; Bond, Clare; Butler, Rob

    2016-04-01

    Geological mapping techniques have advanced significantly in recent years from paper fieldslips to Toughbook, smartphone and tablet mapping; but how do the methods used to create a geological map affect the thought processes that result in the final map interpretation? Geological maps have many key roles in the field of geosciences including understanding geological processes and geometries in 3D, interpreting geological histories and understanding stratigraphic relationships in 2D and 3D. Here we consider the impact of the methods used to create a map on the thought processes that result in the final geological map interpretation. As mapping technology has advanced in recent years, the way in which we produce geological maps has also changed. Traditional geological mapping is undertaken using paper fieldslips, pencils and compass clinometers. The map interpretation evolves through time as data is collected. This interpretive process that results in the final geological map is often supported by recording in a field notebook, observations, ideas and alternative geological models explored with the use of sketches and evolutionary diagrams. In combination the field map and notebook can be used to challenge the map interpretation and consider its uncertainties. These uncertainties and the balance of data to interpretation are often lost in the creation of published 'fair' copy geological maps. The advent of Toughbooks, smartphones and tablets in the production of geological maps has changed the process of map creation. Digital data collection, particularly through the use of inbuilt gyrometers in phones and tablets, has changed smartphones into geological mapping tools that can be used to collect lots of geological data quickly. With GPS functionality this data is also geospatially located, assuming good GPS connectivity, and can be linked to georeferenced infield photography. In contrast line drawing, for example for lithological boundary interpretation and sketching

  15. Effects of ensemble and summary displays on interpretations of geospatial uncertainty data.

    PubMed

    Padilla, Lace M; Ruginski, Ian T; Creem-Regehr, Sarah H

    2017-01-01

    Ensemble and summary displays are two widely used methods to represent visual-spatial uncertainty; however, there is disagreement about which is the most effective technique to communicate uncertainty to the general public. Visualization scientists create ensemble displays by plotting multiple data points on the same Cartesian coordinate plane. Despite their use in scientific practice, it is more common in public presentations to use visualizations of summary displays, which scientists create by plotting statistical parameters of the ensemble members. While prior work has demonstrated that viewers make different decisions when viewing summary and ensemble displays, it is unclear what components of the displays lead to diverging judgments. This study aims to compare the salience of visual features - or visual elements that attract bottom-up attention - as one possible source of diverging judgments made with ensemble and summary displays in the context of hurricane track forecasts. We report that salient visual features of both ensemble and summary displays influence participant judgment. Specifically, we find that salient features of summary displays of geospatial uncertainty can be misunderstood as displaying size information. Further, salient features of ensemble displays evoke judgments that are indicative of accurate interpretations of the underlying probability distribution of the ensemble data. However, when participants use ensemble displays to make point-based judgments, they may overweight individual ensemble members in their decision-making process. We propose that ensemble displays are a promising alternative to summary displays in a geospatial context but that decisions about visualization methods should be informed by the viewer's task.

  16. An Exercise in Exploring Big Data for Producing Reliable Statistical Information.

    PubMed

    Rey-Del-Castillo, Pilar; Cardeñosa, Jesús

    2016-06-01

    The availability of copious data about many human, social, and economic phenomena is considered an opportunity for the production of official statistics. National statistical organizations and other institutions are more and more involved in new projects for developing what is sometimes seen as a possible change of paradigm in the way statistical figures are produced. Nevertheless, there are hardly any systems in production using Big Data sources. Arguments of confidentiality, data ownership, representativeness, and others make it a difficult task to get results in the short term. Using Call Detail Records from Ivory Coast as an illustration, this article shows some of the issues that must be dealt with when producing statistical indicators from Big Data sources. A proposal of a graphical method to evaluate one specific aspect of the quality of the computed figures is also presented, demonstrating that the visual insight provided improves the results obtained using other traditional procedures.

  17. Viral epidemics in a cell culture: novel high resolution data and their interpretation by a percolation theory based model.

    PubMed

    Gönci, Balázs; Németh, Valéria; Balogh, Emeric; Szabó, Bálint; Dénes, Ádám; Környei, Zsuzsanna; Vicsek, Tamás

    2010-12-20

    Because of its relevance to everyday life, the spreading of viral infections has been of central interest in a variety of scientific communities involved in fighting, preventing and theoretically interpreting epidemic processes. Recent large scale observations have resulted in major discoveries concerning the overall features of the spreading process in systems with highly mobile susceptible units, but virtually no data are available about observations of infection spreading for a very large number of immobile units. Here we present the first detailed quantitative documentation of percolation-type viral epidemics in a highly reproducible in vitro system consisting of tens of thousands of virtually motionless cells. We use a confluent astroglial monolayer in a Petri dish and induce productive infection in a limited number of cells with a genetically modified herpesvirus strain. This approach allows extreme high resolution tracking of the spatio-temporal development of the epidemic. We show that a simple model is capable of reproducing the basic features of our observations, i.e., the observed behaviour is likely to be applicable to many different kinds of systems. Statistical physics inspired approaches to our data, such as fractal dimension of the infected clusters as well as their size distribution, seem to fit into a percolation theory based interpretation. We suggest that our observations may be used to model epidemics in more complex systems, which are difficult to study in isolation.

  18. Viral Epidemics in a Cell Culture: Novel High Resolution Data and Their Interpretation by a Percolation Theory Based Model

    PubMed Central

    Gönci, Balázs; Németh, Valéria; Balogh, Emeric; Szabó, Bálint; Dénes, Ádám; Környei, Zsuzsanna; Vicsek, Tamás

    2010-01-01

    Because of its relevance to everyday life, the spreading of viral infections has been of central interest in a variety of scientific communities involved in fighting, preventing and theoretically interpreting epidemic processes. Recent large scale observations have resulted in major discoveries concerning the overall features of the spreading process in systems with highly mobile susceptible units, but virtually no data are available about observations of infection spreading for a very large number of immobile units. Here we present the first detailed quantitative documentation of percolation-type viral epidemics in a highly reproducible in vitro system consisting of tens of thousands of virtually motionless cells. We use a confluent astroglial monolayer in a Petri dish and induce productive infection in a limited number of cells with a genetically modified herpesvirus strain. This approach allows extreme high resolution tracking of the spatio-temporal development of the epidemic. We show that a simple model is capable of reproducing the basic features of our observations, i.e., the observed behaviour is likely to be applicable to many different kinds of systems. Statistical physics inspired approaches to our data, such as fractal dimension of the infected clusters as well as their size distribution, seem to fit into a percolation theory based interpretation. We suggest that our observations may be used to model epidemics in more complex systems, which are difficult to study in isolation. PMID:21187920

  19. Small Sample Statistics for Incomplete Nonnormal Data: Extensions of Complete Data Formulae and a Monte Carlo Comparison

    ERIC Educational Resources Information Center

    Savalei, Victoria

    2010-01-01

    Incomplete nonnormal data are common occurrences in applied research. Although these 2 problems are often dealt with separately by methodologists, they often cooccur. Very little has been written about statistics appropriate for evaluating models with such data. This article extends several existing statistics for complete nonnormal data to…

  20. Using Data Mining to Teach Applied Statistics and Correlation

    ERIC Educational Resources Information Center

    Hartnett, Jessica L.

    2016-01-01

    This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…

  1. Use of keyword hierarchies to interpret gene expression patterns.

    PubMed

    Masys, D R; Welsh, J B; Lynn Fink, J; Gribskov, M; Klacansky, I; Corbeil, J

    2001-04-01

    High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results. We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.

  2. Statistical Policy Working Paper 25. Data Editing Workshop and Exposition

    DOT National Transportation Integrated Search

    1996-12-01

    Statistical Policy Working Paper 25 is the written record of the Data Editing Workshop and Exposition held March 22, 1996, at the Bureau of Labor Statistics (BLS) Conference and Training Center. The program consisted of 44 oral presentations and 19 s...

  3. Teaching Real Data Interpretation with Models (TRIM): Analysis of Student Dialogue in a Large-Enrollment Cell and Developmental Biology Course.

    PubMed

    Zagallo, Patricia; Meddleton, Shanice; Bolger, Molly S

    2016-01-01

    We present our design for a cell biology course to integrate content with scientific practices, specifically data interpretation and model-based reasoning. A 2-yr research project within this course allowed us to understand how students interpret authentic biological data in this setting. Through analysis of written work, we measured the extent to which students' data interpretations were valid and/or generative. By analyzing small-group audio recordings during in-class activities, we demonstrated how students used instructor-provided models to build and refine data interpretations. Often, students used models to broaden the scope of data interpretations, tying conclusions to a biological significance. Coding analysis revealed several strategies and challenges that were common among students in this collaborative setting. Spontaneous argumentation was present in 82% of transcripts, suggesting that data interpretation using models may be a way to elicit this important disciplinary practice. Argumentation dialogue included frequent co-construction of claims backed by evidence from data. Other common strategies included collaborative decoding of data representations and noticing data patterns before making interpretive claims. Focusing on irrelevant data patterns was the most common challenge. Our findings provide evidence to support the feasibility of supporting students' data-interpretation skills within a large lecture course. © 2016 P. Zagallo et al. CBE—Life Sciences Education © 2016 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).

  4. Three-Way Analysis of Spectrospatial Electromyography Data: Classification and Interpretation

    PubMed Central

    Kauppi, Jukka-Pekka; Hahne, Janne; Müller, Klaus-Robert; Hyvärinen, Aapo

    2015-01-01

    Classifying multivariate electromyography (EMG) data is an important problem in prosthesis control as well as in neurophysiological studies and diagnosis. With modern high-density EMG sensor technology, it is possible to capture the rich spectrospatial structure of the myoelectric activity. We hypothesize that multi-way machine learning methods can efficiently utilize this structure in classification as well as reveal interesting patterns in it. To this end, we investigate the suitability of existing three-way classification methods to EMG-based hand movement classification in spectrospatial domain, as well as extend these methods by sparsification and regularization. We propose to use Fourier-domain independent component analysis as preprocessing to improve classification and interpretability of the results. In high-density EMG experiments on hand movements across 10 subjects, three-way classification yielded higher average performance compared with state-of-the-art classification based on temporal features, suggesting that the three-way analysis approach can efficiently utilize detailed spectrospatial information of high-density EMG. Phase and amplitude patterns of features selected by the classifier in finger-movement data were found to be consistent with known physiology. Thus, our approach can accurately resolve hand and finger movements on the basis of detailed spectrospatial information, and at the same time allows for physiological interpretation of the results. PMID:26039100

  5. 42 CFR 417.806 - Financial records, statistical data, and cost finding.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 3 2011-10-01 2011-10-01 false Financial records, statistical data, and cost finding. 417.806 Section 417.806 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF..., statistical data, and cost finding. (a) The principles specified in § 417.568 apply to HCPPs, except those in...

  6. 42 CFR 417.806 - Financial records, statistical data, and cost finding.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 3 2010-10-01 2010-10-01 false Financial records, statistical data, and cost finding. 417.806 Section 417.806 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF..., statistical data, and cost finding. (a) The principles specified in § 417.568 apply to HCPPs, except those in...

  7. Variation in reaction norms: Statistical considerations and biological interpretation.

    PubMed

    Morrissey, Michael B; Liefting, Maartje

    2016-09-01

    Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.

  8. Statistical summaries of selected Iowa streamflow data through September 2013

    USGS Publications Warehouse

    Eash, David A.; O'Shea, Padraic S.; Weber, Jared R.; Nguyen, Kevin T.; Montgomery, Nicholas L.; Simonson, Adrian J.

    2016-01-04

    Statistical summaries of streamflow data collected at 184 streamgages in Iowa are presented in this report. All streamgages included for analysis have at least 10 years of continuous record collected before or through September 2013. This report is an update to two previously published reports that presented statistical summaries of selected Iowa streamflow data through September 1988 and September 1996. The statistical summaries include (1) monthly and annual flow durations, (2) annual exceedance probabilities of instantaneous peak discharges (flood frequencies), (3) annual exceedance probabilities of high discharges, and (4) annual nonexceedance probabilities of low discharges and seasonal low discharges. Also presented for each streamgage are graphs of the annual mean discharges, mean annual mean discharges, 50-percent annual flow-duration discharges (median flows), harmonic mean flows, mean daily mean discharges, and flow-duration curves. Two sets of statistical summaries are presented for each streamgage, which include (1) long-term statistics for the entire period of streamflow record and (2) recent-term statistics for or during the 30-year period of record from 1984 to 2013. The recent-term statistics are only calculated for streamgages with streamflow records pre-dating the 1984 water year and with at least 10 years of record during 1984–2013. The streamflow statistics in this report are not adjusted for the effects of water use; although some of this water is used consumptively, most of it is returned to the streams.
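
    As an illustration of one of the statistics listed, the sketch below computes flow-duration (percent-exceedance) discharges from a daily-mean record using Weibull plotting positions. The record is synthetic and the code is not the USGS analysis software; names and parameters are illustrative.

        import numpy as np

        def flow_duration(daily_q, percents=(1, 5, 10, 25, 50, 75, 90, 95, 99)):
            """Discharges equalled or exceeded the given percentage of the time."""
            q = np.sort(np.asarray(daily_q, dtype=float))[::-1]      # descending
            # Weibull plotting position: exceedance probability of the i-th largest value
            exceed = np.arange(1, q.size + 1) / (q.size + 1)
            return {p: float(np.interp(p / 100.0, exceed, q)) for p in percents}

        # Example with a synthetic, roughly log-normal daily discharge record (~10 years)
        rng = np.random.default_rng(42)
        daily_q = np.exp(rng.normal(loc=3.0, scale=0.8, size=365 * 10))
        fd = flow_duration(daily_q)
        print(fd[50])    # median (50-percent duration) flow
        print(fd[95])    # low-flow statistic: exceeded 95 percent of the time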

  9. On the use of statistical methods to interpret electrical resistivity data from the Eumsung basin (Cretaceous), Korea

    NASA Astrophysics Data System (ADS)

    Kim, Ji-Soo; Han, Soo-Hyung; Ryang, Woo-Hun

    2001-12-01

    Electrical resistivity mapping was conducted to delineate boundaries and architecture of the Eumsung Basin (Cretaceous). Basin boundaries are effectively clarified in electrical dipole-dipole resistivity sections as high-resistivity contrast bands. High resistivities most likely originate from the basement of Jurassic granite and Precambrian gneiss, contrasting with the lower resistivities from infilled sedimentary rocks. The electrical properties of basin-margin boundaries are compatible with the results of vertical electrical soundings and very-low-frequency electromagnetic surveys. A statistical analysis of the resistivity sections is tested in terms of standard deviation and is found to be an effective scheme for the subsurface reconstruction of basin architecture as well as the surface demarcation of basin-margin faults and brittle fracture zones, which are characterized by much higher standard deviations. Pseudo three-dimensional architecture of the basin is delineated by integrating the composite resistivity structure information from two cross-basin E-W magnetotelluric lines and dipole-dipole resistivity lines. Based on statistical analysis, the maximum depth of the basin varies from about 1 km in the northern part to 3 km or more in the middle part. This strong variation supports the view that the basin experienced pull-apart opening with rapid subsidence of the central blocks and asymmetric cross-basinal extension.

  10. Targeting Change: Assessing a Faculty Learning Community Focused on Increasing Statistics Content in Life Science Curricula

    ERIC Educational Resources Information Center

    Parker, Loran Carleton; Gleichsner, Alyssa M.; Adedokun, Omolola A.; Forney, James

    2016-01-01

    Transformation of research in all biological fields necessitates the design, analysis, and interpretation of large data sets. Preparing students with the requisite skills in experimental design, statistical analysis, and interpretation, and mathematical reasoning will require both curricular reform and faculty who are willing and able to integrate…

  11. Robust statistical methods for impulse noise suppressing of spread spectrum induced polarization data, with application to a mine site, Gansu province, China

    NASA Astrophysics Data System (ADS)

    Liu, Weiqiang; Chen, Rujun; Cai, Hongzhu; Luo, Weibin

    2016-12-01

    In this paper, we investigated the robust processing of noisy spread spectrum induced polarization (SSIP) data. SSIP is a new frequency-domain induced polarization method that transmits a pseudo-random m-sequence, a broadband signal, as the source current. Potential information at multiple frequencies can be obtained through measurement. Removing the noise is a crucial problem in SSIP data processing. If the ordinary mean stack and digital filter cannot reduce impulse noise effectively, its impact will remain in the complex resistivity spectrum and affect the interpretation of profile anomalies. We applied a robust statistical method to SSIP data processing. Robust least-squares regression is used to fit and remove the linear trend from the original data before stacking. A robust M estimate is used to stack the data of all periods. A robust smoothing filter is used to suppress the residual noise after stacking. For the robust statistical scheme, the most appropriate influence function and iterative algorithm are chosen by testing on simulated data to suppress the influence of outliers. We tested the benefits of the robust SSIP data processing using examples of SSIP data recorded at a test site beside a mine in Gansu province, China.
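
    A minimal sketch of robust M-estimate stacking, assuming the SSIP record has been cut into repeated periods (rows): a Huber influence function with MAD scaling, applied sample-by-sample through iteratively reweighted averaging. This illustrates the general idea only; the authors' exact influence function and iteration are not reproduced, and the data are synthetic.

        import numpy as np

        def huber_stack(periods, k=1.345, n_iter=20, tol=1e-8):
            """Robust stack of repeated measurement periods (rows) using a Huber
            M-estimate per sample, computed by iteratively reweighted averaging."""
            X = np.asarray(periods, dtype=float)         # shape: (n_periods, n_samples)
            est = np.median(X, axis=0)                   # robust starting value
            for _ in range(n_iter):
                resid = X - est
                scale = 1.4826 * np.median(np.abs(resid), axis=0) + 1e-12   # MAD scale
                u = resid / scale
                w = np.where(np.abs(u) <= k, 1.0, k / np.abs(u))            # Huber weights
                new = np.sum(w * X, axis=0) / np.sum(w, axis=0)
                if np.max(np.abs(new - est)) < tol:
                    break
                est = new
            return est

        # Example: 50 repeated periods of a decaying signal, a few hit by impulse noise
        t = np.linspace(0, 1, 200)
        clean = np.exp(-3 * t)
        periods = clean + 0.02 * np.random.randn(50, 200)
        periods[::10] += 5.0 * (np.random.rand(5, 200) < 0.02)   # sparse spikes
        print(np.max(np.abs(huber_stack(periods) - clean)))
        # compare with np.max(np.abs(periods.mean(axis=0) - clean)) from the plain mean stack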

  12. Tools to Support Interpreting Multiple Regression in the Face of Multicollinearity

    PubMed Central

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K.

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses. PMID:22457655

  13. Tools to support interpreting multiple regression in the face of multicollinearity.

    PubMed

    Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K

    2012-01-01

    While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.
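
    Two of the indices listed, standardized beta weights and structure coefficients, can be computed with a few lines of numpy on synthetic, deliberately collinear predictors; the data and variable names below are illustrative only.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 200
        x1 = rng.standard_normal(n)
        x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)     # deliberately collinear with x1
        y = 1.0 * x1 + 0.5 * x2 + rng.standard_normal(n)

        # Standardized beta weights: OLS on z-scored variables
        Z = np.column_stack([(v - v.mean()) / v.std() for v in (x1, x2)])
        zy = (y - y.mean()) / y.std()
        beta, *_ = np.linalg.lstsq(Z, zy, rcond=None)

        # Structure coefficients: correlation of each predictor with the predicted scores
        yhat = Z @ beta
        structure = [np.corrcoef(Z[:, j], yhat)[0, 1] for j in range(Z.shape[1])]

        print("beta weights:      ", np.round(beta, 3))
        print("structure coeffs.: ", np.round(structure, 3))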

  14. Application of multivariable statistical techniques in plant-wide WWTP control strategies analysis.

    PubMed

    Flores, X; Comas, J; Roda, I R; Jiménez, L; Gernaey, K V

    2007-01-01

    The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow one to i) determine natural groups or clusters of control strategies with similar behaviour, ii) find and interpret hidden, complex and causal relationships in the data set, and iii) identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both the analysis and interpretation of complex multicriteria data sets and allows an improved use of the information for effective evaluation of control strategies.
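
    The sketch below illustrates, on an invented evaluation matrix of control strategies by criteria, how clustering and principal component analysis of the kind described above can reveal groups of strategies with similar behaviour; it does not use the BSM2 data, and all variable names and values are assumptions.

      import numpy as np
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(2)
      # Invented evaluation matrix: 30 control strategies x 8 criteria
      # (e.g. effluent quality, aeration energy, pumping energy, violations, ...)
      X = np.vstack([rng.normal(0.0, 1.0, (15, 8)),
                     rng.normal(2.0, 1.0, (15, 8))])    # a second behavioural group

      Xs = StandardScaler().fit_transform(X)            # put criteria on comparable scales
      scores = PCA(n_components=2).fit_transform(Xs)    # compress to two components
      labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
      print(labels)                                     # natural groups of strategies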

  15. Qualitative Data Analysis and Interpretation in Counseling Psychology: Strategies for Best Practices

    ERIC Educational Resources Information Center

    Yeh, Christine J.; Inman, Arpana G.

    2007-01-01

    This article presents an overview of various strategies and methods of engaging in qualitative data interpretations and analyses in counseling psychology. The authors explore the themes of self, culture, collaboration, circularity, trustworthiness, and evidence deconstruction from multiple qualitative methodologies. Commonalities and differences…

  16. MICROARRAY DATA ANALYSIS USING MULTIPLE STATISTICAL MODELS

    EPA Science Inventory

    Microarray Data Analysis Using Multiple Statistical Models

    Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2,Andrew I. Brooks3,4, ChiYi Chu3,Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...

  17. Statistical Literacy in Data Revolution Era: Building Blocks and Instructional Dilemmas

    ERIC Educational Resources Information Center

    Prodromou, Theodosia; Dunne, Tim

    2017-01-01

    The data revolution has given citizens access to enormous large-scale open databases. In order to take into account the full complexity of data, we have to change the way we think in terms of the nature of data and its availability, the ways in which it is displayed and used, and the skills that are required for its interpretation. Substantial…

  18. A new statistical analysis of rare earth element diffusion data in garnet

    NASA Astrophysics Data System (ADS)

    Chu, X.; Ague, J. J.

    2015-12-01

    The incorporation of rare earth elements (REE) in garnet, Sm and Lu in particular, links garnet chemical zoning to absolute age determinations. The application of REE-based geochronology depends critically on the diffusion behaviors of the parent and daughter isotopes. Previous experimental studies on REE diffusion in garnet, however, exhibit significant discrepancies that impact interpretations of garnet Sm/Nd and Lu/Hf ages. We present a new statistical framework to analyze diffusion data for REE using an Arrhenius relationship that accounts for oxygen fugacity, cation radius and garnet unit-cell dimensions [1]. Our approach is based on Bayesian statistics and is implemented by the Markov chain Monte Carlo method. A similar approach has been recently applied to model diffusion of divalent cations in garnet [2]. The analysis incorporates recent data [3] in addition to the data compilation in ref. [1]. We also include the inter-run bias that helps reconcile the discrepancies among data sets. This additional term estimates the reproducibility and other experimental variabilities not explicitly incorporated in the Arrhenius relationship [2] (e.g., compositional dependence [3] and water content). The fitted Arrhenius relationships are consistent with the models in ref. [3], as well as refs. [1]&[4] at high temperatures. Down-temperature extrapolation leads to >0.5 order of magnitude faster diffusion coefficients than in refs. [1]&[4] at <750 °C. The predicted diffusion coefficients are significantly slower than ref. [5]. The fast diffusion [5] was supported by a field test of the Pikwitonei Granulite—the garnet Sm/Nd age postdates the metamorphic peak (750 °C) by ~30 Myr [6], suggesting considerable resetting of the Sm/Nd system during cooling. However, the Pikwitonei Granulite is a recently recognized UHT terrane with peak temperature exceeding 900 °C [7]. The revised closure temperature (~730 °C) is consistent with our new diffusion model. [1] Carlson (2012) Am
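
    The core of any such analysis is the Arrhenius relationship ln D = ln D0 - Ea/(R T). The sketch below fits it to synthetic diffusion data by ordinary least squares in Arrhenius space and extrapolates down temperature; the full Bayesian/MCMC treatment with oxygen-fugacity, cation-radius, unit-cell and inter-run-bias terms described in the abstract is not reproduced here, and all numerical values are invented.

      import numpy as np

      R = 8.314  # gas constant, J/(mol K)

      # Synthetic diffusion data (all values invented, for illustration only)
      rng = np.random.default_rng(3)
      T = np.linspace(1200.0, 1700.0, 12)                 # temperature, K
      lnD0_true, Ea_true = -18.0, 300e3                   # ln pre-exponential, activation energy
      lnD = lnD0_true - Ea_true / (R * T) + 0.3 * rng.standard_normal(T.size)

      # Ordinary least squares in Arrhenius space: lnD = lnD0 - (Ea/R) * (1/T)
      A = np.column_stack([np.ones_like(T), 1.0 / T])
      coef, *_ = np.linalg.lstsq(A, lnD, rcond=None)
      lnD0_fit, Ea_fit = coef[0], -coef[1] * R
      print(lnD0_fit, Ea_fit / 1e3, "kJ/mol")

      # Down-temperature extrapolation, where published data sets diverge most
      T_low = 1000.0
      print("log10 D at 1000 K:", (lnD0_fit - Ea_fit / (R * T_low)) / np.log(10))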

  19. Patients and Medical Statistics

    PubMed Central

    Woloshin, Steven; Schwartz, Lisa M; Welch, H Gilbert

    2005-01-01

    BACKGROUND People are increasingly presented with medical statistics. There are no existing measures to assess their level of interest or confidence in using medical statistics. OBJECTIVE To develop 2 new measures, the STAT-interest and STAT-confidence scales, and assess their reliability and validity. DESIGN Survey with retest after approximately 2 weeks. SUBJECTS Two hundred and twenty-four people were recruited from advertisements in local newspapers, an outpatient clinic waiting area, and a hospital open house. MEASURES We developed and revised 5 items on interest in medical statistics and 3 on confidence understanding statistics. RESULTS Study participants were mostly college graduates (52%); 25% had a high school education or less. The mean age was 53 (range 20 to 84) years. Most paid attention to medical statistics (6% paid no attention). The mean (SD) STAT-interest score was 68 (17) and ranged from 15 to 100. Confidence in using statistics was also high: the mean (SD) STAT-confidence score was 65 (19) and ranged from 11 to 100. STAT-interest and STAT-confidence scores were moderately correlated (r=.36, P<.001). Both scales demonstrated good test–retest repeatability (r=.60, .62, respectively), internal consistency reliability (Cronbach's α=0.70 and 0.78), and usability (individual item nonresponse ranged from 0% to 1.3%). Scale scores correlated only weakly with scores on a medical data interpretation test (r=.15 and .26, respectively). CONCLUSION The STAT-interest and STAT-confidence scales are usable and reliable. Interest and confidence were only weakly related to the ability to actually use data. PMID:16307623

  20. Examination of two methods for statistical analysis of data with magnitude and direction emphasizing vestibular research applications

    NASA Technical Reports Server (NTRS)

    Calkins, D. S.

    1998-01-01

    When the dependent (or response) variable in an experiment has direction and magnitude, one approach that has been used for statistical analysis involves splitting magnitude and direction and applying univariate statistical techniques to the components. However, such treatment of quantities with direction and magnitude is not justifiable mathematically and can lead to incorrect conclusions about relationships among variables and, as a result, to flawed interpretations. This note discusses a problem with that practice and recommends mathematically correct procedures to be used with dependent variables that have direction and magnitude for 1) computation of mean values, 2) statistical contrasts of and confidence intervals for means, and 3) correlation methods.
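
    A small numerical example makes the point concrete: averaging magnitude and direction separately can place the mean in the opposite direction from every observation, whereas averaging the Cartesian components does not. The values below are hypothetical.

      import numpy as np

      # Hypothetical responses with magnitude and direction
      mag = np.array([1.5, 2.0, 2.0, 1.5])
      ang = np.deg2rad([340.0, 350.0, 10.0, 20.0])   # directions clustered around 0 deg

      # Splitting: average magnitude and direction separately
      print(mag.mean(), np.rad2deg(ang).mean())       # 1.75 and 180 deg: opposite of every observation

      # Vector treatment: average the Cartesian components, then convert back
      x, y = mag * np.cos(ang), mag * np.sin(ang)
      mx, my = x.mean(), y.mean()
      print(np.hypot(mx, my), np.rad2deg(np.arctan2(my, mx)) % 360)   # about 1.69 and 0 deg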

  1. Statistical analysis and interpolation of compositional data in materials science.

    PubMed

    Pesenson, Misha Z; Suram, Santosh K; Gregoire, John M

    2015-02-09

    Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
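
    One standard CDA tool is the centered log-ratio (clr) transform, which maps compositions from the simplex to an unconstrained Euclidean space where ordinary statistics apply. The sketch below, with invented three-component compositions, shows the transform and the back-transformed (closed geometric) mean; it is illustrative only and not the interpolation workflow used in the paper.

      import numpy as np

      def clr(comp, eps=1e-12):
          """Centered log-ratio transform: maps compositions on the simplex to an
          unconstrained space where ordinary multivariate statistics apply."""
          c = np.asarray(comp, dtype=float) + eps
          c = c / c.sum(axis=-1, keepdims=True)          # close to unit sum
          logc = np.log(c)
          return logc - logc.mean(axis=-1, keepdims=True)

      # Invented atomic concentrations of a three-element thin film (rows sum to 1)
      comps = np.array([[0.60, 0.30, 0.10],
                        [0.55, 0.35, 0.10],
                        [0.20, 0.30, 0.50]])
      z = clr(comps)
      mean_clr = z.mean(axis=0)                          # Euclidean mean in clr space
      back = np.exp(mean_clr)
      print(back / back.sum())                           # closed geometric mean composition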

  2. Statistical Reform in School Psychology Research: A Synthesis

    ERIC Educational Resources Information Center

    Swaminathan, Hariharan; Rogers, H. Jane

    2007-01-01

    Statistical reform in school psychology research is discussed in terms of research designs, measurement issues, statistical modeling and analysis procedures, interpretation and reporting of statistical results, and finally statistics education.

  3. Analysis and interpretation of diffraction data from complex, anisotropic materials

    NASA Astrophysics Data System (ADS)

    Tutuncu, Goknur

    Most materials are elastically anisotropic and exhibit additional anisotropy beyond elastic deformation. For instance, in ferroelectric materials the main inelastic deformation mode is via domains, which are highly anisotropic crystallographic features. To quantify this anisotropy of ferroelectrics, advanced X-ray and neutron diffraction methods were employed. Extensive sets of data were collected from tetragonal BaTiO3, PZT and other ferroelectric ceramics. Data analysis was challenging due to the complex constitutive behavior of these materials. To quantify the elastic strain and texture evolution in ferroelectrics under loading, a number of data analysis techniques such as the single peak and Rietveld methods were used and their advantages and disadvantages compared. It was observed that the single peak analysis fails at low peak intensities especially after domain switching while the Rietveld method does not account for lattice strain anisotropy although it overcomes the low intensity problem via whole pattern analysis. To better account for strain anisotropy the constant stress (Reuss) approximation was employed within the Rietveld method and new formulations to estimate lattice strain were proposed. Along the way, new approaches for handling highly anisotropic lattice strain data were also developed and applied. All of the ceramics studied exhibited significant changes in their crystallographic texture after loading indicating non-180° domain switching. For a full interpretation of domain switching the spherical harmonics method was employed in Rietveld. A procedure for simultaneous refinement of multiple data sets was established for a complete texture analysis. To further interpret diffraction data, a solid mechanics model based on the self-consistent approach was used in calculating lattice strain and texture evolution during the loading of a polycrystalline ferroelectric. The model estimates both the macroscopic average response of a specimen and its hkl

  4. Common Scientific and Statistical Errors in Obesity Research

    PubMed Central

    George, Brandon J.; Beasley, T. Mark; Brown, Andrew W.; Dawson, John; Dimova, Rositsa; Divers, Jasmin; Goldsby, TaShauna U.; Heo, Moonseong; Kaiser, Kathryn A.; Keith, Scott; Kim, Mimi Y.; Li, Peng; Mehta, Tapan; Oakes, J. Michael; Skinner, Asheley; Stuart, Elizabeth; Allison, David B.

    2015-01-01

    We identify 10 common errors and problems in the statistical analysis, design, interpretation, and reporting of obesity research and discuss how they can be avoided. The 10 topics are: 1) misinterpretation of statistical significance, 2) inappropriate testing against baseline values, 3) excessive and undisclosed multiple testing and “p-value hacking,” 4) mishandling of clustering in cluster randomized trials, 5) misconceptions about nonparametric tests, 6) mishandling of missing data, 7) miscalculation of effect sizes, 8) ignoring regression to the mean, 9) ignoring confirmation bias, and 10) insufficient statistical reporting. We hope that discussion of these errors can improve the quality of obesity research by helping researchers to implement proper statistical practice and to know when to seek the help of a statistician. PMID:27028280
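
    Error 3 in the list above is easy to demonstrate by simulation: with 20 independent outcomes and no true effects, the chance of at least one unadjusted p < 0.05 is roughly 1 - 0.95^20, about 64%, while a simple Bonferroni adjustment keeps the family-wise error near 5%. The sketch below is illustrative only; the sample sizes and number of outcomes are invented.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(10)
      reps, m, n = 2000, 20, 30
      any_raw = any_bonf = 0
      for _ in range(reps):
          # 20 outcomes with no true effect, each compared between two groups of 30
          p = np.array([stats.ttest_ind(rng.standard_normal(n),
                                        rng.standard_normal(n)).pvalue
                        for _ in range(m)])
          any_raw += (p < 0.05).any()
          any_bonf += (p < 0.05 / m).any()      # Bonferroni-adjusted threshold
      print(any_raw / reps)    # near 1 - 0.95**20, about 0.64
      print(any_bonf / reps)   # near the nominal 0.05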

  5. Modified MODIS fluorescence line height data product to improve image interpretation for red tide monitoring in the eastern Gulf of Mexico

    NASA Astrophysics Data System (ADS)

    Hu, Chuanmin; Feng, Lian

    2017-01-01

    Several satellite-based methods have been used to detect and trace Karenia brevis red tide blooms in the eastern Gulf of Mexico. Some require data statistics and multiple data products while others use a single data product. Of these, the MODIS normalized fluorescence line height (nFLH) has shown an advantage in detecting blooms in waters rich in colored dissolved organic matter, and has therefore been used routinely to assess bloom conditions by the Florida Fish and Wildlife Conservation Commission (FWC), the official state agency of Florida responsible for red tide monitoring and mitigation. However, elevated sediment concentrations in the water column due to wind storms can also result in high nFLH values, leading to false-positive bloom interpretation. Here, a modified nFLH data product is developed to minimize such impacts through empirical adjustments of the nFLH values using MODIS-derived remote sensing reflectance in the green band at 547 nm. The new product is termed the algal bloom index (ABI) and has shown improved performance over the original nFLH in both retrospective evaluation statistics and near real-time applications. The ABI product has been made available in near real-time through a Web portal and has been used by the FWC on a routine basis to guide field sampling efforts and prepare for red tide bulletins distributed to many user groups.

  6. Promoting Statistical Thinking in Schools with Road Injury Data

    ERIC Educational Resources Information Center

    Woltman, Marie

    2017-01-01

    Road injury is an immediately relevant topic for 9-19 year olds. Current availability of Open Data makes it increasingly possible to find locally relevant data. Statistical lessons developed from these data can mutually reinforce life lessons about minimizing risk on the road. Devon County Council demonstrate how a wide array of statistical…

  7. A note on the kappa statistic for clustered dichotomous data.

    PubMed

    Zhou, Ming; Yang, Zhao

    2014-06-30

    The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patient dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patient dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾ 50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data set and two simulated clustered physician-patient dichotomous data sets are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
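
    For orientation, the sketch below computes Cohen's kappa for binary ratings and a plain cluster bootstrap standard error by resampling whole physician clusters with replacement; the data are simulated and this is not the semi-parametric delta-method estimator proposed in the paper.

      import numpy as np

      def cohen_kappa(a, b):
          a, b = np.asarray(a), np.asarray(b)
          po = np.mean(a == b)                                            # observed agreement
          pe = np.mean(a) * np.mean(b) + np.mean(1 - a) * np.mean(1 - b)  # chance agreement (binary)
          return (po - pe) / (1 - pe)

      rng = np.random.default_rng(4)
      n_clusters, m = 30, 8                                 # e.g. 30 physicians, 8 patients each
      cluster_p = rng.beta(2.0, 3.0, n_clusters)            # cluster-specific prevalence
      truth = rng.binomial(1, np.repeat(cluster_p, m))      # induces within-cluster correlation
      rater1 = np.where(rng.random(truth.size) < 0.85, truth, 1 - truth)
      rater2 = np.where(rng.random(truth.size) < 0.85, truth, 1 - truth)
      kappa = cohen_kappa(rater1, rater2)

      # Cluster bootstrap: resample whole clusters (contiguous blocks of m rows)
      boot = []
      for _ in range(2000):
          ids = rng.integers(0, n_clusters, n_clusters)
          idx = (ids[:, None] * m + np.arange(m)).ravel()
          boot.append(cohen_kappa(rater1[idx], rater2[idx]))
      print(kappa, np.std(boot, ddof=1))                    # point estimate and bootstrap SE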

  8. Mathematics pre-service teachers’ statistical reasoning about meaning

    NASA Astrophysics Data System (ADS)

    Kristanto, Y. D.

    2018-01-01

    This article offers a descriptive qualitative analysis of 3 second-year pre-service teachers’ statistical reasoning about the mean. Twenty-six pre-service teachers were tested using an open-ended problem where they were expected to analyze a method for finding the mean of a data set. Three of their test results were selected for analysis. The results suggest that the pre-service teachers did not use context to develop their interpretation of the mean. Therefore, this article also offers strategies to promote statistical reasoning about the mean that use various contexts.

  9. Statistical Model Selection for TID Hardness Assurance

    NASA Technical Reports Server (NTRS)

    Ladbury, R.; Gorelick, J. L.; McClure, S.

    2010-01-01

    Radiation Hardness Assurance (RHA) methodologies against Total Ionizing Dose (TID) degradation impose rigorous statistical treatments for data from a part's Radiation Lot Acceptance Test (RLAT) and/or its historical performance. However, no similar methods exist for using "similarity" data - that is, data for similar parts fabricated in the same process as the part under qualification. This is despite the greater difficulty and potential risk in interpreting similarity data. In this work, we develop methods to disentangle part-to-part, lot-to-lot and part-type-to-part-type variation. The methods we develop apply not just for qualification decisions, but also for quality control and detection of process changes and other "out-of-family" behavior. We begin by discussing the data used in the study and the challenges of developing a statistic providing a meaningful measure of degradation across multiple part types, each with its own performance specifications. We then develop analysis techniques and apply them to the different data sets.

  10. Revolutionizing volunteer interpreter services: an evaluation of an innovative medical interpreter education program.

    PubMed

    Hasbún Avalos, Oswaldo; Pennington, Kaylin; Osterberg, Lars

    2013-12-01

    In our ever-increasingly multicultural, multilingual society, medical interpreters serve an important role in the provision of care. Though it is known that using untrained interpreters leads to decreased quality of care for limited English proficiency patients, because of a short supply of professionals and a lack of formalized, feasible education programs for volunteers, community health centers and internal medicine practices continue to rely on untrained interpreters. To develop and formally evaluate a novel medical interpreter education program that encompasses major tenets of interpretation, tailored to the needs of volunteer medical interpreters. One-armed, quasi-experimental retro-pre-post study using survey ratings and feedback correlated by assessment scores to determine educational intervention effects. Thirty-eight students; 24 Spanish, nine Mandarin, and five Vietnamese. The majority had prior interpreting experience but no formal medical interpreter training. Students completed retrospective pre-test and post-test surveys measuring confidence in and perceived knowledge of key skills of interpretation. Primary outcome measures were a 10-point Likert scale for survey questions of knowledge, skills, and confidence, written and oral assessments of interpreter skills, and qualitative evidence of newfound knowledge in written reflections. Analyses showed a statistically significant (P <0.001) change of about two points in mean self-ratings on knowledge, skills, and confidence, with large effect sizes (d > 0.8). The second half of the program was also quantitatively and qualitatively shown to be a vital learning experience, resulting in 18 % more students passing the oral assessments; a 19 % increase in mean scores for written assessments; and a newfound understanding of interpreter roles and ways to navigate them. This innovative program was successful in increasing volunteer interpreters' skills and knowledge of interpretation, as well as confidence

  11. Using Carbon Emissions Data to "Heat Up" Descriptive Statistics

    ERIC Educational Resources Information Center

    Brooks, Robert

    2012-01-01

    This article illustrates using carbon emissions data in an introductory statistics assignment. The carbon emissions data has desirable characteristics including: choice of measure; skewness; and outliers. These complexities allow research and public policy debate to be introduced. (Contains 4 figures and 2 tables.)

  12. New Statistics for Testing Differential Expression of Pathways from Microarray Data

    NASA Astrophysics Data System (ADS)

    Siu, Hoicheong; Dong, Hua; Jin, Li; Xiong, Momiao

    Exploring biological meaning from microarray data is very important but remains a great challenge. Here, we developed three new statistics: linear combination test, quadratic test and de-correlation test to identify differentially expressed pathways from gene expression profile. We apply our statistics to two rheumatoid arthritis datasets. Notably, our results reveal three significant pathways and 275 genes in common in two datasets. The pathways we found are meaningful to uncover the disease mechanisms of rheumatoid arthritis, which implies that our statistics are a powerful tool in functional analysis of gene expression data.

  13. Combining Statistical Samples of Resolved-ISM Simulated Galaxies with Realistic Mock Observations to Fully Interpret HST and JWST Surveys

    NASA Astrophysics Data System (ADS)

    Faucher-Giguere, Claude-Andre

    2016-10-01

    HST has invested thousands of orbits to complete multi-wavelength surveys of high-redshift galaxies including the Deep Fields, COSMOS, 3D-HST and CANDELS. Over the next few years, JWST will undertake complementary, spatially-resolved infrared observations. Cosmological simulations are the most powerful tool to make detailed predictions for the properties of galaxy populations and to interpret these surveys. We will leverage recent major advances in the predictive power of cosmological hydrodynamic simulations to produce the first statistical sample of hundreds of galaxies simulated with 10 pc resolution and with explicit interstellar medium and stellar feedback physics proven to simultaneously reproduce the galaxy stellar mass function, the chemical enrichment of galaxies, and the neutral hydrogen content of galaxy halos. We will process our new set of full-volume cosmological simulations, called FIREBOX, with a mock imaging and spectral synthesis pipeline to produce realistic mock HST and JWST observations, including spatially-resolved photometry and spectroscopy. By comparing FIREBOX with recent high-redshift HST surveys, we will study the stellar build-up of galaxies, the evolution of massive star-forming clumps, their contribution to bulge growth, the connection of bulges to star formation quenching, and the triggering mechanisms of AGN activity. Our mock data products will also enable us to plan future JWST observing programs. We will publicly release all our mock data products to enable HST and JWST science beyond our own analysis, including with the Frontier Fields.

  14. Using Real-Life Data When Teaching Statistics: Student Perceptions of this Strategy in an Introductory Statistics Course

    ERIC Educational Resources Information Center

    Neumann, David L.; Hood, Michelle; Neumann, Michelle M.

    2013-01-01

    Many teachers of statistics recommend using real-life data during class lessons. However, there has been little systematic study of what effect this teaching method has on student engagement and learning. The present study examined this question in a first-year university statistics course. Students (n = 38) were interviewed and their reflections…

  15. XGR software for enhanced interpretation of genomic summary data, illustrated by application to immunological traits.

    PubMed

    Fang, Hai; Knezevic, Bogdan; Burnham, Katie L; Knight, Julian C

    2016-12-13

    Biological interpretation of genomic summary data such as those resulting from genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is one of the major bottlenecks in medical genomics research, calling for efficient and integrative tools to resolve this problem. We introduce eXploring Genomic Relations (XGR), an open source tool designed for enhanced interpretation of genomic summary data enabling downstream knowledge discovery. Targeting users of varying computational skills, XGR utilises prior biological knowledge and relationships in a highly integrated but easily accessible way to make user-input genomic summary datasets more interpretable. We show how by incorporating ontology, annotation, and systems biology network-driven approaches, XGR generates more informative results than conventional analyses. We apply XGR to GWAS and eQTL summary data to explore the genomic landscape of the activated innate immune response and common immunological diseases. We provide genomic evidence for a disease taxonomy supporting the concept of a disease spectrum from autoimmune to autoinflammatory disorders. We also show how XGR can define SNP-modulated gene networks and pathways that are shared and distinct between diseases, how it achieves functional, phenotypic and epigenomic annotations of genes and variants, and how it enables exploring annotation-based relationships between genetic variants. XGR provides a single integrated solution to enhance interpretation of genomic summary data for downstream biological discovery. XGR is released as both an R package and a web-app, freely available at http://galahad.well.ox.ac.uk/XGR .

  16. Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology

    NASA Astrophysics Data System (ADS)

    Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.

    2015-12-01

    Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second order tensor. This tensor can be permuted into a five dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.
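
    One way to see how a normalized ellipsoid becomes a five-component vector is sketched below: the tensor is scaled to unit determinant, its matrix logarithm (which is then traceless) is taken, and the log is expanded in an orthonormal basis of traceless symmetric matrices. This is a hedged illustration of the general idea, not necessarily the exact parameterization used by the authors, and the example ellipsoids are invented.

      import numpy as np

      def ellipsoid_to_vector(E):
          """Map a normalized ellipsoid (symmetric positive-definite 3x3 tensor scaled
          to unit determinant) to 5 coordinates via its traceless matrix logarithm."""
          E = np.asarray(E, dtype=float)
          E = E / np.linalg.det(E) ** (1.0 / 3.0)        # normalize: det(E) = 1
          w, V = np.linalg.eigh(E)
          L = V @ np.diag(np.log(w)) @ V.T               # matrix log; trace(L) = 0
          # Coordinates in an orthonormal basis of traceless symmetric matrices
          return np.array([(L[0, 0] - L[1, 1]) / np.sqrt(2.0),
                           np.sqrt(1.5) * L[2, 2],
                           np.sqrt(2.0) * L[0, 1],
                           np.sqrt(2.0) * L[0, 2],
                           np.sqrt(2.0) * L[1, 2]])

      # Two invented finite-strain ellipsoids (principal stretches on the diagonal)
      v1 = ellipsoid_to_vector(np.diag([2.0, 1.0, 0.5]))
      v2 = ellipsoid_to_vector(np.diag([1.8, 1.1, 0.6]))
      print(np.linalg.norm(v1 - v2))    # Euclidean distance usable in multivariate tests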

  17. In situ impulse test: an experimental and analytical evaluation of data interpretation procedures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1975-08-01

    Special experimental field testing and analytical studies were undertaken at Fort Lawton in Seattle, Washington, to study "close-in" wave propagation and evaluate data interpretation procedures for a new in situ impulse test. This test was developed to determine the shear wave velocity and dynamic modulus of soils underlying potential nuclear power plant sites. The test is different from conventional geophysical testing in that the velocity variation with strain is determined for each test. In general, strains between 10^-1 and 10^-3 percent are achieved. The experimental field work consisted of performing special tests in a large test sand fill to obtain detailed "close-in" data. Six recording transducers were placed at various points on the energy source, while approximately 37 different transducers were installed within the soil fill, all within 7 feet of the energy source. Velocity measurements were then taken simultaneously under controlled test conditions to study shear wave propagation phenomenology and help evaluate data interpretation procedures. Typical test data are presented along with detailed descriptions of the results.

  18. ISSUES IN THE STATISTICAL ANALYSIS OF SMALL-AREA HEALTH DATA. (R825173)

    EPA Science Inventory

    The availability of geographically indexed health and population data, with advances in computing, geographical information systems and statistical methodology, have opened the way for serious exploration of small area health statistics based on routine data. Such analyses may be...

  19. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

    PubMed

    Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

    2015-05-27

    Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.

  20. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

    PubMed

    Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

    2017-03-01

    High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

  1. The use of statistical tools in field testing of putative effects of genetically modified plants on nontarget organisms

    PubMed Central

    Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F

    2013-01-01

    Abstract To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches – for example, analysis of variance (ANOVA) – are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and

  2. The use of statistical tools in field testing of putative effects of genetically modified plants on nontarget organisms.

    PubMed

    Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F

    2013-08-01

    To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches - for example, analysis of variance (ANOVA) - are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will
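
    As a minimal illustration of the count-data point made in this review, the sketch below fits a Poisson GLM (with block as a fixed covariate) to invented nontarget-organism counts from a randomized block trial using statsmodels; a mixed model or GEE would be a natural extension, and none of the numbers come from real field data.

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(5)
      # Hypothetical field-trial counts of a nontarget arthropod:
      # 6 blocks x 2 treatments x 4 replicate plots
      blocks = np.repeat(np.arange(6), 8)
      treat = np.tile(np.repeat(["GM", "comparator"], 4), 6)
      mu = np.exp(1.5 + 0.1 * (treat == "GM") + rng.normal(0.0, 0.2, 6)[blocks])
      counts = rng.poisson(mu)
      df = pd.DataFrame({"count": counts, "treat": treat, "block": blocks.astype(str)})

      # Poisson GLM; block enters as a fixed covariate here (a GLMM or GEE could
      # treat the block effect as random instead)
      fit = smf.glm("count ~ treat + block", data=df, family=sm.families.Poisson()).fit()
      print(fit.summary().tables[1])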

  3. Interpretation of biomonitoring data in clinical medicine and the exposure sciences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Bryan L.; Barr, Dana B.; Wright, J. Michael

    2008-11-15

    Biomonitoring has become a fundamental tool in both exposure science and clinical medicine. Despite significant analytical advances, the clinical use of environmental biomarkers remains in its infancy. Clinical use of environmental biomarkers poses some complex scientific and ethical challenges. The purpose of this paper is to compare how the clinical and exposure sciences differ with respect to their interpretation and use of biological data. Additionally, the clinical use of environmental biomonitoring data is discussed. A case study is used to illustrate the complexities of conducting biomonitoring research on highly vulnerable populations in a clinical setting.

  4. ROOT — A C++ framework for petabyte data storage, statistical analysis and visualization

    NASA Astrophysics Data System (ADS)

    Antcheva, I.; Ballintijn, M.; Bellenot, B.; Biskup, M.; Brun, R.; Buncic, N.; Canal, Ph.; Casadei, D.; Couet, O.; Fine, V.; Franco, L.; Ganis, G.; Gheata, A.; Maline, D. Gonzalez; Goto, M.; Iwaszkiewicz, J.; Kreshuk, A.; Segura, D. Marcos; Maunder, R.; Moneta, L.; Naumann, A.; Offermann, E.; Onuchin, V.; Panacek, S.; Rademakers, F.; Russo, P.; Tadel, M.

    2009-12-01

    ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web, or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. A central piece in these analysis tools is the set of histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like Postscript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks — e.g. data mining in HEP — by using PROOF, which will take care of optimally

  5. “Magnitude-based Inference”: A Statistical Review

    PubMed Central

    Welsh, Alan H.; Knight, Emma J.

    2015-01-01

    ABSTRACT Purpose We consider “magnitude-based inference” and its interpretation by examining in detail its use in the problem of comparing two means. Methods We extract from the spreadsheets, which are provided to users of the analysis (http://www.sportsci.org/), a precise description of how “magnitude-based inference” is implemented. We compare the implemented version of the method with general descriptions of it and interpret the method in familiar statistical terms. Results and Conclusions We show that “magnitude-based inference” is not a progressive improvement on modern statistics. The additional probabilities introduced are not directly related to the confidence interval but, rather, are interpretable either as P values for two different nonstandard tests (for different null hypotheses) or as approximate Bayesian calculations, which also lead to a type of test. We also discuss sample size calculations associated with “magnitude-based inference” and show that the substantial reduction in sample sizes claimed for the method (30% of the sample size obtained from standard frequentist calculations) is not justifiable so the sample size calculations should not be used. Rather than using “magnitude-based inference,” a better solution is to be realistic about the limitations of the data and use either confidence intervals or a fully Bayesian analysis. PMID:25051387

  6. Toward Global Comparability of Sexual Orientation Data in Official Statistics: A Conceptual Framework of Sexual Orientation for Health Data Collection in New Zealand's Official Statistics System

    PubMed Central

    Gray, Alistair; Veale, Jaimie F.; Binson, Diane; Sell, Randell L.

    2013-01-01

    Objective. Effectively addressing health disparities experienced by sexual minority populations requires high-quality official data on sexual orientation. We developed a conceptual framework of sexual orientation to improve the quality of sexual orientation data in New Zealand's Official Statistics System. Methods. We reviewed conceptual and methodological literature, culminating in a draft framework. To improve the framework, we held focus groups and key-informant interviews with sexual minority stakeholders and producers and consumers of official statistics. An advisory board of experts provided additional guidance. Results. The framework proposes working definitions of the sexual orientation topic and measurement concepts, describes dimensions of the measurement concepts, discusses variables framing the measurement concepts, and outlines conceptual grey areas. Conclusion. The framework proposes standard definitions and concepts for the collection of official sexual orientation data in New Zealand. It presents a model for producers of official statistics in other countries, who wish to improve the quality of health data on their citizens. PMID:23840231

  7. US Geological Survey nutrient preservation experiment : experimental design, statistical analysis, and interpretation of analytical results

    USGS Publications Warehouse

    Patton, Charles J.; Gilroy, Edward J.

    1999-01-01

    Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.

  8. Mississippi Public Junior Colleges Statistical Data, 1985-86.

    ERIC Educational Resources Information Center

    Moody, George V.; And Others

    Statistical data for the 1985-86 academic year are presented here for Mississippi's 15 public junior colleges, including information on enrollments, degrees and certificates awarded, revenues, expenditures, academic salary ranges, transportation services, dormitory utilization, and auxiliary enterprises. Introductory remarks and the Board of…

  9. Primer of statistics in dental research: part I.

    PubMed

    Shintani, Ayumi

    2014-01-01

    Statistics play essential roles in evidence-based dentistry (EBD) practice and research. Their role ranges widely from formulating scientific questions, designing studies, and collecting and analyzing data to interpreting, reporting, and presenting study findings. Mastering statistical concepts appears to be an unreachable goal among many dental researchers, in part due to statistical authorities' limitations in explaining statistical principles to health researchers without elaborating complex mathematical concepts. This series of 2 articles aims to introduce dental researchers to 9 essential topics in statistics to conduct EBD with intuitive examples. Part I of the series covers the first 5 topics: (1) statistical graph, (2) how to deal with outliers, (3) p-value and confidence interval, (4) testing equivalence, and (5) multiplicity adjustment. Part II will follow to cover the remaining topics including (6) selecting the proper statistical tests, (7) repeated measures analysis, (8) epidemiological consideration for causal association, and (9) analysis of agreement. Copyright © 2014. Published by Elsevier Ltd.

  10. Interpretation of Confidence Interval Facing the Conflict

    ERIC Educational Resources Information Center

    Andrade, Luisa; Fernández, Felipe

    2016-01-01

    As literature has reported, it is usual that university students in statistics courses, and even statistics teachers, interpret the confidence level associated with a confidence interval as the probability that the parameter value will be between the lower and upper interval limits. To confront this misconception, class activities have been…
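
    A short simulation conveys the correct reading: the confidence level is the long-run proportion of intervals, over repeated samples, that contain the fixed parameter, not the probability that any single computed interval contains it. The sketch below uses invented normal data.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(6)
      mu, sigma, n, reps = 10.0, 2.0, 25, 10_000
      covered = 0
      for _ in range(reps):
          x = rng.normal(mu, sigma, n)
          half = stats.t.ppf(0.975, n - 1) * x.std(ddof=1) / np.sqrt(n)
          covered += (x.mean() - half) <= mu <= (x.mean() + half)
      print(covered / reps)    # close to 0.95: long-run coverage, not a per-interval probability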

  11. Statistical limitations in functional neuroimaging. I. Non-inferential methods and statistical models.

    PubMed Central

    Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P

    1999-01-01

    Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149

  12. Statistical Relational Learning (SRL) as an Enabling Technology for Data Acquisition and Data Fusion in Video

    DTIC Science & Technology

    2013-05-02

    ... particular, it is important to reason about which portions of video require expensive analysis and storage. This project aims to make these ... inferences using new and existing tools from Statistical Relational Learning (SRL). SRL is a recently emerging technology that enables the effective ...

  13. Explorations in Statistics: The Analysis of Ratios and Normalized Data

    ERIC Educational Resources Information Center

    Curran-Everett, Douglas

    2013-01-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This ninth installment of "Explorations in Statistics" explores the analysis of ratios and normalized--or standardized--data. As researchers, we compute a ratio--a numerator divided by a denominator--to compute a…

  14. Statistical characteristics of MST radar echoes and its interpretation

    NASA Technical Reports Server (NTRS)

    Woodman, Ronald F.

    1989-01-01

    Two concepts of fundamental importance are reviewed: the autocorrelation function and the frequency power spectrum. In addition, some turbulence concepts, the relationship between radar signals and atmospheric medium statistics, partial reflection, and the characteristics of noise and clutter interference are discussed.
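
    The two concepts are linked by the Wiener-Khinchin theorem: the autocorrelation function and the frequency power spectrum form a Fourier pair. The sketch below estimates both for a synthetic, noisy Doppler-shifted echo; all signal parameters are invented and this is not an MST radar processing chain.

      import numpy as np

      rng = np.random.default_rng(7)
      n, dt = 1024, 1e-3
      t = np.arange(n) * dt
      # Synthetic Doppler-shifted, decaying echo plus complex white noise (invented values)
      x = np.exp(2j * np.pi * 50.0 * t) * np.exp(-t / 0.1)
      x = x + 0.5 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

      spec = np.abs(np.fft.fft(x)) ** 2 / n   # periodogram estimate of the power spectrum
      acf = np.fft.ifft(spec)                 # Wiener-Khinchin: ACF is the inverse FFT of the spectrum

      freqs = np.fft.fftfreq(n, dt)
      print(freqs[np.argmax(spec)])                # recovered Doppler shift, near 50 Hz
      print(np.abs(acf[:5]) / np.abs(acf[0]))      # normalized ACF at the first few lags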

  15. Geological and Seismic Data Mining For The Development of An Interpretation System Within The Alptransit Project

    NASA Astrophysics Data System (ADS)

    Klose, C. D.; Giese, R.; Löw, S.; Borm, G.

    Especially for deep underground excavations, the prediction of the locations of small- scale hazardous geotechnical structures is nearly impossible when exploration is re- stricted to surface based methods. Hence, for the AlpTransit base tunnels, exploration ahead has become an essential component of the excavation plan. The project de- scribed in this talk aims at improving the technology for the geological interpretation of reflection seismic data. The discovered geological-seismic relations will be used to develop an interpretation system based on artificial intelligence to predict hazardous geotechnical structures of the advancing tunnel face. This talk gives, at first, an overview about the data mining of geological and seismic properties of metamorphic rocks within the Penninic gneiss zone in Southern Switzer- land. The data results from measurements of a specific geophysical prediction system developed by the GFZ Potsdam, Germany, along the 2600 m long and 1400 m deep Faido access tunnel. The goal is to find those seismic features (i.e. compression and shear wave velocities, velocity ratios and velocity gradients) which show a significant relation to geological properties (i.e. fracturing and fabric features). The seismic properties were acquired from different tomograms, whereas the geolog- ical features derive from tunnel face maps. The features are statistically compared with the seismic rock properties taking into account the different methods used for the tunnel excavation (TBM and Drill/Blast). Fracturing and the mica content stay in a positive relation to the velocity values. Both, P- and S-wave velocities near the tunnel surface describe the petrology better, whereas in the interior of the rock mass they correlate to natural micro- and macro-scopic fractures surrounding tectonites, i.e. cataclasites. The latter lie outside of the excavation damage zone and the tunnel loos- ening zone. The shear wave velocities are better indicators for rock

  16. Data mining and statistical inference in selective laser melting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamath, Chandrika

    Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.

  17. Data mining and statistical inference in selective laser melting

    DOE PAGES

    Kamath, Chandrika

    2016-01-11

    Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.
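
    As a rough illustration of the surrogate-modeling idea mentioned in this work, the sketch below fits a Gaussian-process regressor mapping laser power and scan speed to relative density on invented SLM runs and then queries it with an uncertainty estimate; the kernel choices, parameter ranges and response surface are assumptions, not the authors' models.

      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, WhiteKernel

      rng = np.random.default_rng(8)
      # Invented SLM runs: (laser power [W], scan speed [mm/s]) -> relative density [%]
      X = np.column_stack([rng.uniform(150, 400, 40), rng.uniform(500, 2000, 40)])
      energy = X[:, 0] / X[:, 1]                          # crude energy-density proxy
      y = 99.5 - 40.0 * (energy - 0.25) ** 2 + rng.normal(0, 0.1, 40)

      kernel = RBF(length_scale=[100.0, 500.0]) + WhiteKernel(noise_level=1e-2)
      gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

      # Query the surrogate (with an uncertainty estimate) instead of a new simulation
      mean, std = gp.predict(np.array([[300.0, 1000.0]]), return_std=True)
      print(mean[0], std[0])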

  18. Data management in large-scale collaborative toxicity studies: how to file experimental data for automated statistical analysis.

    PubMed

    Stanzel, Sven; Weimer, Marc; Kopp-Schneider, Annette

    2013-06-01

    High-throughput screening approaches are carried out for the toxicity assessment of a large number of chemical compounds. In such large-scale in vitro toxicity studies several hundred or thousand concentration-response experiments are conducted. The automated evaluation of concentration-response data using statistical analysis scripts saves time and yields more consistent results in comparison to data analysis performed by the use of menu-driven statistical software. Automated statistical analysis requires that concentration-response data are available in a standardised data format across all compounds. To obtain consistent data formats, a standardised data management workflow must be established, including guidelines for data storage, data handling and data extraction. In this paper two procedures for data management within large-scale toxicological projects are proposed. Both procedures are based on Microsoft Excel files as the researcher's primary data format and use a computer programme to automate the handling of data files. The first procedure assumes that data collection has not yet started whereas the second procedure can be used when data files already exist. Successful implementation of the two approaches into the European project ACuteTox is illustrated. Copyright © 2012 Elsevier Ltd. All rights reserved.
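
    A workflow of the kind described might be automated along the lines of the sketch below, which collects concentration-response blocks from several Excel workbooks into one standardized long-format table; the folder name, sheet name and column headings are hypothetical and will differ from the project's actual template.

      from pathlib import Path
      import pandas as pd

      # Assumed layout: one workbook per compound in a "data" folder, a sheet named
      # "raw" with columns "concentration" and "response"; the real template differs.
      records = []
      for wb in sorted(Path("data").glob("*.xlsx")):
          df = pd.read_excel(wb, sheet_name="raw")
          df = df.rename(columns=str.lower)[["concentration", "response"]].dropna()
          df["compound"] = wb.stem
          records.append(df)

      long_table = pd.concat(records, ignore_index=True)
      long_table.to_csv("all_compounds_long.csv", index=False)   # standardized input for analysis scripts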

  19. Interpreting Evidence-of-Learning: Educational Research in the Era of Big Data

    ERIC Educational Resources Information Center

    Cope, Bill; Kalantzis, Mary

    2015-01-01

    In this article, we argue that big data can offer new opportunities and roles for educational researchers. In the traditional model of evidence-gathering and interpretation in education, researchers are independent observers, who pre-emptively create instruments of measurement, and insert these into the educational process in specialized times and…

  20. Development of computer-assisted instruction application for statistical data analysis android platform as learning resource

    NASA Astrophysics Data System (ADS)

    Hendikawati, P.; Arifudin, R.; Zahid, M. Z.

    2018-03-01

    This study aims to design an Android Statistics Data Analysis application that can be accessed through mobile devices, making it easier for users to access. The Statistics Data Analysis application covers various topics in basic statistics along with parametric statistical data analysis. The output of the application is parametric statistical data analysis that can be used by students, lecturers, and other users who need the results of statistical calculations quickly and in an easily understood form. The Android application is developed using the Java programming language. The server-side programming language is PHP with the CodeIgniter framework, and the database used is MySQL. The system development methodology is the Waterfall methodology, with the stages of analysis, design, coding, testing, implementation and system maintenance. This statistical data analysis application is expected to support statistics lecturing activities and to make it easier for students to understand statistical analysis on mobile devices.

  1. Statistical modeling of space shuttle environmental data

    NASA Technical Reports Server (NTRS)

    Tubbs, J. D.; Brewer, D. W.

    1983-01-01

    Statistical models which use a class of bivariate gamma distributions are examined. Topics discussed include: (1) the ratio of positively correlated gamma variates; (2) a method to determine whether unequal shape parameters are necessary in a bivariate gamma distribution; (3) differential equations for the modal location of a family of bivariate gamma distributions; and (4) analysis of some wind gust data using the analytical results developed for modeling applications.

  2. Assistive Technologies for Second-Year Statistics Students Who Are Blind

    ERIC Educational Resources Information Center

    Erhardt, Robert J.; Shuman, Michael P.

    2015-01-01

    At Wake Forest University, a student who is blind enrolled in a second course in statistics. The course covered simple and multiple regression, model diagnostics, model selection, data visualization, and elementary logistic regression. These topics required that the student both interpret and produce three sets of materials: mathematical writing,…

  3. A bird's eye view: the cognitive strategies of experts interpreting seismic profiles

    NASA Astrophysics Data System (ADS)

    Bond, C. E.; Butler, R.

    2012-12-01

    Geoscience is perhaps unique in its reliance on incomplete datasets and on building knowledge from their interpretation. This interpretative basis for the science is fundamental at all levels, from the creation of a geological map to the interpretation of remotely sensed data. To teach and better understand the uncertainties in dealing with incomplete data, we need to understand the strategies individual practitioners deploy that make them effective interpreters. The nature of interpretation is such that the interpreter needs to use their cognitive ability in the analysis of the data to propose a sensible solution in their final output that is consistent not only with the original data but also with other knowledge and understanding. In a series of experiments Bond et al. (2007, 2008, 2011, 2012) investigated the strategies and pitfalls of expert and non-expert interpretation of seismic images. These studies focused on large numbers of participants to provide a statistically sound basis for analysis of the results. The outcome of these experiments showed that techniques and strategies are more important than expert knowledge per se in developing successful interpretations; experts are successful because of their application of these techniques. In a new set of experiments we have focused on a small number of experts to determine how they use their cognitive and reasoning skills in the interpretation of 2D seismic profiles. Live video and practitioner commentary were used to track the evolving interpretation and to gain insight into their decision processes. The outputs of the study allow us to create an educational resource of expert interpretation through online video footage and commentary with associated further interpretation and analysis of the techniques and strategies employed. This resource will be of use to undergraduate, post-graduate, industry and academic professionals seeking to improve their seismic interpretation skills, develop reasoning strategies for

  4. 77 FR 65585 - Renewal of the Bureau of Labor Statistics Data Users Advisory Committee

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-29

    ... DEPARTMENT OF LABOR Bureau of Labor Statistics Renewal of the Bureau of Labor Statistics Data..., the Secretary of Labor has determined that the renewal of the Bureau of Labor Statistics Data Users... duties imposed upon the Commissioner of Labor Statistics by 29 U.S.C. 1 and 2. This determination follows...

  5. 29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 29 Labor 5 2010-07-01 2010-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the form...

  6. Feature relevance assessment for the semantic interpretation of 3D point cloud data

    NASA Astrophysics Data System (ADS)

    Weinmann, M.; Jutzi, B.; Mallet, C.

    2013-10-01

    The automatic analysis of large 3D point clouds represents a crucial task in photogrammetry, remote sensing and computer vision. In this paper, we propose a new methodology for the semantic interpretation of such point clouds which involves feature relevance assessment in order to reduce both processing time and memory consumption. Given a standard benchmark dataset with 1.3 million 3D points, we first extract a set of 21 geometric 3D and 2D features. Subsequently, we apply a classifier-independent ranking procedure which involves a general relevance metric in order to derive compact and robust subsets of versatile features which are generally applicable for a large variety of subsequent tasks. This metric is based on 7 different feature selection strategies and thus addresses different intrinsic properties of the given data. For the example of semantically interpreting 3D point cloud data, we demonstrate the great potential of smaller subsets consisting of only the most relevant features with 4 different state-of-the-art classifiers. The results reveal that, instead of including as many features as possible in order to compensate for lack of knowledge, a crucial task such as scene interpretation can be carried out with only few versatile features and even improved accuracy.
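
    A minimal sketch of classifier-independent feature ranking in the spirit described above (not the authors' exact relevance metric): several scikit-learn selection strategies score the same features and their ranks are averaged to form a compact subset. The synthetic data and the particular choice of three strategies are assumptions.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import f_classif, mutual_info_classif

        X, y = make_classification(n_samples=500, n_features=21, n_informative=6,
                                   random_state=0)

        # Score every feature with several independent strategies.
        scores = {
            "anova_f": f_classif(X, y)[0],
            "mutual_info": mutual_info_classif(X, y, random_state=0),
            "variance": X.var(axis=0),          # crude unsupervised criterion
        }

        # Convert each score vector to ranks (0 = least relevant) and average them.
        ranks = np.mean([np.argsort(np.argsort(s)) for s in scores.values()], axis=0)
        order = np.argsort(ranks)[::-1]
        print("features from most to least relevant:", order)
        print("compact subset of the 5 most relevant features:", order[:5])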

  7. Estimating error statistics for Chambon-la-Forêt observatory definitive data

    NASA Astrophysics Data System (ADS)

    Lesur, Vincent; Heumez, Benoît; Telali, Abdelkader; Lalanne, Xavier; Soloviev, Anatoly

    2017-08-01

    We propose a new algorithm for calibrating definitive observatory data with the goal of providing users with estimates of the data error standard deviations (SDs). The algorithm has been implemented and tested using Chambon-la-Forêt observatory (CLF) data. The calibration process uses all available data. It is set as a large, weakly non-linear, inverse problem that ultimately provides estimates of baseline values in three orthogonal directions, together with their expected standard deviations. For this inverse problem, absolute data error statistics are estimated from two series of absolute measurements made within a day. Similarly, variometer data error statistics are derived by comparing variometer data time series between different pairs of instruments over few years. The comparisons of these time series led us to use an autoregressive process of order 1 (AR1 process) as a prior for the baselines. Therefore the obtained baselines do not vary smoothly in time. They have relatively small SDs, well below 300 pT when absolute data are recorded twice a week - i.e. within the daily to weekly measures recommended by INTERMAGNET. The algorithm was tested against the process traditionally used to derive baselines at CLF observatory, suggesting that statistics are less favourable when this latter process is used. Finally, two sets of definitive data were calibrated using the new algorithm. Their comparison shows that the definitive data SDs are less than 400 pT and may be slightly overestimated by our process: an indication that more work is required to have proper estimates of absolute data error statistics. For magnetic field modelling, the results show that even on isolated sites like CLF observatory, there are very localised signals over a large span of temporal frequencies that can be as large as 1 nT. The SDs reported here encompass signals of a few hundred metres and less than a day wavelengths.
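
    To illustrate the AR(1) prior mentioned for the baselines, the sketch below builds the stationary covariance matrix of an order-1 autoregressive process and draws sample baseline curves from it; the correlation coefficient, standard deviation and number of epochs are illustrative assumptions, not the observatory's calibrated values.

        import numpy as np

        def ar1_covariance(n, sigma, rho):
            """Stationary AR(1) covariance: cov[i, j] = sigma^2 * rho^|i-j|."""
            idx = np.arange(n)
            return sigma**2 * rho ** np.abs(idx[:, None] - idx[None, :])

        n_epochs = 200                 # e.g. twice-weekly absolute measurements
        sigma = 0.3                    # prior baseline SD in nT (illustrative)
        rho = 0.98                     # strong epoch-to-epoch correlation (illustrative)

        cov = ar1_covariance(n_epochs, sigma, rho)
        rng = np.random.default_rng(1)
        baselines = rng.multivariate_normal(np.zeros(n_epochs), cov, size=3)
        print("sample baseline realisations, shape:", baselines.shape)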

  8. Statistical geometric affinity in human brain electric activity

    NASA Astrophysics Data System (ADS)

    Chornet-Lurbe, A.; Oteo, J. A.; Ros, J.

    2007-05-01

    The representation of the human electroencephalogram (EEG) records by neurophysiologists demands standardized time-amplitude scales for their correct conventional interpretation. In a suite of graphical experiments involving scaling affine transformations we have been able to convert electroencephalogram samples corresponding to any particular sleep phase and relaxed wakefulness into each other. We propound a statistical explanation for that finding in terms of data collapse. As a sequel, we determine characteristic time and amplitude scales and outline a possible physical interpretation. An analysis for characteristic times based on lacunarity is also carried out as well as a study of the synchrony between left and right EEG channels.

  9. Neural network approaches versus statistical methods in classification of multisource remote sensing data

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon A.; Swain, Philip H.; Ersoy, Okan K.

    1990-01-01

    Neural network learning procedures and statistical classification methods are applied and compared empirically in the classification of multisource remote sensing and geographic data. Statistical multisource classification by means of a method based on Bayesian classification theory is also investigated and modified. The modifications permit control of the influence of the data sources involved in the classification process. Reliability measures are introduced to rank the quality of the data sources. The data sources are then weighted according to these rankings in the statistical multisource classification. Four data sources are used in experiments: Landsat MSS data and three forms of topographic data (elevation, slope, and aspect). Experimental results show that the two approaches have unique advantages and disadvantages in this classification application.

  10. Statistical methods and computing for big data.

    PubMed

    Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing; Yan, Jun

    2016-01-01

    Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay.

  11. Statistical methods and computing for big data

    PubMed Central

    Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing

    2016-01-01

    Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay. PMID:27695593
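
    As a toy illustration of the divide-and-conquer class of methods surveyed here (a sketch, not the authors' implementation), the code splits a large regression problem into chunks, fits ordinary least squares on each chunk, and combines the chunk estimates with a precision-weighted average, which for linear models reproduces the full-data fit.

        import numpy as np

        rng = np.random.default_rng(42)
        n, p = 100_000, 5
        X = rng.normal(size=(n, p))
        beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
        y = X @ beta_true + rng.normal(scale=1.0, size=n)

        def ols_with_precision(Xc, yc):
            """Return the OLS estimate and its precision matrix X'X for one chunk."""
            XtX = Xc.T @ Xc
            beta = np.linalg.solve(XtX, Xc.T @ yc)
            return beta, XtX

        # Divide: fit each chunk independently (could run on separate machines).
        chunks = np.array_split(np.arange(n), 20)
        fits = [ols_with_precision(X[idx], y[idx]) for idx in chunks]

        # Conquer: precision-weighted combination of the chunk estimates.
        precision_sum = sum(P for _, P in fits)
        weighted_sum = sum(P @ b for b, P in fits)
        beta_combined = np.linalg.solve(precision_sum, weighted_sum)
        print("combined estimate:", np.round(beta_combined, 3))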

  12. Interpreting Meta-Analyses of Genome-Wide Association Studies

    PubMed Central

    Han, Buhm; Eskin, Eleazar

    2012-01-01

    Meta-analysis is an increasingly popular tool for combining multiple genome-wide association studies in a single analysis to identify associations with small effect sizes. The effect sizes between studies in a meta-analysis may differ and these differences, or heterogeneity, can be caused by many factors. If heterogeneity is observed in the results of a meta-analysis, interpreting the cause of heterogeneity is important because the correct interpretation can lead to a better understanding of the disease and a more effective design of a replication study. However, interpreting heterogeneous results is difficult. The standard approach of examining the association p-values of the studies does not effectively predict if the effect exists in each study. In this paper, we propose a framework facilitating the interpretation of the results of a meta-analysis. Our framework is based on a new statistic representing the posterior probability that the effect exists in each study, which is estimated utilizing cross-study information. Simulations and application to the real data show that our framework can effectively segregate the studies predicted to have an effect, the studies predicted to not have an effect, and the ambiguous studies that are underpowered. In addition to helping interpretation, the new framework also allows us to develop a new association testing procedure taking into account the existence of effect. PMID:22396665

  13. Financial Statistics. Higher Education General Information Survey (HEGIS) [machine-readable data file].

    ERIC Educational Resources Information Center

    Center for Education Statistics (ED/OERI), Washington, DC.

    The Financial Statistics machine-readable data file (MRDF) is a subfile of the larger Higher Education General Information Survey (HEGIS). It contains basic financial statistics for over 3,000 institutions of higher education in the United States and its territories. The data are arranged sequentially by institution, with institutional…

  14. A Flexible Approach for the Statistical Visualization of Ensemble Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Potter, K.; Wilson, A.; Bremer, P.

    2009-09-29

    Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.

  15. PERSEUS QC: preparing statistic data sets

    NASA Astrophysics Data System (ADS)

    Belokopytov, Vladimir; Khaliulin, Alexey; Ingerov, Andrey; Zhuk, Elena; Gertman, Isaac; Zodiatis, George; Nikolaidis, Marios; Nikolaidis, Andreas; Stylianou, Stavros

    2017-09-01

    The Desktop Oceanographic Data Processing Module was developed for visual analysis of interdisciplinary cruise measurements. The program provides the possibility of data selection based on different criteria, map plotting, sea horizontal sections, and sea depth vertical profiles. The data selection in the area of interest can be specified according to a set of different physical and chemical parameters complemented by additional parameters, such as the cruise number, ship name, and time period. The visual analysis of a set of vertical profiles in the selected area makes it possible to determine the quality of the data, their location and the time of the in-situ measurements, and to exclude any questionable data from the statistical analysis. For each selected set of profiles, the average vertical profile, the minimal and maximal values of the parameter under examination and the root mean square (r.m.s.) are estimated. These estimates are compared with the parameter ranges set for each sub-region by the MEDAR/MEDATLAS-II and SeaDataNet2 projects. In the framework of the PERSEUS project, certain parameters which lacked a range were calculated from scratch, while some of the previously used ranges were re-defined using more comprehensive data sets based on the SeaDataNet2, SESAME and PERSEUS projects. In some cases we have used additional sub-regions to redefine the ranges more precisely. The recalculated ranges are used to improve the PERSEUS Data Quality Control.
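
    A bare-bones sketch of the kind of range check described (not the PERSEUS module itself): for a set of vertical profiles it computes the mean, min, max and r.m.s. profile and flags values falling outside a sub-regional range; the synthetic profiles and the range limits are assumptions.

        import numpy as np

        rng = np.random.default_rng(0)
        depths = np.arange(0, 500, 10)                    # depth levels, illustrative
        profiles = 15 + 5 * np.exp(-depths / 150) + rng.normal(0, 0.3, (40, depths.size))

        # Summary statistics across the selected set of profiles.
        mean_profile = profiles.mean(axis=0)
        min_profile = profiles.min(axis=0)
        max_profile = profiles.max(axis=0)
        rms = np.sqrt(np.mean((profiles - mean_profile) ** 2, axis=0))

        # Compare against a sub-regional acceptable range (illustrative limits).
        range_low, range_high = 12.0, 22.0
        outliers = (profiles < range_low) | (profiles > range_high)
        print("levels:", depths.size, "profiles:", profiles.shape[0])
        print("values outside the sub-regional range:", int(outliers.sum()))
        print("max r.m.s. over depth:", float(rms.max()))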

  16. Data Model Performance in Data Warehousing

    NASA Astrophysics Data System (ADS)

    Rorimpandey, G. C.; Sangkop, F. I.; Rantung, V. P.; Zwart, J. P.; Liando, O. E. S.; Mewengkang, A.

    2018-02-01

    Data warehouses have become increasingly important in organizations that hold large amounts of data. A data warehouse is not a product but part of a solution for the decision support system of such organizations. The data model is the starting point for designing and developing data warehouse architectures; thus, the data model needs stable and consistent interfaces over a long period of time. The aim of this research is to determine which data model in data warehousing has the best performance. The research method is descriptive analysis, with three main tasks: data collection and organization, analysis of data, and interpretation of data. The results, assessed by statistical analysis, show no statistically significant difference among the data models used in data warehousing. Organizations can therefore use any of the four proposed data models when designing and developing a data warehouse.

  17. Conformity and statistical tolerancing

    NASA Astrophysics Data System (ADS)

    Leblond, Laurent; Pillet, Maurice

    2018-02-01

    Statistical tolerancing was first proposed by Shewhart (Economic Control of Quality of Manufactured Product, 1931; reprinted 1980 by ASQC). In spite of this long history, its use remains moderate. One probable reason for this low utilization is the difficulty for designers of anticipating the risks of the approach. Arithmetic (worst-case) tolerancing allows a simple interpretation: conformity is defined by the presence of the characteristic in an interval. Statistical tolerancing is more complex in its definition; an interval is not sufficient to define conformance. To justify the statistical tolerancing formula used by designers, a tolerance interval should be interpreted as the interval where most of the parts produced should probably be located. This tolerance is justified by considering a conformity criterion for the parts that guarantees low offsets on the latter characteristics. Unlike traditional arithmetic tolerancing, statistical tolerancing requires a sustained exchange of information between design and manufacturing to be used safely. This paper proposes a formal definition of conformity, which we apply successively to quadratic and arithmetic tolerancing. We introduce a concept of concavity, which helps us demonstrate the link between the tolerancing approach and conformity. We use this concept to demonstrate the various acceptable propositions of statistical tolerancing (in the decentring, dispersion space).
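
    A worked comparison of arithmetic (worst-case) and quadratic statistical tolerancing for a simple linear stack of components; the component tolerances are illustrative assumptions, and the quadratic formula shown is the standard root-sum-square rule rather than anything specific to this paper.

        import math

        # Symmetric tolerances of four components in a linear stack (illustrative, mm).
        tolerances = [0.10, 0.05, 0.08, 0.12]

        # Arithmetic (worst case): tolerances simply add up.
        t_arithmetic = sum(tolerances)

        # Quadratic (statistical / RSS): independent variations add in quadrature.
        t_statistical = math.sqrt(sum(t**2 for t in tolerances))

        print(f"worst-case stack tolerance : +/- {t_arithmetic:.3f} mm")
        print(f"statistical (RSS) tolerance: +/- {t_statistical:.3f} mm")
        # The statistical interval is tighter, which is why it demands an exchange of
        # information between design and manufacturing about the actual process spread.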

  18. Uncertainty Quantification and Statistical Convergence Guidelines for PIV Data

    NASA Astrophysics Data System (ADS)

    Stegmeir, Matthew; Kassen, Dan

    2016-11-01

    As Particle Image Velocimetry (PIV) has continued to mature, it has developed into a robust and flexible velocimetry technique used by expert and non-expert users alike. While historical estimates of PIV accuracy have typically relied heavily on "rules of thumb" and analysis of idealized synthetic images, increased emphasis has recently been placed on better quantifying real-world PIV measurement uncertainty. Multiple techniques have been developed to provide per-vector instantaneous uncertainty estimates for PIV measurements. Real-world experimental conditions often introduce complications in collecting "optimal" data, and the effect of these conditions is important to consider when planning an experimental campaign. The current work uses the results of PIV uncertainty quantification techniques to develop a framework in which PIV users apply estimated PIV confidence intervals to compute reliable data convergence criteria for optimal sampling of flow statistics. Results are compared using experimental and synthetic data, and recommended guidelines and procedures leveraging estimated PIV confidence intervals for efficient sampling toward converged statistics are provided.
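
    The following sketch illustrates one plausible convergence criterion of the kind discussed: samples of a flow statistic are accumulated until the confidence-interval half-width of the running mean drops below a target fraction of the mean. The synthetic samples, the 95% level and the 1% target are assumptions, not the authors' recommended values.

        import numpy as np

        rng = np.random.default_rng(7)
        target_rel_halfwidth = 0.01   # stop when the 95% CI half-width is 1% of the mean
        z95 = 1.96

        n, s, s2 = 0, 0.0, 0.0
        while True:
            x = 10.0 + rng.normal(scale=1.5)   # synthetic instantaneous velocity sample
            n += 1
            s += x
            s2 += x * x
            if n >= 30:                        # require a minimum number of samples
                mean = s / n
                var = (s2 - n * mean**2) / (n - 1)
                halfwidth = z95 * np.sqrt(var / n)
                if halfwidth < target_rel_halfwidth * abs(mean):
                    print(f"converged after {n} samples: "
                          f"mean = {mean:.3f} +/- {halfwidth:.3f}")
                    break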

  19. Quick Access: Find Statistical Data on the Internet.

    ERIC Educational Resources Information Center

    Su, Di

    1999-01-01

    Provides an annotated list of Internet sources (World Wide Web, ftp, and gopher sites) for current and historical statistical business data, including selected interest rates, the Consumer Price Index, the Producer Price Index, foreign currency exchange rates, noon buying rates, per diem rates, the special drawing right, stock quotes, and mutual…

  20. SEISVIZ3D: Stereoscopic system for the representation of seismic data - Interpretation and Immersion

    NASA Astrophysics Data System (ADS)

    von Hartmann, Hartwig; Rilling, Stefan; Bogen, Manfred; Thomas, Rüdiger

    2015-04-01

    The seismic method is a valuable tool for obtaining 3D images of the subsurface. Seismic data acquisition today is not only a topic for oil and gas exploration but is also used for geothermal exploration, inspections of nuclear waste sites and scientific investigations. The system presented in this contribution may also have an impact on the visualization of 3D data from other geophysical methods. 3D seismic data can be displayed in different ways to give a spatial impression of the subsurface. They are a combination of individual vertical cuts, possibly linked to a cubical portion of the data volume, and the stereoscopic view of the seismic data. By these methods, the spatial perception of the structures, and thus of the processes in the subsurface, should be increased. Stereoscopic techniques are implemented, for example, in the CAVE and the WALL, both of which require a lot of space and high technical effort. The aim of the interpretation system shown here is stereoscopic visualization of seismic data at the workplace, i.e. at the personal workstation and monitor. The system was developed with the following criteria in mind: • fast rendering of large amounts of data so that a continuous view of the data when changing the viewing angle and the data section is possible, • defining areas in stereoscopic view to translate the spatial impression directly into an interpretation, • the development of an appropriate user interface, including head-tracking, for handling the increased degrees of freedom, • the possibility of collaboration, i.e. teamwork and idea exchange with the simultaneous viewing of a scene at remote locations. The possibilities offered by the use of a stereoscopic system do not replace a conventional interpretation workflow. Rather, they have to be implemented into it as an additional step. The amplitude distribution of the seismic data is a challenge for the stereoscopic display because the opacity level and the scaling and selection of the data have to

  1. General Aviation Avionics Statistics : 1979 Data

    DOT National Transportation Integrated Search

    1981-04-01

    This report presents avionics statistics for the 1979 general aviation (GA) aircraft fleet and is the sixth in a series titled General Aviation Avionics Statistics. The statistics are presented in a capability group framework which enables one to relate...

  2. 3 CFR - Enhanced Collection of Relevant Data and Statistics Relating to Women

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 3 The President 1 2012-01-01 2012-01-01 false Enhanced Collection of Relevant Data and Statistics Relating to Women Presidential Documents Other Presidential Documents Memorandum of March 4, 2011 Enhanced Collection of Relevant Data and Statistics Relating to Women Memorandum for the Heads of Executive Departments and Agencies I am proud to work...

  3. General Aviation Avionics Statistics : 1978 Data

    DOT National Transportation Integrated Search

    1980-12-01

    The report presents avionics statistics for the 1978 general aviation (GA) aircraft fleet and is the fifth in a series titled "General Aviation Statistics." The statistics are presented in a capability group framework which enables one to relate airb...

  4. Novel Kalman Filter Algorithm for Statistical Monitoring of Extensive Landscapes with Synoptic Sensor Data

    PubMed Central

    Czaplewski, Raymond L.

    2015-01-01

    Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of study variables and auxiliary sensor variables. A National Forest Inventory (NFI) illustrates application within an official statistics program. Practical recommendations regarding remote sensing and statistical issues are offered. This algorithm has the potential to increase the value of synoptic sensor data for statistical monitoring of large geographic areas. PMID:26393588
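
    A minimal, generic Kalman filter predict/update step in the spirit of the estimator discussed (this is the textbook algorithm, not the article's numerically robust algorithm for many study and auxiliary variables); the scalar random-walk model and noise levels are assumptions.

        import numpy as np

        def kalman_step(x, P, z, F, Q, H, R):
            """One textbook Kalman filter predict/update cycle."""
            # Predict
            x_pred = F @ x
            P_pred = F @ P @ F.T + Q
            # Update with observation z
            S = H @ P_pred @ H.T + R
            K = P_pred @ H.T @ np.linalg.inv(S)
            x_new = x_pred + K @ (z - H @ x_pred)
            P_new = (np.eye(len(x)) - K @ H) @ P_pred
            return x_new, P_new

        # Toy setting: one study variable observed through one noisy synoptic sensor.
        F = np.array([[1.0]]); Q = np.array([[0.01]])     # random-walk state model
        H = np.array([[1.0]]); R = np.array([[0.25]])     # noisy sensor observation
        x, P = np.array([0.0]), np.array([[1.0]])

        rng = np.random.default_rng(3)
        truth = 0.0
        for t in range(50):
            truth += rng.normal(scale=0.1)                # evolving landscape attribute
            z = np.array([truth + rng.normal(scale=0.5)]) # sensor observation
            x, P = kalman_step(x, P, z, F, Q, H, R)
        print("final estimate:", float(x[0]), "vs truth:", round(truth, 3))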

  5. Online Updating of Statistical Inference in the Big Data Setting.

    PubMed

    Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui

    2016-01-01

    We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
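
    A small sketch of online updating for a linear model, in the same spirit as the methods described: sufficient statistics are accumulated block by block as data stream in, so the historical raw data never need to be stored. This is a generic recursion, not the authors' exact estimator or their predictive residual tests, and the simulated stream is an assumption.

        import numpy as np

        rng = np.random.default_rng(11)
        p = 4
        beta_true = np.array([0.5, -1.0, 2.0, 0.0])

        # Online state: accumulated X'X and X'y, updated block by block.
        XtX = np.zeros((p, p))
        Xty = np.zeros(p)

        for block in range(100):                     # data arriving in a stream
            Xb = rng.normal(size=(500, p))
            yb = Xb @ beta_true + rng.normal(size=500)
            XtX += Xb.T @ Xb                         # update sufficient statistics
            Xty += Xb.T @ yb                         # raw historical data discarded

        beta_online = np.linalg.solve(XtX, Xty)
        print("online estimate:", np.round(beta_online, 3))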

  6. A spatial scan statistic for survival data based on Weibull distribution.

    PubMed

    Bhatt, Vijaya; Tiwari, Neeraj

    2014-05-20

    The spatial scan statistic has been developed as a geographical cluster detection analysis tool for different types of data sets such as Bernoulli, Poisson, ordinal, normal and exponential. We propose a scan statistic for survival data based on Weibull distribution. It may also be used for other survival distributions, such as exponential, gamma, and log normal. The proposed method is applied on the survival data of tuberculosis patients for the years 2004-2005 in Nainital district of Uttarakhand, India. Simulation studies reveal that the proposed method performs well for different survival distribution functions. Copyright © 2013 John Wiley & Sons, Ltd.
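
    To illustrate the main ingredient of a Weibull-based scan statistic (only the likelihood-ratio piece for a single candidate zone; the full method scans many zones and assesses significance by Monte Carlo simulation), the sketch below fits Weibull distributions inside and outside a hypothetical zone with SciPy and compares the log-likelihoods. The simulated survival times are assumptions and censoring is ignored.

        import numpy as np
        from scipy.stats import weibull_min

        inside = weibull_min.rvs(1.5, scale=8.0, size=120, random_state=1)   # candidate zone
        outside = weibull_min.rvs(1.5, scale=12.0, size=400, random_state=2) # rest of region
        all_times = np.concatenate([inside, outside])

        def weibull_loglik(data):
            c, loc, scale = weibull_min.fit(data, floc=0)   # shape and scale, loc fixed at 0
            return weibull_min.logpdf(data, c, loc=0, scale=scale).sum()

        # Log-likelihood ratio: separate Weibull fits in/out of the zone vs one common fit.
        llr = weibull_loglik(inside) + weibull_loglik(outside) - weibull_loglik(all_times)
        print(f"log-likelihood ratio for the candidate zone: {llr:.2f}")
        # A scan statistic maximises this over many zones and evaluates it by simulation.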

  7. Which statistics should tropical biologists learn?

    PubMed

    Loaiza Velásquez, Natalia; González Lutz, María Isabel; Monge-Nájera, Julián

    2011-09-01

    Tropical biologists study the richest and most endangered biodiversity on the planet, and in these times of climate change and mega-extinctions, the need for efficient, good quality research is more pressing than in the past. However, the statistical component in research published by tropical authors sometimes suffers from poor quality in data collection, mediocre or bad experimental design, and a rigid and outdated view of data analysis. To suggest improvements in their statistical education, we listed all the statistical tests and other quantitative analyses used over one year in two leading tropical journals, the Revista de Biología Tropical and Biotropica. The 12 most frequent tests in the articles were: Analysis of Variance (ANOVA), Chi-Square Test, Student's T Test, Linear Regression, Pearson's Correlation Coefficient, Mann-Whitney U Test, Kruskal-Wallis Test, Shannon's Diversity Index, Tukey's Test, Cluster Analysis, Spearman's Rank Correlation Test and Principal Component Analysis. We conclude that statistical education for tropical biologists must abandon the old syllabus based on the mathematical side of statistics and concentrate on the correct selection of these and other procedures and tests, on their biological interpretation and on the use of reliable and friendly freeware. We think that their time will be better spent understanding and protecting tropical ecosystems than trying to learn the mathematical foundations of statistics: in most cases, a well designed one-semester course should be enough for their basic requirements.
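
    For reference, most of the frequently used tests listed above are available in free scientific software; a brief sketch with SciPy on made-up data (the sample values are purely illustrative):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2024)
        site_a = rng.normal(10, 2, 30)          # e.g. leaf sizes at site A
        site_b = rng.normal(12, 2, 30)          # e.g. leaf sizes at site B
        site_c = rng.normal(11, 2, 30)

        print("t test:            ", stats.ttest_ind(site_a, site_b))
        print("one-way ANOVA:     ", stats.f_oneway(site_a, site_b, site_c))
        print("Mann-Whitney U:    ", stats.mannwhitneyu(site_a, site_b))
        print("Kruskal-Wallis:    ", stats.kruskal(site_a, site_b, site_c))
        print("Pearson r:         ", stats.pearsonr(site_a, site_b))
        print("Spearman rho:      ", stats.spearmanr(site_a, site_b))
        print("chi-square (counts):", stats.chisquare([18, 22, 20, 40]))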

  8. A Method for Improved Interpretation of "Spot" Biomarker Data ...

    EPA Pesticide Factsheets

    The National Exposure Research Laboratory (NERL) Human Exposure and Atmospheric Sciences Division (HEASD) conducts research in support of the EPA's mission to protect human health and the environment. The HEASD research program supports Goal 1 (Clean Air) and Goal 4 (Healthy People) of the EPA strategic plan. More specifically, our division conducts research to characterize the movement of pollutants from the source to contact with humans. Our multidisciplinary research program produces methods, measurements, and models to identify relationships between, and characterize processes that link, source emissions, environmental concentrations, human exposures, and target-tissue dose. The impact of these tools is improved regulatory programs and policies for EPA.

  9. A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility.

    PubMed

    Moore, Jason H; Gilbert, Joshua C; Tsai, Chia-Ti; Chiang, Fu-Tien; Holden, Todd; Barney, Nate; White, Bill C

    2006-07-21

    Detecting, characterizing, and interpreting gene-gene interactions or epistasis in studies of human disease susceptibility is both a mathematical and a computational challenge. To address this problem, we have previously developed a multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension (i.e. constructive induction) thus permitting interactions to be detected in relatively small sample sizes. In this paper, we describe a comprehensive and flexible framework for detecting and interpreting gene-gene interactions that utilizes advances in information theory for selecting interesting single-nucleotide polymorphisms (SNPs), MDR for constructive induction, machine learning methods for classification, and finally graphical models for interpretation. We illustrate the usefulness of this strategy using artificial datasets simulated from several different two-locus and three-locus epistasis models. We show that the accuracy, sensitivity, specificity, and precision of a naïve Bayes classifier are significantly improved when SNPs are selected based on their information gain (i.e. class entropy removed) and reduced to a single attribute using MDR. We then apply this strategy to detecting, characterizing, and interpreting epistatic models in a genetic study (n = 500) of atrial fibrillation and show that both classification and model interpretation are significantly improved.
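
    A compact sketch of the information-gain filter step described (class entropy minus conditional entropy for each SNP), on simulated genotype data; the simulation and the choice of keeping the top 10 SNPs are assumptions, and the MDR constructive-induction step itself is not shown.

        import numpy as np

        rng = np.random.default_rng(9)
        n, n_snps = 500, 50
        snps = rng.integers(0, 3, size=(n, n_snps))          # genotypes coded 0/1/2
        status = (snps[:, 3] + snps[:, 17] + rng.integers(0, 3, n) > 3).astype(int)

        def entropy(labels):
            _, counts = np.unique(labels, return_counts=True)
            p = counts / counts.sum()
            return -(p * np.log2(p)).sum()

        def information_gain(snp, y):
            h_y = entropy(y)
            h_cond = 0.0
            for g in np.unique(snp):
                mask = snp == g
                h_cond += mask.mean() * entropy(y[mask])
            return h_y - h_cond                              # class entropy removed

        gains = np.array([information_gain(snps[:, j], status) for j in range(n_snps)])
        top = np.argsort(gains)[::-1][:10]
        print("SNPs ranked by information gain:", top)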

  10. Use of the dynamic stiffness method to interpret experimental data from a nonlinear system

    NASA Astrophysics Data System (ADS)

    Tang, Bin; Brennan, M. J.; Gatti, G.

    2018-05-01

    The interpretation of experimental data from nonlinear structures is challenging, primarily because of dependency on the types and levels of excitation, and coupling issues with test equipment. In this paper, the dynamic stiffness method, which is commonly used in the analysis of linear systems, is used to interpret the data from a vibration test of a controllable compressed beam structure coupled to a test shaker. For a single mode of the system, this method facilitates the separation of mass, stiffness and damping effects, including nonlinear stiffness effects. It also allows the separation of the dynamics of the shaker from the structure under test. The approach needs to be used with care and is only suitable if the nonlinear system has a response that is predominantly at the excitation frequency. For the structure under test, the raw experimental data revealed little about the underlying causes of the dynamic behaviour. However, the dynamic stiffness approach allowed the effects due to the nonlinear stiffness to be easily determined.
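
    As a worked illustration of the dynamic stiffness idea for a single mode (a generic sketch, not the authors' treatment of the beam-shaker system): for frequency response data the dynamic stiffness K(w) = F(w)/X(w) has real part k - m*w^2 and imaginary part c*w, so mass, stiffness and viscous damping separate with two straight-line fits. The synthetic parameter values are assumptions.

        import numpy as np

        # Synthetic single-degree-of-freedom system (assumed values).
        m, c, k = 2.0, 5.0, 8.0e4
        w = np.linspace(20, 400, 200)                      # rad/s
        receptance = 1.0 / (k - m * w**2 + 1j * c * w)     # X/F for unit force
        dyn_stiffness = 1.0 / receptance                   # K(w) = F(w)/X(w)

        # Real part vs w^2 gives stiffness (intercept) and mass (negative slope).
        slope, intercept = np.polyfit(w**2, dyn_stiffness.real, 1)
        # Imaginary part vs w gives viscous damping (slope).
        c_est = np.polyfit(w, dyn_stiffness.imag, 1)[0]

        print(f"estimated k = {intercept:.1f}, m = {-slope:.3f}, c = {c_est:.3f}")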

  11. Statistical Physics in the Era of Big Data

    ERIC Educational Resources Information Center

    Wang, Dashun

    2013-01-01

    With the wealth of data provided by a wide range of high-throughout measurement tools and technologies, statistical physics of complex systems is entering a new phase, impacting in a meaningful fashion a wide range of fields, from cell biology to computer science to economics. In this dissertation, by applying tools and techniques developed in…

  12. Admixture, Population Structure, and F-Statistics.

    PubMed

    Peter, Benjamin M

    2016-04-01

    Many questions about human genetic history can be addressed by examining the patterns of shared genetic variation between sets of populations. A useful methodological framework for this purpose is F-statistics, which measure shared genetic drift between sets of two, three, and four populations and can be used to test simple and complex hypotheses about admixture between populations. This article provides context from phylogenetic and population genetic theory. I review how F-statistics can be interpreted as branch lengths or paths and derive new interpretations using coalescent theory. I further show that the admixture tests can be interpreted as testing general properties of phylogenies, allowing some applications to be extended to arbitrary phylogenetic trees. The new results are used to investigate the behavior of the statistics under different models of population structure and to show how population substructure complicates inference. The results lead to simplified estimators in many cases, and I recommend replacing F3 with the average number of pairwise differences for estimating population divergence. Copyright © 2016 by the Genetics Society of America.
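
    A small numerical sketch of one member of the F-statistics family, the admixture statistic F3(C; A, B) = E[(c - a)(c - b)] computed from allele frequencies; the simulated frequencies are assumptions, and the bias correction needed for finite sample sizes is omitted.

        import numpy as np

        rng = np.random.default_rng(13)
        n_snps = 20_000

        # Ancestral allele frequencies and two diverged source populations A and B.
        anc = rng.uniform(0.05, 0.95, n_snps)
        a = np.clip(anc + rng.normal(0, 0.08, n_snps), 0, 1)
        b = np.clip(anc + rng.normal(0, 0.08, n_snps), 0, 1)

        # Population C as a 50/50 admixture of A and B (plus a little extra drift).
        c = np.clip(0.5 * a + 0.5 * b + rng.normal(0, 0.01, n_snps), 0, 1)

        f3 = np.mean((c - a) * (c - b))
        print(f"F3(C; A, B) = {f3:.5f}")   # markedly negative values signal admixture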

  13. [Notes on vital statistics for the study of perinatal health].

    PubMed

    Juárez, Sol Pía

    2014-01-01

    Vital statistics, published by the National Statistics Institute in Spain, are a highly important source for the study of perinatal health nationwide. However, the process of data collection is not well-known and has implications both for the quality and interpretation of the epidemiological results derived from this source. The aim of this study was to present how the information is collected and some of the associated problems. This study is the result of an analysis of the methodological notes from the National Statistics Institute and first-hand information obtained from hospitals, the Central Civil Registry of Madrid, and the Madrid Institute for Statistics. Greater integration between these institutions is required to improve the quality of birth and stillbirth statistics. Copyright © 2014 SESPAS. Published by Elsevier Espana. All rights reserved.

  14. Data Processing System (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research.

    PubMed

    Tang, Qi-Yi; Zhang, Chuan-Xi

    2013-04-01

    A comprehensive but simple-to-use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. This program runs on standard Windows computers. Many of the functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology. © 2012 The Authors Insect Science © 2012 Institute of Zoology, Chinese Academy of Sciences.

  15. MASH Suite: a user-friendly and versatile software interface for high-resolution mass spectrometry data interpretation and visualization.

    PubMed

    Guner, Huseyin; Close, Patrick L; Cai, Wenxuan; Zhang, Han; Peng, Ying; Gregorich, Zachery R; Ge, Ying

    2014-03-01

    The rapid advancements in mass spectrometry (MS) instrumentation, particularly in Fourier transform (FT) MS, have made the acquisition of high-resolution and high-accuracy mass measurements routine. However, the software tools for the interpretation of high-resolution MS data are underdeveloped. Although several algorithms for the automatic processing of high-resolution MS data are available, there is still an urgent need for a user-friendly interface with functions that allow users to visualize and validate the computational output. Therefore, we have developed MASH Suite, a user-friendly and versatile software interface for processing high-resolution MS data. MASH Suite contains a wide range of features that allow users to easily navigate through data analysis, visualize complex high-resolution MS data, and manually validate automatically processed results. Furthermore, it provides easy, fast, and reliable interpretation of top-down, middle-down, and bottom-up MS data. MASH Suite is convenient, easily operated, and freely available. It can greatly facilitate the comprehensive interpretation and validation of high-resolution MS data with high accuracy and reliability.

  16. Model-independent plot of dynamic PET data facilitates data interpretation and model selection.

    PubMed

    Munk, Ole Lajord

    2012-02-21

    When testing new PET radiotracers or new applications of existing tracers, the blood-tissue exchange and the metabolism need to be examined. However, conventional plots of measured time-activity curves from dynamic PET do not reveal the inherent kinetic information. A novel model-independent volume-influx plot (vi-plot) was developed and validated. The new vi-plot shows the time course of the instantaneous distribution volume and the instantaneous influx rate. The vi-plot visualises physiological information that facilitates model selection, and it reveals when a quasi-steady state is reached, which is a prerequisite for the use of the graphical analyses of Logan and Gjedde-Patlak. Both axes of the vi-plot have a direct physiological interpretation, and the plot shows kinetic parameters in close agreement with estimates obtained by non-linear kinetic modelling. The vi-plot is equally useful for analyses of PET data based on a plasma input function or a reference region input function. The vi-plot is a model-independent and informative plot for data exploration that facilitates the selection of an appropriate method for data analysis. Copyright © 2011 Elsevier Ltd. All rights reserved.

  17. On the interpretation of satellite-derived gravity and magnetic data for studies of crustal geology and metallogenesis

    NASA Technical Reports Server (NTRS)

    Hastings, D. A.

    1985-01-01

    Satellite-derived global gravity and magnetic maps have been shown to be useful in large-scale studies of the Earth's crust, despite the relative infancy of such studies. Numerous authors have made spatial associations of gravity or magnetic anomalies with geological provinces. Gravimetric interpretations are often made in terms of isostasy, regional variations of density, or of geodesy in general. Interpretations of satellite magnetic anomalies often base assumptions of overall crustal magnetism on concepts of the vertical and horizontal distribution of magnetic susceptibility, then make models of these assumed distributions. The opportunity of improving our satellite gravity and magnetic data through the proposed Geopotential Research Mission should considerably improve the scientific community's ability to analyze and interpret global magnetic and gravity data.

  18. Inference Control Mechanism for Statistical Database: Frequency-Imposed Data Distortions.

    ERIC Educational Resources Information Center

    Liew, Chong K.; And Others

    1985-01-01

    Introduces two data distortion methods (Frequency-Imposed Distortion, Frequency-Imposed Probability Distortion) and uses a Monte Carlo study to compare their performance with that of other distortion methods (Point Distortion, Probability Distortion). Indications that data generated by these two methods produce accurate statistics and protect…

  19. An Innovative Open Data-driven Approach for Improved Interpretation of Coverage Data at NASA JPL's PO.DAAC

    NASA Astrophysics Data System (ADS)

    McGibbney, L. J.; Armstrong, E. M.

    2016-12-01

    Figuratively speaking, Scientific Datasets (SD) are shared by data producers in a multitude of shapes, sizes and flavors. Primarily, however, they exist as machine-independent manifestations supporting the creation, access, and sharing of array-oriented SD that can on occasion be spread across multiple files. Within the Earth Sciences, the most notable general examples include the HDF family, NetCDF, etc., with other formats such as GRIB being used pervasively within specific domains such as the oceanographic, atmospheric and meteorological sciences. Such file formats contain Coverage Data, i.e. a digital representation of some spatio-temporal phenomenon. A challenge for large data producers such as NASA and NOAA, as well as for consumers of coverage datasets (particularly surrounding visualization and interactive use within web clients), is that working with these data is still not straightforward due to size, serialization and inherent complexity. Additionally, existing data formats are either unsuitable for the Web (like netCDF files) or hard to interpret independently due to missing standard structures and metadata (e.g. the OPeNDAP protocol). Therefore alternative, Web-friendly manifestations of such datasets are required. CoverageJSON is an emerging data format for publishing coverage data to the web in a web-friendly way that fits the linked data publication paradigm, hence lowering the barrier for interpretation by consumers via mobile devices, client applications, etc., as well as for data producers who can build next-generation web-friendly services around datasets. This work will detail how CoverageJSON is being evaluated at NASA JPL's PO.DAAC as an enabling data representation format for publishing SD as Linked Open Data embedded within SD landing pages as well as via semantic data repositories. We are currently evaluating how utilization of CoverageJSON within SD landing pages addresses the long-standing acknowledgement that SD producers are not currently

  20. Statistically Characterizing Intra- and Inter-Individual Variability in Children with Developmental Coordination Disorder

    ERIC Educational Resources Information Center

    King, Bradley R.; Harring, Jeffrey R.; Oliveira, Marcio A.; Clark, Jane E.

    2011-01-01

    Previous research investigating children with Developmental Coordination Disorder (DCD) has consistently reported increased intra- and inter-individual variability during motor skill performance. Statistically characterizing this variability is not only critical for the analysis and interpretation of behavioral data, but also may facilitate our…

  1. Novice Interpretations of Visual Representations of Geosciences Data

    NASA Astrophysics Data System (ADS)

    Burkemper, L. K.; Arthurs, L.

    2013-12-01

    Past cognition research on individuals' perception and comprehension of bar and line graphs is substantive enough that it has resulted in the generation of graph design principles and graph comprehension theories; however, gaps remain in our understanding of how people process visual representations of data, especially of geologic and atmospheric data. This pilot project serves to build on others' prior research and begin filling the existing gaps. The primary objectives of this pilot project include: (i) design a novel data collection protocol based on a combination of paper-based surveys, think-aloud interviews, and eye-tracking tasks to investigate student data handling skills with simple to complex visual representations of geologic and atmospheric data, (ii) demonstrate that the protocol yields results that shed light on student data handling skills, and (iii) generate preliminary findings upon which tentative but perhaps helpful recommendations can be based on how to more effectively present these data to the non-scientist community and teach essential data handling skills. An effective protocol for the combined use of paper-based surveys, think-aloud interviews, and computer-based eye-tracking tasks for investigating cognitive processes involved in perceiving, comprehending, and interpreting visual representations of geologic and atmospheric data is instrumental to future research in this area. The outcomes of this pilot study provide the foundation upon which future, more in-depth and scaled-up investigations can build. Furthermore, findings of this pilot project are sufficient for making at least tentative recommendations that can help inform (i) the design of physical attributes of visual representations of data, especially more complex representations, that may aid in improving students' data handling skills and (ii) instructional approaches that have the potential to aid students in more effectively handling visual representations of geologic and atmospheric data

  2. Statistical analysis of field data for aircraft warranties

    NASA Astrophysics Data System (ADS)

    Lakey, Mary J.

    Air Force and Navy maintenance data collection systems were researched to determine their scientific applicability to the warranty process. New and unique algorithms were developed to extract failure distributions which were then used to characterize how selected families of equipment typically fails. Families of similar equipment were identified in terms of function, technology and failure patterns. Statistical analyses and applications such as goodness-of-fit test, maximum likelihood estimation and derivation of confidence intervals for the probability density function parameters were applied to characterize the distributions and their failure patterns. Statistical and reliability theory, with relevance to equipment design and operational failures were also determining factors in characterizing the failure patterns of the equipment families. Inferences about the families with relevance to warranty needs were then made.

  3. Dynamical interpretation of conditional patterns

    NASA Technical Reports Server (NTRS)

    Adrian, R. J.; Moser, R. D.; Moin, P.

    1988-01-01

    While great progress is being made in characterizing the 3-D structure of organized turbulent motions using conditional averaging analysis, there is a lack of theoretical guidance regarding the interpretation and utilization of such information. Questions concerning the significance of the structures, their contributions to various transport properties, and their dynamics cannot be answered without recourse to appropriate dynamical governing equations. One approach which addresses some of these questions uses the conditional fields as initial conditions and calculates their evolution from the Navier-Stokes equations, yielding valuable information about stability, growth, and longevity of the mean structure. To interpret statistical aspects of the structures, a different type of theory which deals with the structures in the context of their contributions to the statistics of the flow is needed. As a first step toward this end, an effort was made to integrate the structural information from the study of organized structures with a suitable statistical theory. This is done by stochastically estimating the two-point conditional averages that appear in the equation for the one-point probability density function, and relating the structures to the conditional stresses. Salient features of the estimates are identified, and the structure of the one-point estimates in channel flow is defined.
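
    The stochastic estimation step mentioned above can be sketched, in its simplest linear form, as approximating the two-point conditional average <u'(x + r) | u'(x) = u> by L(r) * u, with L(r) obtained from two-point correlations as L(r) = <u'(x) u'(x + r)> / <u'(x)^2>. Below is a toy one-dimensional illustration on a synthetic correlated signal; the signal itself is an assumption, not channel-flow data.

        import numpy as np

        rng = np.random.default_rng(21)
        n = 200_000
        # Synthetic correlated fluctuation field (smoothed white noise).
        u = np.convolve(rng.normal(size=n), np.ones(20) / 20, mode="same")
        u -= u.mean()

        def linear_stochastic_estimate(u, r):
            """Coefficient L(r) of the linear estimate <u'(x+r) | u'(x)> ~ L(r) u'(x)."""
            return np.mean(u[:-r] * u[r:]) / np.mean(u**2)

        for r in (1, 5, 10, 20, 40):
            print(f"separation {r:3d}: L(r) = {linear_stochastic_estimate(u, r):.3f}")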

  4. Statistical analysis of the national crash severity study data

    DOT National Transportation Integrated Search

    1980-08-01

    This is the Final Report on a two-year statistical analysis of the data collected in the National Crash Severity Study (NCSS). The analysis presented is primarily concerned with the relationship between occupant injury severity and the crash conditio...

  5. Statistical primer: how to deal with missing data in scientific research?

    PubMed

    Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M

    2018-05-10

    Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice, however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, surpassing the disadvantages of the simpler approaches, but should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
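
    A compact sketch contrasting the approaches compared in the paper on a made-up dataset: complete-case analysis, single mean imputation, and (as a stand-in for full multiple imputation) scikit-learn's IterativeImputer run several times with different seeds and the estimates pooled. The data, the missingness mechanism and the pooling by simple averaging are assumptions.

        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(8)
        n = 1000
        x1 = rng.normal(size=n)
        x2 = 0.5 * x1 + rng.normal(size=n)
        y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)
        X = np.column_stack([x1, x2])
        X[rng.random(n) < 0.3, 1] = np.nan          # 30% of x2 missing at random

        # 1. Complete-case analysis.
        cc = ~np.isnan(X[:, 1])
        beta_cc = LinearRegression().fit(X[cc], y[cc]).coef_

        # 2. Single mean imputation.
        X_mean = X.copy()
        X_mean[np.isnan(X_mean[:, 1]), 1] = np.nanmean(X[:, 1])
        beta_mean = LinearRegression().fit(X_mean, y).coef_

        # 3. Multiple imputation: several stochastic imputations, estimates pooled.
        betas = []
        for seed in range(5):
            imp = IterativeImputer(sample_posterior=True, random_state=seed)
            X_imp = imp.fit_transform(X)
            betas.append(LinearRegression().fit(X_imp, y).coef_)
        beta_mi = np.mean(betas, axis=0)

        print("complete case      :", np.round(beta_cc, 3))
        print("mean imputation    :", np.round(beta_mean, 3))
        print("multiple imputation:", np.round(beta_mi, 3))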

  6. Exploratory Visual Analysis of Statistical Results from Microarray Experiments Comparing High and Low Grade Glioma

    PubMed Central

    Reif, David M.; Israel, Mark A.; Moore, Jason H.

    2007-01-01

    The biological interpretation of gene expression microarray results is a daunting challenge. For complex diseases such as cancer, wherein the body of published research is extensive, the incorporation of expert knowledge provides a useful analytical framework. We have previously developed the Exploratory Visual Analysis (EVA) software for exploring data analysis results in the context of annotation information about each gene, as well as biologically relevant groups of genes. We present EVA as a flexible combination of statistics and biological annotation that provides a straightforward visual interface for the interpretation of microarray analyses of gene expression in the most commonly occurring class of brain tumors, glioma. We demonstrate the utility of EVA for the biological interpretation of statistical results by analyzing publicly available gene expression profiles of two important glial tumors. The results of a statistical comparison between 21 malignant, high-grade glioblastoma multiforme (GBM) tumors and 19 indolent, low-grade pilocytic astrocytomas were analyzed using EVA. By using EVA to examine the results of a relatively simple statistical analysis, we were able to identify tumor class-specific gene expression patterns having both statistical and biological significance. Our interactive analysis highlighted the potential importance of genes involved in cell cycle progression, proliferation, signaling, adhesion, migration, motility, and structure, as well as candidate gene loci on a region of Chromosome 7 that has been implicated in glioma. Because EVA does not require statistical or computational expertise and has the flexibility to accommodate any type of statistical analysis, we anticipate EVA will prove a useful addition to the repertoire of computational methods used for microarray data analysis. EVA is available at no charge to academic users and can be found at http://www.epistasis.org. PMID:19390666

  7. Right-Sizing Statistical Models for Longitudinal Data

    PubMed Central

    Wood, Phillip K.; Steinley, Douglas; Jackson, Kristina M.

    2015-01-01

    Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to “right-size” the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting overly parsimonious models to more complex better fitting alternatives, and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically under-identified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A three-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation/covariation patterns. The orthogonal, free-curve slope-intercept (FCSI) growth model is considered as a general model which includes, as special cases, many models including the Factor Mean model (FM, McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, Hierarchical Linear Models (HLM), Repeated Measures MANOVA, and the Linear Slope Intercept (LinearSI) Growth Model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparison of several candidate parametric growth and chronometric models in a Monte Carlo study. PMID:26237507

  8. Interpreting Low Spatial Resolution Thermal Data from Active Volcanoes on Io and the Earth

    NASA Technical Reports Server (NTRS)

    Keszthelyi, L.; Harris, A. J. L.; Flynn, L.; Davies, A. G.; McEwen, A.

    2001-01-01

    The style of volcanism was successfully determined at a number of active volcanoes on Io and the Earth using the same techniques to interpret thermal remote sensing data. Additional information is contained in the original extended abstract.

  9. Impact of Immediate Interpretation of Screening Tomosynthesis Mammography on Performance Metrics.

    PubMed

    Winkler, Nicole S; Freer, Phoebe; Anzai, Yoshimi; Hu, Nan; Stein, Matthew

    2018-05-07

    This study aimed to compare performance metrics for immediate and delayed batch interpretation of screening tomosynthesis mammograms. This HIPAA-compliant study was approved by the institutional review board with a waiver of consent. A retrospective analysis of screening performance metrics for tomosynthesis mammograms interpreted in 2015, when mammograms were read immediately, was compared to historical controls from 2013 to 2014, when mammograms were batch interpreted after the patient had departed. A total of 5518 screening tomosynthesis mammograms (n = 1212 for batch interpretation and n = 4306 for immediate interpretation) were evaluated. The larger sample size for the latter group reflects a group practice shift to performing tomosynthesis for the majority of patients. Age, breast density, comparison examinations, and high-risk status were compared. An asymptotic proportion test and multivariable analysis were used to compare performance metrics. There was no statistically significant difference in recall or cancer detection rates for the batch interpretation group compared to the immediate interpretation group with respective recall rate of 6.5% vs 5.3% = +1.2% (95% confidence interval -0.3 to 2.7%; P = .101) and cancer detection rate of 6.6 vs 7.2 per thousand = -0.6 (95% confidence interval -5.9 to 4.6; P = .825). There was no statistically significant difference in positive predictive values (PPVs) including PPV1 (screening recall), PPV2 (biopsy recommendation), or PPV3 (biopsy performed) with batch interpretation (10.1%, 42.1%, and 40.0%, respectively) and immediate interpretation (13.6%, 39.2%, and 39.7%, respectively). After adjusting for age, breast density, high-risk status, and comparison mammogram, there was no difference in the odds of being recalled or cancer detection between the two groups. There is no statistically significant difference in interpretation performance metrics for screening tomosynthesis mammograms interpreted
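
    As a quick check on the arithmetic behind such comparisons, the sketch below re-derives the recall-rate difference and its confidence interval with an asymptotic two-proportion test. The recall counts (79 and 228) are back-calculated from the rates and denominators quoted above, so they are approximate reconstructions; the code is illustrative rather than the study's actual analysis.

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Approximate recall counts reconstructed from the reported rates:
# 6.5% of 1212 batch-read screens and 5.3% of 4306 immediately read screens.
recalls = np.array([79, 228])
screens = np.array([1212, 4306])

stat, p = proportions_ztest(recalls, screens)          # asymptotic two-proportion test
p1, p2 = recalls / screens
diff = p1 - p2
se = np.sqrt(p1 * (1 - p1) / screens[0] + p2 * (1 - p2) / screens[1])
low, high = diff - 1.96 * se, diff + 1.96 * se
print(f"recall difference = {diff:+.1%} (95% CI {low:+.1%} to {high:+.1%}), p = {p:.3f}")
# Reproduces roughly +1.2% (-0.3% to 2.7%), p ~ 0.10, matching the figures quoted above.
```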

  10. Novel Data Reduction Based on Statistical Similarity

    DOE PAGES

    Lee, Dongeun; Sim, Alex; Choi, Jaesik; ...

    2016-07-18

    Applications such as scientific simulations and power grid monitoring are generating so much data so quickly that compression is essential to reduce the storage requirement or the required transmission capacity. To achieve better compression, one is often willing to discard some repeated information. These lossy compression methods are primarily designed to minimize the Euclidean distance between the original data and the compressed data. But this measure of distance severely limits either reconstruction quality or compression performance. In this paper, we propose a new class of compression method by redefining the distance measure with a statistical concept known as exchangeability. This approach captures essential features of the data while reducing the storage requirement. In this paper, we report our design and implementation of such a compression method named IDEALEM. To demonstrate its effectiveness, we apply it to a set of power grid monitoring data, and show that it can reduce the volume of data much more than the best known compression method while maintaining the quality of the compressed data. Finally, in these tests, IDEALEM captures extraordinary events in the data, while its compression ratios can far exceed 100.
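
    The abstract does not spell out IDEALEM's algorithm, but the core idea of trading Euclidean fidelity for statistical similarity can be sketched as follows: a block of samples is stored verbatim only when no previously stored block is statistically indistinguishable from it (judged here with a two-sample Kolmogorov-Smirnov test); otherwise only a reference to the earlier block is kept. Block size, the similarity test, and the threshold below are all illustrative assumptions, not the published method.

```python
import numpy as np
from scipy.stats import ks_2samp

def compress_blocks(series, block_size=64, alpha=0.05):
    """Illustrative lossy compression by statistical similarity (not IDEALEM itself).

    Each block is either stored verbatim or replaced by the index of an earlier
    block whose values are statistically indistinguishable from it.
    """
    dictionary = []          # blocks stored verbatim
    encoded = []             # ("raw", dict-index) or ("ref", dict-index)
    for start in range(0, len(series) - block_size + 1, block_size):
        block = series[start:start + block_size]
        match = None
        for idx, stored in enumerate(dictionary):
            # A high p-value means no evidence the two distributions differ.
            if ks_2samp(block, stored).pvalue > alpha:
                match = idx
                break
        if match is None:
            dictionary.append(block)
            encoded.append(("raw", len(dictionary) - 1))
        else:
            encoded.append(("ref", match))
    return dictionary, encoded

# Example: a noisy signal in which many blocks share the same distribution.
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0, 1, 640), rng.normal(5, 1, 640)])
dictionary, encoded = compress_blocks(signal)
print(len(encoded), "blocks encoded,", len(dictionary), "stored verbatim")
```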

  11. Paleomagnetism.org: An online multi-platform open source environment for paleomagnetic data analysis

    NASA Astrophysics Data System (ADS)

    Koymans, Mathijs R.; Langereis, Cor G.; Pastor-Galán, Daniel; van Hinsbergen, Douwe J. J.

    2016-08-01

    This contribution provides an overview of Paleomagnetism.org, an open-source, multi-platform online environment for paleomagnetic data analysis. Paleomagnetism.org provides an interactive environment where paleomagnetic data can be interpreted, evaluated, visualized, and exported. The Paleomagnetism.org application is split into an interpretation portal, a statistics portal, and a portal for miscellaneous paleomagnetic tools. In the interpretation portal, principal component analysis can be performed on visualized demagnetization diagrams. Interpreted directions and great circles can be combined to find great circle solutions. These directions can be used in the statistics portal, or exported as data and figures. The tools in the statistics portal cover standard Fisher statistics for directions and VGPs, including other statistical parameters used as reliability criteria. Other available tools include an eigenvector approach foldtest, two reversal tests, including a Monte Carlo simulation on mean directions, and a coordinate bootstrap on the original data. An implementation is included for the detection and correction of inclination shallowing in sediments following TK03.GAD. Finally, we provide a module to visualize VGPs and expected paleolatitudes, declinations, and inclinations relative to widely used global apparent polar wander path models in coordinates of major continent-bearing plates. The tools in the miscellaneous portal include a net tectonic rotation (NTR) analysis to restore a body to its paleo-vertical and a bootstrapped oroclinal test using linear regression techniques, including a modified foldtest around a vertical axis. Paleomagnetism.org provides an integrated approach for researchers to work with visualized (e.g. hemisphere projections, Zijderveld diagrams) paleomagnetic data. The application constructs a custom exportable file that can be shared freely and included in public databases. This exported file contains all data and can later be

  12. Structure elucidation of metabolite x17299 by interpretation of mass spectrometric data.

    PubMed

    Zhang, Qibo; Ford, Lisa A; Evans, Anne M; Toal, Douglas R

    2017-01-01

    A major bottleneck in metabolomic studies is metabolite identification from accurate mass spectrometric data. Metabolite x17299 was identified in plasma as an unknown in a metabolomic study using a compound-centric approach where the associated ion features of the compound were used to determine the true molecular mass. The aim of this work is to elucidate the chemical structure of x17299, a new compound, by de novo interpretation of mass spectrometric data. An Orbitrap Elite mass spectrometer was used for acquisition of mass spectra up to MS4 at high resolution. Synthetic standards of N,N,N-trimethyl-l-alanyl-l-proline betaine (l,l-TMAP), a diastereomer, and an enantiomer were chemically prepared. The planar structure of x17299 was successfully proposed by de novo mechanistic interpretation of mass spectrometric data without any laborious purification or nuclear magnetic resonance spectroscopic analysis. The proposed structure was verified by deuterium-exchange mass spectrometric analysis and confirmed by comparison to a synthetic standard. Relative configuration of x17299 was determined by direct chromatographic comparison to a pair of synthetic diastereomers. Absolute configuration was assigned after derivatization of x17299 with a chiral auxiliary group followed by its chromatographic comparison to a pair of synthetic standards. The chemical structure of metabolite x17299 was determined to be l,l-TMAP.

  13. Can linear regression modeling help clinicians in the interpretation of genotypic resistance data? An application to derive a lopinavir-score.

    PubMed

    Cozzi-Lepri, Alessandro; Prosperi, Mattia C F; Kjær, Jesper; Dunn, David; Paredes, Roger; Sabin, Caroline A; Lundgren, Jens D; Phillips, Andrew N; Pillay, Deenan

    2011-01-01

    The question of whether a score for a specific antiretroviral (e.g. lopinavir/r in this analysis) that improves on the prediction of viral load response given by existing expert-based interpretation systems (IS) could be derived by statistically analyzing the correlation between genotypic data and virological response remains largely unanswered. We used the data of the patients from the UK Collaborative HIV Cohort (UK CHIC) Study for whom genotypic data were stored in the UK HIV Drug Resistance Database (UK HDRD) to construct a training/validation dataset of treatment change episodes (TCE). We used the average square error (ASE) on a 10-fold cross-validation and on a test dataset (the EuroSIDA TCE database) to compare the performance of a newly derived lopinavir/r score with that of the 3 most widely used expert-based interpretation rules (ANRS, HIVDB and Rega). Our analysis identified mutations V82A, I54V, K20I and I62V, which were associated with reduced viral response, and mutations I15V and V91S, which determined lopinavir/r hypersensitivity. All models performed equally well (ASE on test ranging between 1.1 and 1.3, p = 0.34). We fully explored the potential of linear regression to construct a simple predictive model for lopinavir/r-based TCE. Although the performance of our proposed score was similar to that of already existing IS, previously unrecognized lopinavir/r-associated mutations were identified. The analysis illustrates an approach to the validation of expert-based IS that could be used in the future for other antiretrovirals and in other settings outside HIV research.
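
    The sketch below illustrates the general idea of deriving a mutation score by linear regression, not the authors' exact pipeline: binary indicators for protease mutations are regressed on a simulated change in log10 viral load, the fitted coefficients act as mutation weights, and a 10-fold cross-validated average squared error (ASE) is reported. The mutation names are taken from the abstract; the data, weights, and outcome scale are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical training data: one row per treatment change episode (TCE),
# binary indicators for protease mutations, outcome = change in log10 viral load.
rng = np.random.default_rng(1)
mutations = ["V82A", "I54V", "K20I", "I62V", "I15V", "V91S"]
X = pd.DataFrame(rng.integers(0, 2, size=(300, len(mutations))), columns=mutations)
true_weights = np.array([0.6, 0.5, 0.3, 0.2, -0.3, -0.4])  # +: reduced response, -: hypersensitivity
y = X.values @ true_weights + rng.normal(0, 0.8, size=300)

model = LinearRegression().fit(X, y)
score_table = pd.Series(model.coef_, index=mutations).sort_values(ascending=False)
print("Derived mutation weights (a crude score):")
print(score_table.round(2))

# 10-fold cross-validated average squared error, analogous to the ASE criterion above.
ase = -cross_val_score(LinearRegression(), X, y, cv=10,
                       scoring="neg_mean_squared_error").mean()
print(f"10-fold CV average squared error: {ase:.2f}")
```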

  14. Chi-squared and C statistic minimization for low count per bin data

    NASA Astrophysics Data System (ADS)

    Nousek, John A.; Shue, David R.

    1989-07-01

    Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.
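
    A minimal sketch of the comparison described above: Poisson counts in low-count bins are fit with an exponential model by minimizing either the conventional chi-squared statistic or Cash's C statistic, C = 2 Σ(m_i − n_i ln m_i), both with Powell's method via scipy. The model form, simulated data, and starting values are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = np.arange(1, 31)
true_rate = 2.0 * np.exp(-x / 10.0)     # "true" model: A * exp(-x / tau)
counts = rng.poisson(true_rate)         # simulated low-count spectrum

def model(params):
    amplitude, tau = params
    return amplitude * np.exp(-x / tau)

def chi2(params):
    m = model(params)
    err = np.sqrt(np.maximum(counts, 1))   # sqrt(N) errors; floor of 1 avoids dividing by zero
    return np.sum((counts - m) ** 2 / err ** 2)

def cash_c(params):
    m = np.maximum(model(params), 1e-12)   # clip so the log stays finite
    # Cash (1979) statistic for Poisson data, up to a model-independent constant.
    return 2.0 * np.sum(m - counts * np.log(m))

start = np.array([1.0, 5.0])
fit_chi2 = minimize(chi2, start, method="Powell")
fit_cash = minimize(cash_c, start, method="Powell")
print("chi-squared fit (A, tau):", np.round(fit_chi2.x, 2))
print("C-statistic fit (A, tau):", np.round(fit_cash.x, 2))
```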

  15. Statistics of Optical Coherence Tomography Data From Human Retina

    PubMed Central

    de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo

    2010-01-01

    Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small but significant correlation between neighboring pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, fit parameters of this model are relatively constant along retinal layers, but vary across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733

  16. Homeostasis and Gauss statistics: barriers to understanding natural variability.

    PubMed

    West, Bruce J

    2010-06-01

    In this paper, the concept of knowledge is argued to be the top of a three-tiered system of science. The first tier is that of measurement and data, followed by information consisting of the patterns within the data, and ending with theory that interprets the patterns and yields knowledge. Thus, when a scientific theory ceases to be consistent with the database the knowledge based on that theory must be re-examined and potentially modified. Consequently, all knowledge, like glory, is transient. Herein we focus on the non-normal statistics of physiologic time series and conclude that the empirical inverse power-law statistics and long-time correlations are inconsistent with the theoretical notion of homeostasis. We suggest replacing the notion of homeostasis with that of Fractal Physiology.

  17. Forest statistics of western Kentucky

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1950-01-01

    This Survey Release presents the more significant preliminary statistics on the forest area and timber volume for the western region of Kentucky. Similar reports for the remainder of the state will be published as soon as statistical tabulations are completed. Later, an analytical report for the state will be published which will interpret forest area, timber volume,...

  18. Forest statistics of southern Indiana

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1951-01-01

    This Survey Release presents the more significant preliminary statistics on the forest area and timber volume for each of the three regions of southern Indiana. A similar report will be published for the two northern Indiana regions. Later, an analytical report for the state will be published which will interpret statistics on forest area, timber- volume, growth, and...

  19. Interpreting and Reporting Radiological Water-Quality Data

    USGS Publications Warehouse

    McCurdy, David E.; Garbarino, John R.; Mullin, Ann H.

    2008-01-01

    This document provides information to U.S. Geological Survey (USGS) Water Science Centers on interpreting and reporting radiological results for samples of environmental matrices, most notably water. The information provided is intended to be broadly useful throughout the United States, but scientists who work at sites containing radioactive hazardous wastes are advised to consult additional sources for more detailed information. The document is largely based on recognized national standards and guidance documents for radioanalytical sample processing, most notably the Multi-Agency Radiological Laboratory Analytical Protocols Manual (MARLAP), and on documents published by the U.S. Environmental Protection Agency and the American National Standards Institute. It does not include discussion of standard USGS practices, including field quality-control sample analysis, interpretive report policies, and related issues, all of which shall always be included in any effort by the Water Science Centers. The use of 'shall' in this report signifies a policy requirement of the USGS Office of Water Quality.

  20. A weighted U-statistic for genetic association analyses of sequencing data.

    PubMed

    Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing

    2014-12-01

    With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very-low-density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.
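
    A generic weighted U-statistic association test, not the published WU-SEQ construction, can be sketched as follows: pairwise genetic similarity, up-weighted for rare variants, is combined with a rank-based phenotype-similarity kernel, and significance is assessed by permutation, so no parametric phenotype model is assumed. All data and weighting choices below are illustrative assumptions.

```python
import numpy as np

def weighted_u_test(genotypes, phenotype, n_perm=1000, seed=0):
    """Generic weighted U-statistic association test (illustrative only).

    U = sum over subject pairs (i, j) of a genetic-similarity weight w_ij
    times a phenotype-similarity kernel h(y_i, y_j); a permutation of the
    phenotype gives the null distribution.
    """
    rng = np.random.default_rng(seed)
    n, p = genotypes.shape
    maf = genotypes.mean(axis=0) / 2.0
    variant_w = 1.0 / np.sqrt(np.maximum(maf * (1 - maf), 1e-8))  # up-weight rare variants
    w = (genotypes * variant_w) @ genotypes.T                     # pairwise genetic similarity

    def u_stat(y):
        ranks = (np.argsort(np.argsort(y)) + 1.0) / len(y) - 0.5  # centered ranks
        h = np.outer(ranks, ranks)                                # phenotype similarity kernel
        mask = ~np.eye(n, dtype=bool)                             # exclude i == j terms
        return np.sum(w[mask] * h[mask])

    observed = u_stat(phenotype)
    perms = np.array([u_stat(rng.permutation(phenotype)) for _ in range(n_perm)])
    p_value = (np.sum(perms >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Toy example with rare variants and a heavy-tailed phenotype.
rng = np.random.default_rng(3)
G = rng.binomial(2, 0.02, size=(200, 30))
y = G[:, :5].sum(axis=1) + rng.standard_t(3, 200)
print(weighted_u_test(G, y))
```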

  1. Assessing compositional variability through graphical analysis and Bayesian statistical approaches: case studies on transgenic crops.

    PubMed

    Harrigan, George G; Harrison, Jay M

    2012-01-01

    New transgenic (GM) crops are subjected to extensive safety assessments that include compositional comparisons with conventional counterparts as a cornerstone of the process. The influence of germplasm, location, environment, and agronomic treatments on compositional variability is, however, often obscured in these pair-wise comparisons. Furthermore, classical statistical significance testing can often provide an incomplete and over-simplified summary of highly responsive variables such as crop composition. In order to more clearly describe the influence of the numerous sources of compositional variation we present an introduction to two alternative but complementary approaches to data analysis and interpretation. These include i) exploratory data analysis (EDA) with its emphasis on visualization and graphics-based approaches and ii) Bayesian statistical methodology that provides easily interpretable and meaningful evaluations of data in terms of probability distributions. The EDA case-studies include analyses of herbicide-tolerant GM soybean and insect-protected GM maize and soybean. Bayesian approaches are presented in an analysis of herbicide-tolerant GM soybean. Advantages of these approaches over classical frequentist significance testing include the more direct interpretation of results in terms of probabilities pertaining to quantities of interest and no confusion over the application of corrections for multiple comparisons. It is concluded that a standardized framework for these methodologies could provide specific advantages through enhanced clarity of presentation and interpretation in comparative assessments of crop composition.

  2. Mississippi Community and Junior Colleges. Statistical Data, 1986-87.

    ERIC Educational Resources Information Center

    Moody, George V.; And Others

    Statistical data for Mississippi's 15 public community and junior college districts are presented in this document, providing information on enrollments, degrees and certificates awarded, revenues, expenditures, academic salary ranges, learning resources, transportation services, dormitory utilization, and auxiliary enterprises in 1986-87.…

  3. [How to fit and interpret multilevel models using SPSS].

    PubMed

    Pardo, Antonio; Ruiz, Miguel A; San Martín, Rafael

    2007-05-01

    Hierarchic or multilevel models are used to analyse data when cases belong to known groups and sample units are selected both from the individual level and from the group level. In this work, the multilevel models most commonly discussed in the statistical literature are described, explaining how to fit these models using the SPSS program (version 11 or later) and how to interpret the outcomes of the analysis. Five particular models are described, fitted, and interpreted: (1) one-way analysis of variance with random effects, (2) regression analysis with means-as-outcomes, (3) one-way analysis of covariance with random effects, (4) regression analysis with random coefficients, and (5) regression analysis with means- and slopes-as-outcomes. All models are explained, trying to make them understandable to researchers in health and behaviour sciences.
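
    The models are described for SPSS, but the same structures can be fit with other software; below is a hedged Python analogue (statsmodels) of two of the five models on simulated two-level data: (1) a one-way ANOVA with random effects and (4) a regression with random coefficients. The variable names and the data are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated two-level data: individuals (level 1) nested in clinics (level 2).
rng = np.random.default_rng(4)
n_groups, n_per_group = 20, 15
clinic = np.repeat(np.arange(n_groups), n_per_group)
clinic_effect = rng.normal(0, 2, n_groups)[clinic]          # random intercepts
x = rng.normal(0, 1, n_groups * n_per_group)
y = 5 + 1.5 * x + clinic_effect + rng.normal(0, 1, len(x))
data = pd.DataFrame({"y": y, "x": x, "clinic": clinic})

# (1) One-way ANOVA with random effects: intercept-only model, random intercept per clinic.
m1 = smf.mixedlm("y ~ 1", data, groups=data["clinic"]).fit()

# (4) Regression with random coefficients: the slope of x also varies across clinics.
m4 = smf.mixedlm("y ~ x", data, groups=data["clinic"], re_formula="~x").fit()

print(m1.summary())
print(m4.summary())
```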

  4. Application of multivariate statistical techniques in microbial ecology

    PubMed Central

    Paliy, O.; Shankar, V.

    2016-01-01

    Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological datasets. An especially noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amounts of data, powerful statistical techniques of multivariate analysis are well suited to analyze and interpret these datasets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular dataset. In this review we describe and compare the most widely used multivariate statistical techniques, including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and dataset structure. PMID:26786791

  5. Use of Data Visualisation in the Teaching of Statistics: A New Zealand Perspective

    ERIC Educational Resources Information Center

    Forbes, Sharleen; Chapman, Jeanette; Harraway, John; Stirling, Doug; Wild, Chris

    2014-01-01

    For many years, students have been taught to visualise data by drawing graphs. Recently, there has been a growing trend to teach statistics, particularly statistical concepts, using interactive and dynamic visualisation tools. Free down-loadable teaching and simulation software designed specifically for schools, and more general data visualisation…

  6. Interpreting Reading Assessment Data: Moving From Parts to Whole in a Testing Era

    ERIC Educational Resources Information Center

    Amendum, Steven J.; Conradi, Kristin; Pendleton, Melissa J.

    2016-01-01

    This article is designed to help teachers interpret reading assessment data from DIBELS beyond individual subtests to better support their students' needs. While it is important to understand the individual subtest measures, it is more vital to understand how each fits into the larger picture of reading development. The underlying construct of…

  7. OSPAR standard method and software for statistical analysis of beach litter data.

    PubMed

    Schulz, Marcus; van Loon, Willem; Fleet, David M; Baggelaar, Paul; van der Meulen, Eit

    2017-09-15

    The aim of this study is to develop standard statistical methods and software for the analysis of beach litter data. The optimal ensemble of statistical methods comprises the Mann-Kendall trend test, the Theil-Sen slope estimation, the Wilcoxon step trend test and basic descriptive statistics. The application of Litter Analyst, a tailor-made software for analysing the results of beach litter surveys, to OSPAR beach litter data from seven beaches bordering on the south-eastern North Sea, revealed 23 significant trends in the abundances of beach litter types for the period 2009-2014. Litter Analyst revealed a large variation in the abundance of litter types between beaches. To reduce the effects of spatial variation, trend analysis of beach litter data can most effectively be performed at the beach or national level. Spatial aggregation of beach litter data within a region is possible, but resulted in a considerable reduction in the number of significant trends. Copyright © 2017 Elsevier Ltd. All rights reserved.
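
    The ensemble of methods named above can be reproduced with standard scientific Python; the sketch below applies a Mann-Kendall-type test (Kendall's tau against time), the Theil-Sen slope, and a two-sample rank test as a stand-in for the step trend test, all on invented yearly counts of a single litter type. It is illustrative only and not the Litter Analyst implementation.

```python
import numpy as np
from scipy import stats

# Hypothetical yearly median counts of one litter type on one beach.
years = np.arange(2009, 2015)
counts = np.array([120, 95, 88, 70, 74, 52])

# Mann-Kendall-type trend test: Kendall's tau of counts against time.
tau, p_value = stats.kendalltau(years, counts)

# Theil-Sen slope: median of pairwise slopes, robust to outliers.
slope, intercept, lo, hi = stats.theilslopes(counts, years)

# Step trend approximated by a two-sample rank test on the first vs. second half.
step_stat, step_p = stats.mannwhitneyu(counts[:3], counts[3:], alternative="two-sided")

print(f"Kendall tau={tau:.2f} (p={p_value:.3f}); Theil-Sen slope={slope:.1f} items/yr "
      f"[{lo:.1f}, {hi:.1f}]; step test p={step_p:.3f}")
```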

  8. Regional interpretation of PSInSAR(TM) data for landslide investigations

    NASA Astrophysics Data System (ADS)

    Meisina, Claudia; Zucca, Francesco; Notti, Davide

    2010-05-01

    The work presents the results of a PSInSAR™ analysis carried out in a wide area of North-Western Italy (Lombardia and Piemonte regions) with an extension of about 40,000 km2. The regional study of PS data was part of a project concerning the geological interpretation of PSInSAR™ data, carried out in collaboration with local regional environment agencies (ARPA Piemonte and Lombardia Region). A set of SAR scenes acquired between May 1992 and December 2000 by the ERS sensor of the European Space Agency along ascending orbits and a set of scenes acquired between April 2003 and June 2007 by the RADARSAT sensor along descending and ascending orbits were used by TeleRilevamento-Europa for a Standard PS Analysis. The RADARSAT images cover only a sector of Lombardia Region (Varese, Brescia, Bergamo, Sondrio provinces). At the regional scale the aims were: 1) to check the capability of the technique in the detection of the landslides in different geological and geomorphological environments using different sensors; 2) to verify how the PSInSAR™ technique can improve the results of the landslide database IFFI (the Italian Landslide Inventory) in terms of landslide areal extent evaluation and unmapped phenomena detection. A database containing the areas where the SAR data showed anomalous movement (the so-called anomalous areas) was built. This analysis takes into account the influence of the types of sensors (RADARSAT and ERS), of the different geometries of acquisition (ascending and descending) and of the geological and geomorphological characteristics of the main environments (Alps, Langhe Hills, Apennines) on landslide detection. The results of the research have shown that the PSInSAR™ analysis for slope movements is limited to very slow landslides with constant movement (particularly DSGSD) that represent a small percentage of all landslides. Problems related to a lack of scatterers and topographic effects also limit the number of landslides suitable for studies

  9. Techniques in teaching statistics : linking research production and research use.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martinez-Moyano, I .; Smith, A.; Univ. of Massachusetts at Boston)

    In the spirit of closing the 'research-practice gap,' the authors extend evidence-based principles to statistics instruction in social science graduate education. The authors employ a Delphi method to survey experienced statistics instructors to identify teaching techniques to overcome the challenges inherent in teaching statistics to students enrolled in practitioner-oriented master's degree programs. Among the teaching techniques identified as essential are using real-life examples, requiring data collection exercises, and emphasizing interpretation rather than results. Building on existing research, preliminary interviews, and the findings from the study, the authors develop a model describing antecedents to the strength of the link between research and practice.

  10. Replicate This! Creating Individual-Level Data from Summary Statistics Using R

    ERIC Educational Resources Information Center

    Morse, Brendan J.

    2013-01-01

    Incorporating realistic data and research examples into quantitative (e.g., statistics and research methods) courses has been widely recommended for enhancing student engagement and comprehension. One way to achieve these ends is to use a data generator to emulate the data in published research articles. "MorseGen" is a free data generator that…

  11. Using interpreted large scale aerial photo data to enhance satellite-based mapping and explore forest land definitions

    Treesearch

    Tracey S. Frescino; Gretchen G. Moisen

    2009-01-01

    The Interior-West, Forest Inventory and Analysis (FIA), Nevada Photo-Based Inventory Pilot (NPIP), launched in 2004, involved acquisition, processing, and interpretation of large scale aerial photographs on a subset of FIA plots (both forest and nonforest) throughout the state of Nevada. Two objectives of the pilot were to use the interpreted photo data to enhance...

  12. Ambiguities and conflicting results: the limitations of the kappa statistic in establishing the interrater reliability of the Irish nursing minimum data set for mental health: a discussion paper.

    PubMed

    Morris, Roisin; MacNeela, Padraig; Scott, Anne; Treacy, Pearl; Hyde, Abbey; O'Brien, Julian; Lehwaldt, Daniella; Byrne, Anne; Drennan, Jonathan

    2008-04-01

    In a study to establish the interrater reliability of the Irish Nursing Minimum Data Set (I-NMDS) for mental health, difficulties relating to the choice of reliability test statistic were encountered. The objective of this paper is to highlight the difficulties associated with testing interrater reliability for an ordinal scale using a relatively homogeneous sample and the recommended kw statistic. One pair of mental health nurses completed the I-NMDS for mental health for a total of 30 clients attending a mental health day centre over a two-week period. Data were analysed using the kw and percentage agreement statistics. A total of 34 of the 38 I-NMDS for mental health variables with lower than acceptable levels of kw reliability scores achieved acceptable levels of reliability according to their percentage agreement scores. The study findings implied that, due to the homogeneity of the sample, low variability within the data resulted in the 'base rate problem' associated with the use of the kw statistic. Conclusions point to the interpretation of kw in tandem with percentage agreement scores. Suggestions that kw scores were low due to chance agreement and that one should strive to use a study sample with known variability are queried.
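
    The 'base rate problem' can be made concrete with a small synthetic example: on a homogeneous sample where nearly every rating is the same, two raters who agree over 90% of the time can still produce a weighted kappa near zero or below, because chance agreement is already very high. The ratings below are invented; only the phenomenon is the point.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Synthetic ratings from two raters on an ordinal scale for a homogeneous
# sample: nearly every client scores 0, and the raters disagree only twice.
rater_a = np.array([0] * 28 + [1, 0])
rater_b = np.array([0] * 28 + [0, 1])

percent_agreement = np.mean(rater_a == rater_b) * 100
kappa_w = cohen_kappa_score(rater_a, rater_b, weights="linear")

print(f"Percent agreement: {percent_agreement:.1f}%")   # about 93%
print(f"Weighted kappa:    {kappa_w:.2f}")              # near zero or slightly negative
```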

  13. Statistical evaluation of stability data: criteria for change-over-time and data variability.

    PubMed

    Bar, Raphael

    2003-01-01

    In a recently issued ICH Q1E guidance on evaluation of stability data of drug substances and products, the need to perform a statistical extrapolation of a shelf-life of a drug product or a retest period for a drug substance is based heavily on whether data exhibit a change-over-time and/or variability. However, this document suggests neither measures nor acceptance criteria of these two parameters. This paper demonstrates a useful application of simple statistical parameters for determining whether sets of stability data from either accelerated or long-term storage programs exhibit a change-over-time and/or variability. These parameters are all derived from a simple linear regression analysis first performed on the stability data. The p-value of the slope of the regression line is taken as a measure for change-over-time, and a value of 0.25 is suggested as a limit to insignificant change of the quantitative stability attributes monitored. The minimal process capability index, Cpk, calculated from the standard deviation of the regression line, is suggested as a measure for variability with a value of 2.5 as a limit for an insignificant variability. The usefulness of the above two parameters, p-value and Cpk, was demonstrated on stability data of a refrigerated drug product and on pooled data of three batches of a drug substance. In both cases, the determined parameters allowed characterization of the data in terms of change-over-time and variability. Consequently, complete evaluation of the stability data could be pursued according to the ICH guidance. It is believed that the application of the above two parameters with their acceptance criteria will allow a more unified evaluation of stability data.
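
    A minimal sketch of the two proposed measures, using invented assay data and specification limits: the p-value of the regression slope gauges change-over-time (values above 0.25 suggested as insignificant change), and a Cpk computed from the regression standard deviation gauges variability (values above 2.5 suggested as insignificant variability). The exact Cpk formulation is not given in the abstract, so a common definition is used here.

```python
import numpy as np
from scipy import stats

# Hypothetical assay results (% label claim) from a long-term stability study.
months = np.array([0, 3, 6, 9, 12, 18, 24])
assay = np.array([100.1, 99.8, 99.9, 99.6, 99.7, 99.5, 99.4])
lower_spec, upper_spec = 95.0, 105.0

fit = stats.linregress(months, assay)
residuals = assay - (fit.intercept + fit.slope * months)
residual_sd = np.sqrt(np.sum(residuals ** 2) / (len(assay) - 2))

# Change-over-time: p-value of the slope (suggested limit: > 0.25 means insignificant change).
print(f"slope p-value = {fit.pvalue:.3f}")

# Variability: a minimal Cpk from the regression standard deviation
# (suggested limit: > 2.5 means insignificant variability).
mean_level = assay.mean()
cpk = min(upper_spec - mean_level, mean_level - lower_spec) / (3 * residual_sd)
print(f"Cpk = {cpk:.1f}")
```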

  14. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, George

    1993-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  15. Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    NASA Technical Reports Server (NTRS)

    Djorgovski, Stanislav

    1992-01-01

    The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multi parameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resources.

  16. Right-sizing statistical models for longitudinal data.

    PubMed

    Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M

    2015-12-01

    Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. (c) 2015 APA, all rights reserved.

  17. Ecobat: An online resource to facilitate transparent, evidence-based interpretation of bat activity data.

    PubMed

    Lintott, Paul R; Davison, Sophie; van Breda, John; Kubasiewicz, Laura; Dowse, David; Daisley, Jonathan; Haddy, Emily; Mathews, Fiona

    2018-01-01

    Acoustic surveys of bats are one of the techniques most commonly used by ecological practitioners. The results are used in Ecological Impact Assessments to assess the likely impacts of future developments on species that are widely protected in law, and to monitor developments post-construction. However, there is no standardized methodology for analyzing or interpreting these data, which can make the assessment of the ecological value of a site very subjective. Comparisons of sites and projects are therefore difficult for ecologists and decision-makers, for example, when trying to identify the best location for a new road based on relative bat activity levels along alternative routes. Here, we present a new web-based, data-driven tool, Ecobat, which addresses the need for a more robust way of interpreting ecological data. Ecobat offers users an easy, standardized, and objective method for analyzing bat activity data. It allows ecological practitioners to compare bat activity data at regional and national scales and to generate a numerical indicator of the relative importance of a night's worth of bat activity. The tool is free and open-source; because the underlying algorithms are already developed, it could easily be expanded to new geographical regions and species. Data donation is required to ensure the robustness of the analyses; we use a positive feedback mechanism to encourage ecological practitioners to share data by providing, in return, high-quality, contextualized data analysis and graphical visualizations for direct use in ecological reports.

  18. A Climate Statistics Tool and Data Repository

    NASA Astrophysics Data System (ADS)

    Wang, J.; Kotamarthi, V. R.; Kuiper, J. A.; Orr, A.

    2017-12-01

    Researchers at Argonne National Laboratory and collaborating organizations have generated regional scale, dynamically downscaled climate model output using Weather Research and Forecasting (WRF) version 3.3.1 at a 12km horizontal spatial resolution over much of North America. The WRF model is driven by boundary conditions obtained from three independent global scale climate models and two different future greenhouse gas emission scenarios, named representative concentration pathways (RCPs). The repository of results has a temporal resolution of three hours for all the simulations, includes more than 50 variables, is stored in Network Common Data Form (NetCDF) files, and the data volume is nearly 600 TB. A condensed 800 GB set of NetCDF files was made for selected variables most useful for climate-related planning, including daily precipitation, relative humidity, solar radiation, maximum temperature, minimum temperature, and wind. The WRF model simulations are conducted for three 10-year time periods (1995-2004, 2045-2054, and 2085-2094) and two future scenarios (RCP4.5 and RCP8.5). An open-source tool was coded using Python 2.7.8 and ESRI ArcGIS 10.3.1 programming libraries to parse the NetCDF files, compute summary statistics, and output results as GIS layers. Eight sets of summary statistics were generated as examples for the contiguous U.S. states and much of Alaska, including number of days over 90°F, number of days with a heat index over 90°F, heat waves, monthly and annual precipitation, drought, extreme precipitation, multi-model averages, and model bias. This paper will provide an overview of the project to generate the main and condensed data repositories, describe the Python tool and how to use it, present the GIS results of the computed examples, and discuss some of the ways they can be used for planning. The condensed climate data, Python tool, computed GIS results, and documentation of the work are shared on the Internet.
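
    The condensed repository lends itself to simple scripted summaries; the sketch below shows the flavor of such a computation using xarray rather than the ArcGIS-based tool described above. The file name and the variable names ("tmax" in °F, "precip") are assumptions for illustration, not the repository's actual schema.

```python
import xarray as xr

# Illustrative sketch only: the production tool described above used Python 2.7
# with ArcGIS libraries; xarray stands in here. The file name and the variable
# names "tmax" (daily maximum temperature, assumed in degrees Fahrenheit) and
# "precip" are hypothetical.
ds = xr.open_dataset("wrf_12km_rcp85_2045-2054_daily.nc")

hot_day = ds["tmax"] > 90.0                                 # boolean: day exceeds 90 F
days_over_90 = hot_day.groupby("time.year").sum("time")     # per grid cell, per year
mean_days_over_90 = days_over_90.mean("year")               # decadal-average map

annual_precip = ds["precip"].groupby("time.year").sum("time").mean("year")

# Write the summary grids out; a GIS can ingest the resulting NetCDF file.
xr.Dataset({"days_over_90F": mean_days_over_90,
            "annual_precip": annual_precip}).to_netcdf("summary_stats.nc")
```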

  19. The Alkaline Hydrolysis of Sulfonate Esters: Challenges in Interpreting Experimental and Theoretical Data

    PubMed Central

    2013-01-01

    Sulfonate ester hydrolysis has been the subject of recent debate, with experimental evidence interpreted in terms of both stepwise and concerted mechanisms. In particular, a recent study of the alkaline hydrolysis of a series of benzene arylsulfonates (Babtie et al., Org. Biomol. Chem. 10, 2012, 8095) presented a nonlinear Brønsted plot, which was explained in terms of a change from a stepwise mechanism involving a pentavalent intermediate for poorer leaving groups to a fully concerted mechanism for good leaving groups and supported by a theoretical study. In the present work, we have performed a detailed computational study of the hydrolysis of these compounds and find no computational evidence for a thermodynamically stable intermediate for any of these compounds. Additionally, we have extended the experimental data to include pyridine-3-yl benzene sulfonate and its N-oxide and N-methylpyridinium derivatives. Inclusion of these compounds converts the Brønsted plot to a moderately scattered but linear correlation and gives a very good Hammett correlation. These data suggest a concerted pathway for this reaction that proceeds via an early transition state with little bond cleavage to the leaving group, highlighting the care that needs to be taken with the interpretation of experimental and especially theoretical data. PMID:24279349

  20. Influence of rock strength variations on interpretation of thermochronologic data

    NASA Astrophysics Data System (ADS)

    Flowers, Rebecca; Ehlers, Todd

    2017-04-01

    Low temperature thermochronologic datasets are the primary means for estimating the timing, magnitude, and rates of erosion over extended (10s to 100s of Ma) timescales. Typically, abrupt shifts in cooling rates recorded by thermochronologic data are interpreted as changes in erosion rates caused by shifts in uplift rates, drainage patterns, or climate. However, recent work shows that different rock types vary in strength and erodibility by as much as several orders of magnitude, therefore implying that lithology should be an important control on how landscapes change through time and the thermochronometer record of erosion histories. Attention in the surface processes community has begun to focus on rock strength as a critical control on short-term (Ka to Ma) landscape evolution, but there has been less consideration of the influence of this factor on landscapes over longer intervals. If intrinsic lithologic variability can strongly modify erosion rates without changes in external factors, this result would have important implications for how thermochronologic datasets are interpreted. Here we evaluate the importance of rock strength for interpreting thermochronologic datasets by examining erosion rates and total denudation magnitudes across sedimentary rock-crystalline basement rock interfaces. We particularly focus on the 'Great Unconformity', a global stratigraphic surface between Phanerozoic sedimentary rocks and Precambrian crystalline basement, which based on rock strength studies marks a dramatic rock erodibility contrast across which erosion rates should decelerate. In the Rocky Mountain basement uplifts of the western U.S., thermochronologic data and geologic observations indicate that erosion rates were high during latest Cretaceous-early Tertiary denudation of the sedimentary cover (3-4 km over 10 m.y., 300-400 m/m.y.) but dramatically decelerated when less erodible basement rocks were encountered (0.1-0.5 km over 55 m.y., 2-9 m/m.y.). Similarly, the

  1. Automated Interpretation and Extraction of Topographic Information from Time of Flight Secondary Ion Mass Spectrometry Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ievlev, Anton V.; Belianinov, Alexei; Jesse, Stephen

    Time of flight secondary ion mass spectrometry (ToF SIMS) is one of the most powerful characterization tools allowing imaging of the chemical properties of various systems and materials. It allows precise studies of the chemical composition with sub-100-nm lateral and nanometer depth spatial resolution. However, comprehensive interpretation of ToF SIMS results is challenging because of the data volume and its multidimensionality. Furthermore, investigation of samples with pronounced topographical features is complicated by the spectral shift. In this work we developed an approach for comprehensive ToF SIMS data interpretation based on data analytics and automated extraction of the sample topography from the time of flight shift. We further applied this approach to investigate the correlation between biological function and chemical composition in Arabidopsis roots.

  2. Automated Interpretation and Extraction of Topographic Information from Time of Flight Secondary Ion Mass Spectrometry Data

    DOE PAGES

    Ievlev, Anton V.; Belianinov, Alexei; Jesse, Stephen; ...

    2017-12-06

    Time of flight secondary ion mass spectrometry (ToF SIMS) is one of the most powerful characterization tools allowing imaging of the chemical properties of various systems and materials. It allows precise studies of the chemical composition with sub-100-nm lateral and nanometer depth spatial resolution. However, comprehensive interpretation of ToF SIMS results is challenging because of the data volume and its multidimensionality. Furthermore, investigation of samples with pronounced topographical features is complicated by the spectral shift. In this work we developed an approach for comprehensive ToF SIMS data interpretation based on data analytics and automated extraction of the sample topography from the time of flight shift. We further applied this approach to investigate the correlation between biological function and chemical composition in Arabidopsis roots.

  3. Bias and sensitivity in the placement of fossil taxa resulting from interpretations of missing data.

    PubMed

    Sansom, Robert S

    2015-03-01

    The utility of fossils in evolutionary contexts is dependent on their accurate placement in phylogenetic frameworks, yet intrinsic and widespread missing data make this problematic. The complex taphonomic processes occurring during fossilization can make it difficult to distinguish absence from non-preservation, especially in the case of exceptionally preserved soft-tissue fossils: is a particular morphological character (e.g., appendage, tentacle, or nerve) missing from a fossil because it was never there (phylogenetic absence), or just happened to not be preserved (taphonomic loss)? Missing data have not been tested in the context of interpretation of non-present anatomy nor in the context of directional shifts and biases in affinity. Here, complete taxa, both simulated and empirical, are subjected to data loss through the replacement of present entries (1s) with either missing (?s) or absent (0s) entries. Both cause taxa to drift down trees, from their original position, toward the root. Absolute thresholds at which downshift is significant are extremely low for introduced absences (two entries replaced, 6% of present characters). The opposite threshold in empirical fossil taxa is also found to be low; two absent entries replaced with presences causes fossil taxa to drift up trees. As such, only a few instances of non-preserved characters interpreted as absences will cause fossil organisms to be erroneously interpreted as more primitive than they were in life. This observed sensitivity to coding non-present morphology presents a problem for all evolutionary studies that attempt to use fossils to reconstruct rates of evolution or unlock sequences of morphological change. Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist. Absent characters therefore require explicit justification and taphonomic

  4. Applications of spatial statistical network models to stream data

    Treesearch

    Daniel J. Isaak; Erin E. Peterson; Jay M. Ver Hoef; Seth J. Wenger; Jeffrey A. Falke; Christian E. Torgersen; Colin Sowder; E. Ashley Steel; Marie-Josee Fortin; Chris E. Jordan; Aaron S. Ruesch; Nicholas Som; Pascal Monestiez

    2014-01-01

    Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for...

  5. Interpretation of energy deposition data from historical operation of the transient test facility (TREAT)

    DOE PAGES

    DeHart, Mark D.; Baker, Benjamin A.; Ortensi, Javier

    2017-07-27

    The Transient Test Reactor (TREAT) at Idaho National Laboratory will resume operations in late 2017 after a 23 year hiatus while maintained in a cold standby state. Over that time period, computational power and simulation capabilities have increased substantially and now allow for new multiphysics modeling possibilities that were not practical or feasible for most of TREAT's operational history. Hence the return of TREAT to operational service provides a unique opportunity to apply state-of-the-art software and associated methods in the modeling and simulation of general three-dimensional steady state and kinetic behavior for reactor operation, and for coupling of the core power transient model to experiment simulations. However, measurements taken in previous operations were intended to predict power deposition in experimental samples, with little consideration of three-dimensional core power distributions. Hence, interpretation of data for the purpose of validation of modern methods can be challenging. For the research discussed herein, efforts are described for the process of proper interpretation of data from the most recent calibration experiments performed in the core, the M8 calibration series (M8-CAL). These measurements were taken between 1990 and 1993 using a set of fission wires and test fuel pins to estimate the power deposition that would be produced in fast reactor test fuel pins during the M8 experiment series. Because of the decision to place TREAT into a standby state in 1994, the M8 series of transients were never performed. However, potentially valuable information relevant for validation is available in the M8-CAL measurement data, if properly interpreted. This article describes the current state of the process of recovery of useful data from M8-CAL measurements and quantification of biases and uncertainties to potentially apply to the validation of multiphysics methods.

  6. Interpretation of energy deposition data from historical operation of the transient test facility (TREAT)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    DeHart, Mark D.; Baker, Benjamin A.; Ortensi, Javier

    The Transient Test Reactor (TREAT) at Idaho National Laboratory will resume operations in late 2017 after a 23 year hiatus while maintained in a cold standby state. Over that time period, computational power and simulation capabilities have increased substantially and now allow for new multiphysics modeling possibilities that were not practical or feasible for most of TREAT's operational history. Hence the return of TREAT to operational service provides a unique opportunity to apply state-of-the-art software and associated methods in the modeling and simulation of general three-dimensional steady state and kinetic behavior for reactor operation, and for coupling of the core power transient model to experiment simulations. However, measurements taken in previous operations were intended to predict power deposition in experimental samples, with little consideration of three-dimensional core power distributions. Hence, interpretation of data for the purpose of validation of modern methods can be challenging. For the research discussed herein, efforts are described for the process of proper interpretation of data from the most recent calibration experiments performed in the core, the M8 calibration series (M8-CAL). These measurements were taken between 1990 and 1993 using a set of fission wires and test fuel pins to estimate the power deposition that would be produced in fast reactor test fuel pins during the M8 experiment series. Because of the decision to place TREAT into a standby state in 1994, the M8 series of transients were never performed. However, potentially valuable information relevant for validation is available in the M8-CAL measurement data, if properly interpreted. This article describes the current state of the process of recovery of useful data from M8-CAL measurements and quantification of biases and uncertainties to potentially apply to the validation of multiphysics methods.

  7. Securizing data linkage in French public statistics.

    PubMed

    Guesdon, Maxence; Benzenine, Eric; Gadouche, Kamel; Quantin, Catherine

    2016-10-06

    Administrative records in France, especially medical and social records, have huge potential for statistical studies. The NIR (a national identifier) is widely used in medico-social administrations, and this would theoretically provide considerable scope for data matching, on condition that the legislation on such matters was respected. The law, however, forbids the processing of non-anonymized medical data, thus making it difficult to carry out studies that require several sources of social and medical data. We would like to benefit from computer techniques introduced since the 1970s to provide safe linkage of anonymized files and to ease the current constraints on such procedures. We propose an organization and a data workflow, based on hashing and cryptographic techniques, to strongly compartmentalize identifying and non-identifying data. The proposed method offers strong control over who is in possession of which information, using different hashing keys for each linkage. This makes it possible to prevent unauthorized linkage of data and to protect anonymity, by preventing the accumulation of non-identifying data, which can become identifying when linked. Our proposal would make it possible to conduct such studies more easily, more regularly, and more precisely while preserving a high enough level of anonymity. The main obstacle to setting up such a system, in our opinion, is not technical, but rather organizational, in that it is based on the existence of a Key-Management Authority.
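
    The compartmentalization idea, a different hashing key for each authorized linkage so that pseudonyms from unrelated projects cannot be matched, can be sketched with a keyed hash (HMAC). This is an illustration of the principle, not the authors' protocol; the keys and identifier below are placeholders.

```python
import hashlib
import hmac

def pseudonym(national_id: str, linkage_key: bytes) -> str:
    """Keyed hash of an identifier; a different key per authorized linkage
    yields pseudonyms that cannot be matched across unrelated projects."""
    return hmac.new(linkage_key, national_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical keys issued by a key-management authority for two distinct studies.
key_study_a = b"secret-key-issued-for-study-A"
key_study_b = b"secret-key-issued-for-study-B"

nir = "1850750123456"  # fictitious identifier
print(pseudonym(nir, key_study_a))   # usable to link files within study A
print(pseudonym(nir, key_study_b))   # different value: no cross-study linkage
```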

  8. Identification of reliable gridded reference data for statistical downscaling methods in Alberta

    NASA Astrophysics Data System (ADS)

    Eum, H. I.; Gupta, A.

    2017-12-01

    Climate models provide essential information to assess impacts of climate change at regional and global scales. However, statistical downscaling methods have been applied to prepare climate model data for various applications such as hydrologic and ecologic modelling at a watershed scale. As the reliability and (spatial and temporal) resolution of statistically downscaled climate data mainly depend on the reference data, identifying the most reliable reference data is crucial for statistical downscaling. A growing number of gridded climate products are available for key climate variables, which are the main input data to regional modelling systems. However, inconsistencies in these climate products, for example, different combinations of climate variables, varying data domains and data lengths, and data accuracy varying with physiographic characteristics of the landscape, have caused significant challenges in selecting the most suitable reference climate data for various environmental studies and modelling. Employing various observation-based daily gridded climate products available in the public domain, i.e. thin plate spline regression products (ANUSPLIN and TPS), an inverse distance method (Alberta Townships), a numerical climate model (North American Regional Reanalysis), and an optimum interpolation technique (Canadian Precipitation Analysis), this study evaluates the accuracy of the climate products at each grid point by comparing them with the Adjusted and Homogenized Canadian Climate Data (AHCCD) observations for precipitation, minimum and maximum temperature over the province of Alberta. Based on the performance of climate products at AHCCD stations, we ranked the reliability of these publicly available climate products corresponding to the elevations of stations discretized into several classes. According to the rank of climate products for each elevation class, we identified the most reliable climate products based on the elevation of target points. A web-based system
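
    The ranking step can be illustrated with a small pandas sketch: matched station-versus-product values are grouped into elevation classes and each product is scored by mean absolute error within each class. The product names follow the abstract; the matched data, elevation bins, and error metric are synthetic assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical matched daily precipitation at stations versus three gridded products.
rng = np.random.default_rng(5)
n = 2000
obs = rng.gamma(0.6, 4.0, n)
stations = pd.DataFrame({
    "elevation_m": rng.uniform(300, 2500, n),
    "obs": obs,
    "ANUSPLIN": obs + rng.normal(0, 1.5, n),
    "NARR": obs + rng.normal(0.5, 2.5, n),
    "CaPA": obs + rng.normal(0, 1.0, n),
})

# Discretize station elevations into classes and rank products by mean absolute error.
stations["elev_class"] = pd.cut(stations["elevation_m"], bins=[0, 800, 1600, 2600],
                                labels=["low", "mid", "high"])
errors = stations.melt(id_vars=["elev_class", "obs"],
                       value_vars=["ANUSPLIN", "NARR", "CaPA"],
                       var_name="product", value_name="estimate")
errors["abs_error"] = (errors["estimate"] - errors["obs"]).abs()
mae = errors.groupby(["elev_class", "product"])["abs_error"].mean().unstack()
print(mae.round(2))
print("Lowest-error product per elevation class:")
print(mae.idxmin(axis=1))
```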

  9. Intuitive and interpretable visual communication of a complex statistical model of disease progression and risk.

    PubMed

    Li, Jieyi; Arandjelovic, Ognjen

    2017-07-01

    Computer science, and machine learning in particular, is increasingly lauded for its potential to aid medical practice. However, the highly technical nature of state-of-the-art techniques can be a major obstacle to their usability by health care professionals and thus to their adoption and actual practical benefit. In this paper we describe a software tool that focuses on visualizing the predictions of a recently developed method which leverages large-scale electronic records to make diagnostic predictions. Guided by risk predictions, our tool allows the user to explore different diagnostic trajectories interactively, or to display cumulative long-term prognostics, in an intuitive and easily interpretable manner.

  10. Challenges and Lessons Learned in Generating and Interpreting NHANES Nutritional Biomarker Data

    PubMed Central

    Pfeiffer, Christine M; Lacher, David A; Schleicher, Rosemary L; Johnson, Clifford L; Yetley, Elizabeth A

    2017-01-01

    For the past 45 y, the National Center for Health Statistics at the CDC has carried out nutrition surveillance of the US population by collecting anthropometric, dietary intake, and nutritional biomarker data, the latter being the focus of this publication. The earliest biomarker testing assessed iron and vitamin A status. With time, a broad spectrum of water- and fat-soluble vitamins was added and biomarkers for other types of nutrients (e.g., fatty acids) and bioactive dietary compounds (e.g., phytoestrogens) were included in NHANES. The cross-sectional survey is flexible in design, and biomarkers may be measured for a short period of time or rotated in and out of surveys depending on scientific needs. Maintaining high-quality laboratory measurements over extended periods of time such that trends in status can be reliably assessed is a major goal of the testing laboratories. Physicians, health scientists, and policy makers rely on the NHANES reference data to compare the nutritional status of population groups, to assess the impact of various interventions, and to explore associations between nutritional status and health promotion or disease prevention. Focusing on the continuous NHANES, which started in 1999, this review uses a “lessons learned” approach to present a series of challenges that are relevant to researchers measuring biomarkers in NHANES and beyond. Some of those challenges are the use of multiple related biomarkers instead of a single biomarker for a specific nutrient (e.g., folate, vitamin B-12, iron), adhering to special needs for specimen collection and handling to ensure optimum specimen quality (e.g., vitamin C, folate, homocysteine, iodine, polyunsaturated fatty acids), the retrospective use of long-term quality-control data to correct for assay shifts (e.g., vitamin D, vitamin B-12), and the proper planning for and interpretation of crossover studies to adjust for systematic method changes (e.g., folate, vitamin D, ferritin). PMID

  11. Complementary roles for toxicologic pathology and mathematics in toxicogenomics, with special reference to data interpretation and oscillatory dynamics.

    PubMed

    Morgan, Kevin T; Pino, Michael; Crosby, Lynn M; Wang, Min; Elston, Timothy C; Jayyosi, Zaid; Bonnefoi, Marc; Boorman, Gary

    2004-01-01

    Toxicogenomics is an emerging multidisciplinary science that will profoundly impact the practice of toxicology. New generations of biologists, using evolving toxicogenomics tools, will generate massive data sets in need of interpretation. Mathematical tools are necessary to cluster and otherwise find meaningful structure in such data. The linking of this structure to gene functions and disease processes, and finally the generation of useful data interpretation remains a significant challenge. The training and background of pathologists make them ideally suited to contribute to the field of toxicogenomics, from experimental design to data interpretation. Toxicologic pathology, a discipline based on pattern recognition, requires familiarity with the dynamics of disease processes and interactions between organs, tissues, and cell populations. Optimal involvement of toxicologic pathologists in toxicogenomics requires that they communicate effectively with the many other scientists critical for the effective application of this complex discipline to societal problems. As noted by Petricoin III et al (Nature Genetics 32, 474-479, 2002), cooperation among regulators, sponsors and experts will be essential for realizing the potential of microarrays for public health. Following a brief introduction to the role of mathematics in toxicogenomics, "data interpretation" from the perspective of a pathologist is briefly discussed. Based on oscillatory behavior in the liver, the importance of an understanding of mathematics is addressed, and an approach to learning mathematics "later in life" is provided. An understanding of pathology by mathematicians involved in toxicogenomics is equally critical, as both mathematics and pathology are essential for transforming toxicogenomics data sets into useful knowledge.

  12. Teaching Real Data Interpretation with Models (TRIM): Analysis of Student Dialogue in a Large-Enrollment Cell and Developmental Biology Course

    ERIC Educational Resources Information Center

    Zagallo, Patricia; Meddleton, Shanice; Bolger, Molly S.

    2016-01-01

    We present our design for a cell biology course to integrate content with scientific practices, specifically data interpretation and model-based reasoning. A 2-year research project within this course allowed us to understand how students interpret authentic biological data in this setting. Through analysis of written work, we measured the extent…

  13. How accurate are interpretations of curriculum-based measurement progress monitoring data? Visual analysis versus decision rules.

    PubMed

    Van Norman, Ethan R; Christ, Theodore J

    2016-10-01

    Curriculum-based measurement of oral reading (CBM-R) is used to monitor the effects of academic interventions for individual students. Decisions to continue, modify, or terminate these interventions are made by interpreting time-series CBM-R data. Such interpretation is founded upon visual analysis or the application of decision rules. The purpose of this study was to compare the accuracy of visual analysis and decision rules. Visual analysts interpreted 108 CBM-R progress monitoring graphs in one of three ways: (a) without graphic aids, (b) with a goal line, or (c) with a goal line and a trend line. Graphs differed along three dimensions: trend magnitude, variability of observations, and duration of data collection. Automated trend-line and data-point decision rules were also applied to each graph. Inferential analyses permitted estimation of the probability of a correct decision (i.e., continuing the intervention when the student is improving, or discontinuing it when the student is not) for each evaluation method as a function of trend magnitude, variability of observations, and duration of data collection. All evaluation methods performed better when students made adequate progress. Visual analysis and decision rules performed similarly when observations were less variable. Results suggest that educators should collect data for more than six weeks, take steps to control measurement error, and visually analyze graphs when data are variable. Implications for practice and research are discussed. Copyright © 2016 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
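
    For readers unfamiliar with automated decision rules, the following is a simplified sketch of a trend-line rule of the general kind evaluated in such studies (the minimum number of observations, thresholds, and continue/modify wording are assumptions, not the authors' exact rule): an ordinary least-squares trend is fitted to the CBM-R scores and its slope is compared with the slope of the goal line.

    import numpy as np

    def trend_line_decision(weeks: np.ndarray, scores: np.ndarray,
                            baseline: float, goal: float, goal_week: float) -> str:
        """Return a decision based on whether the fitted trend meets the goal-line slope."""
        if len(weeks) < 3:
            return "insufficient data"
        slope, _ = np.polyfit(weeks, scores, deg=1)     # words correct per minute gained per week
        goal_slope = (goal - baseline) / goal_week      # required rate of improvement
        return "continue intervention" if slope >= goal_slope else "modify intervention"

    # Example: a student monitored for 8 weeks against a goal of 90 WCPM by week 10.
    weeks = np.arange(1, 9)
    scores = np.array([42, 45, 44, 50, 53, 55, 58, 61], dtype=float)
    print(trend_line_decision(weeks, scores, baseline=40, goal=90, goal_week=10))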

  14. The Virtual Seismic Atlas Project: sharing the interpretation of seismic data

    NASA Astrophysics Data System (ADS)

    Butler, R.; Mortimer, E.; McCaffrey, B.; Stuart, G.; Sizer, M.; Clayton, S.

    2007-12-01

    The activities of academic research programs, national institutions, and corporations, especially oil and gas companies, have produced a substantial volume of seismic reflection data. Although the majority is proprietary and confidential, there are significant volumes of data that are potentially within the public domain and available for research. Yet the community is poorly connected to these data, and consequently geological and other research using seismic reflection data is limited to very few groups of researchers. This is about to change. The Virtual Seismic Atlas (VSA) is generating an independent, free-to-use, community-based internet resource that captures and shares the geological interpretation of seismic data globally. Images and associated documents are explicitly indexed not only by existing survey and geographical data but also by the geology they portray. By using "Guided Navigation" to search, discover and retrieve images, users are exposed to arrays of geological analogues that provide novel insights and opportunities for research and education. The VSA goes live, with evolving content and functionality, through 2008. There are opportunities for designed integration with other global data programs in the earth sciences.

  15. Statistical image reconstruction from correlated data with applications to PET

    PubMed Central

    Alessio, Adam; Sauer, Ken; Kinahan, Paul

    2008-01-01

    Most statistical reconstruction methods for emission tomography are designed for data modeled as conditionally independent Poisson variates. In reality, due to scanner detectors, electronics and data processing, correlations are introduced into the data resulting in dependent variates. In general, these correlations are ignored because they are difficult to measure and lead to computationally challenging statistical reconstruction algorithms. This work addresses the second concern, seeking to simplify the reconstruction of correlated data and provide a more precise image estimate than the conventional independent methods. In general, correlated variates have a large non-diagonal covariance matrix that is computationally challenging to use as a weighting term in a reconstruction algorithm. This work proposes two methods to simplify the use of a non-diagonal covariance matrix as the weighting term by (a) limiting the number of dimensions in which the correlations are modeled and (b) adopting flexible, yet computationally tractable, models for correlation structure. We apply and test these methods with simple simulated PET data and data processed with the Fourier rebinning algorithm which include the one-dimensional correlations in the axial direction and the two-dimensional correlations in the transaxial directions. The methods are incorporated into a penalized weighted least-squares 2D reconstruction and compared with a conventional maximum a posteriori approach. PMID:17921576
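
    As a rough illustration of the weighting idea (not the paper's algorithm or its Fourier-rebinned PET model), the sketch below solves a penalized weighted least-squares problem in which the weight is the inverse of a banded, non-diagonal covariance modelling nearest-neighbour correlations; the system matrix, covariance values, and roughness penalty are all illustrative.

    import numpy as np

    def pwls(A: np.ndarray, y: np.ndarray, K: np.ndarray, beta: float) -> np.ndarray:
        """Minimize (y - A x)^T K^{-1} (y - A x) + beta * ||D x||^2, with D a first-difference operator."""
        n = A.shape[1]
        W = np.linalg.inv(K)                      # non-diagonal weighting from the data covariance
        D = np.diff(np.eye(n), axis=0)            # (n-1) x n first-difference roughness penalty
        lhs = A.T @ W @ A + beta * (D.T @ D)
        rhs = A.T @ W @ y
        return np.linalg.solve(lhs, rhs)

    # Toy 1-D example with nearest-neighbour data correlations (tridiagonal covariance).
    rng = np.random.default_rng(0)
    A = rng.normal(size=(60, 30))
    x_true = np.sin(np.linspace(0.0, np.pi, 30))
    K = np.eye(60) + 0.4 * (np.eye(60, k=1) + np.eye(60, k=-1))
    y = A @ x_true + rng.multivariate_normal(np.zeros(60), K)
    x_hat = pwls(A, y, K, beta=0.1)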

  16. Searching the Heavens: Astronomy, Computation, Statistics, Data Mining and Philosophy

    NASA Astrophysics Data System (ADS)

    Glymour, Clark

    2012-03-01

    Our first and purest science, the mother of scientific methods, sustained by sheer curiosity, searching the heavens we cannot manipulate. From the beginning, astronomy has combined mathematical idealization, technological ingenuity, and indefatigable data collection with procedures to search through assembled data for the processes that govern the cosmos. Astronomers are, and ever have been, data miners, and for that reason astronomical methods (but not astronomical discoveries) have often been despised by statisticians and philosophers. Epithets laced the statistical literature: Ransacking! Data dredging! Double Counting! Statistical disdain was usually directed at social scientists and biologists, rarely if ever at astronomers, but the methodological attitudes and goals that many twentieth-century philosophers and statisticians rejected were creations of the astronomical tradition. The philosophical criticisms were earlier and more direct. In the shadow (or in Alexander Pope’s phrasing, the light) cast on nature in the eighteenth century by the Newtonian triumph, David Hume revived arguments from the ancient Greeks to challenge the very possibility of coming to know what causes what. His conclusion was endorsed in the twentieth century by many philosophers who found talk of causation unnecessary or unacceptably metaphysical, and absorbed by many statisticians as a general suspicion of causal claims, except possibly when they are founded on experimental manipulation. And yet in the hands of a mathematician, Thomas Bayes, and another mathematician and philosopher, Richard Price, Hume’s essays prompted the development of a new kind of statistics, the kind we now call "Bayesian." The computer and new data acquisition methods have begun to dissolve the antipathy between astronomy, philosophy, and statistics. But the resolution is practical, without much reflection on the arguments or the course of events. So, I offer a largely unoriginal history

  17. Preliminary interpretation of industry two-dimensional seismic data from Susitna Basin, south-central Alaska

    USGS Publications Warehouse

    Lewis, Kristen A.; Potter, Christopher J.; Shah, Anjana K.; Stanley, Richard G.; Haeussler, Peter J.; Saltus, Richard W.

    2015-07-30

    The eastern seismic lines show evidence of numerous short-wavelength antiforms that appear to correspond to a series of northeast-trending lineations observed in aeromagnetic data, which have been interpreted as being due to folding of Paleogene volcanic strata. The eastern side of the basin is also cut by a number of reverse faults and thrust faults, the majority of which strike north-south. The western side of the Susitna Basin is cut by a series of regional reverse faults and is characterized by synformal structures in two fault blocks between the Kahiltna River and Skwentna faults. These synforms are progressively deeper to the west in the footwalls of the east-vergent Skwentna and northeast-vergent Beluga Mountain reverse faults. Although the seismic data are limited to the south, we interpret a potential regional south-southeast-directed reverse fault striking east-northeast on the east side of the basin that may cross the entire southern portion of the basin.

  18. Statistical atlas based extrapolation of CT data

    NASA Astrophysics Data System (ADS)

    Chintalapani, Gouthami; Murphy, Ryan; Armiger, Robert S.; Lepisto, Jyri; Otake, Yoshito; Sugano, Nobuhiko; Taylor, Russell H.; Armand, Mehran

    2010-02-01

    We present a framework for estimating the missing anatomical details from a partial CT scan with the help of statistical shape models. The motivating application is periacetabular osteotomy (PAO), a technique for treating developmental hip dysplasia, an abnormal condition of the hip socket that, if untreated, may lead to osteoarthritis. The common goals of PAO are to reduce pain and joint subluxation and to improve the contact pressure distribution by increasing the coverage of the femoral head by the hip socket. While current diagnosis and planning are based on radiological measurements, the significant structural variations in dysplastic hips make computer-assisted geometric and biomechanical planning based on CT data desirable to help the surgeon achieve optimal joint realignment. Most patients undergoing PAO are young females, so it is usually desirable to minimize the radiation dose by scanning only the joint portion of the hip anatomy. These partial scans, however, do not provide enough information for biomechanical analysis because the iliac region is missing. A statistical shape model of the full pelvis anatomy is constructed from a database of CT scans. The partial volume is first aligned with the statistical atlas using an iterative affine registration, followed by a deformable registration step, and the missing information is inferred from the atlas. The atlas inferences are further enhanced by the use of X-ray images of the patient, which are very common in an osteotomy procedure. The proposed method is validated with a leave-one-out analysis. Osteotomy cuts are simulated, and the effect of the atlas-predicted models on the actual procedure is evaluated.
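
    The atlas-based inference step can be illustrated with a small sketch (assumptions throughout; this is not the authors' pipeline): after registration, the PCA mode weights of the statistical shape model are fitted from the observed coordinates alone, and the full shape, including the missing region, is reconstructed from the atlas.

    import numpy as np

    def infer_missing(mean_shape: np.ndarray, modes: np.ndarray,
                      partial: np.ndarray, observed_idx: np.ndarray) -> np.ndarray:
        """mean_shape: (3N,), modes: (3N, M), partial: observed coordinates, observed_idx: their rows."""
        P = modes[observed_idx, :]                               # restrict atlas modes to observed points
        residual = partial - mean_shape[observed_idx]
        weights, *_ = np.linalg.lstsq(P, residual, rcond=None)   # least-squares fit of mode weights
        return mean_shape + modes @ weights                      # full shape, missing region inferred

    # Toy example: a 2-mode atlas of 100 3-D points, with only the first 60 points observed.
    rng = np.random.default_rng(1)
    mean_shape = rng.normal(size=300)
    modes = np.linalg.qr(rng.normal(size=(300, 2)))[0]           # orthonormal "PCA" modes
    true_full = mean_shape + modes @ np.array([1.5, -0.7])
    observed_idx = np.arange(180)                                # 60 points * 3 coordinates
    estimate = infer_missing(mean_shape, modes, true_full[observed_idx], observed_idx)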

  19. A Novel Statistical Analysis and Interpretation of Flow Cytometry Data

    DTIC Science & Technology

    2013-03-31

    the resulting residuals appear random. In the work that follows, I∗ = 200. The values of B and b̂j are known from the experiment. Notice that the...conjunction with the model parameter vector in a two-stage process. Unfortunately two-stage estimation may cause some parameters of the mathematical model to...information-theoretic criteria such as Akaike's Information Criterion (AIC). From (4.3), it follows that the scaled residuals rjk = λjI[n̂](tj, zk; ~q

  20. Reproducibility-optimized test statistic for ranking genes in microarray studies.

    PubMed

    Elo, Laura L; Filén, Sanna; Lahesmaa, Riitta; Aittokallio, Tero

    2008-01-01

    A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows consistently good performance under various simulated conditions and on an Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.
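
    The core idea can be sketched as follows (a rough illustration, not the published reproducibility-optimization algorithm; the candidate statistic family, bootstrap scheme, and top-k overlap measure are assumptions): for each candidate ranking statistic, measure how consistently its top-k gene list is reproduced across bootstrap resamples of the arrays, and keep the statistic with the highest average overlap.

    import numpy as np

    def modified_t(x: np.ndarray, y: np.ndarray, a: float) -> np.ndarray:
        """Gene-wise |mean difference| / (a + pooled SE); a = 0 gives an ordinary t-like score."""
        se = np.sqrt(x.var(axis=1, ddof=1) / x.shape[1] + y.var(axis=1, ddof=1) / y.shape[1])
        return np.abs(x.mean(axis=1) - y.mean(axis=1)) / (a + se)

    def reproducibility(x: np.ndarray, y: np.ndarray, a: float, k: int,
                        n_boot: int = 50, seed: int = 0) -> float:
        """Average top-k overlap of the statistic across pairs of bootstrap resamples."""
        rng = np.random.default_rng(seed)
        overlaps = []
        for _ in range(n_boot):
            bx1 = x[:, rng.integers(0, x.shape[1], x.shape[1])]
            bx2 = x[:, rng.integers(0, x.shape[1], x.shape[1])]
            by1 = y[:, rng.integers(0, y.shape[1], y.shape[1])]
            by2 = y[:, rng.integers(0, y.shape[1], y.shape[1])]
            top1 = set(np.argsort(-modified_t(bx1, by1, a))[:k])
            top2 = set(np.argsort(-modified_t(bx2, by2, a))[:k])
            overlaps.append(len(top1 & top2) / k)
        return float(np.mean(overlaps))

    # Hypothetical usage: x, y are genes-by-arrays expression matrices for the two conditions.
    # best_a = max([0.0, 0.1, 0.5, 1.0], key=lambda a: reproducibility(x, y, a, k=100))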