Equivalent statistics and data interpretation.
Francis, Gregory
2017-08-01
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
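To make the two-sample equivalence concrete, here is a minimal sketch using the standard relation d = t·√(1/n₁ + 1/n₂) between Student's pooled-variance t statistic and Cohen's d computed with the pooled standard deviation. When n₁ and n₂ are fixed, this map is one-to-one. The two-sided p value is a monotone function of |t| with n₁ + n₂ − 2 degrees of freedom (a library routine such as SciPy's `scipy.stats.t.sf` can compute it), so knowing any one of t, d, or p determines the others, up to the sign of the effect. The sample sizes and t value below are hypothetical.

```python
import math

def cohens_d_from_t(t, n1, n2):
    """Cohen's d (pooled-SD version) implied by a two-sample Student t statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

def t_from_cohens_d(d, n1, n2):
    """Inverse mapping: the t statistic implied by d and the two sample sizes."""
    return d / math.sqrt(1.0 / n1 + 1.0 / n2)

# Hypothetical study: n1 = 20, n2 = 25, observed t = 2.5.
d = cohens_d_from_t(2.5, 20, 25)      # 0.75
t_back = t_from_cohens_d(d, 20, 25)   # recovers 2.5
```

The Bayes factor side of the equivalence is not shown here, because it requires numerical integration over the JZS prior.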
Interpretation of Statistical Data: The Importance of Affective Expressions
ERIC Educational Resources Information Center
Queiroz, Tamires; Monteiro, Carlos; Carvalho, Liliane; François, Karen
In recent years, research on the teaching and learning of statistics has emphasized that the interpretation of data is a complex process involving cognitive and technical aspects. However, it is also a human activity that involves contextual and affective aspects. This view is in line with research on affectivity and cognition. While the affective…
Statistical transformation and the interpretation of inpatient glucose control data.
Saulnier, George E; Castro, Janna C; Cook, Curtiss B
2014-03-01
To introduce a statistical method of assessing hospital-based non-intensive care unit (non-ICU) inpatient glucose control. Point-of-care blood glucose (POC-BG) data from hospital non-ICUs were extracted for January 1 through December 31, 2011. Glucose data distribution was examined before and after Box-Cox transformations and compared to normality. Different subsets of data were used to establish upper and lower control limits, and exponentially weighted moving average (EWMA) control charts were constructed from June, July, and October data as examples to determine if out-of-control events were identified differently in nontransformed versus transformed data. A total of 36,381 POC-BG values were analyzed. In all 3 monthly test samples, glucose distributions in nontransformed data were skewed but approached a normal distribution once transformed. Interpretation of out-of-control events from EWMA control chart analyses also revealed differences. In the June test data, an out-of-control process was identified at sample 53 with nontransformed data, whereas the transformed data remained in control for the duration of the observed period. Analysis of July data demonstrated an out-of-control process sooner in the transformed (sample 55) than nontransformed (sample 111) data, whereas for October, transformed data remained in control longer than nontransformed data. Statistical transformations bring the distributions of inpatient non-ICU glycemic data sets closer to normal. The decision to transform glucose data could influence the interpretation and conclusions about the status of inpatient glycemic control. Further study is required to determine whether transformed versus nontransformed data influence clinical decisions or evaluation of interventions.
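The Box-Cox step can be sketched as follows. This is a minimal illustration, assuming a fixed λ (here λ = 0, the log transform). The study presumably estimated λ from the data; SciPy's `scipy.stats.boxcox` does this by maximum likelihood when no λ is supplied. The glucose values are invented, not taken from the study. The sketch compares moment-based skewness before and after the transform.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform for a fixed lambda; requires strictly positive data."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def skewness(values):
    """Moment-based sample skewness (no small-sample bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed glucose readings in mg/dL (not data from the study).
glucose = [95, 110, 120, 130, 145, 160, 190, 240, 310]
raw_skew = skewness(glucose)                 # roughly 1.0
log_skew = skewness(box_cox(glucose, 0.0))   # roughly 0.5, closer to symmetric
```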
A statistical model for interpreting computerized dynamic posturography data
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
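The sketch below simulates data under a model of this general type. It shows why standard methods struggle: observed scores mix a point mass at zero with a continuous part, and a naive mean is pulled downward. The normal latent distribution, the logistic fall probability, and the parameter names `mu`, `sd`, `a`, and `b` are illustrative assumptions, not the authors' specification. The quasi-maximum-likelihood fitting step is not shown.

```python
import math
import random

def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def simulate_equilibrium_scores(n, mu, sd, a, b, seed=0):
    """Observed ESs under a hypothetical latent model in which a fall is recorded as 0."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        latent = rng.gauss(mu, sd)           # latent ES, always exists
        p_fall = logistic(a - b * latent)    # lower latent ES -> higher fall risk (b > 0)
        if rng.random() < p_fall:
            observed.append(0.0)             # trial stopped, score assigned zero
        else:
            observed.append(min(max(latent, 0.0), 100.0))  # keep within the 0-100 scale
    return observed

scores = simulate_equilibrium_scores(1000, mu=70, sd=15, a=5.0, b=0.1)
zero_share = scores.count(0.0) / len(scores)
naive_mean = sum(scores) / len(scores)  # biased below mu by the zeros
```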
Correlation-based interpretations of paleoclimate data - where statistics meet past climates
NASA Astrophysics Data System (ADS)
Hu, Jun; Emile-Geay, Julien; Partin, Judson
2017-02-01
Correlation analysis is omnipresent in paleoclimatology, and often serves to support the proposed climatic interpretation of a given proxy record. However, this analysis presents several statistical challenges, each of which is sufficient to nullify the interpretation: the loss of degrees of freedom due to serial correlation, the test multiplicity problem in connection with a climate field, and the presence of age uncertainties. While these issues have long been known to statisticians, they are not widely appreciated in the paleoclimate community; yet they can have a first-order impact on scientific conclusions. Here we use three examples from the recent paleoclimate literature to highlight how spurious correlations affect the published interpretations of paleoclimate proxies, and suggest that future studies should address these issues to strengthen their conclusions. In some cases, correlations that were previously claimed to be significant are found to be insignificant, thereby challenging published interpretations. In other cases, minor adjustments can be made to safeguard against these concerns. Because such problems arise so commonly with paleoclimate data, we provide open-source code to address them. Ultimately, we conclude that statistics alone cannot ground-truth a proxy, and recommend establishing a mechanistic understanding of a proxy signal as a sounder basis for interpretation.
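The loss of degrees of freedom from serial correlation can be illustrated with a widely used approximation for two series whose lag-1 autocorrelations are r₁ and r₂: n_eff = n(1 − r₁r₂)/(1 + r₁r₂). This is a minimal sketch, not necessarily the method used in the paper's open-source code. The two smooth series are hypothetical. Capping n_eff at n is a design choice for the case r₁r₂ < 0.

```python
import math

def lag1_autocorrelation(x):
    """Lag-1 sample autocorrelation of a series."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def effective_sample_size(x, y):
    """Approximate effective sample size for correlating two autocorrelated series, capped at n."""
    r1 = lag1_autocorrelation(x)
    r2 = lag1_autocorrelation(y)
    n = len(x)
    n_eff = n * (1 - r1 * r2) / (1 + r1 * r2)
    return min(float(n), n_eff)

# Two smooth (strongly autocorrelated) hypothetical series of length 50.
proxy = [math.sin(i / 8.0) for i in range(50)]
climate = [math.sin(i / 8.0 + 0.3) for i in range(50)]
n_eff = effective_sample_size(proxy, climate)  # far fewer than 50 independent points
```

A significance test for the correlation would then use n_eff rather than n, which widens the null distribution and can turn an apparently significant correlation into an insignificant one.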
Misuse of statistics in the interpretation of data on low-level radiation
Hamilton, L.D.
1982-01-01
Four misuses of statistics in the interpretation of data of low-level radiation are reviewed: (1) post-hoc analysis and aggregation of data leading to faulty conclusions in the reanalysis of genetic effects of the atomic bomb, and premature conclusions on the Portsmouth Naval Shipyard data; (2) inappropriate adjustment for age and ignoring differences between urban and rural areas leading to potentially spurious increase in incidence of cancer at Rocky Flats; (3) hazard of summary statistics based on ill-conditioned individual rates leading to spurious association between childhood leukemia and fallout in Utah; and (4) the danger of prematurely published preliminary work with inadequate consideration of epidemiological problems - censored data - leading to inappropriate conclusions, needless alarm at the Portsmouth Naval Shipyard, and diversion of scarce research funds.
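Point (2) turns on how age adjustment is done. The sketch below shows direct age standardization, a standard technique; it is not necessarily the adjustment used in the Rocky Flats analyses, whose details are not given here. The rates and populations are invented. Two areas with identical age-specific rates show different crude rates purely because their age structures differ, and standardization to a common reference population removes that difference. Age groups with very small denominators make the individual rates unstable, which is one way ill-conditioned rates like those in point (3) can arise.

```python
def directly_standardized_rate(age_specific_rates, standard_population):
    """Weighted average of age-specific rates, weighted by a standard population."""
    if len(age_specific_rates) != len(standard_population):
        raise ValueError("one standard-population weight per age group is required")
    total = sum(standard_population)
    return sum(r * w for r, w in zip(age_specific_rates, standard_population)) / total

# Hypothetical: identical age-specific rates, different age structures.
rates = [0.001, 0.010]       # young, old (cases per person-year)
area_a_pop = [9000, 1000]    # mostly young
area_b_pop = [5000, 5000]    # older population
crude_a = sum(r * p for r, p in zip(rates, area_a_pop)) / sum(area_a_pop)  # 0.0019
crude_b = sum(r * p for r, p in zip(rates, area_b_pop)) / sum(area_b_pop)  # 0.0055
standard = [7000, 3000]      # hypothetical common reference population
adj_a = directly_standardized_rate(rates, standard)
adj_b = directly_standardized_rate(rates, standard)
```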
Saulnier, George E; Castro, Janna C; Cook, Curtiss B
2014-05-01
Glucose control can be problematic in critically ill patients. We evaluated the impact of statistical transformation on interpretation of intensive care unit inpatient glucose control data. Point-of-care blood glucose (POC-BG) data derived from patients in the intensive care unit for 2011 was obtained. Box-Cox transformation of POC-BG measurements was performed, and distribution of data was determined before and after transformation. Different data subsets were used to establish statistical upper and lower control limits. Exponentially weighted moving average (EWMA) control charts constructed from April, October, and November data determined whether out-of-control events could be identified differently in transformed versus nontransformed data. A total of 8679 POC-BG values were analyzed. POC-BG distributions in nontransformed data were skewed but approached normality after transformation. EWMA control charts revealed differences in projected detection of out-of-control events. In April, an out-of-control process resulting in the lower control limit being exceeded was identified at sample 116 in nontransformed data but not in transformed data. October transformed data detected an out-of-control process exceeding the upper control limit at sample 27 that was not detected in nontransformed data. Nontransformed November results remained in control, but transformation identified an out-of-control event less than 10 samples into the observation period. Using statistical methods to assess population-based glucose control in the intensive care unit could alter conclusions about the effectiveness of care processes for managing hyperglycemia. Further study is required to determine whether transformed versus nontransformed data change clinical decisions about the interpretation of care or intervention results. © 2014 Diabetes Technology Society.
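An EWMA control chart of the kind described can be sketched as follows. This is a minimal illustration, assuming the textbook smoothing weight λ = 0.2 and limit width L = 3 (common defaults, not values reported by the authors). It uses the exact, time-varying control limits. In practice, `target` and `sigma` would be estimated from a baseline subset of raw or Box-Cox-transformed readings. The series below is hypothetical and on a standardized scale.

```python
import math

def ewma_chart(values, target, sigma, lam=0.2, L=3.0):
    """EWMA statistic with exact (time-varying) control limits around a target mean."""
    z = target  # EWMA starts at the in-control mean
    rows = []
    for i, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        width = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * i)))
        rows.append((i, z, target - width, target + width))
    return rows

def first_out_of_control(rows):
    """Index of the first sample whose EWMA falls outside its limits, or None."""
    for i, z, lcl, ucl in rows:
        if z < lcl or z > ucl:
            return i
    return None

# Hypothetical standardized series: in control, then a sustained upward shift.
series = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]
signal_at = first_out_of_control(ewma_chart(series, target=0.0, sigma=1.0))  # 5
```

Because the limits depend on `target` and `sigma`, the same readings can signal at different samples, or not at all, depending on whether those parameters come from transformed or nontransformed data.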
Saulnier, George E.; Castro, Janna C.
2014-01-01
Glucose control can be problematic in critically ill patients. We evaluated the impact of statistical transformation on interpretation of intensive care unit inpatient glucose control data. Point-of-care blood glucose (POC-BG) data derived from patients in the intensive care unit for 2011 was obtained. Box–Cox transformation of POC-BG measurements was performed, and distribution of data was determined before and after transformation. Different data subsets were used to establish statistical upper and lower control limits. Exponentially weighted moving average (EWMA) control charts constructed from April, October, and November data determined whether out-of-control events could be identified differently in transformed versus nontransformed data. A total of 8679 POC-BG values were analyzed. POC-BG distributions in nontransformed data were skewed but approached normality after transformation. EWMA control charts revealed differences in projected detection of out-of-control events. In April, an out-of-control process resulting in the lower control limit being exceeded was identified at sample 116 in nontransformed data but not in transformed data. October transformed data detected an out-of-control process exceeding the upper control limit at sample 27 that was not detected in nontransformed data. Nontransformed November results remained in control, but transformation identified an out-of-control event less than 10 samples into the observation period. Using statistical methods to assess population-based glucose control in the intensive care unit could alter conclusions about the effectiveness of care processes for managing hyperglycemia. Further study is required to determine whether transformed versus nontransformed data change clinical decisions about the interpretation of care or intervention results. PMID:24876620
Phoenix, S.L.; Wu, E.M.
1983-03-01
This paper presents some new data on the strength and stress-rupture of Kevlar-49 fibers, fiber/epoxy strands, and pressure vessels, as well as consolidated data obtained at LLNL over the past 10 years. These data are interpreted by using recent theoretical results from a micromechanical model of the statistical failure process, thereby clarifying the roles of the epoxy matrix and ultraviolet radiation in long-term lifetime.
Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall
2016-01-01
Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis also discusses several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach (1) is much more flexible in terms of modeling effort and (2) offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
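Of the classical tools named, the Kaplan-Meier estimator is simple enough to sketch directly. This minimal version assumes right-censored data and uses the usual convention that subjects censored at an event time still count as at risk at that time. The follow-up data are hypothetical, and the dynamic Bayesian network side of the comparison is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each subject
    events: 1 if the event occurred at that time, 0 if the subject was censored
    Returns a list of (time, S(time)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this time; censored ones still count as at risk.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up (years) for five subjects; 0 marks censoring.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
# approximately [(1, 0.8), (2, 0.6), (3, 0.3)]
```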
Onisko, Agnieszka; Druzdzel, Marek J.; Austin, R. Marshall
2016-01-01
Background: Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. Aim: The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. Materials and Methods: This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan–Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. Results: The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis also discusses several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Conclusion: Our study shows that the Bayesian approach (1) is much more flexible in terms of modeling effort and (2) offers an individualized risk assessment, which is more cumbersome for classical statistical approaches. PMID:28163973
Statistics of randomly cross-linked polymer models to interpret chromatin conformation capture data
NASA Astrophysics Data System (ADS)
Shukron, O.; Holcman, D.
2017-07-01
Polymer models are used to describe chromatin, which can be folded at different spatial scales by binding molecules. By folding, chromatin generates loops of various sizes. We present here a statistical analysis of the randomly cross-linked (RCL) polymer model, where monomer pairs are connected randomly, generating a heterogeneous ensemble of chromatin conformations. We obtain asymptotic formulas for the steady-state variance, encounter probability, the radius of gyration, instantaneous displacement, and the mean first encounter time between any two monomers. The analytical results are confirmed by Brownian simulations. Finally, the present results are used to extract the mean number of cross links in a chromatin region from conformation capture data.
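Two ingredients named above, random cross-linking of monomer pairs and the radius of gyration, can be sketched as follows. This is only an illustration under stated assumptions. The helper names are hypothetical, and cross-links are drawn uniformly among non-nearest-neighbor pairs of a linear chain. Generating conformations, for example by Brownian simulation, is not shown.

```python
import math
import random

def random_cross_links(n_monomers, n_links, seed=0):
    """Choose distinct non-nearest-neighbor monomer pairs uniformly at random."""
    rng = random.Random(seed)
    candidates = [(i, j) for i in range(n_monomers) for j in range(i + 2, n_monomers)]
    return sorted(rng.sample(candidates, n_links))

def radius_of_gyration(positions):
    """Root-mean-square distance of monomer positions from their center of mass."""
    n = len(positions)
    dim = len(positions[0])
    center = [sum(p[k] for p in positions) / n for k in range(dim)]
    msd = sum(sum((p[k] - center[k]) ** 2 for k in range(dim)) for p in positions) / n
    return math.sqrt(msd)

links = random_cross_links(n_monomers=100, n_links=10)
rg = radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])  # 1.0
```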
QC Metrics from CPTAC Raw LC-MS/MS Data Interpreted through Multivariate Statistics
2015-01-01
Shotgun proteomics experiments integrate a complex sequence of processes, any of which can introduce variability. Quality metrics computed from LC-MS/MS data have relied upon identifying MS/MS scans, but a new mode for the QuaMeter software produces metrics that are independent of identifications. Rather than evaluating each metric independently, we have created a robust multivariate statistical toolkit that accommodates the correlation structure of these metrics and allows for hierarchical relationships among data sets. The framework enables visualization and structural assessment of variability. Study 1 for the Clinical Proteomics Technology Assessment for Cancer (CPTAC), which analyzed three replicates of two common samples at each of two time points among 23 mass spectrometers in nine laboratories, provided the data to demonstrate this framework, and CPTAC Study 5 provided data from complex lysates under Standard Operating Procedures (SOPs) to complement these findings. Identification-independent quality metrics enabled the differentiation of sites and run-times through robust principal components analysis and subsequent factor analysis. Dissimilarity metrics revealed outliers in performance, and a nested ANOVA model revealed the extent to which all metrics or individual metrics were impacted by mass spectrometer and run time. Study 5 data revealed that even when SOPs have been applied, instrument-dependent variability remains prominent, although it may be reduced, while within-site variability is reduced significantly. Finally, identification-independent quality metrics were shown to be predictive of identification sensitivity in these data sets. QuaMeter and the associated multivariate framework are available from http://fenchurch.mc.vanderbilt.edu and http://homepages.uc.edu/~wang2x7/, respectively. PMID:24494671
Tasker, Gary D.; Granato, Gregory E.
2000-01-01
Decision makers need viable methods for the interpretation of local, regional, and national highway-runoff and urban-stormwater data, including flows, concentrations, and loads of chemical constituents and sediment, potential effects on receiving waters, and the potential effectiveness of various best management practices (BMPs). Valid (useful for intended purposes), current, and technically defensible stormwater-runoff models are needed to interpret data collected in field studies, to support existing highway and urban-runoff planning processes, to meet National Pollutant Discharge Elimination System (NPDES) requirements, and to provide methods for computation of Total Maximum Daily Loads (TMDLs) systematically and economically. Historically, conceptual, simulation, empirical, and statistical models of varying levels of detail, complexity, and uncertainty have been used to meet various data-quality objectives in the decision-making processes necessary for the planning, design, construction, and maintenance of highways and for other land-use applications. Water-quality simulation models attempt a detailed representation of the physical processes and mechanisms at a given site. Empirical and statistical regional water-quality assessment models provide a more general picture of water quality or changes in water quality over a region. All these modeling techniques share one common aspect: their predictive ability is poor without suitable site-specific data for calibration. To apply the correct model properly, one must understand the classification of variables, the unique characteristics of water-resources data, and the concept of population structure and analysis. The classification of the variables being analyzed may determine which statistical methods are appropriate for data analysis. An understanding of the characteristics of water-resources data is necessary to evaluate the applicability of different statistical methods, to interpret the results of these techniques
Statistics Translated: A Step-by-Step Guide to Analyzing and Interpreting Data
ERIC Educational Resources Information Center
Terrell, Steven R.
2012-01-01
Written in a humorous and encouraging style, this text shows how the most common statistical tools can be used to answer interesting real-world questions, presented as mysteries to be solved. Engaging research examples lead the reader through a series of six steps, from identifying a researchable problem to stating a hypothesis, identifying…
Asfahani, Jamal
2014-02-01
Factor analysis is proposed in this research for interpreting a combination of nuclear well logs (natural gamma ray, density, and neutron porosity) and electrical well logs (long and short normal) in order to characterize the extensive basaltic areas of southern Syria. Kodana well-logging data are used to test and apply the proposed technique. The four resulting score logs make it possible to establish the lithological score cross-section of the studied well. The established cross-section clearly shows the distribution and identification of four kinds of basalt: hard massive basalt, hard basalt, pyroclastic basalt, and the basalt alteration product, clay. The factor analysis technique is successfully applied to the Kodana well-logging data in southern Syria and can be used efficiently when large volumes of well-logging data from several wells, with many variables, must be interpreted. © 2013 Elsevier Ltd. All rights reserved.
Statistical interpretation of traveltime fluctuations
NASA Astrophysics Data System (ADS)
Roth, Michael
1997-02-01
A ray-theoretical relation between the autocorrelation functions of traveltime and slowness fluctuations is established for recording profiles at arbitrary angles to the propagation direction of a plane wave. It follows from this relation that the variance of traveltime fluctuations is independent of the profile orientation and proportional to the variance, ɛ², of slowness fluctuations, to the correlation distance, a, and to the propagation distance, L. The halfwidth of the autocorrelation function of traveltime fluctuations is proportional to a and decreases with increasing profile angle. This relationship allows us to estimate the statistical parameters ɛ and a from observed traveltime fluctuations. Numerical experiments for spatially isotropic random media characterized by a Gaussian autocorrelation function show that the statistical parameters can be reproduced successfully if L/a ≤ 10. For larger L/a, the correlation distance is overestimated and the standard deviation is underestimated. However, the results of the numerical experiments provide empirical factors to correct for these effects. The theory is applied to observed traveltime fluctuations of the Pg phase on a profile of the BABEL project. For the upper crust east of Øland (Sweden), slowness fluctuations with standard deviation ɛ = 2.2-5% and correlation distance a = 330-600 m are found.
The advent of new higher-throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets comprise tens or hundreds of analytes, multiple repeat measures...
Interpretation and use of statistics in nursing research.
Giuliano, Karen K; Polanowicz, Michelle
2008-01-01
A working understanding of the major fundamentals of statistical analysis is required to incorporate the findings of empirical research into nursing practice. The primary focus of this article is to describe common statistical terms, present some common statistical tests, and explain the interpretation of results from inferential statistics in nursing research. An overview of major concepts in statistics, including the distinction between parametric and nonparametric statistics, different types of data, and the interpretation of statistical significance, is reviewed. Examples of some of the most common statistical techniques used in nursing research, such as the Student independent t test, analysis of variance, and regression, are also discussed. Nursing knowledge based on empirical research plays a fundamental role in the development of evidence-based nursing practice. The ability to interpret and use quantitative findings from nursing research is an essential skill for advanced practice nurses to ensure provision of the best care possible for our patients.
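Two of the techniques listed, the Student independent t test and one-way analysis of variance, can be computed from first principles, as in the minimal sketch below. The data are hypothetical, and p values would still require the t or F distribution. For two groups, the ANOVA F statistic equals the square of the pooled t statistic, which is a useful check on both calculations.

```python
import math

def pooled_t(a, b):
    """Student's independent-samples t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb)), na + nb - 2

def one_way_anova_f(groups):
    """One-way ANOVA F statistic; returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ss_between / (k - 1)) / (ss_within / (n - k)), k - 1, n - k

group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
t, df = pooled_t(group_a, group_b)                   # t is about -3.67, df = 4
f, df_b, df_w = one_way_anova_f([group_a, group_b])  # F = 13.5, equal to t**2
```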
Data Interpretation: Using Probability
ERIC Educational Resources Information Center
Drummond, Gordon B.; Vowler, Sarah L.
2011-01-01
Experimental data are analysed statistically to allow researchers to draw conclusions from a limited set of measurements. The hard fact is that researchers can never be certain that measurements from a sample will exactly reflect the properties of the entire group of possible candidates available to be studied (although using a sample is often the…
Local statistical interpretation for water structure
NASA Astrophysics Data System (ADS)
Sun, Qiang
2013-05-01
In this Letter, Raman spectroscopy is employed to study supercooled water down to a temperature of 248 K at ambient pressure. Based on our interpretation of the Raman OH stretching band, decreasing temperature mainly leads to a structural transition from the single donor-single acceptor (DA) to the double donor-double acceptor (DDAA) hydrogen bonding motif. Additionally, a local statistical interpretation of the water structure is proposed, which reveals that a water molecule interacts with molecules in the first shell through various local hydrogen-bonded networks. From this, a local structure order parameter is proposed to explain the short-range order and long-range disorder.
Nash, J. Thomas; Frishman, David
1983-01-01
Analytical results for 61 elements in 370 samples from the Ranger Mine area are reported. Most of the rocks come from drill core in the Ranger No. 1 and Ranger No. 3 deposits, but 20 samples are from unmineralized drill core more than 1 km from ore. Statistical tests show that the elements Mg, Fe, F, Be, Co, Li, Ni, Pb, Sc, Th, Ti, V, Cl, As, Br, Au, Ce, Dy, La Sc, Eu, Tb, Yb, and Tb have positive association with uranium, and Si, Ca, Na, K, Sr, Ba, Ce, and Cs have negative association. For most lithologic subsets, Mg, Fe, Li, Cr, Ni, Pb, V, Y, Sm, Sc, Eu, and Yb are significantly enriched in ore-bearing rocks, whereas Ca, Na, K, Sr, Ba, Mn, Ce, and Cs are significantly depleted. These results are consistent with petrographic observations on altered rocks. Lithogeochemistry can aid exploration, but for these rocks it requires methods that are expensive and not amenable to routine use.
Interpreting statistics of small lunar craters
NASA Technical Reports Server (NTRS)
Schultz, P. H.; Gault, D.; Greeley, R.
1977-01-01
Some of the wide variations in the crater-size distributions in lunar photography and in the resulting statistics were interpreted as different degradation rates on different surfaces, different scaling laws in different targets, and a possible population of endogenic craters. These possibilities are reexamined for statistics of 26 different regions. In contrast to most other studies, crater diameters as small as 5 m were measured from enlarged Lunar Orbiter framelets. According to the results of the reported analysis, the different crater distribution types appear to be most consistent with the hypotheses of differential degradation and a superposed crater population. Differential degradation can account for the low level of equilibrium in incompetent materials such as ejecta deposits, mantle deposits, and deep regoliths where scaling law changes and catastrophic processes introduce contradictions with other observations.
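Crater statistics like these are commonly summarized as a cumulative size-frequency distribution, N(≥D), on log-log axes, where differences in slope distinguish distribution types. The sketch below computes N(≥D) and a least-squares log-log slope. The diameters are invented so that the slope comes out to exactly −2. Thresholds with zero counts would need to be dropped before taking logarithms.

```python
import math

def cumulative_counts(diameters, thresholds):
    """N(>= D): number of craters at least as large as each threshold diameter."""
    return [sum(1 for d in diameters if d >= t) for t in thresholds]

def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x); all values must be positive."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Hypothetical crater diameters in meters.
diameters = [5.0] * 48 + [10.0] * 12 + [20.0] * 3 + [40.0]
thresholds = [5.0, 10.0, 20.0, 40.0]
counts = cumulative_counts(diameters, thresholds)  # [64, 16, 4, 1]
slope = loglog_slope(thresholds, counts)           # -2.0 (up to rounding)
```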
Hemophilia Data and Statistics
... at a very young age. Based on CDC data, the median age at diagnosis is 36 months ...
As watershed groups in the state of Georgia form and develop, they have a need for collecting, managing, and analyzing data associated with their watershed. Possible sources of data for flow, water quality, biology, habitat, and watershed characteristics include the U.S. Geologic...
Use and interpretation of statistics in wildlife journals
Tacha, Thomas C.; Warde, William D.; Burnham, Kenneth P.
1982-01-01
Use and interpretation of statistics in wildlife journals are reviewed, and suggestions for improvement are offered. Populations from which inferences are to be drawn should be clearly defined, and conclusions should be limited to the range of the data analyzed. Authors should be careful to avoid improper methods of plotting data and should clearly define the use of estimates of variance, standard deviation, standard error, or confidence intervals. Biological and statistical significance are often confused by authors and readers. Statistical hypothesis testing is a tool, and not every question should be answered by hypothesis testing. Meeting assumptions of hypothesis tests is the responsibility of authors, and assumptions should be reviewed before a test is employed. The use of statistical tools should be considered carefully both before and after gathering data.
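The four quantities the review asks authors to distinguish are computed from the same sample, but they describe different things. The following is a minimal sketch with an invented sample. The function name `describe` is hypothetical. The interval uses a large-sample normal approximation, and for small samples a Student t quantile would be more appropriate.

```python
import math
import statistics

def describe(sample, confidence=0.95):
    """Variance, SD, SE and an approximate confidence interval for the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    var = statistics.variance(sample)      # sample variance (n - 1 denominator)
    sd = math.sqrt(var)                    # spread of individual observations
    se = sd / math.sqrt(n)                 # uncertainty of the sample mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (mean - z * se, mean + z * se)    # normal-approximation CI for the mean
    return {"mean": mean, "variance": var, "sd": sd, "se": se, "ci": ci}

result = describe([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical measurements
```

The SD describes variation among animals or plots. The SE and CI describe how precisely the mean is estimated, so a figure should state which of these its error bars show.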
Data collection and interpretation.
Citerio, Giuseppe; Park, Soojin; Schmidt, J Michael; Moberg, Richard; Suarez, Jose I; Le Roux, Peter D
2015-06-01
Patient monitoring is routinely performed in all patients who receive neurocritical care. The combined use of monitors, including the neurologic examination, laboratory analysis, imaging studies, and physiological parameters, is common in a platform called multi-modality monitoring (MMM). However, the full potential of MMM is only beginning to be realized since for the most part, decision making historically has focused on individual aspects of physiology in a largely threshold-based manner. The use of MMM now is being facilitated by the evolution of bio-informatics in critical care including developing techniques to acquire, store, retrieve, and display integrated data and new analytic techniques for optimal clinical decision making. In this review, we will discuss the crucial initial steps toward data and information management, which in this emerging era of data-intensive science is already shifting concepts of care for acute brain injury and has the potential to both reshape how we do research and enhance cost-effective clinical care.
Pre-service teachers challenges while interpreting statistical graphs
NASA Astrophysics Data System (ADS)
Wahid, Norabiatul Adawiah Abd; Rahim, Suzieleez Syrene Abdul; Zamri, Sharifah Norul Akmar Syed
2017-05-01
Statistical graphs are now widely used as a medium of communication. The Ministry of Education has recognized their importance and has therefore included this topic in the national standard curriculum as early as Standard 3, which indicates that this field of study is important for students. However, pre-service teachers still face difficulties in comprehending various types of statistical graphs. Among the problems they face is difficulty relating two statistical graphs that address related issues. This study therefore examines the types of difficulties pre-service teachers encounter when they need to interpret two statistical graphs that address related issues. We focus on interview data, which give evidence of several problems pre-service teachers face when they try to comprehend graphs that are related to each other. The discussion of results may contribute to an understanding of the complexity of interpreting such graphs and to possible solutions for the problems that arise.
Interpreting Data: The Hybrid Mind
ERIC Educational Resources Information Center
Heisterkamp, Kimberly; Talanquer, Vicente
2015-01-01
The central goal of this study was to characterize major patterns of reasoning exhibited by college chemistry students when analyzing and interpreting chemical data. Using a case study approach, we investigated how a representative student used chemical models to explain patterns in the data based on structure-property relationships. Our results…
Interpreting Shock Tube Ignition Data
2003-10-01
times only for high concentrations (of order 1% fuel or greater). The requirements of engine (IC, HCCI, CI and SI) modelers also present a different ... Paper 03F-61, Interpreting Shock Tube Ignition Data. D. F. Davidson and R. K. Hanson, Mechanical Engineering ... Engineering Department, Stanford University, Stanford CA 94305. Abstract: Chemical kinetic modelers make extensive use of shock tube ignition data
Rossell, David
2016-01-01
Big Data brings unprecedented power to address scientific, economic and societal issues, but also amplifies the possibility of certain pitfalls. These include using purely data-driven approaches that disregard understanding the phenomenon under study, aiming at a dynamically moving target, ignoring critical data collection issues, summarizing or preprocessing the data inadequately and mistaking noise for signal. We review some success stories and illustrate how statistical principles can help obtain more reliable information from data. We also touch upon current challenges that require active methodological research, such as strategies for efficient computation, integration of heterogeneous data, extending the underlying theory to increasingly complex questions and, perhaps most importantly, training a new generation of scientists to develop and deploy these strategies. PMID:27722040
The Statistical Interpretation of Entropy: An Activity
ERIC Educational Resources Information Center
Timberlake, Todd
2010-01-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the…
The Statistical Interpretation of Entropy: An Activity
NASA Astrophysics Data System (ADS)
Timberlake, Todd
2010-11-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the functioning of the second law and also provided evidence for the existence of atoms at a time when many scientists (like Ernst Mach and Wilhelm Ostwald) were skeptical.
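Boltzmann's statistical interpretation ties entropy to the number of microstates Ω compatible with a macrostate, through S = k ln Ω. The following is a minimal sketch that uses N two-state particles (coins) as a stand-in. This is a common classroom model and not necessarily the one used in the activity described above, and the function name is hypothetical. The macrostate with the particles split evenly has the most microstates, which is why an isolated system left to itself is overwhelmingly likely to be found near it.

```python
import math

def entropy_over_k(n_total, n_up):
    """Boltzmann entropy S/k = ln(Omega) for N two-state particles with n_up 'up'.

    Omega = C(N, n_up) counts which particles are up within that macrostate.
    """
    return math.log(math.comb(n_total, n_up))

N = 100
most_likely = max(range(N + 1), key=lambda n: entropy_over_k(N, n))
print(most_likely)  # prints 50: the evenly split macrostate
```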
The Statistical Interpretation of Classical Thermodynamic Heating and Expansion Processes
ERIC Educational Resources Information Center
Cartier, Stephen F.
2011-01-01
A statistical model has been developed and applied to interpret thermodynamic processes typically presented from the macroscopic, classical perspective. Through this model, students learn and apply the concepts of statistical mechanics, quantum mechanics, and classical thermodynamics in the analysis of the (i) constant volume heating, (ii)…
STATISTICAL DATA ON CHEMICAL COMPOUNDS.
DATA STORAGE SYSTEMS, FEASIBILITY STUDIES, COMPUTERS, STATISTICAL DATA , DOCUMENTS, ARMY...CHEMICAL COMPOUNDS, INFORMATION RETRIEVAL), (*INFORMATION RETRIEVAL, CHEMICAL COMPOUNDS), MOLECULAR STRUCTURE, BIBLIOGRAPHIES, DATA PROCESSING
Linda Stetzenbach; Lauren Nemnich; Davor Novosel
2009-08-31
Three independent tasks had been performed (Stetzenbach 2008, Stetzenbach 2008b, Stetzenbach 2009) to measure a variety of parameters in normative buildings across the United States. For each of these tasks, 10 buildings were selected as normative indoor environments. Task 1 focused on office buildings, Task 13 on public schools, and Task 0606 on high performance buildings. To perform this task, it was necessary to restructure the database for the Indoor Environmental Quality (IEQ) data and the Sound measurements, as several issues were identified and resolved before and during the transfer of these data sets into SPSS. During overview discussions with the statistician utilized in this task, it was determined that because indoor zones (1-6) were selected independently within each task, zones were not related by location across tasks. Therefore, no comparison across zones would be valid for the 30 buildings, so the by-location (zone) data were limited to three analysis sets, one for the buildings within each task. In addition, different lighting collection procedures were used in Task 0606 than in Tasks 01 & 13 to improve sample collection. Therefore, these data sets could not be merged and compared, so by-day effects were run separately for Task 0606 and only the Task 01 & 13 data were merged. The statistical analysis of the IEQ parameters showed statistically significant differences among days and zones for all tasks, although no by-day differences were found for Draft Rate data from Task 0606 (p>0.05). Thursday measurements of IEQ parameters were significantly different from Tuesday and most Wednesday measures for all variables of Tasks 1 & 13. Data for all three days appeared to vary for Operative Temperature, whereas only Tuesday and Thursday differed for Draft Rate 1m. Although no Draft Rate measures within Task 0606 were found to differ significantly by day, Temperature measurements for Tuesday and
Spirakis, C.S.; Pierson, C.T.; Santos, E.S.; Fishman, N.S.
1983-01-01
Statistical treatment of analytical data from 106 samples of uranium-mineralized and unmineralized or weakly mineralized rocks of the Morrison Formation from the northeastern part of the Church Rock area of the Grants uranium region indicates that along with uranium, the deposits in the northeast Church Rock area are enriched in barium, sulfur, sodium, vanadium and equivalent uranium. Selenium and molybdenum are sporadically enriched in the deposits and calcium, manganese, strontium, and yttrium are depleted. Unlike the primary deposits of the San Juan Basin, the deposits in the northeast part of the Church Rock area contain little organic carbon and several elements that are characteristically enriched in the primary deposits are not enriched or are enriched to a much lesser degree in the Church Rock deposits. The suite of elements associated with the deposits in the northeast part of the Church Rock area is also different from the suite of elements associated with the redistributed deposits in the Ambrosia Lake district. This suggests that the genesis of the Church Rock deposits is different, at least in part, from the genesis of the primary deposits of the San Juan Basin or the redistributed deposits at Ambrosia Lake.
NASA Astrophysics Data System (ADS)
Tema, E.; Zanella, E.; Pavón-Carrasco, F. J.; Kondopoulou, D.; Pavlides, S.
2015-10-01
We present the results of palaeomagnetic analysis of Late Bronze Age pottery from Santorini, carried out in order to estimate the thermal effect of the Minoan eruption on the pre-Minoan habitation level. A total of 170 specimens from 108 ceramic fragments have been studied. The ceramics were collected from the surface of the pre-Minoan palaeosol at six different sites, including samples from the Akrotiri archaeological site. The deposition temperatures of the first pyroclastic products have been estimated from the maximum overlap of the re-heating temperature intervals given by the individual fragments at site level. A new statistical elaboration of the temperature data has also been proposed, calculating the re-heating temperatures at each site at 95 per cent probability. The results show that the precursor tephra layer and the first pumice fall of the eruption were hot enough to re-heat the underlying ceramics to temperatures of 160-230 °C at the non-inhabited sites, while the temperatures recorded inside the Akrotiri village are slightly lower, varying from 130 to 200 °C. The lower temperatures registered in the human settlements suggest that there was some interaction between the buildings and the pumice fallout deposits, and that the layer of building debris caused by the preceding and syn-eruption earthquakes probably also contributed to the decrease in the recorded re-heating temperatures.
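The site-level estimate described above takes the temperature range on which the fragments' re-heating intervals overlap. The following is a minimal sketch under two assumptions: each fragment contributes one closed interval in °C, and "maximum overlap" means the range covered by the largest number of fragments, so that one outlying fragment does not empty the result. The function name and the intervals are hypothetical and are not the study's data.

```python
def max_overlap(intervals):
    """Return (count, (low, high)) for the range covered by the most intervals.

    intervals: closed (low, high) re-heating ranges in °C, one per fragment.
    Ties keep the lowest-temperature segment; an empty input gives (0, None).
    """
    # Openings (0) sort before closings (1) at equal temperatures, so intervals
    # that merely touch are counted as overlapping (closed intervals).
    events = sorted([(lo, 0) for lo, _ in intervals] + [(hi, 1) for _, hi in intervals])
    best, count, start, tracking, best_range = 0, 0, None, False, None
    for t, kind in events:
        if kind == 0:
            count += 1
            if count > best:          # a new maximum begins at this opening
                best, start, tracking = count, t, True
        else:
            if tracking:              # first closing after the maximum ends it
                best_range, tracking = (start, t), False
            count -= 1
    return best, best_range

site = [(160, 230), (180, 250), (150, 200)]   # hypothetical fragment intervals
print(max_overlap(site))                       # prints (3, (180, 200))
```

When every fragment overlaps, the result reduces to the plain intersection (highest lower bound, lowest upper bound).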
Interpreting NHANES biomonitoring data, cadmium.
Ruiz, Patricia; Mumtaz, Moiz; Osterloh, John; Fisher, Jeffrey; Fowler, Bruce A
2010-09-15
Cadmium (Cd) occurs naturally in the environment, and the general population's exposure to it is predominantly through diet. Chronic Cd exposure is a public health concern because Cd is a known carcinogen; it accumulates in the body and causes kidney damage. The National Health and Nutrition Examination Survey (NHANES) has measured urinary Cd; the 2003-2004 NHANES survey cycle reported estimates for 2257 persons aged 6 years and older in the Fourth National Report on Human Exposure to Environmental Chemicals. As part of translational research to make computerized models accessible to health risk assessors, we re-coded a cadmium model in the Berkeley Madonna simulation language. This model was used in our computational toxicology laboratory to predict the urinary excretion of cadmium. The model simulated the NHANES-measured data very well from ages 6 to 60+ years. An unusual increase in Cd urinary excretion was observed among 6-11-year-olds, followed by a continuous monotonic rise into the seventh decade of life. This observation, also made in earlier studies, could be life-stage related and a function of anatomical and physiological changes occurring during this period of life. Urinary excretion of Cd was approximately twofold higher among females than males in all age groups. The model describes Cd's cumulative nature in humans and accommodates the observed variation in exposure/uptake over the course of a lifetime. Such models may be useful for interpreting biomonitoring data and for risk assessment.
Hahn, A.A.
1994-11-01
The complexity of instrumentation sometimes requires data analysis to be done before the result is presented to the control room. This tutorial reviews some of the theoretical assumptions underlying the more popular forms of data analysis and presents simple examples to illuminate the advantages and hazards of different techniques.
Comparing survival curves using an easy to interpret statistic.
Hess, Kenneth R
2010-10-15
Here, I describe a statistic for comparing two survival curves that has a clear and obvious meaning and has a long history in biostatistics. Suppose we are comparing survival times associated with two treatments A and B. The statistic operates in such a way that if it takes on the value 0.95, then the interpretation is that a randomly chosen patient treated with A has a 95% chance of surviving longer than a randomly chosen patient treated with B. This statistic was first described in the 1950s, and was generalized in the 1960s to work with right-censored survival times. It is a useful and convenient measure for assessing differences between survival curves. Software for computing the statistic is readily available on the Internet.
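For complete (uncensored) data, the probability described above can be estimated by comparing every patient in group A with every patient in group B. This is the quantity the Mann-Whitney U statistic estimates when it is divided by the number of pairs. The following is a minimal sketch that assumes no censoring and counts tied pairs as one half. Handling right-censored times, the generalization mentioned above, requires excluding pairs whose order is unknown and is not shown. The function name and the survival times are hypothetical.

```python
from itertools import product

def prob_a_outlives_b(times_a, times_b):
    """Estimate P(T_A > T_B) from uncensored survival times; ties count 1/2."""
    score = 0.0
    for a, b in product(times_a, times_b):
        if a > b:
            score += 1.0
        elif a == b:
            score += 0.5
    return score / (len(times_a) * len(times_b))

p = prob_a_outlives_b([5, 7, 9], [1, 6, 9])   # hypothetical months of survival
```

With ties split evenly, `prob_a_outlives_b(A, B) + prob_a_outlives_b(B, A)` is always 1, so a value of 0.5 means neither treatment tends to give longer survival.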
Spina Bifida Data and Statistics
... non-Hispanic white and non-Hispanic black women. Data from 12 state-based birth defects tracking programs ...
Muscular Dystrophy: Data and Statistics
... Key Findings on this research. Basic Data. How we got these numbers: The tables below ...
Birth Defects Data and Statistics
... of birth defects in the United States. For data on specific birth defects, please visit the specific ...
Workplace Statistical Literacy for Teachers: Interpreting Box Plots
ERIC Educational Resources Information Center
Pierce, Robyn; Chick, Helen
2013-01-01
As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are increasingly expected to interpret and apply complex data about student and school performance, and yet it is not clear that they always have the…
[Big data in official statistics].
Zwick, Markus
2015-08-01
The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Before big data can be used correctly in official statistics, many questions need to be answered and problems solved: the quality of data, data protection, privacy, and sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have agreed on a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.
Adapting internal statistical models for interpreting visual cues to depth
Seydell, Anna; Knill, David C.; Trommershäuser, Julia
2010-01-01
The informativeness of sensory cues depends critically on statistical regularities in the environment. However, statistical regularities vary between different object categories and environments. We asked whether and how the brain changes the prior assumptions about scene statistics used to interpret visual depth cues when stimulus statistics change. Subjects judged the slants of stereoscopically presented figures by adjusting a virtual probe perpendicular to the surface. In addition to stereoscopic disparities, the aspect ratio of the stimulus in the image provided a “figural compression” cue to slant, whose reliability depends on the distribution of aspect ratios in the world. As we manipulated this distribution from regular to random and back again, subjects’ reliance on the compression cue relative to stereoscopic cues changed accordingly. When we randomly interleaved stimuli from shape categories (ellipses and diamonds) with different statistics, subjects gave less weight to the compression cue for figures from the category with more random aspect ratios. Our results demonstrate that relative cue weights vary rapidly as a function of recently experienced stimulus statistics, and that the brain can use different statistical models for different object categories. We show that subjects’ behavior is consistent with that of a broad class of Bayesian learning models. PMID:20465321
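The reliance on each cue described above is often modeled as a weight proportional to that cue's reliability, defined as its inverse variance. This is the standard maximum-likelihood rule for combining independent Gaussian cues. The following is a minimal sketch under that assumption. It is not the Bayesian learning model fitted in the study, and the function name, slant values and variances are invented.

```python
def combine_cues(estimates, variances):
    """Reliability-weighted average of independent Gaussian cue estimates.

    Each weight is proportional to reliability = 1 / variance.
    Returns (combined estimate, combined variance, weights).
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, 1.0 / total, weights

# Hypothetical slant estimates (deg): stereo disparity vs. figural compression.
slant, var, w = combine_cues([30.0, 40.0], [4.0, 16.0])   # weights 0.8 and 0.2
```

Raising the compression cue's variance, as a more random distribution of aspect ratios would, lowers its weight. This matches the direction of the behavioral change reported above.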
Statistical Interpretation of Natural and Technological Hazards in China
NASA Astrophysics Data System (ADS)
Borthwick, Alistair; Ni, Jinren
2010-05-01
China is prone to catastrophic natural hazards from floods, droughts, earthquakes, storms, cyclones, landslides, epidemics, extreme temperatures, forest fires, avalanches, and even tsunami. This paper will list statistics related to the six worst natural disasters in China over the past 100 or so years, ranked according to number of fatalities. The corresponding data for the six worst natural disasters in China over the past decade will also be considered. [The data are abstracted from the International Disaster Database, Centre for Research on the Epidemiology of Disasters (CRED), Université Catholique de Louvain, Brussels, Belgium, http://www.cred.be/ where a disaster is defined as occurring if one of the following criteria is fulfilled: 10 or more people reported killed; 100 or more people reported affected; a call for international assistance; or declaration of a state of emergency.] The statistics include the number of occurrences of each type of natural disaster, the number of deaths, the number of people affected, and the cost in billions of US dollars. Over the past hundred years, the largest disasters may be related to the overabundance or scarcity of water, and to earthquake damage. However, there has been a substantial relative reduction in fatalities due to water-related disasters over the past decade, even though the overall numbers of people affected remain huge, as does the economic damage. This change is largely due to efforts by China's water authorities to establish effective early warning systems, to construct engineering countermeasures for flood protection, and to implement water pricing and other measures for reducing excessive consumption during times of drought. It should be noted that the dreadful death toll due to the Sichuan Earthquake dominates recent data. Joint research has been undertaken between the Department of Environmental Engineering at Peking University and the Department of Engineering Science at Oxford
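The CRED inclusion rule quoted above is a disjunction, so meeting any one of the four criteria is enough for an event to count as a disaster. The following is a minimal sketch of that rule, together with a tally of occurrences, deaths and people affected by disaster type. The record layout (one dict per event), the function names and the events are hypothetical and are not part of the CRED database's own tools.

```python
from collections import defaultdict

def meets_cred_criteria(event):
    """True if at least one of the four quoted disaster criteria holds."""
    return (event["killed"] >= 10 or event["affected"] >= 100
            or event["international_appeal"] or event["state_of_emergency"])

def summarize_by_type(events):
    """Count qualifying events, deaths and people affected per disaster type."""
    totals = defaultdict(lambda: {"occurrences": 0, "killed": 0, "affected": 0})
    for e in events:
        if meets_cred_criteria(e):
            t = totals[e["type"]]
            t["occurrences"] += 1
            t["killed"] += e["killed"]
            t["affected"] += e["affected"]
    return dict(totals)

events = [  # invented records for illustration only
    {"type": "flood", "killed": 30, "affected": 5000,
     "international_appeal": False, "state_of_emergency": False},
    {"type": "flood", "killed": 2, "affected": 40,
     "international_appeal": False, "state_of_emergency": False},
    {"type": "earthquake", "killed": 0, "affected": 0,
     "international_appeal": False, "state_of_emergency": True},
]
summary = summarize_by_type(events)
```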
Statistical interpretation of ISO TC42 dynamic range: risky business
NASA Astrophysics Data System (ADS)
Williams, Don; Burns, Peter D.; Dupin, Michael
2006-01-01
Recently, two ISO electronic imaging standards aimed at digital capture device dynamic range metrology have been issued. Both ISO 15739 (digital still camera noise) and ISO 21550 (film scanner dynamic range) adopt a signal-to-noise ratio (SNR) criterion for specifying dynamic range. To compare systems robustly despite differing mean-signal transfer, or Opto-Electronic Conversion Functions (OECF), an incremental SNR (SNRi) is used. The exposure levels that correspond to threshold-SNR values are used as endpoints to determine measured dynamic range. While these thresholds were developed through committee consensus with generic device applications in mind, the methodology of these standards is flexible enough to accommodate different application requirements. This can be done by setting the SNR thresholds according to particular signal-detection requirements. We will show how dynamic range metrology, as defined in the above standards, can be interpreted in terms of statistical hypothesis testing and confidence interval methods for mean signal values. We provide an interpretation of dynamic range that can be related to particular applications based on contributing influences of variance, confidence intervals, and sample size variables. In particular, we introduce the role of the spatial-correlation statistics for both signal and noise sources, not covered in previous discussions of these ISO standards. These can be interpreted in terms of a signal's spatial frequency spectrum and noise power spectrum (NPS), respectively. It is this frequency aspect of dynamic range evaluation that may well influence future standards. We maintain that this is important when comparing systems with different sampling settings, since the above noise statistics are currently computed on a per-pixel basis.
Statistical Interpretation of Key Comparison Reference Value and Degrees of Equivalence
Kacker, R. N.; Datla, R. U.; Parr, A. C.
2003-01-01
Key comparisons carried out by the Consultative Committees (CCs) of the International Committee of Weights and Measures (CIPM) or the Bureau International des Poids et Mesures (BIPM) are referred to as CIPM key comparisons. The outputs of a statistical analysis of the data from a CIPM key comparison are the key comparison reference value, the degrees of equivalence, and their associated uncertainties. The BIPM publications do not discuss statistical interpretation of these outputs. We discuss their interpretation under the following three statistical models: nonexistent laboratory-effects model, random laboratory-effects model, and systematic laboratory-effects model. PMID:27413621
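Under the first of these models, where there are no laboratory effects, one common way to compute the outputs is to take the KCRV as the inverse-variance weighted mean of the participants' results. The following is a minimal sketch under that assumption, with a hypothetical function name and invented numbers. It shows only the simplest of the three models discussed. Because each laboratory's own result contributes to the weighted mean, the variance of its degree of equivalence is u_i² − u_ref² rather than u_i² + u_ref².

```python
import math

def kcrv_and_doe(values, uncertainties):
    """Inverse-variance weighted mean as reference value, plus degrees of equivalence.

    Assumes independent laboratory results x_i with standard uncertainties u_i.
    Returns (x_ref, u_ref, [(d_i, u(d_i)), ...]).
    """
    weights = [1.0 / u ** 2 for u in uncertainties]
    total = sum(weights)
    x_ref = sum(w * x for w, x in zip(weights, values)) / total
    u_ref = math.sqrt(1.0 / total)
    doe = []
    for x, u in zip(values, uncertainties):
        d = x - x_ref
        # Cov(x_i, x_ref) = u_ref^2, so Var(x_i - x_ref) = u_i^2 - u_ref^2.
        doe.append((d, math.sqrt(u ** 2 - u_ref ** 2)))
    return x_ref, u_ref, doe

x_ref, u_ref, doe = kcrv_and_doe([10.0, 12.0], [1.0, 1.0])   # two hypothetical labs
```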
Statistical description for survival data
2016-01-01
Statistical description is always the first step in data analysis. It gives the investigator a general impression of the data at hand. Traditionally, data are described by central tendency and deviation. However, this framework does not fit survival data (also termed time-to-event data). Such data contain two components: the survival time and the event status. Researchers are usually interested in the probability of the event at a given survival time point. The hazard function, cumulative hazard function and survival function are commonly used to describe survival data. The survival function can be estimated using the Kaplan-Meier estimator, which is also the default method in most statistical packages. Alternatively, the Nelson-Aalen estimator of the cumulative hazard H(t) can be used, with the survival function estimated as exp(-H(t)). Survival functions of subgroups can be compared using the log-rank test. The article also introduces how to describe time-to-event data with parametric modeling. PMID:27867953
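Both estimators mentioned above work through the distinct event times in order and use the number of subjects still at risk at each time. The following is a minimal sketch assuming event status coded as 1 for an observed event and 0 for censoring. Subjects censored at an event time are kept in that time's risk set, which is the usual convention. The function name and the data are hypothetical, and the log-rank test is not shown.

```python
import math

def km_and_nelson_aalen(times, events):
    """Kaplan-Meier S(t) and Nelson-Aalen H(t) at each distinct event time.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns a list of (t, S_km(t), H_na(t), exp(-H_na(t))).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, cumhaz = 1.0, 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk        # Kaplan-Meier product step
            cumhaz += deaths / at_risk            # Nelson-Aalen increment
            out.append((t, surv, cumhaz, math.exp(-cumhaz)))
        at_risk -= removed                        # events and censorings leave the risk set
    return out

table = km_and_nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for t, s_km, h_na, s_na in table:
    print(t, round(s_km, 3), round(h_na, 3), round(s_na, 3))
```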
The interpretation of spectral data
NASA Technical Reports Server (NTRS)
Holter, M. R.
1972-01-01
The characteristics and extent of data obtainable by electromagnetic spectrum sensing, and their application to earth resources surveys, are discussed. The wavelength and frequency ranges of operation for various remote sensors are tabulated. The spectral sensitivities of various sensing instruments are diagrammed. Examples of aerial photography are provided to show the effects of lighting and seasonal variations on earth resources data. Specific examples of multiband photography and multispectral imagery applied to crop analysis are included.
Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1990-01-01
A prototype of an expert system was developed which applies qualitative or model-based reasoning to the task of post-test analysis and diagnosis of data resulting from a rocket engine firing. A combined component-based and process theory approach is adopted as the basis for system modeling. Such an approach provides a framework for explaining both normal and deviant system behavior in terms of individual component functionality. The diagnosis function is applied to digitized sensor time-histories generated during engine firings. The generic system is applicable to any liquid rocket engine but was adapted specifically in this work to the Space Shuttle Main Engine (SSME). The system is applied to idealized data resulting from turbomachinery malfunction in the SSME.
Structural interpretation of seismic data and inherent uncertainties
NASA Astrophysics Data System (ADS)
Bond, Clare
2013-04-01
Geoscience is perhaps unique in its reliance on incomplete datasets and building knowledge from their interpretation. This interpretive basis is fundamental at all levels, from the creation of a geological map to the interpretation of remotely sensed data. To teach and better understand the uncertainties in dealing with incomplete data, we need to understand the strategies that make individual practitioners effective interpreters. The nature of interpretation is such that interpreters must use their cognitive ability in analyzing the data to propose a sensible solution that is consistent not only with the original data but also with other knowledge and understanding. In a series of experiments, Bond et al. (2007, 2008, 2011, 2012) investigated the strategies and pitfalls of expert and non-expert interpretation of seismic images. These studies used large numbers of participants to provide a statistically sound basis for analysis of the results. The experiments showed that a wide variety of conceptual models were applied to single seismic datasets, highlighting not only spatial variations in fault placement but also disagreement over whether faults existed at all and whether they had the same sense of movement. Further, statistical analysis suggests that the strategies an interpreter employs are more important than expert knowledge per se in developing successful interpretations; experts are successful because they apply these techniques. A new set of experiments focuses on a small number of experts to determine how they use their cognitive and reasoning skills in the interpretation of 2D seismic profiles. Live video and practitioner commentary were used to track the evolving interpretation and to gain insight into their decision processes. The outputs of the study allow us to create an educational resource of expert interpretation through online video footage and commentary with
The broad topic of biomarker research has an often-overlooked component: the documentation and interpretation of the surrounding chemical environment and other meta-data, especially from visualization, analytical, and statistical perspectives (Pleil et al. 2014; Sobus et al. 2011...
Optical Processing Of Statistical Data
NASA Astrophysics Data System (ADS)
Bohm, H.; Lohmann, A. W.; Weigelt, G. P.
1980-08-01
The performance/price ratio of digital data processing is steadily increasing, while that of optical data processing remains nearly constant, although at a favourable level. Given this trend, one might ask: is there still a future for optical data processing? This question cannot be answered in general, since optical data processing is very capable at some jobs but clumsy or incapable at others. The jobs suited to optics are characterised by features such as a high data rate, large storage requirements, moderate accuracy, and a repetitive program consisting mainly of linear and quadratic operations. Certain statistical computing problems belong to this category. We present two examples and analyse their data processing efficiencies. The examples are useful in astronomy (speckle interferometry) and in biology (motility studies on bacteria).
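Speckle interferometry is a good example of the "repetitive, quadratic" workload described above: its basic statistic is the power spectrum |FFT|² of each short-exposure frame, averaged over many frames. The sketch below computes that average digitally with NumPy, purely as an illustration of the operation an optical processor would perform in parallel; the synthetic frames are hypothetical stand-ins, not data from the paper.

```python
import numpy as np

def mean_power_spectrum(frames: np.ndarray) -> np.ndarray:
    """Average |FFT|^2 over a stack of short-exposure frames.

    frames has shape (n_frames, ny, nx); the result has shape (ny, nx).
    """
    # 2-D FFT of every frame over the last two axes, then squared modulus.
    spectra = np.abs(np.fft.fft2(frames, axes=(-2, -1))) ** 2
    # Averaging over frames is the repetitive quadratic step.
    return spectra.mean(axis=0)

# Synthetic stand-in for a stack of speckle frames (hypothetical data).
rng = np.random.default_rng(0)
frames = rng.random((50, 32, 32))
ps = mean_power_spectrum(frames)
```

Each output pixel is independent of the others, which is why the computation maps naturally onto parallel (optical or vectorised) hardware.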
Statistically significant relational data mining :
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.
2014-02-01
This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparing community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model; statisticians favor such models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Interpreting statistics from published research to answer clinical and management questions.
White, B J; Larson, R L; Theurer, M E
2016-11-01
Appropriate statistical analysis is critical in interpreting results from published literature to answer clinical and management questions. Internal validity is an assessment of whether the study design and statistical analysis are appropriate for the hypotheses and study variables while controlling for bias and confounding. External validity is an assessment of whether the study results can appropriately be extrapolated to other populations. Whether treatment or observation groups truly differ is unknown, but studies can be broadly categorized as exploratory or discovery studies based on knowledge about previous research, biology, and study design, and this categorization affects interpretation. Confidence intervals, P-values, prediction intervals, credible intervals, and other decision aids are used singly or in combination to provide evidence for the likelihood of a given model, but they can be interpreted only if the study is internally valid. These decision aids do not test for bias, study design, or the appropriateness of applying study results to populations dissimilar to the one tested. The biologic and economic importance of the magnitude of the difference between treatment or observation groups, as estimated by the study data and statistical interpretation, should be considered in clinical and management decisions. Statistical results should be interpreted in light of the specific question and production system addressed, the study design, and knowledge about pertinent aspects of biology to appropriately aid decisions.
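Confidence intervals and prediction intervals answer different questions: the first bounds the population mean, the second bounds a single new observation, and so the second is always wider. A minimal sketch, assuming a normal approximation (a z quantile rather than the t quantile that would be more appropriate for small samples) and hypothetical weight data:

```python
import math
from statistics import NormalDist, mean, stdev

def mean_ci_and_pi(sample, level=0.95):
    """Normal-approximation confidence interval for the mean and
    prediction interval for one new observation."""
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    ci_half = z * s / math.sqrt(n)            # uncertainty of the mean only
    pi_half = z * s * math.sqrt(1 + 1 / n)    # plus spread of a new individual
    return (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)

# Hypothetical body weights (kg) from one treatment group.
weights = [512, 498, 530, 505, 521, 489, 515, 502, 509, 518]
ci, pi = mean_ci_and_pi(weights)
```

As the abstract stresses, neither interval says anything about bias or study design; both are only meaningful if the study is internally valid.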
Lombard, Martani J; Steyn, Nelia P; Charlton, Karen E; Senekal, Marjanne
2015-04-22
Several statistical tests are currently applied to evaluate the validity of dietary intake assessment methods. However, they provide information on different facets of validity, and there is no consensus on which types and combinations of tests should be applied to demonstrate acceptable validity for intakes. We aimed to 1) conduct a review to identify the tests and interpretation criteria used where dietary assessment methods were validated against a reference method and 2) illustrate the value of, and challenges that arise in, interpreting the outcomes of multiple statistical tests in the assessment of validity using a test data set. An in-depth literature review was undertaken to identify the range of statistical tests used in the validation of quantitative food frequency questionnaires (QFFQs). Four databases were searched for statistical methods and interpretation criteria used in papers focusing on relative validity. The identified tests and interpretation criteria were applied to a data set obtained using a QFFQ and four repeated 24-hour recalls from 47 adults (18-65 years) residing in rural Eastern Cape, South Africa. Of 102 studies screened, 60 were included. Six statistical tests were identified: five with one set of interpretation criteria and one with two sets of criteria, resulting in seven possible validity interpretation outcomes. Twenty-one different combinations of these tests were identified, the majority including three or fewer tests. The correlation coefficient was the most commonly used test (alone or in combination with one or more other tests). Results of our application and interpretation of multiple statistical tests to assess the validity of energy, macronutrient, and selected micronutrient estimates illustrate that, for most of the nutrients considered, some outcomes support validity while others do not. One to three statistical tests may not be sufficient to provide comprehensive insights into the various facets of validity. Results of our
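A toy illustration of why a single test captures only one facet of validity: a hypothetical QFFQ that overestimates every participant's energy intake by 20% correlates perfectly with the reference recalls, yet a simple agreement statistic (the mean difference) exposes a large systematic bias. The intake values are invented for illustration and are not from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient (ranking/association facet)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_difference(x, y):
    """Mean of paired differences (agreement/bias facet)."""
    return sum(a - b for a, b in zip(x, y)) / len(x)

# Hypothetical mean of repeated 24-hour recalls (kcal/day) per participant.
recall_mean = [1800, 2100, 1950, 2400, 1700, 2250]
# Hypothetical QFFQ that systematically overestimates by 20%.
qffq = [1.2 * v for v in recall_mean]

r = pearson_r(qffq, recall_mean)
bias = mean_difference(qffq, recall_mean)
```

Here `r` is 1 while `bias` is about 400 kcal/day, so a validation relying on correlation alone would miss the problem.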
Spatial Statistical Data Fusion (SSDF)
NASA Technical Reports Server (NTRS)
Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel
2013-01-01
As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, technological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect differ. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions and is also computationally much faster. This method is
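Why a basis-function (spatial mixed effects) covariance avoids inverting huge matrices can be sketched with the Sherman-Morrison-Woodbury identity: if the covariance is Σ = S K Sᵀ + σ²I with only r basis functions, then Σ⁻¹z needs only an r × r solve. The sketch below is a simplified illustration under loudly stated assumptions, not the SSDF implementation: 1-D locations, point (not pixel-averaged) observations, zero mean, hypothetical Gaussian basis functions, and a known coefficient covariance K and noise variance.

```python
import numpy as np

def gaussian_basis(locs, centres, scale):
    """n x r matrix of Gaussian bumps (hypothetical basis choice)."""
    d = locs[:, None] - centres[None, :]
    return np.exp(-0.5 * (d / scale) ** 2)

def woodbury_solve(S, K, noise_var, z):
    """Return Sigma^{-1} z for Sigma = S K S^T + noise_var * I.

    Only an r x r system is solved, never an n x n one.
    """
    inner = np.linalg.inv(K) + S.T @ S / noise_var            # r x r
    correction = S @ np.linalg.solve(inner, S.T @ z / noise_var)
    return z / noise_var - correction / noise_var

def fixed_rank_krige(z, S_obs, S_new, K, noise_var):
    """Simple (zero-mean) kriging predictor Cov(Y_new, Z) Sigma^{-1} z."""
    w = woodbury_solve(S_obs, K, noise_var, z)
    return S_new @ (K @ (S_obs.T @ w))

rng = np.random.default_rng(1)
n_obs, r = 200, 12
obs_locs = np.sort(rng.uniform(0.0, 10.0, n_obs))
centres = np.linspace(0.0, 10.0, r)
S_obs = gaussian_basis(obs_locs, centres, scale=1.0)
K = np.eye(r)          # assumed covariance of the basis coefficients
noise_var = 0.1        # assumed measurement-error variance
z = S_obs @ rng.standard_normal(r) + rng.normal(0.0, np.sqrt(noise_var), n_obs)
new_locs = np.linspace(0.0, 10.0, 5)
pred = fixed_rank_krige(z, S_obs, gaussian_basis(new_locs, centres, 1.0), K, noise_var)
```

The cost grows linearly in the number of observations for fixed r, which is the kind of saving the abstract attributes to the basis-function approach.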
Redshift data and statistical inference
NASA Technical Reports Server (NTRS)
Newman, William I.; Haynes, Martha P.; Terzian, Yervant
1994-01-01
Frequency histograms and the 'power spectrum analysis' (PSA) method, the latter developed by Yu & Peebles (1969), have been widely employed as techniques for establishing the existence of periodicities. We provide a formal analysis of these two classes of methods, including controlled numerical experiments, to better understand their proper use and application. In particular, we note that typical published applications of frequency histograms commonly employ far more class intervals, or bins, than statistical theory advises, sometimes giving rise to the appearance of spurious patterns. The PSA method generates a sequence of random numbers from observational data which, it is claimed, is exponentially distributed with unit mean and variance, essentially independent of the distribution of the original data. We show that the derived random process is nonstationary and produces a small but systematic bias in the usual estimates of the mean and variance. Although the derived variable may be reasonably described by an exponential distribution, the tail of its distribution is far removed from that of an exponential, rendering statistical inference and confidence testing based on the tail of the distribution completely unreliable. Finally, we examine a number of astronomical examples in which these methods have been used, giving rise to widespread acceptance of statistically unconfirmed conclusions.
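Two widely used rules of thumb for choosing the number of histogram bins are Sturges' rule and the Freedman-Diaconis rule. The abstract does not say which statistical theory the authors apply, so these are shown only as common examples of how bin counts are tied to sample size; the sample is hypothetical.

```python
import math
import random
import statistics

def sturges_bins(n):
    """Sturges' rule: k = ceil(log2 n) + 1."""
    return math.ceil(math.log2(n)) + 1

def freedman_diaconis_bins(data):
    """Freedman-Diaconis rule: bin width h = 2 * IQR / n**(1/3)."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    width = 2 * (q3 - q1) / n ** (1 / 3)
    if width == 0:
        return 1
    return max(1, math.ceil((max(data) - min(data)) / width))

# Hypothetical sample of 200 values on [0, 1).
rng = random.Random(3)
sample = [rng.uniform(0.0, 1.0) for _ in range(200)]
k_fd = freedman_diaconis_bins(sample)
```

Both rules grow slowly with n (logarithmically or as the cube root), so a few hundred points support only a handful of bins; using many more bins invites noise peaks that look like periodicities.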
Analysis of Visual Interpretation of Satellite Data
NASA Astrophysics Data System (ADS)
Svatonova, H.
2016-06-01
Millions of people of all ages and levels of expertise use satellite and aerial data as an important input for their work in many different fields. Satellite data are also gradually finding a place in education, especially in the fields of geography and environmental issues. The article presents the results of extensive research on the visual interpretation of image data carried out in the Czech Republic from 2013 to 2015. The research compared the success rate of interpreting satellite data in relation to a) the substrates (the selected colourfulness, the type of depicted landscape, or special elements in the landscape) and b) selected characteristics of users (expertise, gender, age). The results showed that (1) false-colour images have a slightly higher percentage of successful interpretation than natural-colour images; (2) colourfulness of an element that is expected or rehearsed by the user (regardless of its real natural colour) increases the success rate of identifying the element; (3) experts interpret visual data faster than non-experts, with the same degree of accuracy in solving the task; and (4) men and women are equally successful in interpreting visual image data.
Statistical analysis of pyroshock data
NASA Astrophysics Data System (ADS)
Hughes, William O.
2002-05-01
The sample size of aerospace pyroshock test data is typically small. This often forces the engineer to make assumptions on its population distribution and to use conservative margins or methodologies in determining shock specifications. For example, the maximum expected environment is often derived by adding 3-6 dB to the maximum envelope of a limited amount of shock data. The recent availability of a large amount of pyroshock test data has allowed a rare statistical analysis to be performed. Findings and procedures from this analysis will be explained, including information on population distributions, procedures to properly combine families of test data, and methods of deriving appropriate shock specifications for a multipoint shock source.
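The conventional approach mentioned above (maximum envelope of the available shock data plus a 3-6 dB margin) can be sketched as follows. This assumes shock response spectrum (SRS) values in g at common natural frequencies and the amplitude convention ratio = 10^(dB/20); the data are hypothetical.

```python
def db_to_amplitude_ratio(db: float) -> float:
    # Amplitude (not power) convention: ratio = 10 ** (dB / 20).
    return 10 ** (db / 20)

def max_expected_environment(srs_curves, margin_db):
    """Envelope several SRS curves point by point, then add a dB margin.

    srs_curves: list of equal-length lists of peak acceleration (g),
    all sampled at the same natural frequencies.
    """
    envelope = [max(values) for values in zip(*srs_curves)]
    factor = db_to_amplitude_ratio(margin_db)
    return [factor * v for v in envelope]

# Hypothetical SRS values (g) for three firings at three natural frequencies.
shots = [
    [100.0, 400.0, 1200.0],
    [120.0, 350.0, 1500.0],
    [90.0, 420.0, 1100.0],
]
mee_3db = max_expected_environment(shots, 3.0)
mee_6db = max_expected_environment(shots, 6.0)
```

A 6 dB margin roughly doubles the envelope, which illustrates how conservative this rule can be when only a few shots are available.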
Interpreting the flock algorithm from a statistical perspective.
Anderson, Eric C; Barry, Patrick D
2015-09-01
We show that the algorithm in the program flock (Duchesne & Turgeon 2009) can be interpreted as an estimation procedure based on a model essentially identical to the structure (Pritchard et al. 2000) model with no admixture and without correlated allele frequency priors. Rather than using MCMC, the flock algorithm searches for the maximum a posteriori estimate of this structure model via a simulated annealing algorithm with a rapid cooling schedule (namely, the exponent on the objective function →∞). We demonstrate the similarities between the two programs in a two-step approach. First, to enable rapid batch processing of many simulated data sets, we modified the source code of structure to use the flock algorithm, producing the program flockture. With simulated data, we confirmed that results obtained with flock and flockture are very similar (though flockture is some 200 times faster). Second, we simulated multiple large data sets under varying levels of population differentiation for both microsatellite and SNP genotypes. We analysed them with flockture and structure and assessed each program on its ability to cluster individuals to their correct subpopulation. We show that flockture yields results similar to structure, albeit with greater variability from run to run. flockture did perform better than structure when genotypes were composed of SNPs and differentiation was moderate (FST = 0.022-0.032). When differentiation was low, structure outperformed flockture for both marker types. On large data sets like those we simulated, it appears that flock's reliance on inference rules regarding its 'plateau record' is not helpful. Interpreting flock's algorithm as a special case of the model in structure should aid in understanding the program's output and behaviour.
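Why an annealing exponent tending to infinity turns a sampler into a maximum a posteriori search can be seen on a toy example: tempering a distribution as p^β and renormalising concentrates all mass on the most probable state as β grows. The three "candidate partitions" and their posterior values are hypothetical; this illustrates the limit, not flock's actual code.

```python
import math

def tempered(weights, beta):
    """Normalise w_i ** beta over a finite set of states (computed in log space)."""
    logs = [beta * math.log(w) for w in weights]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical unnormalised posterior values for three candidate partitions.
posterior = [0.2, 0.5, 0.3]
for beta in (1, 10, 100):
    print(beta, [round(p, 4) for p in tempered(posterior, beta)])
```

At β = 1 the sampler would visit states in proportion to their posterior probability (as MCMC does); at large β it almost always stays at the single best state, so run-to-run variability reflects which local optimum each run reaches.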
Alternative interpretations of statistics on health effects of low-level radiation
Hamilton, L.D.
1983-11-01
Four examples of the interpretation of statistics of data on low-level radiation are reviewed: (a) genetic effects of the atomic bombs at Hiroshima and Nagasaki, (b) cancer at Rocky Flats, (c) childhood leukemia and fallout in Utah, and (d) cancer among workers at the Portsmouth Naval Shipyard. Aggregation of data, adjustment for age, and other problems related to the determination of health effects of low-level radiation are discussed. Troublesome issues related to post hoc analysis are considered.
Data Interpretation in the Digital Age
Leonelli, Sabina
2014-01-01
The consultation of internet databases and the related use of computer software to retrieve, visualise and model data have become key components of many areas of scientific research. This paper focuses on the relation of these developments to understanding the biology of organisms, and examines the conditions under which the evidential value of data posted online is assessed and interpreted by the researchers who access them, in ways that underpin and guide the use of those data to foster discovery. I consider the types of knowledge required to interpret data as evidence for claims about organisms, and in particular the relevance of knowledge acquired through physical interaction with actual organisms to assessing the evidential value of data found online. I conclude that familiarity with research in vivo is crucial to assessing the quality and significance of data visualised in silico; and that studying how biological data are disseminated, visualised, assessed and interpreted in the digital age provides a strong rationale for viewing scientific understanding as a social and distributed, rather than individual and localised, achievement. PMID:25729262
Interpreting genomic data via entropic dissection
Azad, Rajeev K.; Li, Jing
2013-01-01
Since the emergence of high-throughput genome sequencing platforms and more recently the next-generation platforms, the genome databases are growing at an astronomical rate. Tremendous efforts have been invested in recent years in understanding intriguing complexities beneath the vast ocean of genomic data. This is apparent in the spurt of computational methods for interpreting these data in the past few years. Genomic data interpretation is notoriously difficult, partly owing to the inherent heterogeneities appearing at different scales. Methods developed to interpret these data often suffer from their inability to adequately measure the underlying heterogeneities and thus lead to confounding results. Here, we present an information entropy-based approach that unravels the distinctive patterns underlying genomic data efficiently and thus is applicable in addressing a variety of biological problems. We show the robustness and consistency of the proposed methodology in addressing three different biological problems of significance—identification of alien DNAs in bacterial genomes, detection of structural variants in cancer cell lines and alignment-free genome comparison. PMID:23036836
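One common entropy-based way to find compositional heterogeneity in a sequence is to compare the nucleotide composition of two adjacent segments with the length-weighted Jensen-Shannon divergence and to look for the split point that maximises it. The abstract does not spell out the exact measure the authors use, so this is a generic sketch of the idea on a hypothetical toy sequence.

```python
import math
from collections import Counter

def composition(seq, alphabet="ACGT"):
    """Relative nucleotide frequencies of a segment."""
    counts = Counter(seq)
    n = sum(counts[a] for a in alphabet)
    return [counts[a] / n for a in alphabet]

def shannon_entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def jensen_shannon(seq1, seq2):
    """Length-weighted Jensen-Shannon divergence (bits) between two segments."""
    p, q = composition(seq1), composition(seq2)
    w1 = len(seq1) / (len(seq1) + len(seq2))
    w2 = 1 - w1
    mix = [w1 * a + w2 * b for a, b in zip(p, q)]
    return shannon_entropy(mix) - w1 * shannon_entropy(p) - w2 * shannon_entropy(q)

def best_split(seq, min_len=1):
    """Split point that maximises the divergence between left and right parts."""
    best_i, best_d = None, -1.0
    for i in range(min_len, len(seq) - min_len + 1):
        d = jensen_shannon(seq[:i], seq[i:])
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d

# Hypothetical sequence: an AT-rich block followed by a GC-rich block.
toy = "AT" * 20 + "GC" * 20
split, divergence = best_split(toy)
```

Applied recursively, with a significance threshold on the divergence, this kind of scan partitions a genome into compositionally homogeneous segments.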
Data Systems and Reports as Active Participants in Data Interpretation
ERIC Educational Resources Information Center
Rankin, Jenny Grant
2016-01-01
Most data-informed decision-making in education is undermined by flawed interpretations. Educator-driven interventions to improve data use are beneficial but not omnipotent, as data misunderstandings persist at schools and school districts commended for ideal data use support. Meanwhile, most data systems and reports display figures without…
Regional interpretation of Kansas aeromagnetic data
Yarger, H.L.
1982-01-01
The aeromagnetic mapping techniques used in a regional aeromagnetic survey of the state are documented, and a qualitative regional interpretation of the magnetic basement is presented. Measured geothermal gradients and data from oil well records indicate that geothermal resources in Kansas are of a low-grade nature. However, considerable statewide variation in the gradient is noted within the upper 500 meters of the sedimentary section; this suggests the feasibility of using groundwater for space heating by means of heat pumps.
Using Statistics to Lie, Distort, and Abuse Data
ERIC Educational Resources Information Center
Bintz, William; Moore, Sara; Adams, Cheryll; Pierce, Rebecca
2009-01-01
Statistics is a branch of mathematics that involves organization, presentation, and interpretation of data, both quantitative and qualitative. Data do not lie, but people do. On the surface, quantitative data are basically inanimate objects, nothing more than lifeless and meaningless symbols that appear on a page, calculator, computer, or in one's…
Data interpretation in breath biomarker research: pitfalls and directions.
Miekisch, Wolfram; Herbig, Jens; Schubert, Jochen K
2012-09-01
Most, if not all, potential diagnostic applications in breath research involve different marker concentrations rather than unique breath markers that occur only in the diseased state. Hence, data interpretation is a crucial step in breath analysis. To avoid artificial significance in breath testing, every effort should be made to implement method validation, data cross-testing, and statistical validation throughout this process. The most common data-analysis-related problems can be classified into three groups: confounding variables (CVs), which have a real correlation with both the diseased state and a breath marker but lead to the erroneous conclusion that disease and breath marker are causally related; voodoo correlations (VCs), which can be understood as statistically true correlations that arise coincidentally among the vast number of measured variables; and statistical misconceptions in study design (SMSD). CV: Typical confounding variables are environment and medical history, host factors such as gender, age, and weight, and parameters that could affect the quality of breath data, such as the subject's breathing mode, the effects of breath sampling, and the effects of the analytical technique itself. VC: The number of measured variables quickly overwhelms the number of samples that can feasibly be taken. As a consequence, the chance of finding coincidental 'voodoo' correlations grows proportionally. VCs can typically be expected in the following scenarios: an insufficient number of patients, (too) many measurement variables, the use of advanced statistical data mining methods, and non-independent data for validation. SMSD: Non-prospective, non-blinded, and non-randomized trials, a priori biased study populations, or group selection with unrealistically high disease prevalence typically represent misconceptions of study design. In this paper important data interpretation issues are discussed, common pitfalls are addressed and directions for sound data processing and interpretation
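The "voodoo correlation" effect is easy to reproduce: correlate a pure-noise outcome with many pure-noise "markers" measured on only a few subjects, and count how many exceed a seemingly impressive threshold. The sample size, threshold, and seed below are hypothetical choices for illustration; with 10 subjects, several percent of noise variables exceed |r| = 0.6, so the number of spurious hits grows roughly in proportion to the number of variables measured.

```python
import math
import random

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_chance_correlations(n_samples, n_variables, threshold, seed=0):
    """Count noise 'markers' whose |r| with a noise outcome exceeds threshold."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_variables):
        marker = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if abs(pearson_r(marker, outcome)) > threshold:
            hits += 1
    return hits

for p in (100, 1000, 10000):
    print(p, count_chance_correlations(n_samples=10, n_variables=p, threshold=0.6))
```

None of these "markers" is related to the outcome, so every hit is a false discovery; independent validation data or a multiple-testing correction is needed to screen them out.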
The Lure of Statistics in Data Mining
ERIC Educational Resources Information Center
Grover, Lovleen Kumar; Mehra, Rajni
2008-01-01
The field of data mining, like statistics, concerns itself with "learning from data" or "turning data into information". For statisticians the term "data mining" has a pejorative meaning. Instead of finding useful patterns in large volumes of data, as in the case of statistics, data mining has the connotation of searching for data to fit preconceived…
Interpretation of Quantitative Shotgun Proteomic Data.
Aasebø, Elise; Berven, Frode S; Selheim, Frode; Barsnes, Harald; Vaudel, Marc
2016-01-01
In quantitative proteomics, large lists of identified and quantified proteins are used to answer biological questions in a systemic approach. However, working with such extensive datasets can be challenging, especially when complex experimental designs are involved. Here, we demonstrate how to post-process large quantitative datasets, detect proteins of interest, and annotate the data with biological knowledge. The protocol presented can be achieved without advanced computational knowledge thanks to the user-friendly Perseus interface (available from the MaxQuant website, www.maxquant.org). Various visualization techniques facilitating the interpretation of quantitative results in complex biological systems are also highlighted.
Interpreting magnetic data by integral moments
NASA Astrophysics Data System (ADS)
Tontini, F. Caratori; Pedersen, L. B.
2008-09-01
The use of integral moments for interpreting magnetic data is based on a very elegant property of potential fields, but in the past it has not been fully exploited because of problems with real data. We describe a new 3-D development of previous 2-D results aimed at determining the magnetization direction, extending the calculation to second-order moments to recover the centre of mass of the magnetization distribution. The method is enhanced to reduce the effects of the regional field, which often alters the first-order solutions. Moreover, we introduce an iterative correction to properly assess the errors arising from finite-size surveys or interaction with neighbouring anomalies, which are the most important causes of the method's failure on real data. We test the method on synthetic examples and, finally, show the results obtained by analysing the aeromagnetic anomaly of the Monte Vulture volcano in southern Italy.
Interpretation of Statistical Significance Testing: A Matter of Perspective.
ERIC Educational Resources Information Center
McClure, John; Suen, Hoi K.
1994-01-01
This article compares three models that have been the foundation for approaches to the analysis of statistical significance in early childhood research--the Fisherian and the Neyman-Pearson models (both considered "classical" approaches), and the Bayesian model. The article concludes that all three models have a place in the analysis of research…
Statistical characteristics of MST radar echoes and its interpretation
NASA Technical Reports Server (NTRS)
Woodman, Ronald F.
1989-01-01
Two concepts of fundamental importance are reviewed: the autocorrelation function and the frequency power spectrum. In addition, some turbulence concepts, the relationship between radar signals and atmospheric medium statistics, partial reflection, and the characteristics of noise and clutter interference are discussed.
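The two concepts reviewed here are linked by the Wiener-Khinchin theorem: the power spectrum is the Fourier transform of the autocorrelation function. The sketch below checks the discrete, circular version of that relationship numerically; the sinusoid-plus-noise series is a hypothetical stand-in for a radar echo time series.

```python
import numpy as np

def autocorr_direct(x):
    """Circular autocorrelation r[k] = (1/n) * sum_t x[t] * x[(t + k) mod n]."""
    n = len(x)
    return np.array([np.dot(x, np.roll(x, -k)) for k in range(n)]) / n

def autocorr_from_spectrum(x):
    """Same quantity obtained as the inverse FFT of the power spectrum |X(f)|^2."""
    power = np.abs(np.fft.fft(x)) ** 2
    return np.fft.ifft(power).real / len(x)

rng = np.random.default_rng(42)
t = np.arange(256)
# Hypothetical series: a periodic component plus white noise.
x = np.sin(2 * np.pi * t / 16) + 0.5 * rng.standard_normal(256)
```

The FFT route costs O(n log n) instead of O(n²), which is why spectra and autocorrelations of radar echoes are usually computed via the spectrum.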
Confounded Statistical Analyses Hinder Interpretation of the NELP Report
ERIC Educational Resources Information Center
Paris, Scott G.; Luo, Serena Wenshu
2010-01-01
The National Early Literacy Panel (2008) report identified early predictors of reading achievement as good targets for instruction, and many of those skills are related to decoding. In this article, the authors suggest that the developmental trajectories of rapidly developing skills pose problems for traditional statistical analyses. Rapidly…
Evaluating bifactor models: Calculating and interpreting statistical indices.
Rodriguez, Anthony; Reise, Steven P; Haviland, Mark G
2016-06-01
Bifactor measurement models are increasingly being applied to personality and psychopathology measures (Reise, 2012). In this work, authors generally have emphasized model fit, and their typical conclusion is that a bifactor model provides a superior fit relative to alternative subordinate models. Often unexplored, however, are important statistical indices that can substantially improve the psychometric analysis of a measure. We provide a review of the particularly valuable statistical indices one can derive from bifactor models. They include omega reliability coefficients, factor determinacy, construct reliability, explained common variance, and percentage of uncontaminated correlations. We describe how these indices can be calculated and used to inform: (a) the quality of unit-weighted total and subscale score composites, as well as factor score estimates, and (b) the specification and quality of a measurement model in structural equation modeling.
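Several of the indices listed above follow directly from standardized bifactor loadings. The sketch below uses the standard textbook definitions of omega (total), omega-hierarchical, explained common variance (ECV), and percentage of uncontaminated correlations (PUC), assuming orthogonal factors and that each item loads on exactly one group factor; the loadings are hypothetical.

```python
def bifactor_indices(general, specific, group):
    """Omega, omega-hierarchical, ECV and PUC from standardized bifactor loadings.

    general[i]: item i's loading on the general factor
    specific[i]: item i's loading on its group factor
    group[i]: label of the group factor item i loads on
    """
    labels = sorted(set(group))
    gen_sum_sq = sum(general) ** 2
    grp_sum_sq = sum(
        sum(s for s, g in zip(specific, group) if g == lab) ** 2 for lab in labels
    )
    # Unique variance of each item is 1 minus its communality.
    unique = sum(1 - lg ** 2 - ls ** 2 for lg, ls in zip(general, specific))
    denom = gen_sum_sq + grp_sum_sq + unique
    gen_sq = sum(l ** 2 for l in general)
    spec_sq = sum(l ** 2 for l in specific)
    n = len(general)
    all_pairs = n * (n - 1) / 2
    # Item pairs within the same group factor are "contaminated" correlations.
    within_pairs = sum(k * (k - 1) / 2 for k in (group.count(lab) for lab in labels))
    return {
        "omega": (gen_sum_sq + grp_sum_sq) / denom,
        "omega_h": gen_sum_sq / denom,
        "ecv": gen_sq / (gen_sq + spec_sq),
        "puc": (all_pairs - within_pairs) / all_pairs,
    }

# Hypothetical 6-item measure with two group factors of three items each.
idx = bifactor_indices([0.6] * 6, [0.4] * 6, ["A"] * 3 + ["B"] * 3)
```

The gap between `omega` and `omega_h` shows how much of the total-score reliability is due to the general factor alone; here it is about 0.85 versus 0.69.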
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures.
Aerosol backscatter lidar calibration and data interpretation
NASA Technical Reports Server (NTRS)
Kavaya, M. J.; Menzies, R. T.
1984-01-01
A treatment of the various factors involved in lidar data acquisition and analysis is presented. This treatment highlights sources of fundamental, systematic, modeling, and calibration errors that may affect the accurate interpretation and calibration of lidar aerosol backscatter data. The discussion primarily pertains to ground based, pulsed CO2 lidars that probe the troposphere and are calibrated using large, hard calibration targets. However, a large part of the analysis is relevant to other types of lidar systems such as lidars operating at other wavelengths; continuous wave (CW) lidars; lidars operating in other regions of the atmosphere; lidars measuring nonaerosol elastic or inelastic backscatter; airborne or Earth-orbiting lidar platforms; and lidars employing combinations of the above characteristics.
Systematic interpretation of differential capacitance data
NASA Astrophysics Data System (ADS)
Gavish, Nir; Promislow, Keith
2015-07-01
Differential capacitance (DC) data have been widely used to characterize the structure of electrolyte solutions near charged interfaces and as experimental validation of models for electrolyte structure. For a large class of models of electrolyte free energy that incorporate finite-volume effects, a reduction is identified that permits the identification of all free energies within that class that return identical DC data. The result is an interpretation of DC data through equivalence classes of nonideality terms, and associated boundary layer structures, that cannot be differentiated by DC data. Specifically, for binary salts, DC data, even if measured over a range of ionic concentrations, are unable to distinguish among models that exhibit charge asymmetry, charge reversal, and even ion crowding. The reduction applies to capacitors that are much wider than the associated Debye length and to finite-volume terms that are algebraic in charge density. However, within these restrictions, the free energy is shown to be uniquely identified if the DC data are supplemented with measurements of the excess chemical potential of the system in the bulk state.
Data interpretation in the Automated Laboratory
Klatt, L.N.; Elling, J.W.; Mniszewski, S.
1995-12-01
The Contaminant Analysis Automation project envisions the analytical chemistry laboratory of the future being assembled from automation submodules that can be integrated into complete analysis systems through a plug-and-play strategy. In this automated system, the reduction of instrumental data to the knowledge required by the laboratory customer must also be accomplished automatically. This paper presents the concept of an automated Data Interpretation Module (DIM) within the context of the plug-and-play automation strategy. The DIM is a software module driven by an expert system. It functions as a standard laboratory module controlled by the system task sequence controller. The DIM consists of knowledge base(s) that accomplish the data assessment, quality control, and data analysis tasks. The expert system knowledge base(s) encapsulate the training and experience of the analytical chemist. Analysis of instrumental data by the DIM requires the use of pattern recognition techniques. Laboratory data from the analysis of PCBs will be used to illustrate the DIM.
Data on Youth, 1967, a Statistical Document.
ERIC Educational Resources Information Center
Scheider, George
The data in this report are statistics on youth throughout the United States and in New York State. Included are data on population, school statistics, employment, family income, juvenile delinquency and youth crime (including New York City figures), and traffic accidents. The statistics are presented in the text and in tables and charts. (NH)
Statistical Analysis of Geotechnical Data.
1987-09-01
The Data of Fig. 2a. Figure 4. Probability Paper Plot of Compaction Data. Figure 5. Scatter Plot of Compaction Control Data Showing Water ... Autocorrelation Function of Water Content Over Small Interval of San Francisco Bay Mud. Figure 22. Autocorrelation Function of Water Content Over Large Interval ... A Copper Porphyry. Figure 25. Autocorrelation Function of Compacted Water Content in Clay Core of an Embankment Dam. Figure 26. Autocorrelation ...
Huynh-Thu, Vân Anh; Saeys, Yvan; Wehenkel, Louis; Geurts, Pierre
2012-07-01
Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple and fast, and their output is easily interpreted by biologists, but they can only identify variables that provide a significant amount of information in isolation from the other variables. Because biological processes are expected to involve complex interactions between variables, univariate methods may miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, by contrast, can potentially highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the choice of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practitioners. We evaluated several existing and novel procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted statistically, such as p-values, false discovery rates, or family-wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in computing time and in the tradeoff they achieve between false positives and false negatives, some of them greatly help in extracting truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly show that using model performance as a criterion for feature selection is often counterproductive. Python source code for all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the Supplementary Material.
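The abstract does not spell out the individual procedures. As a minimal sketch of one generic way to turn a relevance score into a statistically interpretable p-value, the code below runs a permutation test: the target is shuffled many times, and each feature's p-value is the fraction of shuffles whose score is at least as large as the observed one. The relevance score is an assumption (absolute Pearson correlation standing in for a machine-learning importance), and the function names are hypothetical.

```python
import math
import random

def abs_pearson(x, y):
    """Stand-in relevance score: |Pearson correlation| (an assumption; a
    tree-ensemble importance or any other score could be plugged in)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)

def permutation_p_values(features, target, n_perm=999, seed=0):
    """Empirical p-value per feature under the null of no association.

    Each permutation shuffles the target once and re-scores every feature,
    so correlations among features are preserved.
    """
    rng = random.Random(seed)
    observed = [abs_pearson(f, target) for f in features]
    exceed = [0] * len(features)
    shuffled = list(target)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for j, f in enumerate(features):
            if abs_pearson(f, shuffled) >= observed[j]:
                exceed[j] += 1
    # The +1 terms count the observed labelling as one permutation,
    # which keeps every p-value strictly positive.
    return [(1 + e) / (1 + n_perm) for e in exceed]
```

Replacing `abs_pearson` with a model-based importance keeps the same logic, at the cost of refitting the model for every permutation.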
Tools for interpretation of multispectral data
NASA Astrophysics Data System (ADS)
Speckert, Glen; Carpenter, Loren C.; Russell, Mike; Bradstreet, John; Waite, Tom; Conklin, Charlie
1990-08-01
The large size and multiple bands of today's satellite data require increasingly powerful tools to display and interpret the acquired imagery in a timely fashion. Pixar has developed two major tools for this data interpretation: the Electronic Light Table (ELT) and an extensive image processing package, ChapIP. These tools operate on images limited only by disk volume size, currently 3 Gbytes. The Electronic Light Table package provides a fully windowed interface to these large 12-bit monochrome and multiband images, passing images through a software-defined image interpretation pipeline in real time during an interactive roam. A virtual image software framework allows interactive modification of the visible image. The roam software pipeline consists of a seventh-order polynomial warp, bicubic resampling, a user registration affine, histogram drop sampling, a 5x5 unsharp mask, and per-window contrast controls. Note that these functions are done in software, and various performance tradeoffs can be made for different applications within a family of hardware configurations. Special high-speed zoom, rotate, sharpness, and contrast operators provide interactive region-of-interest manipulation. Double-window operators provide flicker, fade, shade, and difference of two parent windows in a chained fashion. Overlay graphics capability is provided in a PostScript windowed environment (NeWS). The image is stored on disk as a multi-resolution image pyramid, which allows resampling and other image operations independent of the zoom level. A set of tools layered upon ChapIP allows manipulation of the entire pyramid file. Arbitrary combinations of bands can be computed for arbitrarily sized images, as well as other image processing operations. ChapIP can also be used in conjunction with ELT to operate dynamically on the current roaming window, appending the image processing function to the roam pipeline. Multiple Chapi
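The multi-resolution image pyramid mentioned above can be illustrated with a minimal sketch, assuming each level is produced by averaging non-overlapping 2×2 blocks of the level below. The abstract does not say which filter ChapIP uses, and `build_pyramid` and `level_for_zoom` are hypothetical names. Picking the coarsest level that still meets the requested zoom is what keeps resampling cost roughly independent of the zoom level.

```python
import math

def downsample_2x2(img):
    """Halve both dimensions by averaging non-overlapping 2x2 blocks.
    A trailing odd row or column is dropped."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]

def build_pyramid(img):
    """Level 0 is full resolution; each further level halves the size."""
    levels = [img]
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(downsample_2x2(levels[-1]))
    return levels

def level_for_zoom(pyramid, zoom):
    """Coarsest level whose scale 2**-k is still >= zoom (zoom 1.0 = full size)."""
    if zoom >= 1.0:
        return 0
    k = int(math.floor(math.log2(1.0 / zoom)))
    return min(k, len(pyramid) - 1)
```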
A data-management system for detailed areal interpretive data
Ferrigno, C.F.
1986-01-01
A data storage and retrieval system has been developed to organize and preserve areal interpretive data. This system can be used by any study where there is a need to store areal interpretive data that generally is presented in map form. This system provides the capability to grid areal interpretive data for input to groundwater flow models at any spacing and orientation. The data storage and retrieval system is designed to be used for studies that cover small areas such as counties. The system is built around a hierarchically structured data base consisting of related latitude-longitude blocks. The information in the data base can be stored at different levels of detail, with the finest detail being a block of 6 sec of latitude by 6 sec of longitude (approximately 0.01 sq mi). This system was implemented on a mainframe computer using a hierarchical data base management system. The computer programs are written in Fortran IV and PL/1. The design and capabilities of the data storage and retrieval system, and the computer programs that are used to implement the system are described. Supplemental sections contain the data dictionary, user documentation of the data-system software, changes that would need to be made to use this system for other studies, and information on the computer software tape. (Lantz-PTT)
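As a quick arithmetic check of the block size quoted above, the sketch below computes the area of a 6 sec × 6 sec latitude-longitude block on a spherical Earth (an assumption; the original system's datum is not stated). The east-west extent shrinks with the cosine of latitude, so the "approximately 0.01 sq mi" figure is a mid-latitude approximation. The function name and the 40° example latitude are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius; spherical-Earth assumption
SQ_MILE_M2 = 1609.344 ** 2     # square metres per square (statute) mile

def block_area_sq_mi(lat_deg, dlat_arcsec=6.0, dlon_arcsec=6.0):
    """Approximate area of a small latitude-longitude block centred at lat_deg."""
    dlat = math.radians(dlat_arcsec / 3600.0)
    dlon = math.radians(dlon_arcsec / 3600.0)
    north_south = EARTH_RADIUS_M * dlat
    east_west = EARTH_RADIUS_M * dlon * math.cos(math.radians(lat_deg))
    return north_south * east_west / SQ_MILE_M2

# Near 40 degrees latitude the block is roughly 0.01 sq mi.
area_40 = block_area_sq_mi(40.0)
```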
Laterally constrained inversion for CSAMT data interpretation
NASA Astrophysics Data System (ADS)
Wang, Ruo; Yin, Changchun; Wang, Miaoyue; Di, Qingyun
2015-10-01
Laterally constrained inversion (LCI) has been successfully applied to the inversion of dc resistivity, TEM, and airborne EM data. However, it has not yet been applied to the interpretation of controlled-source audio-frequency magnetotelluric (CSAMT) data. In this paper, we apply the LCI method to CSAMT data inversion by preconditioning the Jacobian matrix. We apply a weighting matrix to the Jacobian to balance the sensitivity of the model parameters, so that the resolution with respect to different model parameters becomes more uniform. Numerical experiments confirm that this can improve the convergence of the inversion. We first invert a synthetic dataset with and without noise to investigate the effect of applying LCI to CSAMT data. For the noise-free data, the results show that the LCI method recovers the true model better than traditional single-station inversion; for the noisy data, the true model is recovered even at a noise level of 8%, indicating that LCI inversion is to some extent insensitive to noise. We then re-invert two CSAMT datasets, collected in a watershed and in a coal mine area in Northern China respectively, and compare our results with those from previous inversions. In the coal mine, the LCI method delivers smoother layer interfaces that correlate well with seismic data than the previous inversion did; in the watershed, LCI and the global search algorithm of simulated annealing (SA) deliver very similar, good results, but the LCI algorithm presented in this paper runs much faster. The inversion results for the coal mine CSAMT survey also show that LCI identifies a conductive water-bearing zone that was not revealed by the previous inversions. This further demonstrates that the method presented in this paper works for CSAMT data inversion.
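The Jacobian weighting described above can be sketched as column scaling: each column of the Jacobian is divided by its norm so that every model parameter has comparable sensitivity, and the lateral constraints enter as difference rows tying the same parameter at neighboring soundings together. This is a minimal sketch of one regularized Gauss-Newton step under those assumptions, not the authors' code; the damping weight `lam`, the constraint matrix `R`, and all function names are hypothetical.

```python
import math

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def column_weights(J):
    """w_k = 1 / ||J[:, k]||, so every column of J @ diag(w) has unit norm."""
    norms = [math.sqrt(sum(v * v for v in col)) for col in zip(*J)]
    return [1.0 / nrm if nrm > 0 else 1.0 for nrm in norms]

def lci_step(J, r, R, lam):
    """Model update dm = W x, where x minimizes
    ||J W x - r||^2 + lam * ||R W x||^2 and each row of R is a lateral
    difference between the same parameter at two neighboring soundings."""
    w = column_weights(J)
    Js = [[v * wk for v, wk in zip(row, w)] for row in J]
    RW = [[v * wk for v, wk in zip(row, w)] for row in R]
    JtJ = matmul(transpose(Js), Js)
    RtR = matmul(transpose(RW), RW)
    A = [[a + lam * c for a, c in zip(ra, rc)] for ra, rc in zip(JtJ, RtR)]
    rhs = [sum(Js[i][k] * r[i] for i in range(len(r))) for k in range(len(w))]
    x = solve(A, rhs)
    return [wk * xk for wk, xk in zip(w, x)]
```

With `lam = 0` the step reduces to an ordinary least-squares update; increasing `lam` pulls the same parameter at adjacent soundings toward a common value.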
Polarimetric radar data decomposition and interpretation
NASA Technical Reports Server (NTRS)
Sun, Guoqing; Ranson, K. Jon
1993-01-01
Significant efforts have been made to decompose polarimetric radar data into several simple scattering components. The components, which are selected because of their physical significance, can be used to classify SAR (Synthetic Aperture Radar) image data. If particular components can be related to forest parameters, inversion procedures may be developed to estimate these parameters from the scattering components. Several methods have been used to decompose an averaged Stokes matrix or covariance matrix into three components representing odd (surface), even (double-bounce) and diffuse (volume) scattering. With these decomposition techniques, phenomena such as canopy-ground interactions, randomness of orientation, and size of scatterers can be examined from SAR data. In this study we applied the method recently reported by van Zyl (1992) to decompose averaged backscattering covariance matrices extracted from JPL SAR images over forest stands in Maine, USA. These stands are mostly mixed stands of coniferous and deciduous trees. Biomass data have been derived from field measurements of DBH and tree density using allometric equations. The interpretation of the decompositions and their relationships with measured stand biomass are presented in this paper.
Smart Interpretation - Application of Machine Learning in Geological Interpretation of AEM Data
NASA Astrophysics Data System (ADS)
Bach, T.; Gulbrandsen, M. L.; Jacobsen, R.; Pallesen, T. M.; Jørgensen, F.; Høyer, A. S.; Hansen, T. M.
2015-12-01
When airborne geophysical measurements are used in, e.g., groundwater mapping, an overwhelming amount of data is collected. Increasingly large survey areas, denser data collection, and limited resources combine into a growing problem: building geological models that use all the available data in a manner consistent with the geologist's knowledge of the geology of the survey area. In the ERGO project, funded by The Danish National Advanced Technology Foundation, we address this problem by developing new, usable tools that enable the geologist to use her geological knowledge directly in the interpretation of the AEM data, and thereby handle the large amount of data. In the project we have developed the mathematical basis for capturing geological expertise in a statistical model. Based on this, we have implemented new algorithms that have been operationalized and embedded in user-friendly software. In this software, the machine learning algorithm, Smart Interpretation, enables the geologist to use the system as an assistant in the geological modelling process. As the software 'learns' the geology from the geologist, the system suggests new modelling features in the data. In this presentation we demonstrate the application of the results from the ERGO project, including the proposed modelling workflow applied to a variety of data examples.
A t-statistic for objective interpretation of comparative genomic hybridization (CGH) profiles.
Moore, D H; Pallavicini, M; Cher, M L; Gray, J W
1997-07-01
An objective method for interpreting comparative genomic hybridization (CGH) is described and compared with current methods of interpretation. The method is based on a two-sample t-statistic in which composite test:reference and reference:reference CGH profiles are compared at each point along the genome to detect regions of significant differences. Composite profiles are created by combining CGH profiles measured from several metaphase chromosomes for each type of chromosome in the normal human karyotype. Composites for both test:reference and reference:reference CGH analyses are used to generate mean CGH profiles and information about the variance therein. The utility of the method is demonstrated through analysis of aneusomies and partial gain and loss of DNA sequence in a myeloid leukemia specimen. Banding analyses of this specimen indicated inv (3)(q21q26), del (5)(q2?q35), -7, +8 and add (17)(p11.2). The t-statistic analyses of CGH data indicated rev ish enh (8) and rev ish dim (5q31.1q33.1,7q11.23qter). The undetected gain on 17p was small and confined to a single band (17p11.2). Thus, the t-statistic is an objective and effective method for defining significant differences between test and reference CGH profiles.
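The pointwise comparison described above can be sketched as follows, assuming each composite profile is a list of ratios sampled at the same genome positions. The sketch uses Welch's unequal-variance form of the two-sample t-statistic; whether the authors pooled variances is not stated in the abstract, and the function names are hypothetical. Positions where every profile has the same value (zero variance in both groups) would need special handling.

```python
import math

def mean_var(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / (n - 1)

def pointwise_t(test_profiles, ref_profiles):
    """Welch two-sample t-statistic at every genome position.

    test_profiles: test:reference profiles; ref_profiles: reference:reference
    profiles. Each profile is a list of ratios at the same positions.
    """
    n1, n2 = len(test_profiles), len(ref_profiles)
    t = []
    for pos in range(len(test_profiles[0])):
        m1, v1 = mean_var([p[pos] for p in test_profiles])
        m2, v2 = mean_var([p[pos] for p in ref_profiles])
        t.append((m1 - m2) / math.sqrt(v1 / n1 + v2 / n2))
    return t
```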
Recent statistical methods for orientation data
NASA Technical Reports Server (NTRS)
Batschelet, E.
1972-01-01
The application of statistical methods in the areas of animal orientation and navigation is discussed. The treatment is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations, and tables of data are developed to show the value of the information obtained by statistical analysis.
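Two-dimensional orientation data are angles, so arithmetic means can mislead: the mean of 350° and 10° should be 0°, not 180°. A minimal sketch of the usual circular summary, assuming angles in degrees, computes the mean direction, the mean resultant length r̄, and the Rayleigh test of uniformity with a standard large-sample approximation to its p-value. It is illustrative and not taken from the report; `circular_summary` is a hypothetical name.

```python
import math

def circular_summary(angles_deg):
    """Mean direction (deg), mean resultant length r_bar, Rayleigh z, and an
    approximate Rayleigh p-value for H0: directions are uniform."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    resultant = math.hypot(c, s)
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    r_bar = resultant / n
    z = resultant ** 2 / n  # equals n * r_bar**2
    # Large-sample approximation to the Rayleigh test p-value.
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - resultant ** 2)) - (1 + 2 * n))
    return mean_dir, r_bar, z, min(p, 1.0)
```

An r̄ near 1 indicates tightly clustered headings; an r̄ near 0 is consistent with no preferred direction.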
Data Torturing and the Misuse of Statistical Tools
Abate, Marcey L.
1999-08-16
Statistical concepts, methods, and tools are often used in the implementation of statistical thinking. Unfortunately, statistical tools are all too often misused by not applying them in the context of statistical thinking that focuses on processes, variation, and data. The consequences of this misuse may be "data torturing" or going beyond reasonable interpretation of the facts due to a misunderstanding of the processes creating the data or the misinterpretation of variability in the data. In the hope of averting future misuse and data torturing, examples are provided where the application of common statistical tools, in the absence of statistical thinking, provides deceptive results by not adequately representing the underlying process and variability. For each of the examples, a discussion is provided on how applying the concepts of statistical thinking may have prevented the data torturing. The lessons learned from these examples will provide an increased awareness of the potential for many statistical methods to mislead and a better understanding of how statistical thinking broadens and increases the effectiveness of statistical tools.
[Blood proteins in African trypanosomiasis: variations and statistical interpretations].
Cailliez, M; Poupin, F; Pages, J P; Savel, J
1982-01-01
Measurement of blood orosomucoid, haptoglobin, C-reactive protein and immunoglobulin levels has enabled us to demonstrate a specific protein profile in human African trypanosomiasis, compared with that of other parasitic diseases and with a healthy African reference group. Computer processing of the data by principal component analysis provides a valuable tool for epidemiological surveys.
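Principal component analysis of a protein panel can be sketched in plain Python, assuming rows are patients and columns are protein levels, and that each column is standardized first so the analysis runs on the correlation matrix (the abstract does not say which matrix was used). Eigenvectors are found by power iteration with deflation, which is adequate for a handful of variables; the function names are hypothetical.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    out_cols = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        s = math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
        out_cols.append([(v - m) / s for v in col])
    return [list(r) for r in zip(*out_cols)]

def covariance(rows):
    """Covariance of already-centered rows (the correlation matrix if standardized)."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_components(C, k, iters=500):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix."""
    p = len(C)
    C = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(p)] * p
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        # Deflate: remove the found component before searching for the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs
```

For a correlation matrix the eigenvalues sum to the number of variables, so `lam / p` gives the fraction of total variance each component explains.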
Example of scattering noise in radar data interpretation
Canavan, G.H.
1996-10-01
Radar data interpretation typically assumes well-behaved, known particle distributions. Those assumptions are at variance with the unknown angular scattering characteristics of the particles measured. This note gives a simple example of how those characteristics complicate data interpretation.
Statistics for characterizing data on the periphery
Theiler, James P; Hush, Donald R
2010-01-01
We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
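For reference, the ellipsoidal (Gaussian) baseline that such detectors start from scores a sample by its squared Mahalanobis distance from the mean under the sample covariance matrix. The sketch below implements only that baseline, not the alternative estimators the authors recommend for modeling the periphery; the function names are hypothetical, and samples are assumed to be lists of equal-length feature vectors.

```python
import math

def mean_and_cov(samples):
    """Sample mean vector and unbiased sample covariance matrix."""
    n, p = len(samples), len(samples[0])
    mu = [sum(s[j] for s in samples) / n for j in range(p)]
    cov = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mu, cov

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def mahalanobis_sq(x, mu, cov):
    """(x - mu)^T cov^{-1} (x - mu), computed without forming the inverse."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(cov, d)
    return sum(di * yi for di, yi in zip(d, y))
```

Large values flag candidate anomalies; points at equal Euclidean distance can score very differently when the covariance is elongated.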
Statistical Literacy: Data Tell a Story
ERIC Educational Resources Information Center
Sole, Marla A.
2016-01-01
Every day, students collect, organize, and analyze data to make decisions. In this data-driven world, people need to assess how much trust they can place in summary statistics. The results of every survey and the safety of every drug that undergoes a clinical trial depend on the correct application of appropriate statistics. Recognizing the…
Data Mining: Going beyond Traditional Statistics
ERIC Educational Resources Information Center
Zhao, Chun-Mei; Luan, Jing
2006-01-01
The authors provide an overview of data mining, giving special attention to the relationship between data mining and statistics to unravel some misunderstandings about the two techniques. (Contains 1 figure.)
Distributed data collection for a database of radiological image interpretations
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.
1997-01-01
The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window based 2048 × 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.
Woźnicka, U; Jarzyna, J; Krynicka, E
2005-05-01
Measurements of various physical quantities in a borehole by geophysical well logging tools are designed to determine these quantities for underground geological formations. Then, the raw data (logs) are combined in a comprehensive interpretation to obtain values of geological parameters. Estimating the uncertainty of calculated geological parameters, interpreted in such a way, is difficult, often impossible, when classical statistical methods are used. The method presented here permits an estimate of the uncertainty of a quantity to be obtained. The discussion of the dependence between the uncertainty of nuclear and acoustic tool responses, and the estimated uncertainty of the interpreted geological parameters (among others: porosity, water saturation, clay content) is presented.
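The abstract does not describe the authors' method, so the sketch below swaps in plain Monte Carlo propagation as one common way to estimate the uncertainty of an interpreted parameter. It assumes the standard density-log porosity relation with hypothetical matrix and fluid densities (2.65 and 1.0 g/cm³) and Gaussian noise on the bulk-density reading. Because this relation is linear, the result should match the analytic value σ_ρb / (ρ_ma - ρ_f); sampling pays off when several logs are combined nonlinearly, e.g. for water saturation.

```python
import random
import statistics

RHO_MATRIX = 2.65  # g/cm^3, hypothetical quartz matrix density
RHO_FLUID = 1.0    # g/cm^3, hypothetical fresh-water density

def density_porosity(rho_bulk, rho_matrix=RHO_MATRIX, rho_fluid=RHO_FLUID):
    """Density-log porosity: phi = (rho_ma - rho_b) / (rho_ma - rho_f)."""
    return (rho_matrix - rho_bulk) / (rho_matrix - rho_fluid)

def monte_carlo_porosity(rho_bulk, sigma_bulk, n=20_000, seed=1):
    """Propagate Gaussian bulk-density noise into porosity by sampling.
    Returns (mean, standard deviation) of the simulated porosities."""
    rng = random.Random(seed)
    draws = [density_porosity(rng.gauss(rho_bulk, sigma_bulk)) for _ in range(n)]
    return statistics.fmean(draws), statistics.stdev(draws)
```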
Data explorer: a prototype expert system for statistical analysis.
Aliferis, C.; Chao, E.; Cooper, G. F.
1993-01-01
The inadequate analysis of medical research data, due mainly to the unavailability of local statistical expertise, seriously jeopardizes the quality of new medical knowledge. Data Explorer is a prototype Expert System that builds on the versatility and power of existing statistical software, to provide automatic analyses and interpretation of medical data. The system draws much of its power by using belief network methods in place of more traditional, but difficult to automate, classical multivariate statistical techniques. Data Explorer identifies statistically significant relationships among variables, and using power-size analysis, belief network inference/learning and various explanatory techniques helps the user understand the importance of the findings. Finally the system can be used as a tool for the automatic development of predictive/diagnostic models from patient databases. PMID:8130501
Interpretation of the results of statistical measurements. [search for basic probability model]
NASA Technical Reports Server (NTRS)
Olshevskiy, V. V.
1973-01-01
For random processes, the calculated probability characteristic, and the measured statistical estimate are used in a quality functional, which defines the difference between the two functions. Based on the assumption that the statistical measurement procedure is organized so that the parameters for a selected model are optimized, it is shown that the interpretation of experimental research is a search for a basic probability model.
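One way to read the quality functional described above is as a distance between an empirical distribution and each candidate model, with interpretation reduced to choosing the model that minimizes it. The sketch below assumes a mean squared difference between the empirical CDF and each fitted model CDF, and only two candidate models (exponential and normal) fitted by moments. These choices and all names are hypothetical, not the functional used in the report.

```python
import math

def empirical_cdf(sorted_x):
    """Plotting-position estimate (i + 0.5) / n at each sorted sample."""
    n = len(sorted_x)
    return [(i + 0.5) / n for i in range(n)]

def quality_functional(sorted_x, model_cdf):
    """Mean squared difference between empirical and model CDFs."""
    emp = empirical_cdf(sorted_x)
    return sum((e - model_cdf(x)) ** 2 for e, x in zip(emp, sorted_x)) / len(sorted_x)

def fit_candidates(xs):
    """Candidate models with parameters fitted by the method of moments."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return {
        "exponential": lambda x: 1 - math.exp(-x / mean) if x > 0 else 0.0,
        "normal": lambda x: 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2)))),
    }

def best_model(xs):
    """Name of the candidate minimizing the functional, plus all scores."""
    s = sorted(xs)
    scores = {name: quality_functional(s, cdf) for name, cdf in fit_candidates(xs).items()}
    return min(scores, key=scores.get), scores
```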
Barber, Chris; Cayley, Alex; Hanser, Thierry; Harding, Alex; Heghes, Crina; Vessey, Jonathan D; Werner, Stephane; Weiner, Sandy K; Wichard, Joerg; Giddings, Amanda; Glowienke, Susanne; Parenty, Alexis; Brigo, Alessandro; Spirkl, Hans-Peter; Amberg, Alexander; Kemper, Ray; Greene, Nigel
2016-04-01
The relative wealth of bacterial mutagenicity data available in the public literature means that in silico quantitative/qualitative structure activity relationship (QSAR) systems can readily be built for this endpoint. A good means of evaluating the performance of such systems is to use private unpublished data sets, which generally represent a more distinct chemical space than publicly available test sets and, as a result, provide a greater challenge to the model. However, raw performance metrics should not be the only factor considered when judging this type of software since expert interpretation of the results obtained may allow for further improvements in predictivity. Enough information should be provided by a QSAR to allow the user to make general, scientifically-based arguments in order to assess and overrule predictions when necessary. With all this in mind, we sought to validate the performance of the statistics-based in vitro bacterial mutagenicity prediction system Sarah Nexus (version 1.1) against private test data sets supplied by nine different pharmaceutical companies. The results of these evaluations were then analysed in order to identify findings presented by the model which would be useful for the user to take into consideration when interpreting the results and making their final decision about the mutagenic potential of a given compound. Copyright © 2015 Elsevier Inc. All rights reserved.
Statistical Literacy in the Data Science Workplace
ERIC Educational Resources Information Center
Grant, Robert
Statistical literacy, the ability to understand and make use of statistical information including methods, has particular relevance in the age of data science, when complex analyses are undertaken by teams from diverse backgrounds. Not only is it essential to communicate to the consumers of information but also within the team. Writing from the…
Confidentiality of Research and Statistical Data.
ERIC Educational Resources Information Center
Law Enforcement Assistance Administration (Dept. of Justice), Washington, DC.
This document was prepared by the Privacy and Security Staff, National Criminal Justice Information and Statistics Service, in conjunction with the Law Enforcement Assistance Administration (LEAA) Office of General Counsel, to explain and discuss the requirements of the LEAA regulations governing confidentiality of research and statistical data.…
Basic statistical tools in research and data analysis.
Ali, Zulfiqar; Bhaskar, S Bala
2016-09-01
Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.
Topology for statistical modeling of petascale data.
Pascucci, Valerio; Mascarenhas, Ajith Arthur; Rusek, Korben; Bennett, Janine Camille; Levine, Joshua; Pebay, Philippe Pierre; Gyulassy, Attila; Thompson, David C.; Rojas, Joseph Maurice
2011-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.
Statistical analysis principles for Omics data.
Dunkler, Daniela; Sánchez-Cabo, Fátima; Heinze, Georg
2011-01-01
In Omics experiments, typically thousands of hypotheses are tested simultaneously, each based on very few independent replicates. Traditional tests like the t-test were shown to perform poorly with this new type of data. Furthermore, simultaneous consideration of many hypotheses, each prone to a decision error, requires powerful adjustments for this multiple testing situation. After a general introduction to statistical testing, we present the moderated t-statistic, the SAM statistic, and the RankProduct statistic which have been developed to evaluate hypotheses in typical Omics experiments. We also provide an introduction to the multiple testing problem and discuss some state-of-the-art procedures to address this issue. The presented test statistics are subjected to a comparative analysis of a microarray experiment comparing tissue samples of two groups of tumors. All calculations can be done using the freely available statistical software R. Accompanying, commented code is available at: http://www.meduniwien.ac.at/msi/biometrie/MIMB.
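As an illustration of a multiple testing adjustment of the kind discussed, the sketch below implements the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. Whether the chapter uses this exact procedure is an assumption, and the function names are hypothetical; in R, `p.adjust(p, method = "BH")` computes the same adjusted p-values as `bh_adjusted`.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    # Step-up: find the largest rank whose p-value is under its threshold;
    # everything ranked at or below it is rejected.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])

def bh_adjusted(p_values):
    """BH-adjusted p-values (reject where adjusted p <= q)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running = 1.0
    for pos, i in enumerate(order):
        rank = m - pos
        running = min(running, p_values[i] * m / rank)
        adjusted[i] = running
    return adjusted
```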
Systematic interpretation of microarray data using experiment annotations
Fellenberg, Kurt; Busold, Christian H; Witt, Olaf; Bauer, Andrea; Beckmann, Boris; Hauser, Nicole C; Frohme, Marcus; Winter, Stefan; Dippon, Jürgen; Hoheisel, Jörg D
2006-01-01
Background: Up to now, microarray data have mostly been assessed in the context of only one or a few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data when available in a statistically accessible format. Results: We provide means to preprocess these additional data and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well suited for mapping such extracted traits. It visualizes associations both among and between the traits, the annotated experiments, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single-channel) and two-channel data, stemming from model organisms such as yeast and Drosophila up to complex human cancer samples. Inclusion of technical parameters allows identification of artifacts and flaws in experimental design. Conclusion: Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down to intricate details. PMID:17181856
NASA Astrophysics Data System (ADS)
Massiot, Cécile; Townend, John; Nicol, Andrew; McNamara, David D.
2017-08-01
Acoustic borehole televiewer (BHTV) logs provide measurements of fracture attributes (orientations, thickness, and spacing) at depth. Orientation, censoring, and truncation sampling biases similar to those described for one-dimensional outcrop scanlines, and other logging or drilling artifacts specific to BHTV logs, can affect the interpretation of fracture attributes from BHTV logs. K-means, fuzzy K-means, and agglomerative clustering methods provide transparent means of separating fracture groups on the basis of their orientation. Fracture spacing is calculated for each of these fracture sets. Maximum likelihood estimation using truncated distributions permits the fitting of several probability distributions to the fracture attribute data sets within truncation limits, which can then be extrapolated over the entire range where they naturally occur. Akaike Information Criterion (AIC) and Schwarz Bayesian Criterion (SBC) statistical information criteria rank the distributions by how well they fit the data. We demonstrate these attribute analysis methods with a data set derived from three BHTV logs acquired from the high-temperature Rotokawa geothermal field, New Zealand. Varying BHTV log quality reduces the number of input data points, but careful selection of the quality levels where fractures are deemed fully sampled increases the reliability of the analysis. Spacing data comprising up to 300 data points and spanning three orders of magnitude can be approximated similarly well (similar AIC rankings) by several distributions. Several clustering configurations and probability distributions can often characterize the data at similar levels of statistical criteria. Thus, several scenarios should be considered when using BHTV log data to constrain numerical fracture models.
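The ranking step can be sketched for fracture spacings, assuming only two candidate distributions (exponential and lognormal) fitted by closed-form maximum likelihood. Unlike the study, this sketch ignores truncation limits, so it is a simplified illustration; AIC is 2k - 2 ln L, SBC is k ln n - 2 ln L, and lower values of either criterion rank higher. The function names are hypothetical.

```python
import math

def exponential_loglik(xs):
    """Maximized log-likelihood and parameter count for an exponential fit."""
    rate = len(xs) / sum(xs)  # MLE of the rate
    return len(xs) * math.log(rate) - rate * sum(xs), 1

def lognormal_loglik(xs):
    """Maximized log-likelihood and parameter count for a lognormal fit."""
    logs = [math.log(x) for x in xs]
    n = len(xs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # MLE divides by n
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var) - 0.5 * n, 2

def rank_by_information_criteria(xs):
    """List of (AIC, SBC, name), best (lowest AIC) first."""
    n = len(xs)
    rows = []
    for name, fit in (("exponential", exponential_loglik),
                      ("lognormal", lognormal_loglik)):
        loglik, k = fit(xs)
        aic = 2 * k - 2 * loglik
        sbc = k * math.log(n) - 2 * loglik
        rows.append((aic, sbc, name))
    rows.sort()
    return rows
```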
Statistical treatment of fatigue test data
Raske, D.T.
1980-01-01
This report discusses several aspects of fatigue data analysis in order to provide a basis for the development of statistically sound design curves. Included is a discussion of the choice of the dependent variable, the assumptions associated with least squares regression models, the variability of fatigue data, the treatment of data from suspended tests and outlying observations, and various strain-life relations.
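A minimal sketch of a life-curve regression, assuming a Basquin-type power law fitted by least squares in log-log space with log life as the dependent variable (one of the choices such an analysis must make). Suspended tests (runouts) and outliers are not handled here, and the design-curve offset `k` is a hypothetical parameter rather than a value from any standard; the function names are also hypothetical.

```python
import math

def fit_log_log(stress, cycles):
    """OLS fit of log10(N) = a + b * log10(S), with log life as the dependent
    variable. Returns (a, b, residual standard deviation in log10 units)."""
    x = [math.log10(s) for s in stress]
    y = [math.log10(n) for n in cycles]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2)) if n > 2 else float("nan")
    return a, b, s

def design_life(a, b, s, stress, k=2.0):
    """Life at `stress` on a curve shifted k residual standard deviations below
    the mean curve in log space (k is a hypothetical choice)."""
    return 10 ** (a + b * math.log10(stress) - k * s)
```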
Interpretation of Data from Uphole Refraction Surveys
1980-06-01
U. S. Army Engineer Waterways Experiment Station... travel times. The usefulness of the data and success in interpretation may depend upon the care given to detailing the configuration of the rock... the Subsurface Exploration Programs," Report to Defense Nuclear Agency by U. S. Army Engineer Waterways Experiment Station, Vicksburg, Miss. Meissner
NASA Astrophysics Data System (ADS)
Karuppiah, R.; Faldi, A.; Laurenzi, I.; Usadi, A.; Venkatesh, A.
2014-12-01
An increasing number of studies are focused on assessing the environmental footprint of different products and processes, especially using life cycle assessment (LCA). This work shows how combining statistical methods and Geographic Information Systems (GIS) with environmental analyses can help improve the quality of results and their interpretation. Most environmental assessments in the literature yield single numbers that characterize the environmental impact of a process/product, typically global or country averages, often unchanging in time. In this work, we show how statistical analysis and GIS can help address these limitations. For example, we demonstrate a method to separately quantify uncertainty and variability in the result of LCA models using a power generation case study. This is important for rigorous comparisons between the impacts of different processes. Another challenge is the lack of data, which can affect the rigor of LCAs. We have developed an approach to estimate environmental impacts of incompletely characterized processes using predictive statistical models. This method is applied to estimate unreported coal power plant emissions in several world regions. There is also a general lack of spatio-temporal characterization of the results in environmental analyses. For instance, studies that focus on water usage do not put in context where and when water is withdrawn. Through the use of hydrological modeling combined with GIS, we quantify water stress on a regional and seasonal basis to understand water supply and demand risks for multiple users. Another example where it is important to consider regional dependency of impacts is when characterizing how agricultural land occupation affects biodiversity in a region. We developed a data-driven methodology used in conjunction with GIS to determine if there is a statistically significant difference between the impacts of growing different crops on different species in various biomes of the world.
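One common way to keep uncertainty and variability apart is a nested (two-dimensional) Monte Carlo simulation; the abstract does not say that this is the authors' method, so treat the sketch below as an assumption. The outer loop samples an imperfectly known parameter (here a hypothetical fuel emission factor, kg CO2 per GJ), the inner loop samples plant-to-plant variability (a hypothetical heat rate, GJ per MWh), and the two spreads are summarised separately. All distributions and numbers are illustrative.

```python
import random
import statistics


def nested_monte_carlo(n_outer=200, n_inner=500, seed=0):
    """Separate uncertainty (outer loop) from variability (inner loop) in a toy impact model."""
    rng = random.Random(seed)
    medians, spreads = [], []
    for _ in range(n_outer):
        # Outer loop: one draw of the uncertain quantity (hypothetical emission factor).
        ef = rng.lognormvariate(4.5, 0.05)
        # Inner loop: variability across plants given that draw (hypothetical heat rate).
        impacts = [ef * rng.gauss(10.0, 1.0) for _ in range(n_inner)]  # kg CO2 per MWh
        q = statistics.quantiles(impacts, n=20)  # cut points at 5 %, 10 %, ..., 95 %
        medians.append(statistics.median(impacts))
        spreads.append(q[-1] - q[0])  # 5-95 % range across plants
    return {
        "uncertainty_in_median": statistics.stdev(medians),
        "typical_variability_range": statistics.median(spreads),
    }
```

Reporting the two numbers separately makes clear which part of the spread could shrink with better data (uncertainty) and which reflects real differences between plants (variability).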
HistFitter software framework for statistical data analysis
NASA Astrophysics Data System (ADS)
Baak, M.; Besjes, G. J.; Côté, D.; Koutsman, A.; Lorenz, J.; Short, D.
2015-04-01
We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fit to data and interpreted with statistical tests. Internally HistFitter uses the statistics packages RooStats and HistFactory. A key innovation of HistFitter is its design, which is rooted in analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple models at once that describe the data, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication quality style through a simple command-line interface.
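The control-region idea can be illustrated with a single-bin counting sketch. This is not HistFitter's API: the function names and inputs are hypothetical, and systematic uncertainties and the statistical uncertainty of the control region are ignored. A background normalisation fitted in a control region (CR) is carried to the signal region (SR) through the ratio of simulated yields, and a background-only Poisson p-value is computed for the SR count.

```python
import math


def poisson_sf(n_obs, mu):
    """P(N >= n_obs) for N ~ Poisson(mu)."""
    term = math.exp(-mu)
    cdf = 0.0
    for k in range(n_obs):
        cdf += term
        term *= mu / (k + 1)
    return max(0.0, 1.0 - cdf)


def cr_to_sr(n_cr, b_cr_mc, b_sr_mc, n_sr):
    """Normalise background in the CR, extrapolate to the SR, and test the SR count."""
    mu_b = n_cr / b_cr_mc        # background normalisation fitted in the control region
    b_sr = mu_b * b_sr_mc        # equals n_cr * (b_sr_mc / b_cr_mc): the transfer factor
    p0 = poisson_sf(n_sr, b_sr)  # background-only p-value for the signal-region count
    return mu_b, b_sr, p0
```

A validation region would be checked the same way: multiply its simulated yield by mu_b and compare the prediction with the observed count before looking at the signal region. For very small p-values the `1 - cdf` form loses precision, so a real analysis would use a proper tail computation.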
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Suppose one observes n i.i.d. copies of a random variable whose probability distribution is known to be an element of a particular statistical model. To define our statistical target, we partition the sample into V equal-size subsamples and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and a corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference in problems that are increasingly addressed by clever, yet ad hoc, pattern-finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
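The V-fold sample-split construction can be sketched as follows. The `choose` and `estimate` callables are hypothetical stand-ins for the data-adaptive algorithm and the estimator: here the algorithm picks the variable with the largest mean in the parameter-generating sample, and the target is that variable's mean, estimated on the held-out fold. The sketch returns only the averaged point estimate, not the variance estimate or confidence interval that the paper derives from its central limit theorem.

```python
import random
import statistics


def sample_split_target(data, V, choose, estimate, seed=0):
    """Average over V folds of a target chosen on the complement and estimated on the fold."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[v::V] for v in range(V)]  # V (near-)equal-size subsamples
    estimates = []
    for v in range(V):
        held_out = set(folds[v])
        est_sample = [data[i] for i in folds[v]]                  # estimation sample
        gen_sample = [data[i] for i in idx if i not in held_out]  # parameter-generating sample
        target = choose(gen_sample)  # algorithm: sample -> target parameter definition
        estimates.append(estimate(target, est_sample))
    return statistics.mean(estimates)


def choose_largest_mean(sample):
    """Hypothetical data-adaptive algorithm: index of the column with the largest mean."""
    k = len(sample[0])
    return max(range(k), key=lambda j: statistics.mean(row[j] for row in sample))


def mean_of_column(j, sample):
    """Hypothetical estimator of the chosen target: the column mean."""
    return statistics.mean(row[j] for row in sample)
```

Because the target is chosen on data disjoint from the data used to estimate it, the selection step does not bias the estimate in the way that choosing and estimating on the same sample would.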
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures
Udey, Ruth Norma
2013-01-01
Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
Lin, K K
2000-11-01
The U.S. Food and Drug Administration (FDA) is in the process of preparing a draft Guidance for Industry document on the statistical aspects of carcinogenicity studies of pharmaceuticals for public comment. The purpose of the document is to provide statistical guidance for the design of carcinogenicity experiments, methods of statistical analysis of study data, interpretation of study results, presentation of data and results in reports, and submission of electronic study data. This article covers the genesis of the guidance document and some statistical methods in study design, data analysis, and interpretation of results included in the draft FDA guidance document.
A spatial scan statistic for multinomial data.
Jung, Inkyung; Kulldorff, Martin; Richard, Otukei John
2010-08-15
As a geographical cluster detection analysis tool, the spatial scan statistic has been developed for different types of data such as Bernoulli, Poisson, ordinal, exponential and normal. Another interesting data type is multinomial. For example, one may want to find clusters where the disease-type distribution is statistically significantly different from the rest of the study region when there are different types of disease. In this paper, we propose a spatial scan statistic for such data, which is useful for geographical cluster detection analysis for categorical data without any intrinsic order information. The proposed method is applied to meningitis data consisting of five different disease categories to identify areas with distinct disease-type patterns in two counties in the U.K. The performance of the method is evaluated through a simulation study.
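A sketch of the likelihood-ratio computation for a multinomial scan, written from the standard form of the statistic as we understand it rather than from the authors' software. For a candidate zone with category counts c_k (total c) inside and C_k (total C) in the whole region, the log-likelihood ratio compares separate multinomial proportions inside and outside the zone with a single set of proportions for the whole region. Candidate zones here are hypothetical circles grown around each location until they would hold more than `max_frac` of all cases.

```python
import math


def xlogy(x, y):
    """x * log(y), defined as 0 when x == 0."""
    return 0.0 if x == 0 else x * math.log(y)


def multinomial_llr(inside, totals):
    """Log-likelihood ratio for a zone: category counts inside vs. the rest of the region."""
    c, C = sum(inside), sum(totals)
    llr = 0.0
    for c_k, C_k in zip(inside, totals):
        out_k = C_k - c_k
        llr += xlogy(c_k, c_k / c) if c else 0.0
        llr += xlogy(out_k, out_k / (C - c)) if C - c else 0.0
        llr -= xlogy(C_k, C_k / C)
    return llr


def scan(locations, counts, max_frac=0.5):
    """Return (max LLR, zone) over circular zones; counts[i] holds category counts at location i."""
    totals = [sum(col) for col in zip(*counts)]
    C = sum(totals)
    best = (0.0, None)
    for cx, cy in locations:
        order = sorted(range(len(locations)),
                       key=lambda i: math.dist((cx, cy), locations[i]))
        inside = [0] * len(totals)
        zone = []
        for i in order:
            if sum(inside) + sum(counts[i]) > max_frac * C:
                break  # zone would exceed the maximum size
            zone.append(i)
            inside = [a + b for a, b in zip(inside, counts[i])]
            llr = multinomial_llr(inside, totals)
            if llr > best[0]:
                best = (llr, sorted(zone))
    return best
```

Significance of the most likely cluster would normally be assessed by Monte Carlo replication, randomly reassigning categories to cases and recomputing the maximum statistic; that step is omitted here to keep the sketch short.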
Transit Spectroscopy: new data analysis techniques and interpretation
NASA Astrophysics Data System (ADS)
Tinetti, Giovanna; Waldmann, Ingo P.; Morello, Giuseppe; Tessenyi, Marcell; Varley, Ryan; Barton, Emma; Yurchenko, Sergey; Tennyson, Jonathan; Hollis, Morgan
2014-11-01
Planetary science beyond the boundaries of our Solar System is today in its infancy. Until a couple of decades ago, the detailed investigation of planetary properties was restricted to objects orbiting inside the Kuiper Belt. Today, we cannot ignore that the number of known planets has increased by two orders of magnitude, nor that these planets bear little resemblance to the objects in our own Solar System. A key observable for planets is the chemical composition and state of their atmosphere. To date, two methods can be used to sound exoplanetary atmospheres: transit and eclipse spectroscopy, and direct imaging spectroscopy. Although the field of exoplanet spectroscopy has been very successful in past years, a few serious hurdles need to be overcome to progress in this area: in particular, instrument systematics are often difficult to disentangle from the signal, and data are sparse and often not recorded simultaneously, causing degeneracies in the interpretation. We will present here new data analysis techniques and interpretation developed by the “ExoLights” team at UCL to address the above-mentioned issues. These techniques include statistical tools, non-parametric machine-learning algorithms, optimized radiative transfer models and spectroscopic line-lists. These new tools have been successfully applied to existing data recorded with space and ground instruments, shedding new light on our knowledge and understanding of these alien worlds.
NASA Astrophysics Data System (ADS)
Kuić, Domagoj
2016-05-01
In this paper an alternative approach to statistical mechanics based on the maximum information entropy principle (MaxEnt) is examined, specifically its close relation with the Gibbs method of ensembles. It is shown that the MaxEnt formalism is the logical extension of the Gibbs formalism of equilibrium statistical mechanics that is entirely independent of the frequentist interpretation of probabilities only as factual (i.e. experimentally verifiable) properties of the real world. Furthermore, we show that, consistently with the law of large numbers, the relative frequencies of the ensemble of systems prepared under identical conditions (i.e. identical constraints) actually correspond to the MaxEnt probabilities in the limit of a large number of systems in the ensemble. This result implies that the probabilities in statistical mechanics can be interpreted, independently of the frequency interpretation, on the basis of the maximum information entropy principle.
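A minimal worked example of the MaxEnt construction: maximising the Shannon entropy of a discrete distribution subject to normalisation and a fixed mean energy yields the Gibbs (canonical) form p_i ∝ exp(-β E_i). The sketch finds β by bisection, using the fact that the mean energy decreases monotonically in β. The energy levels and the bracket [-50, 50] for β are illustrative assumptions; a target mean very close to the lowest or highest level may need a wider bracket.

```python
import math


def maxent_distribution(energies, mean_energy, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution over discrete levels with a fixed mean energy.

    Returns (beta, probabilities) with p_i proportional to exp(-beta * E_i).
    """
    def probs(beta):
        # Shift energies so every exponent is <= 0, which avoids overflow.
        ref = min(energies) if beta >= 0 else max(energies)
        w = [math.exp(-beta * (e - ref)) for e in energies]
        z = sum(w)
        return [x / z for x in w]

    def mean(beta):
        return sum(p * e for p, e in zip(probs(beta), energies))

    if not min(energies) < mean_energy < max(energies):
        raise ValueError("mean_energy must lie strictly between the min and max energy")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(mid) > mean_energy:  # mean energy too high, so beta must increase
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2
    return beta, probs(beta)
```

With equally spaced levels [0, 1, 2] and a target mean of 1, the constraint is already satisfied by the uniform distribution, so the solver returns β ≈ 0, the unconstrained entropy maximum.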
Statistical data of the uranium industry
1983-01-01
This report is a compendium of information relating to US uranium reserves and potential resources and to exploration, mining, milling, and other activities of the uranium industry through 1982. The statistics are based primarily on data provided voluntarily by the uranium exploration, mining and milling companies. The compendium has been published annually since 1968 and reflects the basic programs of the Grand Junction Area Office of the US Department of Energy. Statistical data obtained from surveys conducted by the Energy Information Administration are included in Section IX. The production, reserves, and drilling data are reported in a manner which avoids disclosure of proprietary information.
Topology for Statistical Modeling of Petascale Data
Bennett, Janine Camille; Pebay, Philippe Pierre; Pascucci, Valerio; Levine, Joshua; Gyulassy, Attila; Rojas, Maurice
2014-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.
Teacher Perception of Tasks That Enhance Data Interpretation
ERIC Educational Resources Information Center
Wolfe, Gretchen L.
2012-01-01
The purpose of this study is to provide an account of teacher perception of core practice tasks in data use, particularly data interpretation. Data interpretation is critical to professional practice in planning instructional adjustments for student learning. This is a case study of four elementary teachers who provide numerous task-specific…
Statistical Tools for the Interpretation of Enzootic West Nile virus Transmission Dynamics.
Caillouët, Kevin A; Robertson, Suzanne
2016-01-01
Interpretation of enzootic West Nile virus (WNV) surveillance indicators requires little advanced mathematical skill, but greatly enhances the ability of public health officials to prescribe effective WNV management tactics. Stepwise procedures for the calculation of mosquito infection rates (IR) and vector index (VI) are presented alongside statistical tools that require additional computation. A brief review of advantages and important considerations for each statistic's use is provided.
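The surveillance statistics named above can be computed in a few lines. This sketch assumes the common definitions: the minimum infection rate (MIR) counts one infected mosquito per positive pool; the maximum-likelihood estimate shown is the closed form for pools of equal size only (unequal pool sizes need a numerical solution); and the vector index sums, over species, mean abundance per trap-night times estimated infection proportion. All input numbers are hypothetical.

```python
def mir(positive_pools, mosquitoes_tested):
    """Minimum infection rate per 1,000 mosquitoes (assumes one infected mosquito per positive pool)."""
    return 1000.0 * positive_pools / mosquitoes_tested


def mle_equal_pools(positive_pools, n_pools, pool_size):
    """MLE of the per-mosquito infection proportion when every pool has the same size."""
    return 1.0 - (1.0 - positive_pools / n_pools) ** (1.0 / pool_size)


def vector_index(species):
    """VI = sum over species of (mean mosquitoes per trap-night) * (estimated infection proportion)."""
    return sum(per_trap_night * infection_prop for per_trap_night, infection_prop in species)
```

For equal pools the MLE is never smaller than the MIR (expressed as a proportion), because a positive pool may contain more than one infected mosquito; the gap widens as the proportion of positive pools grows.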
Revisiting the statistical analysis of pyroclast density and porosity data
NASA Astrophysics Data System (ADS)
Bernard, B.; Kueppers, U.; Ortiz, H.
2015-07-01
Explosive volcanic eruptions are commonly characterized based on a thorough analysis of the generated deposits. Amongst other characteristics in physical volcanology, density and porosity of juvenile clasts are some of the most frequently used to constrain eruptive dynamics. In this study, we evaluate the sensitivity of density and porosity data to statistical methods and introduce a weighting parameter to correct issues raised by the use of frequency analysis. Results of textural investigation can be biased by clast selection. Using statistical tools as presented here, the meaningfulness of a conclusion can be checked for any data set easily. This is necessary to define whether or not a sample has met the requirements for statistical relevance, i.e. whether a data set is large enough to allow for reproducible results. Graphical statistics are used to describe density and porosity distributions, similar to those used for grain-size analysis. This approach helps with the interpretation of volcanic deposits. To illustrate this methodology, we chose two large data sets: (1) directed blast deposits of the 3640-3510 BC eruption of Chachimbiro volcano (Ecuador) and (2) block-and-ash-flow deposits of the 1990-1995 eruption of Unzen volcano (Japan). We propose the incorporation of this analysis into future investigations to check the objectivity of results achieved by different working groups and guarantee the meaningfulness of the interpretation.
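The percentile-based "graphical statistics" borrowed from grain-size analysis can be sketched with the Folk and Ward (1957) formulas: graphic mean, inclusive graphic standard deviation (sorting) and inclusive graphic skewness. Applying them directly to clast density values, and the linear-interpolation percentile used here, are assumptions for illustration; the authors' weighting parameter is not reproduced.

```python
import math


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of pre-sorted values, p in [0, 100]."""
    k = (len(sorted_vals) - 1) * p / 100.0
    f = math.floor(k)
    c = min(f + 1, len(sorted_vals) - 1)
    return sorted_vals[f] + (sorted_vals[c] - sorted_vals[f]) * (k - f)


def graphical_stats(values):
    """Folk-and-Ward-style graphical statistics applied to a list of measurements."""
    v = sorted(values)
    p5, p16, p50, p84, p95 = (percentile(v, q) for q in (5, 16, 50, 84, 95))
    mean = (p16 + p50 + p84) / 3
    sorting = (p84 - p16) / 4 + (p95 - p5) / 6.6
    skewness = ((p16 + p84 - 2 * p50) / (2 * (p84 - p16))
                + (p5 + p95 - 2 * p50) / (2 * (p95 - p5)))
    return {"median": p50, "mean": mean, "sorting": sorting, "skewness": skewness}
```

Because these statistics depend only on a few percentiles, they are less sensitive to a handful of extreme clasts than the arithmetic mean and standard deviation.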
Topology for Statistical Modeling of Petascale Data
Pascucci, Valerio; Levine, Joshua; Gyulassy, Attila; Bremer, P. -T.
2013-10-31
Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, the approach of the entire team involving all three institutions is based on the complementary techniques of combinatorial topology and statistical modelling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modelling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. The overall technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modelling, and (3) new integrated topological and statistical methods. Roughly speaking, the division of labor between our 3 groups (Sandia Labs in Livermore, Texas A&M in College Station, and U Utah in Salt Lake City) is as follows: the Sandia group focuses on statistical methods and their formulation in algebraic terms, and finds the application problems (and data sets) most relevant to this project; the Texas A&M group develops new algebraic geometry algorithms, in particular with fewnomial theory; and the Utah group develops new algorithms in computational topology via Discrete Morse Theory. However, we hasten to point out that our three groups stay in tight contact via videoconference every 2 weeks, so there is much synergy of ideas between the groups. The remainder of this document focuses on the contributions that had greater direct involvement from the team at the University of Utah in Salt Lake City.
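As a hedged illustration of how combinatorial topology can separate features from noise, the sketch computes 0-dimensional persistence of a 1-D signal's sublevel sets with union-find and the elder rule, then keeps only minima whose persistence exceeds a threshold. This is a generic textbook construction, not the project's Discrete Morse Theory code, and the threshold is a hypothetical user choice.

```python
def sublevel_persistence(values):
    """0-dimensional persistence pairs (birth, death) of the sublevel-set filtration of a 1-D signal."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    parent = [None] * n  # None means the vertex has not entered the filtration yet

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component whose minimum is higher (born later) dies here.
                young, old = (ri, rj) if values[ri] > values[rj] else (rj, ri)
                if values[i] > values[young]:  # skip zero-persistence pairs
                    pairs.append((values[young], values[i]))
                parent[young] = old
    pairs.append((values[find(order[0])], float("inf")))  # global minimum never dies
    return pairs


def filter_noise(values, threshold):
    """Keep only features whose persistence (death - birth) exceeds the threshold."""
    return [(b, d) for b, d in sublevel_persistence(values) if d - b > threshold]
```

Low-persistence pairs correspond to shallow dips that small perturbations can create or destroy, which is why discarding them is a principled way to suppress noise before statistical modelling.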
Statistical data of the uranium industry
1981-01-01
Data are presented on US uranium reserves, potential resources, exploration, mining, drilling, milling, and other activities of the uranium industry through 1980. The compendium reflects the basic programs of the Grand Junction Office. Statistics are based primarily on information provided by the uranium exploration, mining, and milling companies. Data on commercial U₃O₈ sales and purchases are included. Data on non-US uranium production and resources are presented in the appendix.
Statistical data of the uranium industry
1982-01-01
Statistical Data of the Uranium Industry is a compendium of information relating to US uranium reserves and potential resources and to exploration, mining, milling, and other activities of the uranium industry through 1981. The statistics are based primarily on data provided voluntarily by the uranium exploration, mining, and milling companies. The compendium has been published annually since 1968 and reflects the basic programs of the Grand Junction Area Office (GJAO) of the US Department of Energy. The production, reserves, and drilling information is reported in a manner which avoids disclosure of proprietary information.
Interpretation of remotely sensed data and its applications in oceanography
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Tanaka, K.; Inostroza, H. M.; Verdesio, J. J.
1982-01-01
The methodology of interpretation of remote sensing data and its oceanographic applications are described. The elements of image interpretation for different types of sensors are discussed. The sensors utilized are the multispectral scanner of LANDSAT and the thermal infrared sensors of the NOAA and geostationary satellites. Visual and automatic data interpretation in studies of pollution, the Brazil current system, and upwelling along the southeastern Brazilian coast are compared.
Interpretation of mass spectrometry data for high-throughput proteomics.
Chamrad, Daniel C; Koerting, Gerhard; Gobom, Johan; Thiele, Herbert; Klose, Joachim; Meyer, Helmut E; Blueggel, Martin
2003-08-01
Recent developments in proteomics have revealed a bottleneck in bioinformatics: high-quality interpretation of acquired MS data. The ability to generate thousands of MS spectra per day, and the demand for this, makes manual methods inadequate for analysis and underlines the need to transfer the advanced capabilities of an expert human user into sophisticated MS interpretation algorithms. The identification rate in current high-throughput proteomics studies is not only a matter of instrumentation. We present software for high-throughput PMF identification, which enables robust and confident protein identification at higher rates. This has been achieved by automated calibration, peak rejection, and use of a meta search approach which employs various PMF search engines. The automatic calibration consists of a dynamic, spectral information-dependent algorithm, which combines various known calibration methods and iteratively establishes an optimised calibration. The peak rejection algorithm filters signals that are unrelated to the analysed protein by use of automatically generated and dataset-dependent exclusion lists. In the "meta search" several known PMF search engines are triggered and their results are merged by use of a meta score. The significance of the meta score was assessed by simulation of PMF identification with 10,000 artificial spectra resembling a data situation close to the measured dataset. By means of this simulation the meta score is linked to expectation values as a statistical measure. The presented software is part of the proteome database ProteinScape which links the information derived from MS data to other relevant proteomics data. We demonstrate the performance of the presented system with MS data from 1891 PMF spectra. As a result of automatic calibration and peak rejection the identification rate increased from 6% to 44%.
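The calibration and peak-rejection steps can be illustrated with a deliberately simple sketch: a least-squares linear recalibration from matched calibrant peaks, and rejection of peaks within a ppm tolerance of an exclusion list. The paper's calibration is dynamic and combines several methods, and its exclusion lists are generated automatically per dataset, so the linear model, the 50 ppm tolerance and any example m/z values are assumptions.

```python
def linear_recalibration(observed, reference):
    """Least-squares fit reference ~ a*observed + b from matched calibrant peaks; returns a mapping."""
    n = len(observed)
    mx = sum(observed) / n
    my = sum(reference) / n
    sxx = sum((x - mx) ** 2 for x in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, reference))
    a = sxy / sxx
    b = my - a * mx
    return lambda mz: a * mz + b


def reject_peaks(peaks, exclusion_list, tol_ppm=50.0):
    """Drop peaks lying within tol_ppm of any m/z on the exclusion list."""
    def near(mz, ref):
        return abs(mz - ref) / ref * 1e6 <= tol_ppm
    return [mz for mz in peaks if not any(near(mz, ref) for ref in exclusion_list)]
```

Recalibrating before rejection matters: a systematic mass error of a few hundred ppm would otherwise push contaminant peaks outside the tolerance window and let them through to the database search.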
ERIC Educational Resources Information Center
Guler, Mustafa; Gursoy, Kadir; Guven, Bulent
2016-01-01
Understanding and interpreting biased data, decision-making in accordance with the data, and critically evaluating situations involving data are among the fundamental skills necessary in the modern world. To develop these required skills, emphasis on statistical literacy in school mathematics has been gradually increased in recent years. The…
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
ERIC Educational Resources Information Center
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 Publication Manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…
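A "what if" analysis in the sense described above can be sketched by holding an observed effect size fixed and recomputing the two-sample t test p-value for different per-group sample sizes. The sketch assumes equal group sizes and Cohen's d as the effect size, and computes the Student t tail probability by Simpson integration to stay dependency-free; a statistics library would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    logc = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(logc - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = s * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2 * area)


def what_if(d, group_sizes):
    """(n, t, p) for a fixed Cohen's d observed with different equal per-group sizes n."""
    out = []
    for n in group_sizes:
        t = d * math.sqrt(n / 2)  # t = d / sqrt(1/n + 1/n) for two groups of size n
        out.append((n, t, two_sided_p(t, 2 * n - 2)))
    return out
```

The same effect size that is far from significant with 10 participants per group becomes highly significant with 100, which is exactly the sample-size dependence the "what if" analysis is meant to expose.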
Dotto, G L; Pinto, L A A; Hachicha, M A; Knani, S
2015-03-15
In this work, a statistical physics treatment was employed to study the adsorption of food dyes onto chitosan films, in order to obtain new physicochemical interpretations at the molecular level. Experimental equilibrium curves were obtained for the adsorption of four dyes (FD&C red 2, FD&C yellow 5, FD&C blue 2, Acid Red 51) at different temperatures (298, 313 and 328 K). A statistical physics formula was used to interpret these curves, and parameters such as the number of adsorbed dye molecules per site (n), the anchorage number (n'), the receptor site density (NM), the adsorbed quantity at saturation (Nasat), the steric hindrance (τ), the concentration at half saturation (c1/2) and the molar adsorption energy (ΔEa) were estimated. The relation of these parameters to the chemical structure of the dyes and to temperature was evaluated and interpreted.
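For orientation only, the sketch below evaluates one model from the statistical physics family often used for such isotherms: a monolayer model with a single type of receptor site, Q(c) = n·NM / (1 + (c1/2 / c)^n), together with ΔEa = RT ln(cs / c1/2), where cs is the dye solubility. The abstract does not state which formula was used (the listed anchorage number n' and steric hindrance τ suggest a richer model), so both the model form and the solubility term are assumptions.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def monolayer_isotherm(c, n, n_m, c_half):
    """Assumed monolayer model: Q(c) = n*NM / (1 + (c_half/c)**n); saturates at n*NM."""
    return n * n_m / (1.0 + (c_half / c) ** n)


def molar_adsorption_energy(c_s, c_half, temperature):
    """Assumed relation dE_a = R*T*ln(c_s / c_half) in J/mol, with c_s the dye solubility."""
    return R * temperature * math.log(c_s / c_half)
```

In this form the adsorbed quantity at saturation is n·NM, and at c = c1/2 exactly half of that quantity is adsorbed, which is how c1/2 earns its name.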
Engine Data Interpretation System (EDIS), phase 2
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1991-01-01
A prototype of an expert system was developed which applies qualitative constraint-based reasoning to the task of post-test analysis of data resulting from a rocket engine firing. Data anomalies are detected and corresponding faults are diagnosed. Engine behavior is reconstructed using measured data and knowledge about engine behavior. Knowledge about common faults guides but does not restrict the search for the best explanation in terms of hypothesized faults. The system contains domain knowledge about the behavior of common rocket engine components and was configured for use with the Space Shuttle Main Engine (SSME). A graphical user interface allows an expert user to interact closely with the system during diagnosis. The system was applied to data taken during actual SSME tests where data anomalies were observed.
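A toy version of qualitative constraint checking is sketched below: numeric changes are mapped to qualitative directions (+1, 0, -1), and a monotonic constraint between two sensor channels (M+ means they move together, M- means they move oppositely) is flagged wherever both channels change in contradicting directions. The channel pairing and the tolerance `eps` are hypothetical; the EDIS prototype's knowledge base and reasoning are far richer than this.

```python
def qual_direction(delta, eps):
    """Map a numeric change to a qualitative direction: +1, 0 or -1."""
    if delta > eps:
        return 1
    if delta < -eps:
        return -1
    return 0


def check_monotonic(a, b, sign=1, eps=0.0):
    """Time steps where series a and b both change but violate M+ (sign=1) or M- (sign=-1)."""
    violations = []
    for t in range(1, len(a)):
        da = qual_direction(a[t] - a[t - 1], eps)
        db = qual_direction(b[t] - b[t - 1], eps)
        if da != 0 and db != 0 and da != sign * db:
            violations.append(t)
    return violations
```

Steps where one channel does not move beyond `eps` are treated as indeterminate rather than as violations, a common way to keep sensor noise from triggering false anomalies.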
Statistical considerations when analyzing biomarker data.
Beam, Craig A
2015-11-01
Biomarkers have become, and will continue to become, increasingly important to clinical immunology research. Yet, biomarkers often present new problems and raise new statistical and study design issues to scientists working in clinical immunology. In this paper I discuss statistical considerations related to the important biomarker problems of: 1) The design and analysis of clinical studies which seek to determine whether changes from baseline in a biomarker are associated with changes in a metabolic outcome; 2) The conditions that are required for a biomarker to be considered a "surrogate"; 3) Considerations that arise when analyzing whether or not a predictive biomarker could act as a surrogate endpoint; 4) Biomarker timing relative to the clinical endpoint; 5) The problem of analyzing studies that measure many biomarkers from few subjects; and, 6) The use of statistical models when analyzing biomarker data arising from count data.
Identification of Abnormal Screening Mammogram Interpretation Using Medicare Claims Data
Hubbard, Rebecca A; Zhu, Weiwei; Balch, Steven; Onega, Tracy; Fenton, Joshua J
2015-01-01
Objective To develop and validate Medicare claims-based approaches for identifying abnormal screening mammography interpretation. Data Sources Mammography data and linked Medicare claims for 387,709 mammograms performed from 1999 to 2005 within the Breast Cancer Surveillance Consortium (BCSC). Study Design Split-sample validation of algorithms based on claims for breast imaging or biopsy following screening mammography. Data Extraction Methods Medicare claims and BCSC mammography data were pooled at a central Statistical Coordinating Center. Principal Findings Presence of claims for subsequent imaging or biopsy had sensitivity of 74.9 percent (95 percent confidence interval [CI], 74.1–75.6) and specificity of 99.4 percent (95 percent CI, 99.4–99.5). A classification and regression tree improved sensitivity to 82.5 percent (95 percent CI, 81.9–83.2) but decreased specificity (96.6 percent, 95 percent CI, 96.6–96.8). Conclusions Medicare claims may be a feasible data source for research or quality improvement efforts addressing high rates of abnormal screening mammography. PMID:24976519
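Sensitivity and specificity with confidence intervals, as reported above, follow from a 2×2 table of algorithm result versus reference standard. The sketch uses the Wald (normal-approximation) interval, which is an assumption since the abstract does not state the interval method, and any counts passed to it in an example are hypothetical rather than the study's data.

```python
import math


def proportion_ci(successes, total, z=1.96):
    """Point estimate and Wald (normal-approximation) 95% CI for a proportion."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half), min(1.0, p + half)


def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP), each with a Wald CI."""
    return {
        "sensitivity": proportion_ci(tp, tp + fn),
        "specificity": proportion_ci(tn, tn + fp),
    }
```

The Wald interval is adequate for large samples with proportions away from 0 and 1; for a specificity near 99 percent in a small sample, a Wilson or exact interval would be the safer choice.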
Telemetry Boards Interpret Rocket, Airplane Engine Data
NASA Technical Reports Server (NTRS)
2009-01-01
For all the data gathered by the space shuttle while in orbit, NASA engineers are just as concerned about the information it generates on the ground. From the moment the shuttle's wheels touch the runway to the break of its electrical umbilical cord at 0.4 seconds before its next launch, sensors feed streams of data about the status of the vehicle and its various systems to Kennedy Space Center's shuttle crews. Even while the shuttle orbiter is refitted in Kennedy's orbiter processing facility, engineers constantly monitor everything from power levels to the testing of the mechanical arm in the orbiter's payload bay. On the launch pad and up until liftoff, the Launch Control Center, attached to the large Vehicle Assembly Building, screens all of the shuttle's vital data. (Once the shuttle clears its launch tower, this responsibility shifts to Mission Control at Johnson Space Center, with Kennedy in a backup role.) Ground systems for satellite launches also generate significant amounts of data. At Cape Canaveral Air Force Station, across the Banana River from Kennedy's location on Merritt Island, Florida, NASA rockets carrying precious satellite payloads into space flood the Launch Vehicle Data Center with sensor information on temperature, speed, trajectory, and vibration. The remote measurement and transmission of systems data, called telemetry, is essential to ensuring the safe and successful launch of the Agency's space missions. When a launch is unsuccessful, as it was for this year's Orbiting Carbon Observatory satellite, telemetry data also provides valuable clues as to what went wrong and how to remedy any problems for future attempts. All of this information is streamed from sensors in the form of binary code: strings of ones and zeros. One small company has partnered with NASA to provide technology that renders raw telemetry data intelligible not only for Agency engineers, but also for those in the private sector.
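Turning raw binary telemetry into engineering units generally means unpacking fixed-width words from a frame and applying per-channel calibrations. The frame layout, sync word and linear calibrations below are hypothetical, chosen only to illustrate the idea with Python's `struct` module; they do not describe any NASA or vendor format.

```python
import struct

# Hypothetical frame: sync word, frame counter, three 16-bit raw sensor counts (big-endian).
FRAME = struct.Struct(">HHHHH")
SYNC = 0xEB90

# Hypothetical linear calibrations: engineering value = gain * counts + offset.
CALIBRATION = {
    "temperature_C": (0.01, -40.0),
    "vibration_g": (0.001, 0.0),
    "bus_voltage_V": (0.0005, 0.0),
}


def decode_frame(raw: bytes) -> dict:
    """Unpack one frame and convert raw counts to engineering units."""
    sync, counter, *counts = FRAME.unpack(raw)
    if sync != SYNC:
        raise ValueError(f"bad sync word 0x{sync:04X}")
    values = {name: gain * c + offset
              for (name, (gain, offset)), c in zip(CALIBRATION.items(), counts)}
    values["frame"] = counter
    return values
```

Checking the sync word before decoding is what lets a ground system realign on frame boundaries after dropped bits, instead of silently converting misaligned bytes into plausible-looking numbers.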
Transforming Graph Data for Statistical Relational Learning
2012-10-01
Other metrics or strategies that could be used include Akaike’s information criterion (AIC) (Akaike, 1974), Mallows’ Cp (Mallows, 1973), Bayesian... Machine Learning and Knowledge Discovery in Databases, 5782, 47–62. ... Mallows, C. (1973
Microarray Data Analysis Using Multiple Statistical Models
Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2,Andrew I. Brooks3,4, ChiYi Chu3,Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...
Performing Inferential Statistics Prior to Data Collection
ERIC Educational Resources Information Center
Trafimow, David; MacDonald, Justin A.
2017-01-01
Typically, in education and psychology research, the investigator collects data and subsequently performs descriptive and inferential statistics. For example, a researcher might compute group means and use the null hypothesis significance testing procedure to draw conclusions about the populations from which the groups were drawn. We propose an…
Redman-MacLaren, Michelle; Mills, Jane; Tommbe, Rachael
2014-01-01
Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. The objective was to facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or 'chunks' of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. New understandings of the data were evoked when women in interpretive focus groups analysed the data 'chunks'. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Interpretive focus groups offer an opportunity to enhance the trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action.
Redman-MacLaren, Michelle; Mills, Jane; Tommbe, Rachael
2014-12-01
Background Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. Objective To facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. Design A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or 'chunks' of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. Results New understandings of the data were evoked when women in interpretive focus groups analysed the data 'chunks'. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Conclusions Interpretive focus groups offer an opportunity to enhance the trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action.
Component fragilities. Data collection, analysis and interpretation
Bandyopadhyay, K.K.; Hofmayer, C.H.
1985-01-01
As part of the component fragility research program sponsored by the US NRC, BNL is involved in establishing seismic fragility levels for various nuclear power plant equipment, with emphasis on electrical equipment. To date, BNL has reviewed approximately seventy test reports to collect fragility or high-level test data for switchgear, motor control centers and similar electrical cabinets, valve actuators, and numerous electrical and control devices (e.g., switches, transmitters, potentiometers, indicators, relays) of various manufacturers and models. BNL has also obtained test data from EPRI/ANCO. Analysis of the collected data reveals that fragility levels can best be described by a group of curves corresponding to various failure modes. The lower bound curve indicates the initiation of malfunctioning or structural damage, whereas the upper bound curve corresponds to overall failure of the equipment based on known failure modes occurring separately or interactively. For some components, the upper and lower bound fragility levels vary appreciably depending on the manufacturer and model. For some devices, testing even at the shake table vibration limit produces no failure. Failure of a relay is observed to be a frequent cause of failure of an electrical panel or a system. An extensive amount of additional fragility or high-level test data exists.
Conference summary: interpretations of asteroseismic data
NASA Astrophysics Data System (ADS)
Guzik, J.
2008-12-01
My goals in this summary are to give my personal impressions, highlight significant developments, and list some of the remaining puzzles and challenges in asteroseismology as presented at this conference. I do not review the future space- and ground-based observing programs discussed in the final morning session, but I was encouraged by the data these programs promise to return in the near term, which will keep asteroseismology an exciting research field for years to come.
Biau, David Jean; Kernéis, Solen; Porcher, Raphaël
2008-09-01
The increasing volume of research by the medical community often leads to increasing numbers of contradictory findings and conclusions. Although the differences observed may represent true differences, the results may also differ because of sampling variability, as all studies are performed on a limited number of specimens or patients. When planning a study reporting differences among groups of patients or describing some variable in a single group, sample size should be considered because it allows the researcher to control the risk of reporting a false-negative finding (Type II error) or to estimate the precision his or her experiment will yield. Equally important, readers of medical journals should understand sample size because such understanding is essential to interpret the relevance of a finding with regard to their own patients. At the time of planning, the investigator must establish (1) a justifiable level of statistical significance, (2) the chances of detecting a difference of given magnitude between the groups compared, i.e., the power, (3) this targeted difference (i.e., the effect size), and (4) the variability of the data (for quantitative data). We believe correct planning of experiments is an ethical issue of concern to the entire community.
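As a rough illustration of how the four planning quantities combine, the sketch below uses the common normal-approximation formula for comparing two means, n = 2(z_{1-α/2} + z_{power})² σ² / δ² per group, with α the significance level, δ the targeted difference and σ the standard deviation. The function name and the example values (α = 0.05, 80% power, a targeted difference of 5 units, a standard deviation of 10) are hypothetical; a calculation based on the t distribution gives a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(alpha: float, power: float, delta: float, sigma: float) -> int:
    """Approximate sample size per group for a two-sided comparison of two means.

    alpha: significance level, power: chance of detecting `delta`,
    delta: targeted difference (effect size), sigma: standard deviation of the data.
    Uses n = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)  # round up: a fraction of a patient is not possible


print(n_per_group(0.05, 0.80, delta=5.0, sigma=10.0))  # 63
```

Raising the power or shrinking the targeted difference increases n, which is the trade-off the investigator fixes at the planning stage.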
Cho, Kyung Hwa; Park, Yongeun; Kang, Joo-Hyon; Ki, Seo Jin; Cha, Sungmin; Lee, Seung Won; Kim, Joon Ha
2009-01-01
The Yeongsan (YS) Reservoir is an estuarine reservoir that provides surrounding areas with public goods, such as water supply for agricultural and industrial areas and flood control. Beneficial uses of the YS Reservoir, however, have recently been threatened by enriched non-point and point source inputs. A series of multivariate statistical approaches, including principal component analysis (PCA), were applied to extract significant characteristics from a large suite of water quality data (18 variables recorded monthly for 5 years), thereby providing important information for establishing effective water resource management plans for the YS Reservoir. The PCA results identified the five most important principal components (PCs), which together explain 71% of the total variance of the original data set. The five PCs were interpreted as hydro-meteorological effect, nitrogen loading, phosphorus loading, primary production of phytoplankton, and fecal indicator bacteria (FIB) loading. Furthermore, hydro-meteorological effect and nitrogen loading could be characterized by a yearly periodicity, whereas FIB loading showed an increasing trend over time. The results presented here may be useful for establishing preliminary strategies to abate water quality degradation in the YS Reservoir.
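A minimal sketch of the explained-variance step, assuming a data matrix with one row per monthly sample and one column per water-quality variable, standardized before the decomposition. The function names and the random example data are hypothetical; the study's own preprocessing and any rotation of the components are not described in the abstract.

```python
import numpy as np


def pca_explained_variance(X: np.ndarray) -> np.ndarray:
    """Fraction of total variance explained by each principal component.

    X has one row per sample (e.g. per month) and one column per variable.
    Columns are standardized first, because water-quality variables use different units.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized data are proportional to the
    # eigenvalues of the correlation matrix, i.e. the component variances.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def n_components_for(ratio: np.ndarray, target: float) -> int:
    """Smallest number of leading components whose cumulative share reaches `target`."""
    return int(np.searchsorted(np.cumsum(ratio), target) + 1)


# Hypothetical data: 60 monthly samples of 18 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 18))
ratio = pca_explained_variance(X)
print(ratio.round(3), n_components_for(ratio, 0.71))
```

With real, correlated water-quality variables the first few components carry most of the variance; with the independent random columns above the shares are spread almost evenly.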
Multivariate Statistical Mapping of Spectroscopic Imaging Data
Young, K.; Govind, V.; Sharma, K.; Studholme, C.; Maudsley, A.A; Schuff, N.
2010-01-01
For magnetic resonance spectroscopic imaging (MRSI) studies of the brain, it is important to measure the distribution of metabolites in a regionally unbiased way, that is, without restriction to a priori defined regions of interest (ROI). Since MRSI provides measures of multiple metabolites simultaneously at each voxel, there is also great interest in exploiting the multidimensional nature of MRSI for gains in statistical power. Voxelwise multivariate statistical mapping is expected to address both of these issues, but it has not previously been employed for SI studies of the brain. The aims of this study were to: 1) develop and validate multivariate voxel-based statistical mapping for MRSI and 2) demonstrate that multivariate tests can be more powerful than univariate tests in identifying patterns of altered brain metabolism. Specifically, we compared multivariate to univariate tests in identifying known regional patterns in simulated data and regional patterns of metabolite alterations due to amyotrophic lateral sclerosis, a devastating brain disease of the motor neurons. PMID:19953514
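The abstract does not name the multivariate test used. A standard choice for comparing vectors of metabolite measures between two groups at a single voxel is the two-sample Hotelling T² test, sketched below as an assumption rather than the study's method; in a voxelwise map it would be evaluated at every voxel, with the metabolites as the p variables. The F conversion assumes multivariate normality and a common covariance matrix, and the example data are hypothetical.

```python
import numpy as np


def hotelling_t2(a: np.ndarray, b: np.ndarray) -> tuple:
    """Two-sample Hotelling T^2 for a (n1 x p) and b (n2 x p).

    Returns (T2, F). Under the null hypothesis of equal mean vectors, and assuming
    multivariate normality with a common covariance matrix,
    F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2 follows F(p, n1 + n2 - p - 1).
    """
    n1, p = a.shape
    n2 = b.shape[0]
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled within-group covariance; atleast_2d keeps the p = 1 case a matrix.
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    s_pooled = ((n1 - 1) * cov_a + (n2 - 1) * cov_b) / (n1 + n2 - 2)
    t2 = n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(s_pooled, diff)
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return float(t2), float(f)


# Hypothetical voxel: 3 metabolite ratios in 15 patients and 14 controls.
rng = np.random.default_rng(1)
patients = rng.normal([1.0, 0.9, 1.1], 0.1, size=(15, 3))
controls = rng.normal([1.0, 1.0, 1.1], 0.1, size=(14, 3))
print(hotelling_t2(patients, controls))
```

With a single metabolite (p = 1), T² reduces to the square of the pooled two-sample t statistic, which is the univariate test the study compares against.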
Critical analysis of adsorption data statistically
NASA Astrophysics Data System (ADS)
Kaushal, Achla; Singh, S. K.
2016-09-01
Experimental data can be presented, computed, and critically analysed in different ways using statistics. A variety of statistical tests are used to make decisions about the significance and validity of experimental data. In the present study, adsorption was carried out to remove zinc ions from contaminated aqueous solution using mango leaf powder. The experimental data were analysed statistically by hypothesis testing, applying the t test, paired t test and chi-square test to (a) test the optimum value of the process pH, (b) verify the success of the experiment and (c) study the effect of adsorbent dose on zinc ion removal from aqueous solutions. Comparison of calculated and tabulated values of t and χ² showed results in favour of the data collected from the experiment, as shown on probability charts. The K value for the Langmuir isotherm was 0.8582 and the m value obtained for the Freundlich adsorption isotherm was 0.725; both are <1, indicating favourable isotherms. Karl Pearson's correlation coefficients for the Langmuir and Freundlich adsorption isotherms were 0.99 and 0.95, respectively, which show a high degree of correlation between the variables. This validates the data obtained for adsorption of zinc ions from the contaminated aqueous solution with the help of mango leaf powder.
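To make the paired t test concrete, the sketch below computes the statistic from matched measurements, for example zinc removal under two conditions on the same batch. The data and function name are hypothetical; the resulting t is compared with the tabulated critical value for n - 1 degrees of freedom (2.776 for 4 degrees of freedom at a two-sided 5% level).

```python
import math
from statistics import mean, stdev


def paired_t(x: list, y: list) -> tuple:
    """Paired t statistic and degrees of freedom for matched measurements x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]  # within-pair differences
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))  # mean difference / its standard error
    return t, n - 1


# Hypothetical removal percentages (%) for five paired runs.
with_treatment = [91.0, 88.5, 93.2, 90.1, 89.7]
without_treatment = [85.2, 84.9, 88.0, 86.3, 83.8]
t, df = paired_t(with_treatment, without_treatment)
T_CRIT_DF4 = 2.776  # two-sided, alpha = 0.05, df = 4
print(f"t = {t:.2f}, df = {df}, significant: {abs(t) > T_CRIT_DF4}")
```

Pairing removes batch-to-batch variation from the comparison, which is why the paired test is preferred when the same material is measured under both conditions.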
Animated transitions in statistical data graphics.
Heer, Jeffrey; Robertson, George
2007-01-01
In this paper we investigate the effectiveness of animated transitions between common statistical data graphics such as bar charts, pie charts, and scatter plots. We extend theoretical models of data graphics to include such transitions, introducing a taxonomy of transition types. We then propose design principles for creating effective transitions and illustrate the application of these principles in DynaVis, a visualization system featuring animated data graphics. Two controlled experiments were conducted to assess the efficacy of various transition types, finding that animated transitions can significantly improve graphical perception.
Interpretation of genomic data: questions and answers.
Simon, Richard
2008-07-01
Using a question and answer format we describe important aspects of using genomic technologies in cancer research. The main challenges are not managing the mass of data, but rather the design, analysis, and accurate reporting of studies that result in increased biological knowledge and medical utility. Many analysis issues address the use of expression microarrays but are also applicable to other whole genome assays. Microarray-based clinical investigations have generated both unrealistic hype and excessive skepticism. Genomic technologies are tremendously powerful and will play instrumental roles in elucidating the mechanisms of oncogenesis and in bringing on an era of predictive medicine in which treatments are tailored to individual tumors. Achieving these goals involves challenges in rethinking many paradigms for the conduct of basic and clinical cancer research and for the organization of interdisciplinary collaboration.
Simultaneous statistical inference for epigenetic data.
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify, we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications, we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.
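The sketch below is a simplified stand-in for the approach described, not the authors' multivariate multiple permutation test: it uses a difference in means rather than relative effects, computes per-locus permutation p-values from a shared set of group relabellings (so the dependence between loci is preserved), and then applies Holm's step-down procedure, which is a shortcut for a closed testing procedure built from Bonferroni tests and controls the family-wise error rate. Function names, the number of permutations and the seed are illustrative choices.

```python
import numpy as np


def permutation_pvalues(g1: np.ndarray, g2: np.ndarray, n_perm: int = 2000, seed: int = 0) -> np.ndarray:
    """Two-sided permutation p-values, one per locus, for a difference in means.

    g1 (n1 x k) and g2 (n2 x k) hold one column per locus. Each permutation
    relabels whole observations, so the dependence between loci is kept.
    """
    rng = np.random.default_rng(seed)
    data = np.vstack([g1, g2])
    n1 = g1.shape[0]
    observed = np.abs(g1.mean(axis=0) - g2.mean(axis=0))
    count = np.ones(observed.shape)  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        perm = data[rng.permutation(data.shape[0])]
        stat = np.abs(perm[:n1].mean(axis=0) - perm[n1:].mean(axis=0))
        count += stat >= observed
    return count / (n_perm + 1)


def holm(pvals: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Holm step-down procedure: boolean rejections controlling the family-wise error rate."""
    m = len(pvals)
    reject = np.zeros(m, dtype=bool)
    for rank, i in enumerate(np.argsort(pvals)):
        if pvals[i] > alpha / (m - rank):
            break  # stop at the first non-rejection
        reject[i] = True
    return reject


# Hypothetical methylation levels at 4 loci; only locus 0 differs between groups.
rng = np.random.default_rng(3)
cases = rng.normal(size=(15, 4))
cases[:, 0] += 3.0
controls = rng.normal(size=(15, 4))
p = permutation_pvalues(cases, controls, n_perm=999)
print(p, holm(p))
```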
BIG DATA AND STATISTICS: A STATISTICIAN'S PERSPECTIVE.
Rossell, David
2015-01-01
Big Data brings unprecedented power to address scientific, economic and societal issues, but also amplifies the possibility of certain pitfalls. These include using purely data-driven approaches that disregard understanding the phenomenon under study, aiming at a dynamically moving target, ignoring critical data collection issues, summarizing or preprocessing the data inadequately and mistaking noise for signal. We review some success stories and illustrate how statistical principles can help obtain more reliable information from data. We also touch upon current challenges that require active methodological research, such as strategies for efficient computation, integration of heterogeneous data, extending the underlying theory to increasingly complex questions and, perhaps most importantly, training a new generation of scientists to develop and deploy these strategies.
Data series embedding and scale invariant statistics.
Michieli, I; Medved, B; Ristov, S
2010-06-01
Data sequences acquired from bio-systems, such as human gait data, heart rate interbeat data, or DNA sequences, exhibit complex dynamics that are frequently described by a long-memory or power-law decay of the autocorrelation function. One way of characterizing these dynamics is through scale-invariant statistics or "fractal-like" behavior. Several methods have been proposed for quantifying scale-invariant parameters of physiological signals; the most common are detrended fluctuation analysis, sample mean variance analysis, power spectral density analysis, R/S analysis, and, more recently in the realm of the multifractal approach, wavelet analysis. In this paper, it is demonstrated that embedding the time series data in a high-dimensional pseudo-phase space reveals scale-invariant statistics in a simple fashion. The procedure is applied to different stride interval data sets from human gait measurement time series (PhysioBank data library). Results show that the introduced mapping adequately separates long-memory from random behavior. Smaller gait data sets were analyzed, and scale-free trends over limited scale intervals were successfully detected. The method was verified on artificially produced time series with known scaling behavior and varying noise content. The possibility of the method falsely detecting long-range dependence in artificially generated short-range dependence series was also investigated.
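The embedding step itself is simple: each point of the series is mapped to a vector of delayed copies. A minimal sketch follows, assuming an embedding dimension `dim` and lag `tau` chosen by the analyst; the abstract does not give the scale-invariant statistic computed from the embedded vectors, so only the embedding is shown, and the stride-interval values are hypothetical.

```python
import numpy as np


def delay_embed(x, dim: int, tau: int = 1) -> np.ndarray:
    """Embed a 1-D series into `dim`-dimensional pseudo-phase space with lag `tau`.

    Row i is (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).
    """
    x = np.asarray(x)
    n = len(x) - (dim - 1) * tau  # number of complete delay vectors
    if n <= 0:
        raise ValueError("series too short for this dim/tau")
    return np.column_stack([x[j * tau : j * tau + n] for j in range(dim)])


# Hypothetical stride intervals (s) embedded in 3 dimensions with lag 1.
strides = [1.10, 1.12, 1.08, 1.11, 1.09, 1.13, 1.10]
print(delay_embed(strides, dim=3))
```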
Statistical analysis of the lithospheric magnetic anomaly data
NASA Astrophysics Data System (ADS)
Pavon-Carrasco, Fco Javier; de Santis, Angelo; Ferraccioli, Fausto; Catalán, Manuel; Ishihara, Takemi
2013-04-01
Different analyses carried out on the lithospheric magnetic anomaly data from the GEODAS DVD v5.0.10 database (World Digital Magnetic Anomaly Map, WDMAM) show that the data distribution is not Gaussian but Laplacian. Although this behaviour has been pointed out in earlier works (e.g., Walker and Jackson, Geophys. J. Int., 143, 799-808, 2000), those works did not explain this statistical property of the magnetic anomalies. In this work, we perform different statistical tests to confirm that the lithospheric magnetic anomaly data indeed follow a Laplacian distribution, and we also give a possible interpretation of this behavior by providing a model of magnetization that depends on the variation of the geomagnetic field and on both induced and remanent magnetizations in the terrestrial lithosphere.
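One simple way to compare the two candidate distributions is to fit each by maximum likelihood and compare the maximized log-likelihoods; the abstract does not say which tests the authors used, so this is an illustrative assumption. For a Laplace distribution the maximum-likelihood location is the median and the scale is the mean absolute deviation about it. The anomaly values below are simulated, hypothetical data.

```python
import math
import random
from statistics import fmean, median


def gaussian_loglik(x: list) -> float:
    """Maximized Gaussian log-likelihood (MLE mean and variance)."""
    mu = fmean(x)
    var = fmean([(xi - mu) ** 2 for xi in x])  # MLE variance divides by n
    return -0.5 * len(x) * (math.log(2 * math.pi * var) + 1)


def laplace_loglik(x: list) -> float:
    """Maximized Laplace log-likelihood (MLE location = median, scale = mean |x - median|)."""
    m = median(x)
    b = fmean([abs(xi - m) for xi in x])
    return -len(x) * (math.log(2 * b) + 1)


# Hypothetical anomaly values (nT); both models have two parameters, so the
# larger log-likelihood also gives the smaller AIC.
rng = random.Random(0)
anomalies = [50 * (rng.expovariate(1.0) - rng.expovariate(1.0)) for _ in range(2000)]
print(laplace_loglik(anomalies) > gaussian_loglik(anomalies))
```

The difference of two independent exponential variables is Laplace distributed, which is how the simulated sample is generated.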
Statistical modeling of space shuttle environmental data
NASA Technical Reports Server (NTRS)
Tubbs, J. D.; Brewer, D. W.
1983-01-01
Statistical models that use a class of bivariate gamma distributions are examined. Topics discussed include: (1) the ratio of positively correlated gamma variates; (2) a method to determine whether unequal shape parameters are necessary in the bivariate gamma distribution; (3) differential equations for the modal location of a family of bivariate gamma distributions; and (4) analysis of some wind gust data using the analytical results developed for modeling applications.
Szabolcsi, Zoltán; Farkas, Zsuzsa; Borbély, Andrea; Bárány, Gusztáv; Varga, Dániel; Heinrich, Attila; Völgyi, Antónia; Pamjav, Horolma
2015-11-01
When the DNA profile from a crime scene matches that of a suspect, the weight of DNA evidence depends on the unbiased estimation of the match probability of the profiles. For this reason, it is necessary to establish and expand databases that reflect the actual allele frequencies in the relevant population. A total of 21,473 complete DNA profiles from Databank samples were used to establish an allele frequency database representing the population of Hungarian suspects. We used fifteen STR loci (PowerPlex ESI16), including five new ESS loci. The aim was to calculate statistical forensic efficiency parameters for the Databank samples and to compare the new data with the earlier report. Population substructure caused by relatedness may influence estimated profile frequencies. As our Databank profiles were considered non-random samples, possible relationships between the suspects can be assumed; therefore, the population inbreeding effect was estimated using the FIS calculation. The overall inbreeding parameter was found to be 0.0106. Furthermore, we tested the impact of the two allele frequency datasets on 101 randomly chosen STR profiles, including full and partial profiles. The 95% confidence interval estimates for the profile frequencies (pM) gave a tighter range with the new dataset than with the previously published ones. We found that FIS had less effect on frequency values in the 21,473 samples than the application of a minimum allele frequency. No genetic substructure was detected by STRUCTURE analysis. Owing to the low inbreeding effect and the large number of samples, the new dataset provides unbiased and precise estimates of LR for the statistical interpretation of forensic casework and allows us to use lower allele frequencies.
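The abstract reports an overall inbreeding parameter of 0.0106 and refers to a minimum allele frequency. A minimal sketch of how such quantities typically enter a match-probability calculation is below: the inbreeding-adjusted genotype formulas are the standard textbook ones, the 5/(2N) floor is an assumed convention (it is not stated in the abstract), the product rule assumes independent loci, and all allele frequencies are hypothetical. For a single-source match, LR = 1/pM.

```python
from math import prod
from typing import Optional


def genotype_frequency(p: float, q: Optional[float] = None, f: float = 0.0) -> float:
    """Single-locus genotype frequency with inbreeding coefficient f.

    Homozygote (q is None): p^2 + f * p * (1 - p)
    Heterozygote (alleles p, q): 2 * p * q * (1 - f)
    """
    if q is None:
        return p * p + f * p * (1 - p)
    return 2 * p * q * (1 - f)


def min_allele_frequency(freq: float, n_individuals: int) -> float:
    """Floor an allele frequency at 5 / (2N) (assumed convention, N = profiles in the database)."""
    return max(freq, 5 / (2 * n_individuals))


def profile_frequency(genotype_freqs: list) -> float:
    """Product rule across loci; assumes the loci are independent."""
    return prod(genotype_freqs)


# Hypothetical allele frequencies at three loci, using the reported FIS of 0.0106.
F = 0.0106
N = 21473
loci = [
    genotype_frequency(min_allele_frequency(0.12, N), min_allele_frequency(0.08, N), F),
    genotype_frequency(min_allele_frequency(0.21, N), f=F),  # homozygote
    genotype_frequency(min_allele_frequency(0.00003, N), min_allele_frequency(0.30, N), F),
]
pm = profile_frequency(loci)
print(f"pM = {pm:.3e}, LR = {1 / pm:.3e}")
```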
Thoth: Software for data visualization & statistics
NASA Astrophysics Data System (ADS)
Laher, R. R.
2016-10-01
Thoth is a standalone software application with a graphical user interface that makes it easy to query, display, visualize, and analyze tabular data stored in relational databases and data files. From imported data tables, it can create pie charts, bar charts, scatter plots, and many other kinds of data graphs with simple menus and mouse clicks (no programming required) by leveraging the open-source JFreeChart library. It also computes useful table-column data statistics. A mature tool that has undergone development and testing over several years, it is written in the Java programming language and hence can be run on any computing platform that has a Java Virtual Machine and graphical-display capability. It can be downloaded and used by anyone free of charge, and has general applicability in science, engineering, medical, business, and other fields. Special tools and features for common tasks in astronomy and astrophysical research are included in the software.
The seismic analyzer: interpreting and illustrating 2D seismic data.
Patel, Daniel; Giertsen, Christopher; Thurmond, John; Gjelberg, John; Gröller, M Eduard
2008-01-01
We present a toolbox for quickly interpreting and illustrating 2D slices of seismic volumetric reflection data. Searching for oil and gas involves creating a structural overview of seismic reflection data to identify hydrocarbon reservoirs. We improve the search for seismic structures by precalculating the horizon structures of the seismic data prior to interpretation. We improve the annotation of seismic structures by applying novel illustrative rendering algorithms tailored to seismic data, such as deformed texturing and line and texture transfer functions. The illustrative rendering results in multi-attribute and scale-invariant visualizations where features are represented clearly in both highly zoomed-in and zoomed-out views. Thumbnail views in combination with interactive appearance control allow a quick overview of the data before detailed interpretation takes place. These techniques help reduce the work of seismic illustrators and interpreters.
78 FR 10166 - Access Interpreting; Transfer of Data
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-13
AGENCY: Environmental Protection Agency (EPA). ACTION: Notice. SUMMARY: This notice announces that pesticide related information submitted to EPA's Office of...
Interpreting New Data from the High Energy Frontier
Thaler, Jesse
2016-09-26
This is the final technical report for DOE grant DE-SC0006389, "Interpreting New Data from the High Energy Frontier", describing research accomplishments by the PI in the field of theoretical high energy physics.
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2014-10-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, however, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1) P-hacking, which is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want; 2) overemphasis on P values rather than on the actual size of the observed effect; 3) overuse of statistical hypothesis testing, and being seduced by the word "significant"; and 4) over-reliance on standard errors, which are often misunderstood.
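The first mistake listed, reanalyzing with additional replicates until the result is significant, can be illustrated with a small simulation. Both groups are drawn from the same distribution, so every significant result is a false positive. The sketch uses a z test with known unit variance to stay within the Python standard library; the starting size, maximum size, number of trials and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean


def z_test_p(a: list, b: list) -> float:
    """Two-sided p-value for a difference in means, assuming known unit variance."""
    z = (fmean(a) - fmean(b)) / math.sqrt(1 / len(a) + 1 / len(b))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(peek: bool, n_start: int = 5, n_max: int = 30,
                        trials: int = 2000, seed: int = 1) -> float:
    """Share of null experiments declared 'significant' at the 0.05 level.

    peek=False: test once at n_max per group.
    peek=True: test after every added replicate and stop as soon as p < 0.05.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Both groups share one distribution, so any significant result is false.
        a = [rng.gauss(0, 1) for _ in range(n_start)]
        b = [rng.gauss(0, 1) for _ in range(n_start)]
        while True:
            if (peek or len(a) == n_max) and z_test_p(a, b) < 0.05:
                hits += 1
                break
            if len(a) == n_max:
                break
            a.append(rng.gauss(0, 1))
            b.append(rng.gauss(0, 1))
    return hits / trials


print(false_positive_rate(peek=False), false_positive_rate(peek=True))
```

Testing once keeps the false-positive rate near the nominal 5%; testing after every added replicate inflates it well above that, even though no single test is computed incorrectly.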
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2015-02-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: (1) P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. (2) Overemphasis on P values rather than on the actual size of the observed effect. (3) Overuse of statistical hypothesis testing, and being seduced by the word "significant". (4) Overreliance on standard errors, which are often misunderstood.
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2014-11-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1. P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. 2. Overemphasis on P values rather than on the actual size of the observed effect. 3. Overuse of statistical hypothesis testing, and being seduced by the word "significant". 4. Overreliance on standard errors, which are often misunderstood.
Securizing data linkage in french public statistics.
Guesdon, Maxence; Benzenine, Eric; Gadouche, Kamel; Quantin, Catherine
2016-10-06
Administrative records in France, especially medical and social records, have huge potential for statistical studies. The NIR (a national identifier) is widely used in medico-social administrations, which would theoretically provide considerable scope for data matching, provided that the legislation on such matters is respected. The law, however, forbids the processing of non-anonymized medical data, making it difficult to carry out studies that require several sources of social and medical data. We would like to benefit from computer techniques introduced since the 1970s that provide safe linkage of anonymized files, in order to relax the current constraints of such procedures. We propose an organization and a data workflow, based on hashing and cryptographic techniques, that strongly compartmentalize identifying and non-identifying data. The proposed method offers strong control over who is in possession of which information, using different hashing keys for each linkage. This prevents unauthorized linkage of data and protects anonymity by preventing the accumulation of non-identifying data, which can become identifying when linked. Our proposal would make it possible to conduct such studies more easily, more regularly and more precisely while preserving a sufficiently high level of anonymity. In our opinion, the main obstacle to setting up such a system is not technical but organizational, in that it depends on the existence of a Key-Management Authority.
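The abstract describes the design only as based on hashing and cryptographic techniques, with a different hashing key for each linkage. One common way to realize keyed hashing is HMAC, sketched below with Python's standard `hmac` module; the normalization rule, the key values and the identifier are hypothetical. Because each linkage uses its own key, pseudonyms produced for one linkage do not match those produced for another, so files cannot be joined across linkages without access to the keys.

```python
import hashlib
import hmac


def pseudonym(identifier: str, linkage_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a normalized identifier for one linkage."""
    normalized = identifier.replace(" ", "").upper()  # hypothetical normalization rule
    return hmac.new(linkage_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical keys, one per authorized linkage, held by the key-management authority.
key_study_a = b"secret-key-for-study-A"
key_study_b = b"secret-key-for-study-B"
nir = "1 23 45 67 890 123"  # made-up identifier
print(pseudonym(nir, key_study_a) == pseudonym("1234567890123", key_study_a))  # True
print(pseudonym(nir, key_study_a) == pseudonym(nir, key_study_b))  # False
```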
A Novel Statistical Analysis and Interpretation of Flow Cytometry Data
2013-07-05
Center for Research in Scientific Computation and Center for Quantitative Sciences in Biomedicine, North Carolina State University, Raleigh, NC 27695-8212 ... dependent compartmental model for computing cell numbers in CFSE-based lymphocyte proliferation assays, Math. Biosci. Eng. 9 (2012), pp. 699–736. CRSC-TR12 ... USA; ICREA Infection Biology Laboratory, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain (Received
A Novel Statistical Analysis and Interpretation of Flow Cytometry Data
2013-03-31
Scientific Computation and Center for Quantitative Sciences in Biomedicine, North Carolina State University, Raleigh, NC 27695-8212 Cristina Peligero ... Jordi Argilaguet, and Andreas Meyerhans, ICREA Infection Biology Lab, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 ... the fast computational approaches as described in [27]. It is also shown how the new model can be compared with older label-structured models such as
The Statistical Literacy Needed to Interpret School Assessment Data
ERIC Educational Resources Information Center
Chick, Helen; Pierce, Robyn
2013-01-01
State-wide and national testing in areas such as literacy and numeracy produces reports containing graphs and tables illustrating school and individual performance. These are intended to inform teachers, principals, and education organisations about student and school outcomes, to guide change and improvement. Given the complexity of the…
Some statistical issues in modelling pharmacokinetic data.
Lindsey, J K; Jones, B; Jarvis, P
A fundamental assumption underlying pharmacokinetic compartment modelling is that each subject has a different individual curve. To some extent this runs counter to the statistical principle that similar individuals will have similar curves, which makes inferences to a wider population possible. In population pharmacokinetics, the compromise is to use random effects. We recommend that such models also be used in data-rich situations instead of independently fitting individual curves. However, the additional information available in such studies shows that random effects are often not sufficient; generally, an autoregressive process is also required. This has the added advantage that it provides a means of tracking each individual, yielding predictions for the next observation. The compartment model curve being fitted may also be distorted in other ways. A widely held assumption is that most, if not all, pharmacokinetic concentration data follow a log-normal distribution. By means of examples, we show that this is not generally true, with the gamma distribution often being more suitable. When extreme individuals are present, a heavy-tailed distribution, such as the log Cauchy, can often provide more robust results. Finally, other assumptions that can distort the results include a direct dependence of the variance, or other dispersion parameter, on the mean, and setting non-detectable values to some arbitrarily small value instead of treating them as censored. By pointing out these problems with standard methods of statistical modelling of pharmacokinetic data, we hope that commercial software will soon make more flexible and suitable models available.
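The recommendation to treat non-detectable values as censored can be sketched with a log-likelihood in which each non-detect contributes the probability of falling below the limit of quantification, rather than an arbitrary small value. The log-normal model and the function name are assumptions made for brevity; the abstract notes that a gamma or heavy-tailed distribution is often more suitable, which would change only the density and distribution-function terms. The concentrations below are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_censored_loglik(values: list, loq: float, mu: float, sigma: float) -> float:
    """Log-likelihood of concentrations under a log-normal model with left-censoring.

    `values` holds measured concentrations, with None for non-detects (< loq).
    Detected values contribute the log-normal log-density; non-detects contribute
    log P(C < loq) instead of being set to an arbitrary small number.
    """
    dist = NormalDist(mu, sigma)  # distribution of log-concentration
    ll = 0.0
    for v in values:
        if v is None:
            ll += math.log(dist.cdf(math.log(loq)))
        else:
            # log-normal density = normal density of log(v) times the Jacobian 1/v
            ll += math.log(dist.pdf(math.log(v))) - math.log(v)
    return ll


# Hypothetical concentrations (mg/L) with two values below a LOQ of 0.5 mg/L.
conc = [2.3, 1.7, 0.9, None, 3.1, None, 1.2]
for mu in (0.0, 0.3):
    print(mu, lognormal_censored_loglik(conc, loq=0.5, mu=mu, sigma=0.8))
```

Maximizing this function over `mu` and `sigma` (or over a pharmacokinetic model that predicts `mu` per sampling time) uses the non-detects without inventing values for them.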
ASTER data processing using statistical learning algorithm
NASA Astrophysics Data System (ADS)
Kumar, Anil; Dadhwal, V. K.; Ghosh, S. K.
2006-12-01
In this work, fuzzy-set-theory-based and statistical learning algorithms have been studied for sub-pixel classification. Two fuzzy-set-theory-based classifiers, Fuzzy c-Means (FCM) and Possibilistic c-Means (PCM), have been used in supervised mode. Support Vector Machines (SVMs) have been used for density estimation as a statistical-learning-based sub-pixel classifier, with the Mean Field (MF) method used for learning. An in-house package, SMIC (Sub-Pixel Multi-Spectral Image Classifier), was used, and the sensitivity of all three algorithms (FCM, PCM and SVMs) to dimensionality was checked on data sets of 3 to 14 bands from ASTER data. The accuracy of the sub-pixel classification outputs was evaluated using the Fuzzy Error Matrix (FERM). In contrast to FCM and PCM, the SVM approach showed a clear increase in accuracy with higher-dimensional data and clearly outperformed the other two approaches for sub-pixel classification.
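For readers unfamiliar with FCM, the sketch below shows the standard fuzzy c-means updates with fuzziness exponent m. It runs unsupervised on hypothetical two-band pixel values; in the supervised mode used in the study, class centers would instead be estimated from training pixels, and the membership values would be read as sub-pixel class proportions. It is not the SMIC implementation.

```python
import numpy as np


def fuzzy_c_means(X: np.ndarray, c: int, m: float = 2.0, iters: int = 100, seed: int = 0):
    """Plain fuzzy c-means. Returns (centers of shape (c, bands), memberships U of shape (n, c))."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each pixel's memberships sum to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted class means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero for a pixel on a center
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))
        U = 1.0 / ratio.sum(axis=2)
    return centers, U


# Hypothetical two-band reflectances for 40 pixels from two cover types.
rng = np.random.default_rng(1)
pixels = np.vstack([rng.normal(0.2, 0.02, size=(20, 2)), rng.normal(0.6, 0.02, size=(20, 2))])
centers, U = fuzzy_c_means(pixels, c=2)
print(centers.round(3), U[:3].round(3))
```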
Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization
NASA Astrophysics Data System (ADS)
Eroglu, Sertac
2014-10-01
The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed the statistical mechanical Menzerath-Altmann model. The derived model allows the model parameters to be interpreted in terms of physical concepts. We also propose that many organizations exhibiting Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined with the transformed distribution model through a properly defined structure-dependent parameter and the energy-associated states.
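The abstract does not give the transformed model's formula. For orientation, the sketch below fits the commonly used Menzerath-Altmann form y = a·x^b·exp(-c·x) (x: construct size, y: mean constituent size) by ordinary least squares on ln y, where the model is linear in ln a, b and c. This is the classical form, not the statistical mechanical model derived in the paper, and fitting on the log scale assumes multiplicative errors; the example values are hypothetical.

```python
import numpy as np


def fit_menzerath_altmann(x, y) -> tuple:
    """Fit y = a * x**b * exp(-c * x) by linear least squares on log(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # log(y) = log(a) + b * log(x) - c * x is linear in (log a, b, c).
    A = np.column_stack([np.ones_like(x), np.log(x), -x])
    (ln_a, b, c), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    return float(np.exp(ln_a)), float(b), float(c)


# Hypothetical data: mean syllable length (y) for words of 1..8 syllables (x).
sizes = [1, 2, 3, 4, 5, 6, 7, 8]
mean_lengths = [3.1, 2.6, 2.4, 2.3, 2.2, 2.15, 2.1, 2.1]
print(fit_menzerath_altmann(sizes, mean_lengths))
```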
Novel Data Reduction Based on Statistical Similarity
Lee, Dongeun; Sim, Alex; Choi, Jaesik; Wu, Kesheng
2016-07-18
Applications such as scientific simulations and power grid monitoring are generating so much data so quickly that compression is essential to reduce storage requirements or transmission capacity. To achieve better compression, one is often willing to discard some repeated information. These lossy compression methods are primarily designed to minimize the Euclidean distance between the original data and the compressed data, but this distance measure severely limits either reconstruction quality or compression performance. In this paper, we propose a new class of compression methods by redefining the distance measure with a statistical concept known as exchangeability. This approach reduces the storage requirement while capturing essential features. We report the design and implementation of such a compression method, named IDEALEM. To demonstrate its effectiveness, we apply it to a set of power grid monitoring data and show that it can reduce the volume of data much more than the best known compression method while maintaining the quality of the compressed data. Finally, in these tests, IDEALEM captures extraordinary events in the data, while its compression ratios can far exceed 100.
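The idea of replacing a block by a previously stored block that is statistically similar, rather than close in Euclidean distance, can be sketched as follows. The sketch assumes a two-sample Kolmogorov-Smirnov distance as the similarity check and a fixed threshold; both are illustrative assumptions, and the block size, threshold and function names are hypothetical rather than IDEALEM's actual interface. Because a KS distance compares value distributions, two blocks containing the same values in a different order count as identical, which is the sense in which the data are treated as exchangeable.

```python
import bisect


def ks_statistic(a: list, b: list) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:  # the maximum gap occurs at one of the sample points
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        d = max(d, abs(fa - fb))
    return d


def compress(blocks: list, threshold: float) -> tuple:
    """Store a block only if no stored block is within `threshold` KS distance.

    Returns (stored_blocks, refs), where refs[i] is the index of the stored
    block that represents blocks[i].
    """
    stored, refs = [], []
    for blk in blocks:
        for j, s in enumerate(stored):
            if ks_statistic(blk, s) <= threshold:
                refs.append(j)  # reuse a statistically similar block
                break
        else:
            stored.append(blk)
            refs.append(len(stored) - 1)
    return stored, refs


# Hypothetical grid-frequency readings split into blocks of four.
readings = [[60.01, 59.99, 60.00, 60.02], [59.99, 60.02, 60.01, 60.00], [59.80, 59.75, 59.70, 59.78]]
stored, refs = compress(readings, threshold=0.5)
print(len(stored), refs)
```

Only the stored blocks and the list of references need to be kept; an unusual block, like the third one here, fails the similarity check and is stored in full.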
Statistical tools to analyze continuous glucose monitor data.
Clarke, William; Kovatchev, Boris
2009-06-01
Continuous glucose monitors (CGMs) generate data streams that are both complex and voluminous. The analyses of these data require an understanding of the physical, biochemical, and mathematical properties involved in this technology. This article describes several methods that are pertinent to the analysis of CGM data, taking into account the specifics of the continuous monitoring data streams. These methods include: (1) Evaluation of the numerical and clinical accuracy of CGM. We distinguish two types of accuracy metrics, numerical and clinical, each having two subtypes measuring point and trend accuracy. The addition of trend accuracy, i.e., the ability of CGM to reflect the rate and direction of blood glucose (BG) change, is unique to CGM, as these new devices capture BG not only episodically but also as a process in time. (2) Statistical approaches for interpreting CGM data. We stress that the basic unit for most analyses is the glucose trace of an individual, i.e., a time-stamped series of glycemic data for each person. We discuss the use of risk assessment, as well as graphical representation of the data of a person via glucose and risk traces and Poincaré plots, and at a group level via Control Variability-Grid Analysis. In summary, a review of methods specific to the analysis of CGM data series is presented, together with some new techniques. These methods should facilitate the extraction of information from, and the interpretation of, complex and voluminous CGM time series.
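The risk-assessment step can be sketched with the BG risk transform associated with Kovatchev and colleagues, which symmetrizes the skewed glucose scale so that hypoglycemic and hyperglycemic excursions are weighted comparably. The abstract does not give a formula; the constants below (1.509, 1.084, 5.381, with BG in mg/dL) are an assumption based on that published transform, and the function names are illustrative. Applying `risk` reading by reading to a glucose trace produces a risk trace of the kind the text mentions.

```python
import math

def bg_scale(bg_mg_dl):
    """Symmetrized BG scale f(BG); approximately zero near 112.5 mg/dL."""
    return 1.509 * (math.log(bg_mg_dl) ** 1.084 - 5.381)

def risk(bg_mg_dl):
    """Risk r(BG) = 10 * f(BG)^2 for a single reading."""
    return 10.0 * bg_scale(bg_mg_dl) ** 2

def risk_indices(trace):
    """Return (LBGI, HBGI): mean low-side and high-side risk over a glucose trace."""
    low = [risk(bg) if bg_scale(bg) < 0 else 0.0 for bg in trace]
    high = [risk(bg) if bg_scale(bg) > 0 else 0.0 for bg in trace]
    return sum(low) / len(trace), sum(high) / len(trace)
```

Because readings below the neutral point contribute only to the low index and readings above it only to the high index, a trace can be summarized by two numbers that separate hypoglycemic from hyperglycemic risk.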
Regional interpretation of water-quality monitoring data
Smith, R.A.; Schwarz, G.E.; Alexander, R.B.
1997-01-01
We describe a method for using spatially referenced regressions of contaminant transport on watershed attributes (SPARROW) in regional water-quality assessment. The method is designed to reduce the problems of data interpretation caused by sparse sampling, network bias, and basin heterogeneity. The regression equation relates measured transport rates in streams to spatially referenced descriptors of pollution sources and land-surface and stream-channel characteristics. Regression models of total phosphorus (TP) and total nitrogen (TN) transport are constructed for a region defined as the nontidal conterminous United States. Observed TN and TP transport rates are derived from water-quality records for 414 stations in the National Stream Quality Accounting Network. Nutrient sources identified in the equations include point sources, applied fertilizer, livestock waste, nonagricultural land, and atmospheric deposition (TN only). Surface characteristics found to be significant predictors of land-water delivery include soil permeability, stream density, and temperature (TN only). Estimated instream decay coefficients for the two contaminants decrease monotonically with increasing stream size. TP transport is found to be significantly reduced by reservoir retention. Spatial referencing of basin attributes in relation to the stream channel network greatly increases their statistical significance and model accuracy. The method is used to estimate the proportion of watersheds in the conterminous United States (i.e., hydrologic cataloging units) with outflow TP concentrations less than the criterion of 0.1 mg/L, and to classify cataloging units according to local TN yield (kg/km2/yr).
Regional interpretation of water-quality monitoring data
NASA Astrophysics Data System (ADS)
Smith, Richard A.; Schwarz, Gregory E.; Alexander, Richard B.
1997-12-01
We describe a method for using spatially referenced regressions of contaminant transport on watershed attributes (SPARROW) in regional water-quality assessment. The method is designed to reduce the problems of data interpretation caused by sparse sampling, network bias, and basin heterogeneity. The regression equation relates measured transport rates in streams to spatially referenced descriptors of pollution sources and land-surface and stream-channel characteristics. Regression models of total phosphorus (TP) and total nitrogen (TN) transport are constructed for a region defined as the nontidal conterminous United States. Observed TN and TP transport rates are derived from water-quality records for 414 stations in the National Stream Quality Accounting Network. Nutrient sources identified in the equations include point sources, applied fertilizer, livestock waste, nonagricultural land, and atmospheric deposition (TN only). Surface characteristics found to be significant predictors of land-water delivery include soil permeability, stream density, and temperature (TN only). Estimated instream decay coefficients for the two contaminants decrease monotonically with increasing stream size. TP transport is found to be significantly reduced by reservoir retention. Spatial referencing of basin attributes in relation to the stream channel network greatly increases their statistical significance and model accuracy. The method is used to estimate the proportion of watersheds in the conterminous United States (i.e., hydrologic cataloging units) with outflow TP concentrations less than the criterion of 0.1 mg/L, and to classify cataloging units according to local TN yield (kg/km2/yr).
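The model structure described above, source terms scaled by land-to-water delivery and then attenuated by first-order instream decay, can be sketched for a single reach. This is a simplified illustration under stated assumptions, not the calibrated SPARROW model: the exponential delivery and decay forms are assumed, and every name, coefficient, and input value is hypothetical. The last function converts an annual load to a flow-weighted concentration so that a result can be compared with the 0.1 mg/L TP criterion mentioned in the text.

```python
import math

SECONDS_PER_YEAR = 365 * 86400

def reach_load(sources, betas, alpha, land_attr, k_decay, travel_time_days):
    """Load (kg/yr) reaching a station from one upstream reach (hypothetical form).

    sources: source magnitudes by name; betas: source coefficients by name;
    exp(-alpha * land_attr): land-to-water delivery factor;
    exp(-k_decay * travel_time_days): first-order instream decay.
    """
    generated = sum(betas[name] * amount for name, amount in sources.items())
    delivered = generated * math.exp(-alpha * land_attr)
    return delivered * math.exp(-k_decay * travel_time_days)

def concentration_mg_per_l(load_kg_per_yr, flow_m3_per_s):
    """Flow-weighted mean concentration: milligrams per year over litres per year."""
    litres_per_year = flow_m3_per_s * 1000.0 * SECONDS_PER_YEAR
    return load_kg_per_yr * 1e6 / litres_per_year
```

In an actual regression, the `betas`, `alpha`, and `k_decay` values would be estimated from station records rather than supplied by hand, and loads would be accumulated along the whole upstream network.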
Revisiting the statistical analysis of pyroclast density and porosity data
NASA Astrophysics Data System (ADS)
Bernard, B.; Kueppers, U.; Ortiz, H.
2015-03-01
Explosive volcanic eruptions are commonly characterized based on a thorough analysis of the generated deposits. Amongst other characteristics in physical volcanology, density and porosity of juvenile clasts are some of the most frequently used to constrain eruptive dynamics. In this study, we evaluate the sensitivity of density and porosity data and introduce a weighting parameter to correct issues raised by the use of frequency analysis. Results of textural investigations can be biased by clast selection. With the statistical tools presented here, the meaningfulness of a conclusion can easily be checked for any dataset. This is necessary to determine whether a sample meets the requirements for statistical relevance, i.e., whether a dataset is large enough to allow for reproducible results. Graphical statistics, similar to those used for grain-size analysis, are used to describe density and porosity distributions. This approach helps with the interpretation of volcanic deposits. To illustrate this methodology, we chose two large datasets: (1) directed blast deposits of the 3640-3510 BC eruption of Chachimbiro volcano (Ecuador) and (2) block-and-ash-flow deposits of the 1990-1995 eruption of Unzen volcano (Japan). We propose adding this analysis to future investigations to check the objectivity of results achieved by different working groups and to guarantee the meaningfulness of the interpretation.
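Two calculations underlie this kind of analysis. Porosity is commonly derived from clast bulk density and dense-rock-equivalent (DRE) density, and grain-size practice summarizes a distribution with percentile-based graphical measures. The sketch below pairs the standard porosity relation with Folk-and-Ward-style graphic mean and inclusive graphic standard deviation. The abstract does not state which graphical measures the authors use or how their weighting parameter works, so this choice is an assumption, as is the linear-interpolation percentile; the function names are illustrative.

```python
def porosity_percent(bulk_density, dre_density):
    """Porosity (%) from bulk clast density and dense-rock-equivalent density."""
    return (1.0 - bulk_density / dre_density) * 100.0

def percentile(values, p):
    """Percentile p (0-100) by linear interpolation between sorted values."""
    xs = sorted(values)
    rank = p / 100.0 * (len(xs) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (rank - lo) * (xs[hi] - xs[lo])

def graphical_stats(values):
    """Folk-and-Ward-style graphic mean and inclusive graphic standard deviation."""
    p5, p16, p50, p84, p95 = (percentile(values, p) for p in (5, 16, 50, 84, 95))
    mean = (p16 + p50 + p84) / 3.0
    sorting = (p84 - p16) / 4.0 + (p95 - p5) / 6.6
    return mean, sorting
```

Percentile-based measures like these are insensitive to a few extreme clasts, which is one reason graphical statistics are attractive when sample sizes are modest.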
Statistical methods and computing for big data
Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing
2016-01-01
Big data are data on a massive scale, in terms of volume, intensity, and complexity, that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by fellow statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and its performance is assessed in a simulation study with stream data. Software packages are summarized with a focus on open-source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay. PMID:27695593
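The online-updating idea can be shown in its simplest exact form. For ordinary least squares, each incoming block only needs to contribute its sufficient statistics, and the estimate after any number of blocks equals the full-data fit without storing past observations. This is a minimal sketch for simple linear regression with a hypothetical class name, not the estimators for generalized linear models or the variable-selection extension reviewed in the article.

```python
class OnlineSimpleOLS:
    """Accumulates sufficient statistics block by block for y = a + b * x."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        # Only running sums are kept; the raw block can be discarded afterwards.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def estimate(self):
        """Return (intercept, slope) from the accumulated sums."""
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope
```

Raw sums are the simplest choice to read; with very large or poorly centered values, a numerically stabler update (for example, running means and centered cross-products) would be preferable.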
Quantitative interpretation of airborne gravity gradiometry data for mineral exploration
NASA Astrophysics Data System (ADS)
Martinez, Cericia D.
In the past two decades, commercialization of previously classified instrumentation has provided the ability to rapidly collect quality gravity gradient measurements for resource exploration. In the near future, next-generation instrumentation is expected to further advance the acquisition of higher-quality data not subject to pre-processing regulations. However, the ability to process and interpret gravity gradiometry data has not kept pace with innovations in data acquisition systems. The purpose of the research presented in this thesis is to contribute to the understanding, development, and application of processing and interpretation techniques available for airborne gravity gradiometry in resource exploration. In particular, this research focuses on the utility of 3D inversion of gravity gradiometry for interpretation purposes. Towards this goal, I investigate the requisite components for an integrated interpretation workflow. In addition to practical 3D inversions, components of the workflow include estimation of density for terrain correction, equivalent-source processing of multi-component data for denoising, quantification of noise level, and component conversion. The objective is to produce high-quality density distributions for subsequent geological interpretation. I then investigate the use of the inverted density model in orebody imaging, lithology differentiation, and resource evaluation. The systematic and sequential approach highlighted in the thesis addresses some of the challenges facing the use of gravity gradiometry as an exploration tool, while elucidating a procedure for incorporating gravity gradient interpretations into the lifecycle of not only resource exploration but also resource modeling.
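Inversion workflows of this kind rest on a forward model that predicts gradient components from a density distribution. As a minimal building block, and an assumption rather than the thesis's own code, the sketch below computes the full gravity gradient tensor of a point mass (equivalently, a uniform sphere observed from outside) in a right-handed frame with z up. Sign and axis conventions, particularly whether z points up or down, differ between instruments and software. Values are returned in Eötvös (1 Eö = 1e-9 s^-2); the function name is hypothetical.

```python
G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
EOTVOS = 1e-9   # 1 Eotvos in s^-2

def point_mass_gradient(mass_kg, source, station):
    """3x3 gravity gradient tensor (Eotvos) of a point mass at `source`, seen at `station`.

    Uses T_ij = G*m*(3*d_i*d_j - r^2*delta_ij) / r^5 with d = station - source,
    i.e. second derivatives of the potential G*m/r (sign conventions vary).
    """
    d = [s - q for s, q in zip(station, source)]
    r2 = sum(c * c for c in d)
    r5 = r2 ** 2.5
    return [[G * mass_kg * (3 * d[i] * d[j] - (r2 if i == j else 0.0)) / r5 / EOTVOS
             for j in range(3)] for i in range(3)]
```

Summing such contributions over the cells of a discretized density model gives the linear forward operator that a 3D inversion inverts; the zero trace of each tensor (Laplace's equation outside the mass) is a convenient check on computed or converted components.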
Weatherization Assistance Program - Background Data and Statistics
Eisenberg, Joel Fred
2010-03-01
This technical memorandum is intended to provide readers with information that may be useful in understanding the purposes, performance, and outcomes of the Department of Energy's (DOE's) Weatherization Assistance Program (Weatherization). Weatherization has been in operation for over thirty years and is the nation's largest single residential energy efficiency program. Its primary purpose, established by law, is 'to increase the energy efficiency of dwellings owned or occupied by low-income persons, reduce their total residential energy expenditures, and improve their health and safety, especially low-income persons who are particularly vulnerable such as the elderly, the handicapped, and children.' The American Recovery and Reinvestment Act (ARRA, PL 111-5), passed and signed into law in February 2009, committed $5 billion over two years to an expanded Weatherization Assistance Program. This has created substantial interest in the program, the population it serves, the energy and cost savings it produces, and its cost-effectiveness. This memorandum is intended to address the need for this kind of information. Statistically valid answers to many of the questions surrounding Weatherization and its performance require comprehensive evaluation of the program. DOE is undertaking precisely this kind of independent evaluation in order to ascertain program effectiveness and to improve its performance. Results of this evaluation effort will begin to emerge in late 2010 and 2011, but they require substantial time and effort. In the meantime, the data and statistics in this memorandum can provide reasonable and transparent estimates of key program characteristics. The memorandum is laid out in three sections. The first deals with some key characteristics describing low-income energy consumption and expenditures. The second section provides estimates of energy savings and energy bill reductions that the program can reasonably be presumed to be producing. The third section
Normative Data for Interpreting the BREAST-Q: Augmentation.
Mundy, Lily R; Homa, Karen; Klassen, Anne F; Pusic, Andrea L; Kerrigan, Carolyn L
2017-04-01
The BREAST-Q is a rigorously developed, well-validated, patient-reported outcome instrument with a module designed for evaluating breast augmentation outcomes. However, there are no published normative BREAST-Q scores, limiting interpretation. Normative data were generated for the BREAST-Q Augmentation module by means of the Army of Women, an online community of women (with and without breast cancer) engaged in breast cancer-related research. Members were recruited by means of e-mail; women aged 18 years or older without a history of breast cancer or breast surgery were invited to participate. Descriptive statistics and a linear multivariate regression were performed. A separate analysis compared normative scores to findings from previously published BREAST-Q augmentation studies. The preoperative BREAST-Q Augmentation module was completed by 1211 women. Mean age was 54 ± 24 years, the mean body mass index was 27 ± 6 kg/m², and 39 percent (n = 467) had a bra cup size of D or greater. Mean scores were as follows: Satisfaction with Breasts, 54 ± 19; Psychosocial Well-being, 66 ± 20; Sexual Well-being, 49 ± 20; and Physical Well-being, 86 ± 15. Women with a body mass index of 30 kg/m² or greater and bra cup size of D or greater had lower scores. In comparison with Army of Women scores, published BREAST-Q augmentation scores were lower before and higher after surgery for all scales except Physical Well-being. The Army of Women normative data represent breast-related satisfaction and well-being in women not actively seeking breast augmentation. These data may serve as normative comparison values for patients seeking and undergoing surgery; used in this way, as in the present study, they demonstrate the value of breast augmentation in this patient population.
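One simple way to use the normative values is to express an individual score as a standardized distance from the Army of Women mean. The sketch assumes that the ± values reported above are standard deviations and uses the four reported scale means; the dictionary and function names are hypothetical, and a z-score is only one possible form of normative comparison.

```python
# Normative means and (assumed) standard deviations reported in the abstract.
NORMS = {
    "Satisfaction with Breasts": (54.0, 19.0),
    "Psychosocial Well-being": (66.0, 20.0),
    "Sexual Well-being": (49.0, 20.0),
    "Physical Well-being": (86.0, 15.0),
}

def normative_z(scale, score):
    """Standardized difference between a patient's score and the normative mean."""
    mean, sd = NORMS[scale]
    return (score - mean) / sd
```

For example, a Satisfaction with Breasts score of 73 lies one normative standard deviation above the mean, while a Physical Well-being score of 71 lies one standard deviation below it.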
Statistical atlas based extrapolation of CT data
NASA Astrophysics Data System (ADS)
Chintalapani, Gouthami; Murphy, Ryan; Armiger, Robert S.; Lepisto, Jyri; Otake, Yoshito; Sugano, Nobuhiko; Taylor, Russell H.; Armand, Mehran
2010-02-01
We present a framework to estimate the missing anatomical details from a partial CT scan with the help of statistical shape models. The motivating application is periacetabular osteotomy (PAO), a technique for treating developmental hip dysplasia, an abnormal condition of the hip socket that, if untreated, may lead to osteoarthritis. The common goals of PAO are to reduce pain and joint subluxation and to improve contact pressure distribution by increasing the coverage of the femoral head by the hip socket. Current diagnosis and planning are based on radiological measurements, but because of significant structural variations in dysplastic hips, computer-assisted geometrical and biomechanical planning based on CT data is desirable to help the surgeon achieve optimal joint realignments. Most of the patients undergoing PAO are young females; hence it is usually desirable to minimize the radiation dose by scanning only the joint portion of the hip anatomy. These partial scans, however, do not provide enough information for biomechanical analysis because the iliac region is missing. A statistical shape model of full pelvis anatomy is constructed from a database of CT scans. The partial volume is first aligned with the statistical atlas using an iterative affine registration, followed by a deformable registration step, and the missing information is then inferred from the atlas. The atlas inferences are further enhanced by the use of X-ray images of the patient, which are very common in an osteotomy procedure. The proposed method is validated with a leave-one-out analysis. Osteotomy cuts are simulated and the effect of atlas-predicted models on the actual procedure is evaluated.
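The inference step can be illustrated with a one-mode statistical shape model: learn the mean shape and principal mode of variation from complete training shapes, fit the mode weight by least squares to the observed coordinates only, and read off the missing coordinates. This is a simplified sketch that assumes point correspondences are already established (it omits the affine and deformable registration steps) and uses power iteration for the leading principal component; the function names are hypothetical.

```python
import math

def leading_mode(shapes, iters=200):
    """Mean shape and unit leading principal direction (power iteration)."""
    n, dim = len(shapes), len(shapes[0])
    mean = [sum(s[k] for s in shapes) / n for k in range(dim)]
    devs = [[s[k] - mean[k] for k in range(dim)] for s in shapes]
    w = [float(k + 1) for k in range(dim)]  # arbitrary non-degenerate start
    for _ in range(iters):
        # Product with the (unnormalized) covariance: sum over deviations d of d * (d . w).
        cw = [0.0] * dim
        for d in devs:
            proj = sum(a * b for a, b in zip(d, w))
            for k in range(dim):
                cw[k] += d[k] * proj
        norm = math.sqrt(sum(c * c for c in cw))
        w = [c / norm for c in cw]
    return mean, w

def extrapolate(partial, observed_idx, mean, mode):
    """Fit the mode weight to the observed entries, then predict the full shape."""
    num = sum((partial[i] - mean[i]) * mode[i] for i in observed_idx)
    den = sum(mode[i] ** 2 for i in observed_idx)
    b = num / den
    return [m + b * v for m, v in zip(mean, mode)]
```

A real atlas would keep several modes and regularize the fitted weights, since a partial scan constrains only some directions of shape variation.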
Statistical tests on clustered global earthquake synthetic data sets
NASA Astrophysics Data System (ADS)
Daub, Eric G.; Trugman, Daniel T.; Johnson, Paul A.
2015-08-01
We study the ability of statistical tests to identify nonrandom features of earthquake catalogs, with a focus on the global earthquake record since 1900. We construct four types of synthetic data sets containing varying strengths of clustering, with each data set containing on average 10,000 events over 100 years with magnitudes above M = 6. We apply a suite of statistical tests to each synthetic realization in order to evaluate the ability of each test to identify the sequences of events as nonrandom. Our results show that detection ability depends on the quantity of data, the type of clustering, and the specific signal used in the statistical test. Data sets that exhibit a stronger variation in the seismicity rate are generally easier to identify as nonrandom for a given background rate. We also show that we can address this problem in a Bayesian framework, with the clustered data sets as prior distributions. Using this new Bayesian approach, we can place quantitative bounds on the range of possible clustering strengths that are consistent with the global earthquake data. At M = 7, we can estimate 99th percentile confidence bounds on the number of triggered events, with an upper bound of 20% of the catalog for global aftershock sequences and a stronger upper bound of 10% on the fraction of triggered events for long-term event clusters. At M = 8, the bounds are less strict due to the reduced number of events. However, our analysis shows that other types of clustering could be present in the data that we are unable to detect. Our results aid the interpretation of statistical tests on earthquake catalogs, both worldwide and regional.
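A basic test of this kind asks whether event counts per time bin are more variable than a Poisson (random) process would produce. The sketch below computes an index-of-dispersion statistic and a Monte Carlo p-value by scattering the same number of events uniformly at random over the bins, in the spirit of comparing against synthetic catalogs. It is an illustrative assumption, not one of the specific tests used in the study, and the function names are hypothetical.

```python
import random

def dispersion(counts):
    """Index of dispersion: sum of squared deviations divided by the mean count."""
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / mean

def clustering_p_value(counts, n_sims=200, seed=0):
    """Monte Carlo p-value for over-dispersion relative to uniformly random timing."""
    rng = random.Random(seed)
    observed = dispersion(counts)
    n_bins, n_events = len(counts), sum(counts)
    exceed = 0
    for _ in range(n_sims):
        # Null model: each event falls in a bin chosen uniformly at random.
        sim = [0] * n_bins
        for b in rng.choices(range(n_bins), k=n_events):
            sim[b] += 1
        if dispersion(sim) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sims + 1)
```

With yearly bins, an aftershock-rich catalog inflates the variance of the counts relative to the mean; a p-value near zero then flags nonrandom timing, although, as the study notes, weaker clustering can escape such a test.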
Encoding Dissimilarity Data for Statistical Model Building
Wahba, Grace
2010-01-01
We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A “newbie” algorithm is provided for embedding new objects into this space. This allows the dissimilarity information to be incorporated into a Smoothing Spline ANOVA penalized likelihood model, a Support Vector Machine, or any model that will admit Reproducing Kernel Hilbert Space components, for nonparametric regression, supervised learning, or semi-supervised learning. Future work and open questions are discussed. The papers are: F. Lu, S. Keles, S. Wright and G. Wahba 2005. A framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences 102, 12332–1233. G. Corrada Bravo, G. Wahba, K. Lee, B. Klein, R. Klein and S. Iyengar 2009. Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences 106, 8128–8133. F. Lu, Y. Lin and G. Wahba. Robust manifold unfolding with kernel regularization. TR 1008, Department of Statistics, University of Wisconsin-Madison. PMID:20814436
Encoding Dissimilarity Data for Statistical Model Building.
Wahba, Grace
2010-12-01
We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding new objects into this space. This allows the dissimilarity information to be incorporated into a Smoothing Spline ANOVA penalized likelihood model, a Support Vector Machine, or any model that will admit Reproducing Kernel Hilbert Space components, for nonparametric regression, supervised learning, or semi-supervised learning. Future work and open questions are discussed. The papers are: F. Lu, S. Keles, S. Wright and G. Wahba 2005. A framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences 102, 12332-1233. G. Corrada Bravo, G. Wahba, K. Lee, B. Klein, R. Klein and S. Iyengar 2009. Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences 106, 8128-8133. F. Lu, Y. Lin and G. Wahba. Robust manifold unfolding with kernel regularization. TR 1008, Department of Statistics, University of Wisconsin-Madison.
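The embedding step in the reviewed papers uses convex cone optimization. As a simpler stand-in that shows the same goal, turning pairwise dissimilarities into coordinates, the sketch below implements classical multidimensional scaling in one dimension: double-center the squared dissimilarities to obtain a Gram matrix, then take its leading eigenvector by power iteration. This substitution is deliberate: it assumes a complete, noise-free distance matrix whose leading eigenvalue is positive and largest in magnitude, so it does not handle the noisy, incomplete data the papers target. The function name is hypothetical.

```python
import math

def classical_mds_1d(dist, iters=500):
    """One-dimensional classical MDS coordinates from a full symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, the Gram matrix of the centered points.
    B = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    # B annihilates the all-ones vector, so start power iteration elsewhere.
    w = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        bw = [sum(B[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in bw))
        w = [x / norm for x in bw]
    eigval = sum(w[i] * sum(B[i][j] * w[j] for j in range(n)) for i in range(n))
    return [math.sqrt(max(eigval, 0.0)) * x for x in w]
```

The coordinates are recovered only up to translation and sign, which is why the natural check is on pairwise distances rather than raw positions.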
Fordyce, James A.
2010-01-01
Background Phylogenetic hypotheses are increasingly being used to elucidate historical patterns of diversification-rate variation. Hypothesis testing is often conducted by comparing the observed vector of branching times to a null, pure-birth expectation. A popular method for inferring a decrease in speciation rate, which might suggest an early burst of diversification followed by a decrease in diversification rate, is the γ statistic. Methodology Using simulations under varying conditions, I examine the sensitivity of γ to the distribution of the most recent branching times. Using tree deviation, an exploratory data analysis tool for lineages-through-time plots, I identified trees with a significant γ statistic that do not appear to have the characteristic early accumulation of lineages consistent with an early, rapid rate of cladogenesis. I further investigated the sensitivity of the γ statistic to recent diversification by examining the consequences of failing to simulate the full time interval following the most recent cladogenic event. The power of γ to detect rate decrease at varying times was assessed for simulated trees with an initial high rate of diversification followed by a relatively low rate. Conclusions The γ statistic is extraordinarily sensitive to recent diversification rates and does not necessarily detect early bursts of diversification. This was true for trees of various sizes and completeness of taxon sampling. The γ statistic had greater power to detect recent diversification rate decreases compared to early bursts of diversification. Caution should be exercised when interpreting the γ statistic as an indication of early, rapid diversification. PMID:20668707
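For reference, the γ statistic in its usual (Pybus and Harvey) formulation can be computed from the internode intervals of an ultrametric tree. The sketch assumes the intervals are supplied root to tips as g_2, ..., g_n, where g_k is the time during which k lineages exist; the function name is illustrative. Negative values indicate internal nodes concentrated near the root, the pattern usually read as an early burst, which is exactly the reading the study cautions against taking at face value.

```python
import math

def gamma_statistic(intervals):
    """Gamma statistic from internode intervals [g_2, ..., g_n] ordered root to tips."""
    n = len(intervals) + 1  # number of tips
    if n < 3:
        raise ValueError("gamma requires at least three tips")
    weighted = [k * g for k, g in zip(range(2, n + 1), intervals)]  # k * g_k
    total = sum(weighted)  # T = sum_{j=2}^{n} j * g_j
    # Mean over i = 2..n-1 of the cumulative sums sum_{k=2}^{i} k * g_k.
    cumulative, running = 0.0, 0.0
    for w in weighted[:-1]:
        running += w
        cumulative += running
    numerator = cumulative / (n - 2) - total / 2.0
    return numerator / (total * math.sqrt(1.0 / (12.0 * (n - 2))))
```

Because the final interval g_n enters T with the largest weight, lengthening the most recent interval alone pushes γ negative, which illustrates why the statistic is sensitive to recent branching times.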
Interpreting Survey Data to Inform Solid-Waste Education Programs
ERIC Educational Resources Information Center
McKeown, Rosalyn
2006-01-01
Few examples exist of how to use survey data to inform public environmental education programs. I suggest a process for interpreting statewide survey data using four questions that give insight into local context, potential target audiences, and community priorities. The four questions are: What…
Customizable tool for ecological data entry, assessment, monitoring, and interpretation
USDA-ARS's Scientific Manuscript database
The Database for Inventory, Monitoring and Assessment (DIMA) is a highly customizable tool for data entry, assessment, monitoring, and interpretation. DIMA is a Microsoft Access database that can easily be used without Access knowledge and is available at no cost. Data can be entered for common, nat...
Computer Simulation of Incomplete-Data Interpretation Exercise.
ERIC Educational Resources Information Center
Robertson, Douglas Frederick
1987-01-01
This paper describes a computer simulation used to help general education students enrolled in a large introductory geology course learn to interpret incomplete data. Students design a plan to collect bathymetric data for an area of the ocean. Procedures used by the students and instructor are included.…
NASA Technical Reports Server (NTRS)
Shewhart, Mark
1991-01-01
Statistical Process Control (SPC) charts are one of several tools used in quality control. Other tools include flow charts, histograms, cause and effect diagrams, check sheets, Pareto diagrams, graphs, and scatter diagrams. A control chart is simply a graph which indicates process variation over time. The purpose of drawing a control chart is to detect any changes in the process signalled by abnormal points or patterns on the graph. The Artificial Intelligence Support Center (AISC) of the Acquisition Logistics Division has developed a hybrid machine learning expert system prototype which automates the process of constructing and interpreting control charts.
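The basic logic of a control chart can be sketched in a few lines: estimate a center line and ±3σ limits from an in-control baseline, flag points outside the limits, and flag patterns such as a long run on one side of the center line. The run length of 8 follows one common rule set; the choice of rules, the use of the baseline sample standard deviation for σ, and the function names are assumptions, not features of the AISC prototype.

```python
import statistics

def control_limits(baseline):
    """Center line and 3-sigma limits estimated from in-control baseline data."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - 3 * sigma, center, center + 3 * sigma

def flag_points(values, baseline, run_length=8):
    """Indices of points beyond the limits or ending a same-side run of `run_length`."""
    lcl, center, ucl = control_limits(baseline)
    flagged, run, side = [], 0, 0
    for i, x in enumerate(values):
        current = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if current != 0 and current == side:
            run += 1
        else:
            run = 1 if current != 0 else 0
        side = current
        if x > ucl or x < lcl or run >= run_length:
            flagged.append(i)
    return flagged
```

An automated interpreter of the kind the abstract describes would layer further pattern rules (trends, cycles, stratification) on top of these two basic checks.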
Statistical Resolution of Ambiguous HLA Typing Data
Listgarten, Jennifer; Brumme, Zabrina; Kadie, Carl; Xiaojiang, Gao; Walker, Bruce; Carrington, Mary; Goulder, Philip; Heckerman, David
2008-01-01
High-resolution HLA typing plays a central role in many areas of immunology, such as in identifying immunogenetic risk factors for disease, in studying how the genomes of pathogens evolve in response to immune selection pressures, and also in vaccine design, where identification of HLA-restricted epitopes may be used to guide the selection of vaccine immunogens. Perhaps one of the most immediate applications is in direct medical decisions concerning the matching of stem cell transplant donors to unrelated recipients. However, high-resolution HLA typing is frequently unavailable due to its high cost or the inability to re-type historical data. In this paper, we introduce and evaluate a method for statistical, in silico refinement of ambiguous and/or low-resolution HLA data. Our method, which requires an independent, high-resolution training data set drawn from the same population as the data to be refined, uses linkage disequilibrium in HLA haplotypes as well as four-digit allele frequency data to probabilistically refine HLA typings. Central to our approach is the use of haplotype inference. We introduce new methodology to this area, improving upon the Expectation-Maximization (EM)-based approaches currently used within the HLA community. Our improvements are achieved by using a parsimonious parameterization for haplotype distributions and by smoothing the maximum likelihood (ML) solution. These improvements make it possible to scale the refinement to a larger number of alleles and loci in a more computationally efficient and stable manner. We also show how to augment our method in order to incorporate ethnicity information (as HLA allele distributions vary widely according to race/ethnicity as well as geographic area), and demonstrate the potential utility of this experimentally. A tool based on our approach is freely available for research purposes at http://microsoft.com/science. PMID:18392148
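For orientation, the kind of baseline the authors improve upon, EM estimation of haplotype frequencies from phase-unknown genotypes, can be sketched for two loci. This is the classical EM approach, not the authors' parsimonious, smoothed method; genotypes are assumed to be unordered allele pairs at each locus, and all names and toy data are hypothetical. Each individual's possible phasings are weighted by the product of current haplotype frequencies (E-step), and frequencies are re-estimated from the expected haplotype counts (M-step).

```python
from collections import defaultdict

def phasings(genotype):
    """Distinct unordered haplotype pairs consistent with a two-locus genotype."""
    (a1, a2), (b1, b2) = genotype
    options = {tuple(sorted([(a1, b1), (a2, b2)])),
               tuple(sorted([(a1, b2), (a2, b1)]))}
    return list(options)

def em_haplotypes(genotypes, iters=50):
    """Classical EM estimate of two-locus haplotype frequencies."""
    candidates = {h for g in genotypes for pair in phasings(g) for h in pair}
    freq = {h: 1.0 / len(candidates) for h in candidates}
    for _ in range(iters):
        counts = defaultdict(float)
        for g in genotypes:
            pairs = phasings(g)
            # E-step: posterior weight of each phasing given current frequencies.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes.
        freq = {h: counts[h] / (2 * len(genotypes)) for h in candidates}
    return freq
```

Only double heterozygotes have two phasings here, and both carry the same combinatorial factor of 2, so that factor cancels in the normalization. With many alleles and loci the number of candidate haplotypes explodes, which is the scaling problem the authors' parameterization and smoothing address.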
2010-01-01
Background The null hypothesis significance test (NHST) is the most frequently used statistical method, although its inferential validity has been widely criticized since its introduction. In 1988, the International Committee of Medical Journal Editors (ICMJE) warned against sole reliance on NHST to substantiate study conclusions and suggested supplementary use of confidence intervals (CI). Our objective was to evaluate the extent and quality of the use of NHST and CI in both English- and Spanish-language biomedical publications between 1995 and 2006, taking into account the ICMJE recommendations, with particular focus on the accuracy of the interpretation of statistical significance and the validity of conclusions. Methods Original articles published in three English and three Spanish biomedical journals in three fields (General Medicine, Clinical Specialties and Epidemiology - Public Health) were considered for this study. Papers published in 1995-1996, 2000-2001, and 2005-2006 were selected through a systematic sampling method. After excluding the purely descriptive and theoretical articles, analytic studies were evaluated for their use of NHST with P-values and/or CI for interpretation of statistical "significance" and "relevance" in study conclusions. Results Among 1,043 original papers, 874 were selected for detailed review. The exclusive use of P-values was less frequent in English-language publications as well as in Public Health journals; overall, such use decreased from 41% in 1995-1996 to 21% in 2005-2006. While the use of CI increased over time, the "significance fallacy" (equating statistical and substantive significance) appeared very often, mainly in journals devoted to clinical specialties (81%). In papers originally written in English and Spanish, 15% and 10%, respectively, mentioned statistical significance in their conclusions. Conclusions Overall, results of our review show some improvements in
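The "significance fallacy" can be made concrete with a two-sample z-test on a very large sample: a difference that is trivially small relative to the spread of the data still yields P < 0.05, and it is the confidence interval that shows how small the effect is. The numbers are hypothetical, and the normal approximation with a known common standard deviation is a simplifying assumption.

```python
import math

def z_test_with_ci(diff, sd, n_per_group, z_crit=1.959964):
    """Two-sided P value and 95% CI for a difference in means (known common SD)."""
    se = sd * math.sqrt(2.0 / n_per_group)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return p, (diff - z_crit * se, diff + z_crit * se)

# Hypothetical: a 0.1-unit difference when individual values vary with SD 10.
p, ci = z_test_with_ci(diff=0.1, sd=10.0, n_per_group=100_000)
```

Here P is about 0.025, yet the whole interval lies below 0.2 units on a scale where individuals differ by about 10 units; whether such a difference matters is a substantive question, not a statistical one.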
Issues in the statistical analysis of small area health data.
Wakefield, J; Elliott, P
The availability of geographically indexed health and population data, together with advances in computing, geographical information systems and statistical methodology, has opened the way for serious exploration of small area health statistics based on routine data. Such analyses may be used to address specific questions concerning health in relation to sources of pollution, to investigate clustering of disease or for hypothesis generation. We distinguish four types of analysis: disease mapping; geographic correlation studies; the assessment of risk in relation to a prespecified point or line source; and cluster detection and disease clustering. A general framework for the statistical analysis of small area studies is considered. This framework assumes that populations at risk arise from inhomogeneous Poisson processes. Disease cases are then realizations of a thinned Poisson process in which the risk of disease depends on the characteristics of the person, time and spatial location. Difficulties of analysis and interpretation due to data inaccuracies and aggregation are addressed with particular reference to ecological bias and confounding. The use of errors-in-variables modelling in small area analyses is discussed.
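Disease mapping within the Poisson framework described above typically starts from a standardized incidence ratio (SIR): expected cases come from applying reference rates to each area's population strata (indirect standardization), and observed cases are compared with that expectation. The sketch below computes an SIR and a one-sided Poisson P value for excess risk; the strata, rates, and function names are hypothetical, and it ignores the ecological bias and data inaccuracies the text warns about.

```python
import math

def expected_cases(populations, reference_rates):
    """Expected count: sum over strata of population times reference rate."""
    return sum(p * r for p, r in zip(populations, reference_rates))

def sir_and_p(observed, expected):
    """SIR = O/E and P(X >= O) for X ~ Poisson(E)."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(observed))
    return observed / expected, 1.0 - below
```

For example, an area with strata of 1000 and 2000 people and reference rates of 0.001 and 0.002 expects 5 cases; observing 10 gives an SIR of 2.0 with a one-sided P value of about 0.03. With small expected counts, SIRs are unstable, which is why smoothed (for example, Bayesian) disease maps are common.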
Statistical analysis of SSME system data
NASA Technical Reports Server (NTRS)
Temple, Enoch C.; Shipman, Jerry R.
1988-01-01
A statistical methodology to enhance the Space Shuttle Main Engine (SSME) performance prediction accuracy is proposed. This methodology was to be used in conjunction with existing SSME performance prediction computer codes to improve parameter prediction accuracy and to quantify that accuracy. However, after a review of related literature, the researchers concluded that the proposed problem required coverage of areas such as linear and nonlinear system theory, measurement theory, statistics, and stochastic estimation. Since state space theory is the foundation for a more complete study of each of the aforementioned areas, the researchers chose to refocus on the more specialized topic of state vector estimation procedures. State vector estimation was also selected because of current and future NASA concerns about SSME performance evaluation; i.e., there is current interest in an improved evaluation procedure for actual SSME post-flight performance as well as for post-static-test performance of a single SSME. A current investigation of analytical methods may also help improve test stand failure detection. This paper considers the issue of post-flight/test state variable reconstruction through the application of observations made on the output of the Space Shuttle propulsion system. Rogers used the Kalman filtering procedure to reconstruct the state variables of the Space Shuttle propulsion system. One objective of this paper is to give the general setup of the Kalman filter and its connection to linear regression. A second objective is to examine the reconstruction methodology for application to the reconstruction of the state vector of a single SSME by using static test firing data.
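The connection between the Kalman filter and linear regression can be seen in the simplest case: a scalar, constant state observed with noise. With no process noise and a very diffuse prior, the filter's recursive update reproduces the least-squares estimate of a constant, the sample mean, and its variance shrinks as r/n. The sketch below is a generic scalar filter under those stated assumptions, not the SSME reconstruction model; parameter names follow common textbook notation.

```python
def scalar_kalman(measurements, x0=0.0, p0=1e12, f=1.0, h=1.0, q=0.0, r=1.0):
    """Scalar Kalman filter; returns the final state estimate and its variance."""
    x, p = x0, p0
    for z in measurements:
        # Predict: propagate the state and its variance through x_k = f * x_{k-1} + w.
        x = f * x
        p = f * p * f + q
        # Update: blend the prediction with measurement z = h * x + v.
        s = h * p * h + r          # innovation variance
        k = p * h / s              # Kalman gain
        x = x + k * (z - h * x)
        p = p * r / s              # equals (1 - k*h)*p, without cancellation for huge p
    return x, p
```

Setting `q` above zero lets the filter track a drifting state instead of averaging all past data equally, which is where it departs from ordinary regression.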
Accessing seismic data through geological interpretation: Challenges and solutions
NASA Astrophysics Data System (ADS)
Butler, R. W.; Clayton, S.; McCaffrey, B.
2008-12-01
Between them, the world's research programs, national institutions and corporations, especially oil and gas companies, have acquired substantial volumes of seismic reflection data. Although the vast majority are proprietary and confidential, significant data are released and available for research, including those in public data libraries. The challenge now is to maximise use of these data by providing routes to seismic data not simply on the basis of acquisition or processing attributes but via the geology they image. The Virtual Seismic Atlas (VSA: www.seismicatlas.org) meets this challenge by providing an independent, free-to-use, community-based internet resource that captures and shares the geological interpretation of seismic data globally. Images and associated documents are explicitly indexed by extensive metadata trees, using not only existing survey and geographical data but also the geology they portray. The solution uses a Documentum database interrogated through Endeca Guided Navigation to search, discover and retrieve images. The VSA allows users to compare contrasting interpretations of clean data, thereby exploring the ranges of uncertainty in the geometric interpretation of subsurface structure. The metadata structures can be used to link reports and published research together with other data types such as wells, and the VSA can also link to existing data libraries. Searches can take different paths, revealing arrays of geological analogues and new datasets while providing entirely novel insights and genuine surprises. This can then drive new creative opportunities for research and training, and expose the contents of seismic data libraries to the world.
2-D Versus 3-D Magnetotelluric Data Interpretation
NASA Astrophysics Data System (ADS)
Ledo, Juanjo
2005-09-01
In recent years, the number of publications dealing with the mathematical and physical 3-D aspects of the magnetotelluric method has increased drastically. However, field experiments on a grid are often impractical, and surveys are frequently restricted to single or widely separated profiles. In many cases, therefore, we face the following question: is the 2-D hypothesis valid for extracting geoelectric and geological information from real 3-D environments? The aim of this paper is to explore a few instructive but general situations to understand the basics of a 2-D interpretation of 3-D magnetotelluric data and to determine which data subset (TE-mode or TM-mode) is best for obtaining the electrical conductivity distribution of the subsurface using 2-D techniques. A review of the mathematical and physical fundamentals of the electromagnetic fields generated by a simple 3-D structure allows us to prioritise the choice of modes in a 2-D interpretation of responses influenced by 3-D structures. This analysis is corroborated by numerical results from synthetic models and by real data acquired by other authors. One important result of this analysis is that the mode least affected by 3-D effects depends on the position of the 3-D structure with respect to the regional 2-D strike direction. When the 3-D body is normal to the regional strike, the TE-mode is affected mainly by galvanic effects, while the TM-mode is affected by galvanic and inductive effects. In this case, a 2-D interpretation of the TM-mode is prone to error. When the 3-D body is parallel to the regional 2-D strike, the TE-mode is affected by galvanic and inductive effects and the TM-mode is affected mainly by galvanic effects, making the TM-mode more suitable for 2-D interpretation. In general, a careful 2-D interpretation of 3-D magnetotelluric data can be a guide to a reasonable geological interpretation.
Design, analysis, and interpretation of field quality-control data for water-sampling projects
Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.
2015-01-01
The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.
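As one illustration of a confidence interval on a percentile, the sketch below uses the standard distribution-free order-statistic method; the report's own procedures are not reproduced here. It assumes independent, continuously distributed measurements, and `quantile_ci` is a hypothetical helper name. The interval [X(l), X(u)] covers the p-th population quantile with probability P(l <= B <= u - 1), where B ~ Binomial(n, p).

```python
from math import comb

def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def quantile_ci(data, p=0.5, conf=0.95):
    """Distribution-free CI for the p-th population quantile from order statistics.

    Returns (lower, upper, actual_coverage); coverage is >= conf because the
    binomial distribution is discrete.
    """
    x = sorted(data)
    n = len(x)
    half_alpha = (1 - conf) / 2
    # Lower rank l (1-based): largest l with P(B <= l - 1) <= alpha/2.
    l = 0
    while l + 1 <= n and binom_cdf(l, n, p) <= half_alpha:
        l += 1
    # Upper rank u (1-based): smallest u with P(B >= u) <= alpha/2.
    u = n + 1
    while u - 1 >= 1 and 1 - binom_cdf(u - 2, n, p) <= half_alpha:
        u -= 1
    if l < 1 or u > n or l >= u:
        raise ValueError("sample too small for the requested confidence level")
    coverage = binom_cdf(u - 1, n, p) - binom_cdf(l - 1, n, p)
    return x[l - 1], x[u - 1], coverage

lo, hi, cov = quantile_ci(list(range(1, 21)))  # median CI for 20 ordered values
print(lo, hi, round(cov, 4))
```

For n = 20 and the median, the interval runs from the 6th to the 15th ordered value with roughly 95.9% coverage; quality-control comparisons can then check whether environmental values fall inside or outside such bounds.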
Statistical mapping of count survey data
Royle, J. Andrew; Link, W.A.; Sauer, J.R.; Scott, J. Michael; Heglund, Patricia J.; Morrison, Michael L.; Haufler, Jonathan B.; Wall, William A.
2002-01-01
We apply a Poisson mixed model to the problem of mapping (or predicting) bird relative abundance from counts collected from the North American Breeding Bird Survey (BBS). The model expresses the logarithm of the Poisson mean as a sum of a fixed term (which may depend on habitat variables) and a random effect which accounts for remaining unexplained variation. The random effect is assumed to be spatially correlated, thus providing a more general model than the traditional Poisson regression approach. Consequently, the model is capable of improved prediction when data are autocorrelated. Moreover, formulation of the mapping problem in terms of a statistical model facilitates a wide variety of inference problems which are cumbersome or even impossible using standard methods of mapping. For example, assessment of prediction uncertainty, including the formal comparison of predictions at different locations, or through time, using the model-based prediction variance is straightforward under the Poisson model (not so with many nominally model-free methods). Also, ecologists may generally be interested in quantifying the response of a species to particular habitat covariates or other landscape attributes. Proper accounting for the uncertainty in these estimated effects is crucially dependent on specification of a meaningful statistical model. Finally, the model may be used to aid in sampling design, by modifying the existing sampling plan in a manner which minimizes some variance-based criterion. Model fitting under this model is carried out using a simulation technique known as Markov Chain Monte Carlo. Application of the model is illustrated using Mourning Dove (Zenaida macroura) counts from Pennsylvania BBS routes. We produce both a model-based map depicting relative abundance, and the corresponding map of prediction uncertainty. We briefly address the issue of spatial sampling design under this model. Finally, we close with some discussion of mapping in relation to
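The model structure (log of the Poisson mean equal to a fixed term plus a spatially correlated random effect) can be made concrete by simulating from it. This is a minimal sketch, not the authors' MCMC fitting procedure: it assumes an exponential spatial covariance, hypothetical coefficients, a synthetic 5 x 5 grid of routes and a made-up habitat covariate, and it only generates counts.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_counts(sites, habitat, beta0, beta1, sigma2, range_, rng):
    """log(mu_i) = beta0 + beta1 * habitat_i + b_i, with b ~ N(0, Sigma) and
    Sigma_ij = sigma2 * exp(-dist_ij / range_) (exponential covariance)."""
    n = len(sites)
    cov = [[sigma2 * math.exp(-math.dist(sites[i], sites[j]) / range_)
            for j in range(n)] for i in range(n)]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]  # b = L z
    means = [math.exp(beta0 + beta1 * h + bi) for h, bi in zip(habitat, b)]
    return [poisson(m, rng) for m in means], b

rng = random.Random(7)
sites = [(i / 4, j / 4) for i in range(5) for j in range(5)]  # hypothetical route locations
habitat = [x for x, _ in sites]                                # hypothetical covariate
counts, effects = simulate_counts(sites, habitat, beta0=1.0, beta1=0.8,
                                  sigma2=0.5, range_=0.3, rng=rng)
print(counts)
```

Because nearby random effects are correlated, neighbouring routes tend to share high or low counts beyond what the habitat term explains, which is the feature that lets the model borrow strength spatially for prediction.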
Geologic interpretation of HCMM and aircraft thermal data
NASA Technical Reports Server (NTRS)
1982-01-01
Progress on the Heat Capacity Mapping Mission (HCMM) follow-on study is reported. Numerous image products for geologic interpretation of both HCMM and aircraft thermal data were produced. These include, among others, various combinations of the thermal data with LANDSAT and SEASAT data. The combined data sets were displayed using simple color composites, principal component color composites, black and white images, and hue-saturation-intensity color composites. Algorithms for incorporating both atmospheric and elevation data simultaneously into the digital processing for creation of quantitatively correct thermal inertia images are in the final development stage. A field trip to Death Valley was undertaken to field check the aircraft and HCMM data.
A study aid for seismic data interpretation and analysis
NASA Astrophysics Data System (ADS)
Seok, R.; Lee, Y.; Lee, B.; Lee, G.
2011-12-01
We present the workflow for 3-D seismic data interpretation and analysis that is routinely performed throughout the exploration phase in the industry. The workflow is used as a study aid for first-year graduate students in the Department of Energy Resources Engineering at Pukyong National University, Busan, Korea. The data used in this work consist of 3-D seismic and well-log data from the Sooner field, Colorado, USA and 2-D and 3-D seismic data from the Penobscot surveys carried out on the Scotian shelf, Canada. The Sooner field data are part of the tutorial data sets of Kingdom Suite®, which was used for data interpretation and mapping. The Penobscot data are available from the OpendTect® website. OpendTect® was used for seismic attribute generation and Hampson-Russell® was used for amplitude variation with offset (AVO) analysis and inversion. The workflow includes: (1) structural interpretation and mapping and 3-D visualization; (2) time-depth conversion; (3) (sequence) stratigraphic analysis; (4) attribute analysis and 3-D visualization; (5) quantitative analysis (e.g., AVO, inversion); and (6) volumetric calculations.
Interpretation of Student Data: Contextual Variables and Cultural Implications.
ERIC Educational Resources Information Center
Papalewis, Rosemary
This paper explores the common elements identified as context variables that may affect student evaluation of instruction, and presents literature from sociology, anthropology, and linguistics that offers renewed challenges to researchers in this area of data interpretation. The common context variables that are seen as affecting student…
Building software tools to help contextualize and interpret monitoring data
USDA-ARS's Scientific Manuscript database
Even modest monitoring efforts at landscape scales produce large volumes of data. These are most useful if they can be interpreted relative to land potential or other similar sites. However, for many ecological systems reference conditions may not be defined or are poorly described, which hinders und...
Soil VisNIR chemometric performance statistics should be interpreted as random variables
NASA Astrophysics Data System (ADS)
Brown, David J.; Gasch, Caley K.; Poggio, Matteo; Morgan, Cristine L. S.
2015-04-01
Chemometric models are normally evaluated using performance statistics such as the Standard Error of Prediction (SEP) or the Root Mean Squared Error of Prediction (RMSEP). These statistics are used to evaluate the quality of chemometric models relative to other published work on a specific soil property or to compare the results from different processing and modeling techniques (e.g. Partial Least Squares Regression or PLSR and random forest algorithms). Claims are commonly made about the overall success of an application or the relative performance of different modeling approaches assuming that these performance statistics are fixed population parameters. While most researchers would acknowledge that small differences in performance statistics are not important, rarely are performance statistics treated as random variables. Given that we are usually comparing modeling approaches for general application, and given that the intent of VisNIR soil spectroscopy is to apply chemometric calibrations to larger populations than are included in our soil-spectral datasets, it is more appropriate to think of performance statistics as random variables with variation introduced through the selection of samples for inclusion in a given study and through the division of samples into calibration and validation sets (including spiking approaches). Here we look at the variation in VisNIR performance statistics for the following soil-spectra datasets: (1) a diverse US Soil Survey soil-spectral library with 3768 samples from all 50 states and 36 different countries; (2) 389 surface and subsoil samples taken from US Geological Survey continental transects; (3) the Texas Soil Spectral Library (TSSL) with 3000 samples; (4) intact soil core scans of Texas soils with 700 samples; (5) approximately 400 in situ scans from the Pacific Northwest region; and (6) miscellaneous local datasets. We find the variation in performance statistics to be surprisingly large. This has important
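The argument that RMSEP is a random variable can be demonstrated by refitting the same model on repeated random calibration/validation splits. In this sketch a simple least-squares line on synthetic data stands in for PLSR or random forest on soil spectra (an assumption made only to keep the example self-contained); the point is the spread of RMSEP values, not the model.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def rmsep(model, xs, ys):
    """Root mean squared error of prediction on a validation set."""
    slope, intercept = model
    return math.sqrt(statistics.fmean((y - (slope * x + intercept)) ** 2
                                      for x, y in zip(xs, ys)))

def rmsep_distribution(xs, ys, n_splits=200, calib_frac=0.7, seed=0):
    """RMSEP from repeated random calibration/validation splits of one dataset."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    values = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(calib_frac * len(idx))
        cal, val = idx[:cut], idx[cut:]
        model = fit_line([xs[i] for i in cal], [ys[i] for i in cal])
        values.append(rmsep(model, [xs[i] for i in val], [ys[i] for i in val]))
    return values

rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(60)]         # synthetic predictor
ys = [2.0 * x + rng.gauss(0, 1.5) for x in xs]       # synthetic response
vals = rmsep_distribution(xs, ys)
print(f"RMSEP mean={statistics.fmean(vals):.2f} sd={statistics.stdev(vals):.2f} "
      f"range=[{min(vals):.2f}, {max(vals):.2f}]")
```

Reporting the mean and spread (or quantiles) of such a distribution, rather than a single RMSEP, is one way to act on the abstract's recommendation when comparing modelling approaches.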
Network Data: Statistical Theory and New Models
2016-02-17
research covered a wide range of topics in statistics including analysis and methods for spectral clustering for sparse and structured networks ... signals, bootstrapping, Lasso+OLS, confidence interval, concise comparative summarization, EM algorithm, spectral clustering, aerosol retrieval ...
Statistical Analysis of DWPF ARG-1 Data
Harris, S.P.
2001-03-02
A statistical analysis of analytical results for ARG-1, an Analytical Reference Glass, blanks, and the associated calibration and bench standards has been completed. These statistics provide a means for DWPF to review the performance of their laboratory as well as identify areas of improvement.
NASA Astrophysics Data System (ADS)
Bellac, Michel Le
2014-11-01
Although nobody can question the practical efficiency of quantum mechanics, there remains the serious question of its interpretation. As Valerio Scarani puts it, "We do not feel at ease with the indistinguishability principle (that is, the superposition principle) and some of its consequences." Indeed, this principle, which pervades the quantum world, is in stark contradiction with our everyday experience. From the very beginning of quantum mechanics, a number of physicists--but not the majority of them!--have asked the question of its "interpretation". One may simply deny that there is a problem: according to proponents of the minimalist interpretation, quantum mechanics is self-sufficient and needs no interpretation. The point of view held by a majority of physicists, that of the Copenhagen interpretation, will be examined in Section 10.1. The crux of the problem lies in the status of the state vector introduced in the preceding chapter to describe a quantum system, which is no more than a symbolic representation for the Copenhagen school of thought. Conversely, one may try to attribute some "external reality" to this state vector, that is, a correspondence between the mathematical description and the physical reality. In this latter case, it is the measurement problem which is brought to the fore. In 1932, von Neumann was the first to propose a global approach, in an attempt to build a purely quantum theory of measurement, examined in Section 10.2. This theory still underlies modern approaches, among them those grounded on decoherence theory, or on the macroscopic character of the measuring apparatus: see Section 10.3. Finally, there are non-standard interpretations such as Everett's many worlds theory or the hidden variables theory of de Broglie and Bohm (Section 10.4). Note, however, that this variety of interpretations has no bearing whatsoever on the practical use of quantum mechanics. There is no controversy on the way we should use quantum mechanics!
Shock Classification of Ordinary Chondrites: New Data and Interpretations
NASA Astrophysics Data System (ADS)
Stoffler, D.; Keil, K.; Scott, E. R. D.
1992-07-01
Introduction. The recently proposed classification system for shocked chondrites (1) is based on a microscopic survey of 76 non-Antarctic H, L, and LL chondrites. Obviously, a larger database is highly desirable in order to confirm earlier conclusions and to allow for a statistically relevant interpretation of the data. Here, we report the shock classification of an additional 54 ordinary chondrites and summarize implications based on a total of 130 samples. New observations on shock effects. Continued studies of those shock effects in olivine and plagioclase that are indicative of the shock stages S1 - S6 as defined in (1) revealed the following: Planar deformation features in olivine, considered typical of stage S5, occur occasionally in stage S3 and are common in stage S4. In some S4 chondrites plagioclase is not partially isotropic but still birefringent, coexisting with a small fraction of S3 olivines. Opaque shock veins occur not only in shock stage S3 and above (1) but have now been found in a few chondrites of shock stage S2. Thermal annealing of shock effects. Planar fractures and planar deformation features in olivine persist up to the temperatures required for recrystallization of olivine (> ca. 900 degrees C). Shock history of breccias. In a number of petrologic type 3 and 4 chondrites without recognizable (polymict) breccia texture, we found chondrules and olivine fragments with different shock histories ranging from S1 to S3. Regolith and fragmental breccias are polymict with regard to lithology and shock. The intensity of the latest shock typically varies from S1 to S4 in the breccias studied so far. Frequency distribution of shock stages. A significant difference between H and L chondrites is emerging in contrast to our previous statistics (1), whereas the conspicuous lack of shock stages S5 and S6 in type 3 and 4 chondrites is clearly confirmed (Fig. 1). Correlation between shock and noble gas content. The concentration of radiogenic argon and of
Toxic substances and human risk: principles of data interpretation
Tardiff, R.G.; Rodricks, J.V.
1988-01-01
This book provides a comprehensive overview of the relationship between toxicology and risk assessment and identifies the principles that should be used to evaluate toxicological data for human risk assessment. The book opens by distinguishing between the practice of toxicology as a science (observational and data-gathering activities) and its practice as an art (predictive or risk-estimating activities). This dichotomous nature produces the two elemental problems with which users of toxicological data must grapple. First, how relevant are the data provided by the science of toxicology to the assessment of human health risks? Second, what methods of data interpretation should be used to formulate hypotheses or predictions regarding human health risk?
2014-01-01
Background: A new algorithm has been developed to enable the interpretation of black box models. The algorithm is agnostic to the learning algorithm and open to all structure-based descriptors such as fragments, keys and hashed fingerprints. It has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is used to investigate the model’s behaviour on specific substructures present in the query, and an output is formulated summarising the causes of activation and deactivation. The algorithm can identify multiple causes of activation or deactivation, and can also identify localised deactivations where the prediction for the query is active overall. There is no loss in performance because the prediction itself is unchanged; the interpretation is produced directly from the model’s behaviour for the specific query. Results: Models were built using multiple learning algorithms, including support vector machine and random forest, on public Ames mutagenicity data with a variety of fingerprint descriptors. These models performed well in both internal and external validation, with accuracies around 82%, and were used to evaluate the interpretation algorithm. The interpretations revealed correspond closely to the understood mechanisms of Ames mutagenicity. Conclusion: This methodology allows greater use to be made of the predictions of black box models and can expedite further study based on the output of a (quantitative) structure activity model. Additionally, the algorithm could be used for chemical dataset investigation and knowledge extraction/human SAR development. PMID:24661325
Statistical modelling for falls count data.
Ullah, Shahid; Finch, Caroline F; Day, Lesley
2010-03-01
Falls and their injury outcomes have count distributions that are highly skewed toward the right with clumping at zero, posing analytical challenges. Different modelling approaches have been used in the published literature to describe falls count distributions, often without consideration of the underlying statistical and modelling assumptions. This paper compares the use of modified Poisson and negative binomial (NB) models as alternatives to Poisson (P) regression for the analysis of fall outcome counts. Four different count-based regression models (P, NB, zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB)) were each individually fitted to four separate fall count datasets from Australia, New Zealand and the United States. Finite mixtures of P and NB regression models were also compared to the standard NB model. Both analytical (F, Vuong and bootstrap tests) and graphical approaches were used to select and compare models. Simulation studies assessed the size and power of each model fit. This study confirms that falls count distributions are over-dispersed, but that the over-dispersion is not due to excess zero counts or a heterogeneous population. Accordingly, the P model generally provided the poorest fit to all datasets. The fit improved significantly with NB and both zero-inflated models. The NB model also fitted better than finite mixtures of both P and NB regression models. Although there was little difference in fit between NB and ZINB models, in the interests of parsimony it is recommended that future studies involving modelling of falls count data routinely use the NB model in preference to the P or ZINB models or finite mixture distributions. The fact that these conclusions apply across four separate datasets from four different samples of older people participating in studies of different methodology adds strength to this general guiding principle.
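The over-dispersion that drives the preference for NB over P models can be checked in a few lines. This is a minimal sketch under stated assumptions: intercept-only models with no covariates, a hypothetical set of fall counts, and method-of-moments NB parameters rather than the maximum-likelihood regression fits, Vuong tests and bootstrap tests used in the study.

```python
import math
import statistics

def poisson_loglik(counts):
    """Intercept-only Poisson log-likelihood at the MLE (the sample mean; must be > 0)."""
    lam = statistics.fmean(counts)
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)

def negbin_loglik(counts):
    """Negative binomial log-likelihood with method-of-moments parameters.

    Uses pmf C(y + r - 1, y) p^r (1 - p)^y, whose mean is r(1 - p)/p and
    variance mean + mean^2 / r; r and p are chosen to match the sample moments.
    """
    m, v = statistics.fmean(counts), statistics.variance(counts)
    if v <= m:
        raise ValueError("no over-dispersion: NB is not identified by moments")
    r = m * m / (v - m)   # size (shape) parameter
    p = r / (r + m)       # makes the NB mean equal m
    return sum(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(p) + y * math.log(1 - p) for y in counts)

falls = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 5, 0, 1, 0, 0, 8, 0, 3, 0, 12]  # hypothetical
dispersion = statistics.variance(falls) / statistics.fmean(falls)
print(f"dispersion index = {dispersion:.2f}")  # values well above 1 signal over-dispersion
print(poisson_loglik(falls), negbin_loglik(falls))
```

A dispersion index far above 1 and a markedly higher NB log-likelihood, even after penalising its extra parameter (e.g. with AIC), are the kind of evidence that favours NB over P for skewed count data like these.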
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 47 Telecommunication 1 2010-10-01 2010-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 47 Telecommunication 1 2011-10-01 2011-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 47 Telecommunication 1 2014-10-01 2014-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 47 Telecommunication 1 2013-10-01 2013-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
A technique for interpretation of multispectral remote sensor data
NASA Technical Reports Server (NTRS)
Williamson, A. N.
1973-01-01
The author has identified the following significant results. The U.S. Army Engineer Waterways Experiment Station is engaged in a study to use ERTS-1 satellite data to detect alterations in the absorption and scattering properties caused by movement of suspended particles and solutes in selected areas of the Chesapeake Bay, and to correlate the data to determine the feasibility of delineating flow patterns, flushing action of the estuary, and sediment and pollutant dispersion. As part of this study, ADP techniques have been developed that permit automatic interpretation of data from any multispectral remote sensor with computer systems that have limited memory capacity and computing speed. The multispectral remote sensor is considered as a reflectance spectrophotometer. The data that define the spectral reflectance characteristics of a scene are scanned pixel by pixel. Each pixel whose spectral reflectance matches a reference spectrum is identified, and the results are shown in a map that identifies the locations where spectrum matches were detected and the spectrum that was matched. The interpretation technique is described and an example of interpreted data from ERTS-1 is presented.
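The pixel-by-pixel matching step can be sketched as follows. This is an illustration only: it assumes a per-band absolute tolerance as the matching rule and uses hypothetical four-band reference spectra, since the abstract does not state the actual matching criterion.

```python
def match_spectra(image, references, tolerance):
    """Label each pixel with the index of the first reference spectrum it matches.

    image: rows of pixels, each pixel a list of band reflectances.
    references: reference spectra with the same number of bands.
    A pixel matches a reference when every band lies within `tolerance` of it;
    unmatched pixels are labelled None.
    """
    labels = []
    for row in image:
        out_row = []
        for pixel in row:
            label = None
            for k, ref in enumerate(references):
                if all(abs(p - r) <= tolerance for p, r in zip(pixel, ref)):
                    label = k
                    break
            out_row.append(label)
        labels.append(out_row)
    return labels

references = [[0.05, 0.08, 0.04, 0.02],   # hypothetical reference spectrum 0
              [0.12, 0.15, 0.10, 0.06]]   # hypothetical reference spectrum 1
image = [[[0.05, 0.08, 0.05, 0.02], [0.30, 0.35, 0.40, 0.45]],
         [[0.12, 0.14, 0.10, 0.07], [0.06, 0.07, 0.04, 0.02]]]
labels = match_spectra(image, references, tolerance=0.015)
print(labels)  # [[0, None], [1, 0]]
```

The returned label grid is the "map" described in the abstract: it records both where a match occurred and which reference spectrum was matched. Processing one pixel at a time also keeps memory use small, in line with the limited-memory systems the technique targets.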
New Statistical Approach to the Analysis of Hierarchical Data
NASA Astrophysics Data System (ADS)
Neuman, S. P.; Guadagnini, A.; Riva, M.
2014-12-01
Many variables possess a hierarchical structure reflected in how their increments vary in space and/or time. Quite commonly the increments (a) fluctuate in a highly irregular manner; (b) possess symmetric, non-Gaussian frequency distributions characterized by heavy tails that often decay with separation distance or lag; (c) exhibit nonlinear power-law scaling of sample structure functions in a midrange of lags, with breakdown in such scaling at small and large lags; (d) show extended power-law scaling (ESS) at all lags; and (e) display nonlinear scaling of power-law exponent with order of sample structure function. Some interpret this to imply that the variables are multifractal, which explains neither breakdowns in power-law scaling nor ESS. We offer an alternative interpretation consistent with all above phenomena. It views data as samples from stationary, anisotropic sub-Gaussian random fields subordinated to truncated fractional Brownian motion (tfBm) or truncated fractional Gaussian noise (tfGn). The fields are scaled Gaussian mixtures with random variances. Truncation of fBm and fGn entails filtering out components below data measurement or resolution scale and above domain scale. Our novel interpretation of the data allows us to obtain maximum likelihood estimates of all parameters characterizing the underlying truncated sub-Gaussian fields. These parameters in turn make it possible to downscale or upscale all statistical moments to situations entailing smaller or larger measurement or resolution and sampling scales, respectively. They also allow one to perform conditional or unconditional Monte Carlo simulations of random field realizations corresponding to these scales. Aspects of our approach are illustrated on field and laboratory measured porous and fractured rock permeabilities, as well as soil texture characteristics and neural network estimates of unsaturated hydraulic parameters in a deep vadose zone near Phoenix, Arizona. We also use our approach
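The sample structure functions in items (c) to (e) are moments of absolute increments at a given lag, S_q(s) = mean |Y(x + s) - Y(x)|^q, and power-law scaling is usually diagnosed from the slope of log S_q against log s. This sketch assumes a regularly spaced 1-D series and uses hypothetical helper names; it is not the authors' maximum likelihood procedure for truncated sub-Gaussian fields.

```python
import math
import statistics

def structure_function(series, lag, q):
    """Sample structure function S_q(lag) = mean |Y(i + lag) - Y(i)|^q."""
    increments = [series[i + lag] - series[i] for i in range(len(series) - lag)]
    return statistics.fmean(abs(d) ** q for d in increments)

def scaling_exponent(series, lags, q):
    """Least-squares slope of log S_q versus log lag (the apparent power-law exponent)."""
    xs = [math.log(s) for s in lags]
    ys = [math.log(structure_function(series, s, q)) for s in lags]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

series = list(range(10))                    # trivial check: Y(i) = i gives S_q(s) = s^q
print(structure_function(series, 3, 2))     # 9.0
print(scaling_exponent(series, [1, 2, 4], 2))
```

For real data, restricting `lags` to a midrange and repeating the fit for several orders q is how the nonlinear dependence of the exponent on q, and its breakdown at small and large lags, would show up.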
Mobile Collection and Automated Interpretation of EEG Data
NASA Technical Reports Server (NTRS)
Mintz, Frederick; Moynihan, Philip
2007-01-01
A system that would comprise mobile and stationary electronic hardware and software subsystems has been proposed for collection and automated interpretation of electroencephalographic (EEG) data from subjects in everyday activities in a variety of environments. By enabling collection of EEG data from mobile subjects engaged in ordinary activities (in contradistinction to collection from immobilized subjects in clinical settings), the system would expand the range of options and capabilities for performing diagnoses. Each subject would be equipped with one of the mobile subsystems, which would include a helmet that would hold floating electrodes (see figure) in those positions on the patient's head that are required in classical EEG data-collection techniques. A bundle of wires would couple the EEG signals from the electrodes to a multi-channel transmitter also located in the helmet. Electronic circuitry in the helmet transmitter would digitize the EEG signals and transmit the resulting data via a multidirectional RF patch antenna to a remote location. At the remote location, the subject's EEG data would be processed and stored in a database that would be auto-administered by a newly designed relational database management system (RDBMS). In this RDBMS, in nearly real time, the newly stored data would be subjected to automated interpretation that would involve comparison with other EEG data and concomitant peer-reviewed diagnoses stored in international brain databases administered by other similar RDBMSs.
Amplitude interpretation and visualization of three-dimensional reflection data
Enachescu, M.E.
1994-07-01
Digital recording and processing of modern three-dimensional surveys allow for relatively good preservation and correct spatial positioning of seismic reflection amplitude. A four-dimensional seismic reflection field matrix R (x,y,t,A), which can be computer visualized (i.e., real-time interactively rendered, edited, and animated), is now available to the interpreter. The amplitude contains encoded geological information indirectly related to lithologies and reservoir properties. The magnitude of the amplitude depends not only on the acoustic impedance contrast across a boundary, but is also strongly affected by the shape of the reflective boundary. This allows the interpreter to image subtle tectonic and structural elements not obvious on time-structure maps. The use of modern workstations allows for appropriate color coding of the total available amplitude range, routine on-screen time/amplitude extraction, and late display of horizon amplitude maps (horizon slices) or complex amplitude-structure spatial visualization. Stratigraphic, structural, tectonic, fluid distribution, and paleogeographic information are commonly obtained by displaying the amplitude variation A = A(x,y,t) associated with a particular reflective surface or seismic interval. As illustrated with several case histories, traditional structural and stratigraphic interpretation combined with a detailed amplitude study generally greatly enhances extraction of subsurface geological information from a reflection data volume. In the context of three-dimensional seismic surveys, the horizon amplitude map (horizon slice), amplitude attachment to structure and "bright clouds" displays are very powerful tools available to the interpreter.
Metal Complexes of EDTA: An Exercise in Data Interpretation
NASA Astrophysics Data System (ADS)
Mitchell, Philip C. H.
1997-10-01
Stability constants of metal complexes of EDTA with main group and transition metals are correlated with properties of the elements and cations (ion charge, atomic and ionic radii, ionization energies and electronegativities) and interpreted with an ionic bonding model including a covalent contribution. Enthalpy and entropy contributions are discussed. It is shown how chemists recognize patterns in data with the help of a general theory and so develop a model.
Bayesian Statistics for Biological Data: Pedigree Analysis
ERIC Educational Resources Information Center
Stanfield, William D.; Carlton, Matthew A.
2004-01-01
Bayes' formula is applied to the biological problem of pedigree analysis to show that Bayes' formula and non-Bayesian or "classical" methods of probability calculation give different answers. First-year college students of biology can be introduced to Bayesian statistics in this way.
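A standard textbook pedigree calculation shows how Bayes' formula and the "classical" answer diverge. This sketch is not necessarily the example used in the article: it assumes an X-linked recessive allele with full penetrance, a woman with a prior carrier probability of 1/2 (e.g. her mother is a known carrier), and conditioning on her having n unaffected sons. `carrier_posterior` is a hypothetical helper name.

```python
from fractions import Fraction

def carrier_posterior(prior, n_unaffected_sons):
    """Posterior probability that a woman carries an X-linked recessive allele,
    given n unaffected sons (full penetrance assumed)."""
    like_carrier = Fraction(1, 2) ** n_unaffected_sons  # each son unaffected w.p. 1/2
    like_noncarrier = Fraction(1)                       # sons of a non-carrier are unaffected
    numerator = prior * like_carrier
    return numerator / (numerator + (1 - prior) * like_noncarrier)

print(carrier_posterior(Fraction(1, 2), 3))  # 1/9
```

The classical calculation, which ignores the sons' phenotypes, keeps the carrier probability at 1/2; conditioning on three unaffected sons lowers it to 1/9, which is the kind of difference the article uses to introduce Bayesian reasoning.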
Statistical Treatment of Looking-Time Data
ERIC Educational Resources Information Center
Csibra, Gergely; Hernik, Mikolaj; Mascaro, Olivier; Tatone, Denis; Lengyel, Máté
2016-01-01
Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is…
Implementation of ILLIAC 4 algorithms for multispectral image interpretation. [earth resources data
NASA Technical Reports Server (NTRS)
Ray, R. M.; Thomas, J. D.; Donovan, W. E.; Swain, P. H.
1974-01-01
Research has focused on the design and partial implementation of a comprehensive ILLIAC software system for computer-assisted interpretation of multispectral earth resources data such as that now collected by the Earth Resources Technology Satellite. Research generally suggests that the ILLIAC 4 should be as much as two orders of magnitude more cost-effective than serial processing computers for digital interpretation of ERTS imagery via multivariate statistical classification techniques. The potential of the ARPA Network as a mechanism for interfacing geographically dispersed users to an ILLIAC 4 image processing facility is discussed.
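Multivariate statistical classification of multispectral pixels is commonly done with a Gaussian maximum-likelihood classifier. The Python sketch below is a simplified, assumed version (diagonal covariance, equal class priors) rather than the ILLIAC 4 software; the class names and training pixels are hypothetical. Each pixel is classified independently of the others, which is what makes the task amenable to parallel hardware.

```python
import math
from typing import Dict, List, Sequence, Tuple

ClassStats = Tuple[List[float], List[float]]  # per-band means, per-band variances


def train(samples: Dict[str, List[Sequence[float]]]) -> Dict[str, ClassStats]:
    """Estimate per-class band means and (sample) variances from training pixels."""
    stats: Dict[str, ClassStats] = {}
    for label, pixels in samples.items():
        n, bands = len(pixels), len(pixels[0])
        means = [sum(p[b] for p in pixels) / n for b in range(bands)]
        variances = [sum((p[b] - means[b]) ** 2 for p in pixels) / (n - 1)
                     for b in range(bands)]
        stats[label] = (means, variances)
    return stats


def log_likelihood(pixel: Sequence[float], means: List[float], variances: List[float]) -> float:
    """Gaussian log-likelihood assuming independent bands (diagonal covariance)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                      for x, m, v in zip(pixel, means, variances))


def classify(pixel: Sequence[float], stats: Dict[str, ClassStats]) -> str:
    """Assign the class with the highest likelihood (equal priors assumed)."""
    return max(stats, key=lambda label: log_likelihood(pixel, *stats[label]))


# Hypothetical two-band training pixels.
training = {
    "water": [[10, 5], [12, 6], [11, 4]],
    "vegetation": [[30, 60], [32, 62], [31, 58]],
}
model = train(training)
print(classify([11, 5], model))   # water
print(classify([31, 61], model))  # vegetation
```

Each class needs at least two training pixels with non-identical values in every band, otherwise a variance is zero; a full-covariance version would replace the per-band sum with a Mahalanobis distance.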
Statistical data of the uranium industry
1980-01-01
This document is a compilation of historical facts and figures through 1979. These statistics are based primarily on information provided voluntarily by the uranium exploration, mining, and milling companies. The production, reserves, drilling, and production capability information has been reported in a manner which avoids disclosure of proprietary information. Only the totals for the $1.5 reserves are reported. Because of increased interest in higher cost resources for long range planning purposes, a section covering the distribution of $100 per pound reserves statistics has been newly included. A table of mill recovery ranges for the January 1, 1980 reserves has also been added to this year's edition. The section on domestic uranium production capability has been deleted this year but will be included next year. The January 1, 1980 potential resource estimates are unchanged from the January 1, 1979 estimates.
Interpretation of Landsat-4 Thematic Mapper and Multispectral Scanner data for forest surveys
NASA Technical Reports Server (NTRS)
Benson, A. S.; Degloria, S. D.
1985-01-01
Landsat-4 Thematic Mapper (TM) and Multispectral Scanner (MSS) data were evaluated by interpreting film and digital products and statistical data for selected forest cover types in California. Significant results were: (1) TM color image products should contain a spectral band in the visible (bands 1, 2, or 3), near infrared (band 4), and middle infrared (band 5) regions for maximizing the interpretability of vegetation types; (2) TM color composites should contain band 4 in all cases even at the expense of excluding band 5; and (3) MSS color composites were more interpretable than all TM color composites for certain cover types and for all cover types when band 4 was excluded from the TM composite.
Reiber, Hansotto
2016-06-01
The physiological and biophysical knowledge base for interpretations of cerebrospinal fluid (CSF) data and reference ranges is essential for the clinical pathologist and neurochemist. With its accessible description of the CSF flow-dependent barrier function, of the dynamics and concentration gradients of blood-derived, brain-derived and leptomeningeal proteins in CSF, and of the specificity-independent functions of B lymphocytes in the brain, the review also offers neurologists, psychiatrists, neurosurgeons and neuropharmacologists essentials for diagnosis, research or the development of therapies. This review may help to replace outdated ideas such as "leakage" models of the barriers, linear immunoglobulin index interpretations or CSF electrophoresis. Calculations, interpretations and analytical pitfalls are described for albumin quotients, quantitation of immunoglobulin synthesis in Reibergrams, oligoclonal IgG, IgM analysis, the polyspecific (MRZ) antibody reaction, the statistical treatment of CSF data and general quality assessment in the CSF laboratory. The diagnostic relevance is documented in an accompanying review.
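The albumin quotient and Reibergram calculations mentioned above can be sketched in Python. The constants below are those commonly cited for Reiber's hyperbolic upper limit of the IgG quotient, Q_lim(IgG) = 0.93 √(Q_Alb² + 6×10⁻⁶) − 1.7×10⁻³; treat them as an assumption to be checked against the review itself. The sample concentrations are hypothetical.

```python
import math


def albumin_quotient(alb_csf: float, alb_serum: float) -> float:
    """Q_Alb = albumin(CSF) / albumin(serum); both in the same units."""
    return alb_csf / alb_serum


def igg_upper_limit(q_alb: float) -> float:
    """Upper reference limit of Q_IgG from the hyperbolic function (Reibergram)."""
    return 0.93 * math.sqrt(q_alb ** 2 + 6e-6) - 1.7e-3


def intrathecal_igg_fraction(igg_csf: float, igg_serum: float, q_alb: float) -> float:
    """Percentage of CSF IgG synthesised intrathecally (0 if below the limit)."""
    q_igg = igg_csf / igg_serum
    q_lim = igg_upper_limit(q_alb)
    if q_igg <= q_lim:
        return 0.0
    return (1.0 - q_lim / q_igg) * 100.0


# Hypothetical sample, all concentrations in mg/L.
q_alb = albumin_quotient(250.0, 50_000.0)                        # 5.0e-3
print(round(igg_upper_limit(q_alb), 5))                          # 0.00348
print(round(intrathecal_igg_fraction(40.0, 10_000.0, q_alb), 1)) # 13.0
```

Using the hyperbola rather than a linear IgG index is exactly the change the review argues for; analogous functions with different constants exist for IgA and IgM.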
Practical Considerations in Clinical Pathology Data Interpretation and Description.
Hall, Robert L
2017-02-01
Although interpretation and description of clinical pathology test results for any preclinical safety assessment study should employ a consistent standard approach, companies differ regarding that approach and the appearance of the end product. Some rely heavily on statistical analysis, others do not. Some believe reference intervals are important, most do not. Some prefer severity of effects be described by percentage differences from, or multiples of, baseline or control, others prefer only word modifiers. Some expect a definitive decision for every potential effect, others accept uncertainty. This commentary addresses these differences and underscores the need for flexibility in a "consistent standard approach" because the conditions of every study are unique. This article constitutes an overview of material originally presented at Session 2 of the 2016 Society of Toxicologic Pathology Annual Symposium.
New Results on Nuclear Fission--Data and Interpretation
Kelic, Aleksandra; Ricciardi, Maria Valentina; Schmidt, Karl-Heinz
2008-04-17
An overview of phenomena observed in low-energy fission is presented, including new results from a GSI experiment with relativistic secondary beams. Interpreting the structural effects in terms of fission channels reveals an astonishing stability of the fission-channel positions in the nuclear charge of the heavy fragment, in contrast to the previously assumed constancy in mass. The statistical model is applied to deduce the relevant characteristics of the potential-energy surface. Because of the fission dynamics, the different degrees of freedom are assumed to freeze, each at a specific stage of the descent from saddle to scission. Evidence for the separability of compound-nucleus and fragment properties in fission is deduced.
Patton, Charles J.; Gilroy, Edward J.
1999-01-01
Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.
Interdisciplinary applications and interpretations of remotely sensed data
NASA Technical Reports Server (NTRS)
Peterson, G. W.; Mcmurtry, G. J.
1972-01-01
An interdisciplinary approach to using remote sensors for the inventory of natural resources is discussed. The areas under investigation are land use, determination of pollution sources and damage, and analysis of geologic structure and terrain. The geographical area of primary interest is the Susquehanna River Basin. Descriptions of the data obtained by aerial cameras, multiband cameras, optical mechanical scanners, and radar are included. The Earth Resources Technology Satellite and Skylab programs are examined. Interpretations of spacecraft data showing specific areas of interest are developed.
Quantitative interpretation of Great Lakes remote sensing data
NASA Technical Reports Server (NTRS)
Shook, D. F.; Salzman, J.; Svehla, R. A.; Gedney, R. T.
1980-01-01
The paper discusses the quantitative interpretation of Great Lakes remote sensing water quality data. Remote sensing using color information must take into account (1) the existence of many different organic and inorganic species throughout the Great Lakes, (2) the occurrence of a mixture of species in most locations, and (3) spatial variations in types and concentration of species. The radiative transfer model provides a potential method for an orderly analysis of remote sensing data and a physical basis for developing quantitative algorithms. Predictions and field measurements of volume reflectances are presented which show the advantage of using a radiative transfer model. Spectral absorptance and backscattering coefficients for two inorganic sediments are reported.
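A common first-order way to connect the absorption and backscattering coefficients mentioned above to volume reflectance is R ≈ f·b_b/(a + b_b), with f ≈ 0.33, where the coefficients of a mixture are summed over species weighted by concentration. The Python sketch below assumes that approximation; it is not necessarily the radiative transfer model used in the paper, and the species coefficients are hypothetical.

```python
from typing import Iterable, Tuple

# (concentration, specific absorption a*, specific backscattering bb*) per species
Species = Tuple[float, float, float]


def mixture_coefficients(a_water: float, bb_water: float,
                         species: Iterable[Species]) -> Tuple[float, float]:
    """Total absorption and backscattering at one wavelength (linear mixing)."""
    a_total, bb_total = a_water, bb_water
    for conc, a_star, bb_star in species:
        a_total += conc * a_star
        bb_total += conc * bb_star
    return a_total, bb_total


def volume_reflectance(a: float, bb: float, f: float = 0.33) -> float:
    """First-order volume reflectance R = f * bb / (a + bb)."""
    return f * bb / (a + bb)


# Hypothetical single wavelength: clear water plus one suspended sediment.
a, bb = mixture_coefficients(0.05, 0.002, [(10.0, 0.01, 0.005)])
print(a, bb, volume_reflectance(a, bb))
```

Because the coefficients add linearly while R does not, a mixture of species cannot be unmixed simply by adding reflectance spectra; this is one reason a physical model helps in waters with several co-occurring species.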
Laboratory study supporting the interpretation of Solar Dynamics Observatory data
Trabert, E.; Beiersdorfer, P.
2015-01-29
High-resolution extreme ultraviolet spectra of ions in an electron beam ion trap are investigated as a laboratory complement of the moderate-resolution observation bands of the AIA experiment on board the Solar Dynamics Observatory (SDO) spacecraft. Here, the latter observations depend on dominant iron lines of various charge states which in combination yield temperature information on the solar plasma. Our measurements suggest additions to the spectral models that are used in the SDO data interpretation. In the process, we also note a fair number of inconsistencies among the wavelength reference databases.
Statistics: The Shape of the Data. Used Numbers: Real Data in the Classroom. Grades 4-6.
ERIC Educational Resources Information Center
Russell, Susan Jo; Corwin, Rebecca B.
A unit of study that introduces collecting, representing, describing, and interpreting data is presented. Suitable for students in grades 4 through 6, it provides a foundation for further work in statistics and data analysis. The investigations may extend from one to four class sessions and are grouped into three parts: "Introduction to Data…
Borehole seismic data processing and interpretation: New free software
NASA Astrophysics Data System (ADS)
Farfour, Mohammed; Yoon, Wang Jung
2015-12-01
Vertical Seismic Profile (VSP) surveying is a vital tool in subsurface imaging and reservoir characterization. The technique allows geophysicists to infer critical information that cannot be obtained otherwise. MVSP is a new MATLAB tool with a graphical user interface (GUI) for VSP shot modeling, data processing, and interpretation. The software handles VSP data from the loading and preprocessing stages to the final stage of corridor plotting and integration with well and seismic data. Several seismic and signal processing toolboxes are integrated and modified to suit and enrich the processing and display packages. The main motivation behind the development of the software is to provide new geoscientists and students in the geoscience fields with free software that brings together all VSP modules in one easy-to-use package. The software has several modules that allow the user to test, process, compare, visualize, and produce publication-quality results. The software is developed as a stand-alone MATLAB application that requires only MATLAB Compiler Runtime (MCR) to run with full functionality. We present a detailed description of MVSP and use the software to create synthetic VSP data. The data are then processed using different available tools. Next, real data are loaded and fully processed using the software. The data are then integrated with well data for more detailed analysis and interpretation. In order to evaluate the software processing flow accuracy, the same data are processed using commercial software. Comparison of the processing results shows that MVSP is able to process VSP data as efficiently as commercial software packages currently used in industry, and provides similar high-quality processed data.
Graphic Strategies for Analyzing and Interpreting Curricular Mapping Data
Leonard, Sean T.
2010-01-01
Objective: To describe curricular mapping strategies used in analyzing and interpreting curricular mapping data and present findings on how these strategies were used to facilitate curricular development. Design: Nova Southeastern University's doctor of pharmacy curriculum was mapped to the college's educational outcomes. The mapping process included development of educational outcomes followed by analysis of course material and semi-structured interviews with course faculty members. Data collected per course outcome included learning opportunities and assessment measures used. Assessment: Nearly 1,000 variables and 10,000 discrete rows of curricular data were collected. Graphic representations of curricular data were created using bar charts and stacked area graphs relating the learning opportunities to the educational outcomes. Graphs were used in the curricular evaluation and development processes to facilitate the identification of curricular holes, sequencing misalignments, learning opportunities, and assessment measures. Conclusion: Mapping strategies that use graphic representations of curricular data serve as effective diagnostic and curricular development tools. PMID:20798804
Graphic strategies for analyzing and interpreting curricular mapping data.
Armayor, Graciela M; Leonard, Sean T
2010-06-15
To describe curricular mapping strategies used in analyzing and interpreting curricular mapping data and present findings on how these strategies were used to facilitate curricular development. Nova Southeastern University's doctor of pharmacy curriculum was mapped to the college's educational outcomes. The mapping process included development of educational outcomes followed by analysis of course material and semi-structured interviews with course faculty members. Data collected per course outcome included learning opportunities and assessment measures used. Nearly 1,000 variables and 10,000 discrete rows of curricular data were collected. Graphic representations of curricular data were created using bar charts and stacked area graphs relating the learning opportunities to the educational outcomes. Graphs were used in the curricular evaluation and development processes to facilitate the identification of curricular holes, sequencing misalignments, learning opportunities, and assessment measures. Mapping strategies that use graphic representations of curricular data serve as effective diagnostic and curricular development tools.
Data relay system specifications for ERTS image interpretation
NASA Technical Reports Server (NTRS)
Daniel, J. F.
1970-01-01
Experiments with the Data Collection System (DCS) of the Earth Resources Technology Satellites (ERTS) have been developed to stress ERTS applications in the Earth Resources Observation Systems (EROS) Program. Active pursuit of this policy has resulted in the design of eight specific experiments requiring a total of 98 DCS ground-data platforms. Of these eight experiments, six are intended to make use of DCS data as an aid in image interpretation, while two make use of the capability to relay data from remote locations. Preliminary discussions regarding additional experiments indicate a need for at least 150 DCS platforms within the EROS Program for ERTS experimentation. Results from the experiments will be used to assess the DCS suitability for satellites providing on-line, real-time, data relay capability. The rationale of the total DCS network of ground platforms and the relationship of each experiment to that rationale are discussed.
Interpretation of Absorption Bands in Airborne Hyperspectral Radiance Data
Szekielda, Karl H.; Bowles, Jeffrey H.; Gillis, David B.; Miller, W. David
2009-01-01
It is demonstrated that hyperspectral imagery can be used, without atmospheric correction, to determine the presence of accessory phytoplankton pigments in coastal waters using derivative techniques. However, care must be taken not to confuse other absorptions with those caused by the presence of pigments. Atmospheric correction, usually the first step to making products from hyperspectral data, may not completely remove Fraunhofer lines and atmospheric absorption bands, and these absorptions may interfere with identification of phytoplankton accessory pigments. Furthermore, the ability to resolve absorption bands depends on the spectral resolution of the spectrometer, which for a fixed spectral range also determines the number of observed bands. Based on this information, a study was undertaken to determine under what circumstances a hyperspectral sensor may determine the presence of pigments. As part of the study, a hyperspectral imager was used to take high spectral resolution data over two different water masses. To avoid the problems associated with atmospheric correction, these data were analyzed as radiance data without atmospheric correction. Here, the purpose was to identify spectral regions that might be diagnostic for photosynthetic pigments. Two well-proven techniques were used to aid in absorption band recognition: continuum removal of the spectra and the fourth derivative. The findings in this study suggest that the interpretation of absorption bands in remote sensing data, whether atmospherically corrected or not, has to be carefully reviewed when the bands are interpreted in terms of photosynthetic pigments. PMID:22574053
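Continuum removal and the fourth derivative, the two techniques named above, can be sketched in Python as follows. This assumes an evenly spaced, strictly positive spectrum with at least two samples and uses the upper convex hull as the continuum, a common convention that the abstract does not spell out; the smoothing usually needed before taking high-order derivatives of noisy data is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(wavelengths: Sequence[float], values: Sequence[float]) -> List[Point]:
    """Upper convex hull of a spectrum (wavelengths must be increasing)."""
    hull: List[Point] = []
    for p in zip(wavelengths, values):
        # Drop the last vertex while it does not make a right (clockwise) turn.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def continuum_removed(wavelengths: Sequence[float], values: Sequence[float]) -> List[float]:
    """Divide the spectrum by its linearly interpolated upper-hull continuum."""
    hull = upper_hull(wavelengths, values)
    result = []
    for x, y in zip(wavelengths, values):
        for (x0, y0), (x1, y1) in zip(hull, hull[1:]):
            if x0 <= x <= x1:
                continuum = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
                break
        result.append(y / continuum)
    return result


def fourth_derivative(values: Sequence[float], step: float) -> List[float]:
    """Central finite-difference 4th derivative for evenly spaced samples."""
    return [
        (values[i - 2] - 4 * values[i - 1] + 6 * values[i]
         - 4 * values[i + 1] + values[i + 2]) / step ** 4
        for i in range(2, len(values) - 2)
    ]


# Made-up spectrum with one absorption dip at 420 nm.
wl = [400, 410, 420, 430, 440]
spectrum = [1.0, 1.0, 0.5, 1.0, 1.0]
print(continuum_removed(wl, spectrum))  # [1.0, 1.0, 0.5, 1.0, 1.0]
```

After continuum removal, band depth is simply 1 minus the ratio; the fourth derivative sharpens overlapping bands but amplifies noise strongly, which is one reason the abstract urges care in interpreting such features.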
Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.
2009-01-01
In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency also tested at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
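The idea of an additive score built from statistically enriched structural features can be illustrated with a small Python sketch. This is a hypothetical simplification, not the published WFS weighting: enrichment is tested with a hypergeometric upper-tail p-value, significant features get weight −log10(p), and a compound's score is the sum of its features' weights. Feature names and counts are made up.

```python
import math
from typing import Dict, Iterable, Tuple


def enrichment_p(total: int, toxic: int, with_feature: int, toxic_with_feature: int) -> float:
    """Hypergeometric upper-tail P(X >= toxic_with_feature)."""
    upper = min(with_feature, toxic)
    hits = sum(
        math.comb(toxic, i) * math.comb(total - toxic, with_feature - i)
        for i in range(toxic_with_feature, upper + 1)
    )
    return hits / math.comb(total, with_feature)


def feature_weights(counts: Dict[str, Tuple[int, int]], total: int, toxic: int,
                    alpha: float = 0.05) -> Dict[str, float]:
    """Weight = -log10(p) for features significantly enriched in toxic compounds.

    counts maps feature -> (compounds with feature, toxic compounds with feature).
    """
    weights = {}
    for feature, (n_with, k_toxic) in counts.items():
        p = enrichment_p(total, toxic, n_with, k_toxic)
        if p < alpha:
            weights[feature] = -math.log10(p)
    return weights


def toxicity_score(features: Iterable[str], weights: Dict[str, float]) -> float:
    """Additive score: sum of the weights of the compound's enriched features."""
    return sum(weights.get(f, 0.0) for f in features)


# Hypothetical training set: 10 compounds, 5 of them toxic.
weights = feature_weights({"nitro": (5, 5), "hydroxyl": (4, 2)}, total=10, toxic=5)
print(weights)
print(toxicity_score(["nitro", "hydroxyl"], weights))
```

The additive form is what keeps such a model interpretable: each feature's contribution to a compound's score can be read off directly. `math.comb` requires Python 3.8 or later.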
The Systematic Interpretation of Cosmic Ray Data (The Transport Project)
NASA Technical Reports Server (NTRS)
Guzik, T. Gregory
1997-01-01
The Transport project's primary goals were to: (1) Provide measurements of critical fragmentation cross sections; (2) Study the cross section systematics; (3) Improve the galactic cosmic ray propagation methodology; and (4) Use the new cross section measurements to improve the interpretation of cosmic ray data. To accomplish these goals a collaboration was formed consisting of researchers in the US at Louisiana State University (LSU), Lawrence Berkeley Laboratory (LBL), Goddard Space Flight Center (GSFC), the University of Minnesota (UM), New Mexico State University (NMSU), in France at the Centre d'Etudes de Saclay and in Italy at the Universita di Catania. The US institutions, led by LSU, were responsible for measuring new cross sections using the LBL HISS facility, analysis of these measurements and their application to interpreting cosmic ray data. France developed a liquid hydrogen target that was used in the HISS experiment and participated in the data interpretation. Italy developed a Multifunctional Neutron Spectrometer (MUFFINS) for the HISS runs to measure the energy spectra, angular distributions and multiplicities of neutrons emitted during the high energy interactions. The Transport Project was originally proposed to NASA during Summer, 1988 and funding began January, 1989. Transport was renewed twice (1991, 1994) and finally concluded at LSU on September 30, 1997. During the more than 8 years of effort we had two major experiment runs at LBL, obtained data on the interaction of twenty different beams with a liquid hydrogen target, completed the analysis of fifteen of these datasets obtaining 590 new cross section measurements, published nine journal articles as well as eighteen conference proceedings papers, and presented more than thirty conference talks.
Empirical approach to interpreting card-sorting data
NASA Astrophysics Data System (ADS)
Wolf, Steven F.; Dougherty, Daniel P.; Kortemeyer, Gerd
2012-06-01
Since it was first published 30 years ago, the seminal paper of Chi et al. on expert and novice categorization of introductory problems led to a plethora of follow-up studies within and outside of the area of physics [Cogn. Sci. 5, 121 (1981), doi:10.1207/s15516709cog0502_2]. These studies frequently encompass “card-sorting” exercises whereby the participants group problems. While this technique certainly allows insights into problem solving approaches, simple descriptive statistics more often than not fail to find significant differences between experts and novices. In moving beyond descriptive statistics, we describe a novel microscopic approach that takes into account the individual identity of the cards and uses graph theory and models to visualize, analyze, and interpret problem categorization experiments. We apply these methods to an introductory physics (mechanics) problem categorization experiment, and find that most of the variation in sorting outcome is not due to the sorter being an expert versus a novice, but rather due to an independent characteristic that we named “stacker” versus “spreader.” The fact that the expert-novice distinction only accounts for a smaller amount of the variation may explain the frequent null results when conducting these experiments.
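One way to take the individual identity of cards into account is to build a weighted graph whose edge weight for each pair of cards counts how many sorters placed the two cards in the same group. The Python sketch below computes those co-occurrence counts; it is an assumed starting point for a graph analysis, not the authors' exact models, and the card labels and sorts are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set


def cooccurrence(sorts: Iterable[List[Set[str]]]) -> Counter:
    """Count, for each pair of cards, how many sorters put them in the same group.

    Each sort is a list of groups; each group is a set of card labels.
    Pairs are stored as alphabetically ordered tuples.
    """
    counts: Counter = Counter()
    for groups in sorts:
        for group in groups:
            for pair in combinations(sorted(group), 2):
                counts[pair] += 1
    return counts


# Two hypothetical sorters grouping three problem cards.
sorts = [
    [{"P1", "P2"}, {"P3"}],
    [{"P1", "P2", "P3"}],
]
print(cooccurrence(sorts))
```

The resulting counts can be used directly as edge weights for clustering or visualization, and per-sorter properties such as the number of groups formed can be examined alongside them.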
Urovi, V; Jimenez-Del-Toro, O; Dubosson, F; Ruiz Torres, A; Schumacher, M I
2017-02-01
This paper describes a novel temporal logic-based framework for reasoning with continuous data collected from wearable sensors. The work is motivated by the Metabolic Syndrome, a cluster of conditions which are linked to obesity and unhealthy lifestyle. We assume that, by interpreting the physiological parameters of continuous monitoring, we can identify which patients have a higher risk of Metabolic Syndrome. We define temporal patterns for reasoning with continuous data and specify the coordination mechanisms for combining different sets of clinical guidelines that relate to this condition. The proposed solution is tested with data provided by twenty subjects, who used sensors for four days of continuous monitoring. The results are compared to the gold standard. The novelty of the framework lies in extending a temporal logic formalism, namely the Event Calculus, with temporal patterns. These patterns help to specify the rules for reasoning with continuous data and to combine new knowledge into one consistent outcome that is tailored to the patient's profile. The overall approach opens new possibilities for delivering patient-tailored interventions and educational material before the patients present the symptoms of the disease.
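The Event Calculus reasons about fluents that are initiated and terminated by events and persist by inertia in between. The Python sketch below is a simplified, discrete-time illustration of that HoldsAt semantics, assuming fluents are false initially; it is not the authors' formalization, and the readings, threshold, event names and fluent name are hypothetical rather than clinical guidance.

```python
from typing import Dict, List, Set, Tuple

Narrative = List[Tuple[int, str]]  # (time, event) pairs


def holds_at(fluent: str, t: int, narrative: Narrative,
             initiates: Dict[str, Set[str]], terminates: Dict[str, Set[str]]) -> bool:
    """Simplified discrete-time HoldsAt with inertia.

    The fluent holds at t if the most recent event strictly before t that
    affects it initiated it. Fluents are assumed false initially.
    """
    holding = False
    for time, event in sorted(narrative):
        if time >= t:
            break
        if fluent in initiates.get(event, set()):
            holding = True
        elif fluent in terminates.get(event, set()):
            holding = False
    return holding


def readings_to_narrative(readings: List[Tuple[int, float]], threshold: float) -> Narrative:
    """Turn sensor readings into 'high' / 'normal' events (hypothetical rule)."""
    return [(t, "high" if value > threshold else "normal") for t, value in readings]


# Hypothetical glucose-like readings (time in minutes, value in mmol/L).
readings = [(0, 5.0), (10, 8.2), (20, 8.5), (30, 6.0)]
narrative = readings_to_narrative(readings, threshold=7.8)
initiates = {"high": {"elevated"}}
terminates = {"normal": {"elevated"}}
print(holds_at("elevated", 25, narrative, initiates, terminates))  # True
print(holds_at("elevated", 35, narrative, initiates, terminates))  # False
```

Temporal patterns of the kind the paper describes would sit on top of such a query, for example by asking whether a fluent holds continuously over a required interval.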
Presentation and interpretation of chemical data for igneous rocks
Wright, T.L.
1974-01-01
Arguments are made in favor of using variation diagrams to plot analyses of igneous rocks and their derivatives and modeling differentiation processes by least-squares mixing procedures. These methods permit study of magmatic differentiation and related processes in terms of all of the chemical data available. Data are presented as they are reported by the chemist, and specific processes may be modeled and either quantitatively described or rejected as inappropriate or too simple. Examples are given of the differing interpretations that can arise when data are plotted on an AFM ternary vs. the same data on a full set of MgO variation diagrams. Mixing procedures are illustrated with reference to basaltic lavas from the Columbia Plateau. © 1974 Springer-Verlag.
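Least-squares mixing solves for the proportions of end-member compositions that best reproduce an observed composition across all oxides at once. The Python sketch below assumes an unconstrained, unweighted fit (no non-negativity or sum-to-one constraint, no weighting by analytical error) solved through the normal equations, and uses made-up oxide values; practical petrological mixing programs typically add such refinements.

```python
from typing import List, Sequence, Tuple


def _solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def mixing_proportions(end_members: Sequence[Sequence[float]],
                       target: Sequence[float]) -> Tuple[List[float], float]:
    """Least-squares proportions x minimising ||sum_j x_j * E_j - target||^2.

    Returns the proportions and the sum of squared residuals over all oxides.
    """
    k = len(target)
    normal = [[sum(ei[o] * ej[o] for o in range(k)) for ej in end_members]
              for ei in end_members]
    rhs = [sum(ei[o] * target[o] for o in range(k)) for ei in end_members]
    x = _solve(normal, rhs)
    residuals = [sum(xj * ej[o] for xj, ej in zip(x, end_members)) - target[o]
                 for o in range(k)]
    return x, sum(r * r for r in residuals)


# Made-up oxide analyses (wt%: SiO2, MgO, CaO) of two hypothetical end members.
end_member_a = [50.0, 8.0, 12.0]
end_member_b = [40.0, 45.0, 10.0]
observed = [47.0, 19.1, 11.4]  # constructed as 0.7 * a + 0.3 * b
props, ssr = mixing_proportions([end_member_a, end_member_b], observed)
print([round(p, 3) for p in props], round(ssr, 6))  # [0.7, 0.3] 0.0
```

A large residual sum of squares is the quantitative signal that a proposed process is "inappropriate or too simple", in the sense used in the abstract.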
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
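The R tools described above handle multiple detection limits. As a much smaller illustration of the ROS idea, the Python sketch below treats a single detection limit under a lognormal assumption: detected values are regressed (in log units) on normal quantiles of their plotting positions, and the censored observations are imputed from the fitted line. The single-limit plotting-position formula is an assumed common form, and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean
from typing import List, Sequence


def ros_impute(detected: Sequence[float], n_censored: int) -> List[float]:
    """Impute values for n_censored results reported below one detection limit.

    Assumes all censored values lie below every detected value and that the
    data are lognormal. Requires at least two detected values.
    """
    d = len(detected)
    n = d + n_censored
    p_exceed = d / n  # estimated probability of exceeding the detection limit
    std_normal = NormalDist()

    # Plotting positions: censored results occupy the lower (1 - p_exceed) part.
    pp_detected = [(1 - p_exceed) + p_exceed * i / (d + 1) for i in range(1, d + 1)]
    pp_censored = [(1 - p_exceed) * i / (n_censored + 1) for i in range(1, n_censored + 1)]

    # Ordinary least squares of log(value) on normal quantiles.
    z = [std_normal.inv_cdf(p) for p in pp_detected]
    logs = [math.log(v) for v in sorted(detected)]
    z_bar, log_bar = mean(z), mean(logs)
    slope = (sum((zi - z_bar) * (li - log_bar) for zi, li in zip(z, logs))
             / sum((zi - z_bar) ** 2 for zi in z))
    intercept = log_bar - slope * z_bar

    return [math.exp(intercept + slope * std_normal.inv_cdf(p)) for p in pp_censored]


def ros_mean(detected: Sequence[float], n_censored: int) -> float:
    """Mean of detected values combined with ROS-imputed censored values."""
    return mean(list(detected) + ros_impute(detected, n_censored))


# Hypothetical trace-metal results (µg/L): six detects and three "< 1.0" values.
detects = [1.2, 2.5, 3.1, 4.8, 7.0, 9.3]
print(ros_impute(detects, 3))
print(ros_mean(detects, 3))
```

Imputed values are used only collectively, to compute summary statistics such as the mean; they are not estimates of individual censored measurements.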
A derivation of the statistical characteristics of SAR imagery data. [Rayleigh speckle statistics
NASA Technical Reports Server (NTRS)
Wu, C.
1981-01-01
Basic statistical properties of the speckle effect and the associated spatial correlation of SAR image data are discussed. Statistics of SAR sensed measurement and their relationships to the surface mean power reflectivity are derived. The Rayleigh speckle model is reviewed. Applications of the derived statistics to SAR radiometric measures and image processing are considered.
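Under the Rayleigh speckle model, each look is a circular complex Gaussian return: the amplitude is Rayleigh distributed, the single-look intensity is exponential with mean equal to the mean power reflectivity, and its coefficient of variation is 1; averaging L independent looks reduces that to 1/√L. The Python sketch below checks these properties by simulation; it is an illustration under those standard assumptions, not the derivation in the paper, and the parameter values are arbitrary.

```python
import math
import random
from statistics import mean, pstdev
from typing import List


def simulate_intensity(mean_power: float, looks: int, n: int, seed: int = 0) -> List[float]:
    """Simulate multi-look SAR intensity under the Rayleigh speckle model.

    Each look is a circular complex Gaussian return whose real and imaginary
    parts have variance mean_power / 2, so |z|^2 is exponential with mean
    mean_power (the amplitude |z| is Rayleigh distributed). Looks are averaged.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(mean_power / 2.0)
    samples = []
    for _ in range(n):
        total = 0.0
        for _ in range(looks):
            re, im = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
            total += re * re + im * im
        samples.append(total / looks)
    return samples


for looks in (1, 4):
    intensity = simulate_intensity(mean_power=2.0, looks=looks, n=20_000, seed=42)
    cv = pstdev(intensity) / mean(intensity)
    print(looks, round(mean(intensity), 2), round(cv, 2))  # CV close to 1/sqrt(looks)
```

The sample mean recovers the mean power in both cases, while only multi-looking reduces the relative fluctuation, which is why radiometric estimates from single-look data need spatial averaging.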
NASA Astrophysics Data System (ADS)
Dralle, D.; Karst, N.; Thompson, S. E.
2015-12-01
Multiple competing theories suggest that power law behavior governs the observed first-order dynamics of streamflow recessions - the important process by which catchments dry out via the stream network, altering the availability of surface water resources and in-stream habitat. Frequently modeled as dq/dt = -aq^b, recessions typically exhibit a high degree of variability, even within a single catchment, as revealed by significant shifts in the values of "a" and "b" across recession events. One potential source of this variability lies in underlying, hard-to-observe fluctuations in how catchment water storage is partitioned amongst distinct storage elements, each having different discharge behaviors. Testing this and competing hypotheses with widely available streamflow timeseries, however, has been hindered by a power law scaling artifact that obscures meaningful covariation between the recession parameters, "a" and "b". Here we briefly outline a technique that removes this artifact, revealing intriguing new patterns in the joint distribution of recession parameters. Using long-term flow data from catchments in Northern California, we explore temporal variations, and find that the "a" parameter varies strongly with catchment wetness. Then we explore how the "b" parameter changes with "a", and find that measures of its variation are maximized at intermediate "a" values. We propose an interpretation of this pattern based on statistical mechanics, meaning "b" can be viewed as an indicator of the catchment "microstate" - i.e. the partitioning of storage - and "a" as a measure of the catchment macrostate (i.e. the total storage). In statistical mechanics, entropy (i.e. microstate variance, that is the variance of "b") is maximized for intermediate values of extensive variables (i.e. wetness, "a"), as observed in the recession data. This interpretation of "a" and "b" was supported by model runs using a multiple-reservoir catchment toy model, and lends support to the
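The recession parameters "a" and "b" in dq/dt = -aq^b are conventionally estimated for each event by regressing log(-dq/dt) on log(q). The Python sketch below shows that conventional fit only; the artifact-removal technique outlined in the abstract is not reproduced. The synthetic series is the exact solution of dq/dt = -0.05q² with q(0) = 10, chosen purely for illustration.

```python
import math
from typing import Sequence, Tuple


def fit_recession(q: Sequence[float], dt: float = 1.0) -> Tuple[float, float]:
    """Fit -dq/dt = a * q**b to one recession by log-log least squares.

    Uses forward differences, keeps only strictly declining steps, and evaluates
    q at the midpoint of each step.
    """
    xs, ys = [], []
    for q0, q1 in zip(q, q[1:]):
        if 0 < q1 < q0:
            xs.append(math.log((q0 + q1) / 2.0))
            ys.append(math.log((q0 - q1) / dt))
    if len(xs) < 2:
        raise ValueError("need at least two declining steps")
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = math.exp(y_bar - b * x_bar)
    return a, b


# Synthetic recession: exact solution of dq/dt = -0.05 q^2 with q(0) = 10.
series = [1.0 / (0.1 + 0.05 * t) for t in range(30)]
a_hat, b_hat = fit_recession(series)
print(round(a_hat, 3), round(b_hat, 2))  # close to 0.05 and 2
```

Because a and b are estimated jointly from the same log-log line, their estimates are not independent across events, which is related to the scaling artifact the abstract describes.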
Implications of pyrosequencing error correction for biological data interpretation.
Bakker, Matthew G; Tu, Zheng J; Bradeen, James M; Kinkel, Linda L
2012-01-01
There has been a rapid proliferation of approaches for processing and manipulating second generation DNA sequence data. However, users are often left with uncertainties about how the choice of processing methods may impact biological interpretation of data. In this report, we probe differences in output between two different processing pipelines: a de-noising approach using the AmpliconNoise algorithm for error correction, and a standard approach using quality filtering and preclustering to reduce error. There was a large overlap in reads culled by each method, although AmpliconNoise removed a greater net number of reads. Most OTUs produced by one method had a clearly corresponding partner in the other. Although each method resulted in OTUs consisting entirely of reads that were culled by the other method, there were many more such OTUs formed in the standard pipeline. Total OTU richness was reduced by AmpliconNoise processing, but per-sample OTU richness, diversity and evenness were increased. Increases in per-sample richness and diversity may be a result of AmpliconNoise processing producing a more even OTU rank-abundance distribution. Because communities were randomly subsampled to equalize sample size across communities, and because rare sequence variants are less likely to be selected during subsampling, fewer OTUs were lost from individual communities when subsampling AmpliconNoise-processed data. In contrast to taxon-based diversity estimates, phylogenetic diversity was reduced even on a per-sample basis by de-noising, and samples switched widely in diversity rankings. This work illustrates the significant impacts of processing pipelines on the biological interpretations that can be made from pyrosequencing surveys. This study provides important cautions for analyses of contemporary data, for requisite data archiving (processed vs. non-processed data), and for drawing comparisons among studies performed using distinct data processing pipelines.
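The subsampling step mentioned above (randomly subsampling communities to a common read depth) can be sketched as drawing reads without replacement. This is an assumed, minimal illustration of rarefaction, not the pipeline used in the study; the helper name and the example community are hypothetical. Rare OTUs (here OTU3 and OTU4) are the ones most likely to vanish at low depth.

```python
import random
from collections import Counter

def rarefy(otu_counts, depth, seed=0):
    """Randomly subsample reads without replacement to a fixed depth (hypothetical helper).

    otu_counts maps OTU id -> read count for one community.
    Returns the per-OTU read counts of the subsample.
    """
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))

community = {"OTU1": 500, "OTU2": 300, "OTU3": 5, "OTU4": 1}
sub = rarefy(community, depth=100)
richness = len(sub)   # number of OTUs that survived subsampling
```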
Analysis and interpretation of diffraction data from complex, anisotropic materials
NASA Astrophysics Data System (ADS)
Tutuncu, Goknur
Most materials are elastically anisotropic and exhibit additional anisotropy beyond elastic deformation. For instance, in ferroelectric materials the main inelastic deformation mode is via domains, which are highly anisotropic crystallographic features. To quantify this anisotropy of ferroelectrics, advanced X-ray and neutron diffraction methods were employed. Extensive sets of data were collected from tetragonal BaTiO3, PZT and other ferroelectric ceramics. Data analysis was challenging due to the complex constitutive behavior of these materials. To quantify the elastic strain and texture evolution in ferroelectrics under loading, a number of data analysis techniques such as the single peak and Rietveld methods were used and their advantages and disadvantages compared. It was observed that the single peak analysis fails at low peak intensities especially after domain switching while the Rietveld method does not account for lattice strain anisotropy although it overcomes the low intensity problem via whole pattern analysis. To better account for strain anisotropy the constant stress (Reuss) approximation was employed within the Rietveld method and new formulations to estimate lattice strain were proposed. Along the way, new approaches for handling highly anisotropic lattice strain data were also developed and applied. All of the ceramics studied exhibited significant changes in their crystallographic texture after loading indicating non-180° domain switching. For a full interpretation of domain switching the spherical harmonics method was employed in Rietveld. A procedure for simultaneous refinement of multiple data sets was established for a complete texture analysis. To further interpret diffraction data, a solid mechanics model based on the self-consistent approach was used in calculating lattice strain and texture evolution during the loading of a polycrystalline ferroelectric. The model estimates both the macroscopic average response of a specimen and its hkl
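The single-peak approach referred to above converts a diffraction peak shift into a lattice strain through Bragg's law, d = λ / (2 sin θ), and ε = (d − d0) / d0. A minimal sketch, assuming first-order reflections and a known stress-free peak position; the function names and numbers are illustrative and do not reproduce the Rietveld or Reuss-based formulations of the thesis.

```python
import math

def d_spacing(two_theta_deg, wavelength):
    """First-order Bragg's law: d = lambda / (2 sin theta)."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))

def lattice_strain(two_theta_deg, two_theta0_deg, wavelength):
    """Single-peak lattice strain (d - d0) / d0 from a measured and a reference peak."""
    d = d_spacing(two_theta_deg, wavelength)
    d0 = d_spacing(two_theta0_deg, wavelength)
    return (d - d0) / d0

# Hypothetical example: a peak moving from 60.00 to 59.90 degrees 2-theta
strain = lattice_strain(59.90, 60.00, wavelength=1.0)
```

A shift to lower angle gives a larger d and hence a positive (tensile) strain.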
Interpretation of solar extinction data for stratospheric aerosols
NASA Technical Reports Server (NTRS)
Pepin, T. J.
1980-01-01
This paper discusses the inversion problem for aerosols using the solar extinction method. A series of numerical experiments is described in which solar extinction measurement systems are modeled. A numerical model of a solar extinction measurement system has been coupled with model atmospheres that exhibit fine-scale structures to produce numerically generated data signals. These signals were then inverted to study the effect that measurement errors and the desired vertical resolution have on the inverted results. Knowledge of the trade-off between vertical resolution and the accuracy of the inversion aids in the interpretation of the inverted results.
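One common way to pose the limb-geometry inversion is "onion peeling": the atmosphere is split into spherical shells, each measured slant optical depth is a path-length-weighted sum of shell extinctions, and the resulting upper-triangular system is solved for the extinctions. The paper's own inversion scheme is not specified here, so treat this as an assumed illustration; shell radii and the extinction profile are hypothetical.

```python
import numpy as np

def path_matrix(radii):
    """Chord lengths through spherical shells for tangent (limb) rays.

    radii: increasing shell boundaries r_0 < r_1 < ... < r_n.
    Ray i is tangent at r_i; L[i, j] is its path length inside shell j
    (between r_j and r_{j+1}), nonzero only for j >= i.
    """
    r = np.asarray(radii, dtype=float)
    n = len(r) - 1
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            outer = np.sqrt(r[j + 1] ** 2 - r[i] ** 2)
            inner = np.sqrt(r[j] ** 2 - r[i] ** 2)
            L[i, j] = 2.0 * (outer - inner)   # the ray crosses each shell twice
    return L

def invert_extinction(slant_tau, radii):
    """Recover per-shell extinction k from slant optical depths tau = L k."""
    L = path_matrix(radii)
    return np.linalg.solve(L, slant_tau)      # L is upper triangular

# Hypothetical 1 km shells from 10 to 40 km above a 6371 km Earth
radii = 6371.0 + np.arange(10.0, 41.0)
k_true = 1e-3 * np.exp(-(radii[:-1] - 6381.0) / 7.0)
tau = path_matrix(radii) @ k_true
k_hat = invert_extinction(tau, radii)
```

Adding noise to `tau` and changing the shell thickness is one way to explore the resolution/accuracy trade-off the abstract describes.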
About the problems to interpret spectroscopic data from plasmas
NASA Astrophysics Data System (ADS)
Rosmej, F. B.; Guedda, E. H.; Lisitsa, V. S.; Capes, H.; Stamm, R.
2006-01-01
Continued developments of quantitative spectroscopy and related atomic physics are originating from inertial and magnetic fusion research. In almost all experimental facilities, non-equilibrium phenomena are now a central issue and the interpretation of related spectroscopic data is a great challenge. We discuss new general diagnostic/spectroscopic approaches and usual points of view: high-density methods and high-density atomic physics for magnetic fusion research like ITER, and the Virtual Contour Shape Kinetic Theory (VCSKT), which unifies low- and high-density plasma regimes and therefore allows one to employ complex satellite transitions in non-equilibrium, non-LTE and non-Coronal plasmas.
Infrared spectroscopy for geologic interpretation of TIMS data
NASA Technical Reports Server (NTRS)
Bartholomew, Mary Jane
1986-01-01
The Portable Field Emission Spectrometer (PFES) was designed to collect meaningful spectra in the field under climatic, thermal, and sky conditions that approximate those at the time of the overflight. The specifications and procedures of PFES are discussed. Laboratory reflectance measurements of rocks and minerals were examined for the purpose of interpreting Thermal Infrared Multispectral Scanner (TIMS) data. The capability is currently being developed to perform direct laboratory measurement of the normal spectral radiance of Earth surface materials at low temperatures (20 to 30 °C) at the Jet Propulsion Laboratory.
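Measured thermal-infrared spectral radiance is usually interpreted by comparing it with Planck blackbody radiance at the sample temperature, the ratio being the emissivity. The sketch below assumes that simple ratio and ignores reflected sky radiance, which real field and laboratory procedures must correct for; function names are illustrative.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K

def planck_radiance(wavelength_um, temp_c):
    """Blackbody spectral radiance in W m^-2 sr^-1 um^-1."""
    lam = wavelength_um * 1e-6
    t = temp_c + 273.15
    b = 2.0 * H * C**2 / lam**5 / math.expm1(H * C / (lam * KB * t))
    return b * 1e-6   # convert per metre to per micrometre

def emissivity(measured_radiance, wavelength_um, temp_c):
    """Emissivity as measured radiance over blackbody radiance at the same temperature.

    Assumes no reflected down-welling (sky) radiance.
    """
    return measured_radiance / planck_radiance(wavelength_um, temp_c)

# Blackbody radiance at 10 um for a 25 degC surface
l_10um = planck_radiance(10.0, 25.0)
```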
Geological Interpretation of PSInSAR Data at Regional Scale
Meisina, Claudia; Zucca, Francesco; Notti, Davide; Colombo, Alessio; Cucchi, Anselmo; Savio, Giuliano; Giannico, Chiara; Bianchi, Marco
2008-01-01
Results of a PSInSAR™ project carried out by the Regional Agency for Environmental Protection (ARPA) in Piemonte Region (Northern Italy) are presented and discussed. A methodology is proposed for the interpretation of the PSInSAR™ data at the regional scale, easy to use by the public administrations and by civil protection authorities. Potential and limitations of the PSInSAR™ technique for ground movement detection on a regional scale and monitoring are then estimated in relationship with different geological processes and various geological environments. PMID:27873940
NASA Astrophysics Data System (ADS)
Holden, Eun-Jung; Wong, Jason C.; Wedge, Daniel; Martis, Michael; Lindsay, Mark; Gessner, Klaus
2016-02-01
Geological structures are recognisable as discontinuities within magnetic geophysical surveys, typically as linear features. However, their interpretation is a challenging task in a dataset with abundant complex geophysical signatures representing subsurface geology, leading to significant variations in interpretation outcomes amongst, and within, individual interpreters. Previously, numerous computational methods were developed to enhance and delineate lineaments as indicators for geological structures. While these methods provide rapid and objective analysis, selection and geological classification of the detected lineaments for structure mapping is in the hands of interpreters through a time consuming process. This paper presents new ways of assisting magnetic data interpretation, with a specific aim to improve the confidence of structural interpretation through feature evidence provided by automated lineament detection. The proposed methods produce quantitative measures of feature evidence on interpreted structures and interactive visualisation to quickly assess and modify structural mapping. Automated lineament detection algorithms find the feature strengths of ridges, valleys and edges within data by analysing their local frequencies. Ridges and valleys are positive and negative line-like features detected by the phase symmetry algorithm which finds locations where local frequency components are at their extremum, the most symmetric point in their cycle. Edge features are detected by the phase congruency algorithm which finds locations where local frequency components are in phase. Their outputs are used as feature evidence through interactive visualisation to drive data evidenced interpretation. Our experiment uses magnetic data and structural interpretation from the west Kimberley region in northern Western Australia to demonstrate the use of automated analysis outputs to provide: quantitative measures of data evidence on interpreted structures, and
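Phase congruency measures how closely the Fourier components of a signal agree in phase at each location: the local energy |Σ A_n e^{iφ_n}| divided by the summed amplitudes Σ A_n, which equals 1 where all components are in phase. The sketch below is a deliberately simplified 1-D, global-Fourier version of that definition. Practical detectors use local (e.g. log-Gabor) filter banks, noise compensation and 2-D orientation handling, and the phase-symmetry detector is not shown.

```python
import numpy as np

def phase_congruency_1d(signal):
    """Global-Fourier phase congruency |sum_k X_k e^{i w_k x}| / sum_k |X_k|.

    Uses positive-frequency DFT components only (DC and Nyquist removed);
    the value approaches 1 where all components share the same phase.
    """
    f = np.asarray(signal, dtype=float)
    n = f.size
    X = np.fft.fft(f)
    k = np.arange(1, n // 2)                       # positive frequencies
    x = np.arange(n)
    # complex contribution of each frequency evaluated at every sample position
    comps = X[k][None, :] * np.exp(2j * np.pi * np.outer(x, k) / n)
    local_energy = np.abs(comps.sum(axis=1))
    total_amplitude = np.abs(X[k]).sum() + 1e-12   # guard against division by zero
    return local_energy / total_amplitude

# A line-like feature at x = 0: cosines with decreasing amplitude, all in phase there
N = 256
x = np.arange(N)
f = sum(np.cos(2 * np.pi * k * x / N) / k for k in range(1, 6))
pc = phase_congruency_1d(f)
```

The result is high at the feature and lower away from it, independent of the feature's absolute amplitude.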
Methods of collecting and interpreting ground-water data
Bentall, Ray
1963-01-01
Because ground water is hidden from view, ancient man could only theorize as to its sources of replenishment and its behavior. His theories held sway until the latter part of the 17th century, which marked the first experimental work to determine the source and movement of ground water. Thus founded, the science of ground-water hydrology grew slowly, and not until the 19th century is there substantial evidence of conclusions having been based on observational data. The 20th century has witnessed tremendous advances in the science: in the methods of field investigation and interpretation of collected data, in the methods of determining the hydrologic characteristics of water-bearing material, and in the methods of inventorying ground-water supplies. Now, as is true of many other disciplines, the science of ground-water hydrology is characterized by frequent advancement of new ideas and techniques, refinement of old techniques, and an increasing wealth of data awaiting interpretation. So that its widely scattered staff of professional hydrologists could keep abreast of new ideas and advances in the techniques of ground-water investigation, it has been the practice in the U.S. Geological Survey to distribute such information for immediate internal use. As the methods become better established and developed, they are described in formal publications. Six papers pertaining to widely different phases of ground-water investigation comprise this particular contribution. For the sake of clarity and conformity, the original papers have been revised and edited by the compiler.
Preliminary Interpretation of the MSL REMS Pressure Data
NASA Astrophysics Data System (ADS)
Haberle, Robert; Gómez-Elvira, Javier; de la Torre Juárez, Manuel; Harri, Ari-Matti; Hollingsworth, Jeffery; Kahanpää, Henrik; Kahre, Melinda; Martin-Torres, Javier; Mischna, Michael; Newman, Claire; Rafkin, Scot; Rennó, Nilton; Richardson, Mark; Rodríguez-Manfredi, Jose; Vasavada, Ashwin; Zorzano, Maria-Paz; REMS/MSL Science Teams
2013-04-01
The Rover Environmental Monitoring Station (REMS) on the Mars Science Laboratory (MSL) Curiosity rover consists of a suite of meteorological instruments that measure pressure, temperature (air and ground), wind (speed and direction), relative humidity, and the UV flux. A detailed description of the REMS sensors and their performance can be found in Gómez-Elvira et al. [2012, Space Science Reviews, 170(1-4), 583-640]. Here we focus on interpreting the first 100 sols of REMS operations with a particular emphasis on the pressure data. A unique feature of pressure data is that they reveal information on meteorological phenomena with time scales from seconds to years and spatial scales from local to global. From a single station we can learn about dust devils, regional circulations, thermal tides, synoptic weather systems, the CO2 cycle, dust storms, and interannual variability. Thus far MSL's REMS pressure sensor, provided by the Finnish Meteorological Institute and integrated into the REMS payload by Centro de Astrobiología, is performing flawlessly and our preliminary interpretation of its data includes the discovery of relatively dust-free convective vortices; a regional circulation system significantly modified by Gale crater and its central mound; the strongest thermal tides yet measured from the surface of Mars whose amplitudes and phases are very sensitive to fluctuations in global dust loading; and the classical signature of the seasonal cycling of carbon dioxide into and out of the polar caps.
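Thermal-tide amplitudes and phases are commonly obtained by a least-squares fit of a mean plus diurnal and semidiurnal harmonics to the pressure record, p(t) = p0 + Σ A_n cos(2πnt − φ_n) with t in sols. The sketch assumes that model and uses synthetic, hypothetical values; it is not the REMS team's processing chain.

```python
import numpy as np

def fit_tides(t_sol, pressure, n_harmonics=2):
    """Least-squares fit of p(t) = p0 + sum_n A_n cos(2 pi n t - phi_n).

    t_sol: time in sols. Returns p0, amplitudes A_n and phases phi_n (radians)
    for n = 1 (diurnal) up to n_harmonics (n = 2 is semidiurnal).
    """
    t = np.asarray(t_sol, dtype=float)
    cols = [np.ones_like(t)]
    for n in range(1, n_harmonics + 1):
        cols += [np.cos(2 * np.pi * n * t), np.sin(2 * np.pi * n * t)]
    G = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(G, pressure, rcond=None)
    c, s = coef[1::2], coef[2::2]          # A cos(phi), A sin(phi)
    return coef[0], np.hypot(c, s), np.arctan2(s, c)

# Synthetic record: 3 sols sampled 96 times per sol (values in Pa are hypothetical)
t = np.arange(0.0, 3.0, 1.0 / 96.0)
p = 750.0 + 40.0 * np.cos(2 * np.pi * t - 1.0) + 15.0 * np.cos(4 * np.pi * t - 0.5)
p0, amps, phases = fit_tides(t, p)
```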
The Importance of Statistical Modeling in Data Analysis and Inference
ERIC Educational Resources Information Center
Rollins, Derrick, Sr.
2017-01-01
Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
Guidelines for Statistical Analysis of Percentage of Syllables Stuttered Data
ERIC Educational Resources Information Center
Jones, Mark; Onslow, Mark; Packman, Ann; Gebski, Val
2006-01-01
Purpose: The purpose of this study was to develop guidelines for the statistical analysis of percentage of syllables stuttered (%SS) data in stuttering research. Method: Data on %SS from various independent sources were used to develop a statistical model to describe this type of data. On the basis of this model, %SS data were simulated with…
Internet Data Analysis for the Undergraduate Statistics Curriculum
ERIC Educational Resources Information Center
Sanchez, Juana; He, Yan
2005-01-01
Statistics textbooks for undergraduates have not caught up with the enormous amount of analysis of Internet data that is taking place these days. Case studies that use Web server log data or Internet network traffic data are rare in undergraduate Statistics education. And yet these data provide numerous examples of skewed and bimodal…
Autonomous image data reduction by analysis and interpretation
NASA Technical Reports Server (NTRS)
Eberlein, Susan; Yates, Gigi; Ritter, Niles
1988-01-01
Image data is a critical component of the scientific information acquired by space missions. Compression of image data is required due to the limited bandwidth of the data transmission channel and limited memory space on the acquisition vehicle. This need becomes more pressing when dealing with multispectral data where each pixel may comprise 300 or more bytes. An autonomous, real-time, onboard image analysis system for an exploratory vehicle such as a Mars Rover is developed. The completed system will be capable of interpreting image data to produce reduced representations of the image, and of making decisions regarding the importance of data based on current scientific goals. Data from multiple sources, including stereo images, color images, and multispectral data, are fused into single image representations. Analysis techniques emphasize artificial neural networks. Clusters are described by their outlines and class values. These analysis and compression techniques are coupled with decision-making capacity for determining the importance of each image region. Areas determined to be noise or uninteresting can be discarded in favor of more important areas. Thus limited resources for data storage and transmission are allocated to the most significant images.
Reliability of travel time data computed from interpreted migrated events
NASA Astrophysics Data System (ADS)
Jannaud, L. R.
1995-02-01
In the Sequential Migration Aided Reflection Tomography (SMART) method, travel times used by reflection tomography are computed by tracing rays which propagate with the migration velocity and reflect from reflectors picked on migrated images. Because of limits of migration resolution, this picking involves inaccuracies, to which computed travel times are unfortunately very sensitive. The objective of this paper is to predict a priori the confidence we can have in emergence data, i.e., emergence point location and travel time, from the statistical information that describes the uncertainties of the reflectors. (These reflectors can be obtained by picking on migrated images as explained above or by any other method). The proposed method relies on a linearization of each step of the ray computation, allowing one to deduce, from the statistical properties of reflector fluctuations, the statistical properties of ray-tracing outputs. The computed confidences and correlations give access to a more realistic analysis of emergence data. Moreover, they can be used as inputs for reflection tomography to compute models that match travel times according to the confidence we have in the reflector. Applications on real data show that the uncertainties are generally large and, what is much more interesting, strongly varying from one ray to another. Taking them into account is therefore very important for both a better understanding of the kinematic information in the data and the computation of a model that matches these travel times.
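The linearization idea above amounts to first-order error propagation: if outputs y = f(x) depend on uncertain inputs x with covariance Cx, then Cy ≈ J Cx Jᵀ, where J is the Jacobian of f at the nominal inputs. The sketch assumes a generic function `func` (a hypothetical stand-in for the ray-tracing step, which is not shown) and a central-difference Jacobian.

```python
import numpy as np

def propagate_covariance(func, x0, cov_x, eps=1e-6):
    """Linearized propagation Cov_y ~= J Cov_x J^T.

    func maps an input vector (e.g. reflector parameters) to outputs
    (e.g. emergence point and travel time); J is estimated by central
    finite differences at x0.
    """
    x0 = np.asarray(x0, dtype=float)
    y0 = np.atleast_1d(func(x0))
    J = np.empty((y0.size, x0.size))
    for j in range(x0.size):
        dx = np.zeros_like(x0)
        dx[j] = eps
        J[:, j] = (np.atleast_1d(func(x0 + dx)) - np.atleast_1d(func(x0 - dx))) / (2 * eps)
    return J @ np.asarray(cov_x, dtype=float) @ J.T

# For a linear map the linearization is exact: Cov_y = A Cov_x A^T
A = np.array([[1.0, 2.0], [0.0, 3.0], [1.0, -1.0]])
cov_x = np.array([[0.1, 0.02], [0.02, 0.2]])
cov_y = propagate_covariance(lambda x: A @ x, [1.0, 2.0], cov_x)
```

The diagonal of `cov_y` gives per-output variances; the off-diagonal terms are the correlations the abstract mentions as inputs to tomography.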
Efficient statistical mapping of avian count data
Royle, J. Andrew; Wikle, C.K.
2005-01-01
We develop a spatial modeling framework for count data that is efficient to implement in high-dimensional prediction problems. We consider spectral parameterizations for the spatially varying mean of a Poisson model. The spectral parameterization of the spatial process is very computationally efficient, enabling effective estimation and prediction in large problems using Markov chain Monte Carlo techniques. We apply this model to creating avian relative abundance maps from North American Breeding Bird Survey (BBS) data. Variation in the ability of observers to count birds is modeled as spatially independent noise, resulting in over-dispersion relative to the Poisson assumption. This approach represents an improvement over existing approaches used for spatial modeling of BBS data which are either inefficient for continental scale modeling and prediction or fail to accommodate important distributional features of count data thus leading to inaccurate accounting of prediction uncertainty.
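The data model described above can be simulated to see why observer noise produces over-dispersion. The sketch assumes a 1-D cosine basis as a stand-in for the spectral parameterization (the paper works in two dimensions and estimates the model by MCMC, which is not shown); all parameter values are hypothetical.

```python
import numpy as np

def simulate_counts(s, beta0, alphas, sigma_obs, rng):
    """Poisson counts with a spectral (cosine-basis) log-mean plus observer noise.

    log lambda(s) = beta0 + sum_k alpha_k cos(pi k s) + eps,  eps ~ N(0, sigma_obs^2)
    Returns the counts and the noise-free spatial field eta(s).
    """
    k = np.arange(1, len(alphas) + 1)
    eta = beta0 + np.cos(np.pi * np.outer(s, k)) @ np.asarray(alphas)
    eps = rng.normal(0.0, sigma_obs, size=len(s))
    return rng.poisson(np.exp(eta + eps)), eta

rng = np.random.default_rng(1)
s = rng.uniform(0.0, 1.0, size=200_000)
sigma = 0.5
counts, eta = simulate_counts(s, np.log(10.0), [0.3, -0.2], sigma, rng)
mu = np.exp(eta + 0.5 * sigma**2)               # mean given eta, averaged over observer noise
dispersion = np.mean((counts - mu) ** 2 / mu)   # about 1 for pure Poisson, larger here
```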
Interpretation methodology and analysis of in-flight lightning data
NASA Technical Reports Server (NTRS)
Rudolph, T.; Perala, R. A.
1982-01-01
A methodology is presented whereby electromagnetic measurements of in-flight lightning stroke data can be understood and extended to other aircraft. Recent measurements made on the NASA F106B aircraft indicate that sophisticated numerical techniques and new developments in corona modeling are required to fully understand the data. Thus the problem is nontrivial, and successful interpretation can lead to a significant understanding of the lightning/aircraft interaction event. This is of particular importance because of the problem of lightning-induced transient upset of new-technology, low-level microcircuitry, which is being used in increasing quantities in modern and future avionics. In-flight lightning data are analyzed and lightning environments incident upon the F106B are determined.
Flexibility in data interpretation: effects of representational format
Braithwaite, David W.; Goldstone, Robert L.
2013-01-01
Graphs and tables differentially support performance on specific tasks. For tasks requiring reading off single data points, tables are as good as or better than graphs, while for tasks involving relationships among data points, graphs often yield better performance. However, the degree to which graphs and tables support flexibility across a range of tasks is not well understood. In two experiments, participants detected main and interaction effects in line graphs and tables of bivariate data. Graphs led to more efficient performance, but also lower flexibility, as indicated by a larger discrepancy in performance across tasks. In particular, detection of main effects of variables represented in the graph legend was facilitated relative to detection of main effects of variables represented in the x-axis. Graphs may be a preferable representational format when the desired task or analytical perspective is known in advance, but may also induce greater interpretive bias than tables, necessitating greater care in their use and design. PMID:24427145
NASA Astrophysics Data System (ADS)
Mani, Peter; Heuer, Markus; Hofmann, Beda A.; Milliken, Kitty L.; West, Julia M.
This paper evaluates a mathematical model of bio-signature search processes on Mars samples returned to Earth and studied inside a Mars Sample Return Facility (MSRF). A simple porosity model for a returned Mars sample, based on initial observations on Mars meteorites, has been stochastically simulated and the data analysed in a computer study. The resulting false positive, true negative and false negative values, as a typical output of the simulations, were statistically analysed. The results were used in Bayesian statistics to correct the a priori probability of the presence of a bio-signature, and the resulting posterior probability was used in turn to improve the initial assumption of the value of extra-terrestrial presence for life forms in Mars material. Such an iterative algorithm can lead to a better estimate of the positive predictive value for life on Mars; therefore, together with Poisson statistics for a null result, it should be possible to bound the probability for the presence of extra-terrestrial bio-signatures to an upper level.
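The two statistical ingredients named above can be sketched directly: a Bayes update of the prior probability of a bio-signature after a positive or negative test, given assumed sensitivity and false-positive rate, and the Poisson upper limit on the expected number of events when none is observed, −ln(1 − CL). The rates below are hypothetical, not outputs of the paper's porosity simulations.

```python
import math

def update_prior(prior, sensitivity, false_positive_rate, positive=True):
    """One Bayes update of P(bio-signature present) after a single test result."""
    if positive:
        num = sensitivity * prior
        den = num + false_positive_rate * (1.0 - prior)
    else:
        num = (1.0 - sensitivity) * prior
        den = num + (1.0 - false_positive_rate) * (1.0 - prior)
    return num / den

def poisson_upper_limit_zero(confidence=0.95):
    """Upper limit on a Poisson mean when zero events are observed: -ln(1 - CL)."""
    return -math.log(1.0 - confidence)

# Iterating: the posterior after one positive result becomes the prior for the next
p1 = update_prior(0.5, sensitivity=0.9, false_positive_rate=0.1)
p2 = update_prior(p1, sensitivity=0.9, false_positive_rate=0.1)
```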
Statistical analysis of life history calendar data.
Eerola, Mervi; Helske, Satu
2016-04-01
The life history calendar is a data-collection tool for obtaining reliable retrospective data about life events. To illustrate the analysis of such data, we compare the model-based probabilistic event history analysis and the model-free data mining method, sequence analysis. In event history analysis, instead of transition hazards, we estimate the cumulative prediction probabilities of life events in the entire trajectory. In sequence analysis, we compare several dissimilarity metrics and contrast data-driven and user-defined substitution costs. As an example, we study young adults' transition to adulthood as a sequence of events in three life domains. The events define the multistate event history model and the parallel life domains in multidimensional sequence analysis. The relationship between life trajectories and excess depressive symptoms in middle age is further studied by their joint prediction in the multistate model and by regressing the symptom scores on individual-specific cluster indices. The two approaches complement each other in life course analysis; sequence analysis can effectively find typical and atypical life patterns while event history analysis is needed for causal inquiries.
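A widely used dissimilarity in sequence analysis is optimal matching: the minimum total cost of insertions, deletions and substitutions that turns one state sequence into another. The sketch uses a user-defined substitution-cost function; the state labels (S = in school, W = working, P = parent) and the cost values are hypothetical, and data-driven costs (e.g. derived from transition rates) would simply be a different `sub_cost`.

```python
def optimal_matching(seq_a, seq_b, sub_cost, indel=1.0):
    """Optimal matching (edit) distance between two state sequences.

    sub_cost(x, y) gives the substitution cost between states; indel is the
    cost of inserting or deleting one element.
    """
    n, m = len(seq_a), len(seq_b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel
    for j in range(1, m + 1):
        D[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j] + indel,                                      # deletion
                D[i][j - 1] + indel,                                      # insertion
                D[i - 1][j - 1] + sub_cost(seq_a[i - 1], seq_b[j - 1]),  # substitution/match
            )
    return D[n][m]

def constant_cost(x, y):
    """User-defined cost: 0 for identical states, 2 otherwise."""
    return 0.0 if x == y else 2.0

a = ["S", "S", "W", "W"]
b = ["S", "W", "W", "P"]
dist = optimal_matching(a, b, constant_cost)
```

Here shifting the sequence (one deletion plus one insertion, cost 2) is cheaper than two substitutions (cost 4).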
Statistical Considerations of Data Processing in Giovanni Online Tool
NASA Technical Reports Server (NTRS)
Suhung, Shen; Leptoukh, G.; Acker, J.; Berrick, S.
2005-01-01
The GES DISC Interactive Online Visualization and Analysis Infrastructure (Giovanni) is a web-based interface for the rapid visualization and analysis of gridded data from a number of remote sensing instruments. The GES DISC currently employs several Giovanni instances to analyze various products, such as Ocean-Giovanni for ocean products from SeaWiFS and MODIS-Aqua; TOMS & OMI Giovanni for atmospheric chemical trace gases from TOMS and OMI; and MOVAS for aerosols from MODIS, etc. (http://giovanni.gsfc.nasa.gov). Foremost among the Giovanni statistical functions is data averaging. Two aspects of this function are addressed here. The first deals with the accuracy of averaging gridded mapped products vs. averaging from the ungridded Level 2 data. Some mapped products contain mean values only; others contain additional statistics, such as the number of pixels (NP) for each grid cell, standard deviation, etc. Since NP varies spatially and temporally, averaging with or without weighting by NP will give different results. In this paper, we address differences among various weighting algorithms for some datasets utilized in Giovanni. The second aspect is related to how different averaging methods affect data quality and interpretation for data with non-normal distributions. The present study demonstrates results of different spatial averaging methods using gridded SeaWiFS Level 3 mapped monthly chlorophyll a data. Spatial averages were calculated using three different methods: arithmetic mean (AVG), geometric mean (GEO), and maximum likelihood estimator (MLE). Biogeochemical data, such as chlorophyll a, are usually considered to have a log-normal distribution. The study determined that differences between methods tend to increase with increasing size of a selected coastal area, with no significant differences in most open oceans. The GEO method consistently produces values lower than AVG and MLE. The AVG method produces values larger than MLE in some cases, but smaller in other cases. Further
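The three averaging methods can be sketched for positive data with optional weights (for example NP per grid cell). The "MLE" here assumes log-normal data and plugs the maximum-likelihood estimates of the log mean and log variance into exp(μ + σ²/2); Giovanni's exact implementation may differ, and the values are hypothetical.

```python
import math

def spatial_means(values, weights=None):
    """Arithmetic (AVG), geometric (GEO) and log-normal MLE means of positive data.

    weights (e.g. number of pixels, NP, per grid cell) are optional.
    The MLE form assumes log-normality: exp(mu + sigma^2 / 2).
    """
    if weights is None:
        weights = [1.0] * len(values)
    w_sum = sum(weights)
    avg = sum(w * v for w, v in zip(weights, values)) / w_sum
    logs = [math.log(v) for v in values]
    mu = sum(w * lv for w, lv in zip(weights, logs)) / w_sum
    var = sum(w * (lv - mu) ** 2 for w, lv in zip(weights, logs)) / w_sum
    return {"AVG": avg, "GEO": math.exp(mu), "MLE": math.exp(mu + 0.5 * var)}

chl = [0.1, 1.0, 10.0]                       # hypothetical chlorophyll a values, mg m^-3
unweighted = spatial_means(chl)
weighted = spatial_means(chl, weights=[1, 1, 2])
```

Because the geometric mean never exceeds the arithmetic mean, GEO is always the smallest of AVG and GEO, matching the pattern reported above; NP weighting shifts all three.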
Statistical inference for serial dilution assay data.
Lee, M L; Whitmore, G A
1999-12-01
Serial dilution assays are widely employed for estimating substance concentrations and minimum inhibitory concentrations. The Poisson-Bernoulli model for such assays is appropriate for count data but not for continuous measurements that are encountered in applications involving substance concentrations. This paper presents practical inference methods based on a log-normal model and illustrates these methods using a case application involving bacterial toxins.
The Knowledge Content of Statistical Data.
ERIC Educational Resources Information Center
Preuss, Lucien; Vorkauf, Helmut
1997-01-01
An information-theoretic framework is used to analyze the knowledge content in multivariate cross-classified data. Proposes measures based on the information concept, including the knowledge content of a cross classification, its terseness, and the separability of one variable. Presents applications for situations when classical analysis is…
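The specific measures proposed in the article (knowledge content, terseness, separability) are not given in this abstract. As a hedged illustration of the underlying information concept only, the standard mutual information of a two-way cross-classification can be computed from its contingency table as I(X;Y) = H(X) + H(Y) − H(X,Y).

```python
import math

def entropy(probs):
    """Shannon entropy in nats, skipping empty cells."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mutual_information(table):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a two-way table of counts."""
    total = sum(sum(row) for row in table)
    joint = [c / total for row in table for c in row]
    rows = [sum(row) / total for row in table]
    cols = [sum(col) / total for col in zip(*table)]
    return entropy(rows) + entropy(cols) - entropy(joint)

independent = mutual_information([[10, 10], [10, 10]])   # no association
dependent = mutual_information([[5, 0], [0, 5]])         # one variable determines the other
```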
Statistical Analysis of Japanese Structural Damage Data
1977-01-01
... Calculated Peak Overpressure Data
Figure 4. Frequency Functions for Assumed Damage Laws
Figure 5. Conversions to Value of ...
... Buildings
Figure 20. Effect of Damage Law on Confidence Regions
Figure 21. Comparison of Confidence Limits on Value of ad (Cumulative Log Normal Damage Law)
Figure 22. Comparison of Confidence Limits on Value of ad (Cumulative Log ...
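The figure titles refer to a cumulative log-normal damage law. In its common generic form, the probability of damage at peak overpressure p is Φ(ln(p / p50) / β), with median p50 and log-standard deviation β. The report's own parameterization (including the quantity it calls ad) is not recoverable from this excerpt, so `median` and `beta` below are hypothetical names and values.

```python
import math

def lognormal_damage_probability(overpressure, median, beta):
    """Cumulative log-normal damage law: P(damage) = Phi(ln(p / median) / beta)."""
    z = math.log(overpressure / median) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF

# Hypothetical law with median 5 psi and beta 0.3
p_at_median = lognormal_damage_probability(5.0, median=5.0, beta=0.3)
p_one_sigma = lognormal_damage_probability(5.0 * math.exp(0.3), median=5.0, beta=0.3)
```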
... People with Blood Clots at Risk of Permanent Work-Related Disability CDC collaborated on a study of individuals who had participated in two previous ... VTE subsequently received a disability pension due to work-related disability. (Published ... Study Findings Multiple data sources needed for accurate reporting ...
Improved interpretation of satellite altimeter data using genetic algorithms
NASA Technical Reports Server (NTRS)
Messa, Kenneth; Lybanon, Matthew
1992-01-01
Genetic algorithms (GA) are optimization techniques that are based on the mechanics of evolution and natural selection. They take advantage of the power of cumulative selection, in which successive incremental improvements in a solution structure become the basis for continued development. A GA is an iterative procedure that maintains a 'population' of 'organisms' (candidate solutions). Through successive 'generations' (iterations) the population as a whole improves in simulation of Darwin's 'survival of the fittest'. GAs have been shown to be successful where noise significantly reduces the ability of other search techniques to work effectively. Satellite altimetry provides useful information about oceanographic phenomena. It provides rapid global coverage of the oceans and is not as severely hampered by cloud cover as infrared imagery. Despite these and other benefits, several factors lead to significant difficulty in interpretation. The GA approach to the improved interpretation of satellite data involves the representation of the ocean surface model as a string of parameters or coefficients from the model. The GA searches, in parallel, a population of such representations (organisms) to obtain the individual that is best suited to 'survive', that is, the fittest as measured with respect to some 'fitness' function. The fittest organism is the one that best represents the ocean surface model with respect to the altimeter data.
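The loop described above (population, fitness, selection, crossover, mutation over generations) can be sketched on a toy two-parameter model y = a sin(x) + b standing in for the ocean surface model; the model, data and GA settings are hypothetical, and the specific operators (truncation selection, blend crossover, Gaussian mutation, elitism) are one common choice rather than the paper's.

```python
import math
import random

def fitness(params, xs, ys):
    """Negative sum of squared errors of the toy model y = a sin(x) + b."""
    a, b = params
    return -sum((a * math.sin(x) + b - y) ** 2 for x, y in zip(xs, ys))

def genetic_fit(xs, ys, pop_size=60, generations=200, mut_sigma=0.1, seed=0):
    """Minimal real-valued GA returning the fittest (a, b) organism."""
    rng = random.Random(seed)
    pop = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, xs, ys), reverse=True)   # fittest first
        parents = pop[: pop_size // 2]                             # truncation selection
        children = [pop[0]]                                        # elitism: keep the best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()                                       # blend crossover
            child = tuple(w * u + (1 - w) * v for u, v in zip(p1, p2))
            child = tuple(c + rng.gauss(0.0, mut_sigma) for c in child)   # mutation
            children.append(child)
        pop = children
    return max(pop, key=lambda p: fitness(p, xs, ys))

xs = [0.1 * i for i in range(63)]
ys = [2.0 * math.sin(x) + 0.5 for x in xs]
best = genetic_fit(xs, ys)
```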
Interpretation of MINOS data in terms of nonstandard neutrino interactions
NASA Astrophysics Data System (ADS)
Kopp, Joachim; Machado, Pedro A. N.; Parke, Stephen J.
2010-12-01
The MINOS experiment at Fermilab has recently reported a tension between the oscillation results for neutrinos and antineutrinos. We show that this tension, if it persists, can be understood in the framework of nonstandard neutrino interactions (NSI). While neutral current NSI (nonstandard matter effects) are disfavored by atmospheric neutrinos, a new charged current coupling between tau neutrinos and nucleons can fit the MINOS data without violating other constraints. In particular, we show that loop-level contributions to flavor-violating τ decays are sufficiently suppressed. However, conflicts with existing bounds could arise once the effective theory considered here is embedded into a complete renormalizable model. We predict the future sensitivity of the T2K and NOνA experiments to the NSI parameter region favored by the MINOS fit, and show that both experiments are excellent tools to test the NSI interpretation of the MINOS data.
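For orientation, the baseline against which nonstandard interactions are tested is standard oscillation. In the two-flavor vacuum approximation the muon-neutrino survival probability is P = 1 − sin²(2θ) sin²(1.267 Δm² L / E), with Δm² in eV², L in km and E in GeV. The sketch implements only that standard formula (no NSI or matter effects); the 735 km baseline and Δm² value are illustrative.

```python
import math

def survival_probability(sin2_2theta, dm2_ev2, length_km, energy_gev):
    """Two-flavor vacuum survival probability (no NSI, no matter effects).

    P = 1 - sin^2(2 theta) * sin^2(1.267 * dm^2 * L / E),
    with dm^2 in eV^2, L in km and E in GeV.
    """
    phase = 1.267 * dm2_ev2 * length_km / energy_gev
    return 1.0 - sin2_2theta * math.sin(phase) ** 2

# Energy of the first oscillation maximum for maximal mixing
dm2, baseline = 2.4e-3, 735.0
e_max = 1.267 * dm2 * baseline / (math.pi / 2)
p_min = survival_probability(1.0, dm2, baseline, e_max)
```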
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Dielectric Property Measurements to Support Interpretation of Cassini Radar Data
NASA Astrophysics Data System (ADS)
Jamieson, Corey; Barmatz, M.
2012-10-01
Radar observations are useful for constraining surface and near-surface compositions and illuminating geologic processes on Solar System bodies. The interpretation of Cassini radiometric and radar data at 13.78 GHz (2.2 cm) of Titan and other Saturnian icy satellites is aided by laboratory measurements of the dielectric properties of relevant materials. However, existing dielectric measurements of candidate surface materials at microwave frequencies and low temperatures are sparse. We have set up a microwave cavity and cryogenic system to measure the complex dielectric properties of liquid hydrocarbons relevant to Titan, specifically methane, ethane and their mixtures, to support the interpretation of spacecraft instrument and telescope radar observations. To perform these measurements, we excite and detect the TM020 mode in a custom-built cavity with small metal loop antennas powered by a Vector Network Analyzer. The hydrocarbon samples are condensed into a cylindrical quartz tube that is axially oriented in the cavity. Frequency sweeps through a resonance are performed with an empty cavity, with an empty quartz tube inserted into the cavity, and with a sample-filled quartz tube in the cavity. These sweeps are fit by a Lorentzian line shape, from which we obtain the resonant frequency, f, and quality factor, Q, for each experimental arrangement. We then derive dielectric constants and loss tangents for our samples near 13.78 GHz using a new technique ideally suited for measuring liquid samples. We will present temperature-dependent dielectric property measurements for liquid methane and ethane. The full interpretation of the radar and radiometry observations of Saturn’s icy satellites depends critically on understanding the dielectric properties of potential surface materials. By investigating relevant liquids and solids we will improve constraints on lake depths, volumes and compositions, which are important to understand Titan’s carbon/organic cycle and inevitably
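The abstract obtains f and Q by fitting a Lorentzian line shape to each sweep. The sketch below uses a simpler stand-in, the half-power method, which is not the authors' fitting procedure. It reads f0 off the peak and computes the loaded quality factor Q = f0/Δf from the half-power bandwidth Δf. The synthetic 13.78 GHz sweep and the value Q = 5000 are made-up numbers for illustration. The dielectric derivation from the three cavity arrangements is not attempted.

```python
def lorentzian(f, f0, fwhm, peak=1.0):
    """Power response of a single resonance with centre f0 and full width fwhm."""
    return peak / (1.0 + (2.0 * (f - f0) / fwhm) ** 2)


def resonance_from_sweep(freqs, power):
    """Estimate (f0, Q) from a sampled sweep using the half-power points."""
    i_peak = max(range(len(power)), key=lambda i: power[i])
    half = power[i_peak] / 2.0

    def half_power_crossing(step):
        # Walk outward from the peak until the response falls below half power,
        # then interpolate linearly between the two bracketing samples.
        i = i_peak
        while 0 <= i + step < len(power):
            j = i + step
            if power[j] < half:
                return freqs[i] + (half - power[i]) * (freqs[j] - freqs[i]) / (power[j] - power[i])
            i = j
        raise ValueError("sweep does not span both half-power points")

    f_low = half_power_crossing(-1)
    f_high = half_power_crossing(+1)
    f0 = freqs[i_peak]
    return f0, f0 / (f_high - f_low)


# Hypothetical noiseless sweep: +/-10 MHz around 13.78 GHz in 10 kHz steps.
f_true, q_true = 13.78e9, 5000.0
fwhm_true = f_true / q_true
freqs = [f_true - 10e6 + k * 1e4 for k in range(2001)]
power = [lorentzian(f, f_true, fwhm_true) for f in freqs]
f0_est, q_est = resonance_from_sweep(freqs, power)
print(f"f0 = {f0_est / 1e9:.4f} GHz, Q = {q_est:.0f}")
```

With real, noisy sweeps a least-squares Lorentzian fit, as the abstract describes, uses all samples rather than two crossings. It is therefore generally more robust.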
Plausible inference and the interpretation of quantitative data
Nakhleh, C.W.
1998-02-01
The analysis of quantitative data is central to scientific investigation. Probability theory, which is founded on two rules, the sum and product rules, provides the unique, logically consistent method for drawing valid inferences from quantitative data. This primer on the use of probability theory is meant to fulfill a pedagogical purpose. The discussion begins at the foundation of scientific inference by showing how the sum and product rules of probability theory follow from some very basic considerations of logical consistency. The authors then develop general methods of probability theory that are essential to the analysis and interpretation of data. They discuss how to assign probability distributions using the principle of maximum entropy, how to estimate parameters from data, how to handle nuisance parameters whose values are of little interest, and how to determine which of a set of models is most justified by a data set. All these methods are used together in most realistic data analyses. Examples are given throughout to illustrate the basic points.
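For reference, the two rules and the consequences the primer builds on can be written in their standard textbook form. Here I denotes background information; the primer's own notation may differ. Bayes' theorem follows from applying the product rule in both orders to p(AB | I). Marginalization, used to remove a nuisance parameter φ, follows from combining the sum and product rules.

```latex
\begin{align}
  p(A \mid I) + p(\bar{A} \mid I) &= 1 && \text{(sum rule)}\\
  p(AB \mid I) &= p(A \mid BI)\, p(B \mid I) && \text{(product rule)}\\
  p(H \mid DI) &= \frac{p(D \mid HI)\, p(H \mid I)}{p(D \mid I)} && \text{(Bayes' theorem)}\\
  p(\theta \mid DI) &= \int p(\theta, \phi \mid DI)\, d\phi && \text{(marginalizing nuisance parameter }\phi\text{)}
\end{align}
```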
Statistical Machine Learning for Structured and High Dimensional Data
2014-09-17
AFRL-OSR-VA-TR-2014-0234. Statistical Machine Learning for Structured and High Dimensional Data. Larry Wasserman, Carnegie Mellon University. Final report, Dec 2009 to Aug 2014 (report date 14-06-2014). Keywords: machine learning, high-dimensional statistics. Contact: John Lafferty. …area of resource-constrained statistical estimation. …Research under
Statistical approach for evaluation of contraceptive data.
Tripathi, Vriyesh
2008-04-01
This article describes how best to analyse data collected from a longitudinal follow-up on contraceptive use and discontinuation, with special consideration of the needs of developing countries. Accessibility and acceptability of contraceptives at the ground level remain low, and this is an overlooked area of research. The author presents a set of propositions that are closer in spirit to practical recommendations than to formal theorems. We comment specifically on model validation through bootstrapping techniques. The paper presents a multivariate model to assess the rate of discontinuation of contraception, while accounting for the possibility that some factors influence both a couple's choice of provider and their probability of discontinuation.
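The abstract mentions bootstrapping for model validation without describing the procedure. As a generic illustration only, the sketch below resamples hypothetical follow-up records with replacement to put a percentile confidence interval on a discontinuation rate. The records, the 12-month window and the function name `bootstrap_percentile_ci` are assumptions. Validating the multivariate model itself would instead refit the model on each resample.

```python
import random
import statistics


def bootstrap_percentile_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic` computed on `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate recomputes the statistic on a resample drawn with replacement.
    replicates = sorted(statistic([data[rng.randrange(n)] for _ in range(n)])
                        for _ in range(n_boot))
    lower = replicates[int((alpha / 2) * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical follow-up records: 1 = discontinued within 12 months, 0 = continued.
outcomes = [1] * 62 + [0] * 138
low, high = bootstrap_percentile_ci(outcomes, statistics.fmean)
print(f"discontinuation rate {statistics.fmean(outcomes):.3f}, 95% CI ({low:.3f}, {high:.3f})")
```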
ERIC Educational Resources Information Center
McArthur, David; Chou, Chih-Ping
Diagnostic testing confronts several challenges at once, among which are issues of test interpretation and immediate modification of the test itself in response to the interpretation. Several methods are available for administering and evaluating a test in real-time, towards optimizing the examiner's chances of isolating a persistent pattern of…
ERIC Educational Resources Information Center
Boysen, Guy A.
2015-01-01
Student evaluations of teaching are among the most accepted and important indicators of college teachers' performance. However, faculty and administrators can overinterpret small variations in mean teaching evaluations. The current research examined the effect of including statistical information on the interpretation of teaching evaluations.…
The interpretation of time-varying data with DIAMON-1.
Steimann, F
1996-08-01
Applying the methods of Artificial Intelligence to clinical monitoring requires some kind of signal-to-symbol conversion as a prior step. Subsequent processing of the derived symbolic information must also be sensitive to history and development, as the failure to address temporal relationships between findings invariably leads to inferior results. DIAMON-1, a framework for the design of diagnostic monitors, provides two methods for the interpretation of time-varying data: one for the detection of trends based on classes of courses, and one for the tracking of disease histories modelled through deterministic automata. Both methods make use of fuzzy set theory taking account of the elasticity of medical categories and allowing discrete disease models to mirror the patient's continuous progression through the stages of illness.
Interpretation of AMS-02 electrons and positrons data
Di Mauro, M.; Donato, F.; Fornengo, N.; Vittino, A.; Lineros, R.
2014-04-01
We perform a combined analysis of the recent AMS-02 data on electrons, positrons, electrons plus positrons and positron fraction, in a self-consistent framework where we realize a theoretical modeling of all the astrophysical components that can contribute to the observed fluxes in the whole energy range. The primary electron contribution is modeled through the sum of an average flux from distant sources and the fluxes from the local supernova remnants in the Green catalog. The secondary electron and positron fluxes originate from interactions on the interstellar medium of primary cosmic rays, for which we derive a novel determination by using AMS-02 proton and helium data. Primary positrons and electrons from pulsar wind nebulae in the ATNF catalog are included and studied in terms of their most significant (though loosely known) properties and under different assumptions (average contribution from the whole catalog, single dominant pulsar, a few dominant pulsars). We obtain remarkable agreement between our various models and the AMS-02 data for all types of analysis, demonstrating that the full AMS-02 leptonic data set admits a self-consistent interpretation in terms of astrophysical contributions.
Interpretation of evidence in data by untrained medical students: a scenario-based study
2010-01-01
Background To determine which approach to assessment of evidence in data - statistical tests or likelihood ratios - comes closest to the interpretation of evidence by untrained medical students. Methods Empirical study of medical students (N = 842), untrained in statistical inference or in the interpretation of diagnostic tests. They were asked to interpret a hypothetical diagnostic test, presented in four versions that differed in the distributions of test scores in diseased and non-diseased populations. Each student received only one version. The intuitive application of the statistical test approach would lead to rejecting the null hypothesis of no disease in version A, and to accepting the null in version B. Application of the likelihood ratio approach led to opposite conclusions - against the disease in A, and in favour of disease in B. Version C tested the importance of the p-value (A: 0.04 versus C: 0.08) and version D the importance of the likelihood ratio (C: 1/4 versus D: 1/8). Results In version A, 7.5% concluded that the result was in favour of disease (compatible with p value), 43.6% ruled against the disease (compatible with likelihood ratio), and 48.9% were undecided. In version B, 69.0% were in favour of disease (compatible with likelihood ratio), 4.5% against (compatible with p value), and 26.5% undecided. Increasing the p value from 0.04 to 0.08 did not change the results. The change in the likelihood ratio from 1/4 to 1/8 increased the proportion of non-committed responses. Conclusions Most untrained medical students appear to interpret evidence from data in a manner that is compatible with the use of likelihood ratios. PMID:20796297
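The divergence between the two approaches in version A can be reproduced with a toy calculation. A score can be "significant" against the non-diseased distribution while still being more likely under no disease than under disease. The normal distributions and the observed score below are hypothetical choices, not the ones shown to the students.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def upper_tail_p(x, mean=0.0, sd=1.0):
    """One-sided p value P(X >= x) under a normal null distribution."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


# Hypothetical score distributions: non-diseased ~ N(0, 1), diseased ~ N(4, 1).
score = 1.75
p_value = upper_tail_p(score)  # tail area under "no disease" (the null)
likelihood_ratio = normal_pdf(score, 4.0, 1.0) / normal_pdf(score, 0.0, 1.0)
print(f"p = {p_value:.3f}, LR(disease : no disease) = {likelihood_ratio:.3f}")
```

Here p is about 0.04, so the test rejects "no disease". The likelihood ratio is about 0.37, so the same score favours no disease. This mirrors the opposite conclusions described for version A.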
Mantel-Haenszel test statistics for correlated binary data.
Zhang, J; Boos, D D
1997-12-01
This paper proposes two new Mantel-Haenszel test statistics for correlated binary data in 2 x 2 tables that are asymptotically valid in both sparse data (many strata) and large-strata limiting models. Monte Carlo experiments show that the statistics compare favorably to previously proposed test statistics, especially for 5-25 small to moderate-sized strata. Confidence intervals are also obtained and compared to those from the test of Liang (1985, Biometrika 72, 678-682).
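For context, the sketch below shows the classical Mantel-Haenszel chi-square and common odds ratio for K stratified 2 x 2 tables, which the paper adapts. It is not the authors' proposed statistic. It assumes independent observations within each table, and that assumption fails for correlated binary data. The tables in the usage line are made up.

```python
def mantel_haenszel(tables):
    """Classical Mantel-Haenszel chi-square (no continuity correction) and common
    odds ratio for K independent 2 x 2 tables, each given as (a, b, c, d)."""
    observed = expected = variance = 0.0
    or_num = or_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row0 = a + b, c + d
        col1, col0 = a + c, b + d
        observed += a
        expected += row1 * col1 / n                                # E[a] under the null
        variance += row1 * row0 * col1 * col0 / (n * n * (n - 1))  # hypergeometric variance
        or_num += a * d / n
        or_den += b * c / n
    chi2 = (observed - expected) ** 2 / variance
    return chi2, or_num / or_den


chi2, common_or = mantel_haenszel([(10, 5, 5, 10), (10, 5, 5, 10)])
print(f"MH chi-square = {chi2:.3f}, common OR = {common_or:.2f}")
```

When observations within a stratum are positively correlated, the hypergeometric variance above understates the true variance. The resulting test is then too liberal, which motivates corrected statistics such as those proposed in the paper.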
Component outage data analysis methods. Volume 2: Basic statistical methods
NASA Astrophysics Data System (ADS)
Marshall, J. A.; Mazumdar, M.; McCutchan, D. A.
1981-08-01
Statistical methods for analyzing outage data on major power system components such as generating units, transmission lines, and transformers are identified. The analysis methods produce outage statistics from component failure and repair data that help in understanding the failure causes and failure modes of various types of components. Methods for forecasting outage statistics for those components used in the evaluation of system reliability are emphasized.
Rapp, J.B.
1991-01-01
Q-mode factor analysis was used to quantitate the distribution of the major aliphatic hydrocarbon (n-alkanes, pristane, phytane) systems in sediments from a variety of marine environments. The compositions of the pure end members of the systems were obtained from factor scores and the distribution of the systems within each sample was obtained from factor loadings. All the data, from the diverse environments sampled (estuarine (San Francisco Bay), fresh-water (San Francisco Peninsula), polar-marine (Antarctica) and geothermal-marine (Gorda Ridge) sediments), were reduced to three major systems: a terrestrial system (mostly high molecular weight aliphatics with odd-numbered-carbon predominance), a mature system (mostly low molecular weight aliphatics without predominance) and a system containing mostly high molecular weight aliphatics with even-numbered-carbon predominance. With this statistical approach, it is possible to assign the percentage contribution from various sources to the observed distribution of aliphatic hydrocarbons in each sediment sample. © 1991.
Novice Interpretations of Visual Representations of Geosciences Data
NASA Astrophysics Data System (ADS)
Burkemper, L. K.; Arthurs, L.
2013-12-01
Past cognition research on individuals' perception and comprehension of bar and line graphs is substantial enough to have produced graph design principles and graph comprehension theories; however, gaps remain in our understanding of how people process visual representations of data, especially of geologic and atmospheric data. This pilot project builds on prior research and begins filling these gaps. The primary objectives of this pilot project are to: (i) design a novel data collection protocol based on a combination of paper-based surveys, think-aloud interviews, and eye-tracking tasks to investigate students' skills in handling simple to complex visual representations of geologic and atmospheric data, (ii) demonstrate that the protocol yields results that shed light on student data handling skills, and (iii) generate preliminary findings on which tentative but potentially helpful recommendations can be based, both for presenting these data more effectively to the non-scientist community and for teaching essential data handling skills. An effective protocol for the combined use of paper-based surveys, think-aloud interviews, and computer-based eye-tracking tasks for investigating the cognitive processes involved in perceiving, comprehending, and interpreting visual representations of geologic and atmospheric data is instrumental to future research in this area. The outcomes of this pilot study provide a foundation on which future, more in-depth and scaled-up investigations can build. Furthermore, the findings of this pilot project are sufficient for making at least tentative recommendations that can help inform (i) the design of physical attributes of visual representations of data, especially more complex representations, that may improve students' data handling skills and (ii) instructional approaches that have the potential to help students handle visual representations of geologic and atmospheric data more effectively.
Yu, Victoria; Kishan, Amar U.; Cao, Minsong; Low, Daniel; Lee, Percy; Ruan, Dan
2014-03-15
Purpose: To demonstrate a new method of evaluating the dose response of radiographic lung injury after stereotactic body radiotherapy (SBRT), and to report the discovery of bimodal dose behavior within clinically identified injury volumes. Methods: Follow-up CT scans at 3, 6, and 12 months were acquired from 24 patients treated with SBRT for stage-1 primary lung cancers or oligometastatic lesions. Injury regions in these scans were propagated to the planning CT coordinates by performing deformable registration of the follow-ups to the planning CTs. A bimodal behavior was repeatedly observed in the probability distribution of dose values within the deformed injury regions. Based on a Gaussian-mixture assumption, an Expectation-Maximization (EM) algorithm was used to obtain the characteristic parameters of this distribution. Geometric analysis was performed to interpret these parameters and infer the critical dose level that potentially induces post-SBRT lung injury. Results: The Gaussian mixture obtained from the EM algorithm closely and consistently approximates the empirical dose histogram within the injury volume. The average Kullback-Leibler divergence values between the empirical differential dose volume histogram and the EM-obtained Gaussian mixture distribution were 0.069, 0.063, and 0.092 for the 3, 6, and 12 month follow-up groups, respectively. The lower Gaussian component was located at approximately 70% of the prescription dose (35 Gy) for all three follow-up time points. The higher Gaussian component, contributed by the dose received by the planning target volume, was located at around 107% of the prescription dose. Geometrical analysis suggests the mean of the lower Gaussian component, located at 35 Gy, as a possible indicator of a critical dose that induces lung injury after SBRT. Conclusions: An innovative and improved method for analyzing the correspondence between lung radiographic injury and SBRT treatment dose has
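The mixture-fitting step can be sketched as a textbook EM loop for a two-component, one-dimensional Gaussian mixture. The doses below are synthetic, not patient data. They are generated around the reported 35 Gy mode and a mode near 107% of prescription. This assumes a 50 Gy prescription, since 35 Gy is stated to be about 70% of it. The quartile initialization and fixed iteration count are arbitrary choices.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(xs, iterations=200):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization."""
    n = len(xs)
    ordered = sorted(xs)
    # Start the two components at the lower and upper quartiles of the data.
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]
    sigma = [statistics.pstdev(xs)] * 2
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of component 1 for each dose value.
        resp1 = []
        for x in xs:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            resp1.append(p1 / (p0 + p1))
        # M-step: re-estimate weight, mean and s.d. of each component.
        for k, r in ((0, [1.0 - v for v in resp1]), (1, resp1)):
            nk = sum(r)
            weight[k] = nk / n
            mu[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            sigma[k] = math.sqrt(sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk)
    return weight, mu, sigma


# Synthetic voxel doses (Gy): a low-dose mode near 35 Gy and a high-dose mode near 53.5 Gy.
rng = random.Random(42)
doses = [rng.gauss(35.0, 3.0) for _ in range(400)] + [rng.gauss(53.5, 2.0) for _ in range(200)]
weight, mu, sigma = em_two_gaussians(doses)
print([round(m, 1) for m in mu], [round(w, 2) for w in weight])
```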
Using Data from Climate Science to Teach Introductory Statistics
ERIC Educational Resources Information Center
Witt, Gary
2013-01-01
This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…
INDIANS IN OKLAHOMA, SOCIAL AND ECONOMIC STATISTICAL DATA.
ERIC Educational Resources Information Center
HUNTER, BILL; TUCKER, TOM
STATISTICAL DATA ARE PRESENTED ON THE INDIAN POPULATION OF OKLAHOMA, ALONG WITH A BRIEF HISTORY OF SOME OF THE 67 INDIAN TRIBES FOUND IN THE STATE AND NARRATIVE SUMMARIES OF THE STATISTICAL DATA. MAPS OF CURRENT AND PAST INDIAN LANDS ARE SHOWN IN RELATION TO CURRENT COUNTY LINES. GRAPHS PORTRAY POPULATION COMPOSITION, RURAL AND URBAN POPULATION…
The Empirical Nature and Statistical Treatment of Missing Data
ERIC Educational Resources Information Center
Tannenbaum, Christyn E.
2009-01-01
Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…
Enhancements to the Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Hofmann, Martin O.
1993-01-01
The Engine Data Interpretation System (EDIS) expert system project assists the data review personnel at NASA/MSFC in performing post-test data analysis and engine diagnosis of the Space Shuttle Main Engine (SSME). EDIS uses knowledge of the engine, its components, and simple thermodynamic principles instead of, and in addition to, heuristic rules gathered from the engine experts. EDIS reasons in cooperation with human experts, following roughly the pattern of logic exhibited by human experts. EDIS concentrates on steady-state static faults, such as small leaks, and component degradations, such as pump efficiencies. The objective of this contract was to complete the set of engine component models, integrate heuristic rules into EDIS, integrate the Power Balance Model into EDIS, and investigate modification of the qualitative reasoning mechanisms to allow 'fuzzy' value classification. The result of this contract is an operational version of EDIS. EDIS will become a module of the Post-Test Diagnostic System (PTDS) and will, in this context, provide system-level diagnostic capabilities which integrate component-specific findings provided by other modules.
Experimental uncertainty estimation and statistics for data having interval uncertainty.
Kreinovich, Vladik (Applied Biomathematics, Setauket, New York); Oberkampf, William Louis (Applied Biomathematics, Setauket, New York); Ginzburg, Lev (Applied Biomathematics, Setauket, New York); Ferson, Scott (Applied Biomathematics, Setauket, New York); Hajagos, Janos (Applied Biomathematics, Setauket, New York)
2007-05-01
This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
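Two of the simplest interval statistics follow from monotonicity. The sample mean and the median never decrease when any single datum increases. Their bounds over all point values consistent with the intervals are therefore attained by taking every lower endpoint or every upper endpoint. The sketch below illustrates this with a made-up data set. Statistics such as the variance are not monotone in each datum, so they do not reduce to this endpoint shortcut.

```python
import statistics


def interval_mean(intervals):
    """Tightest bounds on the sample mean when each datum is only known to lie in [lo, hi]."""
    return (statistics.fmean(lo for lo, _ in intervals),
            statistics.fmean(hi for _, hi in intervals))


def interval_median(intervals):
    """Bounds on the sample median; the median is non-decreasing in every datum,
    so its extremes occur at the all-lower and all-upper endpoint configurations."""
    return (statistics.median(lo for lo, _ in intervals),
            statistics.median(hi for _, hi in intervals))


intervals = [(1.0, 2.0), (3.0, 5.0), (0.0, 4.0)]  # hypothetical interval data
print(interval_mean(intervals), interval_median(intervals))
```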
Rasch fit statistics and sample size considerations for polytomous data
Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael
2008-01-01
Background Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID:18510722
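Both fit statistics compare observed responses with Rasch-model expectations. For brevity, the sketch below uses the dichotomous Rasch model. The study itself applied the polytomous Rating and Partial Credit Models. Person abilities and the item difficulty are assumed to be already estimated. Outfit is the plain mean of squared standardized residuals. Infit weights each squared residual by its model variance. As a result, outfit reacts more strongly to unexpected responses from persons far from the item's difficulty. The standardized t transformations are not shown.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct (1) response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(thetas, difficulty, responses):
    """Infit and outfit mean squares for one dichotomous item."""
    sq_resid, variances, z_sq = [], [], []
    for theta, x in zip(thetas, responses):
        p = rasch_probability(theta, difficulty)
        w = p * (1.0 - p)                   # model variance of the response
        sq_resid.append((x - p) ** 2)
        variances.append(w)
        z_sq.append((x - p) ** 2 / w)       # squared standardized residual
    outfit = sum(z_sq) / len(z_sq)          # unweighted mean square
    infit = sum(sq_resid) / sum(variances)  # information-weighted mean square
    return infit, outfit


# Hypothetical data: an able person (theta = 3) unexpectedly fails an average item.
infit, outfit = item_fit([3.0, 0.0, -3.0], 0.0, [0, 1, 0])
print(f"infit = {infit:.2f}, outfit = {outfit:.2f}")
```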
Phase 1 report on sensor technology, data fusion and data interpretation for site characterization
Beckerman, M.
1991-10-01
In this report we discuss sensor technology, data fusion and data interpretation approaches that are potentially most useful for subsurface imaging and characterization of landfill waste sites. Two sensor technologies, terrain conductivity using electromagnetic induction and ground penetrating radar, are described and the literature on the subject is reviewed. We identify the maximum entropy stochastic method as one providing a rigorously justifiable framework for fusing the sensor data, briefly summarize work done by us in this area, and examine some of the outstanding issues with regard to data fusion and interpretation. 25 refs., 17 figs.
Estimating aquifer channel recharge using optical data interpretation.
Walter, Gary R; Necsoiu, Marius; McGinnis, Ronald
2012-01-01
Recharge through intermittent and ephemeral stream channels is believed to be a primary aquifer recharge process in arid and semiarid environments. The intermittent nature of precipitation and flow events in these channels, and their often remote locations, make direct flow and loss measurements difficult and expensive. Airborne and satellite optical images were interpreted to evaluate aquifer recharge due to stream losses on the Frio River in south-central Texas. Losses in the Frio River are believed to be a major contributor of recharge to the Edwards Aquifer. The results of this work indicate that interpretation of readily available remote sensing optical images can offer important insights into the spatial distribution of aquifer recharge from losing streams. In cases where upstream gauging data are available, simple visual analysis of the length of the flowing reach downstream from the gauging station can be used to estimate channel losses. In the case of the Frio River, the rate of channel loss estimated from the length of the flowing reach at low flows was about half of the loss rate calculated from in-stream gain-loss measurements. Analysis based on water-surface width and channel slope indicated that losses were mainly in a reach downstream of the mapped recharge zone. The analysis based on water-surface width, however, did not indicate that this method could yield accurate estimates of actual flow in pool and riffle streams, such as the Frio River and similar rivers draining the Edwards Plateau. © 2011, Southwest Research Institute. Ground Water © 2011, National Ground Water Association.
Statistical information of ASAR observations over wetland areas: An interaction model interpretation
NASA Astrophysics Data System (ADS)
Grings, F.; Salvia, M.; Karszenbaum, H.; Ferrazzoli, P.; Perna, P.; Barber, M.; Jacobo Berlles, J.
2010-01-01
This paper presents the results obtained after studying the relation between the statistical parameters that describe the backscattering distribution of junco marshes and their biophysical variables. The results are based on the texture analysis of a time series of Envisat ASAR C-band data (APP mode, VV + HH polarizations) acquired between October 2003 and January 2005 over the Lower Paraná River Delta, Argentina. The image power distributions were analyzed, and we show that the K distribution provides a good fit to SAR data extracted from wetland observations for both polarizations. We also show that the estimated values of the order parameter of the K distribution can be explained using fieldwork and reasonable assumptions. In order to explore these results, we introduce a radiative transfer based interaction model to simulate the junco marsh σ0 distribution. After analyzing model simulations, we found evidence that the order parameter is related to the junco plant density distribution inside the junco marsh patch. It is concluded that the order parameter of the K distribution could be a useful parameter to estimate the junco plant density. This result is important for basin hydrodynamic modeling, since marsh plant density is the most important parameter to estimate marsh water conductance.
Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; MacEachren, Alan M
2008-01-01
Background Kulldorff's spatial scan statistic and its software implementation – SaTScan – are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. Results We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents composed of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. Finally, we propose a logical approach to proceed
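The Poisson version of Kulldorff's statistic computes a log-likelihood ratio for each candidate zone. The zone with the largest value is the most likely cluster. The sketch below evaluates that ratio for a few hand-specified zones with invented names and counts. It omits two parts of what SaTScan does. First, it does not generate circular windows, whose maximum window size is the kind of scaling parameter discussed above. Second, it does not run the Monte Carlo replications used to assess significance.

```python
import math


def poisson_scan_llr(cases_in, expected_in, total_cases):
    """Kulldorff's Poisson log-likelihood ratio for one zone (high-rate clusters only)."""
    if cases_in <= expected_in:
        return 0.0
    inside = cases_in * math.log(cases_in / expected_in)
    cases_out, expected_out = total_cases - cases_in, total_cases - expected_in
    outside = cases_out * math.log(cases_out / expected_out) if cases_out > 0 else 0.0
    return inside + outside


def most_likely_cluster(zones, total_cases, total_population):
    """Return (name, LLR) of the zone with the largest LLR.

    `zones` maps a zone name to (cases, population); expected cases are taken
    proportional to population under the null hypothesis of a constant rate."""
    best_name, best_llr = None, 0.0
    for name, (cases, population) in zones.items():
        expected = total_cases * population / total_population
        llr = poisson_scan_llr(cases, expected, total_cases)
        if llr > best_llr:
            best_name, best_llr = name, llr
    return best_name, best_llr


# Hypothetical candidate zones: (observed cases, population).
zones = {"A": (30, 200_000), "B": (15, 300_000), "C": (55, 500_000)}
name, llr = most_likely_cluster(zones, total_cases=100, total_population=1_000_000)
print(name, round(llr, 3))
```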
Spatial Statistics for Dyadic Data: Analyzing the Relationship Landscape.
Wood, Nathan D; Okhotnikov, Ilya A
2017-01-01
Spatial statistics has a rich tradition in earth, economic, and epidemiological sciences and has potential to affect the study of couples as well. When applied to couple data, spatial statistics can model within- and between-couple differences with results that are readily accessible for researchers and clinicians. This article offers a primer in using spatial statistics as a methodological tool for analyzing dyadic data. The article will introduce spatial approaches, review data structure required for spatial analysis, available software, and examples of data output. © 2016 American Association for Marriage and Family Therapy.
Eigenanalysis of SNP data with an identity by descent interpretation.
Zheng, Xiuwen; Weir, Bruce S
2016-02-01
Principal component analysis (PCA) is widely used in genome-wide association studies (GWAS), and the principal component axes often represent perpendicular gradients in geographic space. The explanation of PCA results is of major interest for geneticists to understand fundamental demographic parameters. Here, we provide an interpretation of PCA based on relatedness measures, which are described by the probability that sets of genes are identical-by-descent (IBD). An approximately linear transformation between ancestral proportions (AP) of individuals with multiple ancestries and their projections onto the principal components is found. In addition, a new method of eigenanalysis "EIGMIX" is proposed to estimate individual ancestries. EIGMIX is a method of moments with computational efficiency suitable for millions of SNPs, and it is not subject to the assumption of linkage equilibrium. With the assumptions of multiple ancestries and their surrogate ancestral samples, EIGMIX is able to infer ancestral proportions (APs) of individuals. The methods were applied to the SNP data from the HapMap Phase 3 project and the Human Genome Diversity Panel. The APs of individuals inferred by EIGMIX are consistent with the findings of the program ADMIXTURE. In conclusion, EIGMIX can be used to detect population structure and estimate genome-wide ancestral proportions with a relatively high accuracy.
Interpretations of the OSCAR data for reactive-gas scavenging
Easter, R.C.; Hales, J.M.
1982-11-01
A description is given of the application of a reactive scavenging model for the interpretation of data from the Oxidation and Scavenging Characteristics of April Rains (OSCAR) field study to evaluate scavenging mechanisms. The OSCAR experiment, conducted during April 1982, was a cooperative field investigation of wet removal by cyclonic storms. A part of the experiment involved intensive measurements at a site in NE Indiana and was designed to provide needed inputs for diagnostic scavenging models. Sequential precipitation chemistry, surface and airborne air chemistry, cloud physics, and meteorological measurements were performed. The model application reported here involves a single storm event at the Indiana site. Although the work presented involves the analysis of only a single precipitation event over a limited geographical area (10⁴ km²), the data utilized have considerable uncertainties, and the model contains numerous approximations, it is nevertheless concluded that the ability of the model to reproduce much of the observed precipitation chemistry behavior for the event is quite encouraging.
3D seismic data interpretation of Boonsville Field, Texas
NASA Astrophysics Data System (ADS)
Alhakeem, Aamer Ali
The Boonsville field, located in the Fort Worth Basin of north-central Texas, is one of the largest gas fields in the US. The highest-potential reservoirs reside in the Bend Conglomerate, deposited during the Pennsylvanian. The Boonsville data set was prepared by the Bureau of Economic Geology at the University of Texas at Austin as part of the secondary gas recovery program. The Boonsville seismic data set covers an area of 5.5 mi² and includes data from 38 wells. The Bend Conglomerate was deposited in a fluvio-deltaic setting. It is subdivided into many genetic sequences, which include sandy conglomerate deposits representing the potential reserves in the Boonsville field. The subsurface geologic structure of the Boonsville field is visualized by constructing structure maps of the Caddo, Davis, Runaway, Beans Cr, Vineyard, and Wade. The mapping includes time-structure, depth-structure, horizon-slice, velocity, and isopach maps. Many anticlines and folds are illustrated. Karst collapse features are indicated, especially in the lower Atoka. The dip direction of the Bend Conglomerate horizons changes from northward at the top to eastward at the bottom. Stratigraphic interpretation of the Runaway Formation and the Vineyard Formation, integrating well logs and seismic data, showed the presence of fluvial-dominated channels, point bars, and a mouth bar. RMS amplitude maps were generated and used as direct hydrocarbon indicators for the targeted formations; the resulting bright spots were used to identify potential reservoirs. Petrophysical analysis was conducted to obtain gross and net pay, NGR, water saturation, shale volume, porosity, and the gas formation factor. Volumetric calculations estimated 989.44 MMSCF of recoverable original gas in place for a prospect in the Runaway Formation and 3.32 BSCF for a prospect in the Vineyard Formation.
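Volumetric estimates of this kind are conventionally computed with the field-unit equation OGIP = 43,560 · A · h · φ · (1 − Sw) / Bg, where A is area in acres, h is net pay in feet, φ is porosity, Sw is water saturation, and Bg is the gas formation volume factor in ft³/scf; recoverable gas is OGIP times a recovery factor. The sketch below assumes that standard form. The abstract does not give the study's inputs or recovery factor, so the function names and example values are hypothetical.

```python
ACRE_FT3 = 43_560.0  # cubic feet per acre-foot

def ogip_scf(area_acres, net_pay_ft, porosity, water_sat, bg_ft3_per_scf):
    """Original gas in place (scf) from the standard volumetric equation."""
    pore_gas_ft3 = ACRE_FT3 * area_acres * net_pay_ft * porosity * (1.0 - water_sat)
    return pore_gas_ft3 / bg_ft3_per_scf

def recoverable_scf(ogip, recovery_factor):
    """Recoverable gas = OGIP x recovery factor."""
    return ogip * recovery_factor

# Hypothetical prospect; these are not the Runaway or Vineyard inputs.
g = ogip_scf(area_acres=160, net_pay_ft=12, porosity=0.12, water_sat=0.35,
             bg_ft3_per_scf=0.005)
print(f"OGIP = {g / 1e6:.1f} MMSCF, recoverable = {recoverable_scf(g, 0.7) / 1e6:.1f} MMSCF")
```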
Identification and interpretation of patterns in rocket engine data
NASA Astrophysics Data System (ADS)
Lo, C. F.; Wu, K.; Whitehead, B. A.
1993-10-01
A prototype software system was constructed to detect anomalous Space Shuttle Main Engine (SSME) behavior in the early stages of fault development, significantly earlier than the indication provided by either the redline detection mechanism or human expert analysis. The major tasks of the research project were to analyze ground test data, to identify patterns associated with anomalous engine behavior, and to develop a pattern identification and detection system on the basis of this analysis. A prototype expert system for detecting anomalies in turbopump vibration data, developed on both a PC and a Symbolics 3670 Lisp machine, was checked with data from SSME ground tests 902-473, 902-501, 902-519, and 904-097. A neural network method was also applied to supplement the statistical method used in the prototype system, to investigate the feasibility of detecting anomalies in SSME turbopump vibration. In most cases, the anomalies detected by the expert system agree with those reported by NASA. With the neural network approach, the successful detection rate for identifying normal or abnormal running conditions was higher than 95 percent, based on experimental data as well as numerical simulation.
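The abstract does not detail the statistical method in the prototype. A minimal stand-in for the idea of flagging anomalies before a fixed redline is reached is to compare each new reading with a baseline taken from nominal test data and flag readings that lie more than a chosen number of standard deviations away. The threshold of 3, the function name, and the readings below are assumptions for illustration.

```python
import statistics

def flag_anomalies(baseline, readings, threshold=3.0):
    """Return indices of readings whose z-score against the baseline exceeds threshold."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation of nominal data
    return [i for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold]

nominal = [1.0, 1.1, 0.9, 1.0, 1.0]  # hypothetical nominal vibration amplitudes
print(flag_anomalies(nominal, [1.0, 1.05, 1.6]))  # [2]
```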
NASA Astrophysics Data System (ADS)
Bouzid, Mohamed; Sellaoui, Lotfi; Khalfaoui, Mohamed; Belmabrouk, Hafedh; Lamine, Abdelmottaleb Ben
2016-02-01
In this work, we studied the adsorption of ethanol on three types of activated carbon: parent Maxsorb III and two chemically modified activated carbons (H2-Maxsorb III and KOH-H2-Maxsorb III). The investigation was conducted on the basis of the grand canonical formalism of statistical physics and simplifying assumptions. This led to three-parameter equations describing the adsorption of ethanol onto the three types of activated carbon. There was a good correlation between the experimental data and the results obtained with the newly proposed equation. The parameters characterizing the adsorption isotherm were the number of adsorbed molecules per site, n; the density of receptor sites per unit mass of adsorbent, Nm; and the energetic parameter p1/2. They were estimated for the studied systems by nonlinear least-squares regression. The results show that the ethanol molecules were adsorbed in a perpendicular (non-parallel) position relative to the adsorbent surface. The magnitude of the calculated adsorption energies reveals that ethanol is physisorbed onto activated carbon, with both van der Waals and hydrogen-bonding interactions involved in the adsorption process. The calculated values of the specific surface area AS showed that all three types of activated carbon have a highly microporous surface.
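The three parameters named above (n, Nm, p1/2) correspond to the monolayer model with a single adsorption energy that is commonly used in this statistical-physics framework, Q(P) = n·Nm / (1 + (p1/2/P)^n). The sketch below assumes that form, together with the usual estimate of adsorption energy from the half-saturation pressure, ΔE = RT ln(Ps/p1/2), where Ps is the saturated vapor pressure. The abstract itself does not print these equations, so treat both as assumptions; the parameter values are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def adsorbed_quantity(p, n, nm, p_half):
    """Assumed monolayer model with one energy: Q = n*Nm / (1 + (p_half/p)**n)."""
    return n * nm / (1.0 + (p_half / p) ** n)

def adsorption_energy(t_kelvin, p_sat, p_half):
    """Assumed adsorption energy estimate (J/mol): RT ln(Ps / p_half)."""
    return R * t_kelvin * math.log(p_sat / p_half)

# Hypothetical parameters: at p = p_half the uptake is half of saturation n*Nm.
print(adsorbed_quantity(p=2.0, n=1.5, nm=4.0, p_half=2.0))  # 3.0
print(adsorption_energy(t_kelvin=298.0, p_sat=7.9, p_half=2.0))
```

In practice n, Nm, and p1/2 would be fitted to measured isotherm points by nonlinear least squares, as the abstract describes.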
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper reviews two motivations for conducting "what if" analyses with Excel and "R" to understand statistical significance tests in the context of sample size. "What if" analyses can be used to teach students what statistical significance tests really do and, in applied research, either prospectively to estimate what sample size…
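A "what if" analysis of this kind can be sketched by holding an effect size fixed and asking how the p value would change with sample size. The sketch below is written in Python rather than Excel or R. It assumes two equal groups of size n, so that t = d·sqrt(n/2) for Cohen's d, and uses a normal approximation to the two-sided p value; an exact analysis would use the t distribution with 2n − 2 degrees of freedom, which gives somewhat larger p values for small n.

```python
import math

def approx_p_two_sample(d, n_per_group):
    """Two-sided p value for Cohen's d with equal group sizes (normal approximation)."""
    t = d * math.sqrt(n_per_group / 2.0)
    return math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))

# Same effect size, growing samples: the p value shrinks as n grows.
for n in (8, 16, 32, 64, 128):
    print(n, round(approx_p_two_sample(0.5, n), 4))
```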
Statistical Analysis of Research Data | Center for Cancer Research
Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 12-13, 2017 from 9:00 AM – 5:00 PM at the Natcher Conference Center, Balcony A on the Bethesda campus. SARD is designed to provide an overview of the general principles of statistical analysis of research data. The course will be taught by Paul W. Thurman of Columbia University.
Using Data Mining to Teach Applied Statistics and Correlation
ERIC Educational Resources Information Center
Hartnett, Jessica L.
2016-01-01
This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…
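For readers following along without a statistics package, and assuming the correlation in question is the Pearson coefficient, r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²) can be computed in a few lines. The variable names and data below are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hours_online = [1, 2, 3, 4, 5]   # hypothetical mined data
ads_clicked = [2, 4, 5, 4, 6]
print(round(pearson_r(hours_online, ads_clicked), 3))
```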
NASA Astrophysics Data System (ADS)
Lee, J.; Chang, H.
2001-12-01
In this research, we investigate the reciprocal influence between groundwater flow and the salinization that occurred at two underground cavern sites, using major-ion chemistry, PCA of chemical analysis data, and cross-correlation of various hydraulic data. The study areas are two underground LPG storage facilities constructed on the South Sea coast (Yosu) and in the West Sea coastal region (Pyeongtaek) of Korea. Considerably high concentrations of major cations and anions in the groundwater at both sites indicated brackish or saline water types. At the Yosu site, large chemical differences between groundwater samples from the rainy and dry seasons were caused by temporary intrusion of highly saline water into the propane and butane cavern zone; this was not the case at the Pyeongtaek site. Cl/Br ratios and δ18O–δD distributions, used to trace the source of the salinizing water, revealed that two kinds of saline water (seawater and halite-dissolved solution) could influence groundwater salinization at the Yosu site, whereas only seawater intrusion could affect the groundwater chemistry of the observation wells at the Pyeongtaek site. PCA using 8 and 10 chemical ions as statistical variables at the two sites showed intensive seawater intrusion through the butane cavern at the Yosu site, while seawater-groundwater mixing was observed at some observation wells in the marginal part of the Pyeongtaek site. Cross-correlation results revealed that the positive relationship between hydraulic head and cavern operating pressure was far more conspicuous in the propane cavern zone at both sites (correlation coefficients of 65–90%). According to the cross-correlation results for the Yosu site, a small change in head could provoke a massive influx of halite-dissolved solution from the surface through vertically developed fracture networks. However, at the Pyeongtaek site, the pressure-sensitive observation wells are not completely consistent with the seawater-mixed wells, and the hydraulic change of heads at these wells related to the
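The cross-correlation analysis relates hydraulic head to cavern operating pressure across time lags. A minimal sketch, assuming evenly sampled series and the Pearson coefficient at each lag (the study's exact estimator is not stated), finds the lag at which the head response best matches the pressure signal. The series below are synthetic, with the head built as the pressure delayed by two steps.

```python
import math

def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_correlation(pressure, head, max_lag):
    """Correlation of pressure[t] with head[t + lag] for lag = 0..max_lag."""
    return {lag: _pearson(pressure[: len(pressure) - lag], head[lag:])
            for lag in range(max_lag + 1)}

pressure = [0, 1, 0, 2, 0, 3, 1, 0, 2, 1]   # synthetic operating pressure
head = [0, 0] + pressure[:-2]               # synthetic head: pressure delayed 2 steps
cc = cross_correlation(pressure, head, max_lag=3)
best = max(cc, key=cc.get)
print(best, round(cc[best], 3))
```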
Bayesian Analysis of Order-Statistics Models for Ranking Data.
ERIC Educational Resources Information Center
Yu, Philip L. H.
2000-01-01
Studied the order-statistics models, extending the usual normal order-statistics model into one in which the underlying random variables followed a multivariate normal distribution. Used a Bayesian approach and the Gibbs sampling technique. Applied the proposed method to analyze presidential election data from the American Psychological…
Explorations in Statistics: The Analysis of Ratios and Normalized Data
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2013-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This ninth installment of "Explorations in Statistics" explores the analysis of ratios and normalized--or standardized--data. As researchers, we compute a ratio--a numerator divided by a denominator--to compute a…
Analysis of Accelerants in Fire Debris - Data Interpretation.
Bertsch, W
1997-06-01
Analysis of accelerants in fire debris involves the isolation of residual volatiles from the matrix and the analysis of these volatiles, usually by gas chromatography (GC). The resulting chromatograms are interpreted by comparing them to a library of accelerant chromatograms obtained under similar conditions. This review first describes ASTM's system for classifying fire accelerants into light petroleum distillates, gasoline, medium petroleum distillates, kerosene, heavy petroleum distillates, and unclassified compounds. Chromatograms with well-resolved n-alkane homolog patterns are the most recognizable. Inadequately resolved chromatograms can be improved by using columns with higher efficiency or selectivity, while those with too much interference can be improved by physically removing or reducing the interfering compounds, or by selective detection. Using a mass spectrometer (MS) as the detector in GC/MS applications allows the display of common ions shared by compounds with similar structural features, greatly facilitating pattern recognition. Computer algorithms are now available for automated recognition of the patterns characteristic of various categories of accelerants. The state of the art in forensic laboratories' analysis of accelerants in fire debris is presented as an appendix to this review. Data generated in annual proficiency tests over an 8-year period (1987-1995) revealed increased use of GC/MS instrumentation and some persisting problems, including false positives, difficulties associated with component discrimination in the sample preparation process, and recognition of partially evaporated distillates.
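Library matching of this kind can be illustrated with a simple similarity score. The sketch below assumes each chromatogram has already been reduced to peak areas in a fixed set of retention-time bins and ranks library entries by cosine similarity. Both the binning and the choice of cosine similarity are assumptions for illustration, not the algorithms surveyed in the review, and the library profiles are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length intensity profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def best_match(sample, library):
    """Return (name, score) of the library entry most similar to the sample."""
    return max(((name, cosine(sample, ref)) for name, ref in library.items()),
               key=lambda item: item[1])

# Hypothetical peak-area profiles over six retention-time bins.
library = {
    "gasoline": [9, 7, 5, 2, 1, 0],
    "kerosene": [0, 1, 4, 8, 6, 2],
    "heavy petroleum distillate": [0, 0, 1, 3, 7, 9],
}
sample = [8, 6, 5, 3, 1, 0]
print(best_match(sample, library))
```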
The Galactic Center: possible interpretations of observational data.
NASA Astrophysics Data System (ADS)
Zakharov, Alexander
2015-08-01
There are few astrophysical cases in which one really has the opportunity to check predictions of general relativity in the strong gravitational field limit. For this purpose, the black hole at the Galactic Center is one of the most interesting cases, since it is the closest supermassive black hole. Gravitational lensing is a natural phenomenon based on the deflection of light in a gravitational field (null, or isotropic, geodesics are not straight lines in a gravitational field; in a weak field one has small corrections for light deflection, while the perturbative approach is not suitable for a strong field). There are currently two basic observational techniques for investigating the gravitational potential at the Galactic Center: a) monitoring the orbits of bright stars near the Galactic Center to reconstruct the gravitational potential; b) measuring the size and shape of the shadow around the black hole, which offers an alternative way to evaluate black hole parameters in the mm band with the VLBI technique. At the moment, a small relativistic correction approach can be used for stellar orbit analysis (although in the future this approximation will not be precise enough, owing to the enormous progress of observational facilities), whereas analysis of the smallest structures in VLBI observations already requires a strong gravitational field approximation. We discuss results of observations, their conventional interpretations, tensions between observations and models, and possible hints of new physics from the observational data and from tensions between observations and interpretations.
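The shadow measurement can be made concrete with the textbook result for a non-rotating (Schwarzschild) black hole, whose shadow radius equals the critical impact parameter b_c = 3√3 GM/c². The sketch below converts this to an angular diameter. The mass of about 4 × 10⁶ solar masses and distance of about 8 kpc are assumed round values for the Galactic Center black hole, not taken from the abstract, and spin or alternative-gravity effects discussed above are ignored.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg
PARSEC = 3.0857e16   # m
RAD_TO_MICROARCSEC = 180 / math.pi * 3600 * 1e6

def shadow_diameter_uas(mass_solar, distance_pc):
    """Angular diameter (microarcseconds) of a Schwarzschild black hole's shadow."""
    b_crit = 3 * math.sqrt(3) * G * mass_solar * M_SUN / C**2  # critical impact parameter, m
    return 2 * b_crit / (distance_pc * PARSEC) * RAD_TO_MICROARCSEC

# Assumed round values for the Galactic Center black hole.
print(round(shadow_diameter_uas(4.0e6, 8.0e3), 1))  # roughly 50 microarcseconds
```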
NASA Astrophysics Data System (ADS)
Borradaile, Graham J.; Werner, Tomasz; Lagroix, France
2003-02-01
The Kapuskasing Structural Zone (KSZ) reveals a section through Archean lower-crustal granoblastic gneisses. Our new paleomagnetic data largely agree with previous work, but we show that interpretations vary according to the choice of statistical, demagnetization, and field-correction techniques. First, where the orientation distribution of characteristic remanence directions on the sphere is not circularly symmetric, the commonly used statistical model is invalid [Fisher, R.A., Proc. R. Soc. A217 (1953) 295]. Any tendency to form an elliptical distribution indicates that the sample is drawn from a Bingham-type population [Bingham, C., 1964. Distributions on the sphere and on the projective plane. PhD thesis, Yale University]. Fisher and Bingham statistics produce different confidence estimates from the same data, and the traditionally defined mean vector may differ from the maximum eigenvector of an orthorhombic Bingham distribution. It seems prudent to apply both models wherever a non-Fisher population is suspected, which may be the case in any tectonized rocks. Non-Fisher populations require larger sample sizes, so focussing on individual sites may not be the most effective policy in tectonized rocks; more dispersed sampling across tectonic structures may be more productive. Second, mean vectors isolated from the same specimens by thermal and alternating field (AF) demagnetization differ. Which treatment gives more meaningful results is difficult to decipher, especially in metamorphic rocks, where the history of the magnetic minerals is not easily related to the ages of tectonic and petrological events. In this study, thermal demagnetization gave lower inclinations for paleomagnetic vectors and thus more distant paleopoles. Third, and of more parochial significance, tilt corrections may be unnecessary in the KSZ because the magnetic fabrics and the thrust ramp keep a constant orientation down to the depth at which they level off, at approximately 15 km. With
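For reference, the Fisher (1953) statistics that the abstract contrasts with Bingham statistics reduce to a few lines: convert each remanence direction to a unit vector, take the resultant length R of the N vectors, and compute the precision parameter k = (N − 1)/(N − R) and the 95% confidence cone from cos α95 = 1 − ((N − R)/R) · ((1/0.05)^(1/(N−1)) − 1). The sketch below implements only the Fisher case; a Bingham analysis would instead work with the eigenvectors of the orientation tensor. The directions in the example are made up.

```python
import math

def fisher_mean(directions):
    """Fisher mean of (declination, inclination) pairs in degrees.

    Returns (mean_dec, mean_inc, k, alpha95). Requires at least two
    non-identical directions (otherwise N - R = 0).
    """
    n = len(directions)
    sx = sy = sz = 0.0
    for dec, inc in directions:
        d, i = math.radians(dec), math.radians(inc)
        sx += math.cos(i) * math.cos(d)  # north component
        sy += math.cos(i) * math.sin(d)  # east component
        sz += math.sin(i)                # down component
    r = math.sqrt(sx * sx + sy * sy + sz * sz)  # resultant length R
    mean_dec = math.degrees(math.atan2(sy, sx)) % 360.0
    mean_inc = math.degrees(math.asin(sz / r))
    k = (n - 1) / (n - r)
    cos_a95 = 1 - (n - r) / r * ((1 / 0.05) ** (1 / (n - 1)) - 1)
    alpha95 = math.degrees(math.acos(cos_a95))
    return mean_dec, mean_inc, k, alpha95

print(fisher_mean([(10, 40), (12, 42), (8, 38), (11, 41)]))
```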
Boyle temperature as a point of ideal gas in Gentile statistics and its economic interpretation
NASA Astrophysics Data System (ADS)
Maslov, V. P.; Maslova, T. V.
2014-07-01
Boyle temperature is interpreted as the temperature at which the formation of dimers becomes impossible. To Irving Fisher's correspondence principle we assign two more quantities: the number of degrees of freedom, and credit. We determine the danger level of the mass of money M at which mutual trust between economic agents begins to fall.
Hermida, Leandro; Poussin, Carine; Stadler, Michael B; Gubian, Sylvain; Sewer, Alain; Gaidatzis, Dimos; Hotz, Hans-Rudolf; Martin, Florian; Belcastro, Vincenzo; Cano, Stéphane; Peitsch, Manuel C; Hoeng, Julia
2013-07-29
High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (termed "contrast data" for short) in a comparable manner, and to extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.). To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and cross-platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge, which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are currently integrated
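Over-representation analysis (ORA) is commonly based on the hypergeometric upper tail: given N genes in the universe, K in a gene set, and n differentially expressed genes of which k fall in the set, the p value is P(X ≥ k). The sketch below assumes that standard one-sided test; how Confero parameterizes ORA internally is not described in the abstract, and the numbers in the example are hypothetical.

```python
from math import comb

def ora_pvalue(universe, set_size, n_de, overlap):
    """One-sided hypergeometric p value P(X >= overlap)."""
    upper = min(set_size, n_de)
    hits = sum(comb(set_size, i) * comb(universe - set_size, n_de - i)
               for i in range(overlap, upper + 1))
    return hits / comb(universe, n_de)  # exact integer ratio, then float

# Hypothetical: 20,000 genes, a 200-gene set, 500 DE genes, 15 of them in the set
# (about 5 would be expected by chance).
print(ora_pvalue(20_000, 200, 500, 15))
```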
Integrative analyses of cancer data: a review from a statistical perspective.
Wei, Yingying
2015-01-01
It has become increasingly common for large-scale public data repositories and clinical settings to have multiple types of data, including high-dimensional genomics, epigenomics, and proteomics data as well as survival data, measured simultaneously for the same group of biological samples, which provides unprecedented opportunities to understand cancer mechanisms from a more comprehensive perspective and to develop new cancer therapies. Nevertheless, translating this wealth of data into biologically and clinically meaningful information remains very challenging. In this paper, I review recent developments in statistics for integrative analyses of cancer data. Topics include meta-analysis of homogeneous types of data across multiple studies, integration of multiple heterogeneous genomic data types, survival analysis with high- or ultrahigh-dimensional genomic profiles, and cross-data-type prediction where both predictors and responses are high- or ultrahigh-dimensional vectors. I compare existing statistical methods and comment on potential future research problems. PMID:26041968
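Meta-analysis of the same type of data across studies can take several forms. One classical option, shown here only as an illustration and not as a method the review singles out, is Fisher's method for combining p values: with k independent p values, X = −2 Σ ln p_i follows a χ² distribution with 2k degrees of freedom under the joint null, and for even degrees of freedom the upper tail has the closed form e^(−X/2) Σ_{j=0}^{k−1} (X/2)^j / j!. The per-study p values below are hypothetical.

```python
import math

def fisher_combined_p(pvalues):
    """Combine independent p values with Fisher's method (exact chi-square tail, 2k df)."""
    k = len(pvalues)
    if k == 0 or any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("need at least one p value in (0, 1]")
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j       # (X/2)^j / j!
        total += term
    return math.exp(-half_x) * total

print(fisher_combined_p([0.04, 0.10, 0.30]))  # hypothetical per-study p values
```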
Statistical Modeling of Natural Backgrounds in Hyperspectral LWIR Data
2016-09-06
Eric Truslow (a), Dimitris Manolakis (a), Thomas Cooley (b), and Joseph Meola (c); (a) MIT...87117; (c) Sensors Directorate, Air Force Research Laboratory, Wright-Patterson AFB, OH 45433. Hyperspectral sensors operating in the long wave infrared (LWIR)...investigated. In this paper, we investigate modeling hyperspectral LWIR data using a statistical mixture model for the emissivity and surface
Antweiler, R.C.; Taylor, H.E.
2008-01-01
The main classes of statistical treatment of below-detection-limit (left-censored) environmental data for the determination of basic statistics that have been used in the literature are substitution methods, maximum likelihood, regression on order statistics (ROS), and nonparametric techniques. These treatments, along with using all instrument-generated data (even those below detection), were evaluated by examining data sets in which the true values of the censored data were known. It was found that for data sets with less than 70% censored data, the best technique overall for determination of summary statistics was the nonparametric Kaplan-Meier technique. ROS and the two substitution methods of assigning one-half the detection limit value to censored data or assigning a random number between zero and the detection limit to censored data were adequate alternatives. The use of these two substitution methods, however, requires a thorough understanding of how the laboratory censored the data. The technique of employing all instrument-generated data, including numbers below the detection limit, was found to be less adequate than the above techniques. At high degrees of censoring (greater than 70% censored data), no technique provided good estimates of summary statistics. Maximum likelihood techniques were found to be far inferior to all other treatments except assigning zero or the detection limit value to censored data.
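The Kaplan-Meier approach that performed best here is usually applied to left-censored data by "flipping" the values so that standard right-censored survival machinery applies. The sketch below does the equivalent calculation directly: moving down from the largest detected value, each detect receives probability mass according to how many observations (detects, plus nondetects whose detection limit is at or below that value) are still at risk. When the lowest observations are nondetects, some mass remains below the smallest detect; placing it at the smallest detection limit is an assumption made here for simplicity, and other conventions exist.

```python
def km_mean_left_censored(values, censored):
    """Kaplan-Meier estimate of the mean for left-censored data.

    values[i] is a measured value, or the detection limit (DL) when censored[i] is True.
    """
    detects = [v for v, c in zip(values, censored) if not c]
    limits = [v for v, c in zip(values, censored) if c]
    if not detects:
        raise ValueError("need at least one detected value")
    below = 1.0  # estimate of P(X <= x) at the current detected value x
    mean = 0.0
    for x in sorted(set(detects), reverse=True):
        d = detects.count(x)
        at_risk = sum(1 for v in detects if v <= x) + sum(1 for dl in limits if dl <= x)
        new_below = below * (1.0 - d / at_risk)  # estimate of P(X < x)
        mean += x * (below - new_below)          # probability mass placed on x
        below = new_below
    if below > 0:
        # Leftover mass lies below the smallest detect (assumption: place it at the smallest DL).
        mean += min(limits) * below
    return mean

# Hypothetical concentrations; the third value is a nondetect reported as "<1.0".
print(km_mean_left_censored([2.0, 3.0, 1.0, 4.0], [False, False, True, False]))
```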
Measuring and interpretation of three-component borehole magnetic data
NASA Astrophysics Data System (ADS)
Virgil, C.; Ehmann, S.; Hördt, A.; Leven, M.; Steveling, E.
2012-04-01
Three-component borehole magnetics provides important additional information compared with total field or horizontal and vertical measurements. The "Göttinger Bohrloch Magnetometer" (GBM) is capable of recording the vector of the magnetic field along with the orientation of the tool using three fluxgate magnetometers and fibre-optic gyros. The GBM was successfully applied in the Outokumpu Deep Drill Hole (OKU R2500), Finland, in September 2008, in the Louisville Seamount Trail (IODP Expedition 330) from December 2010 until February 2011, and in several shallower boreholes. With the declination of the magnetic field, the GBM provides additional information compared to conventional tools, which reduces the ambiguity of structural interpretation. The position of ferromagnetic objects in the vicinity of the borehole can be computed with higher accuracy. In the case of drilled-through structures, three-component borehole magnetics allow the computation of the vector of magnetization. Using supplementary susceptibility data, the natural remanent magnetization (NRM) vector can be derived, which yields information about the apparent polar wander curve and/or about the structural evolution of the rock units. The NRM vector can further be used to reorient core samples in regions of strong magnetization. The most important aspect in three-component borehole magnetics is the knowledge of the orientation of the probe along the drillhole. With the GBM we use three fibre-optic gyros (FOG), which are aligned orthogonally to each other. These instruments record the turning rate about the three main axes of the probe. The FOGs benefit from a high resolution (< 9 × 10⁻⁴°) and a low drift (< 2°/h). However, to reach optimal results, extensive data processing and calibration measurements are necessary. Properties to be taken into account are the misalignment, scaling factors and offsets of the fluxgate and FOG triplet, temperature dependent drift of the FOGs, misalignment of the
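Once the gyro-derived tool orientation has been used to rotate each fluxgate reading into a geographic frame (the demanding processing step described above, not shown here), declination, inclination, and total intensity follow directly from the three components. The sketch assumes a north-east-down frame with components in nT; the function name and example values are hypothetical.

```python
import math

def field_angles(bn, be, bd):
    """Declination, inclination (degrees) and total intensity from north/east/down components."""
    horizontal = math.hypot(bn, be)
    declination = math.degrees(math.atan2(be, bn)) % 360.0
    inclination = math.degrees(math.atan2(bd, horizontal))
    total = math.sqrt(bn * bn + be * be + bd * bd)
    return declination, inclination, total

print(field_angles(20_000.0, 1_500.0, 45_000.0))  # hypothetical components in nT
```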
The IUE data bank: Statistics and future aspects
NASA Technical Reports Server (NTRS)
Schmitz, Marion; Barylak, Michael
1988-01-01
The data exchange policy between Goddard Space Flight Center and ESA's Villafranca (Spain) station is described. The IUE data banks and their uses are outlined. Statistical information on objects observed, the quantity of data distributed and retrieved from the archives, together with a detailed design of the final format of the IUE merged log are also given.
Promoting Statistical Thinking in Schools with Road Injury Data
ERIC Educational Resources Information Center
Woltman, Marie
2017-01-01
Road injury is an immediately relevant topic for 9-19 year olds. Current availability of Open Data makes it increasingly possible to find locally relevant data. Statistical lessons developed from these data can mutually reinforce life lessons about minimizing risk on the road. Devon County Council demonstrate how a wide array of statistical…
Statistical methods of combining information: Applications to sensor data fusion
Burr, T.
1996-12-31
This paper reviews some statistical approaches to combining information from multiple sources. Promising new approaches will be described, and potential applications to combining not-so-different data sources such as sensor data will be discussed. Experiences with one real data set are described.
Social inequality: from data to statistical physics modeling
NASA Astrophysics Data System (ADS)
Chatterjee, Arnab; Ghosh, Asim; Inoue, Jun-ichi; Chakrabarti, Bikas K.
2015-09-01
Social inequality has been a topic of interest for ages and has attracted researchers across disciplines to ponder its origin, manifestation, characteristics, consequences, and, finally, the question of how to cope with it. It is manifested across different strata of human existence and is quantified in several ways. In this review we discuss the origins of social inequality, the historical and commonly used non-entropic measures such as the Lorenz curve, the Gini index, and the recently introduced k index. We also discuss some analytical tools that aid in understanding and characterizing them. Finally, we discuss how statistical physics modeling helps in reproducing the results and interpreting them.
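The Gini index and the k index can both be read off the Lorenz curve. The sketch below computes the Gini index from the mean absolute difference and finds the k index as the point where the Lorenz curve L(x) meets the line 1 − x, so that the richest (1 − k) fraction of the population holds a fraction k of the total. It assumes non-negative incomes with a positive total and linear interpolation between Lorenz points; the example incomes are made up.

```python
def gini(incomes):
    """Gini index via the mean absolute difference over all ordered pairs."""
    n = len(incomes)
    mean = sum(incomes) / n
    diff = sum(abs(a - b) for a in incomes for b in incomes)
    return diff / (2 * n * n * mean)

def lorenz_points(incomes):
    """Lorenz curve points (population share, income share), poorest first."""
    xs = sorted(incomes)
    total = sum(xs)
    points, cum = [(0.0, 0.0)], 0.0
    for i, x in enumerate(xs, start=1):
        cum += x
        points.append((i / len(xs), cum / total))
    return points

def k_index(incomes):
    """Solve L(k) = 1 - k on the piecewise-linear Lorenz curve."""
    pts = lorenz_points(incomes)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        g0, g1 = y0 - (1 - x0), y1 - (1 - x1)  # g = L(x) - (1 - x) is increasing
        if g1 >= 0:
            return x0 + (-g0) * (x1 - x0) / (g1 - g0)
    return 1.0

# One person holds everything among four: Gini 0.75, and the top 20% hold 80%.
print(gini([0, 0, 0, 1]), k_index([0, 0, 0, 1]))
```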
Mars Geological Province Designations for the Interpretation of GRS Data
NASA Technical Reports Server (NTRS)
Dohm, J. M.; Kerry, K.; Baker, V. R.; Boynton, W.; Maruyama, Shige; Anderson, R. C.
2005-01-01
elemental information, we have defined geologic provinces that represent significant windows into the geological evolution of Mars, unfolding the GEOMARS Theory and forming the basis for interpreting GRS data.
Interpretation of biostratigraphic data at a sequence stratigraphic scale
Goodman, D.K.; Posamentier, H.W.
1993-02-01
Recent advances in sequence stratigraphic concepts provide a framework within which biostratigraphic data can be used in conjunction with other stratigraphic tools in an integrated approach to stratigraphic analysis. Sequence stratigraphic concepts suggest that lithologic sections are composed of a succession of unconformity-bounded units or sequences. These sequences can, in turn, be subdivided into systems tracts bounded by flooding surfaces or maximum flooding surfaces. Three systems tracts comprise a sequence: the lowstand or shelf margin systems tract at the base, followed by the transgressive systems tract, and the highstand systems tract. On continental margins with a discrete shelf/slope break, deep-sea submarine fans, shelf-margin deltas, and incised-valley fills characterize the lowstand systems tract. On ramp-like continental margins, the lowstand systems tract is characterized by basinally isolated lowstand shorelines with or without preserved incised-valley feeder systems. The transgressive systems tract has backstepping shorelines and estuarine fill of incised valley systems. The highstand systems tract has forestepping depositional systems and widespread floodplain development. A large amount of the structure in the stratigraphic distribution of fossils can be attributed to sequence architecture. This structure can be statistically delineated in terms of a hierarchy that is independent of both fossil group and geological age. The definition of "biological analogs" of systems tracts reduces the complexity of paleontological census data to a set of (1) internal characteristics within genetic units and (2) boundary conditions at stratal discontinuities. Different taxon groups should display unique and repeatable patterns within different systems tracts and at stratal discontinuities within a basin, providing a new perspective of paleontological data across a spectrum of applications in sequence characterization and correlation.
Statistical summaries of selected Iowa streamflow data through September 2013
Eash, David A.; O'Shea, Padraic S.; Weber, Jared R.; Nguyen, Kevin T.; Montgomery, Nicholas L.; Simonson, Adrian J.
2016-01-04
Statistical summaries of streamflow data collected at 184 streamgages in Iowa are presented in this report. All streamgages included for analysis have at least 10 years of continuous record collected before or through September 2013. This report is an update to two previously published reports that presented statistical summaries of selected Iowa streamflow data through September 1988 and September 1996. The statistical summaries include (1) monthly and annual flow durations, (2) annual exceedance probabilities of instantaneous peak discharges (flood frequencies), (3) annual exceedance probabilities of high discharges, and (4) annual nonexceedance probabilities of low discharges and seasonal low discharges. Also presented for each streamgage are graphs of the annual mean discharges, mean annual mean discharges, 50-percent annual flow-duration discharges (median flows), harmonic mean flows, mean daily mean discharges, and flow-duration curves. Two sets of statistical summaries are presented for each streamgage, which include (1) long-term statistics for the entire period of streamflow record and (2) recent-term statistics for or during the 30-year period of record from 1984 to 2013. The recent-term statistics are only calculated for streamgages with streamflow records pre-dating the 1984 water year and with at least 10 years of record during 1984–2013. The streamflow statistics in this report are not adjusted for the effects of water use; although some of this water is used consumptively, most of it is returned to the streams.
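The flow-duration part of such summaries can be sketched as follows: rank the flows from largest to smallest and assign each an exceedance probability. This minimal sketch assumes the Weibull plotting position p = m/(n + 1) and linear interpolation between ranked flows. It is not the report's procedure, and flood-frequency estimates (annual exceedance probabilities of peaks) usually rely on fitted distributions that are not reproduced here. The function names are hypothetical.

```python
def flow_duration(flows):
    """Rank flows in descending order and attach Weibull exceedance probabilities.

    p = m / (n + 1), where m is the rank (1 = largest) and n the record length.
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(m / (n + 1), q) for m, q in enumerate(ranked, start=1)]


def discharge_at(curve, p):
    """Linearly interpolate the discharge exceeded with probability p."""
    for (p0, q0), (p1, q1) in zip(curve, curve[1:]):
        if p0 <= p <= p1:
            return q0 + (q1 - q0) * (p - p0) / (p1 - p0)
    raise ValueError("p lies outside the range covered by the record")


curve = flow_duration([10, 30, 20])
print(curve)                     # [(0.25, 30), (0.5, 20), (0.75, 10)]
print(discharge_at(curve, 0.5))  # 20.0, the 50-percent (median) flow
```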
Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul
2016-01-01
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term “Cross-validation and cross-testing” improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393
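For reference, the standard scheme the abstract improves on (hold out a test set, then run k-fold cross-validation on the remainder) can be sketched with index bookkeeping alone. These are hypothetical helpers illustrating the baseline split, not the proposed cross-validation and cross-testing method. Contiguous, unshuffled folds are an assumption made for clarity.

```python
def kfold_indices(n, k):
    """Yield (train, validation) index lists for k contiguous folds over range(n)."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size


def holdout_then_kfold(n, test_size, k):
    """Reserve the last `test_size` samples as a test set, then k-fold the rest."""
    if not 0 < test_size < n:
        raise ValueError("test_size must leave data for training")
    test = list(range(n - test_size, n))
    folds = list(kfold_indices(n - test_size, k))
    return folds, test


folds, test = holdout_then_kfold(n=12, test_size=2, k=3)
print([(len(tr), len(va)) for tr, va in folds])  # [(6, 4), (7, 3), (7, 3)]
print(test)                                      # [10, 11]
```

The test indices never enter any fold, which is exactly the separation whose cost in statistical power the abstract describes.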
[Some basic aspects in statistical analysis of visual acuity data].
Ren, Ze-Qin
2007-06-01
All visual acuity charts currently in use have their own shortcomings. Therefore, it is difficult for ophthalmologists to evaluate visual acuity data. Many problems arise in the use of statistical methods for handling visual acuity data in clinical research. The quantitative relationship between visual acuity and visual angle varies across visual acuity charts. Visual acuity and visual angle are different types of measures. Therefore, different statistical methods should be used for different data sources. Visual acuity data can be correctly understood and analyzed only after these aspects have been clarified.
MacKinnon, David P; Pirlott, Angela G
2015-02-01
Statistical mediation methods provide valuable information about underlying mediating psychological processes, but the ability to infer that the mediator variable causes the outcome variable is more complex than widely known. Researchers have recently emphasized how violating assumptions about confounder bias severely limits causal inference of the mediator to dependent variable relation. Our article describes and addresses these limitations by drawing on new statistical developments in causal mediation analysis. We first review the assumptions underlying causal inference and discuss three ways to examine the effects of confounder bias when assumptions are violated. We then describe four approaches to address the influence of confounding variables and enhance causal inference, including comprehensive structural equation models, instrumental variable methods, principal stratification, and inverse probability weighting. Our goal is to further the adoption of statistical methods to enhance causal inference in mediation studies.
MacKinnon, David P.; Pirlott, Angela G.
2016-01-01
Statistical mediation methods provide valuable information about underlying mediating psychological processes, but the ability to infer that the mediator variable causes the outcome variable is more complex than widely known. Researchers have recently emphasized how violating assumptions about confounder bias severely limits causal inference of the mediator to dependent variable relation. Our article describes and addresses these limitations by drawing on new statistical developments in causal mediation analysis. We first review the assumptions underlying causal inference and discuss three ways to examine the effects of confounder bias when assumptions are violated. We then describe four approaches to address the influence of confounding variables and enhance causal inference, including comprehensive structural equation models, instrumental variable methods, principal stratification, and inverse probability weighting. Our goal is to further the adoption of statistical methods to enhance causal inference in mediation studies. PMID:25063043
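Of the four approaches listed, inverse probability weighting is the simplest to show in a few lines. The sketch below is the generic normalized-weight (Hajek) IPW estimate of an average treatment effect with known propensity scores. It assumes the propensities are given and does not implement the mediation-specific weighting the article describes; the function name is hypothetical.

```python
def ipw_ate(treated, outcome, propensity):
    """Normalized (Hajek) IPW estimate of E[Y(1)] - E[Y(0)].

    Assumes each propensity score e = P(treated | covariates) is known and
    strictly between 0 and 1; in practice it would be estimated from a model.
    """
    sum_w1 = sum_wy1 = sum_w0 = sum_wy0 = 0.0
    for a, y, e in zip(treated, outcome, propensity):
        if not 0 < e < 1:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        if a:
            w = 1.0 / e          # up-weight treated units that were unlikely to be treated
            sum_w1 += w
            sum_wy1 += w * y
        else:
            w = 1.0 / (1.0 - e)  # up-weight controls that were unlikely to be controls
            sum_w0 += w
            sum_wy0 += w * y
    if sum_w1 == 0 or sum_w0 == 0:
        raise ValueError("need at least one treated and one control unit")
    return sum_wy1 / sum_w1 - sum_wy0 / sum_w0


# With equal propensities the estimate reduces to a plain difference in means.
print(ipw_ate([1, 1, 0, 0], [3, 5, 1, 1], [0.5] * 4))  # 3.0
```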
Joint interpretation of geophysical data using Image Fusion techniques
NASA Astrophysics Data System (ADS)
Karamitrou, A.; Tsokas, G.; Petrou, M.
2013-12-01
Joint interpretation of geophysical data produced by different methods is a challenging area of research in a wide range of applications. In this work we apply several image fusion approaches to combine maps of electrical resistivity, electromagnetic conductivity, vertical gradient of the magnetic field, magnetic susceptibility, and ground penetrating radar reflections, in order to detect archaeological relics. We utilize data gathered from Arkansas University, with the support of the U.S. Department of Defense, through the Strategic Environmental Research and Development Program (SERDP-CS1263). The area of investigation is Army City, situated in Riley County, Kansas, USA. The depth of the relics is estimated at about 30 cm below the surface, yet surface indications of their existence are limited. We initially register the images from the different methods to correct for random offsets due to the use of hand-held devices during the measurement procedure. Next, we apply four different image fusion approaches to create combined images: fusion with mean values, wavelet decomposition, curvelet transform, and curvelet transform enhancing the images along specific angles. We create seven combinations of pairs between the available geophysical datasets. The combinations are such that every pair includes at least one high-resolution method (resistivity or magnetic gradiometry). Our results indicate that in almost every case the method of mean values produces satisfactory fused images that incorporate the majority of the features of the initial images. However, the contrast of the final image is reduced, and in some cases the averaging process nearly eliminated features that are faint in the original images. Wavelet-based fusion also gives good results, providing additional control in selecting the feature wavelength. Curvelet-based fusion proved to be the most effective method in most cases. The ability of the curvelet domain to unfold the image in
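The mean-value fusion step, and why it lowers contrast, can be seen in a small sketch: each co-registered map is rescaled to [0, 1] and the two are averaged pixel by pixel. The min-max rescaling is an assumption (the abstract does not say how the maps were normalized), and the 2-D lists stand in for real gridded survey data; the function names are hypothetical.

```python
def normalize(img):
    """Min-max rescale a 2-D list to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in img]
    return [[(v - lo) / (hi - lo) for v in row] for row in img]


def mean_fusion(img_a, img_b):
    """Pixel-wise average of two co-registered maps after normalization."""
    if len(img_a) != len(img_b) or any(len(ra) != len(rb) for ra, rb in zip(img_a, img_b)):
        raise ValueError("maps must be co-registered and have the same shape")
    a, b = normalize(img_a), normalize(img_b)
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# A feature (value 10) appears only in the first map; the second map is flat.
fused = mean_fusion([[0, 10], [0, 0]], [[5, 5], [5, 5]])
print(fused)  # [[0.0, 0.5], [0.0, 0.0]]
```

The feature visible in only one map drops from full contrast (1.0) to 0.5 after fusion, mirroring the contrast loss the abstract reports for averaging.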
ERIC Educational Resources Information Center
Maltese, Adam V.; Svetina, Dubravka; Harsh, Joseph A.
2015-01-01
In the STEM fields, adequate proficiency in reading and interpreting graphs is widely held as a central element for scientific literacy given the importance of data visualizations to succinctly present complex information. Although prior research espouses methods to improve graphing proficiencies, there is little understanding about when and how…
[Statistical pitfalls, or how should we interpret numbers in the evaluation of a new treatment].
Martin-Du Pan, R
1998-06-01
Are cholesterol-lowering drugs useful? Do they increase life expectancy? Do third-generation oral contraceptives increase the risk of venous thromboembolism? Has there been a worldwide decline in semen quality over the last 50 years? Do vitamin supplements improve your child's IQ? Does homeopathy work better than placebo? These questions illustrate some statistical problems and biases encountered during clinical studies, which can lead to erroneous results. Type I and II errors, surveillance, prescription or publication bias as well as the healthy user effect are described. Problems of regression to the mean, limits of meta-analysis validity and other statistical problems are discussed.
Professional judgment and the interpretation of viable mold air sampling data.
Johnson, David; Thompson, David; Clinkenbeard, Rodney; Redus, Jason
2008-10-01
Although mold air sampling is technically straightforward, interpreting the results to decide if there is an indoor source is not. Applying formal statistical tests to mold sampling data is an error-prone practice due to the extreme data variability. With neither established exposure limits nor useful statistical techniques, indoor air quality investigators often must rely on their professional judgment, but the lack of a consensus "decision strategy" incorporating explicit decision criteria requires professionals to establish their own personal set of criteria when interpreting air sampling data. This study examined the level of agreement among indoor air quality practitioners in their evaluation of airborne mold sampling data and explored differences in inter-evaluator assessments. Eighteen investigators independently judged 30 sets of viable mold air sampling results to indicate: "definite indoor mold source," "likely indoor mold source," "not enough information to decide," "likely no indoor mold source," or "definitely no indoor mold source." Kappa coefficient analysis indicated weak inter-observer reliability, and comparison of evaluator mean scores showed clear inter-evaluator differences in their overall scoring patterns. The responses were modeled on indicator "traits" of the data sets using a generalized, linear mixed model approach and showed several traits to be associated with respondents' ratings, but they also demonstrated distinct and divergent inter-evaluator response patterns. Conclusions were that there was only weak overall agreement in evaluation of the mold sampling data, that particular traits of the data were associated with the conclusions reached, and that there were substantial inter-evaluator differences that were likely due to differences in the personal decision criteria employed by the individual evaluators. The overall conclusion was that there is a need for additional work to rigorously explore the constellation of decision criteria
Mining gene expression data by interpreting principal components
Roden, Joseph C; King, Brandon W; Trout, Diane; Mortazavi, Ali; Wold, Barbara J; Hart, Christopher E
2006-01-01
Background There are many methods for analyzing microarray data that group together genes having similar patterns of expression over all conditions tested. However, in many instances the biologically important goal is to identify relatively small sets of genes that share coherent expression across only some conditions, rather than all or most conditions as required in traditional clustering; e.g. genes that are highly up-regulated and/or down-regulated similarly across only a subset of conditions. Equally important is the need to learn which conditions are the decisive ones in forming such gene sets of interest, and how they relate to diverse conditional covariates, such as disease diagnosis or prognosis. Results We present a method for automatically identifying such candidate sets of biologically relevant genes using a combination of principal components analysis and information theoretic metrics. To enable easy use of our methods, we have developed a data analysis package that facilitates visualization and subsequent data mining of the independent sources of significant variation present in gene microarray expression datasets (or in any other similarly structured high-dimensional dataset). We applied these tools to two public datasets, and highlight sets of genes most affected by specific subsets of conditions (e.g. tissues, treatments, samples, etc.). Statistically significant associations for highlighted gene sets were shown via global analysis for Gene Ontology term enrichment. Together with covariate associations, the tool provides a basis for building testable hypotheses about the biological or experimental causes of observed variation. Conclusion We provide an unsupervised data mining technique for diverse microarray expression datasets that is distinct from major methods now in routine use. In test uses, this method, based on publicly available gene annotations, appears to identify numerous sets of biologically relevant genes. It has proven especially
Influence of heterogeneity on the interpretation of pumping test data in leaky aquifers
NASA Astrophysics Data System (ADS)
Copty, Nadim K.; Trinchero, Paolo; Sanchez-Vila, Xavier; Sarioglu, Murat Savas; Findikakis, Angelos N.
2008-11-01
Pumping tests are routinely interpreted from the analysis of drawdown data and their derivatives. These interpretations result in a small number of apparent parameter values which lump the underlying heterogeneous structure of the aquifer. Key questions in such interpretations are (1) what is the physical meaning of those lumped parameters and (2) whether it is possible to infer some information about the spatial variability of the hydraulic parameters. The system analyzed in this paper consists of an aquifer separated from a second recharging aquifer by means of an aquitard. The natural log transforms of the transmissivity, ln T, and the vertical conductance of the aquitard, ln C, are modeled as two independent second-order stationary spatial random functions (SRFs). The Monte Carlo approach is used to simulate the time-dependent drawdown at a suite of observation points for different values of the statistical parameters defining the SRFs. Drawdown data at each observation point are independently used to estimate hydraulic parameters using three existing methods: (1) the inflection-point method, (2) curve-fitting, and (3) the double inflection-point method. The resulting estimated parameters are shown to be space dependent and vary with the interpretation method since each method gives different emphasis to different parts of the time-drawdown data. Moreover, the heterogeneity in the pumped aquifer or the aquitard influences the estimates in distinct manners. Finally, we show that, by combining the parameter estimates obtained from the different analysis procedures, information about the heterogeneity of the leaky aquifer system may be inferred.
Estimation of global network statistics from incomplete data.
Bliss, Catherine A; Danforth, Christopher M; Dodds, Peter Sheridan
2014-01-01
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week. PMID:25338183
ERIC Educational Resources Information Center
Lipsey, Mark W.; Puzio, Kelly; Yun, Cathy; Hebert, Michael A.; Steinka-Fry, Kasia; Cole, Mikel W.; Roberts, Megan; Anthony, Karen S.; Busick, Matthew D.
2012-01-01
This paper is directed to researchers who conduct and report education intervention studies. Its purpose is to stimulate and guide them to go a step beyond reporting the statistics that emerge from their analysis of the differences between experimental groups on the respective outcome variables. With what is often very minimal additional effort,…
Data flow language and interpreter for a reconfigurable distributed data processor
Hurt, A.D.; Heath, J.R.
1982-01-01
An analytic language and an interpreter are proposed whereby an application's data flow graph may serve as input to a reconfigurable distributed data processor. The architecture considered consists of a number of loosely coupled computing elements (CEs) which may be linked to data and file memories through fully nonblocking interconnect networks. The real-time performance of such an architecture depends upon its ability to alter its topology in response to changes in application, asynchronous data rates and faults. Such a data flow language enhances the versatility of a reconfigurable architecture by allowing the user to specify the machine's topology at a very high level. 11 references.
Interpreting School Satisfaction Data from a Marketing Perspective.
ERIC Educational Resources Information Center
Pandiani, John A.; James, Brad C.; Banks, Steven M.
This paper presents results of a customer satisfaction survey of Vermont elementary and secondary public schools concerning satisfaction with mental health services during the 1996-97 school year. Responses from completed questionnaires (N=233) are interpreted from a marketing perspective. Findings are reported for: (1) treated prevalence of…
Interpreting Physiological Data from Riparian Vegetation: Cautions and Complications
John G. Williams
1989-01-01
Water potential and stomatal conductance are important indicators of the response of vegetation to manipulations of riparian systems. However, interpretation of measurements of these variables is not always straightforward. An extensive monitoring program along the Carmel River in central California, carried out by the Monterey Peninsula Water Management District,...
Evaluation Design Project: Multilevel Interpretation of Evaluation Data Study.
ERIC Educational Resources Information Center
Miller, M. David; Burstein, Leigh
Two studies are presented in this report. The first is titled "Empirical Studies of Multilevel Approaches to Test Development and Interpretation: Measuring Between-Group Differences in Instruction." Because of a belief that schooling does affect student achievement, researchers have questioned the empirical and measurement techniques…
Nonparametric statistical testing of EEG- and MEG-data.
Maris, Eric; Oostenveld, Robert
2007-08-15
In this paper, we show how ElectroEncephaloGraphic (EEG) and MagnetoEncephaloGraphic (MEG) data can be analyzed statistically using nonparametric techniques. Nonparametric statistical tests offer complete freedom to the user with respect to the test statistic by means of which the experimental conditions are compared. This freedom provides a straightforward way to solve the multiple comparisons problem (MCP), and it allows one to incorporate biophysically motivated constraints in the test statistic, which may drastically increase the sensitivity of the statistical test. The paper is written for two audiences: (1) empirical neuroscientists looking for the most appropriate data analysis method, and (2) methodologists interested in the theoretical concepts behind nonparametric statistical tests. For the empirical neuroscientist, a large part of the paper is written in a tutorial-like fashion, enabling neuroscientists to construct their own statistical test, maximizing the sensitivity to the expected effect. For the methodologist, we explain why the nonparametric test is formally correct. This means that we formulate a null hypothesis (identical probability distribution in the different experimental conditions) and show that the nonparametric test controls the false alarm rate under this null hypothesis.
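The core idea, that under the null hypothesis of identical distributions the condition labels are exchangeable, can be sketched as a permutation test on a single statistic. This sketch uses the absolute difference in means for two independent sets of trials. The cluster-based statistics used to handle multiple comparisons across sensors and time points are not implemented, and the function name and the `n_perm` and `seed` parameters are illustrative choices.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in means.

    Under the null hypothesis (same distribution in both conditions), the
    condition labels are exchangeable, so we reshuffle them at random.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Counting the observed labelling keeps the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


print(permutation_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4]))  # small, well below 0.05
```

Because the statistic is chosen freely, the same loop would accept a cluster-level statistic in place of the difference in means; that freedom is what the abstract points to.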
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic that requires no within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data.
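For orientation, the ordinary Cohen's kappa point estimate for two procedures rating the same items is sketched below, with kappa = (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e the agreement expected from the marginal frequencies. It does not implement the cluster-adjusted variance estimator the paper proposes. The function name is hypothetical, and category labels may be any hashable values.

```python
from collections import Counter


def cohens_kappa(r1, r2):
    """Cohen's kappa for two procedures rating the same n items."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(r1)
    p_o = sum(x == y for x, y in zip(r1, r2)) / n             # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n**2  # chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```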
Imputing historical statistics, soils information, and other land-use data to crop area
NASA Technical Reports Server (NTRS)
Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.
1982-01-01
In foreign crop condition monitoring, satellite-acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., map grid cells that, at 60 deg latitude, each represent an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.
Using Carbon Emissions Data to "Heat Up" Descriptive Statistics
ERIC Educational Resources Information Center
Brooks, Robert
2012-01-01
This article illustrates using carbon emissions data in an introductory statistics assignment. The carbon emissions data has desirable characteristics including: choice of measure; skewness; and outliers. These complexities allow research and public policy debate to be introduced. (Contains 4 figures and 2 tables.)
Data Warehousing: How To Make Your Statistics Meaningful.
ERIC Educational Resources Information Center
Flaherty, William
2001-01-01
Examines how one school district found a way to turn data collection from a disparate mountain of statistics into more useful information by using their Instructional Decision Support System. System software is explained as is how the district solved some data management challenges. (GR)
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2012 CFR
2012-10-01
... studies, so as to be available upon request. In the case of experimental analyses, a clear and complete... adjustments, if any, to observed data shall be described. In the case of every kind of statistical study, the... input data shall be made available. (b) In the case of all studies and analyses offered in evidence...
Using Carbon Emissions Data to "Heat Up" Descriptive Statistics
ERIC Educational Resources Information Center
Brooks, Robert
2012-01-01
This article illustrates using carbon emissions data in an introductory statistics assignment. The carbon emissions data has desirable characteristics including: choice of measure; skewness; and outliers. These complexities allow research and public policy debate to be introduced. (Contains 4 figures and 2 tables.)
Advanced Statistical and Data Analysis Tools for Astrophysics
NASA Technical Reports Server (NTRS)
Kashyap, V.; Scargle, Jeffrey D. (Technical Monitor)
2001-01-01
The goal of the project is to obtain, derive, and develop statistical and data analysis tools that would be of use in the analyses of high-resolution, high-sensitivity data that are becoming available with new instruments. This is envisioned as a cross-disciplinary effort with a number of collaborators.
Data Warehousing: How To Make Your Statistics Meaningful.
ERIC Educational Resources Information Center
Flaherty, William
2001-01-01
Examines how one school district found a way to turn data collection from a disparate mountain of statistics into more useful information by using their Instructional Decision Support System. System software is explained as is how the district solved some data management challenges. (GR)
Smolders, R.; Den Hond, E.; Koppen, G.; Govarts, E.; Willems, H.; Casteleyn, L.; Kolossa-Gehring, M.; Fiddicke, U.; Castaño, A.; Koch, H.M.; Angerer, J.; Esteban, M.; Sepai, O.; Exley, K.; Bloemen, L.; Horvat, M.; Knudsen, L.E.; Joas, A.; Joas, R.; Biot, P.; and others
2015-08-15
In 2011 and 2012, the COPHES/DEMOCOPHES twin projects performed the first ever harmonized human biomonitoring survey in 17 European countries. In more than 1800 mother–child pairs, individual lifestyle data were collected and cadmium, cotinine and certain phthalate metabolites were measured in urine. Total mercury was determined in hair samples. While the main goal of the COPHES/DEMOCOPHES twin projects was to develop and test harmonized protocols and procedures, the goal of the current paper is to investigate whether the observed differences in biomarker values among the countries implementing DEMOCOPHES can be interpreted using information from external databases on environmental quality and lifestyle. In general, 13 countries having implemented DEMOCOPHES provided high-quality data from external sources that were relevant for interpretation purposes. However, some data were not available for reporting or were not in line with predefined specifications. Therefore, only part of the external information could be included in the statistical analyses. Nonetheless, there was a highly significant correlation between national levels of fish consumption and mercury in hair, the strength of antismoking legislation was significantly related to urinary cotinine levels, and we were able to show indications that also urinary cadmium levels were associated with environmental quality and food quality. These results again show the potential of biomonitoring data to provide added value for (the evaluation of) evidence-informed policy making. - Highlights: • External data was collected to interpret HBM data from DEMOCOPHES. • Hg in hair could be related to fish consumption across different countries. • Urinary cotinine was related to strictness of anti-smoking legislation. • Urinary Cd was borderline significantly related to air and food quality. • Lack of comparable data among countries hampered the analysis.
Cho, Yunju; Ahmed, Arif; Islam, Annana; Kim, Sunghwan
2015-01-01
Because of the increasing importance of heavy and unconventional crude oil as an energy source, there is a growing need for petroleomics: the pursuit of more complete and detailed knowledge of the chemical compositions of crude oil. Crude oil has an extremely complex nature; hence, techniques with ultra-high resolving capabilities, such as Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), are necessary. FT-ICR MS has been successfully applied to the study of heavy and unconventional crude oils such as bitumen and shale oil. However, the analysis of crude oil with FT-ICR MS is not trivial, and it has pushed analysis to the limits of instrumental and methodological capabilities. For example, high-resolution mass spectra of crude oils may contain over 100,000 peaks that require interpretation. To visualize large data sets more effectively, data processing methods such as Kendrick mass defect analysis and statistical analyses have been developed. The successful application of FT-ICR MS to the study of crude oil has been critically dependent on key developments in FT-ICR MS instrumentation and data processing methods. This review offers an introduction to the basic principles, FT-ICR MS instrumentation development, ionization techniques, and data interpretation methods for petroleomics and is intended for readers having no prior experience in this field of study. © 2014 Wiley Periodicals, Inc.
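Kendrick mass defect analysis, mentioned above as a way to visualize very large peak lists, re-scales measured masses so that members of a homologous series differing only by CH2 units share the same mass defect. A minimal Python sketch, assuming the common CH2-based Kendrick scale and a nominal Kendrick mass taken as the nearest integer (some workflows use the ceiling instead, which shifts every defect by a constant):

```python
# Minimal sketch of Kendrick mass defect (KMD) on the CH2 scale.
# Assumption: nominal Kendrick mass = nearest integer (round); conventions vary.

CH2_EXACT = 14.01565    # monoisotopic mass of CH2 (Da)
CH2_NOMINAL = 14.0      # CH2 is assigned exactly 14 on the Kendrick scale


def kendrick_mass(mass: float) -> float:
    """Convert an IUPAC (12C = 12) mass to the Kendrick (CH2 = 14) scale."""
    return mass * CH2_NOMINAL / CH2_EXACT


def kendrick_mass_defect(mass: float) -> float:
    """Nominal Kendrick mass minus exact Kendrick mass."""
    km = kendrick_mass(mass)
    return round(km) - km


if __name__ == "__main__":
    # Hypothetical homologous series: each member adds one CH2 unit.
    base = 300.2817
    for k in range(4):
        m = base + k * CH2_EXACT
        print(f"{m:10.5f}  KM={kendrick_mass(m):10.5f}  KMD={kendrick_mass_defect(m):.5f}")
```

Because each added CH2 contributes exactly 14 on the Kendrick scale, all members of the series print the same KMD, which is why plotting KMD against nominal mass lines homologues up horizontally.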
Weiß, Verena; Schmidt, Matthias; Hellmich, Martin
2015-01-01
Introduction: For survival data, the coefficient of determination cannot be used to describe how well a model fits the data. Therefore, several measures of explained variation for survival data have been proposed in recent years. Methods: We analyse an existing measure of explained variation with regard to minimisation aspects and demonstrate that these are not fulfilled for the measure. Results: In analogy to the least squares method from linear regression analysis, we develop a novel measure for categorical covariates that is based only on the Kaplan-Meier estimator. Hence, the novel measure is a completely nonparametric measure with an easy graphical interpretation. For the novel measure, different weighting possibilities are available and a statistical test of significance can be performed. Finally, we apply the novel measure and further measures of explained variation to a dataset comprising persons with a histopathological papillary thyroid carcinoma. Conclusion: We propose a novel measure of explained variation with a comprehensible derivation as well as a graphical interpretation, which may be used in further analyses with survival data. PMID:26550007
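The proposed measure is built only from Kaplan-Meier estimates (for example, one curve per covariate category). The measure itself is not reproduced here; the following is a minimal Python sketch of the Kaplan-Meier product-limit estimator it starts from, assuming the usual convention that subjects censored at an event time are still counted in the risk set at that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    Returns a list of (event_time, S(t)) at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still at risk just before t (censored at t count as at risk).
        at_risk = sum(1 for s in times if s >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


if __name__ == "__main__":
    # Hypothetical data for one covariate category.
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Running the estimator separately on each category of a categorical covariate gives the group-wise step curves from which a nonparametric measure of explained variation can be assembled.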
Advanced petrophysical interpretation of nuclear well logging data
NASA Astrophysics Data System (ADS)
Kozhevnikov, D. A.; Lazutkina, N. Ye.
1995-04-01
A new approach to rock component analyses using “adaptive petrophysical tuning” provides three fundamentally new benefits: an original method for interpreting well logs, an algorithm for adaptive tuning, and a reliable method of isolating reservoirs within a section. The latter can be regarded as a kind of “petrophysical filtration” based on using the dynamic porosity. Some results of component analyses of terrigenous deposits of the Tyumen suite (West Siberia) are presented.
ERIC Educational Resources Information Center
Olsen, Robert J.
2008-01-01
I describe how data pooling and data visualization can be employed in the first-semester general chemistry laboratory to introduce core statistical concepts such as central tendency and dispersion of a data set. The pooled data are plotted as a 1-D scatterplot, a purpose-designed number line through which statistical features of the data are…
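The statistics named above (central tendency and dispersion of a pooled data set) and a 1-D scatterplot can be sketched in a few lines of Python. This is an illustration under assumed, hypothetical data, not the lab exercise described in the article; the text "dot plot" is a stand-in for the purpose-designed number line.

```python
import statistics
from collections import Counter


def summarize(pooled):
    """Central tendency and dispersion of a pooled class data set."""
    return {
        "n": len(pooled),
        "mean": statistics.mean(pooled),
        "median": statistics.median(pooled),
        "stdev": statistics.stdev(pooled),   # sample standard deviation
        "range": max(pooled) - min(pooled),
    }


def dot_plot(values, decimals=2):
    """Crude 1-D text scatterplot: one row per distinct rounded value."""
    counts = Counter(round(v, decimals) for v in values)
    return "\n".join(f"{v:.{decimals}f} | {'*' * counts[v]}" for v in sorted(counts))


if __name__ == "__main__":
    # Hypothetical pooled measurements from several lab sections.
    data = [2.68, 2.71, 2.70, 2.65, 2.74, 2.69, 2.90, 2.70]
    for key, value in summarize(data).items():
        print(key, value)
    print(dot_plot(data))
```

An outlying value such as 2.90 stands apart on the dot plot and pulls the mean more than the median, which is the kind of feature a pooled class plot makes visible.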
A Flexible Approach for the Statistical Visualization of Ensemble Data
Potter, K.; Wilson, A.; Bremer, P.; Williams, Dean N.; Pascucci, V.; Johnson, C.
2009-09-29
Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.
Estimation of context for statistical classification of multispectral image data
NASA Technical Reports Server (NTRS)
Tilton, J. C.; Vardeman, S. B.; Swain, P. H.
1982-01-01
Recent investigations have demonstrated the effectiveness of a contextual classifier that combines spatial and spectral information employing a general statistical approach. This statistical classification algorithm exploits the tendency of certain ground cover classes to occur more frequently in some spatial contexts than in others. Indeed, a key input to this algorithm is a statistical characterization of the context: the context function. An unbiased estimator of the context function is discussed which, besides having the advantage of statistical unbiasedness, has the additional advantage over other estimation techniques of being amenable to an adaptive implementation in which the context-function estimate varies according to local contextual information. Results from applying the unbiased estimator to the contextual classification of three real Landsat data sets are presented and contrasted with results from noncontextual classifications and from contextual classifications utilizing other context-function estimation techniques.
Data analysis using the Gnu R system for statistical computation
Simone, James (Fermilab)
2011-07-01
R is a language system for statistical computation. It is widely used in statistics, bioinformatics, machine learning, data mining, quantitative finance, and the analysis of clinical drug trials. Among the advantages of R are: it has become the standard language for developing statistical techniques, it is being actively developed by a large and growing global user community, it is open source software, it is highly portable (Linux, OS-X and Windows), it has a built-in documentation system, it produces high quality graphics and it is easily extensible with over four thousand extension library packages available covering statistics and applications. This report gives a very brief introduction to R with some examples using lattice QCD simulation results. It then discusses the development of R packages designed for chi-square minimization fits for lattice n-pt correlation functions.
Statistics for correlated data: phylogenies, space, and time.
Ives, Anthony R; Zhu, Jun
2006-02-01
Here we give an introduction to the growing number of statistical techniques for analyzing data that are not independent realizations of the same sampling process--in other words, correlated data. We focus on regression problems, in which the value of a given variable depends linearly on the value of another variable. To illustrate different types of processes leading to correlated data, we analyze four simulated examples representing diverse problems arising in ecological studies. The first example is a comparison among species to determine the relationship between home-range area and body size; because species are phylogenetically related, they do not represent independent samples. The second example addresses spatial variation in net primary production and how this might be affected by soil nitrogen; because nearby locations are likely to have similar net primary productivity for reasons other than soil nitrogen, spatial correlation is likely. In the third example, we consider a time-series model to ask whether the decrease in density of a butterfly species is the result of decreases in its host-plant density; because the population density of a species in one generation is likely to affect the density in the following generation, time-series data are often correlated. The fourth example combines both spatial and temporal correlation in an experiment in which prey densities are manipulated to determine the response of predators to their food supply. For each of these examples, we use a different statistical approach for analyzing models of correlated data. Our goal is to give an overview of conceptual issues surrounding correlated data, rather than a detailed tutorial in how to apply different statistical techniques. By dispelling some of the mystery behind correlated data, we hope to encourage ecologists to learn about statistics that could be useful in their own work. Although at first encounter these techniques might seem complicated, they have the power to
Statistical and methodological considerations for the interpretation of intranasal oxytocin studies
Walum, Hasse; Waldman, Irwin D.; Young, Larry J.
2015-01-01
Over the last decade, oxytocin (OT) has received focus in numerous studies associating intranasal administration of this peptide with various aspects of human social behavior. These studies in humans are inspired by animal research, especially in rodents, showing that central manipulations of the OT system affect behavioral phenotypes related to social cognition, including parental behavior, social bonding and individual recognition. Taken together, these studies in humans appear to provide compelling, but sometimes bewildering evidence for the role of OT in influencing a vast array of complex social cognitive processes in humans. In this paper we investigate to what extent the human intranasal OT literature lends support to the hypothesis that intranasal OT consistently influences a wide spectrum of social behavior in humans. We do this by considering statistical features of studies within this field, including factors like statistical power, pre-study odds and bias. Our conclusion is that intranasal OT studies are generally underpowered and that there is a high probability that most of the published intranasal OT findings do not represent true effects. Thus the remarkable reports that intranasal OT influences a large number of human social behaviors should be viewed with healthy skepticism, and we make recommendations to improve the reliability of human OT studies in the future. PMID:26210057
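Statistical power, pre-study odds and bias combine into the probability that a published significant finding reflects a true effect. One standard way to express this is the positive predictive value (PPV) formula popularized by Ioannidis (2005); whether Walum et al. use exactly this parameterization is an assumption here. A minimal Python sketch:

```python
def positive_predictive_value(power, alpha, prior_odds, bias=0.0):
    """Post-study probability that a claimed ('significant') finding is true.

    power      -- 1 - beta, probability of detecting a true effect
    alpha      -- type I error rate
    prior_odds -- R, ratio of true to null relationships probed in the field
    bias       -- u, proportion of analyses that would not otherwise have been
                  'findings' but are reported as such because of bias
    """
    beta = 1.0 - power
    R, u = prior_odds, bias
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical scenarios: well-powered vs underpowered studies, same field.
    for power in (0.8, 0.2):
        print(power, round(positive_predictive_value(power, 0.05, 0.25), 3))
```

With no bias the formula reduces to (1 - beta)R / ((1 - beta)R + alpha), so halving power roughly halves the PPV when alpha is small relative to (1 - beta)R; with complete bias (u = 1) it collapses to the prior probability R / (R + 1), i.e. the study adds no information.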
Method of interpretation of remotely sensed data and applications to land use
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Dossantos, A. P.; Foresti, C.; Demoraesnovo, E. M. L.; Niero, M.; Lombardo, M. A.
1981-01-01
Instructional material describing a methodology of remote sensing data interpretation and examples of applications to land use survey are presented. The image interpretation elements are discussed for different types of sensor systems: aerial photographs, radar, and MSS/LANDSAT. Visual and automatic LANDSAT image interpretation is emphasized.
Online Updating of Statistical Inference in the Big Data Setting.
Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
2016-01-01
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
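For the linear model, the core idea of online updating can be illustrated by storing only the running sums X'X and X'y: after every block the estimate equals the full-data least-squares fit, and no historical rows are kept. This Python sketch shows only that basic idea; it is not the authors' estimating-equation estimator or their predictive residual tests, and the class name is hypothetical.

```python
import numpy as np


class OnlineLeastSquares:
    """Cumulative least squares that updates as data blocks arrive.

    Only the p x p matrix X'X and the p-vector X'y are stored, so raw
    historical blocks can be discarded after each update.
    """

    def __init__(self, n_features):
        self.xtx = np.zeros((n_features, n_features))
        self.xty = np.zeros(n_features)
        self.n = 0

    def update(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.xtx += X.T @ X
        self.xty += X.T @ y
        self.n += X.shape[0]

    def coef(self):
        # A single block may be rank deficient (e.g. a rare binary covariate
        # that is all zeros in that block); only the accumulated X'X needs to
        # be invertible. pinv returns the ordinary solution when it is.
        return np.linalg.pinv(self.xtx) @ self.xty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    beta_true = np.array([1.0, -2.0, 0.5])
    model = OnlineLeastSquares(3)
    for _ in range(5):  # five hypothetical data blocks arriving in a stream
        X = np.column_stack([np.ones(200), rng.normal(size=200),
                             rng.binomial(1, 0.01, 200)])
        y = X @ beta_true + rng.normal(scale=0.1, size=200)
        model.update(X, y)
    print(model.n, model.coef())
```

Accumulating sufficient statistics rather than per-block estimates is what lets a rare-event covariate be all zeros in some blocks without breaking the update.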
A note on the kappa statistic for clustered dichotomous data.
Zhou, Ming; Yang, Zhao
2014-06-30
The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾ 50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research dataset and two simulated clustered physician-patients dichotomous datasets are analyzed.
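For reference, Cohen's kappa for two raters with dichotomous ratings, and a cluster bootstrap variance of the kind the abstract describes as an existing method, can be sketched in Python as follows. Resampling whole clusters (e.g. all patients of one physician) keeps the within-cluster dependence; the authors' semi-parametric delta-method estimator is not reproduced here, and dropping degenerate resamples is an assumption of this sketch.

```python
import math
import random


def cohen_kappa(pairs):
    """Cohen's kappa for two raters giving dichotomous (0/1) ratings."""
    n = len(pairs)
    p_obs = sum(a == b for a, b in pairs) / n
    p_a = sum(a for a, _ in pairs) / n        # rater A's proportion of 1s
    p_b = sum(b for _, b in pairs) / n        # rater B's proportion of 1s
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_exp == 1.0:                          # kappa undefined (no variation)
        return math.nan
    return (p_obs - p_exp) / (1 - p_exp)


def cluster_bootstrap_variance(clusters, n_boot=2000, seed=0):
    """Variance of kappa by resampling whole clusters with replacement.

    clusters -- list of clusters, each a list of (rating_a, rating_b) pairs
    Resamples where kappa is undefined are skipped.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resample = [rng.choice(clusters) for _ in clusters]
        k = cohen_kappa([pair for cluster in resample for pair in cluster])
        if not math.isnan(k):
            stats.append(k)
    mean = sum(stats) / len(stats)
    return sum((k - mean) ** 2 for k in stats) / (len(stats) - 1)
```

Pooling all pairs and using the ordinary (independent-pairs) variance would ignore the dependence within clusters, which the abstract reports as generally inappropriate.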
Multilevel statistical models and the analysis of experimental data.
Behm, Jocelyn E; Edmonds, Devin A; Harmon, Jason P; Ives, Anthony R
2013-07-01
Data sets from ecological experiments can be difficult to analyze, due to lack of independence of experimental units and complex variance structures. In addition, information of interest may lie in complicated contrasts among treatments, rather than direct output from statistical tests. Here, we present a statistical framework for analyzing data sets containing non-independent experimental units and differences in variance among treatments (heteroscedasticity) and apply this framework to experimental data on interspecific competition among three tadpole species. Our framework involves three steps: (1) use a multilevel regression model to calculate coefficients of treatment effects on response variables; (2) combine coefficients to quantify the strength of competition (the target information of our experiment); and (3) use parametric bootstrapping to calculate significance of competition strengths. We repeated this framework using three multilevel regression models to analyze data at the level of individual tadpoles, at the replicate level, and at the replicate level accounting for heteroscedasticity. Comparing results shows the need to correctly specify the statistical model, with the model that accurately accounts for heteroscedasticity leading to different conclusions from the other two models. This approach gives a single, comprehensive analysis of experimental data that can be used to extract informative biological parameters in a statistically rigorous way.
Simpson's Paradox in the Interpretation of "Leaky Pipeline" Data
ERIC Educational Resources Information Center
Walton, Paul H.; Walton, Daniel J.
2016-01-01
The traditional "leaky pipeline" plots are widely used to inform gender equality policy and practice. Herein, we demonstrate how a statistical phenomenon known as Simpson's paradox can obscure trends in gender "leaky pipeline" plots. Our approach has been to use Excel spreadsheets to generate hypothetical "leaky…
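The authors built hypothetical plots in Excel spreadsheets; the following Python sketch, with made-up counts that are not from the paper, only illustrates Simpson's paradox itself: one group can have the higher success rate within every subgroup yet the lower rate once the subgroups are pooled.

```python
# Hypothetical counts (advanced, total) for two departments; not from the paper.
COUNTS = {
    "dept_X": {"women": (9, 10), "men": (80, 100)},
    "dept_Y": {"women": (30, 100), "men": (2, 10)},
}


def rate(advanced, total):
    return advanced / total


def pooled_rate(counts, group):
    """Success rate for `group` after summing counts over all departments."""
    advanced = sum(c[group][0] for c in counts.values())
    total = sum(c[group][1] for c in counts.values())
    return advanced / total


if __name__ == "__main__":
    for dept, c in COUNTS.items():
        print(dept, "women", rate(*c["women"]), "men", rate(*c["men"]))
    print("pooled", "women", pooled_rate(COUNTS, "women"),
          "men", pooled_rate(COUNTS, "men"))
```

Here women have the higher rate in each department (0.9 vs 0.8, and 0.3 vs 0.2) but the lower pooled rate (39/110 vs 82/110), because most women are in the department with the lower overall rate.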
NASA Astrophysics Data System (ADS)
Chung, Jung R.; DeLaughter, Aimee H.; Baba, Justin S.; Spiegelman, Clifford H.; Amoss, M. S.; Cote, Gerard L.
2003-07-01
The Mueller matrix describes all the polarizing properties of a sample, and therefore the optical differences between cancerous and non-cancerous tissue should be present within the matrix elements. We present in this paper the Mueller matrices of three types of tissue (normal, benign mole, and malignant melanoma) on a Sinclair swine model. Feature extraction on the Mueller matrix elements yields retardance images, diattenuation images, and depolarization images. These images are analyzed in an attempt to determine the important factors for the identification of cancerous lesions from their benign counterparts. In addition, the extracted features are analyzed using statistical processing to develop an accurate classification scheme and to identify the importance of each parameter in the determination of cancerous versus non-cancerous tissue.
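Two of the feature types named above can be computed directly from each pixel's 4x4 Mueller matrix. This Python sketch uses the standard diattenuation D and, as an assumed stand-in for the paper's unspecified depolarization metric, the Gil-Bernabeu depolarization index; retardance requires a polar (e.g. Lu-Chipman) decomposition and is omitted.

```python
import numpy as np


def diattenuation(M):
    """Diattenuation D = sqrt(m01^2 + m02^2 + m03^2) / m00."""
    M = np.asarray(M, dtype=float)
    return float(np.linalg.norm(M[0, 1:]) / M[0, 0])


def depolarization_index(M):
    """Gil-Bernabeu depolarization index: 1 = non-depolarizing, 0 = ideal depolarizer."""
    M = np.asarray(M, dtype=float)
    return float(np.sqrt((M ** 2).sum() - M[0, 0] ** 2) / (np.sqrt(3.0) * M[0, 0]))


def feature_images(mueller_stack):
    """Apply both metrics pixel-wise to an (H, W, 4, 4) stack of Mueller matrices."""
    H, W = mueller_stack.shape[:2]
    D = np.empty((H, W))
    DI = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            D[i, j] = diattenuation(mueller_stack[i, j])
            DI[i, j] = depolarization_index(mueller_stack[i, j])
    return D, DI
```

The resulting D and DI images are the kind of per-pixel features that a statistical classifier could then be trained on.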
Statistical interpretation of transient current power-law decay in colloidal quantum dot arrays
NASA Astrophysics Data System (ADS)
Sibatov, R. T.
2011-08-01
A new statistical model of the charge transport in colloidal quantum dot arrays is proposed. It takes into account Coulomb blockade forbidding multiple occupancy of nanocrystals and the influence of energetic disorder of interdot space. The model explains power-law current transients and the presence of the memory effect. The fractional differential analogue of Ohm's law is found phenomenologically for nanocrystal arrays. The model combines ideas that were considered as conflicting by other authors: the Scher-Montroll idea about the power-law distribution of waiting times in localized states for disordered semiconductors is applied taking into account Coulomb blockade; Novikov's condition about the asymptotic power-law distribution of time intervals between successful current pulses in conduction channels is fulfilled; and the carrier injection blocking predicted by Ginger and Greenham (2000 J. Appl. Phys. 87 1361) takes place.
NASA Astrophysics Data System (ADS)
Samfira, Ionel; Boldea, Marius; Popescu, Cosmin
2012-09-01
Significant parameters of permanent grasslands are the pastoral value and the Shannon and Simpson biodiversity indices. The dynamics of these parameters have been studied in several plant associations in the Banat Plain, Romania. In terms of typology, these permanent grasslands belong to the steppe area, series Festuca pseudovina, type Festuca pseudovina-Achillea millefolium, subtype Lolium perenne. The methods used in this research included plant cover analysis (double meter method, calculation of Shannon and Simpson indices) and statistical methods of regression and correlation. The results show that, in the permanent grasslands of the plain region, interspecific biodiversity increases when the pastoral value is average to low.
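The two biodiversity indices can be computed from per-species abundances (for instance, species contacts counted along the double meter; that input format is an assumption here). Simpson's index has several conventions; this Python sketch uses the Gini-Simpson form 1 - sum(p_i^2), which may differ from the form used in the study.

```python
import math


def shannon_index(abundances):
    """Shannon index H' = -sum(p_i * ln p_i) over species with p_i > 0."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def simpson_index(abundances):
    """Gini-Simpson form 1 - sum(p_i^2); higher means more diverse."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)


if __name__ == "__main__":
    # Hypothetical species abundances for one sampled plot.
    plot = [40, 25, 15, 10, 5, 5]
    print(round(shannon_index(plot), 3), round(simpson_index(plot), 3))
```

For S equally abundant species the Shannon index reaches its maximum ln(S) and the Gini-Simpson index reaches 1 - 1/S, so both grow with richness and evenness.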
Methodology of remote sensing data interpretation and geological applications. [Brazil]
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Veneziani, P.; Dosanjos, C. E.
1982-01-01
Elements of photointerpretation discussed include the analysis of photographic texture and structure as well as film tonality. The method used is based on conventional techniques developed for interpreting aerial black and white photographs. By defining the properties which characterize the form and individuality of dual images, homologous zones can be identified. Guy's logic method (1966) was adapted and used on functions of resolution, scale, and spectral characteristics of remotely sensed products. Applications of LANDSAT imagery are discussed for regional geological mapping, mineral exploration, hydrogeology, and geotechnical engineering in Brazil.
Statistical methods for handling unwanted variation in metabolomics data.
De Livera, Alysha M; Sysi-Aho, Marko; Jacob, Laurent; Gagnon-Bartsch, Johann A; Castillo, Sandra; Simpson, Julie A; Speed, Terence P
2015-04-07
Metabolomics experiments are inevitably subject to a component of unwanted variation, due to factors such as batch effects, long runs of samples, and confounding biological variation. Although the removal of this unwanted variation is a vital step in the analysis of metabolomics data, it is considered a gray area in which there is a recognized need to develop a better understanding of the procedures and statistical methods required to achieve statistically relevant optimal biological outcomes. In this paper, we discuss the causes of unwanted variation in metabolomics experiments, review commonly used metabolomics approaches for handling this unwanted variation, and present a statistical approach for the removal of unwanted variation to obtain normalized metabolomics data. The advantages and performance of the approach relative to several widely used metabolomics normalization approaches are illustrated through two metabolomics studies, and recommendations are provided for choosing and assessing the most suitable normalization method for a given metabolomics experiment. Software for the approach is made freely available.
Statistical data for the tensile properties of natural fibre composites.
Torres, J P; Vandi, L-J; Veidt, M; Heitzmann, M T
2017-06-01
This article features a large statistical database on the tensile properties of natural fibre reinforced composite laminates. The data presented here corresponds to a comprehensive experimental testing program of several composite systems including: different material constituents (epoxy and vinyl ester resins; flax, jute and carbon fibres), different fibre configurations (short-fibre mats, unidirectional, and plain, twill and satin woven fabrics) and different fibre orientations (0°, 90°, and [0,90] angle plies). For each material, ~50 specimens were tested under uniaxial tensile loading. Here, we provide the complete set of stress-strain curves together with the statistical distributions of their calculated elastic modulus, strength and failure strain. The data is also provided as support material for the research article: "The mechanical properties of natural fibre composite laminates: A statistical study" [1].
A COMPREHENSIVE STATISTICALLY-BASED METHOD TO INTERPRET REAL-TIME FLOWING MEASUREMENTS
Pinan Dawkrajai; Analis A. Romero; Keita Yoshioka; Ding Zhu; A.D. Hill; Larry W. Lake
2004-10-01
In this project, we are developing new methods for interpreting measurements in complex wells (horizontal, multilateral and multi-branching wells) to determine the profiles of oil, gas, and water entry. These methods are needed to take full advantage of "smart" well instrumentation, a technology that is rapidly evolving to provide the ability to continuously and permanently monitor downhole temperature, pressure, volumetric flow rate, and perhaps other fluid flow properties at many locations along a wellbore, and hence to control and optimize well performance. In this first year, we have made considerable progress in the development of the forward model of temperature and pressure behavior in complex wells. This forward problem has three major parts: the temperature and pressure behavior in the reservoir near the wellbore, in the wellbore or laterals in the producing intervals, and in the build sections connecting the laterals. Many models exist to predict pressure behavior in reservoirs and wells, but these are almost always isothermal models. To predict temperature behavior, we derived general mass, momentum, and energy balance equations for these parts of the complex well system. Analytical solutions for the reservoir and wellbore parts for certain special conditions show the magnitude of thermal effects that could occur. Our preliminary sensitivity analyses show that thermal effects caused by near-wellbore reservoir flow can cause temperature changes that are measurable with smart well technology. This is encouraging for the further development of the inverse model.
A Comprehensive Statistically-Based Method to Interpret Real-Time Flowing Measurements
Keita Yoshioka; Pinan Dawkrajai; Analis A. Romero; Ding Zhu; A. D. Hill; Larry W. Lake
2007-01-15
With the recent development of temperature measurement systems, continuous temperature profiles can be obtained with high precision. Small temperature changes can be detected by modern temperature measuring instruments such as fiber optic distributed temperature sensors (DTS) in intelligent completions and will potentially aid the diagnosis of downhole flow conditions. In vertical wells, since elevational geothermal changes make the wellbore temperature sensitive to the amount and the type of fluids produced, temperature logs can be used successfully to diagnose the downhole flow conditions. However, because geothermal temperature changes along the wellbore are small in horizontal wells, temperature logs become difficult to interpret. The primary temperature differences for each phase (oil, water, and gas) are caused by frictional effects. Therefore, in developing a thermal model for a horizontal wellbore, subtle temperature changes must be accounted for. In this project, we have rigorously derived governing equations for a producing horizontal wellbore and developed a prediction model of the temperature and pressure by coupling the wellbore and reservoir equations. Also, we applied Ramey's model (1962) to the build section and used an energy balance to infer the temperature profile at the junction. The multilateral wellbore temperature model was applied to a wide range of cases with varying fluid thermal properties, absolute values of temperature and pressure, geothermal gradients, flow rates from each lateral, and the trajectories of each build section. With the prediction models developed, we present inversion studies of synthetic and field examples. These results are essential to identify water or gas entry, to guide flow control devices in intelligent completions, and to decide if reservoir stimulation is needed in particular horizontal sections. This study will complete and validate these inversion studies.
A Comprehensive Statistically-Based Method to Interpret Real-Time Flowing Measurements
Pinan Dawkrajai; Keita Yoshioka; Analis A. Romero; Ding Zhu; A.D. Hill; Larry W. Lake
2005-10-01
This project is motivated by the increasing use of distributed temperature sensors for real-time monitoring of complex wells (horizontal, multilateral and multi-branching wells) to infer the profiles of oil, gas, and water entry. Measured information can be used to interpret flow profiles along the wellbore, including the junction and build section. In this second project year, we have completed a forward model to predict temperature and pressure profiles in complex wells. As a comprehensive temperature model, we have developed an analytical reservoir flow model that takes into account Joule-Thomson effects in the near-well vicinity and a multiphase, non-isothermal producing wellbore model, and we couple these models while accounting for mass and heat transfer between them. For further inferences such as water coning or gas evaporation, we will need a numerical non-isothermal reservoir simulator; unlike existing (thermal recovery, geothermal) simulators, it should capture the subtle temperature changes that occur in normal production. We will show the results from the analytical coupled model (analytical reservoir solution coupled with numerical multi-segment well model) to infer the anomalous temperature or pressure profiles under various conditions, and the preliminary results from the numerical coupled reservoir model, which solves the full matrix including wellbore grids. We applied Ramey's model to the build section and used an enthalpy balance to infer the temperature profile at the junction. The multilateral wellbore temperature model was applied to a wide range of cases with varying fluid thermal properties, absolute values of temperature and pressure, geothermal gradients, flow rates from each lateral, and the trajectories of each build section.
Interpreting and Reporting Radiological Water-Quality Data
McCurdy, David E.; Garbarino, John R.; Mullin, Ann H.
2008-01-01
This document provides information to U.S. Geological Survey (USGS) Water Science Centers on interpreting and reporting radiological results for samples of environmental matrices, most notably water. The information provided is intended to be broadly useful throughout the United States, but scientists who work at sites containing radioactive hazardous wastes should consult additional sources for more detailed information. The document is largely based on recognized national standards and guidance documents for radioanalytical sample processing, most notably the Multi-Agency Radiological Laboratory Analytical Protocols Manual (MARLAP), and on documents published by the U.S. Environmental Protection Agency and the American National Standards Institute. It does not include discussion of standard USGS practices, including field quality-control sample analysis, interpretive report policies, and related issues, all of which shall always be included in any effort by the Water Science Centers. The use of 'shall' in this report signifies a policy requirement of the USGS Office of Water Quality.
Statistical analysis and interpolation of compositional data in materials science.
Pesenson, Misha Z; Suram, Santosh K; Gregoire, John M
2015-02-09
Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
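Two basic tools of compositional data analysis are closure (rescaling parts to a constant sum) and log-ratio transforms such as the centred log-ratio (clr), which map the simplex to ordinary Euclidean coordinates. This Python sketch shows the standard Aitchison-style definitions and one consequence, subcompositional coherence of log-ratios; which transforms the authors actually use for interpolation is not specified here, and the sketch assumes strictly positive parts (zeros would need replacement first).

```python
import numpy as np


def closure(x, total=1.0):
    """Rescale parts so each composition (last axis) sums to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centred log-ratio transform: log of each part over the geometric mean."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)


if __name__ == "__main__":
    # Hypothetical ternary composition (atomic fractions of three elements).
    comp = closure([0.2, 0.3, 0.5])
    sub = closure(comp[:2])          # subcomposition of the first two elements
    print(clr(comp), clr(comp).sum())
    # The log-ratio of two parts is the same in the full composition and the
    # subcomposition, unlike raw fractions or their correlations.
    print(np.log(comp[0] / comp[1]), np.log(sub[0] / sub[1]))
```

Because clr coordinates are unchanged by rescaling, standard Euclidean tools (means, interpolation, regression) can be applied to them and the results mapped back to the simplex.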
Applications of spatial statistical network models to stream data
Daniel J. Isaak; Erin E. Peterson; Jay M. Ver Hoef; Seth J. Wenger; Jeffrey A. Falke; Christian E. Torgersen; Colin Sowder; E. Ashley Steel; Marie-Josee Fortin; Chris E. Jordan; Aaron S. Ruesch; Nicholas Som; Pascal. Monestiez
2014-01-01
Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for...
Statistical Modeling for Radiation Hardness Assurance: Toward Bigger Data
NASA Technical Reports Server (NTRS)
Ladbury, R.; Campola, M. J.
2015-01-01
New approaches to statistical modeling in radiation hardness assurance are discussed. These approaches yield quantitative bounds on flight-part radiation performance even in the absence of conventional data sources. This allows the analyst to bound radiation risk at all stages and for all decisions in the RHA process. It also allows optimization of RHA procedures for the project's risk tolerance.
Mississippi Community and Junior Colleges: Statistical Data, 1987-88.
ERIC Educational Resources Information Center
Moody, George V.; And Others
A brief description of Mississippi's 15 public community and junior colleges is provided in this report, along with statistical data for the 1987-88 academic year on enrollments, degrees and certificates awarded, revenues, expenditures, academic salary ranges, learning resources, transportation services, dormitory utilization, and auxiliary…
Data Desk Professional: Statistical Analysis for the Macintosh.
ERIC Educational Resources Information Center
Wise, Steven L.; Kutish, Gerald W.
This review of Data Desk Professional, a statistical software package for Macintosh microcomputers, includes information on: (1) cost and the amount and allocation of memory; (2) usability (documentation quality, ease of use); (3) running programs; (4) program output (quality of graphics); (5) accuracy; and (6) user services. In conclusion, it is…
Exploring Foundation Concepts in Introductory Statistics Using Dynamic Data Points
ERIC Educational Resources Information Center
Ekol, George
2015-01-01
This paper analyses introductory statistics students' verbal and gestural expressions as they interacted with a dynamic sketch (DS) designed using "Sketchpad" software. The DS involved numeric data points built on the number line whose values changed as the points were dragged along the number line. The study is framed on aggregate…
Statistical comparisons of AGDISP prediction with Mission III data
Baozhong Duan; Karl Mierzejewski; William G. Yendol
1991-01-01
Statistical comparisons of AGDISP predictions were made against data obtained during aerial spray field trials ("Mission III") conducted in March 1987 at the APHIS Facility, Moore Air Base, Edinburg, Texas, by the NEFAAT group (Northeast Forest Aerial Application Technology). Seven out of twenty-one runs were observed and predicted means (O and P), mean bias...
Statistical Physics in the Era of Big Data
ERIC Educational Resources Information Center
Wang, Dashun
2013-01-01
With the wealth of data provided by a wide range of high-throughput measurement tools and technologies, statistical physics of complex systems is entering a new phase, impacting in a meaningful fashion a wide range of fields, from cell biology to computer science to economics. In this dissertation, by applying tools and techniques developed in…
Introduction to Statistics and Data Analysis With Computer Applications I.
ERIC Educational Resources Information Center
Morris, Carl; Rolph, John
This document consists of unrevised lecture notes for the first half of a 20-week in-house graduate course at Rand Corporation. The chapter headings are: (1) Histograms and descriptive statistics; (2) Measures of dispersion, distance and goodness of fit; (3) Using JOSS for data analysis; (4) Binomial distribution and normal approximation; (5)…
Quick Access: Find Statistical Data on the Internet.
ERIC Educational Resources Information Center
Su, Di
1999-01-01
Provides an annotated list of Internet sources (World Wide Web, ftp, and gopher sites) for current and historical statistical business data, including selected interest rates, the Consumer Price Index, the Producer Price Index, foreign currency exchange rates, noon buying rates, per diem rates, the special drawing right, stock quotes, and mutual…
ERIC Educational Resources Information Center
Knirk, Frederick G.
Designed to assist educational researchers in utilizing microcomputers, this paper presents information on four types of computer software: writing tools for educators, statistical software designed to perform analyses of small and moderately large data sets, project management tools, and general education/research oriented information services…
Using Non-Linear Statistical Methods with Laboratory Kinetic Data
NASA Technical Reports Server (NTRS)
Anicich, Vincent
1997-01-01
This paper will demonstrate the usefulness of standard non-linear statistical analysis for ICR and SIFT kinetic data. The specific systems used in the demonstration are the isotopic and charge transfer reactions in the system of H2O+/D2O, H3O+/D2O, and other permutations.
Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology
NASA Astrophysics Data System (ADS)
Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.
2015-12-01
Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second-order tensor. This tensor can be permuted into a five-dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.
Statistical data preparation: management of missing values and outliers.
Kwak, Sang Kyu; Kim, Jong Hae
2017-08-01
Missing values and outliers are frequently encountered while collecting data. The presence of missing values reduces the data available to be analyzed, compromising the statistical power of the study, and eventually the reliability of its results. In addition, it causes a significant bias in the results and degrades the efficiency of the data. Outliers significantly affect the process of estimating statistics (e.g., the average and standard deviation of a sample), resulting in overestimated or underestimated values. Therefore, the results of data analysis are considerably dependent on the ways in which the missing values and outliers are processed. In this regard, this review discusses the types of missing values, ways of identifying outliers, and dealing with the two.
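A minimal sketch of one simple pipeline, assuming univariate numeric data with missing entries coded as None: flag outliers with Tukey's 1.5 × IQR fences, then fill missing values with the median of the non-outlying observations. Median imputation is only one of several options for missing values and it understates variance; the function names and data are illustrative, not taken from the review.

```python
import statistics
from typing import List, Optional, Tuple


def iqr_outliers(values: List[float], k: float = 1.5) -> Tuple[float, float, List[float]]:
    """Return Tukey's lower and upper fences and the values lying outside them."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]


def median_impute(raw: List[Optional[float]], k: float = 1.5) -> List[float]:
    """Replace None with the median of the observed, non-outlying values."""
    observed = [v for v in raw if v is not None]
    lower, upper, _ = iqr_outliers(observed, k)
    fill = statistics.median([v for v in observed if lower <= v <= upper])
    return [fill if v is None else v for v in raw]


raw = [10, 12, 11, 13, 12, 95, None, 11]
print(iqr_outliers([v for v in raw if v is not None]))  # (8.0, 16.0, [95])
print(median_impute(raw))  # [10, 12, 11, 13, 12, 95, 11.5, 11]
```

Here the outlier is only flagged, not removed; it is excluded from the statistic used for imputation so that it cannot pull the filled-in value upward.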
Statistical summaries of fatigue data for design purposes
NASA Technical Reports Server (NTRS)
Wirsching, P. H.
1983-01-01
Two methods are discussed for constructing a design curve on the safe side of fatigue data. Both the tolerance interval and equivalent prediction interval (EPI) concepts provide such a curve while accounting for both the distribution of the estimators in small samples and the data scatter. The EPI is also useful as a mechanism for providing necessary statistics on S-N data for a full reliability analysis which includes uncertainty in all fatigue design factors. Examples of statistical analyses of the general strain-life relationship are presented. The tolerance limit and EPI techniques for defining a design curve are demonstrated. Examples using WASPALOY B and RQC-100 data demonstrate that a reliability model could be constructed by considering the fatigue strength and fatigue ductility coefficients as two independent random variables. A technique is given for establishing the fatigue strength for high-cycle lives; it relies on an extrapolation technique and also accounts for "runners." A reliability model or design value can be specified.
Statistical approaches for the analysis of DNA methylation microarray data.
Siegmund, Kimberly D
2011-06-01
Following the rapid development and adoption of DNA methylation microarray assays, we are now experiencing a growth in the number of statistical tools to analyze the resulting large-scale data sets. As is the case for other microarray applications, biases caused by technical issues are of concern. Some of these issues are old (e.g., two-color dye bias and probe- and array-specific effects), while others are new (e.g., fragment length bias and bisulfite conversion efficiency). Here, I highlight characteristics of DNA methylation that suggest standard statistical tools developed for other data types may not be directly suitable. I then describe the microarray technologies most commonly in use, along with the methods used for preprocessing and obtaining a summary measure. I finish with a section describing downstream analyses of the data, focusing on methods that model percentage DNA methylation as the outcome, and methods for integrating DNA methylation with gene expression or genotype data.
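Two summary measures are widely used for methylation arrays: the beta value, the proportion of signal that is methylated, and the M-value, a log2 ratio of methylated to unmethylated signal that is often preferred for statistical testing because its variance is more stable near 0 and 1. A minimal sketch, assuming the common conventions of an offset of 100 for beta and an alpha of 1 for M; these defaults and the intensities are assumptions, not taken from this review.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Proportion methylated; the offset damps noise for low-intensity probes."""
    return meth / (meth + unmeth + offset)


def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """log2 ratio of methylated to unmethylated signal; alpha avoids log(0)."""
    return math.log2((meth + alpha) / (unmeth + alpha))


# Hypothetical probe intensities.
meth, unmeth = 800.0, 200.0
b = beta_value(meth, unmeth, offset=0.0)  # 0.8
m = m_value(meth, unmeth, alpha=0.0)      # 2.0
# Without offsets the two scales are linked by M = log2(beta / (1 - beta)).
print(b, m, math.log2(b / (1.0 - b)))
```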
Ewusie, Joycelyne E; Blondal, Erik; Soobiah, Charlene; Beyene, Joseph; Thabane, Lehana; Straus, Sharon E; Hamid, Jemila S
2017-07-02
Interrupted time series (ITS) design involves collecting data across multiple time points before and after the implementation of an intervention to assess the effect of the intervention on an outcome. ITS designs have become increasingly common in recent times and are frequently used to assess the impact of evidence implementation interventions. Several statistical methods are currently available for analysing data from ITS designs; however, there is a lack of guidance on which methods are optimal for different data types and on their implications in interpreting results. Our objective is to conduct a scoping review of existing methods for analysing ITS data, to summarise their characteristics and properties, as well as to examine how the results are reported. We also aim to identify gaps and methodological deficiencies. We will search electronic databases from inception until August 2016 (eg, MEDLINE and JSTOR). Two reviewers will independently screen titles, abstracts and full-text articles and complete the data abstraction. The anticipated outcome will be a summarised description of all the methods that have been used in analysing ITS data in health research, how those methods were applied, their strengths and limitations and the transparency of interpretation/reporting of the results. We will provide summary tables of the characteristics of the included studies. We will also describe the similarities and differences of the various methods. Ethical approval is not required for this study since we are considering only the methods used in the analysis and there will not be identifiable patient data. Results will be disseminated through open access peer-reviewed publications. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
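Segmented regression is one of the most common ways to analyse ITS data. A minimal sketch with NumPy, assuming a single interruption and equally spaced time points; the series and coefficients are invented, and this is only one of the methods such a review would cover.

```python
import numpy as np


def its_design(n_pre: int, n_post: int) -> np.ndarray:
    """Columns: intercept, time, post-intervention indicator, time since intervention."""
    t = np.arange(n_pre + n_post, dtype=float)
    post = (t >= n_pre).astype(float)
    time_since = post * (t - n_pre)
    return np.column_stack([np.ones_like(t), t, post, time_since])


def fit_its(y, n_pre: int) -> np.ndarray:
    """OLS estimates of [baseline level, pre-slope, level change, slope change]."""
    y = np.asarray(y, dtype=float)
    X = its_design(n_pre, len(y) - n_pre)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Noise-free hypothetical series: 12 monthly points before and 12 after.
true_coef = np.array([5.0, 0.5, -2.0, 0.3])
y = its_design(12, 12) @ true_coef
print(fit_its(y, 12))  # recovers true_coef up to rounding
```

Plain OLS ignores the autocorrelation and seasonality that are common in ITS data; for inference, this design matrix would typically be paired with autocorrelation-robust standard errors or an ARIMA-type error model.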
Edjabou, Maklawe Essonanawe; Martín-Fernández, Josep Antoni; Scheutz, Charlotte; Astrup, Thomas Fruergaard
2017-09-04
Data for fractional solid waste composition provide relative magnitudes of individual waste fractions, the percentages of which always sum to 100, thereby connecting them intrinsically. Due to this sum constraint, waste composition data represent closed data, and their interpretation and analysis require statistical methods other than classical statistics, which are suitable only for non-constrained data such as absolute values. However, the closed characteristics of waste composition data are often ignored when analysed. The results of this study showed, for example, that unavoidable animal-derived food waste amounted to 2.21±3.12% with a confidence interval of (-4.03; 8.45), which highlights the problem of biased, negative proportions. A Pearson correlation test, applied to waste fraction generation (kg mass), indicated a positive correlation between avoidable vegetable food waste and plastic packaging. However, correlation tests applied to waste fraction compositions (percentage values) showed a negative association in this regard, thus demonstrating that statistical analyses applied to compositional waste fraction data, without addressing the closed characteristics of these data, have the potential to generate spurious or misleading results. Therefore, compositional data should be transformed adequately prior to any statistical analysis, such as computing mean, standard deviation and correlation coefficients. Copyright © 2017 Elsevier Ltd. All rights reserved.
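The reported interval is consistent with the mean ± 2 SD: 2.21 ± 2 × 3.12 = 2.21 ± 6.24, that is (-4.03; 8.45), whose lower end is an impossible negative percentage. The sign flip in the correlations can also arise from closure alone. A minimal sketch with invented household masses (not the study's data): two fractions rise together in kilograms, yet their percentages move in opposite directions, while the log-ratio between them is identical on either scale.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentages(parts: Sequence[float], totals: Sequence[float]) -> List[float]:
    """Express each part as a percentage of its household total."""
    return [100.0 * p / t for p, t in zip(parts, totals)]


# Hypothetical per-household masses (kg) of three waste fractions.
veg = [1.0, 2.0, 3.0, 4.0]
plastic = [1.0, 1.2, 1.4, 1.6]  # rises with veg in absolute terms
other = [1.0, 1.0, 1.0, 1.0]
totals = [a + b + c for a, b, c in zip(veg, plastic, other)]

r_mass = pearson(veg, plastic)  # positive (here exactly linear)
r_pct = pearson(percentages(veg, totals), percentages(plastic, totals))  # negative
print(r_mass, r_pct)

# The log-ratio of two fractions is the same whether computed from masses or
# from percentages, so it is not distorted by the closure to 100 %.
lr_mass = [math.log(a / b) for a, b in zip(veg, plastic)]
lr_pct = [math.log(a / b) for a, b in
          zip(percentages(veg, totals), percentages(plastic, totals))]
```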
Hysteresis model and statistical interpretation of energy losses in non-oriented steels
NASA Astrophysics Data System (ADS)
Mănescu (Păltânea), Veronica; Păltânea, Gheorghe; Gavrilă, Horia
2016-04-01
In this paper the hysteresis energy losses in two non-oriented industrial steels (M400-65A and M800-65A) were determined by means of an efficient classical Preisach model, which is based on the Pescetti-Biorci method for the identification of the Preisach density. The excess and the total energy losses were also determined, using a statistical framework based on magnetic object theory. The hysteresis energy losses in a non-oriented steel alloy depend on the peak magnetic polarization, and they can be computed using a Preisach model because in these materials there is a direct link between the elementary rectangular loops and the discontinuous character of the magnetization process (Barkhausen jumps). To determine the Preisach density, it was necessary to measure the normal magnetization curve and the saturation hysteresis cycle. A system of equations was deduced and the Preisach density was calculated for a magnetic polarization of 1.5 T; then the hysteresis cycle was reconstructed. Using the same pattern for the Preisach distribution, the hysteresis cycle for 1 T was computed. The classical losses were calculated using a well-known formula, and the excess energy losses were determined by means of the magnetic object theory. The total energy losses were mathematically reconstructed and compared with those measured experimentally.
Feature-Based Statistical Analysis of Combustion Simulation Data
Bennett, J; Krishnamoorthy, V; Liu, S; Grout, R; Hawkes, E; Chen, J; Pascucci, V; Bremer, P T
2011-11-18
We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information and hence fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. It is therefore of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as cumulative distribution functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion
Statistical Quality Control of Moisture Data in GEOS DAS
NASA Technical Reports Server (NTRS)
Dee, D. P.; Rukhovets, L.; Todling, R.
1999-01-01
A new statistical quality control algorithm was recently implemented in the Goddard Earth Observing System Data Assimilation System (GEOS DAS). The final step in the algorithm consists of an adaptive buddy check that either accepts or rejects outlier observations based on a local statistical analysis of nearby data. A basic assumption in any such test is that the observed field is spatially coherent, in the sense that nearby data can be expected to confirm each other. However, the buddy check resulted in excessive rejection of moisture data, especially during the Northern Hemisphere summer. The analysis moisture variable in GEOS DAS is water vapor mixing ratio. Observational evidence shows that the distribution of mixing ratio errors is far from normal. Furthermore, spatial correlations among mixing ratio errors are highly anisotropic and difficult to identify. Both factors contribute to the poor performance of the statistical quality control algorithm. To alleviate the problem, we applied the buddy check to relative humidity data instead. This variable explicitly depends on temperature and therefore exhibits a much greater spatial coherence. As a result, reject rates of moisture data are much more reasonable and homogeneous in time and space.
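The idea behind a buddy check can be sketched as follows: each observation's departure from the background is compared with the departures of nearby observations, and the tolerance adapts to the local spread. This is a hypothetical, simplified one-dimensional illustration; the function name, the radius, the threshold k, and the floor on sigma are all assumptions, and the GEOS DAS algorithm itself is not reproduced here.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def buddy_check(positions: Sequence[float], departures: Sequence[float],
                radius: float, k: float = 3.0, min_sigma: float = 0.1) -> List[bool]:
    """Return True (accept) or False (reject) for each observation.

    `departures` are observation-minus-background values. An observation is
    rejected when it differs from the mean of its buddies (other observations
    within `radius`) by more than k times their spread, floored at min_sigma.
    """
    accepted = []
    for i, (x, d) in enumerate(zip(positions, departures)):
        buddies = [departures[j] for j, xj in enumerate(positions)
                   if j != i and abs(xj - x) <= radius]
        if len(buddies) < 2:
            accepted.append(True)  # too few buddies to judge; keep the datum
            continue
        sigma = max(pstdev(buddies), min_sigma)
        accepted.append(abs(d - mean(buddies)) <= k * sigma)
    return accepted


# Hypothetical departures along a line of stations; station 3 is suspect.
print(buddy_check([0, 1, 2, 3, 4], [0.1, -0.1, 0.0, 5.0, 0.05], radius=1.5))
# [True, True, True, False, True]
```

The suspect value also enters its neighbours' buddy sets and inflates their mean and spread, so operational schemes typically iterate or use robust statistics. The remedy described in the abstract corresponds to feeding relative-humidity departures, rather than mixing-ratio departures, into such a check.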
Data and statistical methods for analysis of trends and patterns
Atwood, C.L.; Gentillon, C.D.; Wilson, G.E.
1992-11-01
This report summarizes topics considered at a working meeting on data and statistical methods for analysis of trends and patterns in US commercial nuclear power plants. This meeting was sponsored by the Office of Analysis and Evaluation of Operational Data (AEOD) of the Nuclear Regulatory Commission (NRC). Three data sets are briefly described: Nuclear Plant Reliability Data System (NPRDS), Licensee Event Report (LER) data, and Performance Indicator data. Two types of study are emphasized: screening studies, to see if any trends or patterns appear to be present; and detailed studies, which are more concerned with checking the analysis assumptions, modeling any patterns that are present, and searching for causes. A prescription is given for a screening study, and ideas are suggested for a detailed study, when the data take any of three forms: counts of events per time, counts of events per demand, and non-event data.
Using demographic data to better interpret pitfall trap catches
Matalin, Andrey V.; Makarov, Kirill V.
2011-01-01
The results of pitfall trapping are often interpreted as abundance in a particular habitat. At the same time, there are numerous cases of almost unrealistically high catches of ground beetles in seemingly unsuitable sites. The correlation of catches by pitfall trapping with the true distribution and abundance of Carabidae needs corroboration. During a full-year survey in 2006/07 in the Lake Elton region (Volgograd Area, Russia), 175 species of ground beetles were trapped. Considering the differences in demographic structure of the local populations, and not their abundances, three groups of species were recognized: residents, migrants and sporadic. In residents, the demographic structure of local populations is complete, and their habitats can be considered “residential”. In migrants and sporadic species, the demographic structure of the local populations is incomplete, and their habitats can be considered “transit”. Residents interact both with their prey and with each other in a particular habitat. Sporadic species are of little importance to a carabid community because of their low abundances. The contribution of migrants to the structure of carabid communities is not apparent and requires additional research. Migrants and sporadic species represent a “labile” component in ground beetle communities, as opposed to a “stable” component, represented by residents. The variability of the labile component substantially limits our interpretation of species diversity in carabid communities. Thus, the criteria for determining the most abundant, or dominant, species inevitably vary because the abundance of migrants in some cases can be one order of magnitude higher than that of residents. The results of pitfall trapping adequately reflect the state of carabid communities only in zonal habitats, while azonal and disturbed habitats are merely transit ones for many species of ground beetles. A study of the demographic structure of local populations and
VOStat: The R Statistics Package for Astronomical Data Analysis
NASA Astrophysics Data System (ADS)
Feigelson, E. D.; Babu, J.; Mahabal, A.; Djorgovski, S. G.; Graham, M.; Williams, R.; Nichol, R.; Vanden Berk, D.; Wasserman, L.
2004-05-01
A long-standing limitation on the range and sophistication of statistical analysis of astronomical data has been the paucity of non-proprietary software. Our StatCodes Web metasite (http://www.astro.psu.edu/statcodes) has provided links to statistical programs and packages useful to astronomers and other physical scientists, but this collection is heterogeneous and quite incomplete. A more comprehensive and coherent data and statistical analysis environment has recently emerged in the open source R package (http://www.r-project.org) and its associated CRAN archive of add-on packages (http://lib.stat.cmu.edu/R/CRAN). Together, they provide hundreds of statistical functionalities, both simple and advanced, in a programmable data analysis environment with graphics and flexible links to external programs, languages, and databases. We outline these capabilities covering multivariate analysis and classification, parametric and non-parametric tests, regression and smoothing, survival analysis, time series analysis, and spatial analysis. Some of these R functionalities are implemented in our VOStatistics Web service. This work is supported in part by the NSF grant DMS-0101360.
Fusing measurements statistically: combining aerosol data from MISR and MODIS
NASA Astrophysics Data System (ADS)
Cressie, N.; Braverman, A.; Nguyen, H.
2007-12-01
We are interested in producing an aerosol data set that provides 1) the best possible representation of aerosol properties, given information from the MISR and MODIS instruments, and 2) quantitative measures of uncertainty associated with that representation. Uncertainties are due to instrument measurement errors, aggregations over space and time arising from different sampling characteristics and footprints, and incomplete data. Our approach is to consider this as a statistical estimation problem. That is, using all information available, find the best statistical estimate of the quantity of interest, say aerosol optical depth (AOD), as a function of location and time. We do this in two steps. First, we use geostatistical smoothing (GS) to estimate the true values of AOD using each instrument's data individually, on a reference grid of locations and times. GS, also known as kriging, is a spatial analog of simple linear regression that accounts for and exploits spatial autocorrelation to produce optimal estimates, or predictions, of unobserved values. Estimates from GS are routinely accompanied by the kriging variance, a formal measure of estimation uncertainty. In the second step, which we call Bayesian Data Fusion (BDF), we form linear combinations of instruments' smoothed estimates at each point of the reference grid. The coefficients for these linear combinations are derived from a statistical model for the relationship between the smoothed data and the true but unobserved values of AOD. BDF not only combines the individual instruments' information in a statistically optimal fashion, but also propagates their uncertainties through the fusion step to produce the desired data set. Both GS and BDF have been used successfully for many years in social- and physical-science applications; their combination in this context offers a coherent way to make inferences with NASA data in the presence of uncertainty.
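The second step combines two smoothed estimates, each with its own kriging variance, at every grid point. As a deliberately simplified stand-in, and not the BDF model itself (which relates the smoothed data to the true values through a fitted statistical model), the classic inverse-variance weighting below assumes both estimates are unbiased with independent errors. The function names and values are hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse_point(est1: float, var1: float, est2: float, var2: float) -> Tuple[float, float]:
    """Inverse-variance weighted combination of two estimates and its variance."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    return (w1 * est1 + w2 * est2) / (w1 + w2), 1.0 / (w1 + w2)


def fuse_grid(a: Sequence[float], va: Sequence[float],
              b: Sequence[float], vb: Sequence[float]) -> List[Tuple[float, float]]:
    """Apply fuse_point at every reference-grid location."""
    return [fuse_point(*args) for args in zip(a, va, b, vb)]


# Hypothetical AOD estimates at one grid cell from two instruments.
print(fuse_point(0.2, 0.01, 0.3, 0.04))  # approximately (0.22, 0.008)
```

The fused value leans toward the more precise instrument, and the fused variance is smaller than either input variance, which is how uncertainty is propagated through this simplified fusion step.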
ERIC Educational Resources Information Center
Mickler, J. Ernest
This 60th annual report on collegiate enrollments in the United States is based on data received from 1,635 four-year institutions in the U.S., Puerto Rico, and the U.S. Territories. General notes, survey methodology notes, and a summary of findings are presented. Detailed statistical charts present institutional data on men and women students and…
Mathematical and statistical approaches for interpreting biomarker compounds in exhaled human breath
The various instrumental techniques, human studies, and diagnostic tests that produce data from samples of exhaled breath have one thing in common: they all need to be put into a context wherein a posed question can actually be answered. Exhaled breath contains numerous compoun...
Interpreting the Results of Weighted Least-Squares Regression: Caveats for the Statistical Consumer.
ERIC Educational Resources Information Center
Willett, John B.; Singer, Judith D.
In research, data sets often occur in which the variance of the distribution of the dependent variable at given levels of the predictors is a function of the values of the predictors. In this situation, the use of weighted least-squares (WLS) techniques is required. Weights suitable for use in a WLS regression analysis must be estimated. A…
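WLS minimises a weighted residual sum of squares, with weights inversely proportional to the error variance at each point. A minimal sketch with NumPy, assuming the weights are already known up to a constant; in practice they must themselves be estimated, which is the source of the caveats discussed here. The variance model (error SD proportional to x) and the data are invented.

```python
import numpy as np


def wls(X: np.ndarray, y: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Coefficients minimising sum_i w_i * (y_i - X_i @ b) ** 2."""
    sw = np.sqrt(np.asarray(w, dtype=float))
    # Rescaling each row by sqrt(w_i) turns WLS into ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], np.asarray(y, float) * sw, rcond=None)
    return coef


# Hypothetical data whose error SD grows with x, so weights are 1 / x**2.
x = np.arange(1.0, 11.0)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x + np.array([0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0])
print(wls(X, y, 1.0 / x**2))
```

With equal weights the function reduces to ordinary least squares, which makes it easy to see how much the weighting changes the estimates.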
Statistical analysis of the seasonal variation in demographic data.
Fellman, J; Eriksson, A W
2000-10-01
There has been little agreement as to whether reproduction or similar demographic events occur seasonally and, especially, whether there is any universal seasonal pattern. One reason is that the seasonal pattern may vary in different populations and at different times. Another reason is that different statistical methods have been used. Every statistical model is based on certain assumed conditions and hence is designed to identify specific components of the seasonal pattern. Therefore, the statistical method applied should be chosen with due consideration. In this study we present, develop, and compare different statistical methods for the study of seasonal variation. Furthermore, we stress that the methods are applicable for the analysis of many kinds of demographic data. The first approaches in the literature were based on monthly frequencies, on the simple sine curve, and on the approximation that the months are of equal length. Later, "the population at risk" and the fact that the months have different lengths were considered. Under these later assumptions the targets of the statistical analyses are the rates. In this study we present and generalize the earlier models. Furthermore, we use trigonometric regression methods. The trigonometric regression model in its simplest form corresponds to the sine curve. We compare the regression methods with the earlier models and reanalyze some data. Our results show that models for rates eliminate the disturbing effects of the varying length of the months, including the effect of leap years, and of the seasonal pattern of the population at risk. Therefore, they give the purest analysis of the seasonal pattern of the demographic data in question, e.g., rates of general births, twin maternities, neural tube defects, and mortality. Our main finding is that the trigonometric regression methods are more flexible and easier to handle than the earlier methods, particularly when the data differ from the simple sine curve.
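In its simplest form, trigonometric regression fits a single annual sine curve to rates. A minimal sketch with NumPy, assuming monthly counts, a non-leap year, and a known population at risk (a scalar or one value per month); the data are synthetic and this is not the authors' code.

```python
import math

import numpy as np

# Days per month in a non-leap year (an assumption; adjust for leap years).
DAYS = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31], dtype=float)


def month_angles() -> np.ndarray:
    """Angle on the annual cycle at the midpoint of each month."""
    mid_day = np.cumsum(DAYS) - DAYS / 2.0
    return 2.0 * np.pi * mid_day / DAYS.sum()


def fit_seasonal(counts, population_at_risk=1.0):
    """Fit rate = a + b*cos(theta) + c*sin(theta) to daily rates per person.

    Dividing by month length and population at risk removes the effects of
    unequal months and of a seasonally varying denominator.
    Returns (mean rate, amplitude, day of year of the peak).
    """
    rates = np.asarray(counts, dtype=float) / (DAYS * population_at_risk)
    theta = month_angles()
    X = np.column_stack([np.ones(12), np.cos(theta), np.sin(theta)])
    (a, b, c), *_ = np.linalg.lstsq(X, rates, rcond=None)
    peak_angle = math.atan2(c, b) % (2.0 * math.pi)
    return a, math.hypot(b, c), peak_angle * DAYS.sum() / (2.0 * math.pi)


# Hypothetical monthly counts generated from a known sine curve.
theta = month_angles()
counts = DAYS * (0.5 + 0.1 * np.cos(theta) + 0.05 * np.sin(theta))
print(fit_seasonal(counts))
```

Because b·cos θ + c·sin θ equals R·cos(θ − φ) with R = hypot(b, c) and φ = atan2(c, b), the amplitude and peak timing come directly from the two fitted coefficients; extra harmonics can be added as further cosine and sine columns when the data depart from a simple sine curve.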
Statistically invalid classification of high throughput gene expression data.
Barbash, Shahar; Soreq, Hermona
2013-01-01
Classification analysis based on high-throughput data is a common feature in neuroscience and other fields of science, with a rapidly increasing impact on both basic biology and disease-related studies. The outcome of such classifications often serves to delineate novel biochemical mechanisms in health and disease states, identify new targets for therapeutic interference, and develop innovative diagnostic approaches. Given the importance of this type of study, we screened 111 recently published high-impact manuscripts involving classification analysis of gene expression, and found that 58 of them (53%) based their conclusions on a statistically invalid method which can lead to bias in a statistical sense (lower true classification accuracy than the reported classification accuracy). In this report we characterize the potential methodological error and its scope, investigate how it is influenced by different experimental parameters, and describe statistically valid methods for avoiding such classification mistakes.
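One well-known way reported accuracy ends up higher than true accuracy is selecting features on the full data set before cross-validating, so the held-out samples have already influenced which features are used. The abstract does not spell out the invalid method it counted, so treat this as an illustrative assumption. The sketch below, on pure-noise data with a nearest-centroid classifier (both hypothetical choices), compares feature selection outside versus inside the cross-validation folds.

```python
import numpy as np


def top_k_features(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k features with the largest absolute class-mean difference."""
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(diff)[-k:]


def nearest_centroid_predict(X_train, y_train, X_test) -> np.ndarray:
    """Assign each test row to the class whose training centroid is closer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def cv_accuracy(X, y, k, n_folds=5, select_inside=True, seed=0) -> float:
    """k-fold CV accuracy, selecting features inside or outside the folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    # Invalid shortcut: features ranked on ALL samples, test folds included.
    global_feats = top_k_features(X, y, k)
    correct = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        feats = (top_k_features(X[train_idx], y[train_idx], k)
                 if select_inside else global_feats)
        pred = nearest_centroid_predict(X[train_idx][:, feats], y[train_idx],
                                        X[test_idx][:, feats])
        correct += int((pred == y[test_idx]).sum())
    return correct / len(y)


# Pure-noise "expression" data: 40 samples, 2000 genes, no real class signal.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((40, 2000))
y = np.repeat([0, 1], 20)
invalid = cv_accuracy(X, y, k=20, select_inside=False)
valid = cv_accuracy(X, y, k=20, select_inside=True)
print(f"selection outside CV: {invalid:.2f}, inside CV: {valid:.2f}")
```

On data with no signal, accuracy with selection inside the folds should hover around chance (0.5), whereas the shortcut typically reports far higher accuracy, which is the kind of gap between reported and true accuracy the abstract describes.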
Statistical Treatment of Earth Observing System Pyroshock Separation Test Data
NASA Technical Reports Server (NTRS)
McNelis, Anne M.; Hughes, William O.
1998-01-01
The Earth Observing System (EOS) AM-1 spacecraft for NASA's Mission to Planet Earth is scheduled to be launched on an Atlas IIAS vehicle in June of 1998. One concern is that the instruments on the EOS spacecraft are sensitive to the shock-induced vibration produced when the spacecraft separates from the launch vehicle. By applying a unique statistical analysis to the available ground test shock data, the NASA Lewis Research Center found that the shock-induced vibrations would be lower than the levels previously specified by Lockheed Martin. The EOS pyroshock separation testing, which was completed in 1997, produced a large quantity of accelerometer data to characterize the shock response levels at the launch vehicle/spacecraft interface. Thirteen pyroshock separation firings of the EOS and payload adapter configuration yielded 78 total measurements at the interface. The multiple firings were necessary to qualify the newly developed Lockheed Martin six-hardpoint separation system. Because of the unusually large amount of data acquired, Lewis developed a statistical methodology to predict the maximum expected shock levels at the interface between the EOS spacecraft and the launch vehicle. This methodology, which is based on six shear plate accelerometer measurements per test firing at the spacecraft/launch vehicle interface, was then used to determine the shock endurance specification for EOS. Each pyroshock separation test of the EOS spacecraft simulator produced its own set of interface accelerometer data. Probability distributions, histograms, the median, and higher order moments (skew and kurtosis) were analyzed. The data were found to be lognormally distributed, which is consistent with NASA pyroshock standards. Each set of log-transformed test data was analyzed to determine if the data should be combined statistically. Statistical testing of the data's standard deviations and means (F and t testing, respectively) determined if data sets were
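The pooling decision described above (log-transform each firing's data, then compare standard deviations with an F test and means with a t test) can be sketched as follows. The sketch rests on several assumptions. It uses two firings of six hypothetical peak levels each and a pooled-variance t test. The two-sided 5% critical values come from standard tables. A simple fitted-lognormal 95th percentile stands in for the maximum expected level, because the abstract does not give the exact statistic NASA Lewis used for the specification.

```python
import math
import statistics

def log_summary(levels):
    """Mean, sample SD and count of log10-transformed shock levels."""
    logs = [math.log10(v) for v in levels]
    return statistics.fmean(logs), statistics.stdev(logs), len(logs)

def compare_firings(a, b):
    """F ratio of log-variances (larger over smaller) and pooled two-sample t of log-means."""
    ma, sa, na = log_summary(a)
    mb, sb, nb = log_summary(b)
    f_ratio = max(sa, sb) ** 2 / min(sa, sb) ** 2
    pooled_var = ((na - 1) * sa ** 2 + (nb - 1) * sb ** 2) / (na + nb - 2)
    t_stat = (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return f_ratio, t_stat

def lognormal_percentile(levels, p=0.95):
    """p-th percentile of a lognormal fitted to the levels (a point estimate, not a tolerance limit)."""
    m, s, _ = log_summary(levels)
    return 10 ** (m + statistics.NormalDist().inv_cdf(p) * s)

# Hypothetical peak responses (g) from six interface accelerometers in each of two firings.
firing_1 = [1200, 1850, 950, 2100, 1500, 1300]
firing_2 = [1100, 1700, 1000, 2300, 1400, 1250]

f_ratio, t_stat = compare_firings(firing_1, firing_2)
# Two-sided 5% critical values for 6 vs 6 observations: F(5, 5) about 7.15; t with 10 df about 2.228.
poolable = f_ratio < 7.15 and abs(t_stat) < 2.228
level = lognormal_percentile(firing_1 + firing_2) if poolable else None
print(f"F = {f_ratio:.2f}, t = {t_stat:.2f}, poolable = {poolable}, P95 level = {level}")
```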
Collegiate Enrollments in the U.S., 1981-82. Statistics, Interpretations, and Trends.
ERIC Educational Resources Information Center
Mickler, J. Ernest
Data and narrative information are presented on college enrollments, based on a survey of institutions in the United States, Puerto Rico, and U.S. Territories. The total four-year college enrollment for fall 1981 was 7,530,013, of which 5,306,832 were full-time and 2,223,181 were part-time. The total two-year college enrollment for fall 1981 was…
NASA Astrophysics Data System (ADS)
Abraham, J. D.; Ball, L. B.; Bedrosian, P. A.; Cannia, J. C.; Deszcz-Pan, M.; Minsley, B. J.; Peterson, S. M.; Smith, B. D.
2009-12-01
contacts between hydrostratigraphic units. This provides a 3D image of the hydrostratigraphic units interpreted from the electrical resistivity derived from the HEM, tied to statistical confidences on the picked contacts. The interpreted 2D and 3D data provide the groundwater modeler with a high-resolution hydrogeologic framework and a solid understanding of the uncertainty in the information it provides. This interpretation facilitates more informed modeling decisions, more accurate groundwater models, and development of more effective water-resources management strategies.
Statistical Analysis of Strength Data for an Aerospace Aluminum Alloy
NASA Technical Reports Server (NTRS)
Neergaard, Lynn; Malone, Tina; Gentz, Steven J. (Technical Monitor)
2000-01-01
Aerospace vehicles are produced in limited quantities that do not always allow development of MIL-HDBK-5 A-basis design allowables. One method of examining production and composition variations is to perform 100% lot acceptance testing for aerospace Aluminum (Al) alloys. This paper discusses statistical trends seen in strength data for one Al alloy. A four-step approach reduced the data to residuals, visualized residuals as a function of time, grouped data with quantified scatter, and conducted analysis of variance (ANOVA).
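Two of the four steps above, reducing data to residuals and running an analysis of variance across groups, can be sketched with a one-way ANOVA. This is an illustration under assumptions only. The lots, the yield-strength values, and the reference value used to form residuals are hypothetical, because the abstract does not state what the residuals were taken relative to. The visualization and grouping steps are not shown.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Hypothetical yield-strength results (ksi) per production lot and a hypothetical reference value.
REFERENCE_KSI = 60.0
lots = {
    "lot_1": [61.2, 60.8, 61.5],
    "lot_2": [59.9, 60.3, 60.1],
    "lot_3": [62.0, 61.7, 62.4],
}
# Reduce to residuals (here: deviation from the reference value).
residuals = {lot: [x - REFERENCE_KSI for x in xs] for lot, xs in lots.items()}
# ANOVA on residuals grouped by lot.
f_stat, df_b, df_w = one_way_anova(list(residuals.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the F(df_between, df_within) critical value suggests that lot-to-lot variation exceeds the within-lot scatter.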
Statistical Analysis of Strength Data for an Aerospace Aluminum Alloy
NASA Technical Reports Server (NTRS)
Neergaard, L.; Malone, T.
2001-01-01
Aerospace vehicles are produced in limited quantities that do not always allow development of MIL-HDBK-5 A-basis design allowables. One method of examining production and composition variations is to perform 100% lot acceptance testing for aerospace Aluminum (Al) alloys. This paper discusses statistical trends seen in strength data for one Al alloy. A four-step approach reduced the data to residuals, visualized residuals as a function of time, grouped data with quantified scatter, and conducted analysis of variance (ANOVA).
Statistical comparison of similarity tests applied to speech production data
NASA Astrophysics Data System (ADS)
Kollia, H.; Jorgenson, Jay; Saint Fleur, Rose; Foster, Kevin
2004-05-01
Statistical analysis of data variability in speech production research has traditionally been addressed with the assumption of normally distributed error terms. The correct and valid application of statistical procedure requires a thorough investigation of the assumptions that underlie the methodology. In previous work [Kollia and Jorgenson, J. Acoust. Soc. Am. 102 (1997); 109 (2002)], it was shown that the error terms of speech production data in a linear regression can be modeled accurately using a quadratic probability distribution, rather than a normal distribution as is frequently assumed. The measurement used in the earlier Kollia-Jorgenson work involved the classical Kolmogorov-Smirnov statistical test. In the present work, the authors further explore the problem of analyzing the error terms coming from linear regression using a variety of known statistical tests, including, but not limited to, chi-square, Kolmogorov-Smirnov, Anderson-Darling, Cramer-von Mises, skewness and kurtosis, and Durbin. Our study complements a similar study by Shapiro, Wilk, and Chen [J. Am. Stat. Assoc. (1968)]. [Partial support provided by PSC-CUNY and NSF to Jay Jorgenson.]
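The classical Kolmogorov-Smirnov statistic mentioned above is the largest gap between the empirical distribution function of the residuals and a hypothesized CDF. The sketch below tests against a normal distribution only; the quadratic distribution from the authors' earlier work is not implemented. When the mean and standard deviation are estimated from the same residuals, the standard KS critical values no longer apply (the Lilliefors correction addresses this), so the statistic is shown without a p-value. The residual values are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def ks_statistic_normal(residuals, mu=None, sigma=None):
    """Two-sided KS statistic D = sup |F_n(x) - F(x)| against a normal CDF.

    If mu or sigma is None it is estimated from the data (then use Lilliefors tables, not KS tables).
    """
    xs = sorted(residuals)
    n = len(xs)
    mu = fmean(xs) if mu is None else mu
    sigma = stdev(xs) if sigma is None else sigma
    dist = NormalDist(mu, sigma)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = dist.cdf(x)
        # The ECDF jumps from (i-1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Hypothetical regression residuals.
example_residuals = [-0.8, -0.3, -0.1, 0.0, 0.2, 0.4, 0.5, 1.1, -0.6, 0.3]
print(f"D = {ks_statistic_normal(example_residuals):.3f}")
```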
Statistical modeling of natural backgrounds in hyperspectral LWIR data
NASA Astrophysics Data System (ADS)
Truslow, Eric; Manolakis, Dimitris; Cooley, Thomas; Meola, Joseph
2016-09-01
Hyperspectral sensors operating in the long wave infrared (LWIR) have a wealth of applications, including remote material identification and rare target detection. While statistical models of surface reflectance in the visible and near-infrared regimes have been well studied, models for the temperature and emissivity in the LWIR have not been rigorously investigated. In this paper, we investigate modeling hyperspectral LWIR data using a statistical mixture model for the emissivity and surface temperature. Statistical models for the surface parameters can be used to simulate surface radiances and at-sensor radiance, which drives the variability of measured radiance and ultimately the performance of signal processing algorithms. Thus, having models that adequately capture data variation is extremely important for studying performance trades. The purpose of this paper is twofold. First, we study the validity of this model using real hyperspectral data, and compare the relative variability of hyperspectral data in the LWIR and the visible and near-infrared (VNIR) regimes. Second, we illustrate how materials that are easily distinguished in the VNIR may be difficult to separate when imaged in the LWIR.
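The emission side of the surface model described above can be sketched by drawing emissivity and temperature from a mixture and passing them through Planck's law. This is a minimal sketch under strong simplifying assumptions. It uses a two-component Gaussian mixture with made-up parameters, graybody emission ε·B(λ, T) at a single wavelength, and no atmospheric transmission, path radiance, or reflected downwelling term, all of which a real at-sensor radiance model would include.

```python
import math
import random

H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light (m/s)
K_B = 1.380649e-23   # Boltzmann constant (J/K)

def planck_radiance(wavelength_m, temperature_k):
    """Blackbody spectral radiance B(lambda, T) in W m^-2 sr^-1 m^-1."""
    prefactor = 2 * H * C ** 2 / wavelength_m ** 5
    exponent = H * C / (wavelength_m * K_B * temperature_k)
    return prefactor / math.expm1(exponent)

# Hypothetical two-class mixture; every parameter value here is illustrative.
COMPONENTS = [
    {"weight": 0.6, "eps_mean": 0.97, "eps_sd": 0.01, "t_mean": 300.0, "t_sd": 3.0},
    {"weight": 0.4, "eps_mean": 0.90, "eps_sd": 0.03, "t_mean": 310.0, "t_sd": 6.0},
]

def sample_emitted_radiance(components, wavelength_m, n, rng):
    """Draw (emissivity, temperature) from the mixture and return eps * B(lambda, T) samples."""
    weights = [c["weight"] for c in components]
    samples = []
    for _ in range(n):
        comp = rng.choices(components, weights=weights)[0]
        eps = min(1.0, max(0.0, rng.gauss(comp["eps_mean"], comp["eps_sd"])))  # keep in [0, 1]
        temp = rng.gauss(comp["t_mean"], comp["t_sd"])
        samples.append(eps * planck_radiance(wavelength_m, temp))
    return samples

rng = random.Random(1)
radiances = sample_emitted_radiance(COMPONENTS, 10e-6, 1000, rng)
print(f"B(10 um, 300 K) = {planck_radiance(10e-6, 300.0) * 1e-6:.2f} W m^-2 sr^-1 um^-1")
```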
Kim, Kyoung-Ho; Yun, Seong-Taek; Choi, Byoung-Young; Chae, Gi-Tak; Joo, Yongsung; Kim, Kangjoo; Kim, Hyoung-Soo
2009-07-21
Hydrochemical and multivariate statistical interpretations of 16 physicochemical parameters of 45 groundwater samples from a riverside alluvial aquifer underneath an agricultural area in Osong, central Korea, were performed in this study to understand the spatial controls of nitrate concentrations in terms of biogeochemical processes occurring near oxbow lakes within a fluvial plain. Nitrate concentrations in groundwater showed a large variability from 0.1 to 190.6 mg/L (mean=35.0 mg/L) with significantly lower values near oxbow lakes. The evaluation of hydrochemical data indicated that the groundwater chemistry (especially, degree of nitrate contamination) is mainly controlled by two competing processes: 1) agricultural contamination and 2) redox processes. In addition, results of factorial kriging, consisting of two steps (i.e., co-regionalization and factor analysis), reliably showed a spatial control of the concentrations of nitrate and other redox-sensitive species; in particular, significant denitrification was observed only near oxbow lakes. The results of this study indicate that sub-oxic conditions in an alluvial groundwater system are developed geologically and geochemically in and near oxbow lakes, which can effectively enhance the natural attenuation of nitrate before the groundwater discharges to nearby streams. This study also demonstrates the usefulness of multivariate statistical analysis in groundwater studies as a supplementary tool for interpretation of complex hydrochemical data sets.
NASA Astrophysics Data System (ADS)
Kim, Kyoung-Ho; Yun, Seong-Taek; Choi, Byoung-Young; Chae, Gi-Tak; Joo, Yongsung; Kim, Kangjoo; Kim, Hyoung-Soo
2009-07-01
Hydrochemical and multivariate statistical interpretations of 16 physicochemical parameters of 45 groundwater samples from a riverside alluvial aquifer underneath an agricultural area in Osong, central Korea, were performed in this study to understand the spatial controls of nitrate concentrations in terms of biogeochemical processes occurring near oxbow lakes within a fluvial plain. Nitrate concentrations in groundwater showed a large variability from 0.1 to 190.6 mg/L (mean = 35.0 mg/L) with significantly lower values near oxbow lakes. The evaluation of hydrochemical data indicated that the groundwater chemistry (especially, degree of nitrate contamination) is mainly controlled by two competing processes: 1) agricultural contamination and 2) redox processes. In addition, results of factorial kriging, consisting of two steps (i.e., co-regionalization and factor analysis), reliably showed a spatial control of the concentrations of nitrate and other redox-sensitive species; in particular, significant denitrification was observed only near oxbow lakes. The results of this study indicate that sub-oxic conditions in an alluvial groundwater system are developed geologically and geochemically in and near oxbow lakes, which can effectively enhance the natural attenuation of nitrate before the groundwater discharges to nearby streams. This study also demonstrates the usefulness of multivariate statistical analysis in groundwater studies as a supplementary tool for interpretation of complex hydrochemical data sets.
Virsik, R.P.; Harder, D.
1981-01-01
The hypothesis that overdispersion of the chromosome aberration number per cell results from multiple aberrations per particle traversal is investigated in mathematical terms. At a given absorbed dose, Poisson distributions are assumed both for the number of ionizing particles traversing a cell nucleus and for the number of aberrations induced by a single particle traversal. The resulting distribution of the number of aberrations per cell is the Neyman type A distribution, a special case of the generalized Poisson distribution. This distribution is generally overdispersed: its relative variance (variance divided by mean), 1 + λ, is determined by the expected number λ of aberrations per particle traversal. Data from experiments with neutrons and α particles are found to agree with this theory. The developed formalism provides a method to determine the efficiency of aberration induction per particle traversal, λ, from the frequency distribution of aberrations.
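The compound-Poisson construction above can be checked numerically. If the number of traversals is Poisson(μ) and each traversal yields Poisson(λ) aberrations, then P(k) = Σₙ Pois(n; μ)·Pois(k; nλ). Its variance-to-mean ratio equals 1 + λ, so λ can be estimated by the method of moments as (variance/mean) - 1. The values μ = 2 and λ = 1.5 in the sketch are hypothetical. The moment estimator is one simple way to recover λ and is not necessarily the estimation procedure of the original paper.

```python
import math

def poisson_pmf(k, mean):
    if mean == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def neyman_type_a_pmf(k, mu, lam, max_traversals=100):
    """P(k aberrations per cell) = sum over n traversals of Pois(n; mu) * Pois(k; n * lam)."""
    return sum(poisson_pmf(n, mu) * poisson_pmf(k, n * lam) for n in range(max_traversals + 1))

def moments(mu, lam, max_k=200):
    """Mean and variance of the (truncated) Neyman type A distribution."""
    probs = [neyman_type_a_pmf(k, mu, lam) for k in range(max_k + 1)]
    m = sum(k * p for k, p in enumerate(probs))
    v = sum(k * k * p for k, p in enumerate(probs)) - m ** 2
    return m, v

mu, lam = 2.0, 1.5   # hypothetical: mean traversals per nucleus, mean aberrations per traversal
mean, variance = moments(mu, lam)
relative_variance = variance / mean
lambda_hat = relative_variance - 1.0   # method-of-moments estimate of lambda
print(f"mean {mean:.4f}, variance/mean {relative_variance:.4f}, lambda estimate {lambda_hat:.4f}")
```

Applied to observed aberration frequencies, the same estimator would use the sample mean and sample variance in place of the exact moments.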
Bayesian Case Influence Measures for Statistical Models with Missing Data
Zhu, Hongtu; Ibrahim, Joseph G.; Cho, Hyunsoon; Tang, Niansheng
2011-01-01
We examine three Bayesian case influence measures including the φ-divergence, Cook's posterior mode distance and Cook's posterior mean distance for identifying a set of influential observations for a variety of statistical models with missing data including models for longitudinal data and latent variable models in the absence/presence of missing data. Since it can be computationally prohibitive to compute these Bayesian case influence measures in models with missing data, we derive simple first-order approximations to the three Bayesian case influence measures by using the Laplace approximation formula and examine the applications of these approximations to the identification of influential sets. All of the computations for the first-order approximations can be easily done using Markov chain Monte Carlo samples from the posterior distribution based on the full data. Simulated data and an AIDS dataset are analyzed to illustrate the methodology. PMID:23399928
Noshing on Numbers: Using and Interpreting Data in Activities
NASA Astrophysics Data System (ADS)
Shupla, C. B.
2014-07-01
Students must learn how to plot and analyze data as a fundamental science and math skill. Data must also be incorporated into activities in meaningful ways that allow students to build understanding of the concepts being shared. In this workshop, attendees participated in three graphing activities, which served as the basis for discussion of these numerical literacy issues in the science classroom.
Pre-Service Teachers' Interpretation of CBM Progress Monitoring Data
ERIC Educational Resources Information Center
Wagner, Dana L.; Hammerschmidt-Snidarich, Stephanie M.; Espin, Christine A.; Seifert, Kathleen; McMaster, Kristen L.
2017-01-01
Teachers must be proficient at using data to evaluate the effects of instructional strategies and interventions, and must be able to make, describe, justify, and validate their data-based instructional decisions to parents, students, and educational colleagues. An important related skill is the ability to accurately read and interpret…
Helping Students Interpret Large-Scale Data Tables
ERIC Educational Resources Information Center
Prodromou, Theodosia
2016-01-01
New technologies have completely altered the ways that citizens can access data. Indeed, emerging online data sources give citizens access to an enormous amount of numerical information that provides new sorts of evidence used to influence public opinion. In this new environment, two trends have had a significant impact on our increasingly…
A Geophysical Atlas for Interpretation of Satellite-derived Data
NASA Technical Reports Server (NTRS)
Lowman, P. D., Jr. (Editor); Frey, H. V. (Editor); Davis, W. M.; Greenberg, A. P.; Hutchinson, M. K.; Langel, R. A.; Lowrey, B. E.; Marsh, J. G.; Mead, G. D.; O'Keefe, J. A.
1979-01-01
A compilation of maps of global geophysical and geological data plotted on a common scale and projection is presented. The maps include satellite gravity, magnetic, seismic, volcanic, tectonic activity, and mantle velocity anomaly data. Bibliographic references for all maps are included.
Resonant heating - An interpretation of coronal loop data
NASA Technical Reports Server (NTRS)
Hollweg, J. V.; Sterling, A. C.
1984-01-01
It is shown that the resonant heating theory of Hollweg can be used to organize the coronal loop data of Golub et al. (1980). When combined with a reasonable form for the input power spectrum, the resonant heating theory is fully compatible with the loop data.
Kissling, Grace E.; Haseman, Joseph K.; Zeiger, Errol
2014-01-01
A recent article by Gaus (2014) demonstrates a serious misunderstanding of the NTP’s statistical analysis and interpretation of rodent carcinogenicity data as reported in Technical Report 578 (Ginkgo biloba) (NTP 2013), as well as a failure to acknowledge the abundant literature on false positive rates in rodent carcinogenicity studies. The NTP reported Ginkgo biloba extract to be carcinogenic in mice and rats. Gaus claims that, in this study, 4800 statistical comparisons were possible, and that 209 of them were statistically significant (p<0.05) compared with 240 (4800 × 0.05) expected by chance alone; thus, the carcinogenicity of Ginkgo biloba extract cannot be definitively established. However, his assumptions and calculations are flawed since he incorrectly assumes that the NTP uses no correction for multiple comparisons, and that significance tests for discrete data operate at exactly the nominal level. He also misrepresents the NTP’s decision making process, overstates the number of statistical comparisons made, and ignores the fact that the mouse liver tumor effects were so striking (e.g., p<0.0000000000001) that it is virtually impossible that they could be false positive outcomes. Gaus’ conclusion that such obvious responses merely “generate a hypothesis” rather than demonstrate a real carcinogenic effect has no scientific credibility. Moreover, his claims regarding the high frequency of false positive outcomes in carcinogenicity studies are misleading because of his methodological misconceptions and errors. PMID:25261588
Kissling, Grace E; Haseman, Joseph K; Zeiger, Errol
2015-09-02
A recent article by Gaus (2014) demonstrates a serious misunderstanding of the NTP's statistical analysis and interpretation of rodent carcinogenicity data as reported in Technical Report 578 (Ginkgo biloba) (NTP, 2013), as well as a failure to acknowledge the abundant literature on false positive rates in rodent carcinogenicity studies. The NTP reported Ginkgo biloba extract to be carcinogenic in mice and rats. Gaus claims that, in this study, 4800 statistical comparisons were possible, and that 209 of them were statistically significant (p<0.05) compared with 240 (4800×0.05) expected by chance alone; thus, the carcinogenicity of Ginkgo biloba extract cannot be definitively established. However, his assumptions and calculations are flawed since he incorrectly assumes that the NTP uses no correction for multiple comparisons, and that significance tests for discrete data operate at exactly the nominal level. He also misrepresents the NTP's decision making process, overstates the number of statistical comparisons made, and ignores the fact that the mouse liver tumor effects were so striking (e.g., p<0.0000000000001) that it is virtually impossible that they could be false positive outcomes. Gaus' conclusion that such obvious responses merely "generate a hypothesis" rather than demonstrate a real carcinogenic effect has no scientific credibility. Moreover, his claims regarding the high frequency of false positive outcomes in carcinogenicity studies are misleading because of his methodological misconceptions and errors.
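The rebuttal's point that significance tests for discrete data do not operate at exactly their nominal level can be checked numerically. The sketch assumes a one-sided Fisher exact test comparing tumor counts in a control group and a dosed group of n animals each. The choice of n = 50 and a 2% background tumor rate is an illustrative assumption, not a value taken from the abstract. The code does not model the NTP's actual decision procedure or any multiplicity correction. Because the one-sided Fisher test is valid conditionally on the total tumor count, its actual false-positive rate can only be at or below the nominal 0.05.

```python
from math import comb

def fisher_one_sided_p(x_treated, total, n):
    """One-sided Fisher exact p-value: P(X >= x_treated), X ~ Hypergeometric(2n, total, n)."""
    tail = sum(comb(total, k) * comb(2 * n - total, n - k)
               for k in range(x_treated, min(total, n) + 1))
    return tail / comb(2 * n, n)

def actual_false_positive_rate(n, background_rate, alpha=0.05):
    """Unconditional size of the nominal-alpha one-sided Fisher test when dose has no effect."""
    binom = [comb(n, k) * background_rate ** k * (1 - background_rate) ** (n - k)
             for k in range(n + 1)]
    size = 0.0
    for x_control in range(n + 1):
        for x_treated in range(n + 1):
            if fisher_one_sided_p(x_treated, x_control + x_treated, n) <= alpha:
                size += binom[x_control] * binom[x_treated]
    return size

naive_expected = 4800 * 0.05   # the "240 expected by chance" figure, which assumes size = 0.05
size = actual_false_positive_rate(n=50, background_rate=0.02)
print(f"naive expectation: {naive_expected:.0f}; actual per-test size: {size:.4f}")
```

When the actual per-test size is well below 0.05, as it typically is for rare tumors, the naive figure overstates the number of false positives expected by chance.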
Probability and Statistics in Astronomical Machine Learning and Data Mining
NASA Astrophysics Data System (ADS)
Scargle, Jeffrey
2012-03-01
Statistical issues peculiar to astronomy have implications for machine learning and data mining. It should be obvious that statistics lies at the heart of machine learning and data mining. Further it should be no surprise that the passive observational nature of astronomy, the concomitant lack of sampling control, and the uniqueness of its realm (the whole universe!) lead to some special statistical issues and problems. As described in the Introduction to this volume, data analysis technology is largely keeping up with major advances in astrophysics and cosmology, even driving many of them. And I realize that there are many scientists with good statistical knowledge and instincts, especially in the modern era I like to call the Age of Digital Astronomy. Nevertheless, old impediments still lurk, and the aim of this chapter is to elucidate some of them. Many experiences with smart people doing not-so-smart things (cf. the anecdotes collected in the Appendix here) have convinced me that the cautions given here need to be emphasized. Consider these four points: 1. Data analysis often involves searches of many cases, for example, outcomes of a repeated experiment, for a feature of the data. 2. The feature comprising the goal of such searches may not be defined unambiguously until the search is carried out, or perhaps vaguely even then. 3. The human visual system is very good at recognizing patterns in noisy contexts. 4. People are much easier to convince of something they want to believe, or already believe, as opposed to unpleasant or surprising facts. One can argue that all four are good things during the initial, exploratory phases of most data analysis. They represent the curiosity and creativity of the scientific process, especially during the exploration of data collections from new observational programs such as all-sky surveys in wavelengths not accessed before or sets of images of a planetary surface not yet explored. On the other hand, confirmatory scientific
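Point 1 above (searching many cases for a feature) has a simple quantitative consequence. If each of m independent searches of pure noise has false-positive probability α, the chance of at least one spurious detection is 1 - (1 - α)^m. The sketch below assumes independent searches and also shows the Šidák per-search level that holds the family-wise rate at a chosen value. That correction is a standard one offered for illustration and is not proposed in the chapter.

```python
def familywise_false_positive_prob(alpha, m):
    """P(at least one spurious detection) among m independent null searches at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

def sidak_per_search_alpha(family_alpha, m):
    """Per-search level keeping the family-wise rate at family_alpha (independence assumed)."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / m)

for m in (1, 10, 100, 1000):
    print(m, round(familywise_false_positive_prob(0.05, m), 4),
          f"{sidak_per_search_alpha(0.05, m):.2e}")
```

The search-defined-after-the-fact problem in point 2 is worse than this calculation suggests, because the effective number of searches m is then unknown.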
Sampson, Uchechukwu K A; Metcalfe, Chris; Pfeffer, Marc A; Solomon, Scott D; Zou, Kelly H
2010-10-01
In trials of chronic disease therapy, each patient may experience several nonfatal illnesses and death. "Composite" outcome measures combine information from these different components of disease burden. Most common is the binary distinction between patients undergoing one or more events and those undergoing no events. We compare this approach with a composite score that preserves information on the number and severity of events. The binary composite measure and composite score were derived for each patient in a trial of cardiovascular therapy. All nonfatal events contributed to the composite score according to their severity: recurrent myocardial infarction (weight 0.5), congestive heart failure that required the use of open-label angiotensin-converting enzyme (ACE) inhibitors (weight 0.2), and hospitalization to treat congestive heart failure (weight 0.5). In the example data set, the composite score required a 10% larger sample size to achieve the same power as the binary measure. However, the composite score suggested that the treatment affected only the first nonfatal event and mortality. The composite score provides a more informative measure of disease burden and may avoid overestimating the evidence supporting a treatment effect when that evidence is largely from less severe early events. Copyright (c) 2010 Elsevier Inc. All rights reserved.
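The contrast between the two outcome measures can be made concrete per patient. The binary composite records only whether any event occurred. The composite score sums severity weights over every event, using the nonfatal weights given above. The abstract does not say how death is scored, so `death_weight` is a hypothetical parameter, and the event labels and patients below are illustrative.

```python
# Severity weights for nonfatal events, as stated in the abstract.
NONFATAL_WEIGHTS = {
    "recurrent_mi": 0.5,          # recurrent myocardial infarction
    "chf_open_label_ace": 0.2,    # CHF requiring open-label ACE inhibitors
    "chf_hospitalization": 0.5,   # hospitalization to treat CHF
}

def binary_composite(events, died):
    """1 if the patient had any event (including death), else 0."""
    return int(died or len(events) > 0)

def composite_score(events, died, death_weight):
    """Sum of severity weights over all nonfatal events, plus a (hypothetical) weight for death."""
    score = sum(NONFATAL_WEIGHTS[e] for e in events)
    return score + (death_weight if died else 0.0)

# Hypothetical patients: (list of nonfatal events, died?)
patients = {
    "A": (["recurrent_mi"], False),
    "B": (["chf_open_label_ace", "chf_hospitalization", "recurrent_mi"], False),
    "C": ([], True),
}
for pid, (events, died) in patients.items():
    print(pid, binary_composite(events, died), composite_score(events, died, death_weight=1.0))
```

Patients A and B are indistinguishable under the binary measure, while the score separates them by the number and severity of their events.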
Summary of Quantitative Interpretation of Image Far Ultraviolet Auroral Data
NASA Technical Reports Server (NTRS)
Frey, H. U.; Immel, T. J.; Mende, S. B.; Gerard, J.-C.; Hubert, B.; Habraken, S.; Span, J.; Gladstone, G. R.; Bisikalo, D. V.; Shematovich, V. I.; Six, N. Frank (Technical Monitor)
2002-01-01
Direct imaging of the magnetosphere by instruments on the IMAGE spacecraft is supplemented by simultaneous observations of the global aurora in three far ultraviolet (FUV) wavelength bands. The purpose of the multi-wavelength imaging is to study the global auroral particle and energy input from the magnetosphere into the atmosphere. This paper describes the method for quantitative interpretation of FUV measurements. The Wide-Band Imaging Camera (WIC) provides broadband ultraviolet images of the aurora with maximum spatial and temporal resolution by imaging the nitrogen lines and bands between 140 and 180 nm wavelength. The Spectrographic Imager (SI), a dual-wavelength monochromatic instrument, images both Doppler-shifted Lyman alpha emissions produced by precipitating protons (in the SI-12 channel) and OI 135.6 nm emissions (in the SI-13 channel). From the SI-12 Doppler-shifted Lyman alpha images it is possible to obtain the precipitating proton flux, provided assumptions are made regarding the mean energy of the protons. Knowledge of the proton component (flux and energy) allows the calculation of the contribution produced by protons in the WIC and SI-13 instruments. Comparison of the corrected WIC and SI-13 signals provides a measure of the electron mean energy, which can then be used to determine the electron energy flux. To accomplish this, reliable emission modeling and instrument calibrations are required. In-flight calibration using early-type stars was used to validate the pre-flight laboratory calibrations and determine long-term trends in sensitivity. In general, very reasonable agreement is found between in-situ measurements and remote quantitative determinations.
Theoretical Interpretation of Pass 8 Fermi-LAT e⁺ + e⁻ Data
Di Mauro, M.; Manconi, S.; Vittino, A.; ...
2017-08-17
The flux of positrons and electrons (e⁺ + e⁻) has been measured by the Fermi Large Area Telescope (LAT) in the energy range between 7 GeV and 2 TeV. Here, we discuss a number of interpretations of the Pass 8 Fermi-LAT e⁺ + e⁻ spectrum, combining electron and positron emission from supernova remnants (SNRs) and pulsar wind nebulae (PWNe), or produced by the collision of cosmic rays (CRs) with the interstellar medium. We find that the Fermi-LAT spectrum is compatible with the sum of electrons from a smooth SNR population, positrons from cataloged PWNe, and a secondary component. If we include in our analysis constraints from the AMS-02 positron spectrum, we obtain a slightly worse fit to the e⁺ + e⁻ Fermi-LAT spectrum, depending on the propagation model. As an additional scenario, we replace the smooth SNR component within 0.7 kpc with the individual sources found in Green's catalog of Galactic SNRs. We find that separate consideration of far and near sources helps to reproduce the e⁺ + e⁻ Fermi-LAT spectrum. However, we show that the fit degrades when the radio constraints on the positron emission from the Vela SNR (which is the main contributor at high energies) are taken into account. We find that a break in the power-law injection spectrum at about 100 GeV can also reproduce the measured e⁺ + e⁻ spectrum and, among the CR propagation models that we consider, no reasonable break of the power-law dependence of the diffusion coefficient can modify the electron flux enough to reproduce the observed shape.
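The abstract mentions a break in the power-law injection spectrum at about 100 GeV. A broken power law, continuous at the break, can be written as Q(E) = Q₀ (E/E_b)^(-γ₁) for E ≤ E_b and Q₀ (E/E_b)^(-γ₂) above it. The sketch below is a parameterization only. The spectral indices are hypothetical, and it does not model propagation, energy losses, or the source populations discussed above.

```python
def broken_power_law(energy_gev, norm, gamma_low, gamma_high, break_gev=100.0):
    """Injection spectrum Q(E) with one spectral break; continuous at E = break_gev (Q = norm)."""
    if energy_gev <= 0:
        raise ValueError("energy must be positive")
    gamma = gamma_low if energy_gev <= break_gev else gamma_high
    return norm * (energy_gev / break_gev) ** (-gamma)

# Hypothetical indices; the direction and size of the break are illustrative only.
for e in (10.0, 100.0, 1000.0):
    print(e, broken_power_law(e, norm=1.0, gamma_low=2.0, gamma_high=2.5))
```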
A decision-theory approach to interpretable set analysis for high-dimensional data.
Boca, Simina M; Bravo, Héctor Corrada; Caffo, Brian; Leek, Jeffrey T; Parmigiani, Giovanni
2013-09-01
A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses. © 2013, The International Biometric Society.
Tunariu, Aneta D; Reavey, Paula
2007-12-01
This paper explores the notion of sexual boredom through combining the use of qualitative and quantitative methods. Drawing on ideas from discursive psychology, we provide an interpretative reading of both numerical and textual data obtained via a postal questionnaire. Within the mixed-methods strategy adopted here, the questionnaire is treated as a medium that can deliver interesting material about prevalent linguistic resources, their content and pattern of use, available to romantic partners in making sense of sexual boredom. A total of 144 women and 66 men from the general population completed a set of structured questions, including a Sexual Boredom Scale (SBS; Watt & Ewing, 1996), followed by an open-ended question prompting more elaborated views on the topic. Statistical analysis found gender to explain some of the variation across SBS scores. An interpretative analysis of respondent ratings of disagreement/agreement and the actual meaning content of the scale's statements also reveals ranked and gendered regularities. Written responses to the open-ended question were subjected to a thematic analysis, revealing how specific changes to quality of sex, intensity of sexual interest and degree of romantic relatedness with a current partner are used by participants to delineate key dimensions of sexual boredom. Overall, the unfolding narratives of sexual boredom are greatly indebted to a static view of relationship satisfaction founded on wishful expectations for consistent, idealized displays of sexual excitement and interest from oneself and one's partner. The interplay between these understandings and a missing discourse of sexuo-erotic calmness is also considered.
Adaptive statistical pattern classifiers for remotely sensed data
NASA Technical Reports Server (NTRS)
Gonzalez, R. C.; Pace, M. O.; Raulston, H. S.
1975-01-01
A technique for the adaptive estimation of nonstationary statistics necessary for Bayesian classification is developed. The basic approach to the adaptive estimation procedure consists of two steps: (1) an optimal stochastic approximation of the parameters of interest and (2) a projection of the parameters in time or position. A divergence criterion is developed to monitor algorithm performance. Comparative results of adaptive and nonadaptive classifier tests are presented for simulated four-dimensional spectral scan data.
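Step (1), stochastic approximation of the parameters, can be illustrated with the classic Robbins-Monro style recursion for a mean, estimate ← estimate + gain·(x - estimate). This is a sketch only and not the authors' estimator. With gain 1/n the recursion reproduces the running sample mean. A constant gain (an assumed forgetting factor, not taken from the abstract) discounts old samples so the estimate can track nonstationary statistics. The projection step and the divergence criterion are not shown.

```python
def stochastic_approximation_mean(stream, step=None):
    """Recursive mean estimate: gain 1/n if step is None, otherwise a constant gain `step`."""
    estimate = None
    for n, x in enumerate(stream, start=1):
        if estimate is None:
            estimate = float(x)   # initialize with the first observation
            continue
        gain = 1.0 / n if step is None else step
        estimate += gain * (x - estimate)
    return estimate

print(stochastic_approximation_mean([1, 2, 3, 4]))               # running mean
print(stochastic_approximation_mean([0, 0, 10, 10, 10], 0.5))     # tracks the level shift
```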
Computational and Statistical Analysis of Protein Mass Spectrometry Data
Noble, William Stafford; MacCoss, Michael J.
2012-01-01
High-throughput proteomics experiments involving tandem mass spectrometry produce large volumes of complex data that require sophisticated computational analyses. As such, the field offers many challenges for computational biologists. In this article, we briefly introduce some of the core computational and statistical problems in the field and then describe a variety of outstanding problems that readers of PLoS Computational Biology might be able to help solve. PMID:22291580
A statistical model for iTRAQ data analysis.
Hill, Elizabeth G; Schwacke, John H; Comte-Walters, Susana; Slate, Elizabeth H; Oberg, Ann L; Eckel-Passow, Jeanette E; Therneau, Terry M; Schey, Kevin L
2008-08-01
We describe biological and experimental factors that induce variability in reporter ion peak areas obtained from iTRAQ experiments. We demonstrate how these factors can be incorporated into a statistical model for use in evaluating differential protein expression and highlight the benefits of using analysis of variance to quantify fold change. We demonstrate the model's utility based on an analysis of iTRAQ data derived from a spike-in study.
Statistical Approaches to Assess Biosimilarity from Analytical Data.
Burdick, Richard; Coffey, Todd; Gutka, Hiten; Gratzl, Gyöngyi; Conlon, Hugh D; Huang, Chi-Ting; Boyne, Michael; Kuehne, Henriette
2017-01-01
Protein therapeutics have unique critical quality attributes (CQAs) that define their purity, potency, and safety. The analytical methods used to assess CQAs must be able to distinguish clinically meaningful differences in comparator products, and the most important CQAs should be evaluated with the most statistical rigor. High-risk CQA measurements assess the most important attributes that directly impact the clinical mechanism of action or have known implications for safety, while the moderate- to low-risk characteristics may have a lower direct impact and thereby may have a broader range to establish similarity. Statistical equivalence testing is applied for high-risk CQA measurements to establish the degree of similarity (e.g., highly similar fingerprint, highly similar, or similar) of selected attributes. Notably, some high-risk CQAs (e.g., primary sequence or disulfide bonding) are qualitative (e.g., the same as the originator or not the same) and therefore not amenable to equivalence testing. For biosimilars, an important step is the acquisition of a sufficient number of unique originator drug product lots to measure the variability in the originator drug manufacturing process and provide sufficient statistical power for the analytical data comparisons. Together, these analytical evaluations, along with PK/PD and safety data (immunogenicity), provide the data necessary to determine if the totality of the evidence warrants a designation of biosimilarity and subsequent licensure for marketing in the USA. In this paper, a case study approach is used to provide examples of analytical similarity exercises and the appropriateness of statistical approaches for the example data.
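One approach that has been discussed for quantitative high-risk CQAs is to declare equivalence when a 90% confidence interval for the difference in means between proposed-biosimilar and originator lots lies inside a margin of plus or minus 1.5 originator standard deviations. The sketch below treats that margin multiplier as an adjustable assumption and, for simplicity, uses a large-sample normal quantile; with the small lot counts typical of biosimilar programs a Student t quantile would be more appropriate. The function name and defaults are illustrative, not the procedure used in the paper.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def equivalence_test(test: Sequence[float], reference: Sequence[float],
                     margin_factor: float = 1.5,
                     alpha: float = 0.05) -> Tuple[bool, Tuple[float, float]]:
    """Two one-sided tests (TOST) via the (1 - 2*alpha) confidence interval.

    Assumptions (illustrative): margin = margin_factor * SD of the reference
    lots, and a normal rather than t quantile for the interval.
    """
    if len(test) < 2 or len(reference) < 2:
        raise ValueError("need at least two lots per product")
    diff = mean(test) - mean(reference)
    sd_ref = stdev(reference)
    margin = margin_factor * sd_ref
    # Welch-style standard error of the difference in means.
    se = math.sqrt(stdev(test) ** 2 / len(test) + sd_ref ** 2 / len(reference))
    z = NormalDist().inv_cdf(1.0 - alpha)
    ci = (diff - z * se, diff + z * se)
    # Interval inside (-margin, margin) <=> both one-sided nulls rejected at alpha.
    return (-margin < ci[0] and ci[1] < margin), ci


if __name__ == "__main__":
    originator = [10.0, 11.0, 9.0, 10.5, 9.5]  # hypothetical lot values
    print(equivalence_test(originator, originator)[0])                     # True
    print(equivalence_test([x + 5.0 for x in originator], originator)[0])  # False
```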
ERIC Educational Resources Information Center
Nhalevilo, Emilia Afonso; Ogunniyi, Meshach
2014-01-01
This article presents a reflection on an aspect of research methodology, particularly on the interpretation strategy of data from a Science and Indigenous Knowledge Systems Project (SIKSP) in a South African university. The data interpretation problem arose while we were analysing the effects of a series of SIKSP-based workshops on the views of a…
Mars Geological Province Designations for the Interpretation of GRS Data
NASA Astrophysics Data System (ADS)
Dohm, J. M.; Kerry, K.; Keller, J.; Baker, V. R.; Boynton, W. V.; Maruyama, S.; Anderson, R. C.
2005-03-01
Based on a synthesis of published geologic, paleohydrologic, topographic, and geophysical information, we have defined geologic provinces that represent significant windows into the geologic evolution of Mars, consistent with the GEOMARS theory and supported by GRS data.
The GEOS Ozone Data Assimilation System: Specification of Error Statistics
NASA Technical Reports Server (NTRS)
Stajner, Ivanka; Riishojgaard, Lars Peter; Rood, Richard B.
2000-01-01
A global three-dimensional ozone data assimilation system has been developed at the Data Assimilation Office of the NASA/Goddard Space Flight Center. The Total Ozone Mapping Spectrometer (TOMS) total ozone and the Solar Backscatter Ultraviolet (SBUV or SBUV/2) partial ozone profile observations are assimilated. The assimilation, into an off-line ozone transport model, is done using the global Physical-space Statistical Analysis Scheme (PSAS). This system became operational in December 1999. A detailed description of the statistical analysis scheme, and in particular of the forecast and observation error covariance models, is given. A new global anisotropic horizontal forecast error correlation model accounts for a varying distribution of observations with latitude. Correlations are largest in the zonal direction in the tropics, where data are sparse. The forecast error variance model is proportional to the ozone field. The forecast error covariance parameters were determined by maximum likelihood estimation. The error covariance models are validated using χ² statistics. The analyzed ozone fields in winter 1992 are validated against independent observations from ozone sondes and the Halogen Occultation Experiment (HALOE). There is better than 10% agreement between mean HALOE and analysis fields between 70 and 0.2 hPa. The global root-mean-square (RMS) difference between TOMS observed and forecast values is less than 4%. The global RMS difference between SBUV observed and analyzed ozone between 50 and 3 hPa is less than 15%.
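A χ² check of error covariance models is often based on observation-minus-forecast residuals (innovations): if the forecast and observation error variances are specified correctly, each squared innovation divided by its predicted variance averages about one. The sketch below assumes uncorrelated errors, so the innovation covariance reduces to forecast variance plus observation variance per datum; PSAS itself uses full covariance models, and the function name is illustrative.

```python
from typing import Sequence


def normalized_chi_square(innovations: Sequence[float],
                          forecast_var: Sequence[float],
                          obs_var: Sequence[float]) -> float:
    """Mean of d_i**2 / (sigma_f_i**2 + sigma_o_i**2); close to 1 if variances fit.

    Assumes diagonal (uncorrelated) forecast and observation error
    covariances, a simplification of a full covariance model.
    """
    n = len(innovations)
    if n == 0 or not (n == len(forecast_var) == len(obs_var)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for d, pf, po in zip(innovations, forecast_var, obs_var):
        total += d * d / (pf + po)
    return total / n


if __name__ == "__main__":
    # Values well below 1 suggest the assumed variances are too large;
    # values well above 1 suggest they are too small.
    print(normalized_chi_square([1.0, -1.0, 2.0, -2.0],
                                [1.0, 1.0, 2.0, 2.0],
                                [1.0, 1.0, 2.0, 2.0]))  # 0.75
```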
Training for Rapid Interpretation of Voluminous Multimodal Data
Folds, Dennis J. (Georgia Institute of Technology)
2008-04-01
Contract DASW01-02-K-0001; program element 611102A; project number B74F. ...TSD) and in contracts sponsored by that agency (see Cannon-Bowers & Salas, 1998). This program of research has generated many useful findings ... of the proposed research program are as follows: Assess the effects of data format, density, and overall volume on determination of relevance and
The statistical analysis of multivariate serological frequency data.
Reyment, Richard A
2005-11-01
Data occurring in the form of frequencies are common in genetics, for example in serology. Examples are provided by the AB0 group, the Rhesus group, and also DNA data. The statistical analysis of tables of frequencies is carried out using the available methods of multivariate analysis with usually three principal aims. One of these is to seek meaningful relationships between the components of a data set, the second is to examine relationships between populations from which the data have been obtained, the third is to bring about a reduction in dimensionality. This latter aim is usually realized by means of bivariate scatter diagrams using scores computed from a multivariate analysis. The multivariate statistical analysis of tables of frequencies cannot safely be carried out by standard multivariate procedures because they represent compositions and are therefore embedded in simplex space, a subspace of full space. Appropriate procedures for simplex space are compared and contrasted with simple standard methods of multivariate analysis ("raw" principal component analysis). The study shows that the differences between a log-ratio model and a simple logarithmic transformation of proportions may not be very great, particularly as regards graphical ordinations, but important discrepancies do occur. The divergencies between logarithmically based analyses and raw data are, however, great. Published data on Rhesus alleles observed for Italian populations are used to exemplify the subject.
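To make the contrast concrete, the sketch below compares the centred log-ratio (clr) transform, one standard log-ratio approach for compositions, with a plain logarithm of proportions. It is illustrative only: the abstract does not say which log-ratio model was used, and the frequencies are hypothetical.

```python
import math
from typing import List, Sequence


def closure(parts: Sequence[float]) -> List[float]:
    """Rescale positive parts so they sum to 1 (a composition)."""
    total = sum(parts)
    if total <= 0:
        raise ValueError("parts must have a positive sum")
    return [p / total for p in parts]


def clr(parts: Sequence[float]) -> List[float]:
    """Centred log-ratio: log of each part minus the mean log of all parts."""
    if any(p <= 0 for p in parts):
        raise ValueError("log-ratio transforms need strictly positive parts")
    logs = [math.log(p) for p in parts]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]


if __name__ == "__main__":
    counts = [20.0, 30.0, 50.0]  # hypothetical frequencies
    props = closure(counts)
    print([round(v, 4) for v in clr(props)])
    print([round(math.log(p), 4) for p in props])
    # clr components sum to zero and do not depend on the total (scale invariance).
    print(abs(sum(clr(props))) < 1e-12)  # True
```

For a single composition the clr vector is the log-proportion vector shifted by a constant, which is one plausible reason the two analyses can look similar in ordinations; that constant differs from population to population, however, so discrepancies can still appear.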
Breast Cancer and Reconstruction: Normative Data for Interpreting the BREAST-Q.
Mundy, Lily R; Homa, Karen; Klassen, Anne F; Pusic, Andrea L; Kerrigan, Carolyn L
2017-05-01
The BREAST-Q is a patient-reported outcome instrument used to evaluate outcomes in patients undergoing breast cancer surgery and reconstruction. Normative values for the BREAST-Q breast cancer modules have not been established, limiting data interpretation. Participants were recruited by means of the Army of Women, an online community of women (with and without breast cancer), to complete Mastectomy, Breast Conserving Therapy, and Reconstruction preoperative BREAST-Q scales. Inclusion criteria were women aged 18 years or older without a history of breast surgery or breast cancer. Analysis included descriptive statistics, a linear multivariate regression, and a comparison of the generated normative data to previously published BREAST-Q findings. The BREAST-Q was completed by 1201 women. The mean patient age was 54 ± 13 years, mean body mass index 26 ± 6 kg/m², and 38 percent (n = 455) had a bra cup size of D or greater. Mean ± SD scores for BREAST-Q scales were as follows: Satisfaction with Breasts (58 ± 18), Psychosocial Well-being (71 ± 18), Sexual Well-being (56 ± 18), Physical Well-being-Chest (93 ± 11), and Physical Well-being Abdomen (78 ± 20). Women with a body mass index of 30 kg/m² or greater, cup size of D or greater, age younger than 40 years, and annual income less than $40,000 reported lower scores. Comparing normative scores to published data in breast cancer patients, Satisfaction with Breasts scores were higher after autologous reconstruction and lower after mastectomy; Sexual Well-being scores were lower after mastectomy and breast conserving therapy; and Physical Well-being Chest scores were lower after mastectomy, breast conserving therapy, and reconstruction. These are the first published normative scores for the BREAST-Q breast cancer modules and provide a clinical reference point for the interpretation of data.
Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan
2017-08-28
The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical
Interpretation of recent AMPTE data at the magnetopause
NASA Astrophysics Data System (ADS)
Heikkila, Walter J.
1997-02-01
Phan and Paschmann [1996] have done a superposed epoch analysis of conditions near the dayside magnetopause and have found significant structure within the magnetopause current sheet itself. Among their many important results is that the electron temperature for an outward profile shows cooling of the solar wind plasma for the inner part followed by heating for the outer. Since these two cases are associated with
Autonomous exploration system: Techniques for interpretation of multispectral data
NASA Technical Reports Server (NTRS)
Yates, Gigi; Eberlein, Susan
1989-01-01
An on-board autonomous exploration system that fuses data from multiple sensors, and makes decisions based on scientific goals is being developed using a series of artificial neural networks. Emphasis is placed on classifying minerals into broad geological categories by analyzing multispectral data from an imaging spectrometer. Artificial neural network architectures are being investigated for pattern matching and feature detection, information extraction, and decision making. As a first step, a stereogrammetry net extracts distance data from two gray scale stereo images. For each distance plane, the output is the probable mineral composition of the region, and a list of spectral features such as peaks, valleys, or plateaus, showing the characteristics of energy absorption and reflection. The classifier net is constructed using a grandmother cell architecture: an input layer of spectral data, an intermediate processor, and an output value. The feature detector is a three-layer feed-forward network that was developed to map input spectra to four geological classes, and will later be expanded to encompass more classes. Results from the classifier and feature detector nets will help to determine the relative importance of the region being examined with regard to current scientific goals of the system. This information is fed into a decision making neural net along with data from other sensors to decide on a plan of activity. A plan may be to examine the region at higher resolution, move closer, employ other sensors, or record an image and transmit it back to Earth.
Optimization of Statistical Methods Impact on Quantitative Proteomics Data.
Pursiheimo, Anna; Vehmas, Anni P; Afzal, Saira; Suomi, Tomi; Chand, Thaman; Strauss, Leena; Poutanen, Matti; Rokka, Anne; Corthals, Garry L; Elo, Laura L
2015-10-02
As tools for quantitative label-free mass spectrometry (MS) rapidly develop, a consensus about best practices is not apparent. In the work described here, we compared popular statistical methods for detecting differential protein expression from quantitative MS data, using both controlled experiments with known quantitative differences for specific proteins used as standards and "real" experiments where differences in protein abundance are not known a priori. Our results suggest that data-driven reproducibility optimization can consistently produce reliable differential expression rankings for label-free proteome tools and is straightforward to apply.
Statistical Inference for Big Data Problems in Molecular Biophysics
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia; Quinn, Shannon; Agarwal, Pratul K; Chennubhotla, Chakra
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, which are increasingly complex and presently reach petabyte scales, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge in its own right. Mining these data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
Accidents in Malaysian construction industry: statistical data and court cases.
Chong, Heap Yih; Low, Thuan Siang
2014-01-01
Safety and health issues remain critical to the construction industry due to its working environment and the complexity of its working practices. This research adopts two approaches, using statistical data and court cases, to address and identify the causes and behavior underlying construction safety and health issues in Malaysia. Factual data for the period 2000-2009 were retrieved to identify the causes and agents that contributed to health issues. Moreover, court cases were tabulated and analyzed to identify legal patterns of parties involved in construction site accidents. The two approaches produced consistent results and highlighted a significant reduction in the rate of accidents per construction project in Malaysia.
Interpreting Disasters From Limited Data Availability: A Guatemalan Study Case
NASA Astrophysics Data System (ADS)
Soto Gomez, A.
2012-12-01
Guatemala is located in a geographical area exposed to multiple natural hazards. Although Guatemalan populations live in hazardous conditions, limited scientific research has focused on this particular geographical area. Thorough studies are needed to understand the disasters occurring in the country and thereby enable decision makers and professionals to plan future actions, yet available data are limited. The data in the available sources are limited by their timespan or by the size of the events included, and are therefore insufficient to provide the whole picture of disasters in the country. This study proposes a methodology that uses the available data for one of the most important catchments in the country, the Samala River basin, to answer three questions: what kinds of disasters occur, where do such events happen, and why do they happen? Three datasets from different source agencies (one global, one regional, and one local) were analyzed numerically and spatially using spreadsheets, numerical computing software, and geographic information systems. The analysis results were then combined to search for possible answers to these questions. A relation was found between the data compositions of two of the three datasets analyzed. The third showed a very different composition, probably because its inclusion criteria exclude smaller but more frequent disasters from its records. In all the datasets, the most frequent disasters are those caused by hydrometeorological hazards, i.e., floods and landslides. A relation was also found between disaster occurrences and precipitation records in the area, but it is not strong enough to affirm that the disasters are a direct result of rain, and further studies must be carried out to explore other potential causes. Analyzing the existing data helps identify what kind of data is needed, and this would be useful to
NASA Astrophysics Data System (ADS)
Lee, Jaeha; Tsutsui, Izumi
2017-05-01
We show that the joint behavior of an arbitrary pair of (generally noncommuting) quantum observables can be described by quasi-probabilities, which are an extended version of the standard probabilities used for describing the outcome of measurement for a single observable. The physical situations that require these quasi-probabilities arise when one considers quantum measurement of an observable conditioned by some other variable, with the notable example being the weak measurement employed to obtain Aharonov's weak value. Specifically, we present a general prescription for the construction of quasi-joint probability (QJP) distributions associated with a given combination of observables. These QJP distributions are introduced in two complementary approaches: one from a bottom-up, strictly operational construction realized by examining the mathematical framework of the conditioned measurement scheme, and the other from a top-down viewpoint realized by applying the results of the spectral theorem for normal operators and their Fourier transforms. It is then revealed that, for a pair of simultaneously measurable observables, the QJP distribution reduces to the unique standard joint probability distribution of the pair, whereas for a noncommuting pair there exists an inherent indefiniteness in the choice of such QJP distributions, admitting a multitude of candidates that may equally be used for describing the joint behavior of the pair. In the course of our argument, we find that the QJP distributions furnish the space of operators in the underlying Hilbert space with their characteristic geometric structures such that the orthogonal projections and inner products of observables can be given statistical interpretations as, respectively, "conditionings" and "correlations". The weak value A_w for an observable A is then given a geometric/statistical interpretation as either the orthogonal projection of A onto the subspace generated by another observable B, or equivalently
The analysis, interpretation, and presentation of quality of life data.
Stephens, Richard
2004-02-01
All too often in clinical trials, the assessment of quality of life is seen as a bolt-on study. Consequently, insufficient consideration is often given to its design, collection, analysis and presentation, and its impact on the trial results and on clinical practice is minimal. In many trials quality of life is a key endpoint, and it is vital that quality of life expertise is involved in the design as early as possible. Setting a priori quality of life hypotheses will focus the decisions regarding which questionnaire to use, when to administer it, the sample size required, and the primary analyses. Nevertheless, quality of life data are complex and require much skill in determining how to deal with multi-dimensional and longitudinal data, much of which is often missing. There are no agreed standard ways of analysing and presenting quality of life data, but there are guidelines which, if followed, will add transparency to the way results have been calculated. Understanding the impact of treatments on their quality of life is vital to patients, and it is up to us, as statisticians and trialists, to present the data as clearly as we can.
Spatial Statistical Procedures to Validate Input Data in Energy Models
Lawrence Livermore National Laboratory
2006-01-27
Energy modeling and analysis often relies on data collected for other purposes such as census counts, atmospheric and air quality observations, economic trends, and other primarily non-energy-related uses. Systematic collection of empirical data solely for regional, national, and global energy modeling has not been established as in the above-mentioned fields. Empirical and modeled data relevant to energy modeling is reported and available at various spatial and temporal scales that might or might not be those needed and used by the energy modeling community. The incorrect representation of spatial and temporal components of these data sets can result in energy models producing misleading conclusions, especially in cases of newly evolving technologies with spatial and temporal operating characteristics different from the dominant fossil and nuclear technologies that powered the energy economy over the last two hundred years. Increased private and government research and development and public interest in alternative technologies that have a benign effect on the climate and the environment have spurred interest in wind, solar, hydrogen, and other alternative energy sources and energy carriers. Many of these technologies require much finer spatial and temporal detail to determine optimal engineering designs, resource availability, and market potential. This paper presents exploratory and modeling techniques in spatial statistics that can improve the usefulness of empirical and modeled data sets that do not initially meet the spatial and/or temporal requirements of energy models. In particular, we focus on (1) aggregation and disaggregation of spatial data, (2) predicting missing data, and (3) merging spatial data sets. In addition, we introduce relevant statistical software models commonly used in the field for various sizes and types of data sets.
Spatial Statistical Procedures to Validate Input Data in Energy Models
Johannesson, G.; Stewart, J.; Barr, C.; Brady Sabeff, L.; George, R.; Heimiller, D.; Milbrandt, A.
2006-01-01
Statistics of Optical Coherence Tomography Data From Human Retina
de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo
2010-01-01
Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic diagnostic procedures using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model the distribution of intensities in OCT pseudoimages well. Moreover, we show a small but significant correlation between neighboring pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits both the stretched exponential distribution of intensities and their spatial correlation well. In normal retinas, fit parameters of this model are relatively constant along retinal layers but vary across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733
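For reference, one common normalized form of the stretched exponential density on x ≥ 0, with scale λ > 0 and stretching exponent β > 0, is p(x) = β / (λ Γ(1/β)) · exp(-(x/λ)^β); β = 1 recovers the exponential distribution and β = 2 a half-normal. The abstract does not give the paper's exact parameterization, so the sketch below assumes this one, and the function names are illustrative.

```python
import math
from typing import Sequence


def stretched_exponential_pdf(x: float, scale: float, beta: float) -> float:
    """Density beta / (scale * Gamma(1/beta)) * exp(-(x/scale)**beta) for x >= 0.

    Assumed parameterization; it integrates to 1 because
    integral_0^inf exp(-(x/scale)**beta) dx = scale * Gamma(1/beta) / beta.
    """
    if scale <= 0 or beta <= 0:
        raise ValueError("scale and beta must be positive")
    if x < 0:
        return 0.0
    norm = beta / (scale * math.gamma(1.0 / beta))
    return norm * math.exp(-((x / scale) ** beta))


def log_likelihood(data: Sequence[float], scale: float, beta: float) -> float:
    """Sum of log densities, e.g. to fit (scale, beta) by maximization."""
    if any(x < 0 for x in data):
        raise ValueError("intensities must be non-negative")
    return sum(math.log(stretched_exponential_pdf(x, scale, beta)) for x in data)


if __name__ == "__main__":
    # beta = 1 reduces to the exponential density exp(-x): about 0.3679 at x = 1.
    print(stretched_exponential_pdf(1.0, 1.0, 1.0))
```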
Interpreting Microarray Data to Build Models of Microbial Genetic Regulation Networks
Sokhansanj, B; Garnham, J B; Fitch, J P
2002-01-23
Microarrays and DNA chips are an efficient, high-throughput technology for measuring temporal changes in the expression of messenger RNA (mRNA) from thousands of genes (often the entire genome of an organism) in a single experiment. A crucial drawback of microarray experiments is that results are inherently qualitative: data are generally neither quantitatively repeatable, nor may microarray spot intensities be calibrated to in vivo mRNA concentrations. Nevertheless, microarrays represent by far the cheapest and fastest way to obtain information about a cell's global genetic regulatory networks. Besides poor signal characteristics, the massive amount of data produced by microarray experiments poses challenges for visualization, interpretation and model building. Towards initial model development, we have developed a Java tool for visualizing the spatial organization of gene expression in bacteria. We are also developing an approach to inferring and testing qualitative fuzzy logic models of gene regulation using microarray data. Because we are developing and testing qualitative hypotheses that do not require quantitative precision, our statistical evaluation of experimental data is limited to checking for validity and consistency. Our goals are to maximize the impact of inexpensive microarray technology, bearing in mind that biological models and hypotheses are typically qualitative.
Methods for Quantitative Interpretation of Retarding Field Analyzer Data
Calvey, J.R.; Crittenden, J.A.; Dugan, G.F.; Palmer, M.A.; Furman, M.; Harkay, K.
2011-03-28
Over the course of the CesrTA program at Cornell, over 30 Retarding Field Analyzers (RFAs) have been installed in the CESR storage ring, and a great deal of data has been taken with them. These devices measure the local electron cloud density and energy distribution, and can be used to evaluate the efficacy of different cloud mitigation techniques. Obtaining a quantitative understanding of RFA data requires use of cloud simulation programs, as well as a detailed model of the detector itself. In a drift region, the RFA can be modeled by postprocessing the output of a simulation code, and one can obtain best fit values for important simulation parameters with a chi-square minimization method.
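The chi-square fitting step can be sketched as a grid search: each candidate value of a simulation parameter is run through the (postprocessed) simulation, and the value minimizing χ² = Σ((measured - simulated)/σ)² is kept. In this sketch the `simulate` callable, the linear "template" model, and all numbers are hypothetical placeholders for the cloud simulation and detector model.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def chi_square(measured: Sequence[float], simulated: Sequence[float],
               sigma: Sequence[float]) -> float:
    """Sum of squared, uncertainty-weighted residuals."""
    if not (len(measured) == len(simulated) == len(sigma)):
        raise ValueError("length mismatch")
    return sum(((m - s) / e) ** 2 for m, s, e in zip(measured, simulated, sigma))


def best_fit(measured: Sequence[float], sigma: Sequence[float],
             simulate: Callable[[float], Sequence[float]],
             candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (parameter, chi2) for the candidate with the smallest chi2."""
    best: Optional[Tuple[float, float]] = None
    for p in candidates:
        chi2 = chi_square(measured, simulate(p), sigma)
        if best is None or chi2 < best[1]:
            best = (p, chi2)
    if best is None:
        raise ValueError("no candidates given")
    return best


if __name__ == "__main__":
    template = [1.0, 2.0, 4.0, 3.0]         # hypothetical RFA signal shape
    measured = [2.0 * t for t in template]  # synthetic "data"
    sigma = [0.1] * len(template)

    def model(y: float) -> List[float]:     # hypothetical simulation output
        return [y * t for t in template]

    print(best_fit(measured, sigma, model, [1.0, 1.5, 2.0, 2.5]))  # (2.0, 0.0)
```

A grid search suits the case where every evaluation is an expensive simulation run; a gradient-based minimizer could replace it when the model is cheap and smooth.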
Statistical comparison of the AGDISP model with deposit data
NASA Astrophysics Data System (ADS)
Duan, Baozhong; Yendol, William G.; Mierzejewski, Karl
An aerial spray Agricultural Dispersal (AGDISP) model was tested against quantitative field data. The microbial pesticide Bacillus thuringiensis (Bt) was sprayed as fine spray from a helicopter over a flat site under various meteorological conditions. Droplet deposition on evenly spaced Kromekote cards, 0.15 m above the ground, was measured with image analysis equipment. Six complete data sets out of the 12 trials were selected for data comparison. A set of statistical parameters suggested by the American Meteorological Society and other authors was applied for comparisons of the model prediction with the ground deposit data. The results indicated that AGDISP tended to overpredict the average volume deposition by a factor of two. The sensitivity test of the AGDISP model to the input wind direction showed that the model may not be sensitive to variations in wind direction within 10 degrees relative to the aircraft flight path.
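Two performance measures commonly used for this kind of model-versus-observation comparison are the fractional bias, FB = 2(Ō - P̄)/(Ō + P̄), and FAC2, the fraction of paired values whose predicted/observed ratio lies between 0.5 and 2. Whether these are exactly the parameters the study applied is an assumption. The sketch uses the sign convention in which negative FB means overprediction, so a uniform factor-of-two overprediction gives FB = -2/3; the data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fractional_bias(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """FB = 2 (mean_obs - mean_pred) / (mean_obs + mean_pred); < 0 means overprediction."""
    o, p = mean(observed), mean(predicted)
    if o + p == 0:
        raise ValueError("means sum to zero")
    return 2.0 * (o - p) / (o + p)


def fac2(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of pairs with 0.5 <= predicted / observed <= 2 (observed > 0 only)."""
    pairs = [(o, p) for o, p in zip(observed, predicted) if o > 0]
    if not pairs:
        raise ValueError("no positive observations")
    within = sum(1 for o, p in pairs if 0.5 <= p / o <= 2.0)
    return within / len(pairs)


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]     # hypothetical deposits
    pred = [2.0 * v for v in obs]  # uniform factor-of-two overprediction
    print(round(fractional_bias(obs, pred), 4))  # -0.6667
    print(fac2(obs, pred))                       # 1.0
```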
Outpatient health care statistics data warehouse--implementation.
Zilli, D
1999-01-01
Data warehouse implementation is assumed to be a very knowledge-demanding, expensive and long-lasting process. As such, it requires senior management sponsorship, involvement of experts, a big budget and probably years of development time. The Outpatient Health Care Statistics Data Warehouse implementation research presented here provides ample evidence against the infallibility of these statements. A new, inexpensive but powerful technology, which provides an outstanding platform for On-Line Analytical Processing (OLAP), has emerged recently. Presumably, it will be the basis for the estimated future growth of the data warehouse market, both in the medical and in other business fields. Methods and tools for building, maintaining and exploiting data warehouses are also briefly discussed in the paper.
Suitability of Archie's Law For Interpreting Electrical Resistivity Data
NASA Astrophysics Data System (ADS)
Singha, K.; Gorelick, S. M.
2003-12-01
Electrical resistivity tomography (ERT) is examined as a method to provide spatially continuous images of saline tracer concentrations during transport through unconsolidated fluid-saturated media. It is frequently accepted that there exists a quantitative relationship between the electrical conductivity of dilute electrolytes in pore water and the bulk electrical conductivity of the subsurface measured using resistivity methods. The assumed relationship is typically Archie's Law. We tested the applicability of Archie's Law to field-scale data collected over a 10 m by 14 m area. A 20-day weak-dipole tracer test was conducted, in which 2 g/L NaCl was introduced into the upper 30 m of the saturated zone in a coarse sand and gravel aquifer. Cross-well ERT data were collected at 4 geophysical monitoring wells and inverted in 3-D. Fluid electrical conductivity was measured directly from a multilevel sampler. The change in the direct measurements of fluid electrical conductivity exceeded the change in bulk conductivity values in the tomograms by an order of magnitude. The estimated Archie formation factor from the field data was not constant with time, due largely to smoothing during the image reconstruction process. By modeling synthetic cases over the field site, we illustrate that the ERT response is difficult to match to measured fluid conductivities because the effects of regularization vary in both space and time. Analysis of both the field data and the synthetic cases suggests that Archie's Law cannot be used to directly scale ERT conductivities to fluid conductivities.
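In its common form for clean, fully saturated media without surface conduction, Archie's Law relates bulk conductivity to fluid conductivity through the formation factor F = σ_w / σ_b = a·φ^(-m). The sketch below illustrates why an imperfect tomogram makes the apparent F drift with time: if regularization recovers only a fraction of the true bulk-conductivity change, the ratio of fluid to imaged bulk conductivity is no longer constant. The porosity, cementation exponent, fluid conductivities, and the 10% recovery factor (chosen to mimic the order-of-magnitude discrepancy reported above) are all hypothetical.

```python
def archie_bulk_conductivity(sigma_w, porosity, m=1.3, a=1.0):
    """Archie's Law without surface conduction: sigma_b = sigma_w * porosity**m / a."""
    return sigma_w * porosity ** m / a

def apparent_formation_factor(sigma_w, sigma_b):
    """F = sigma_w / sigma_b; constant in time if Archie's Law holds at fixed porosity."""
    return sigma_w / sigma_b

porosity, m = 0.3, 1.3              # assumed aquifer properties
fluid = [0.05, 0.25, 0.45, 0.15]    # hypothetical fluid conductivity over time (S/m)
true_bulk = [archie_bulk_conductivity(s, porosity, m) for s in fluid]

# Hypothetical tomogram: smoothing recovers only 10% of the change from background
background = true_bulk[0]
imaged_bulk = [background + 0.1 * (b - background) for b in true_bulk]

F_true = [apparent_formation_factor(w, b) for w, b in zip(fluid, true_bulk)]
F_imaged = [apparent_formation_factor(w, b) for w, b in zip(fluid, imaged_bulk)]
print([round(f, 2) for f in F_true])
print([round(f, 2) for f in F_imaged])
```

The true formation factor stays fixed, while the one computed from the damped "tomogram" changes as the tracer plume passes, which is the qualitative behavior the abstract attributes to image reconstruction.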
Interpreting sediment transport data with channel cross section analysis
NASA Astrophysics Data System (ADS)
Park, J.; Hunt, J. R.
2013-12-01
Suspended sediment load estimation is important for the management of stream environments. However, suspended load data are uncommon, and scalable models are needed to take maximum advantage of the measurements available. One of the most commonly used models for correlating suspended sediment load is an empirical power law relationship, Qs = aQ^b, where Qs is the suspended load and Q is the flow rate. However, the relationship of log-scaled suspended load to flow rate has multiple exponents for different flow regimes at a given site, so a single power law relationship is not a good fit. We are therefore exploring an alternative approach that employs channel cross section data historically collected by the US Geological Survey during stream gauge calibration. For our research, daily flow and sediment discharge were selected from about 180 possible USGS gauging sites in California. Among those, about 20 sites were relatively unaffected by human activities and had more than three years of data, including near-monthly measurements of channel cross sections. Our analysis consistently showed a slope break in the relationship of log-scaled suspended load to flow rate, as illustrated in Figure 1 for Redwood Creek at Orick, CA. Most of the selected natural sites clearly show this slope break. The slope break corresponds to a transition from flow in a flat, wide stream to flow constrained by steep banks, as verified in Figure 2 for the same site. This suggests that physical factors in the streams, such as shear stress, are affected by this change in channel morphology and result in a greater sediment load exponent in the higher-flow regime.
Figure 1. Daily values of measured sediment transport and flow rate reported by USGS between 1970 and 2001.
Figure 2. Near-monthly values of measured mean water depth and width reported by USGS between 1969 and 1987.
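The power-law rating curve Qs = aQ^b becomes a straight line in log space (log Qs = log a + b log Q), so a slope break can be located by fitting two lines. The sketch below is an illustration rather than the authors' method: it fits a single power law and a two-segment power law by ordinary least squares on log-transformed data, scanning every candidate split. The synthetic flows and the exponents 1.2 and 2.5 are hypothetical.

```python
import math

def linfit(xs, ys):
    """Ordinary least squares y = c0 + c1 * x; returns (c0, c1, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = my - c1 * mx
    sse = sum((y - c0 - c1 * x) ** 2 for x, y in zip(xs, ys))
    return c0, c1, sse

def fit_power_law(Q, Qs):
    """Single rating curve Qs = a * Q**b, fitted as log Qs = log a + b log Q."""
    c0, b, sse = linfit([math.log(q) for q in Q], [math.log(s) for s in Qs])
    return math.exp(c0), b, sse

def fit_broken_power_law(Q, Qs, min_pts=3):
    """Try every split of the sorted flows, fit separate power laws below and
    above it, and keep the split with the smallest total log-space SSE."""
    pts = sorted(zip(Q, Qs))
    best = None
    for k in range(min_pts, len(pts) - min_pts + 1):
        lo, hi = pts[:k], pts[k:]
        _, b1, e1 = fit_power_law(*zip(*lo))
        _, b2, e2 = fit_power_law(*zip(*hi))
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, hi[0][0], b1, b2)
    return best  # (sse, first flow above break, low-flow exponent, high-flow exponent)

# Hypothetical noiseless data with a break at Q = 8.5
Q = list(range(1, 21))
Qs = [0.5 * q ** 1.2 if q <= 8 else 0.5 * 8.5 ** 1.2 * (q / 8.5) ** 2.5 for q in Q]
a, b, sse_single = fit_power_law(Q, Qs)
sse_broken, q_break, b_low, b_high = fit_broken_power_law(Q, Qs)
print(round(b, 2), q_break, round(b_low, 2), round(b_high, 2))
```

On these noiseless data the scan places the break between Q = 8 and Q = 9 and recovers both exponents, while the single power law returns one compromise exponent with a larger residual. Real gauging data would need noise-aware model selection before a break is trusted.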
Assessment of Dermatophytosis Treatment Studies: Interpreting the Data.
Rosen, Theodore
2015-10-01
Antifungal therapy has recently enjoyed a resurgence of interest due to the introduction of a number of new formulations of topical drugs and novel molecules. This has led to a plethora of new publications on management of cutaneous fungal disease. This paper summarizes the various clinical trial factors which may affect the published data regarding how well antifungal drugs work. Understanding these parameters allows the healthcare provider to choose more rationally between available agents based upon an assessment of the evidence.
Using fuzzy sets for data interpretation in natural analogue studies
De Lemos, F.L.; Sullivan, T.; Hellmuth, K.H.
2008-07-01
Natural analogue studies can play a key role in the safety assessment of deep geological radioactive waste disposal systems. These studies can help develop a better understanding of complex natural processes and therefore provide a valuable means of confidence building in the safety assessment. In the evaluation of natural analogues, however, there are several sources of uncertainty that stem from factors such as complexity, lack of data, and ignorance. Analysts often have to simplify the mathematical models to cope with the various sources of complexity, and this adds uncertainty to the model results. The uncertainties reflected in model predictions must be addressed to understand their impact on the safety assessment and, therefore, the utility of natural analogues. Fuzzy sets can be used to represent the information regarding the natural processes and their mutual connections. With this methodology we are able to quantify and propagate the epistemic uncertainties in both processes and thereby assign degrees of truth to the similarities between them. An example calculation with literature data is provided. In conclusion, fuzzy sets are an effective way of quantifying semi-quantitative information such as natural analogue data. Epistemic uncertainty that stems from complexity and lack of knowledge regarding natural processes is represented by the degrees of membership. The fuzzy-set representation also facilitates the propagation of this uncertainty throughout the performance assessment by means of the extension principle. This principle allows calculation with fuzzy numbers, where fuzzy input results in fuzzy output. This may be one of the main applications of fuzzy set theory to radioactive waste disposal facility performance assessment. Through the translation of natural data into fuzzy numbers, the effect of parameters in important processes at one site can be quantified and compared to processes at other sites with different conditions. The approach presented in this paper can be extended to
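In practice, the extension principle is often computed with alpha-cut (interval) arithmetic. The sketch below propagates two triangular fuzzy numbers through a product. The input values and the choice of triangular membership functions are illustrative assumptions, not the paper's example calculation. Evaluating the function only at interval corners gives the exact alpha-cut when the function is monotonic in each argument, as a product of positive quantities is.

```python
def alpha_cut(tri, alpha):
    """Interval of a triangular fuzzy number (low, mode, high) at membership level alpha."""
    lo, mode, hi = tri
    return (lo + alpha * (mode - lo), hi - alpha * (hi - mode))

def fuzzy_apply(f, x, y, levels=5):
    """Propagate two fuzzy inputs through f(x, y) by alpha-cut interval analysis.
    Corner evaluation is exact only if f is monotonic in each argument on the cut."""
    result = []
    for i in range(levels + 1):
        alpha = i / levels
        (x0, x1), (y0, y1) = alpha_cut(x, alpha), alpha_cut(y, alpha)
        corners = [f(a, b) for a in (x0, x1) for b in (y0, y1)]
        result.append((alpha, min(corners), max(corners)))
    return result

# Hypothetical fuzzy parameters (low, most plausible, high), positive quantities
rate = (1.0, 2.0, 4.0)
factor = (0.5, 1.0, 1.5)
for alpha, lo, hi in fuzzy_apply(lambda a, b: a * b, rate, factor):
    print(f"alpha={alpha:.1f}: [{lo:.3f}, {hi:.3f}]")
```

The output is itself a fuzzy number: a wide interval at membership 0 narrowing to a single most-plausible value at membership 1, which is what "fuzzy input results in fuzzy output" means operationally.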
Common misconceptions about data analysis and statistics
Motulsky, Harvey J
2015-01-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: (1) P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. (2) Overemphasis on P values rather than on the actual size of the observed effect. (3) Overuse of statistical hypothesis testing, and being seduced by the word “significant”. (4) Overreliance on standard errors, which are often misunderstood. PMID:25692012
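The first mistake, reanalyzing "with additional replicates, until you get the result you want", can be illustrated by simulation. This is a minimal sketch under simplifying assumptions: a one-sample z test with known σ = 1, data generated with the null hypothesis true, 10 replicates per batch, and at most 50 observations. The rates it prints are properties of this simulation, not results from the paper.

```python
import math
import random

Z_CRIT = 1.959963984540054  # two-sided critical value for alpha = 0.05

def z_significant(xs):
    """One-sample z test of H0: mean = 0, assuming known sigma = 1."""
    z = sum(xs) / math.sqrt(len(xs))
    return abs(z) > Z_CRIT

def false_positive_rate(optional_stopping, sims=4000, start=10, step=10,
                        max_n=50, seed=1):
    """Fraction of null data sets declared 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        # Data generated under H0, so every rejection is a false positive
        xs = [rng.gauss(0.0, 1.0) for _ in range(start)]
        if optional_stopping:
            # Peek after each batch; add replicates until p < 0.05 or n reaches max_n
            while not z_significant(xs) and len(xs) < max_n:
                xs += [rng.gauss(0.0, 1.0) for _ in range(step)]
        hits += z_significant(xs)
    return hits / sims

print(false_positive_rate(False), false_positive_rate(True))
```

The fixed-sample analysis rejects close to the nominal 5% of the time, while peeking after every batch pushes the false-positive rate well above it, even though each individual test used α = 0.05.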
Interpreting Multiple Environmental Tracer Data in a Perialpine Catchment
NASA Astrophysics Data System (ADS)
Onnis, G. A.; Althaus, R.; Klump, S.; Purtschert, R.; Kipfer, R.; Hendricks-Franssen, H.; Stauffer, F.; Kinzelbach, W.
2008-12-01
A case study for the environmental tracers Tritium, Helium-3 and Krypton-85 in a small sand-gravel aquifer catchment in Northern Switzerland is presented. The groundwater flow is determined by means of Stochastic Inverse Modelling, using available transient hydraulic head and transmissivity (T) data to calibrate the transmissivity field with the Sequential Self-Calibration Technique as implemented in the code INVERTO. The aquifer recharge and its discharge via natural springs are evaluated independently and confirmed through comparison of simulated and observed heads after the inversion procedure. A number of equally likely transmissivity field realizations honoring both transmissivity and transient head measurements are generated, establishing the basis for environmental tracer transport modeling. The impact of the spatially variable, thick unsaturated zone (>10 m) on tracer transport is accounted for by means of a numerical solution to the vertical advection-dispersion equation. Starting from the measured tracer concentrations in the atmosphere, the input history to the saturated zone is reconstructed for different groundwater table depths. Environmental tracer transport in the saturated zone is investigated for each calibrated T-realization. The transport simulation results are generally fair for all tracers and reproduce the tracer data well at most observation locations, with a small uncertainty bandwidth related to the T-parameter. Ad-hoc zonation of transport parameters (vadose zone gas-phase tortuosity and saturated porosity) can help in achieving a simultaneous match of the tracer data at all locations. However, the model can account for only 20% of the amplitude of the high-frequency oscillations in Krypton-85 concentrations observed at one pumping station. Short-term variations of the recharge rate and of the actual pumping rate can each account for a further 10% of the amplitude of the Krypton-85 concentration fluctuations. The origin of
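The abstract mentions a numerical solution of the vertical advection-dispersion equation for the thick unsaturated zone. As a generic sketch only (the study's own code, its gas-phase formulation, and its tortuosity parameters are not reproduced), the block below solves the 1-D equation with an explicit upwind scheme and a first-order decay term of the kind radioactive tracers require. The column depth, velocity, dispersion coefficient, and input histories are hypothetical.

```python
def simulate_ade(c_top, depth=10.0, nz=50, v=1.0, D=0.5, lam=0.0):
    """Explicit finite-difference solution of the 1-D advection-dispersion
    equation with first-order decay,
        dC/dt = D d2C/dz2 - v dC/dz - lam * C,
    for z in [0, depth], z positive downward and v > 0.
    c_top holds the boundary concentration for each time step (input history).
    Returns (final profile, time step used)."""
    dz = depth / nz
    # Keep v*dt/dz + 2*D*dt/dz**2 below 1 so the update stays monotone (lam assumed small)
    dt = 0.9 / (v / dz + 2.0 * D / dz ** 2)
    c = [0.0] * (nz + 1)  # initially tracer-free column
    for ctop in c_top:
        new = c[:]
        new[0] = ctop  # prescribed inflow concentration at the top
        for i in range(1, nz):
            adv = -v * (c[i] - c[i - 1]) / dz  # first-order upwind advection
            disp = D * (c[i + 1] - 2.0 * c[i] + c[i - 1]) / dz ** 2
            new[i] = c[i] + dt * (adv + disp - lam * c[i])
        new[nz] = new[nz - 1]  # zero-gradient (free outflow) bottom boundary
        c = new
    return c, dt

profile, dt = simulate_ade([1.0] * 2000)                 # step input held at the top
pulse, _ = simulate_ade([1.0] * 100 + [0.0] * 200)       # short pulse, then clean input
print(round(dt, 3), round(min(profile), 3), round(max(pulse), 3))
```

The upwind choice trades some numerical dispersion for a scheme that never produces negative concentrations under the stated time-step limit; a reconstructed atmospheric input series would simply replace the constant lists passed as `c_top`.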
Statistical mechanics of complex neural systems and high dimensional data
NASA Astrophysics Data System (ADS)
Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya
2013-03-01
Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.
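Message passing in graphical models can be shown on the smallest nontrivial case. The sketch below runs sum-product message passing on a short Ising chain (spins ±1 with local fields and nearest-neighbor couplings, the same kind of model that underlies spin glasses) and checks the resulting marginals against brute-force enumeration. The chain structure and the parameter values are illustrative assumptions; loopy graphs and replica calculations are beyond this sketch.

```python
import itertools
import math

def chain_marginals_bp(h, J):
    """Sum-product message passing on an Ising chain with weight
    exp(sum_i h_i s_i + sum_i J_i s_i s_{i+1}), s_i in {-1, +1}.
    Exact on a chain because it has no loops. Returns P(s_i = +1)."""
    n, spins = len(h), (-1, 1)
    # fwd[i][s]: message into site i from the left, as a function of s_i
    fwd = [{s: 1.0 for s in spins} for _ in range(n)]
    for i in range(1, n):
        for s in spins:
            fwd[i][s] = sum(fwd[i - 1][t] * math.exp(h[i - 1] * t + J[i - 1] * t * s)
                            for t in spins)
    # bwd[i][s]: message into site i from the right
    bwd = [{s: 1.0 for s in spins} for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for s in spins:
            bwd[i][s] = sum(bwd[i + 1][t] * math.exp(h[i + 1] * t + J[i] * s * t)
                            for t in spins)
    probs = []
    for i in range(n):
        w = {s: fwd[i][s] * bwd[i][s] * math.exp(h[i] * s) for s in spins}
        probs.append(w[1] / (w[1] + w[-1]))
    return probs

def chain_marginals_brute(h, J):
    """Reference marginals by summing over all 2**n configurations."""
    n = len(h)
    z, up = 0.0, [0.0] * n
    for config in itertools.product((-1, 1), repeat=n):
        log_w = sum(hi * s for hi, s in zip(h, config))
        log_w += sum(J[i] * config[i] * config[i + 1] for i in range(n - 1))
        w = math.exp(log_w)
        z += w
        for i, s in enumerate(config):
            if s == 1:
                up[i] += w
    return [u / z for u in up]

h = [0.3, -0.2, 0.5, 0.0, -0.4]   # hypothetical local fields
J = [0.8, -0.5, 1.0, 0.2]         # hypothetical couplings (mixed signs)
print(chain_marginals_bp(h, J))
print(chain_marginals_brute(h, J))
```

Because a chain has no loops, the two sets of marginals agree to rounding error at a cost linear rather than exponential in the number of spins. Applied to graphs with loops, the same updates are generally only approximate. Messages are left unnormalized here for brevity; longer chains would normally rescale them to avoid overflow.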
Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data
Hu, Ming; Deng, Ke; Qin, Zhaohui; Liu, Jun S.
2015-01-01
Understanding how chromosomes fold provides insights into transcription regulation and, hence, the functional state of the cell. Using next-generation sequencing technology, the recently developed Hi-C approach enables a global view of spatial chromatin organization in the nucleus, which substantially expands our knowledge about genome organization and function. However, due to multiple layers of biases, noise and uncertainty buried in the protocol of Hi-C experiments, analyzing and interpreting Hi-C data poses great challenges and requires the development of novel statistical methods. This article provides an overview of recent Hi-C studies and their impacts on biomedical research, describes major challenges in statistical analysis of Hi-C data, and discusses some perspectives for future research. PMID:26124977
Modern statistical modeling approaches for analyzing repeated-measures data.
Hayat, Matthew J; Hedlin, Haley
2012-01-01
Study designs often involve the collection of repeated measurements on each individual. Advanced statistical methods, namely mixed and marginal models, are the preferred analytic choices for analyzing this type of data. The aim was to provide a conceptual understanding of these modeling techniques. An understanding of mixed models and marginal models is provided via a thorough exploration of the methods that have been used historically in the biomedical literature to summarize and make inferences about this type of data. Their limitations are discussed, as is work done on expanding the classic linear regression model to account for repeated measurements taken on an individual, leading to the broader mixed-model framework. A description is provided of a variety of common types of study designs and data structures that can be analyzed using a mixed model and a marginal model. This work provides an overview of advanced statistical modeling techniques used for analyzing the many types of correlated data collected in a research study.