A novel statistical analysis and interpretation of flow cytometry data
Banks, H.T.; Kapraun, D.F.; Thompson, W. Clayton; Peligero, Cristina; Argilaguet, Jordi; Meyerhans, Andreas
2013-01-01
A recently developed class of models incorporating the cyton model of population generation structure into a conservation-based model of intracellular label dynamics is reviewed. Statistical aspects of the data collection process are quantified and incorporated into a parameter estimation scheme. This scheme is then applied to experimental data for PHA-stimulated CD4+ T and CD8+ T cells collected from two healthy donors. This novel mathematical and statistical framework is shown to form the basis for accurate, meaningful analysis of cellular behaviour for a population of cells labelled with the dye carboxyfluorescein succinimidyl ester and stimulated to divide. PMID:23826744
A statistical model for interpreting computerized dynamic posturography data
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
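The latent-variable idea in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the normal latent distribution, the logistic fall probability, and every parameter value (`mean_latent`, `sd_latent`, `slope`, `midpoint`) are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import random


def simulate_trial(rng, mean_latent=70.0, sd_latent=12.0, slope=0.15, midpoint=40.0):
    """Return (observed_es, fell) for one simulated CDP trial (illustrative only)."""
    # The latent ES always exists; clip it to the 0-100 scoring range.
    latent = min(100.0, max(0.0, rng.gauss(mean_latent, sd_latent)))
    # Assumed logistic link: a lower latent score makes a fall more likely.
    p_fall = 1.0 / (1.0 + math.exp(slope * (latent - midpoint)))
    fell = rng.random() < p_fall
    # On a fall the latent score is unobserved and a zero is recorded instead.
    return (0.0 if fell else latent), fell


rng = random.Random(0)
trials = [simulate_trial(rng) for _ in range(2000)]
n_falls = sum(fell for _, fell in trials)
observed = [es for es, fell in trials if not fell]
print(f"falls: {n_falls}, observed scores: {len(observed)}")
```

The recorded scores form a spike at zero plus a continuous part, which is the mixed discrete-continuous response the abstract describes.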
Misuse of statistics in the interpretation of data on low-level radiation
Hamilton, L.D.
1982-01-01
Four misuses of statistics in the interpretation of data of low-level radiation are reviewed: (1) post-hoc analysis and aggregation of data leading to faulty conclusions in the reanalysis of genetic effects of the atomic bomb, and premature conclusions on the Portsmouth Naval Shipyard data; (2) inappropriate adjustment for age and ignoring differences between urban and rural areas leading to potentially spurious increase in incidence of cancer at Rocky Flats; (3) hazard of summary statistics based on ill-conditioned individual rates leading to spurious association between childhood leukemia and fallout in Utah; and (4) the danger of prematurely published preliminary work with inadequate consideration of epidemiological problems - censored data - leading to inappropriate conclusions, needless alarm at the Portsmouth Naval Shipyard, and diversion of scarce research funds.
Statistics Translated: A Step-by-Step Guide to Analyzing and Interpreting Data
ERIC Educational Resources Information Center
Terrell, Steven R.
2012-01-01
Written in a humorous and encouraging style, this text shows how the most common statistical tools can be used to answer interesting real-world questions, presented as mysteries to be solved. Engaging research examples lead the reader through a series of six steps, from identifying a researchable problem to stating a hypothesis, identifying…
Logical, epistemological and statistical aspects of nature-nurture data interpretation.
Kempthorne, O
1978-03-01
In this paper the nature of the reasoning processes applied to the nature-nurture question is discussed in general and with particular reference to mental and behavioral traits. The nature of data analysis and analysis of variance is discussed. Necessarily, the nature of causation is considered. The notion that mere data analysis can establish "real" causation is attacked. Logic of quantitative genetic theory is reviewed briefly. The idea that heritability is meaningful in the human mental and behavioral arena is attacked. The conclusion is that the heredity-IQ controversy has been a "tale full of sound and fury, signifying nothing". To suppose that one can establish effects of an intervention process when it does not occur in the data is plainly ludicrous. Mere observational studies can easily lead to stupidities, and it is suggested that this has happened in the heredity-IQ arena. The idea that there are racial-genetic differences in mental abilities and behavioral traits of humans is, at best, no more than idle speculation. PMID:637918
Statistical weld process monitoring with expert interpretation
Cook, G.E.; Barnett, R.J.; Strauss, A.M.; Thompson, F.M. Jr.
1996-12-31
A statistical weld process monitoring system is described. Using data of voltage, current, wire feed speed, gas flow rate, travel speed, and elapsed arc time collected while welding, the welding statistical process control (SPC) tool provides weld process quality control by implementing techniques of data trending analysis, tolerance analysis, and sequential analysis. For purposes of quality control, the control limits required for acceptance are specified in the weld procedure acceptance specifications. The control charts then provide quality assurance documentation for each weld. The statistical data trending analysis performed by the SPC program is not only valuable as a quality assurance monitoring and documentation system, it is also valuable in providing diagnostic assistance in troubleshooting equipment and material problems. Possible equipment/process problems are identified and matched with features of the SPC control charts. To aid in interpreting the voluminous statistical output generated by the SPC system, a large number of If-Then rules have been devised for providing computer-based expert advice for pinpointing problems based on out-of-limit variations of the control charts. The paper describes the SPC monitoring tool and the rule-based expert interpreter that has been developed for relating control chart trends to equipment/process problems.
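A minimal sketch of the out-of-limit check and rule-based advice described above, assuming fixed lower and upper control limits. The readings, the limits, and the wording of the rules are all made up for illustration; the paper's actual rule base is not reproduced here.

```python
def out_of_limit(samples, lower, upper):
    """Return the indices of samples that fall outside the control limits."""
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


def advise(parameter, violations, n_samples):
    """Toy If-Then rules mapping chart behaviour to a hypothetical hint."""
    if not violations:
        return f"{parameter}: within limits"
    if len(violations) > n_samples // 2:
        return f"{parameter}: persistent excursion, check equipment setup"
    return f"{parameter}: isolated excursion, inspect samples {violations}"


# Made-up arc voltage readings in volts; the limits are hypothetical.
voltage = [24.1, 24.3, 23.9, 26.8, 24.0, 24.2]
flags = out_of_limit(voltage, lower=23.0, upper=25.5)
message = advise("voltage", flags, len(voltage))
print(message)
```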
NASA Astrophysics Data System (ADS)
Irving, J.; Knight, R.; Holliger, K.
2007-12-01
The distribution of subsurface water content can be an excellent indicator of soil texture, which strongly influences the unsaturated hydraulic properties controlling vadose zone contaminant transport. Characterizing the heterogeneity in subsurface water content for use in numerical transport models, however, is an extremely difficult task as conventional hydrological measurement techniques do not offer the combined high spatial resolution and coverage required for accurate simulations. A number of recent studies have shown that ground-penetrating radar (GPR) reflection images may contain useful information regarding the statistical structure of subsurface water content. Comparisons of the horizontal correlation structures of radar images and those obtained from water content measurements have shown that, in some cases, the statistical characteristics are remarkably similar. However, a key issue in these studies is that a reflection GPR image is primarily related to changes in subsurface water content, and not the water content distribution directly. As a result, statistics gathered on the reflection image have a very complex relationship with the statistics of the underlying water content distribution, this relationship depending on a number of factors including the frequency of the GPR antennas used. In this work, we attempt to address the above issue by posing the estimation of the statistical structure of water content from reflection GPR data as an inverse problem. Using a simple convolution model for a radar image, we first derive a forward model relating the statistical structure of a radar image to that of the underlying water content distribution. We then use this forward model to invert for the spatial statistics of the water content distribution, given the spatial statistics of the GPR reflection image as data. We do this within a framework of uncertainty, such that realistic statistical bounds can be placed on the information that is inferred. In other
Asfahani, Jamal
2014-02-01
A factor analysis technique is proposed for interpreting combined nuclear well logs (natural gamma ray, density and neutron-porosity) and electrical well logs (long and short normal) in order to characterize the extensive basaltic areas in southern Syria. Kodana well logging data are used to test and apply the proposed technique. The four resulting score logs make it possible to establish the lithological score cross-section of the studied well. The established cross-section clearly shows the distribution of four kinds of basalt: hard massive basalt, hard basalt, pyroclastic basalt and the alteration product of basalt, clay. The factor analysis technique is successfully applied to the Kodana well logging data in southern Syria, and can be used efficiently when several wells and large well logging data sets with a high number of variables must be interpreted. PMID:24296157
The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...
Statistical mechanics and the ontological interpretation
NASA Astrophysics Data System (ADS)
Bohm, D.; Hiley, B. J.
1996-06-01
To complete our ontological interpretation of quantum theory we have to include a treatment of quantum statistical mechanics. The basic concepts in the ontological approach are the particle and the wave function. The density matrix cannot play a fundamental role here. Therefore quantum statistical mechanics will require a further statistical distribution over wave functions in addition to the distribution of particles that have a specified wave function. Ultimately the wave function of the universe will be required, but we show that if the universe is not in thermodynamic equilibrium then it can be treated in terms of weakly interacting large scale constituents that are very nearly independent of each other. In this way we obtain the same results as those of the usual approach within the framework of the ontological interpretation.
Fadda, Valeria; Maratea, Dario; Trippoli, Sabrina; Gatto, Roberta; De Rosa, Mauro; Marinai, Claudio
2014-01-01
Background: No equivalence analysis has yet been conducted on the effectiveness of biologics in rheumatoid arthritis. Equivalence testing has a specific scientific interest, but can also be useful for deciding whether acquisition tenders are feasible for the pharmacological agents being compared. Methods: Our search covered the literature up to August 2014. Our methodology was a combination of standard pairwise meta-analysis, Bayesian network meta-analysis and equivalence testing. The agents examined for their potential equivalence were etanercept, adalimumab, golimumab, certolizumab, and tocilizumab, each in combination with methotrexate (MTX). The reference treatment was MTX monotherapy. The endpoint was ACR50 achievement at 12 months. Odds ratio was the outcome measure. The equivalence margins were established by analyzing the statistical power data of the trials. Results: Our search identified seven randomized controlled trials (2846 patients). No study was retrieved for tocilizumab, and so only four biologics were evaluable. The equivalence range was set at odds ratio from 0.56 to 1.78. There were 10 head-to-head comparisons (4 direct, 6 indirect). Bayesian network meta-analysis estimated the odds ratio (with 90% credible intervals) for each of these comparisons. Between-trial heterogeneity was marked. According to our results, all credible intervals of the 10 comparisons were wide and none of them satisfied the equivalence criterion. A superiority finding was confirmed for the treatment with MTX plus adalimumab or certolizumab in comparison with MTX monotherapy, but not for the other two biologics. Conclusion: Our results indicate that these four biologics improved the rates of ACR50 achievement, but there was an evident between-study heterogeneity. The head-to-head indirect comparisons between individual biologics showed no significant difference, but failed to demonstrate the proof of no difference (i.e. equivalence). This body of evidence presently
Nash, J. Thomas; Frishman, David
1983-01-01
Analytical results for 61 elements in 370 samples from the Ranger Mine area are reported. Most of the rocks come from drill core in the Ranger No. 1 and Ranger No. 3 deposits, but 20 samples are from unmineralized drill core more than 1 km from ore. Statistical tests show that the elements Mg, Fe, F, Be, Co, Li, Ni, Pb, Sc, Th, Ti, V, Cl, As, Br, Au, Ce, Dy, La, Sc, Eu, Tb, Yb, and Tb have positive association with uranium, and Si, Ca, Na, K, Sr, Ba, Ce, and Cs have negative association. For most lithologic subsets Mg, Fe, Li, Cr, Ni, Pb, V, Y, Sm, Sc, Eu, and Yb are significantly enriched in ore-bearing rocks, whereas Ca, Na, K, Sr, Ba, Mn, Ce, and Cs are significantly depleted. These results are consistent with petrographic observations on altered rocks. Lithogeochemistry can aid exploration, but for these rocks requires methods that are expensive and not amenable to routine use.
Interpreting Educational Research Using Statistical Software.
ERIC Educational Resources Information Center
Evans, Elizabeth A.
A live demonstration of how a typical set of educational data can be examined using quantitative statistical software was conducted. The topic of tutorial support was chosen. Setting up a hypothetical research scenario, the researcher created 300 cases from random data generation adjusted to correct obvious error. Each case represented a student…
NASA Astrophysics Data System (ADS)
Tadaki, Kohtaro
2010-12-01
The statistical mechanical interpretation of algorithmic information theory (AIT, for short) was introduced and developed in our earlier works [K. Tadaki, Local Proceedings of CiE 2008, pp. 425-434, 2008] and [K. Tadaki, Proceedings of LFCS'09, Springer's LNCS, vol. 5407, pp. 422-440, 2009], where we introduced the notion of thermodynamic quantities, such as partition function Z(T), free energy F(T), energy E(T), statistical mechanical entropy S(T), and specific heat C(T), into AIT. We then discovered that, in the interpretation, the temperature T equals the partial randomness of the values of all these thermodynamic quantities, where the notion of partial randomness is a stronger representation of the compression rate by means of program-size complexity. Furthermore, we showed that this situation holds for the temperature T itself, which is one of the most typical thermodynamic quantities. Namely, we showed that, for each of the thermodynamic quantities Z(T), F(T), E(T), and S(T) above, the computability of its value at temperature T gives a sufficient condition for T ∈ (0,1) to satisfy the condition that the partial randomness of T equals T. In this paper, based on a physical argument on the same level of mathematical strictness as normal statistical mechanics in physics, we develop a total statistical mechanical interpretation of AIT which actualizes a perfect correspondence to normal statistical mechanics. We do this by identifying a microcanonical ensemble in the framework of AIT. As a result, we clarify the statistical mechanical meaning of the thermodynamic quantities of AIT.
Use and interpretation of statistics in wildlife journals
Tacha, Thomas C.; Warde, William D.; Burnham, Kenneth P.
1982-01-01
Use and interpretation of statistics in wildlife journals are reviewed, and suggestions for improvement are offered. Populations from which inferences are to be drawn should be clearly defined, and conclusions should be limited to the range of the data analyzed. Authors should be careful to avoid improper methods of plotting data and should clearly define the use of estimates of variance, standard deviation, standard error, or confidence intervals. Biological and statistical significance are often confused by authors and readers. Statistical hypothesis testing is a tool, and not every question should be answered by hypothesis testing. Meeting assumptions of hypothesis tests is the responsibility of authors, and assumptions should be reviewed before a test is employed. The use of statistical tools should be considered carefully both before and after gathering data.
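The distinction between standard deviation, standard error, and a confidence interval can be made concrete with a short sketch. The sample values below are hypothetical, and the critical value 2.306 is the two-sided 95% Student t value for 8 degrees of freedom, supplied by hand so that only the standard library is needed.

```python
import math
import statistics


def summarize(sample, t_crit):
    """Report the spread of the data (SD) and the precision of the mean (SE, CI)."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    half_width = t_crit * se        # half-width of the confidence interval
    return {"n": n, "mean": mean, "sd": sd, "se": se,
            "ci": (mean - half_width, mean + half_width)}


# Hypothetical body masses (kg) from nine animals.
weights = [3.1, 2.8, 3.4, 3.0, 2.9, 3.3, 3.2, 2.7, 3.5]
summary = summarize(weights, t_crit=2.306)  # t(0.975, df = 8)
print(summary)
```

The SD describes variation among animals, while the SE and the interval describe uncertainty in the estimated mean, so a report should state which one is being plotted or quoted.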
Integrating statistical rock physics and sedimentology for quantitative seismic interpretation
NASA Astrophysics Data System (ADS)
Avseth, Per; Mukerji, Tapan; Mavko, Gary; Gonzalez, Ezequiel
This paper presents an integrated approach for seismic reservoir characterization that can be applied both in petroleum exploration and in hydrological subsurface analysis. We integrate fundamental concepts and models of rock physics, sedimentology, statistical pattern recognition, and information theory, with seismic inversions and geostatistics. Rock physics models enable us to link seismic amplitudes to geological facies and reservoir properties. Seismic imaging brings indirect, noninvasive, but nevertheless spatially exhaustive information about the reservoir properties that are not available from well data alone. Classification and estimation methods based on computational statistical techniques such as nonparametric Bayesian classification, Monte Carlo simulations and bootstrap, help to quantitatively measure the interpretation uncertainty and the mis-classification risk at each spatial location. Geostatistical stochastic simulations incorporate the spatial correlation and the small scale variability which is hard to capture with only seismic information because of the limits of resolution. Combining deterministic physical models with statistical techniques has provided us with a successful way of performing quantitative interpretation and estimation of reservoir properties from seismic data. These formulations identify not only the most likely interpretation but also the uncertainty of the interpretation, and serve as a guide for quantitative decision analysis. The methodology shown in this article is applied successfully to map petroleum reservoirs, and the examples are from relatively deeply buried oil fields. However, we suggest that this approach can also be carried out for improved characterization of shallow hydrologic aquifers using shallow seismic or GPR data.
Tuberculosis Data and Statistics
As watershed groups in the state of Georgia form and develop, they have a need for collecting, managing, and analyzing data associated with their watershed. Possible sources of data for flow, water quality, biology, habitat, and watershed characteristics include the U.S. Geologic...
Data collection and interpretation.
Citerio, Giuseppe; Park, Soojin; Schmidt, J Michael; Moberg, Richard; Suarez, Jose I; Le Roux, Peter D
2015-06-01
Patient monitoring is routinely performed in all patients who receive neurocritical care. The combined use of monitors, including the neurologic examination, laboratory analysis, imaging studies, and physiological parameters, is common in a platform called multi-modality monitoring (MMM). However, the full potential of MMM is only beginning to be realized since for the most part, decision making historically has focused on individual aspects of physiology in a largely threshold-based manner. The use of MMM now is being facilitated by the evolution of bio-informatics in critical care including developing techniques to acquire, store, retrieve, and display integrated data and new analytic techniques for optimal clinical decision making. In this review, we will discuss the crucial initial steps toward data and information management, which in this emerging era of data-intensive science is already shifting concepts of care for acute brain injury and has the potential to both reshape how we do research and enhance cost-effective clinical care. PMID:25846711
The Statistical Interpretation of Entropy: An Activity
ERIC Educational Resources Information Center
Timberlake, Todd
2010-01-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the…
Interpreting Data: The Hybrid Mind
ERIC Educational Resources Information Center
Heisterkamp, Kimberly; Talanquer, Vicente
2015-01-01
The central goal of this study was to characterize major patterns of reasoning exhibited by college chemistry students when analyzing and interpreting chemical data. Using a case study approach, we investigated how a representative student used chemical models to explain patterns in the data based on structure-property relationships. Our results…
The Statistical Interpretation of Entropy: An Activity
NASA Astrophysics Data System (ADS)
Timberlake, Todd
2010-11-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the functioning of the second law and also provided evidence for the existence of atoms at a time when many scientists (like Ernst Mach and Wilhelm Ostwald) were skeptical.
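One common way to illustrate the statistical view of entropy is to count microstates for particles distributed between the two halves of a box. This sketch is an assumption about how such an illustration might look and is not necessarily the activity described in the article; the system size `N = 20` is arbitrary. Entropy is reported in units of Boltzmann's constant, S/k = ln W.

```python
import math


def multiplicity(n_particles, n_left):
    """Number of microstates W with n_left particles in the left half of a box."""
    return math.comb(n_particles, n_left)


def entropy_over_k(n_particles, n_left):
    """Boltzmann entropy in units of k: S/k = ln W."""
    return math.log(multiplicity(n_particles, n_left))


N = 20
entropies = [entropy_over_k(N, n) for n in range(N + 1)]
# The macrostate with the most microstates has the highest entropy.
most_probable = max(range(N + 1), key=lambda n: entropies[n])
print(most_probable, multiplicity(N, most_probable))
```

The evenly split macrostate has by far the most microstates, which is why an isolated system started with all particles on one side is overwhelmingly likely to move toward it.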
Workplace statistical literacy for teachers: interpreting box plots
NASA Astrophysics Data System (ADS)
Pierce, Robyn; Chick, Helen
2013-06-01
As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and, yet it is not clear that they always have the appropriate knowledge and experience to interpret the graphs, tables and other data that they receive. This study examined the statistical literacy demands placed on teachers, with a particular focus on box plot representations. Although box plots summarise the data in a way that makes visual comparisons possible across sets of data, this study showed that teachers do not always have the necessary fluency with the representation to describe correctly how the data are distributed in the representation. In particular, a significant number perceived the size of the regions of the box plot to be depicting frequencies rather than density, and there were misconceptions associated with outlying data that were not displayed on the plot. As well, teachers' perceptions of box plots were found to relate to three themes: attitudes, perceived value and misconceptions.
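The frequency-versus-density misconception can be checked numerically. In the sketch below the scores are hypothetical and the quartiles come from Python's `statistics.quantiles` with its default method; each of the four regions of the resulting box plot holds the same number of observations even though the regions differ greatly in width.

```python
import statistics

# Hypothetical test scores for 16 students, sorted for readability.
scores = [41, 45, 47, 48, 50, 51, 52, 53, 55, 58, 62, 70, 85, 96, 99, 100]

q1, median, q3 = statistics.quantiles(scores, n=4)
edges = [min(scores), q1, median, q3, max(scores)]

# Width of each region: lower whisker, lower box, upper box, upper whisker.
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# Number of observations in each region (the first region includes the minimum).
counts = [
    sum(1 for x in scores if (lo <= x if k == 0 else lo < x) and x <= hi)
    for k, (lo, hi) in enumerate(zip(edges, edges[1:]))
]
print(widths, counts)
```

A wide region therefore signals that the same share of the data is spread thinly over a larger range, not that more students fall in it.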
For a statistical interpretation of Helmholtz' thermal displacement
NASA Astrophysics Data System (ADS)
Podio-Guidugli, Paolo
2016-05-01
On moving from the classic papers by Einstein and Langevin on Brownian motion, two consistent statistical interpretations are given for the thermal displacement, a scalar field formally introduced by Helmholtz, whose time derivative is by definition the absolute temperature.
Paleomicrobiology Data: Authentification and Interpretation.
Drancourt, Michel
2016-06-01
The authenticity of some of the very first works in the field of paleopathology has been questioned, and standards have been progressively established for the experiments and the interpretation of data. Whereas most problems initially arose from the contamination of ancient specimens with modern human DNA, the situation is different in the field of paleomicrobiology, in which the risk for contamination is well-known and adequately managed by any laboratory team with expertise in the routine diagnosis of modern-day infections. Indeed, the exploration of ancient microbiota and pathogens is best done by such laboratory teams, with research directed toward the discovery and implementation of new techniques and the interpretation of data. PMID:27337456
The Statistical Interpretation of Classical Thermodynamic Heating and Expansion Processes
ERIC Educational Resources Information Center
Cartier, Stephen F.
2011-01-01
A statistical model has been developed and applied to interpret thermodynamic processes typically presented from the macroscopic, classical perspective. Through this model, students learn and apply the concepts of statistical mechanics, quantum mechanics, and classical thermodynamics in the analysis of the (i) constant volume heating, (ii)…
Muscular Dystrophy: Data and Statistics
Linda Stetzenbach; Lauren Nemnich; Davor Novosel
2009-08-31
Three independent tasks had been performed (Stetzenbach 2008, Stetzenbach 2008b, Stetzenbach 2009) to measure a variety of parameters in normative buildings across the United States. For each of these tasks 10 buildings were selected as normative indoor environments. Task 1 focused on office buildings, Task 13 focused on public schools, and Task 0606 focused on high performance buildings. To perform this task it was necessary to restructure the database for the Indoor Environmental Quality (IEQ) data and the Sound measurement, as several issues were identified and resolved prior to and during the transfer of these data sets into SPSS. During overview discussions with the statistician engaged for this task it was determined that, because the indoor zones (1-6) were selected independently within each task, zones were not related by location across tasks. Therefore, no comparison would be valid across zones for the 30 buildings, so the by-location (zone) data were limited to three analysis sets of the buildings within each task. In addition, different collection procedures for lighting were used in Task 0606 than in Tasks 01 & 13 to improve sample collection. Therefore, these data sets could not be merged and compared, so by-day effects were run separately for Task 0606 and only Task 01 & 13 data were merged. Results of the statistical analysis of the IEQ parameters show that statistically significant differences were found among days and zones for all tasks, although no differences were found by-day for Draft Rate data from Task 0606 (p>0.05). Thursday measurements of IEQ parameters were significantly different from Tuesday, and most Wednesday measures for all variables of Tasks 1 & 13. Data for all three days appeared to vary for Operative Temperature, whereas only Tuesday and Thursday differed for Draft Rate 1m. Although no Draft Rate measures within Task 0606 were found to significantly differ by-day, Temperature measurements for Tuesday and
Spirakis, C.S.; Pierson, C.T.; Santos, E.S.; Fishman, N.S.
1983-01-01
Statistical treatment of analytical data from 106 samples of uranium-mineralized and unmineralized or weakly mineralized rocks of the Morrison Formation from the northeastern part of the Church Rock area of the Grants uranium region indicates that along with uranium, the deposits in the northeast Church Rock area are enriched in barium, sulfur, sodium, vanadium and equivalent uranium. Selenium and molybdenum are sporadically enriched in the deposits and calcium, manganese, strontium, and yttrium are depleted. Unlike the primary deposits of the San Juan Basin, the deposits in the northeast part of the Church Rock area contain little organic carbon and several elements that are characteristically enriched in the primary deposits are not enriched or are enriched to a much lesser degree in the Church Rock deposits. The suite of elements associated with the deposits in the northeast part of the Church Rock area is also different from the suite of elements associated with the redistributed deposits in the Ambrosia Lake district. This suggests that the genesis of the Church Rock deposits is different, at least in part, from the genesis of the primary deposits of the San Juan Basin or the redistributed deposits at Ambrosia Lake.
NASA Astrophysics Data System (ADS)
Tema, E.; Zanella, E.; Pavón-Carrasco, F. J.; Kondopoulou, D.; Pavlides, S.
2015-10-01
We present the results of palaeomagnetic analysis on Late Bronze Age pottery from Santorini carried out in order to estimate the thermal effect of the Minoan eruption on the pre-Minoan habitation level. A total of 170 specimens from 108 ceramic fragments have been studied. The ceramics were collected from the surface of the pre-Minoan palaeosol at six different sites, including samples from the Akrotiri archaeological site. The deposition temperatures of the first pyroclastic products have been estimated by the maximum overlap of the re-heating temperature intervals given by the individual fragments at site level. A new statistical elaboration of the temperature data has also been proposed, calculating at 95 per cent probability the re-heating temperatures at each site. The obtained results show that the precursor tephra layer and the first pumice fall of the eruption were hot enough to re-heat the underlying ceramics to temperatures of 160-230 °C in the non-inhabited sites, while the temperatures recorded inside the Akrotiri village are slightly lower, varying from 130 to 200 °C. The decrease of the temperatures registered in the human settlements suggests that there was some interaction between the buildings and the pumice fallout deposits, while the building debris layer caused by the preceding and syn-eruption earthquakes has probably also contributed to the decrease of the recorded re-heating temperatures.
On Interpreting Test Scores as Social Indicators: Statistical Considerations.
ERIC Educational Resources Information Center
Spencer, Bruce D.
1983-01-01
Because test scores are ordinal, not cardinal, attributes, the average test score often is a misleading way to summarize the scores of a group of individuals. Similarly, correlation coefficients may be misleading summary measures of association between test scores. Proper, readily interpretable, summary statistics are developed from a theory of…
Comparing survival curves using an easy to interpret statistic.
Hess, Kenneth R
2010-10-15
Here, I describe a statistic for comparing two survival curves that has a clear and obvious meaning and has a long history in biostatistics. Suppose we are comparing survival times associated with two treatments A and B. The statistic operates in such a way that if it takes on the value 0.95, then the interpretation is that a randomly chosen patient treated with A has a 95% chance of surviving longer than a randomly chosen patient treated with B. This statistic was first described in the 1950s, and was generalized in the 1960s to work with right-censored survival times. It is a useful and convenient measure for assessing differences between survival curves. Software for computing the statistic is readily available on the Internet. PMID:20732962
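For fully observed (uncensored) survival times, the quantity described here can be estimated by comparing every patient in group A with every patient in group B. The sketch below makes that simplifying assumption explicit: it does not handle right-censored times, which the generalization mentioned in the abstract addresses, and the survival times are invented. Counting ties as one half is a common convention assumed here.

```python
def prob_a_outlives_b(times_a, times_b):
    """Estimate P(random A patient survives longer than random B patient).

    Assumes every survival time is fully observed (no censoring).
    """
    wins = 0.0
    for a in times_a:
        for b in times_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5  # split ties evenly between the groups
    return wins / (len(times_a) * len(times_b))


# Hypothetical survival times in months.
treatment_a = [14, 20, 25, 31, 40]
treatment_b = [6, 9, 15, 18, 22]
p = prob_a_outlives_b(treatment_a, treatment_b)
print(p)
```

A value of 0.5 would indicate no tendency for either group to survive longer, and values near 1 favour treatment A.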
Adapting internal statistical models for interpreting visual cues to depth
Seydell, Anna; Knill, David C.; Trommershäuser, Julia
2010-01-01
The informativeness of sensory cues depends critically on statistical regularities in the environment. However, statistical regularities vary between different object categories and environments. We asked whether and how the brain changes the prior assumptions about scene statistics used to interpret visual depth cues when stimulus statistics change. Subjects judged the slants of stereoscopically presented figures by adjusting a virtual probe perpendicular to the surface. In addition to stereoscopic disparities, the aspect ratio of the stimulus in the image provided a “figural compression” cue to slant, whose reliability depends on the distribution of aspect ratios in the world. As we manipulated this distribution from regular to random and back again, subjects’ reliance on the compression cue relative to stereoscopic cues changed accordingly. When we randomly interleaved stimuli from shape categories (ellipses and diamonds) with different statistics, subjects gave less weight to the compression cue for figures from the category with more random aspect ratios. Our results demonstrate that relative cue weights vary rapidly as a function of recently experienced stimulus statistics, and that the brain can use different statistical models for different object categories. We show that subjects’ behavior is consistent with that of a broad class of Bayesian learning models. PMID:20465321
Workplace Statistical Literacy for Teachers: Interpreting Box Plots
ERIC Educational Resources Information Center
Pierce, Robyn; Chick, Helen
2013-01-01
As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and, yet it is not clear that they always have the…
Hahn, A.A.
1994-11-01
The complexity of instrumentation sometimes requires data analysis to be done before the result is presented to the control room. This tutorial reviews some of the theoretical assumptions underlying the more popular forms of data analysis and presents simple examples to illuminate the advantages and hazards of different techniques.
Pass-Fail Testing: Statistical Requirements and Interpretations
Gilliam, David; Leigh, Stefan; Rukhin, Andrew; Strawderman, William
2009-01-01
Performance standards for detector systems often include requirements for probability of detection and probability of false alarm at a specified level of statistical confidence. This paper reviews the accepted definitions of confidence level and of critical value. It describes the testing requirements for establishing either of these probabilities at a desired confidence level. These requirements are computable in terms of functions that are readily available in statistical software packages and general spreadsheet applications. The statistical interpretations of the critical values are discussed. A table is included for illustration, and a plot is presented showing the minimum required numbers of pass-fail tests. The results given here are applicable to one-sided testing of any system with performance characteristics conforming to a binomial distribution. PMID:27504221
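The sample-size requirement can be sketched with the standard one-sided binomial calculation. This is an illustration under stated assumptions rather than the paper's own procedure or table: trials are independent, the required probability of detection is `p_required`, and the function names are hypothetical. The idea is to find the smallest number of tests such that, if the true pass probability were only `p_required`, observing no more than the allowed number of failures would happen with probability at most 1 minus the confidence level.

```python
from math import comb


def confidence_demonstrated(n, failures_allowed, p_required):
    """Confidence that the pass probability exceeds p_required after n tests
    with at most failures_allowed failures (one-sided binomial)."""
    q = 1.0 - p_required
    # Probability of seeing this few failures if the true pass
    # probability were exactly p_required.
    tail = sum(
        comb(n, k) * q**k * p_required ** (n - k)
        for k in range(failures_allowed + 1)
    )
    return 1.0 - tail


def min_tests(p_required, confidence, failures_allowed=0, n_max=100_000):
    """Smallest n that demonstrates p_required at the given confidence."""
    for n in range(1, n_max + 1):
        if confidence_demonstrated(n, failures_allowed, p_required) >= confidence:
            return n
    raise ValueError("no sample size up to n_max meets the requirement")


# Demonstrating 90% detection probability at 95% confidence, zero failures.
print(min_tests(0.90, 0.95))  # 29
```

Allowing one or more failures raises the required number of tests, which matches the intuition that weaker evidence per test must be offset by more tests.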
Spina Bifida Data and Statistics
... non-Hispanic white and non-Hispanic black women. Data from 12 state-based birth defects tracking programs ...
Birth Defects Data and Statistics
... of birth defects in the United States. For data on specific birth defects, please visit the specific ...
[Big data in official statistics].
Zwick, Markus
2015-08-01
The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany. PMID:26077871
Statistical Interpretation of Key Comparison Reference Value and Degrees of Equivalence
Kacker, R. N.; Datla, R. U.; Parr, A. C.
2003-01-01
Key comparisons carried out by the Consultative Committees (CCs) of the International Committee of Weights and Measures (CIPM) or the Bureau International des Poids et Mesures (BIPM) are referred to as CIPM key comparisons. The outputs of a statistical analysis of the data from a CIPM key comparison are the key comparison reference value, the degrees of equivalence, and their associated uncertainties. The BIPM publications do not discuss statistical interpretation of these outputs. We discuss their interpretation under the following three statistical models: nonexistent laboratory-effects model, random laboratory-effects model, and systematic laboratory-effects model.
Statistical Interpretation of Natural and Technological Hazards in China
NASA Astrophysics Data System (ADS)
Borthwick, Alistair; Ni, Jinren
2010-05-01
China is prone to catastrophic natural hazards from floods, droughts, earthquakes, storms, cyclones, landslides, epidemics, extreme temperatures, forest fires, avalanches, and even tsunami. This paper will list statistics related to the six worst natural disasters in China over the past 100 or so years, ranked according to number of fatalities. The corresponding data for the six worst natural disasters in China over the past decade will also be considered. [The data are abstracted from the International Disaster Database, Centre for Research on the Epidemiology of Disasters (CRED), Université Catholique de Louvain, Brussels, Belgium, http://www.cred.be/ where a disaster is defined as occurring if one of the following criteria is fulfilled: 10 or more people reported killed; 100 or more people reported affected; a call for international assistance; or declaration of a state of emergency.] The statistics include the number of occurrences of each type of natural disaster, the number of deaths, the number of people affected, and the cost in billions of US dollars. Over the past hundred years, the largest disasters may be related to the overabundance or scarcity of water, and to earthquake damage. However, there has been a substantial relative reduction in fatalities due to water related disasters over the past decade, even though the overall numbers of people affected remain huge, as does the economic damage. This change is largely due to the efforts put in by China's water authorities to establish effective early warning systems, the construction of engineering countermeasures for flood protection, the implementation of water pricing and other measures for reducing excessive consumption during times of drought. It should be noted that the dreadful death toll due to the Sichuan Earthquake dominates recent data. Joint research has been undertaken between the Department of Environmental Engineering at Peking University and the Department of Engineering Science at Oxford
STATISTICAL SAMPLING AND DATA ANALYSIS
Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...
The interpretation of spectral data
NASA Technical Reports Server (NTRS)
Holter, M. R.
1972-01-01
The characteristics and extent of data which is obtainable by electromagnetic spectrum sensing and the application to earth resources survey are discussed. The wavelength and frequency ranges of operation for various remote sensors are tabulated. The spectral sensitivities of various sensing instruments are diagrammed. Examples of aerial photography to show the effects of lighting and seasonal variations on earth resources data are provided. Specific examples of multiband photography and multispectral imagery to crop analysis are included.
Structural interpretation of seismic data and inherent uncertainties
NASA Astrophysics Data System (ADS)
Bond, Clare
2013-04-01
Geoscience is perhaps unique in its reliance on incomplete datasets and building knowledge from their interpretation. This interpretation basis for the science is fundamental at all levels, from creation of a geological map to interpretation of remotely sensed data. To teach and understand better the uncertainties in dealing with incomplete data we need to understand the strategies individual practitioners deploy that make them effective interpreters. The nature of interpretation is such that the interpreter needs to use their cognitive ability in the analysis of the data to propose a sensible solution in their final output that is consistent not only with the original data but also with other knowledge and understanding. In a series of experiments Bond et al. (2007, 2008, 2011, 2012) investigated the strategies and pitfalls of expert and non-expert interpretation of seismic images. These studies focused on large numbers of participants to provide a statistically sound basis for analysis of the results. The outcome of these experiments showed that a wide variety of conceptual models were applied to single seismic datasets, highlighting not only spatial variations in fault placements, but also disagreement over whether the faults existed at all or had the same sense of movement. Further, statistical analysis suggests that the strategies an interpreter employs are more important than expert knowledge per se in developing successful interpretations. Experts are successful because of their application of these techniques. A new set of experiments focuses on a small number of experts to determine how they use their cognitive and reasoning skills in the interpretation of 2D seismic profiles. Live video and practitioner commentary were used to track the evolving interpretation and to gain insight on their decision processes. The outputs of the study allow us to create an educational resource of expert interpretation through online video footage and commentary with
Interpreting health statistics for policymaking: the story behind the headlines.
Walker, Neff; Bryce, Jennifer; Black, Robert E
2007-03-17
Politicians, policymakers, and public-health professionals make complex decisions on the basis of estimates of disease burden from different sources, many of which are "marketed" by skilled advocates. To help people who rely on such statistics make more informed decisions, we explain how health estimates are developed, and offer basic guidance on how to assess and interpret them. We describe the different levels of estimates used to quantify disease burden and its correlates; understanding how closely linked a type of statistic is to disease and death rates is crucial in designing health policies and programmes. We also suggest questions that people using such statistics should ask and offer tips to help separate advocacy from evidence-based positions. Global health agencies have a key role in communicating robust estimates of disease, as do policymakers at national and subnational levels where key public-health decisions are made. A common framework and standardised methods, building on the work of Child Health Epidemiology Reference Group (CHERG) and others, are urgently needed. PMID:17368157
Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1990-01-01
A prototype of an expert system was developed which applies qualitative or model-based reasoning to the task of post-test analysis and diagnosis of data resulting from a rocket engine firing. A combined component-based and process theory approach is adopted as the basis for system modeling. Such an approach provides a framework for explaining both normal and deviant system behavior in terms of individual component functionality. The diagnosis function is applied to digitized sensor time-histories generated during engine firings. The generic system is applicable to any liquid rocket engine but was adapted specifically in this work to the Space Shuttle Main Engine (SSME). The system is applied to idealized data resulting from turbomachinery malfunction in the SSME.
The broad topic of biomarker research has an often-overlooked component: the documentation and interpretation of the surrounding chemical environment and other meta-data, especially from visualization, analytical, and statistical perspectives (Pleil et al. 2014; Sobus et al. 2011...
Statistics by Example, Exploring Data.
ERIC Educational Resources Information Center
Mosteller, Frederick; And Others
Part of a series of four pamphlets providing real-life problems in probability and statistics for the secondary school level, this booklet shows how to organize data in tables and graphs in order to get and to exhibit messages. Elementary probability concepts are also introduced. Fourteen different problem situations arising from biology,…
STATISTICS AND DATA ANALYSIS WORKSHOP
On January 15 and 16, 2003, a workshop for Tribal water resources staff on Statistics and Data Analysis was held at the Indian Springs Lodge on the Forest County Potawatomi Reservation near Wabeno, WI. The workshop was co-sponsored by the EPA, Sokaogon Chippewa (Mole Lake) Comm...
Interpretation of data from uphole refraction surveys
NASA Astrophysics Data System (ADS)
Franklin, A. G.
1980-06-01
The conventional interpretation of the data from an uphole refraction survey is based on the similarity between a plot of contours drawn on uphole arrival times and a wave-front diagram, which shows successive positions of the wave front produced by a single shot location at the ground surface. However, the two are alike only when the ground consists solely of homogeneous strata, oriented either horizontally or vertically. In this report, the term 'Meissner diagram' is used for the plot of arrival times from the uphole refraction survey in order to maintain the distinction between it and a true wave-front diagram. Where departures from the case of homogeneous, horizontal strata exist, the interpretation of the Meissner diagram is not straightforward, although a partial interpretation in terms of a horizontally stratified system is usually possible. A systematic approach to the interpretation problem, making use of such a partial interpretation, is proposed.
Interpretation of gamma-ray burst source count statistics
NASA Technical Reports Server (NTRS)
Petrosian, Vahe
1993-01-01
Ever since the discovery of gamma-ray bursts, the so-called log N-log S relation has been used for determination of their distances and distribution. This task has not been straightforward because of varying thresholds for the detection of bursts. Most of the current analyses of these data are couched in terms of ambiguous distributions, such as the distribution of Cp/Clim, the ratio of peak to threshold photon count rates, or the distribution of V/Vmax = (Cp/Clim)^(-3/2). It is shown that these distributions are not always a true reflection of the log N-log S relation. Some kind of deconvolution is required for obtaining the true log N-log S. Therefore, care is required in the interpretation of results of such analyses. A new method of analysis of these data is described, whereby the bivariate distribution of Cp and Clim is obtained directly from the data.
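For concreteness, the V/Vmax quantity named in the abstract is the ratio Cp/Clim raised to the power -3/2. The short sketch below only evaluates that definition; the function name and the count rates are hypothetical, and it does not implement the bivariate method the abstract proposes.

```python
def v_over_vmax(c_peak, c_lim):
    """V/Vmax for one burst: (Cp / Clim) raised to the power -3/2.

    c_peak is the peak photon count rate and c_lim the detection threshold
    in force when the burst was observed; a detected burst has c_peak >= c_lim.
    """
    if c_lim <= 0 or c_peak < c_lim:
        raise ValueError("expected c_peak >= c_lim > 0 for a detected burst")
    return (c_peak / c_lim) ** -1.5


# Hypothetical (peak, threshold) count rates; note that each burst
# carries its own threshold, which is the source of the difficulty.
bursts = [(4.0, 1.0), (2.5, 2.0), (9.0, 1.5)]
values = [v_over_vmax(cp, cl) for cp, cl in bursts]
print([round(v, 3) for v in values])
```

Because the threshold differs from burst to burst, two bursts with the same peak rate can have different V/Vmax values, which is one reason the abstract cautions against reading such distributions as a direct image of log N-log S.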
Statistical issues in the design, analysis and interpretation of animal carcinogenicity studies.
Haseman, J K
1984-01-01
Statistical issues in the design, analysis and interpretation of animal carcinogenicity studies are discussed. In the area of experimental design, issues that must be considered include randomization of animals, sample size considerations, dose selection and allocation of animals to experimental groups, and control of potentially confounding factors. In the analysis of tumor incidence data, survival differences among groups should be taken into account. It is important to try to distinguish between tumors that contribute to the death of the animal and "incidental" tumors discovered at autopsy in an animal dying of an unrelated cause. Life table analyses (appropriate for lethal tumors) and incidental tumor tests (appropriate for nonfatal tumors) are described, and the utilization of these procedures by the National Toxicology Program is discussed. Despite the fact that past interpretations of carcinogenicity data have tended to focus on pairwise comparisons in general and high-dose effects in particular, the importance of trend tests should not be overlooked, since these procedures are more sensitive than pairwise comparisons to the detection of carcinogenic effects. No rigid statistical "decision rule" should be employed in the interpretation of carcinogenicity data. Although the statistical significance of an observed tumor increase is perhaps the single most important piece of evidence used in the evaluation process, a number of biological factors must also be taken into account. The use of historical control data, the false-positive issue and the interpretation of negative trends are also discussed. PMID:6525993
Interpreting the flock algorithm from a statistical perspective.
Anderson, Eric C; Barry, Patrick D
2015-09-01
We show that the algorithm in the program flock (Duchesne & Turgeon 2009) can be interpreted as an estimation procedure based on a model essentially identical to the structure (Pritchard et al. 2000) model with no admixture and without correlated allele frequency priors. Rather than using MCMC, the flock algorithm searches for the maximum a posteriori estimate of this structure model via a simulated annealing algorithm with a rapid cooling schedule (namely, the exponent on the objective function →∞). We demonstrate the similarities between the two programs in a two-step approach. First, to enable rapid batch processing of many simulated data sets, we modified the source code of structure to use the flock algorithm, producing the program flockture. With simulated data, we confirmed that results obtained with flock and flockture are very similar (though flockture is some 200 times faster). Second, we simulated multiple large data sets under varying levels of population differentiation for both microsatellite and SNP genotypes. We analysed them with flockture and structure and assessed each program on its ability to cluster individuals to their correct subpopulation. We show that flockture yields results similar to structure albeit with greater variability from run to run. flockture did perform better than structure when genotypes were composed of SNPs and differentiation was moderate (FST= 0.022-0.032). When differentiation was low, structure outperformed flockture for both marker types. On large data sets like those we simulated, it appears that flock's reliance on inference rules regarding its 'plateau record' is not helpful. Interpreting flock's algorithm as a special case of the model in structure should aid in understanding the program's output and behaviour. PMID:25913195
Analysis of Visual Interpretation of Satellite Data
NASA Astrophysics Data System (ADS)
Svatonova, H.
2016-06-01
Millions of people of all ages and expertise are using satellite and aerial data as an important input for their work in many different fields. Satellite data are also gradually finding a new place in education, especially in the fields of geography and in environmental issues. The article presents the results of extensive research in the area of visual interpretation of image data carried out in the years 2013-2015 in the Czech Republic. The research was aimed at comparing the success rate of the interpretation of satellite data in relation to (a) the substrates (the selected colourfulness, the type of depicted landscape or special elements in the landscape) and (b) selected characteristics of users (expertise, gender, age). The results of the research showed that (1) false colour images have a slightly higher percentage of successful interpretation than natural colour images, (2) colourfulness of an element expected or rehearsed by the user (regardless of the real natural colour) increases the success rate of identifying the element, (3) experts are faster in interpreting visual data than non-experts, with the same degree of accuracy of solving the task, and (4) men and women are equally successful in the interpretation of visual image data.
Statistically significant relational data mining
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.
2014-02-01
This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Spatial Statistical Data Fusion (SSDF)
NASA Technical Reports Server (NTRS)
Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel
2013-01-01
As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, technological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect are different. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field, and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy, spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values, over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions, and is also computationally much faster. This method is
Statistical analysis of pyroshock data
NASA Astrophysics Data System (ADS)
Hughes, William O.
2002-05-01
The sample size of aerospace pyroshock test data is typically small. This often forces the engineer to make assumptions on its population distribution and to use conservative margins or methodologies in determining shock specifications. For example, the maximum expected environment is often derived by adding 3-6 dB to the maximum envelope of a limited amount of shock data. The recent availability of a large amount of pyroshock test data has allowed a rare statistical analysis to be performed. Findings and procedures from this analysis will be explained, including information on population distributions, procedures to properly combine families of test data, and methods of deriving appropriate shock specifications for a multipoint shock source.
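The conventional small-sample practice mentioned above (enveloping the available shock data and adding a fixed dB margin) can be sketched as follows. This is an illustration of that conventional rule only, not of the statistical procedures reported in the paper. It assumes each measurement is a list of amplitude values (for example, shock response spectrum levels in g) on a common frequency grid, so a margin in dB scales amplitude by 10^(dB/20); the function name and numbers are hypothetical.

```python
def max_expected_environment(spectra, margin_db=3.0):
    """Envelope several spectra point by point, then add a dB margin.

    spectra: list of equal-length lists of amplitudes on a shared
             frequency grid (assumed amplitude quantities, hence dB/20).
    margin_db: margin added on top of the maximum envelope.
    """
    if not spectra or len({len(s) for s in spectra}) != 1:
        raise ValueError("spectra must be non-empty and of equal length")
    # Maximum over all measurements at each frequency point.
    envelope = [max(values) for values in zip(*spectra)]
    factor = 10.0 ** (margin_db / 20.0)
    return [value * factor for value in envelope]


# Two hypothetical measurements at three frequency points (illustrative only).
measured = [
    [100.0, 400.0, 900.0],
    [150.0, 350.0, 1000.0],
]
spec_6db = max_expected_environment(measured, margin_db=6.0)
print([round(v, 1) for v in spec_6db])
```

A 6 dB margin roughly doubles the enveloped amplitude, which shows how conservative this rule can be when only a few measurements are available.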
A plug-and-play approach to automated data interpretation: the data interpretation module (DIM)
Hartog, B.K.D.; Elling, J.W.; Mniszewski, S.M.
1995-12-31
The Contaminant Analysis Automation (CAA) Project's automated analysis laboratory provides a "plug-and-play" reusable infrastructure for many types of environmental assays. As a sample progresses through sample preparation to sample analysis and finally to data interpretation, increasing expertise and judgment are needed at each step. The Data Interpretation Module (DIM) echoes the automation's plug-and-play philosophy as a reusable engine and architecture for handling both the uncertainty and knowledge required for interpreting contaminant sample data. This presentation describes the implementation and performance of the DIM in interpreting polychlorinated biphenyl (PCB) gas chromatograms and shows the DIM architecture's reusability for other applications.
de Irala, J; Fernandez-Crehuet Navajas, R; Serrano del Castillo, A
1997-03-01
This study describes the behavior of eight statistical programs (BMDP, EGRET, JMP, SAS, SPSS, STATA, STATISTIX, and SYSTAT) when performing a logistic regression with a simulated data set that contains a numerical problem created by the presence of a cell value equal to zero. The programs respond in different ways to this problem. Most of them give a warning, although many simultaneously present incorrect results, among which are confidence intervals that tend toward infinity. Such results can mislead the user. Various guidelines are offered for detecting these problems in actual analyses, and users are reminded of the importance of critical interpretation of the results of statistical programs. PMID:9162592
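To see why a zero cell is a numerical problem, consider the simplest case of one binary predictor, where the fitted logistic coefficient equals the log of the sample odds ratio from the 2x2 table. The sketch below is a hedged illustration with invented counts: it computes the odds ratio and a Woolf-type 95% interval, and returns None when any cell is zero, because the estimate is then 0 or infinite and the standard error is undefined. It does not reproduce the behaviour of any of the eight packages compared in the study.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a, b: exposed cases and exposed non-cases
    c, d: unexposed cases and unexposed non-cases
    Returns (odds_ratio, lower, upper), or None when a cell is zero and
    no finite estimate with a finite interval exists.
    """
    if 0 in (a, b, c, d):
        return None
    odds_ratio = (a * d) / (b * c)
    # Woolf standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical tables (illustrative only).
print(odds_ratio_with_ci(10, 5, 5, 10))  # finite estimate and interval
print(odds_ratio_with_ci(10, 0, 5, 5))   # zero cell: None
```

Checking the cross-tabulation for empty cells before fitting is one simple way to catch the situation in which software may otherwise report very large coefficients and intervals that tend toward infinity.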
Data Interpretation in the Digital Age
Leonelli, Sabina
2014-01-01
The consultation of internet databases and the related use of computer software to retrieve, visualise and model data have become key components of many areas of scientific research. This paper focuses on the relation of these developments to understanding the biology of organisms, and examines the conditions under which the evidential value of data posted online is assessed and interpreted by the researchers who access them, in ways that underpin and guide the use of those data to foster discovery. I consider the types of knowledge required to interpret data as evidence for claims about organisms, and in particular the relevance of knowledge acquired through physical interaction with actual organisms to assessing the evidential value of data found online. I conclude that familiarity with research in vivo is crucial to assessing the quality and significance of data visualised in silico; and that studying how biological data are disseminated, visualised, assessed and interpreted in the digital age provides a strong rationale for viewing scientific understanding as a social and distributed, rather than individual and localised, achievement. PMID:25729262
Data compression preserving statistical independence
NASA Technical Reports Server (NTRS)
Morduch, G. E.; Rice, W. M.
1973-01-01
The purpose of this study was to determine the optimum points of evaluation of data compressed by means of polynomial smoothing. It is shown that a set Y of m statistically independent observations Y(t_1), Y(t_2), ..., Y(t_m) of a quantity X(t), which can be described by an (n-1)th degree polynomial in time, may be represented by a set Z of n statistically independent compressed observations Z(tau_1), Z(tau_2), ..., Z(tau_n), such that the compressed set Z has the same information content as the observed set Y. The times tau_1, tau_2, ..., tau_n are the zeros of an nth degree polynomial P_n, to whose definition and properties the bulk of this report is devoted. The polynomials P_n are defined as functions of the observation times t_1, t_2, ..., t_n, and it is interesting to note that if the observation times are continuously distributed the polynomials P_n degenerate to Legendre polynomials. The proposed data compression scheme is a little more complex than those usually employed, but has the advantage of preserving all the information content of the original observations.
Interpreting genomic data via entropic dissection
Azad, Rajeev K.; Li, Jing
2013-01-01
Since the emergence of high-throughput genome sequencing platforms and more recently the next-generation platforms, the genome databases are growing at an astronomical rate. Tremendous efforts have been invested in recent years in understanding intriguing complexities beneath the vast ocean of genomic data. This is apparent in the spurt of computational methods for interpreting these data in the past few years. Genomic data interpretation is notoriously difficult, partly owing to the inherent heterogeneities appearing at different scales. Methods developed to interpret these data often suffer from their inability to adequately measure the underlying heterogeneities and thus lead to confounding results. Here, we present an information entropy-based approach that unravels the distinctive patterns underlying genomic data efficiently and thus is applicable in addressing a variety of biological problems. We show the robustness and consistency of the proposed methodology in addressing three different biological problems of significance—identification of alien DNAs in bacterial genomes, detection of structural variants in cancer cell lines and alignment-free genome comparison. PMID:23036836
Regional interpretation of Kansas aeromagnetic data
Yarger, H.L.
1982-01-01
The aeromagnetic mapping techniques used in a regional aeromagnetic survey of the state are documented and a qualitative regional interpretation of the magnetic basement is presented. Geothermal gradients measured and data from oil well records indicate that geothermal resources in Kansas are of a low-grade nature. However, considerable variation in the gradient is noted statewide within the upper 500 meters of the sedimentary section; this suggests the feasibility of using groundwater for space heating by means of heat pumps.
Using Statistics to Lie, Distort, and Abuse Data
ERIC Educational Resources Information Center
Bintz, William; Moore, Sara; Adams, Cheryll; Pierce, Rebecca
2009-01-01
Statistics is a branch of mathematics that involves organization, presentation, and interpretation of data, both quantitative and qualitative. Data do not lie, but people do. On the surface, quantitative data are basically inanimate objects, nothing more than lifeless and meaningless symbols that appear on a page, calculator, computer, or in one's…
The Lure of Statistics in Data Mining
ERIC Educational Resources Information Center
Grover, Lovleen Kumar; Mehra, Rajni
2008-01-01
The field of Data Mining like Statistics concerns itself with "learning from data" or "turning data into information". For statisticians the term "Data mining" has a pejorative meaning. Instead of finding useful patterns in large volumes of data as in the case of Statistics, data mining has the connotation of searching for data to fit preconceived…
Statistical Interpretation of the Local Field Inside Dielectrics.
ERIC Educational Resources Information Center
Berrera, Ruben G.; Mello, P. A.
1982-01-01
Compares several derivations of the Clausius-Mossotti relation to analyze consistently the nature of approximations used and their range of applicability. Also presents a statistical-mechanical calculation of the local field for classical system of harmonic oscillators interacting via the Coulomb potential. (Author/SK)
Confounded Statistical Analyses Hinder Interpretation of the NELP Report
ERIC Educational Resources Information Center
Paris, Scott G.; Luo, Serena Wenshu
2010-01-01
The National Early Literacy Panel (2008) report identified early predictors of reading achievement as good targets for instruction, and many of those skills are related to decoding. In this article, the authors suggest that the developmental trajectories of rapidly developing skills pose problems for traditional statistical analyses. Rapidly…
Statistical characteristics of MST radar echoes and its interpretation
NASA Technical Reports Server (NTRS)
Woodman, Ronald F.
1989-01-01
Two concepts of fundamental importance are reviewed: the autocorrelation function and the frequency power spectrum. In addition, some turbulence concepts, the relationship between radar signals and atmospheric medium statistics, partial reflection, and the characteristics of noise and clutter interference are discussed.
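The two concepts reviewed above are linked by the Wiener-Khinchin relation: the power spectrum is the Fourier transform of the autocorrelation function. The sketch below checks this numerically for a short real-valued sequence, using a circular (periodic) autocorrelation and a plain O(N^2) discrete Fourier transform. The function names and normalization are illustrative assumptions and do not come from the reviewed paper.

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_autocorrelation(x):
    """Circular autocorrelation R[lag] = (1/N) * sum_t x[t] * x[(t + lag) mod N]."""
    n = len(x)
    return [sum(x[t] * x[(t + lag) % n] for t in range(n)) / n for lag in range(n)]


def power_spectrum(x):
    """Periodogram |X[k]|^2 / N of a real-valued sequence."""
    n = len(x)
    return [abs(c) ** 2 / n for c in dft(x)]


if __name__ == "__main__":
    signal = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]
    from_acf = [c.real for c in dft(circular_autocorrelation(signal))]
    direct = power_spectrum(signal)
    for a, b in zip(from_acf, direct):
        print(round(a, 6), round(b, 6))
```

With this normalization the two columns printed agree to rounding error; for real radar data a fast Fourier transform and windowing would normally replace the plain transform used here.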
Interpretation of Quantitative Shotgun Proteomic Data.
Aasebø, Elise; Berven, Frode S; Selheim, Frode; Barsnes, Harald; Vaudel, Marc
2016-01-01
In quantitative proteomics, large lists of identified and quantified proteins are used to answer biological questions in a systemic approach. However, working with such extensive datasets can be challenging, especially when complex experimental designs are involved. Here, we demonstrate how to post-process large quantitative datasets, detect proteins of interest, and annotate the data with biological knowledge. The protocol presented can be achieved without advanced computational knowledge thanks to the user-friendly Perseus interface (available from the MaxQuant website, www.maxquant.org ). Various visualization techniques facilitating the interpretation of quantitative results in complex biological systems are also highlighted. PMID:26700055
Interpreting magnetic data by integral moments
NASA Astrophysics Data System (ADS)
Tontini, F. Caratori; Pedersen, L. B.
2008-09-01
The use of the integral moments for interpreting magnetic data is based on a very elegant property of potential fields, but in the past it has not been completely exploited due to problems concerning real data. We describe a new 3-D development of previous 2-D results aimed at determining the magnetization direction, extending the calculation to second-order moments to recover the centre of mass of the magnetization distribution. The method is enhanced to reduce the effects of the regional field that often alters the first-order solutions. Moreover, we introduce an iterative correction to properly assess the errors coming from finite-size surveys or interaction with neighbouring anomalies, which are the most important causes of the failing of the method for real data. We test the method on some synthetic examples, and finally, we show the results obtained by analysing the aeromagnetic anomaly of the Monte Vulture volcano in Southern Italy.
Need for Caution in Interpreting Extreme Weather Statistics
NASA Astrophysics Data System (ADS)
Sardeshmukh, P. D.; Compo, G. P.; Penland, M. C.
2011-12-01
Given the substantial anthropogenic contribution to 20th century global warming, it is tempting to seek an anthropogenic component in any unusual recent weather event, or more generally in any observed change in the statistics of extreme weather. This study cautions that such detection and attribution efforts may, however, very likely lead to wrong conclusions if the non-Gaussian aspects of the probability distributions of observed daily atmospheric variations, especially their skewness and heavy tails, are not explicitly taken into account. Departures of three or more standard deviations from the mean, although rare, are far more common in such a non-Gaussian world than they are in a Gaussian world. This exacerbates the already difficult problem of establishing the significance of changes in extreme value probabilities from historical climate records of limited length, using either raw histograms or Generalized Extreme Value (GEV) distributions fitted to the sample extreme values. A possible solution is suggested by the fact that the non-Gaussian aspects of the observed distributions are well captured by a general class of "Stochastically Generated Skewed distributions" (SGS distributions) recently introduced in the meteorological literature by Sardeshmukh and Sura (J. Climate 2009). These distributions arise from simple modifications to a red noise process and reduce to Gaussian distributions under appropriate limits. As such, they represent perhaps the simplest physically based non-Gaussian prototypes of the distributions of daily atmospheric variations. Fitting such SGS distributions to all (not just the extreme) values in 25, 50, or 100-yr daily records also yields corresponding extreme value distributions that are much less prone to sampling uncertainty than GEV distributions. For both of the above reasons, SGS distributions provide an attractive alternative for assessing the significance of changes in extreme weather statistics (including changes in the
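To make the point about three-standard-deviation departures concrete, the following sketch compares the two-sided tail probability of a Gaussian with that of a heavy-tailed distribution of the same variance. The Laplace distribution is used here purely as a convenient stand-in with a closed-form tail; it is not the SGS distribution discussed in the abstract, and the function names are illustrative.

```python
import math


def gaussian_two_sided_tail(k):
    """P(|Z| > k) for a standard Gaussian Z."""
    return math.erfc(k / math.sqrt(2.0))


def laplace_two_sided_tail(k):
    """P(|X| > k) for a zero-mean Laplace X with unit variance.

    A Laplace distribution with scale b has variance 2 * b**2, so unit
    variance means b = 1 / sqrt(2), and P(|X| > k) = exp(-k / b).
    """
    return math.exp(-k * math.sqrt(2.0))


if __name__ == "__main__":
    k = 3.0
    g = gaussian_two_sided_tail(k)
    h = laplace_two_sided_tail(k)
    print(f"Gaussian: {g:.5f}  Laplace: {h:.5f}  ratio: {h / g:.1f}")
```

Even for this simple stand-in, three-sigma events are several times more likely than under the Gaussian assumption, which is the qualitative effect the abstract warns about.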
Aerosol backscatter lidar calibration and data interpretation
NASA Technical Reports Server (NTRS)
Kavaya, M. J.; Menzies, R. T.
1984-01-01
A treatment of the various factors involved in lidar data acquisition and analysis is presented. This treatment highlights sources of fundamental, systematic, modeling, and calibration errors that may affect the accurate interpretation and calibration of lidar aerosol backscatter data. The discussion primarily pertains to ground based, pulsed CO2 lidars that probe the troposphere and are calibrated using large, hard calibration targets. However, a large part of the analysis is relevant to other types of lidar systems such as lidars operating at other wavelengths; continuous wave (CW) lidars; lidars operating in other regions of the atmosphere; lidars measuring nonaerosol elastic or inelastic backscatter; airborne or Earth-orbiting lidar platforms; and lidars employing combinations of the above characteristics.
Data interpretation in the Automated Laboratory
Klatt, L.N.; Elling, J.W.; Mniszewski, S.
1995-12-01
The Contaminant Analysis Automation project envisions the analytical chemistry laboratory of the future being assembled from automation submodules that can be integrated into a complete analysis system through a plug-and-play strategy. In this automated system the reduction of instrumental data to knowledge required by the laboratory customer must also be accomplished in an automated way. This paper presents the concept of an automated Data Interpretation Module (DIM) within the context of the plug-and-play automation strategy. The DIM is an expert-system-driven software module. The DIM functions as a standard laboratory module controlled by the system task sequence controller. The DIM consists of knowledge base(s) that accomplish the data assessment, quality control, and data analysis tasks. The expert system knowledge base(s) encapsulate the training and experience of the analytical chemist. Analysis of instrumental data by the DIM requires the use of pattern recognition techniques. Laboratory data from the analysis of PCBs will be used to illustrate the DIM.
DATA ON YOUTH, 1967, A STATISTICAL DOCUMENT.
ERIC Educational Resources Information Center
SCHEIDER, GEORGE
THE DATA IN THIS REPORT ARE STATISTICS ON YOUTH THROUGHOUT THE UNITED STATES AND IN NEW YORK STATE. INCLUDED ARE DATA ON POPULATION, SCHOOL STATISTICS, EMPLOYMENT, FAMILY INCOME, JUVENILE DELINQUENCY AND YOUTH CRIME (INCLUDING NEW YORK CITY FIGURES), AND TRAFFIC ACCIDENTS. THE STATISTICS ARE PRESENTED IN THE TEXT AND IN TABLES AND CHARTS. (NH)
Tools for interpretation of multispectral data
NASA Astrophysics Data System (ADS)
Speckert, Glen; Carpenter, Loren C.; Russell, Mike; Bradstreet, John; Waite, Tom; Conklin, Charlie
1990-08-01
The large size and multiple bands of today's satellite data require increasingly powerful tools in order to display and interpret the acquired imagery in a timely fashion. Pixar has developed two major tools for use in this data interpretation. These tools are the Electronic Light Table (ELT), and an extensive image processing package, ChapIP. These tools operate on images limited only by disk volume size, currently 3 Gbytes. The Electronic Light Table package provides a fully windowed interface to these large 12 bit monochrome and multiband images, passing images through a software defined image interpretation pipeline in real time during an interactive roam. A virtual image software framework allows interactive modification of the visible image. The roam software pipeline consists of a seventh order polynomial warp, bicubic resampling, a user registration affine, histogram drop sampling, a 5x5 unsharp mask, and per window contrast controls. It is important to note that these functions are done in software, and various performance tradeoffs can be made for different applications within a family of hardware configurations. Special high speed zoom, rotate, sharpness, and contrast operators provide interactive region of interest manipulation. Double window operators provide for flicker, fade, shade, and difference of two parent windows in a chained fashion. Overlay graphics capability is provided in a PostScript* windowed environment (NeWS**). The image is stored on disk as a multi-resolution image pyramid. This allows resampling and other image operations independent of the zoom level. A set of tools layered upon ChapIP allows manipulation of the entire pyramid file. Arbitrary combinations of bands can be computed for arbitrary sized images, as well as other image processing operations. ChapIP can also be used in conjunction with ELT to dynamically operate on the current roaming window to append the image processing function onto the roam pipeline. Multiple Chapi
Laterally constrained inversion for CSAMT data interpretation
NASA Astrophysics Data System (ADS)
Wang, Ruo; Yin, Changchun; Wang, Miaoyue; Di, Qingyun
2015-10-01
Laterally constrained inversion (LCI) has been successfully applied to the inversion of dc resistivity, TEM and airborne EM data. However, it has not yet been applied to the interpretation of controlled-source audio-frequency magnetotelluric (CSAMT) data. In this paper, we apply the LCI method to CSAMT data inversion by preconditioning the Jacobian matrix. We apply a weighting matrix to the Jacobian to balance the sensitivity of model parameters, so that the resolution with respect to different model parameters becomes more uniform. Numerical experiments confirm that this can improve the convergence of the inversion. We first invert a synthetic dataset with and without noise to investigate the effect of LCI applications to CSAMT data. For the noise-free data, the results show that the LCI method recovers the true model better than the traditional single-station inversion; for the noisy data, the true model is recovered even with a noise level of 8%, indicating that LCI inversions are to some extent noise insensitive. Then, we re-invert two CSAMT datasets collected in a watershed and a coal mine area in Northern China, respectively, and compare our results with those from previous inversions. The comparison with the previous inversion in the coal mine shows that the LCI method delivers smoother layer interfaces that correlate well with seismic data, while the comparison with a global search algorithm, simulated annealing (SA), in the watershed shows that although both methods deliver very similar good results, the LCI algorithm presented in this paper runs much faster. The inversion results for the coal mine CSAMT survey show that a conductive water-bearing zone that was not revealed by the previous inversions has been identified by the LCI. This further demonstrates that the method presented in this paper works for CSAMT data inversion.
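The abstract does not specify the weighting matrix, so the following is only a sketch of one common way to balance parameter sensitivities: scaling each column of the Jacobian to unit Euclidean norm with a diagonal weight. The function name `column_scale` and the choice of inverse column norms are assumptions made for illustration, not the authors' scheme.

```python
import math


def column_scale(jacobian):
    """Scale each Jacobian column to unit Euclidean norm.

    jacobian is a list of rows (data points) by columns (model parameters).
    Returns the scaled matrix and the diagonal weights used, so that a
    model update computed in scaled coordinates can be mapped back by
    multiplying each component by its weight.
    """
    n_rows, n_cols = len(jacobian), len(jacobian[0])
    norms = [math.sqrt(sum(jacobian[i][j] ** 2 for i in range(n_rows)))
             for j in range(n_cols)]
    # Leave all-zero columns untouched rather than dividing by zero.
    weights = [1.0 / nrm if nrm > 0.0 else 1.0 for nrm in norms]
    scaled = [[jacobian[i][j] * weights[j] for j in range(n_cols)]
              for i in range(n_rows)]
    return scaled, weights


if __name__ == "__main__":
    # Hypothetical Jacobian: the second parameter is far less sensitive.
    J = [[3.0, 0.001], [4.0, 0.002]]
    scaled, w = column_scale(J)
    print(scaled, w)
```

After scaling, every parameter contributes a column of equal norm, which is one way to keep weakly sensitive parameters from being ignored by a gradient-based update.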
A data-management system for detailed areal interpretive data
Ferrigno, C.F.
1986-01-01
A data storage and retrieval system has been developed to organize and preserve areal interpretive data. This system can be used by any study where there is a need to store areal interpretive data that generally is presented in map form. This system provides the capability to grid areal interpretive data for input to groundwater flow models at any spacing and orientation. The data storage and retrieval system is designed to be used for studies that cover small areas such as counties. The system is built around a hierarchically structured data base consisting of related latitude-longitude blocks. The information in the data base can be stored at different levels of detail, with the finest detail being a block of 6 sec of latitude by 6 sec of longitude (approximately 0.01 sq mi). This system was implemented on a mainframe computer using a hierarchical data base management system. The computer programs are written in Fortran IV and PL/1. The design and capabilities of the data storage and retrieval system, and the computer programs that are used to implement the system are described. Supplemental sections contain the data dictionary, user documentation of the data-system software, changes that would need to be made to use this system for other studies, and information on the computer software tape. (Lantz-PTT)
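The block structure described above can be sketched as a simple index calculation. The code below assigns a latitude-longitude pair to a block of 6 sec by 6 sec and estimates the block area as a rough check on the quoted figure of about 0.01 sq mi. The function names, the integer indexing scheme, and the value of 69 miles per degree of latitude are illustrative assumptions; the original system was written in Fortran IV and PL/1 and its actual key layout is not given in the abstract.

```python
import math

BLOCK_ARCSEC = 6  # finest block size: 6 sec of latitude by 6 sec of longitude


def block_index(lat_deg, lon_deg):
    """Return integer (row, column) indices of the 6-arcsecond block containing a point."""
    blocks_per_degree = 3600 // BLOCK_ARCSEC  # 600 blocks per degree
    return (math.floor(lat_deg * blocks_per_degree),
            math.floor(lon_deg * blocks_per_degree))


def block_area_sq_mi(lat_deg, miles_per_degree=69.0):
    """Approximate area of one block in square miles at a given latitude."""
    side_ns = miles_per_degree * BLOCK_ARCSEC / 3600.0
    side_ew = side_ns * math.cos(math.radians(lat_deg))
    return side_ns * side_ew


if __name__ == "__main__":
    print(block_index(38.5, -98.25))
    print(round(block_area_sq_mi(38.0), 4))
```

At mid-latitudes of the conterminous United States this estimate comes out close to 0.01 sq mi, consistent with the figure quoted in the abstract.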
Polarimetric radar data decomposition and interpretation
NASA Technical Reports Server (NTRS)
Sun, Guoqing; Ranson, K. Jon
1993-01-01
Significant efforts have been made to decompose polarimetric radar data into several simple scattering components. The components which are selected because of their physical significance can be used to classify SAR (Synthetic Aperture Radar) image data. If particular components can be related to forest parameters, inversion procedures may be developed to estimate these parameters from the scattering components. Several methods have been used to decompose an averaged Stokes matrix or covariance matrix into three components representing odd (surface), even (double-bounce) and diffuse (volume) scatterings. With these decomposition techniques, phenomena such as canopy-ground interactions, randomness of orientation, and size of scatterers can be examined from SAR data. In this study we applied the method recently reported by van Zyl (1992) to decompose averaged backscattering covariance matrices extracted from JPL SAR images over forest stands in Maine, USA. These stands are mostly mixed stands of coniferous and deciduous trees. Biomass data have been derived from field measurements of DBH and tree density using allometric equations. The interpretation of the decompositions and relationships with measured stand biomass are presented in this paper.
Data Torturing and the Misuse of Statistical Tools
Abate, Marcey L.
1999-08-16
Statistical concepts, methods, and tools are often used in the implementation of statistical thinking. Unfortunately, statistical tools are all too often misused by not applying them in the context of statistical thinking that focuses on processes, variation, and data. The consequences of this misuse may be "data torturing" or going beyond reasonable interpretation of the facts due to a misunderstanding of the processes creating the data or the misinterpretation of variability in the data. In the hope of averting future misuse and data torturing, examples are provided where the application of common statistical tools, in the absence of statistical thinking, provides deceptive results by not adequately representing the underlying process and variability. For each of the examples, a discussion is provided on how applying the concepts of statistical thinking may have prevented the data torturing. The lessons learned from these examples will provide an increased awareness of the potential for many statistical methods to mislead and a better understanding of how statistical thinking broadens and increases the effectiveness of statistical tools.
Michigan Library Statistical Report, 1999 Edition. Reporting 1998 Statistical Data.
ERIC Educational Resources Information Center
Krefman, Naomi, Comp.; Dwyer, Molly, Comp.; Krueger, Beth, Comp.
This statistical report on Michigan's libraries presents data provided by academic libraries, public libraries, public library cooperatives, and those public libraries that serve as regional or subregional outlets to provide services for blind and physically handicapped patrons. For academic libraries, data are compiled from the 1998 academic…
Data Integration for Interpretation of Near-Surface Geophysical Tomograms
NASA Astrophysics Data System (ADS)
Day-Lewis, F. D.; Singha, K.
2007-12-01
Traditionally, interpretation of geophysical tomograms for geologic structure or engineering properties has been either qualitative, or based on petrophysical or statistical mapping to convert tomograms of the geophysical parameter (e.g., seismic velocity, radar velocity, or electrical conductivity) to some hydraulic parameter or engineering property of interest (e.g., hydraulic conductivity, porosity, or shear strength). Standard approaches to petrophysical and statistical mapping do not account for variable geophysical resolution, and thus it is difficult to obtain reliable, quantitative estimates of hydrologic properties or to characterize hydrologic processes in situ. Recent research to understand the limitations of tomograms for quantitative estimation points to the need for data integration. We divide near-surface geophysical data integration into two categories: 'inversion-based' and 'post- inversion' approaches. The first category includes 'informed-inversion' strategies that integrate complementary information in the form of prior information; constraints; physically-based regularization or parameterization; or coupled inversion. Post-inversion approaches include probabilistic frameworks to map tomograms to models of engineering properties, while accounting for geophysical resolution, survey design, heterogeneity, and physical models for hydrologic processes. Here, we review recent research demonstrating the need for, and advantages of, data integration. We present examples of both inversion-based and post-inversion data integration to reduce uncertainty, improve interpretation of near-surface geophysical results, and produce more reliable predictive models.
Impact of Equity Models and Statistical Measures on Interpretations of Educational Reform
ERIC Educational Resources Information Center
Rodriguez, Idaykis; Brewe, Eric; Sawtelle, Vashti; Kramer, Laird H.
2012-01-01
We present three models of equity and show how these, along with the statistical measures used to evaluate results, impact interpretation of equity in education reform. Equity can be defined and interpreted in many ways. Most equity education reform research strives to achieve equity by closing achievement gaps between groups. An example is given…
Recent statistical methods for orientation data
NASA Technical Reports Server (NTRS)
Batschelet, E.
1972-01-01
The application of statistical methods for determining the areas of animal orientation and navigation are discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations and tables of data are developed to show the value of information obtained by statistical analysis.
Distributed data collection for a database of radiological image interpretations
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.
1997-01-01
The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window based 2048 X 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.
Shafieloo, Arman
2012-05-01
By introducing Crossing functions and hyper-parameters I show that the Bayesian interpretation of the Crossing Statistics [1] can be used trivially for the purpose of model selection among cosmological models. In this approach, to falsify a cosmological model there is no need to compare it with other models or assume any particular form of parametrization for the cosmological quantities like luminosity distance, Hubble parameter or equation of state of dark energy. Instead, hyper-parameters of Crossing functions perform as discriminators between correct and wrong models. Using this approach one can falsify any assumed cosmological model without putting priors on the underlying actual model of the universe and its parameters, hence the issue of dark energy parametrization is resolved. It will also be shown that the sensitivity of the method to the intrinsic dispersion of the data is small, which is another important characteristic of the method in testing cosmological models dealing with data with high uncertainties.
Statistics for characterizing data on the periphery
Theiler, James P; Hush, Donald R
2010-01-01
We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
Barber, Chris; Cayley, Alex; Hanser, Thierry; Harding, Alex; Heghes, Crina; Vessey, Jonathan D; Werner, Stephane; Weiner, Sandy K; Wichard, Joerg; Giddings, Amanda; Glowienke, Susanne; Parenty, Alexis; Brigo, Alessandro; Spirkl, Hans-Peter; Amberg, Alexander; Kemper, Ray; Greene, Nigel
2016-04-01
The relative wealth of bacterial mutagenicity data available in the public literature means that in silico quantitative/qualitative structure activity relationship (QSAR) systems can readily be built for this endpoint. A good means of evaluating the performance of such systems is to use private unpublished data sets, which generally represent a more distinct chemical space than publicly available test sets and, as a result, provide a greater challenge to the model. However, raw performance metrics should not be the only factor considered when judging this type of software since expert interpretation of the results obtained may allow for further improvements in predictivity. Enough information should be provided by a QSAR to allow the user to make general, scientifically-based arguments in order to assess and overrule predictions when necessary. With all this in mind, we sought to validate the performance of the statistics-based in vitro bacterial mutagenicity prediction system Sarah Nexus (version 1.1) against private test data sets supplied by nine different pharmaceutical companies. The results of these evaluations were then analysed in order to identify findings presented by the model which would be useful for the user to take into consideration when interpreting the results and making their final decision about the mutagenic potential of a given compound. PMID:26708083
NATIONAL VITAL STATISTICS SYSTEM - MORTALITY DATA
In the United States, State laws require death certificates to be completed for all deaths, and Federal law mandates national collection and publication of deaths and other vital statistics data. The National Vital Statistics System, the Federal compilation of this data, is the r...
Data explorer: a prototype expert system for statistical analysis.
Aliferis, C.; Chao, E.; Cooper, G. F.
1993-01-01
The inadequate analysis of medical research data, due mainly to the unavailability of local statistical expertise, seriously jeopardizes the quality of new medical knowledge. Data Explorer is a prototype Expert System that builds on the versatility and power of existing statistical software, to provide automatic analyses and interpretation of medical data. The system draws much of its power by using belief network methods in place of more traditional, but difficult to automate, classical multivariate statistical techniques. Data Explorer identifies statistically significant relationships among variables, and using power-size analysis, belief network inference/learning and various explanatory techniques helps the user understand the importance of the findings. Finally the system can be used as a tool for the automatic development of predictive/diagnostic models from patient databases. PMID:8130501
Data Mining: Going beyond Traditional Statistics
ERIC Educational Resources Information Center
Zhao, Chun-Mei; Luan, Jing
2006-01-01
The authors provide an overview of data mining, giving special attention to the relationship between data mining and statistics to unravel some misunderstandings about the two techniques. (Contains 1 figure.)
Statistical analysis of scintillation data
Chua, S.; Noonan, J.P.; Basu, S.
1981-09-01
The Nakagami-m distribution has traditionally been used successfully to model the probability characteristics of ionospheric scintillations at UHF. This report investigates the distribution properties of scintillation data in the L-band range. Specifically, the appropriateness of the Nakagami-m and lognormal distributions is tested. Briefly the results confirm that the Nakagami-m is appropriate for UHF but not for L-band scintillations. The lognormal provides a better fit to the distribution of L-band scintillations and is an adequate model allowing for an error of + or - 0.1 or smaller in predicted probability with a sample size of 256.
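For readers unfamiliar with the Nakagami-m model, the sketch below shows the standard moment-based estimate of its shape parameter from signal intensity samples: m equals the squared mean intensity divided by the intensity variance, which is the reciprocal of the squared S4 scintillation index. The function names and the sample values are hypothetical, and this is not the fitting or testing procedure used in the report.

```python
import math
import statistics


def s4_index(intensity):
    """S4 scintillation index: standard deviation of intensity over mean intensity."""
    mean_i = statistics.fmean(intensity)
    var_i = statistics.pvariance(intensity)
    return math.sqrt(var_i) / mean_i


def nakagami_m(intensity):
    """Moment estimate of the Nakagami shape parameter, m = mean^2 / variance = 1 / S4^2."""
    return 1.0 / s4_index(intensity) ** 2


if __name__ == "__main__":
    # Hypothetical intensity samples (amplitude squared), arbitrary units.
    samples = [0.8, 1.1, 1.3, 0.7, 1.0, 1.2, 0.9, 1.0]
    print(s4_index(samples), nakagami_m(samples))
```

A goodness-of-fit comparison between the Nakagami-m and lognormal models, as carried out in the report, would require comparing the fitted and empirical cumulative distributions, which is beyond this sketch.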
NASA Astrophysics Data System (ADS)
Karuppiah, R.; Faldi, A.; Laurenzi, I.; Usadi, A.; Venkatesh, A.
2014-12-01
An increasing number of studies are focused on assessing the environmental footprint of different products and processes, especially using life cycle assessment (LCA). This work shows how combining statistical methods and Geographic Information Systems (GIS) with environmental analyses can help improve the quality of results and their interpretation. Most environmental assessments in literature yield single numbers that characterize the environmental impact of a process/product - typically global or country averages, often unchanging in time. In this work, we show how statistical analysis and GIS can help address these limitations. For example, we demonstrate a method to separately quantify uncertainty and variability in the result of LCA models using a power generation case study. This is important for rigorous comparisons between the impacts of different processes. Another challenge is lack of data that can affect the rigor of LCAs. We have developed an approach to estimate environmental impacts of incompletely characterized processes using predictive statistical models. This method is applied to estimate unreported coal power plant emissions in several world regions. There is also a general lack of spatio-temporal characterization of the results in environmental analyses. For instance, studies that focus on water usage do not put in context where and when water is withdrawn. Through the use of hydrological modeling combined with GIS, we quantify water stress on a regional and seasonal basis to understand water supply and demand risks for multiple users. Another example where it is important to consider regional dependency of impacts is when characterizing how agricultural land occupation affects biodiversity in a region. We developed a data-driven methodology used in conjunction with GIS to determine if there is a statistically significant difference between the impacts of growing different crops on different species in various biomes of the world.
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures
Udey, Ruth Norma
2013-01-01
Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
NASA Astrophysics Data System (ADS)
Kuić, Domagoj
2016-05-01
In this paper an alternative approach to statistical mechanics based on the maximum information entropy principle (MaxEnt) is examined, specifically its close relation with the Gibbs method of ensembles. It is shown that the MaxEnt formalism is the logical extension of the Gibbs formalism of equilibrium statistical mechanics that is entirely independent of the frequentist interpretation of probabilities only as factual (i.e. experimentally verifiable) properties of the real world. Furthermore, we show that, consistently with the law of large numbers, the relative frequencies of the ensemble of systems prepared under identical conditions (i.e. identical constraints) actually correspond to the MaxEnt probabilities in the limit of a large number of systems in the ensemble. This result implies that the probabilities in statistical mechanics can be interpreted, independently of the frequency interpretation, on the basis of the maximum information entropy principle.
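The connection between MaxEnt and the Gibbs ensembles can be seen in the textbook case of a single mean-energy constraint. The derivation below is a standard sketch, not taken from the paper; the symbols p_i (state probabilities), E_i (state energies), U (prescribed mean energy), and the multipliers alpha and beta are notation assumed here for illustration.

```latex
% Maximize the information entropy subject to normalization and a mean-energy constraint.
\begin{align}
  S[p] &= -\sum_i p_i \ln p_i ,
  \qquad \sum_i p_i = 1 ,
  \qquad \sum_i p_i E_i = U . \\
% Introduce Lagrange multipliers \alpha and \beta for the two constraints.
  \mathcal{L} &= -\sum_i p_i \ln p_i
      - \alpha \Bigl( \sum_i p_i - 1 \Bigr)
      - \beta \Bigl( \sum_i p_i E_i - U \Bigr) . \\
% Stationarity with respect to each p_i.
  \frac{\partial \mathcal{L}}{\partial p_i} &= -\ln p_i - 1 - \alpha - \beta E_i = 0
  \quad\Longrightarrow\quad
  p_i = e^{-1-\alpha}\, e^{-\beta E_i} . \\
% Normalization fixes \alpha and yields the canonical (Gibbs) distribution.
  p_i &= \frac{e^{-\beta E_i}}{Z(\beta)} ,
  \qquad Z(\beta) = \sum_j e^{-\beta E_j} .
\end{align}
```

The multiplier beta is then fixed by requiring that the mean energy computed from this distribution equals U, which recovers the canonical ensemble of equilibrium statistical mechanics.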
Transit Spectroscopy: new data analysis techniques and interpretation
NASA Astrophysics Data System (ADS)
Tinetti, Giovanna; Waldmann, Ingo P.; Morello, Giuseppe; Tessenyi, Marcell; Varley, Ryan; Barton, Emma; Yurchenko, Sergey; Tennyson, Jonathan; Hollis, Morgan
2014-11-01
Planetary science beyond the boundaries of our Solar System is today in its infancy. Until a couple of decades ago, the detailed investigation of the planetary properties was restricted to objects orbiting inside the Kuiper Belt. Today, we cannot ignore that the number of known planets has increased by two orders of magnitude nor that these planets resemble anything but the objects present in our own Solar System. A key observable for planets is the chemical composition and state of their atmosphere. To date, two methods can be used to sound exoplanetary atmospheres: transit and eclipse spectroscopy, and direct imaging spectroscopy. Although the field of exoplanet spectroscopy has been very successful in past years, there are a few serious hurdles that need to be overcome to progress in this area: in particular instrument systematics are often difficult to disentangle from the signal, data are sparse and often not recorded simultaneously causing degeneracy of interpretation. We will present here new data analysis techniques and interpretation developed by the “ExoLights” team at UCL to address the above-mentioned issues. Said techniques include statistical tools, non-parametric, machine-learning algorithms, optimized radiative transfer models and spectroscopic line-lists. These new tools have been successfully applied to existing data recorded with space and ground instruments, shedding new light on our knowledge and understanding of these alien worlds.
Statistical Data Analysis in the Computer Age
NASA Astrophysics Data System (ADS)
Efron, Bradley; Tibshirani, Robert
1991-07-01
Most of our familiar statistical methods, such as hypothesis testing, linear regression, analysis of variance, and maximum likelihood estimation, were designed to be implemented on mechanical calculators. Modern electronic computation has encouraged a host of new statistical methods that require fewer distributional assumptions than their predecessors and can be applied to more complicated statistical estimators. These methods allow the scientist to explore and describe data and draw valid statistical inferences without the usual concerns for mathematical tractability. This is possible because traditional methods of mathematical analysis are replaced by specially constructed computer algorithms. Mathematics has not disappeared from statistical theory. It is the main method for deciding which algorithms are correct and efficient tools for automating statistical inference.
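The abstract above does not name a specific algorithm. As one well-known example of a computer-intensive method of the kind it describes, the following minimal sketch estimates the standard error of a sample median with the bootstrap. The function name, the sample values and the number of resamples are assumptions made for illustration only.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.median, n_resamples=2000, seed=0):
    """Estimate the standard error of `stat` by resampling `data` with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(resample))
    # The spread of the replicates approximates the sampling variability of the statistic.
    return statistics.stdev(replicates)


# Hypothetical sample, invented for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
se_median = bootstrap_se(sample)
```

No closed-form formula for the standard error of the median is needed here; the resampling loop takes its place, which is the trade of computation for mathematical tractability that the abstract refers to.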
Topology for statistical modeling of petascale data.
Pascucci, Valerio; Mascarenhas, Ajith Arthur; Rusek, Korben; Bennett, Janine Camille; Levine, Joshua; Pebay, Philippe Pierre; Gyulassy, Attila; Thompson, David C.; Rojas, Joseph Maurice
2011-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.
HistFitter software framework for statistical data analysis
NASA Astrophysics Data System (ADS)
Baak, M.; Besjes, G. J.; Côté, D.; Koutsman, A.; Lorenz, J.; Short, D.
2015-04-01
We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fit to data and interpreted with statistical tests. Internally HistFitter uses the statistics packages RooStats and HistFactory. A key innovation of HistFitter is its design, which is rooted in analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple models at once that describe the data, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication quality style through a simple command-line interface.
Statistical Tools for the Interpretation of Enzootic West Nile virus Transmission Dynamics.
Caillouët, Kevin A; Robertson, Suzanne
2016-01-01
Interpretation of enzootic West Nile virus (WNV) surveillance indicators requires little advanced mathematical skill, but greatly enhances the ability of public health officials to prescribe effective WNV management tactics. Stepwise procedures for the calculation of mosquito infection rates (IR) and vector index (VI) are presented alongside statistical tools that require additional computation. A brief review of advantages and important considerations for each statistic's use is provided. PMID:27188561
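The abstract above does not reproduce the formulas. A minimal sketch follows, assuming the commonly used definitions: the minimum infection rate (MIR) as positive pools per 1,000 mosquitoes tested, and the vector index as the sum over species of mean mosquitoes per trap night multiplied by the estimated infected proportion. Function names and all numbers are hypothetical.

```python
def minimum_infection_rate(positive_pools, mosquitoes_tested):
    """MIR per 1,000 mosquitoes, assuming exactly one infected mosquito per positive pool."""
    if mosquitoes_tested <= 0:
        raise ValueError("mosquitoes_tested must be positive")
    return 1000.0 * positive_pools / mosquitoes_tested


def vector_index(species):
    """Sum over species of (mean mosquitoes per trap night) * (infected proportion).

    `species` is an iterable of (density, proportion_infected) pairs.
    """
    return sum(density * proportion for density, proportion in species)


# Hypothetical surveillance numbers, invented for illustration only.
mir = minimum_infection_rate(positive_pools=3, mosquitoes_tested=1500)
vi = vector_index([(40.0, 0.002), (10.0, 0.005)])
```

The MIR understates infection when a positive pool contains more than one infected mosquito, which is one reason maximum-likelihood estimates of the infection rate are often preferred as inputs to the vector index.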
Extracting meaningful information from metabonomic data using multivariate statistics.
Bylesjö, Max
2015-01-01
Metabonomics aims to identify and quantify all small-molecule metabolites in biologically relevant samples using high-throughput techniques such as NMR and chromatography/mass spectrometry. This generates high-dimensional data sets with properties that require specialized approaches to data analysis. This chapter describes multivariate statistics and analysis tools to extract meaningful information from metabonomic data sets. The focus is on the use and interpretation of latent variable methods such as principal component analysis (PCA), partial least squares/projections to latent structures (PLS), and orthogonal PLS (OPLS). Descriptions of the key steps of the multivariate data analyses are provided with demonstrations from example data. PMID:25677152
Statistical treatment of fatigue test data
Raske, D.T.
1980-01-01
This report discussed several aspects of fatigue data analysis in order to provide a basis for the development of statistically sound design curves. Included is a discussion on the choice of the dependent variable, the assumptions associated with least squares regression models, the variability of fatigue data, the treatment of data from suspended tests and outlying observations, and various strain-life relations.
Seasonal variations of decay rate measurement data and their interpretation.
Schrader, Heinrich
2016-08-01
Measurement data of long-lived radionuclides, for example, (85)Kr, (90)Sr, (108m)Ag, (133)Ba, (152)Eu, (154)Eu and (226)Ra, and particularly the relative residuals of fitted raw data from current measurements of ionization chambers for half-life determination show small periodic seasonal variations with amplitudes of about 0.15%. The interpretation of these fluctuations is a matter of controversy whether the observed effect is produced by some interaction with the radionuclides themselves or is an artifact of the measuring chain. At the origin of such a discussion there is the exponential decay law of radioactive substances used for data fitting, one of the fundamentals of nuclear physics. Some groups of physicists use statistical methods and analyze correlations with various parameters of the measurement data and, for example, the Earth-Sun distance, as a basis of interpretation. In this article, data measured at the Physikalisch-Technische Bundesanstalt and published earlier are the subject of a correlation analysis using the corresponding time series of data with varying measurement conditions. An overview of these measurement conditions producing instrument instabilities is given and causality relations are discussed. The resulting correlation coefficients for various series of the same radionuclide using similar measurement conditions are in the order of 0.7, which indicates a high correlation, and for series of the same radionuclide using different measurement conditions and changes of the measuring chain of the order of -0.2 or even lower, which indicates an anti-correlation. These results provide strong arguments that the observed seasonal variations are caused by the measuring chain and, in particular, by the type of measuring electronics used. PMID:27258217
A spatial scan statistic for multinomial data
Jung, Inkyung; Kulldorff, Martin; Richard, Otukei John
2014-01-01
As a geographical cluster detection analysis tool, the spatial scan statistic has been developed for different types of data such as Bernoulli, Poisson, ordinal, exponential and normal. Another interesting data type is multinomial. For example, one may want to find clusters where the disease-type distribution is statistically significantly different from the rest of the study region when there are different types of disease. In this paper, we propose a spatial scan statistic for such data, which is useful for geographical cluster detection analysis for categorical data without any intrinsic order information. The proposed method is applied to meningitis data consisting of five different disease categories to identify areas with distinct disease-type patterns in two counties in the U.K. The performance of the method is evaluated through a simulation study. PMID:20680984
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
ERIC Educational Resources Information Center
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate unless "corrected" effect…
Dotto, G L; Pinto, L A A; Hachicha, M A; Knani, S
2015-03-15
In this work, a statistical physics treatment was employed to study the adsorption of food dyes onto chitosan films, in order to obtain new physicochemical interpretations at the molecular level. Experimental equilibrium curves were obtained for the adsorption of four dyes (FD&C red 2, FD&C yellow 5, FD&C blue 2, Acid Red 51) at different temperatures (298, 313 and 328 K). A statistical physics formula was used to interpret these curves, and parameters such as the number of adsorbed dye molecules per site (n), anchorage number (n'), receptor sites density (NM), adsorbed quantity at saturation (N asat), steric hindrance (τ), concentration at half saturation (c1/2) and molar adsorption energy (ΔE(a)) were estimated. The relation of the above-mentioned parameters with the chemical structure of the dyes and temperature was evaluated and interpreted. PMID:25308634
On the Interpretation of Running Trends as Summary Statistics for Time Series Analysis
NASA Astrophysics Data System (ADS)
Vigo, Isabel M.; Trottini, Mario; Belda, Santiago
2016-04-01
In recent years, running trends analysis (RTA) has been widely used in climate applied research as summary statistics for time series analysis. There is no doubt that RTA might be a useful descriptive tool, but despite its general use in applied research, precisely what it reveals about the underlying time series is unclear and, as a result, its interpretation is unclear too. This work contributes to such interpretation in two ways: 1) an explicit formula is obtained for the set of time series with a given series of running trends, making it possible to show that running trends, alone, perform very poorly as summary statistics for time series analysis; and 2) an equivalence is established between RTA and the estimation of a (possibly nonlinear) trend component of the underlying time series using a weighted moving average filter. Such equivalence provides a solid ground for RTA implementation and interpretation/validation.
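As a hedged illustration of why running trends can be read as a filter, the sketch below computes least-squares slopes in a sliding window in two ways: directly, and as a fixed weighted sum of the windowed values. It assumes equally spaced observations and a window of at least two points. It illustrates only the linear-filter view and is not the derivation used in the paper; all names are hypothetical.

```python
def ols_slope(y):
    """Least-squares slope of y against equally spaced times 0, 1, ..., len(y) - 1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def slope_weights(window):
    """Weights w_k such that the OLS slope of a window equals sum(w_k * y_k)."""
    t_mean = (window - 1) / 2
    den = sum((t - t_mean) ** 2 for t in range(window))
    return [(t - t_mean) / den for t in range(window)]


def running_trends(series, window):
    """OLS slope fitted separately in each sliding window."""
    return [ols_slope(series[i:i + window])
            for i in range(len(series) - window + 1)]


def running_trends_as_filter(series, window):
    """Same slopes, computed by sliding one fixed set of weights along the series."""
    w = slope_weights(window)
    return [sum(wk * series[i + k] for k, wk in enumerate(w))
            for i in range(len(series) - window + 1)]


# Hypothetical series, invented for illustration only.
series = [1.0, 1.8, 2.5, 2.9, 4.2, 4.8, 6.1, 6.0, 7.4, 8.3]
direct = running_trends(series, window=4)
filtered = running_trends_as_filter(series, window=4)
```

The two results agree because the slope weights sum to zero, so subtracting the window mean does not change the weighted sum.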
HistFitter - A flexible framework for statistical data analysis
NASA Astrophysics Data System (ADS)
Lorenz, J. M.; Baak, M.; Besjes, G. J.; Côté, D.; Koutsman, A.; Short, D.
2015-05-01
We present a software framework for statistical data analysis, called HistFitter, that has extensively been used in the ATLAS Collaboration to analyze data of proton-proton collisions produced by the Large Hadron Collider at CERN. Most notably, HistFitter has become a de-facto standard in searches for supersymmetric particles since 2012, with some usage for Exotic and Higgs boson physics. HistFitter coherently combines several statistics tools in a programmable and flexible framework that is capable of bookkeeping hundreds of data models under study using thousands of generated input histograms. The key innovations of HistFitter are to weave the concepts of control, validation and signal regions into its very fabric, and to treat them with rigorous statistical methods, while providing multiple tools to visualize and interpret the results through a simple configuration interface.
Revisiting the statistical analysis of pyroclast density and porosity data
NASA Astrophysics Data System (ADS)
Bernard, B.; Kueppers, U.; Ortiz, H.
2015-07-01
Explosive volcanic eruptions are commonly characterized based on a thorough analysis of the generated deposits. Amongst other characteristics in physical volcanology, density and porosity of juvenile clasts are some of the most frequently used to constrain eruptive dynamics. In this study, we evaluate the sensitivity of density and porosity data to statistical methods and introduce a weighting parameter to correct issues raised by the use of frequency analysis. Results of textural investigation can be biased by clast selection. Using statistical tools as presented here, the meaningfulness of a conclusion can be checked for any data set easily. This is necessary to define whether or not a sample has met the requirements for statistical relevance, i.e. whether a data set is large enough to allow for reproducible results. Graphical statistics are used to describe density and porosity distributions, similar to those used for grain-size analysis. This approach helps with the interpretation of volcanic deposits. To illustrate this methodology, we chose two large data sets: (1) directed blast deposits of the 3640-3510 BC eruption of Chachimbiro volcano (Ecuador) and (2) block-and-ash-flow deposits of the 1990-1995 eruption of Unzen volcano (Japan). We propose the incorporation of this analysis into future investigations to check the objectivity of results achieved by different working groups and guarantee the meaningfulness of the interpretation.
Statistical data of the uranium industry
1983-01-01
This report is a compendium of information relating to US uranium reserves and potential resources and to exploration, mining, milling, and other activities of the uranium industry through 1982. The statistics are based primarily on data provided voluntarily by the uranium exploration, mining and milling companies. The compendium has been published annually since 1968 and reflects the basic programs of the Grand Junction Area Office of the US Department of Energy. Statistical data obtained from surveys conducted by the Energy Information Administration are included in Section IX. The production, reserves, and drilling data are reported in a manner which avoids disclosure of proprietary information.
Vocational Education Statistical Data Plans and Programs.
ERIC Educational Resources Information Center
Schwartz, Mark
This document provides information on the Data on Vocational Education (DOVE) plan, which has provided the National Center for Education Statistics (NCES) with a framework on which a viable data collection and dissemination program is being constructed for vocational education. A section on the status of DOVE discusses the attainment of the…
Topology for Statistical Modeling of Petascale Data.
Bennett, Janine Camille; Pebay, Philippe Pierre; Pascucci, Valerio; Levine, Joshua; Gyulassy, Attila; Rojas, Joseph Maurice
2014-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.
Interpretation of remotely sensed data and its applications in oceanography
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Tanaka, K.; Inostroza, H. M.; Verdesio, J. J.
1982-01-01
The methodology of interpretation of remote sensing data and its oceanographic applications are described. The elements of image interpretation for different types of sensors are discussed. The sensors utilized are the multispectral scanner of LANDSAT, and the thermal infrared of NOAA and geostationary satellites. Visual and automatic data interpretation in studies of pollution, the Brazil current system, and upwelling along the southeastern Brazilian coast are compared.
Analysis and Interpretation of Financial Data.
ERIC Educational Resources Information Center
Robinson, Daniel D.
1975-01-01
Understanding the financial reports of colleges and universities has long been a problem because of the lack of comparability of the data presented. Recently, there has been a move to agree on uniform standards for financial accounting and reporting for the field of higher education. In addition to comparable data, the efforts to make financial…
The Top 100: Interpreting the Data.
ERIC Educational Resources Information Center
Borden, Victor M. H.
1999-01-01
The sources and structure of data reported in the annual "Top 100" list of colleges and universities conferring the highest numbers of degrees to students of color are described, and the use of the data to make comparisons between historically black and traditionally white institutions is explained. Some trends in the eight-year history of the…
Engine Data Interpretation System (EDIS), phase 2
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1991-01-01
A prototype of an expert system was developed which applies qualitative constraint-based reasoning to the task of post-test analysis of data resulting from a rocket engine firing. Data anomalies are detected and corresponding faults are diagnosed. Engine behavior is reconstructed using measured data and knowledge about engine behavior. Knowledge about common faults guides but does not restrict the search for the best explanation in terms of hypothesized faults. The system contains domain knowledge about the behavior of common rocket engine components and was configured for use with the Space Shuttle Main Engine (SSME). A graphical user interface allows an expert user to intimately interact with the system during diagnosis. The system was applied to data taken during actual SSME tests where data anomalies were observed.
Redman-MacLaren, Michelle; Mills, Jane; Tommbe, Rachael
2014-01-01
Background Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. Objective To facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. Design A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or ‘chunks’ of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. Results New understandings of the data were evoked when women in interpretive focus groups analysed the data ‘chunks’. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Conclusions Interpretive focus groups offer opportunity to enhance trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action. PMID:25138532
Statistical data of the uranium industry
1981-01-01
Data are presented on US uranium reserves, potential resources, exploration, mining, drilling, milling, and other activities of the uranium industry through 1980. The compendium reflects the basic programs of the Grand Junction Office. Statistics are based primarily on information provided by the uranium exploration, mining, and milling companies. Data on commercial U3O8 sales and purchases are included. Data on non-US uranium production and resources are presented in the appendix. (DMC)
Statistical data of the uranium industry
1982-01-01
Statistical Data of the Uranium Industry is a compendium of information relating to US uranium reserves and potential resources and to exploration, mining, milling, and other activities of the uranium industry through 1981. The statistics are based primarily on data provided voluntarily by the uranium exploration, mining, and milling companies. The compendium has been published annually since 1968 and reflects the basic programs of the Grand Junction Area Office (GJAO) of the US Department of Energy. The production, reserves, and drilling information is reported in a manner which avoids disclosure of proprietary information.
The Top 100: Interpreting the Data.
ERIC Educational Resources Information Center
Borden, Victor M. H.
1999-01-01
The sources and structure of data reported in the annual "Top 100" list of colleges and universities conferring the highest numbers of degrees to students of color are described, including the way in which various student categories are reported. (MSE)
Telemetry Boards Interpret Rocket, Airplane Engine Data
NASA Technical Reports Server (NTRS)
2009-01-01
For all the data gathered by the space shuttle while in orbit, NASA engineers are just as concerned about the information it generates on the ground. From the moment the shuttle's wheels touch the runway to the break of its electrical umbilical cord at 0.4 seconds before its next launch, sensors feed streams of data about the status of the vehicle and its various systems to Kennedy Space Center's shuttle crews. Even while the shuttle orbiter is refitted in Kennedy's orbiter processing facility, engineers constantly monitor everything from power levels to the testing of the mechanical arm in the orbiter's payload bay. On the launch pad and up until liftoff, the Launch Control Center, attached to the large Vehicle Assembly Building, screens all of the shuttle's vital data. (Once the shuttle clears its launch tower, this responsibility shifts to Mission Control at Johnson Space Center, with Kennedy in a backup role.) Ground systems for satellite launches also generate significant amounts of data. At Cape Canaveral Air Force Station, across the Banana River from Kennedy's location on Merritt Island, Florida, NASA rockets carrying precious satellite payloads into space flood the Launch Vehicle Data Center with sensor information on temperature, speed, trajectory, and vibration. The remote measurement and transmission of systems data called telemetry is essential to ensuring the safe and successful launch of the Agency's space missions. When a launch is unsuccessful, as it was for this year's Orbiting Carbon Observatory satellite, telemetry data also provides valuable clues as to what went wrong and how to remedy any problems for future attempts. All of this information is streamed from sensors in the form of binary code: strings of ones and zeros. One small company has partnered with NASA to provide technology that renders raw telemetry data intelligible not only for Agency engineers, but also for those in the private sector.
Component fragilities. Data collection, analysis and interpretation
Bandyopadhyay, K.K.; Hofmayer, C.H.
1985-01-01
As part of the component fragility research program sponsored by the US NRC, BNL is involved in establishing seismic fragility levels for various nuclear power plant equipment with emphasis on electrical equipment. To date, BNL has reviewed approximately seventy test reports to collect fragility or high level test data for switchgears, motor control centers and similar electrical cabinets, valve actuators and numerous electrical and control devices, e.g., switches, transmitters, potentiometers, indicators, relays, etc., of various manufacturers and models. BNL has also obtained test data from EPRI/ANCO. Analysis of the collected data reveals that fragility levels can best be described by a group of curves corresponding to various failure modes. The lower bound curve indicates the initiation of malfunctioning or structural damage, whereas the upper bound curve corresponds to overall failure of the equipment based on known failure modes occurring separately or interactively. For some components, the upper and lower bound fragility levels are observed to vary appreciably depending upon the manufacturers and models. For some devices, testing even at the shake table vibration limit does not exhibit any failure. Failure of a relay is observed to be a frequent cause of failure of an electrical panel or a system. An extensive amount of additional fragility or high level test data exists.
The Top 100: Interpreting the Data.
ERIC Educational Resources Information Center
Borden, Victor M. H.
1998-01-01
Using data from federal surveys, the colleges and universities conferring the largest number of degrees on students of color are ranked. Tables include total minority degrees (bachelor's and associate) awarded; individual minority groups (African Americans, Hispanics, Asians, Native Americans); and individual disciplines (life sciences, business…
Phenomenological approach to scatterometer data interpretation
NASA Technical Reports Server (NTRS)
Alzofon, F. E.
1970-01-01
A graphic method of analyzing radar scatterometer sea clutter data leading to linear relations between scattering cross sections and tan angle of incidence of the radiation is proposed. This relation permits formulation of simple analytic relations without reference to the ocean surface spectrum. Parameters introduced depend on the wavelength of the incident radiation and its polarization, and on wind and sea states. The simplicity of the expressions derived suggests a corresponding simplicity in the physical mechanism of radar sea clutter return.
MICROARRAY DATA ANALYSIS USING MULTIPLE STATISTICAL MODELS
Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2,Andrew I. Brooks3,4, ChiYi Chu3,Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...
Stratigraphic interpretation of seismic data on the workstation
Bahorich, M.; Van Bemmel, P.
1994-12-31
Until recently, interpretation of seismic data in the workstation environment has been restricted primarily to horizon and attribute maps. Interpreters have not had the ability to make various types of notations on seismic data and subsequent map views as has been done for years on paper. New thinking in the industry is leading to the development of software which provides the geoscientist with a broader range of interpretive functionality on seismic and subsequent map views. This new functionality reduces the tedious bookkeeping tasks associated with seismic sequence stratigraphy and facies analysis. Interpreters may now perform stratigraphic analysis in more detail in less time by employing the power of the interpretive workstation. A data set over a deep-water fan illustrates the power of this technology.
Bayesian methods for interpreting plutonium urinalysis data
Miller, G.; Inkret, W.C.
1995-09-01
The authors discuss an internal dosimetry problem, where measurements of plutonium in urine are used to calculate radiation doses. The authors have developed an algorithm using the MAXENT method. The method gives reasonable results; however, the effect of the entropy prior distribution is to fit the urine data using intakes occurring close in time to each measured urine result, which is unrealistic. A better approximation for the actual prior is the log-normal distribution; however, with the log-normal distribution another calculational approach must be used. Instead of calculating the most probable values, they turn to calculating expectation values directly from the posterior probability, which is feasible for a small number of intakes.
Impact of equity models and statistical measures on interpretations of educational reform
NASA Astrophysics Data System (ADS)
Rodriguez, Idaykis; Brewe, Eric; Sawtelle, Vashti; Kramer, Laird H.
2012-12-01
We present three models of equity and show how these, along with the statistical measures used to evaluate results, impact interpretation of equity in education reform. Equity can be defined and interpreted in many ways. Most equity education reform research strives to achieve equity by closing achievement gaps between groups. An example is given by the study by Lorenzo et al. that shows that interactive engagement methods lead to increased gender equity. In this paper, we reexamine the results of Lorenzo et al. through three models of equity. We find that interpretation of the results strongly depends on the model of equity chosen. Further, we argue that researchers must explicitly state their model of equity as well as use effect size measurements to promote clarity in education reform.
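The abstract above recommends effect size measurements without saying which one. As a minimal sketch, the code below computes Cohen's d with a pooled standard deviation, one widely used effect size for the difference between two group means. The choice of Cohen's d, the function name and the scores are assumptions made for illustration.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical scores for two groups, invented for illustration only.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

Unlike a p value, d does not grow with sample size, so it allows gaps between groups to be compared across studies of different sizes.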
MSL DAN Passive Data and Interpretations
NASA Astrophysics Data System (ADS)
Tate, C. G.; Moersch, J.; Jun, I.; Ming, D. W.; Mitrofanov, I. G.; Litvak, M. L.; Behar, A.; Boynton, W. V.; Drake, D.; Lisov, D.; Mischna, M. A.; Hardgrove, C. J.; Milliken, R.; Sanin, A. B.; Starr, R. D.; Martín-Torres, J.; Zorzano, M. P.; Fedosov, F.; Golovin, D.; Harshman, K.; Kozyrev, A.; Malakhov, A. V.; Mokrousov, M.; Nikiforov, S.; Varenikov, A.
2014-12-01
In its passive mode of operation, the Mars Science Laboratory Dynamic Albedo of Neutrons experiment (DAN) detects low energy neutrons that are produced by two different sources on Mars. Neutrons are produced by the rover's Multi-Mission Radioisotope Thermoelectric Generator (MMRTG) and by interactions of high energy galactic cosmic rays (GCR) within the atmosphere and regolith. As these neutrons propagate through the subsurface, their energies can be moderated by interactions with hydrogen nuclei. More hydrogen leads to greater moderation (thermalization) of the neutron population energies. The presence of high thermal neutron absorbing elements within the regolith also complicates the spectrum of the returning neutron population, as shown by Hardgrove et al. DAN measures the thermal and epithermal neutron populations leaking from the surface to infer the amount of water equivalent hydrogen (WEH) in the shallow regolith. Extensive modeling is performed using a Monte Carlo approach (MCNPX) to analyze DAN passive measurements at fixed locations and along rover traverse segments. DAN passive WEH estimates along Curiosity's traverse will be presented along with an analysis of trends in the data and a description of correlations between these results and the geologic characteristics of the surfaces traversed.
Statistical Treatment of Looking-Time Data
2016-01-01
Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is predicted. We analyzed data from 2 sources: an in-house set of LTs that included data from individual participants (47 experiments, 1,584 observations), and a representative set of published articles reporting group-level LT statistics (149 experiments from 33 articles). We established that LTs are log-normally distributed across participants, and therefore, should always be log-transformed before parametric statistical analyses. We estimated the typical size of significant effects in LT studies, which allowed us to make recommendations about setting sample sizes. We show how our estimate of the distribution of effect sizes of LT studies can be used to design experiments to be analyzed by Bayesian statistics, where the experimenter is required to determine in advance the predicted effect size rather than the sample size. We demonstrate the robustness of this method in both sets of LT experiments. PMID:26845505
Statistical treatment of looking-time data.
Csibra, Gergely; Hernik, Mikołaj; Mascaro, Olivier; Tatone, Denis; Lengyel, Máté
2016-04-01
Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is predicted. We analyzed data from 2 sources: an in-house set of LTs that included data from individual participants (47 experiments, 1,584 observations), and a representative set of published articles reporting group-level LT statistics (149 experiments from 33 articles). We established that LTs are log-normally distributed across participants, and therefore, should always be log-transformed before parametric statistical analyses. We estimated the typical size of significant effects in LT studies, which allowed us to make recommendations about setting sample sizes. We show how our estimate of the distribution of effect sizes of LT studies can be used to design experiments to be analyzed by Bayesian statistics, where the experimenter is required to determine in advance the predicted effect size rather than the sample size. We demonstrate the robustness of this method in both sets of LT experiments. PMID:26845505
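The recommendation to log-transform looking times before parametric analysis can be sketched as follows. This is only an illustration under stated assumptions: a within-subject design with one expected and one unexpected trial per infant, analyzed with a paired t statistic. The function name and the looking times are hypothetical, not taken from the article.

```python
import math
import statistics


def paired_t_on_logs(lt_expected, lt_unexpected):
    """Paired t statistic computed on log-transformed looking times."""
    # Work with per-participant differences of log looking times.
    diffs = [math.log(u) - math.log(e) for e, u in zip(lt_expected, lt_unexpected)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1  # t statistic, degrees of freedom


# Hypothetical looking times in seconds, one pair per infant.
lt_expected = [5.2, 8.1, 3.9, 12.4, 6.7, 4.5]
lt_unexpected = [7.9, 9.6, 5.1, 15.0, 6.9, 7.2]
t_stat, df = paired_t_on_logs(lt_expected, lt_unexpected)
print(f"t({df}) = {t_stat:.2f}")
```

A difference of logs is the log of a ratio, so the test asks whether infants look proportionally longer at the unexpected stimulus, which suits data that are log-normally distributed.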
Simultaneous Statistical Inference for Epigenetic Data
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology. PMID:25965389
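To make the idea of a nonparametric two-group comparison concrete, here is a deliberately simplified sketch: a univariate permutation test on the difference in group means at a single locus. It is not the multivariate multiple permutation test or the closure procedure described in the abstract, and the function name, parameters, and values are hypothetical.

```python
import random
import statistics


def permutation_test(sample_a, sample_b, n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(sample_a) - statistics.mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical methylation fractions at one locus in two groups.
group_1 = [0.12, 0.15, 0.11, 0.18, 0.14, 0.16]
group_2 = [0.31, 0.28, 0.35, 0.30, 0.27, 0.33]
p_value = permutation_test(group_1, group_2)
print(f"permutation p-value = {p_value:.4f}")
```

When many loci are tested at once, each such p-value would still need a multiple testing correction, which is the problem the abstract addresses with the closure principle.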
Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization
NASA Astrophysics Data System (ADS)
Eroglu, Sertac
2014-10-01
The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed the statistical mechanical Menzerath-Altmann model. The derived model allows interpreting the model parameters in terms of physical concepts. We also propose that many organizations presenting the Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy associated states.
78 FR 10166 - Access Interpreting; Transfer of Data
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-13
... From the Federal Register Online via the Government Publishing Office ENVIRONMENTAL PROTECTION AGENCY Access Interpreting; Transfer of Data AGENCY: Environmental Protection Agency (EPA). ACTION: Notice. SUMMARY: This notice announces that pesticide related information submitted to EPA's Office...
Statistical modeling of space shuttle environmental data
NASA Technical Reports Server (NTRS)
Tubbs, J. D.; Brewer, D. W.
1983-01-01
Statistical models which use a class of bivariate gamma distributions are examined. Topics discussed include: (1) the ratio of positively correlated gamma variates; (2) a method to determine if unequal shape parameters are necessary in a bivariate gamma distribution; (3) differential equations for modal location of a family of bivariate gamma distributions; and (4) analysis of some wind gust data using the analytical results developed for modeling application.
HistFitter: a flexible framework for statistical data analysis
NASA Astrophysics Data System (ADS)
Besjes, G. J.; Baak, M.; Côté, D.; Koutsman, A.; Lorenz, J. M.; Short, D.
2015-12-01
HistFitter is a software framework for statistical data analysis that has been used extensively in the ATLAS Collaboration to analyze data of proton-proton collisions produced by the Large Hadron Collider at CERN. Most notably, HistFitter has become a de-facto standard in searches for supersymmetric particles since 2012, with some usage for Exotic and Higgs boson physics. HistFitter coherently combines several statistics tools in a programmable and flexible framework that is capable of bookkeeping hundreds of data models under study using thousands of generated input histograms. HistFitter interfaces with the statistics tools HistFactory and RooStats to construct parametric models and to perform statistical tests of the data, and extends these tools in four key areas. The key innovations are to weave the concepts of control, validation and signal regions into the very fabric of HistFitter, and to treat these with rigorous methods. Multiple tools to visualize and interpret the results through a simple configuration interface are also provided.
Models to interpret bedform geometries from cross-bed data
Luthi, S.M.; Banavar, J.R.; Bayer, U.
1990-03-01
Semi-elliptical and sinusoidal bedform crestlines were modeled with curvature and sinuosity as parameters. Both bedform crestlines are propagated at various angles of migration over a finite area of deposition. Two computational approaches are used, a statistical random sampling (Monte Carlo) technique over the area of the deposit, and an analytical method based on topology and differential geometry. The resulting foreset azimuth distributions provide a catalogue for a variety of situations. The resulting thickness distributions have a simple shape and can be combined with the azimuth distributions to constrain further the cross-strata geometry. Paleocurrent directions obtained by these models can differ substantially from other methods, especially for obliquely migrating low-curvature bedforms. Interpretation of foreset azimuth data from outcrops and wells can be done either by visual comparison with the catalogued distributions, or by iterative computational fits. Studied examples include eolian cross-strata from the Permian Rotliegendes in the North Sea, fluvial dunes from the Devonian in the Catskills (New York State), the Triassic Schilfsandstein (West Germany) and the Paleozoic-Jurassic of the Western Desert (Egypt), as well as recent tidal dunes from the German coast of the North Sea and tidal cross-strata from the Devonian Koblentquartzit (West Germany). In all cases the semi-elliptical bedform model gave a good fit to the data, suggesting that it may be applicable over a wide range of bedforms. The data from the Western Desert could only be explained by data scatter due to channel sinuosity combining with the scatter attributed to the ellipticity of the bedform crestlines. These models, therefore, may also allow simulations of some hierarchically structured bedforms.
A computer system for interpreting blood glucose data.
Deutsch, T; Gergely, T; Trunov, V
2004-10-01
This paper presents an overview on the design and implementation of a computer system for the interpretation of home monitoring data of diabetic patients. The comprehensive methodology covers the major information processing steps leading from raw data to a concise summary of what has happened between two subsequent visits. It includes techniques for summarising and interpreting data, checking for inconsistency, identifying and diagnosing metabolic problems and learning from patient data. Data interpretation focuses on extracting trend patterns and classifying/clustering daily blood glucose (BG) profiles. The software helps clinicians to explore data recorded before the main meals and bedtime, and to identify problems in the patient's metabolic control which should be addressed either by educating the patient and/or adjusting the current management regimen. PMID:15313541
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2015-02-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: (1) P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. (2) Overemphasis on P values rather than on the actual size of the observed effect. (3) Overuse of statistical hypothesis testing, and being seduced by the word "significant". (4) Overreliance on standard errors, which are often misunderstood. PMID:25692012
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2014-10-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, however, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1) P-hacking, which is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want; 2) overemphasis on P values rather than on the actual size of the observed effect; 3) overuse of statistical hypothesis testing, and being seduced by the word "significant"; and 4) over-reliance on standard errors, which are often misunderstood. PMID:25204545
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2014-11-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1. P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. 2. Overemphasis on P values rather than on the actual size of the observed effect. 3. Overuse of statistical hypothesis testing, and being seduced by the word "significant". 4. Overreliance on standard errors, which are often misunderstood. PMID:25213136
The Statistical Literacy Needed to Interpret School Assessment Data
ERIC Educational Resources Information Center
Chick, Helen; Pierce, Robyn
2013-01-01
State-wide and national testing in areas such as literacy and numeracy produces reports containing graphs and tables illustrating school and individual performance. These are intended to inform teachers, principals, and education organisations about student and school outcomes, to guide change and improvement. Given the complexity of the…
Revisiting the statistical analysis of pyroclast density and porosity data
NASA Astrophysics Data System (ADS)
Bernard, B.; Kueppers, U.; Ortiz, H.
2015-03-01
Explosive volcanic eruptions are commonly characterized based on a thorough analysis of the generated deposits. Amongst other characteristics in physical volcanology, density and porosity of juvenile clasts are some of the most frequently used characteristics to constrain eruptive dynamics. In this study, we evaluate the sensitivity of density and porosity data and introduce a weighting parameter to correct issues raised by the use of frequency analysis. Results of textural investigation can be biased by clast selection. Using statistical tools as presented here, the meaningfulness of a conclusion can easily be checked for any dataset. This is necessary to define whether or not a sample has met the requirements for statistical relevance, i.e. whether a dataset is large enough to allow for reproducible results. Graphical statistics are used to describe density and porosity distributions, similar to those used for grain-size analysis. This approach helps with the interpretation of volcanic deposits. To illustrate this methodology we chose two large datasets: (1) directed blast deposits of the 3640-3510 BC eruption of Chachimbiro volcano (Ecuador) and (2) block-and-ash-flow deposits of the 1990-1995 eruption of Unzen volcano (Japan). We propose the use of this analysis for future investigations to check the objectivity of results achieved by different working groups and guarantee the meaningfulness of the interpretation.
NASA Technical Reports Server (NTRS)
Shewhart, Mark
1991-01-01
Statistical Process Control (SPC) charts are one of several tools used in quality control. Other tools include flow charts, histograms, cause and effect diagrams, check sheets, Pareto diagrams, graphs, and scatter diagrams. A control chart is simply a graph which indicates process variation over time. The purpose of drawing a control chart is to detect any changes in the process signalled by abnormal points or patterns on the graph. The Artificial Intelligence Support Center (AISC) of the Acquisition Logistics Division has developed a hybrid machine learning expert system prototype which automates the process of constructing and interpreting control charts.
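The description of a control chart above (a graph of process variation over time, read for abnormal points) can be sketched in a few lines. This is a simplified illustration, not the AISC prototype: it assumes limits set at three sample standard deviations around the mean of an in-control baseline, whereas production individuals charts often estimate sigma from moving ranges. All names and measurements are hypothetical.

```python
import statistics


def control_limits(baseline, n_sigma=3.0):
    """Return (lower limit, center line, upper limit) from in-control baseline data."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return center - n_sigma * sigma, center, center + n_sigma * sigma


def out_of_control(points, lower, upper):
    """Indices of points that fall outside the control limits."""
    return [i for i, x in enumerate(points) if x < lower or x > upper]


# Hypothetical measurements: a baseline period, then new observations to monitor.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
new_points = [10.1, 9.9, 11.2, 10.0, 9.1]
lower, center, upper = control_limits(baseline)
flagged = out_of_control(new_points, lower, upper)
print(f"limits = ({lower:.2f}, {upper:.2f}), flagged indices = {flagged}")
```

Detecting abnormal patterns, such as long runs on one side of the center line, needs additional rules beyond this single out-of-limits check.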
Energy statistics data finder. [Monograph; energy-related census data
Not Available
1980-08-01
Energy-related data collected by the Bureau of the Census covers economic and demographic areas and provides data on a regular basis to produce current estimates from survey programs. Series report numbers, a summary of subject content, geographic detail, and report frequency are identified under the following major publication title categories: Agriculture, Retail Trade, Wholesale Trade, Service Industries, Construction, Transportation, Enterprise Statistics, County Business Patterns, Foreign Trade, Governments, Manufacturers, Mineral Industries, 1980 Census of Population and Housing, Annual Housing Survey and Travel-to-Work Supplement, and Statistical Compendia. The data are also available on computer tapes, microfiche, and in special tabulations. (DCK)
Multivariate statistical mapping of spectroscopic imaging data.
Young, Karl; Govind, Varan; Sharma, Khema; Studholme, Colin; Maudsley, Andrew A; Schuff, Norbert
2010-01-01
For magnetic resonance spectroscopic imaging studies of the brain, it is important to measure the distribution of metabolites in a regionally unbiased way; that is, without restrictions to a priori defined regions of interest. Since magnetic resonance spectroscopic imaging provides measures of multiple metabolites simultaneously at each voxel, there is furthermore great interest in utilizing the multidimensional nature of magnetic resonance spectroscopic imaging for gains in statistical power. Voxelwise multivariate statistical mapping is expected to address both of these issues, but it has not been previously employed for spectroscopic imaging (SI) studies of brain. The aims of this study were to (1) develop and validate multivariate voxel-based statistical mapping for magnetic resonance spectroscopic imaging and (2) demonstrate that multivariate tests can be more powerful than univariate tests in identifying patterns of altered brain metabolism. Specifically, we compared multivariate to univariate tests in identifying known regional patterns in simulated data and regional patterns of metabolite alterations due to amyotrophic lateral sclerosis, a devastating brain disease of the motor neurons. PMID:19953514
Statistical challenges of high-dimensional data
Johnstone, Iain M.; Titterington, D. Michael
2009-01-01
Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue. PMID:19805443
Computer Simulation of Incomplete-Data Interpretation Exercise.
ERIC Educational Resources Information Center
Robertson, Douglas Frederick
1987-01-01
Described is a computer simulation that was used to help general education students enrolled in a large introductory geology course. The purpose of the simulation is to learn to interpret incomplete data. Students design a plan to collect bathymetric data for an area of the ocean. Procedures used by the students and instructor are included.…
Customizable tool for ecological data entry, assessment, monitoring, and interpretation
Technology Transfer Automated Retrieval System (TEKTRAN)
The Database for Inventory, Monitoring and Assessment (DIMA) is a highly customizable tool for data entry, assessment, monitoring, and interpretation. DIMA is a Microsoft Access database that can easily be used without Access knowledge and is available at no cost. Data can be entered for common, nat...
Interpreting Survey Data to Inform Solid-Waste Education Programs
ERIC Educational Resources Information Center
McKeown, Rosalyn
2006-01-01
Few examples exist of how to use survey data to inform public environmental education programs. I suggest a process for interpreting statewide survey data with four questions that give insights into local context and make it possible to gain insight into potential target audiences and community priorities. The four questions are: What…
Design, analysis, and interpretation of field quality-control data for water-sampling projects
Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.
2015-01-01
The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.
Securing cooperation from persons supplying statistical data.
AUBENQUE, M J; BLAIKLEY, R M; HARRIS, F F; LAL, R B; NEURDENBURG, M G; DE SHELLY HERNANDEZ, R
1954-01-01
Securing the co-operation of persons supplying information required for medical statistics is essentially a problem in human relations, and an understanding of the motivations, attitudes, and behaviour of the respondents is necessary. Before any new statistical survey is undertaken, it is suggested by Aubenque and Harris that a preliminary review be made so that the maximum use is made of existing information. Care should also be taken not to burden respondents with an overloaded questionnaire. Aubenque and Harris recommend simplified reporting. Complete population coverage is not necessary. Neurdenburg suggests that the co-operation and support of such organizations as medical associations and social security boards are important and that propaganda should be directed specifically to the groups whose co-operation is sought. Informal personal contacts are valuable and desirable, according to Blaikley, but may have adverse effects if the right kind of approach is not made. Financial payments as an incentive in securing co-operation are opposed by Neurdenburg, who proposes that only postage-free envelopes or similar small favours be granted. Blaikley and Harris, on the other hand, express the view that financial incentives may do much to gain the support of those required to furnish data; there are, however, other incentives, and full use should be made of the natural inclinations of respondents. Compulsion may be necessary in certain instances, but administrative rather than statutory measures should be adopted. Penalties, according to Aubenque, should be inflicted only when justified by imperative health requirements. The results of surveys should be made available as soon as possible to those who co-operated, and Aubenque and Harris point out that they should also be of practical value to the suppliers of the information. Greater co-operation can be secured from medical persons who have an understanding of the statistical principles involved; Aubenque and Neurdenburg
Weatherization Assistance Program - Background Data and Statistics
Eisenberg, Joel Fred
2010-03-01
This technical memorandum is intended to provide readers with information that may be useful in understanding the purposes, performance, and outcomes of the Department of Energy's (DOE's) Weatherization Assistance Program (Weatherization). Weatherization has been in operation for over thirty years and is the nation's largest single residential energy efficiency program. Its primary purpose, established by law, is 'to increase the energy efficiency of dwellings owned or occupied by low-income persons, reduce their total residential energy expenditures, and improve their health and safety, especially low-income persons who are particularly vulnerable such as the elderly, the handicapped, and children.' The American Recovery and Reinvestment Act PL111-5 (ARRA), passed and signed into law in February 2009, committed $5 billion over two years to an expanded Weatherization Assistance Program. This has created substantial interest in the program, the population it serves, the energy and cost savings it produces, and its cost-effectiveness. This memorandum is intended to address the need for this kind of information. Statistically valid answers to many of the questions surrounding Weatherization and its performance require comprehensive evaluation of the program. DOE is undertaking precisely this kind of independent evaluation in order to ascertain program effectiveness and to improve its performance. Results of this evaluation effort will begin to emerge in late 2010 and 2011, but they require substantial time and effort. In the meantime, the data and statistics in this memorandum can provide reasonable and transparent estimates of key program characteristics. The memorandum is laid out in three sections. The first deals with some key characteristics describing low-income energy consumption and expenditures. The second section provides estimates of energy savings and energy bill reductions that the program can reasonably be presumed to be producing. The third section
Statistical atlas based extrapolation of CT data
NASA Astrophysics Data System (ADS)
Chintalapani, Gouthami; Murphy, Ryan; Armiger, Robert S.; Lepisto, Jyri; Otake, Yoshito; Sugano, Nobuhiko; Taylor, Russell H.; Armand, Mehran
2010-02-01
We present a framework to estimate the missing anatomical details from a partial CT scan with the help of statistical shape models. The motivating application is periacetabular osteotomy (PAO), a technique for treating developmental hip dysplasia, an abnormal condition of the hip socket that, if untreated, may lead to osteoarthritis. The common goals of PAO are to reduce pain, joint subluxation and improve contact pressure distribution by increasing the coverage of the femoral head by the hip socket. While current diagnosis and planning is based on radiological measurements, because of significant structural variations in dysplastic hips, a computer-assisted geometrical and biomechanical planning based on CT data is desirable to help the surgeon achieve optimal joint realignments. Most of the patients undergoing PAO are young females, hence it is usually desirable to minimize the radiation dose by scanning only the joint portion of the hip anatomy. These partial scans, however, do not provide enough information for biomechanical analysis due to missing iliac region. A statistical shape model of full pelvis anatomy is constructed from a database of CT scans. The partial volume is first aligned with the statistical atlas using an iterative affine registration, followed by a deformable registration step and the missing information is inferred from the atlas. The atlas inferences are further enhanced by the use of X-ray images of the patient, which are very common in an osteotomy procedure. The proposed method is validated with a leave-one-out analysis method. Osteotomy cuts are simulated and the effect of atlas predicted models on the actual procedure is evaluated.
Soil VisNIR chemometric performance statistics should be interpreted as random variables
NASA Astrophysics Data System (ADS)
Brown, David J.; Gasch, Caley K.; Poggio, Matteo; Morgan, Cristine L. S.
2015-04-01
Chemometric models are normally evaluated using performance statistics such as the Standard Error of Prediction (SEP) or the Root Mean Squared Error of Prediction (RMSEP). These statistics are used to evaluate the quality of chemometric models relative to other published work on a specific soil property or to compare the results from different processing and modeling techniques (e.g. Partial Least Squares Regression or PLSR and random forest algorithms). Claims are commonly made about the overall success of an application or the relative performance of different modeling approaches assuming that these performance statistics are fixed population parameters. While most researchers would acknowledge that small differences in performance statistics are not important, rarely are performance statistics treated as random variables. Given that we are usually comparing modeling approaches for general application, and given that the intent of VisNIR soil spectroscopy is to apply chemometric calibrations to larger populations than are included in our soil-spectral datasets, it is more appropriate to think of performance statistics as random variables with variation introduced through the selection of samples for inclusion in a given study and through the division of samples into calibration and validation sets (including spiking approaches). Here we look at the variation in VisNIR performance statistics for the following soil-spectra datasets: (1) a diverse US Soil Survey soil-spectral library with 3768 samples from all 50 states and 36 different countries; (2) 389 surface and subsoil samples taken from US Geological Survey continental transects; (3) the Texas Soil Spectral Library (TSSL) with 3000 samples; (4) intact soil core scans of Texas soils with 700 samples; (5) approximately 400 in situ scans from the Pacific Northwest region; and (6) miscellaneous local datasets. We find the variation in performance statistics to be surprisingly large. This has important
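The point made in this abstract, that a performance statistic such as RMSEP varies with how samples are divided into calibration and validation sets, can be illustrated with a small sketch. Everything here is an assumption made for illustration: synthetic one-predictor data stand in for soil spectra, and an ordinary least squares line stands in for PLSR or random forest models. The function names and parameters are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmsep_over_splits(xs, ys, n_splits=200, validation_fraction=0.3, seed=0):
    """RMSEP for each of many random calibration/validation splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_val = max(1, int(len(xs) * validation_fraction))
    results = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        val, cal = indices[:n_val], indices[n_val:]
        intercept, slope = fit_line([xs[i] for i in cal], [ys[i] for i in cal])
        squared_errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in val]
        results.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return results


# Synthetic data: a linear signal plus Gaussian noise (not soil measurements).
data_rng = random.Random(1)
xs = [i / 10 for i in range(60)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0, 0.3) for x in xs]
rmseps = rmsep_over_splits(xs, ys)
print(f"RMSEP min={min(rmseps):.3f} median={statistics.median(rmseps):.3f} max={max(rmseps):.3f}")
```

Reporting the spread of RMSEP across splits, rather than a single value, is one way to treat the statistic as a random variable when comparing modeling approaches.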
Accessing seismic data through geological interpretation: Challenges and solutions
NASA Astrophysics Data System (ADS)
Butler, R. W.; Clayton, S.; McCaffrey, B.
2008-12-01
Between them, the world's research programs, national institutions and corporations, especially oil and gas companies, have acquired substantial volumes of seismic reflection data. Although the vast majority are proprietary and confidential, significant data are released and available for research, including those in public data libraries. The challenge now is to maximise use of these data, by providing routes to seismic not simply on the basis of acquisition or processing attributes but via the geology they image. The Virtual Seismic Atlas (VSA: www.seismicatlas.org) meets this challenge by providing an independent, free-to-use community based internet resource that captures and shares the geological interpretation of seismic data globally. Images and associated documents are explicitly indexed by extensive metadata trees, using not only existing survey and geographical data but also the geology they portray. The solution uses a Documentum database interrogated through Endeca Guided Navigation, to search, discover and retrieve images. The VSA allows users to compare contrasting interpretations of clean data thereby exploring the ranges of uncertainty in the geometric interpretation of subsurface structure. The metadata structures can be used to link reports and published research together with other data types such as wells. And the VSA can link to existing data libraries. Searches can take different paths, revealing arrays of geological analogues, new datasets while providing entirely novel insights and genuine surprises. This can then drive new creative opportunities for research and training, and expose the contents of seismic data libraries to the world.
NASA Astrophysics Data System (ADS)
Jha, Sanjeev Kumar; Comunian, Alessandro; Mariethoz, Gregoire; Kelly, Bryce F. J.
2014-10-01
We develop a stochastic approach to construct channelized 3-D geological models constrained to borehole measurements as well as geological interpretation. The methodology is based on simple 2-D geologist-provided sketches of fluvial depositional elements, which are extruded in the 3rd dimension. Multiple-point geostatistics (MPS) is used to impart horizontal variability to the structures by introducing geometrical transformation parameters. The sketches provided by the geologist are used as elementary training images, whose statistical information is expanded through randomized transformations. We demonstrate the applicability of the approach by applying it to modeling a fluvial valley filling sequence in the Maules Creek catchment, Australia. The facies models are constrained to borehole logs, spatial information borrowed from an analogue and local orientations derived from the present-day stream networks. The connectivity in the 3-D facies models is evaluated using statistical measures and transport simulations. Comparison with a statistically equivalent variogram-based model shows that our approach is more suited for building 3-D facies models that contain structures specific to the channelized environment and which have a significant influence on the transport processes.
Geologic interpretation of HCMM and aircraft thermal data
NASA Technical Reports Server (NTRS)
1982-01-01
Progress on the Heat Capacity Mapping Mission (HCMM) follow-on study is reported. Numerous image products for geologic interpretation of both HCMM and aircraft thermal data were produced. These include, among others, various combinations of the thermal data with LANDSAT and SEASAT data. The combined data sets were displayed using simple color composites, principal component color composites and black and white images, and hue, saturation intensity color composites. Algorithms for incorporating both atmospheric and elevation data simultaneously into the digital processing for creation of quantitatively correct thermal inertia images, are in the final development stage. A field trip to Death Valley was undertaken to field check the aircraft and HCMM data.
Statistical Analysis of Cardiovascular Data from FAP
NASA Technical Reports Server (NTRS)
Sealey, Meghan
2016-01-01
pressure, etc.) to see which could best predict how long the subjects could tolerate the tilt tests. With this I plan to analyze an artificial gravity study in order to determine the effects of orthostatic intolerance during spaceflight. From these projects, I became proficient in using the statistical software Stata, which I had never used before. I learned new statistical methods, such as mixed-effects linear regression, maximum likelihood estimation on longitudinal data, and post model-fitting tests to see if certain parameters contribute significantly to the model, all of which will better my understanding when I continue studying for my master's degree. I was also able to demonstrate my knowledge of statistics by helping other students run statistical analyses for their own projects. The experience and knowledge gained from completing this analysis exemplify the type of work that I would like to pursue in the future. After completing my master's degree, I plan to pursue a career in biostatistics, which is exactly the position in which I interned, and I plan to use this experience to contribute to that goal.
Shock Classification of Ordinary Chondrites: New Data and Interpretations
NASA Astrophysics Data System (ADS)
Stoffler, D.; Keil, K.; Scott, E. R. D.
1992-07-01
Introduction. The recently proposed classification system for shocked chondrites (1) is based on a microscopic survey of 76 non-Antarctic H, L, and LL chondrites. Obviously, a larger database is highly desirable in order to confirm earlier conclusions and to allow for a statistically relevant interpretation of the data. Here, we report the shock classification of an additional 54 ordinary chondrites and summarize implications based on a total of 130 samples. New observations on shock effects. Continued studies of those shock effects in olivine and plagioclase that are indicative of the shock stages S1 - S6 as defined in (1) revealed the following: Planar deformation features in olivine, considered typical of stage S5, occur occasionally in stage S3 and are common in stage S4. In some S4 chondrites plagioclase is not partially isotropic but still birefringent coexisting with a small fraction of S3 olivines. Opaque shock veins occur not only in shock stage S3 and above (1) but have now been found in a few chondrites of shock stage S2. Thermal annealing of shock effects. Planar fractures and planar deformation features in olivine persist up to the temperatures required for recrystallization of olivine (> ca. 900 degrees C). Shock history of breccias. In a number of petrologic types 3 and 4 chondrites without recognizable (polymict) breccia texture, we found chondrules and olivine fragments with different shock histories ranging from S1 to S3. Regolith and fragmental breccias are polymict with regard to lithology and shock. The intensity of the latest shock typically varies from S1 to S4 in the breccias studied so far. Frequency distribution of shock stages. A significant difference between H and L chondrites is emerging in contrast to our previous statistics (1), whereas the conspicuous lack of shock stages S5 and S6 in type 3 and 4 chondrites is clearly confirmed (Fig. 1). Correlation between shock and noble gas content. The concentration of radiogenic argon and of
Toxic substances and human risk: principles of data interpretation
Tardiff, R.G.; Rodricks, J.V.
1988-01-01
This book provides a comprehensive overview of the relationship between toxicology and risk assessment and identifies the principles that should be used to evaluate toxicological data for human risk assessment. The book opens by distinguishing between the practice of toxicology as a science (observational and data-gathering activities) and its practice as an art (predictive or risk-estimating activities). This dichotomous nature produces the two elemental problems with which users of toxicological data must grapple. First, how relevant are data provided by the science of toxicology to the assessment of human health risks? Second, what methods of data interpretation should be used to formulate hypotheses or predictions regarding human health risk?
Parameter Interpretation and Reduction for a Unified Statistical Mechanical Surface Tension Model.
Boyer, Hallie; Wexler, Anthony; Dutcher, Cari S
2015-09-01
Surface properties of aqueous solutions are important for environments as diverse as atmospheric aerosols and biocellular membranes. Previously, we developed a surface tension model for both electrolyte and nonelectrolyte aqueous solutions across the entire solute concentration range (Wexler and Dutcher, J. Phys. Chem. Lett. 2013, 4, 1723-1726). The model differentiated between adsorption of solute molecules in the bulk and surface of solution using the statistical mechanics of multilayer sorption solution model of Dutcher et al. (J. Phys. Chem. A 2013, 117, 3198-3213). The parameters in the model had physicochemical interpretations, but remained largely empirical. In the current work, these parameters are related to solute molecular properties in aqueous solutions. For nonelectrolytes, sorption tendencies suggest a strong relation with molecular size and functional group spacing. For electrolytes, surface adsorption of ions follows ion surface-bulk partitioning calculations by Pegram and Record (J. Phys. Chem. B 2007, 111, 5411-5417). PMID:26275040
2014-01-01
Background A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to the learning algorithm and open to all structure-based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model’s behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model’s behaviour for the specific query. Results Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The interpretations produced link closely with understood mechanisms for Ames mutagenicity. Conclusion This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development. PMID:24661325
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 47 Telecommunication 1 2011-10-01 2011-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 47 Telecommunication 1 2010-10-01 2010-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
Statistical mapping of count survey data
Royle, J. Andrew; Link, W.A.; Sauer, J.R.
2002-01-01
We apply a Poisson mixed model to the problem of mapping (or predicting) bird relative abundance from counts collected from the North American Breeding Bird Survey (BBS). The model expresses the logarithm of the Poisson mean as a sum of a fixed term (which may depend on habitat variables) and a random effect which accounts for remaining unexplained variation. The random effect is assumed to be spatially correlated, thus providing a more general model than the traditional Poisson regression approach. Consequently, the model is capable of improved prediction when data are autocorrelated. Moreover, formulation of the mapping problem in terms of a statistical model facilitates a wide variety of inference problems which are cumbersome or even impossible using standard methods of mapping. For example, assessment of prediction uncertainty, including the formal comparison of predictions at different locations, or through time, using the model-based prediction variance is straightforward under the Poisson model (not so with many nominally model-free methods). Also, ecologists may generally be interested in quantifying the response of a species to particular habitat covariates or other landscape attributes. Proper accounting for the uncertainty in these estimated effects is crucially dependent on specification of a meaningful statistical model. Finally, the model may be used to aid in sampling design, by modifying the existing sampling plan in a manner which minimizes some variance-based criterion. Model fitting under this model is carried out using a simulation technique known as Markov Chain Monte Carlo. Application of the model is illustrated using Mourning Dove (Zenaida macroura) counts from Pennsylvania BBS routes. We produce both a model-based map depicting relative abundance, and the corresponding map of prediction uncertainty. We briefly address the issue of spatial sampling design under this model. Finally, we close with some discussion of mapping in relation to
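The model structure described in this abstract can be written out as a short sketch. The notation below is assumed for illustration and is not taken from the paper: $y_i$ is the count at route $i$, $\mathbf{x}_i$ holds habitat covariates, and $z_i$ is the spatially correlated random effect. The Gaussian form of the random effect and the covariance parametrization $\boldsymbol{\Sigma}(\theta)$ are assumptions, since the abstract states only that the effect is spatially correlated.

```latex
\begin{aligned}
y_i \mid \lambda_i &\sim \mathrm{Poisson}(\lambda_i), \\
\log \lambda_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + z_i, \\
\mathbf{z} = (z_1, \dots, z_n)^{\top} &\sim \mathcal{N}\!\left(\mathbf{0},\, \boldsymbol{\Sigma}(\theta)\right).
\end{aligned}
```

Setting every $z_i$ to zero recovers ordinary Poisson regression, which is the sense in which the mixed model is the more general of the two.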
Design and coding considerations of the soil data language interpreter
NASA Astrophysics Data System (ADS)
Kollias, V. J.; Kollias, J. G.
A query language, named Soil Data Language (SDL), for retrieving information from a Soil Data Bank, is part of the ARSIS (A Relational Soil Information System) system, currently being developed in Greece. The interpreter of the language accepts input programs, expressed as SDL commands, and outputs the requested information. This paper describes design principles employed during the coding of the interpreter of the language. The derived program can be modified to cover eventual alterations to the specifications of the language or the content and structure of the data bank. The study may be seen as a step toward the design of Generalized Soil and Land Information Systems that are primarily concerned with easy adaptation to a variety of national processing requirements.
Implementation of ILLIAC 4 algorithms for multispectral image interpretation. [earth resources data
NASA Technical Reports Server (NTRS)
Ray, R. M.; Thomas, J. D.; Donovan, W. E.; Swain, P. H.
1974-01-01
Research has focused on the design and partial implementation of a comprehensive ILLIAC software system for computer-assisted interpretation of multispectral earth resources data such as that now collected by the Earth Resources Technology Satellite. Research suggests generally that the ILLIAC 4 should be as much as two orders of magnitude more cost effective than serial processing computers for digital interpretation of ERTS imagery via multivariate statistical classification techniques. The potential of the ARPA Network as a mechanism for interfacing geographically-dispersed users to an ILLIAC 4 image processing facility is discussed.
Rosenfield, G.H.
1986-01-01
Statistical analysis is conducted to determine the unique value of real- and synthetic-aperture side-looking airborne radar (SLAR) to detect interpreted structural elements. SLAR images were compared to standard and digitally enhanced Landsat multispectral scanner (MSS) images and to aerial photographs. After interpretation of the imagery, data were cumulated by total length in miles and by frequency of counts. Maximum uniqueness is obtained first from real-aperture SLAR, 58.3% of total, and, second, from digitally enhanced Landsat MSS images, 54.1% of total. © 1986 Plenum Publishing Corporation.
Statistical Analysis of DWPF ARG-1 Data
Harris, S.P.
2001-03-02
A statistical analysis of analytical results for ARG-1, an Analytical Reference Glass, blanks, and the associated calibration and bench standards has been completed. These statistics provide a means for DWPF to review the performance of their laboratory as well as identify areas of improvement.
Patton, Charles J.; Gilroy, Edward J.
1999-01-01
Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.
Reiber, Hansotto
2016-06-01
The physiological and biophysical knowledge base for interpretations of cerebrospinal fluid (CSF) data and reference ranges is essential for the clinical pathologist and neurochemist. With the popular description of the CSF flow-dependent barrier function, the dynamics and concentration gradients of blood-derived, brain-derived and leptomeningeal proteins in CSF, and the specificity-independent functions of B-lymphocytes in the brain, the neurologist, psychiatrist, neurosurgeon and neuropharmacologist may also find essentials for diagnosis, research or development of therapies. This review may help to replace outdated ideas such as "leakage" models of the barriers, linear immunoglobulin index interpretations or CSF electrophoresis. Calculations, interpretations and analytical pitfalls are described for albumin quotients, quantitation of immunoglobulin synthesis in Reibergrams, oligoclonal IgG, IgM analysis, the polyspecific (MRZ) antibody reaction, the statistical treatment of CSF data and general quality assessment in the CSF laboratory. The diagnostic relevance is documented in an accompanying review. PMID:27332077
Interpretation of Landsat-4 Thematic Mapper and Multispectral Scanner data for forest surveys
NASA Technical Reports Server (NTRS)
Benson, A. S.; Degloria, S. D.
1985-01-01
Landsat-4 Thematic Mapper (TM) and Multispectral Scanner (MSS) data were evaluated by interpreting film and digital products and statistical data for selected forest cover types in California. Significant results were: (1) TM color image products should contain a spectral band in the visible (bands 1, 2, or 3), near infrared (band 4), and middle infrared (band 5) regions for maximizing the interpretability of vegetation types; (2) TM color composites should contain band 4 in all cases even at the expense of excluding band 5; and (3) MSS color composites were more interpretable than all TM color composites for certain cover types and for all cover types when band 4 was excluded from the TM composite.
Mobile Collection and Automated Interpretation of EEG Data
NASA Technical Reports Server (NTRS)
Mintz, Frederick; Moynihan, Philip
2007-01-01
A system that would comprise mobile and stationary electronic hardware and software subsystems has been proposed for collection and automated interpretation of electroencephalographic (EEG) data from subjects in everyday activities in a variety of environments. By enabling collection of EEG data from mobile subjects engaged in ordinary activities (in contradistinction to collection from immobilized subjects in clinical settings), the system would expand the range of options and capabilities for performing diagnoses. Each subject would be equipped with one of the mobile subsystems, which would include a helmet that would hold floating electrodes (see figure) in those positions on the patient's head that are required in classical EEG data-collection techniques. A bundle of wires would couple the EEG signals from the electrodes to a multi-channel transmitter also located in the helmet. Electronic circuitry in the helmet transmitter would digitize the EEG signals and transmit the resulting data via a multidirectional RF patch antenna to a remote location. At the remote location, the subject's EEG data would be processed and stored in a database that would be auto-administered by a newly designed relational database management system (RDBMS). In this RDBMS, in nearly real time, the newly stored data would be subjected to automated interpretation that would involve comparison with other EEG data and concomitant peer-reviewed diagnoses stored in international brain data bases administered by other similar RDBMSs.
NASA Astrophysics Data System (ADS)
Grzempowski, Piotr; Bac-Bronowicz, Joanna; Blachowski, Jan; Milczarek, Wojciech
2014-05-01
The increasing volume of data made available on Open Source servers allows for interdisciplinary interpretations of deformation measurements at both the local and the continental scales. The openly available vector and raster models of topographic, geological, geophysical, geodetic and remote sensing data have different spatial and temporal resolutions and are of various quality. The reliability of deformation modelling results depends on the resolution and accuracy of the models describing the factors and conditions in which these deformations take place. The paper describes the structure of a system for integration and processing of data obtained from Open Source servers, including topographic, geological, geophysical, seismic, geodetic, remote sensing and other data needed for interpretation of deformation measurements and development of statistical models. The system is based on a GIS environment for data storage and fundamental spatial analyses, with support from external expert software. The paper presents the results of interpretations and statistical models at local and continental scales, taking into account the data resolution and accuracy and their influence on the final modelling result. Example influence models taking into account quantitative and qualitative data are also shown.
Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.
2009-01-01
In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high–throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
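The general idea of scoring compounds by statistically enriched structural features can be sketched as follows. This is not the published WFS algorithm, whose weighting scheme is not given in the abstract. The sketch assumes a one-sided hypergeometric test for enrichment, a weight of -log10(p) for features below a significance threshold, and a toy dataset; the function names and feature labels are hypothetical.

```python
import math


def enrichment_p_value(k, n_feature, n_toxic, n_total):
    """One-sided hypergeometric tail: probability of seeing at least k toxic
    compounds among the n_feature compounds that carry a feature."""
    upper = min(n_feature, n_toxic)
    hits = sum(
        math.comb(n_toxic, i) * math.comb(n_total - n_toxic, n_feature - i)
        for i in range(k, upper + 1)
    )
    return hits / math.comb(n_total, n_feature)


def fit_weights(compounds, alpha=0.05):
    """Assign -log10(p) to each feature enriched among toxic compounds."""
    n_total = len(compounds)
    n_toxic = sum(1 for _, toxic in compounds if toxic)
    features = set().union(*(feats for feats, _ in compounds))
    weights = {}
    for feature in features:
        n_feature = sum(1 for feats, _ in compounds if feature in feats)
        k = sum(1 for feats, toxic in compounds if toxic and feature in feats)
        p = enrichment_p_value(k, n_feature, n_toxic, n_total)
        if p < alpha:  # keep only significantly enriched features
            weights[feature] = -math.log10(p)
    return weights


def score(features, weights):
    """Additive score: sum the weights of the features present."""
    return sum(weights.get(f, 0.0) for f in features)


# Toy training set (assumption): (structural features, is_toxic).
compounds = [
    ({"nitro", "methyl"}, True),
    ({"nitro"}, True),
    ({"nitro", "methyl"}, True),
    ({"nitro"}, True),
    ({"hydroxyl"}, True),
    ({"methyl"}, False),
    ({"methyl", "hydroxyl"}, False),
    ({"methyl"}, False),
    ({"hydroxyl"}, False),
    ({"hydroxyl"}, False),
]

weights = fit_weights(compounds)
print(weights)
print(score({"nitro", "methyl"}, weights), score({"methyl"}, weights))
```

In this toy set only the `nitro` feature passes the threshold, so a new compound is scored by whether it carries that feature. An additive form like this keeps each feature's contribution visible, which fits the interpretability advantage claimed in the abstract.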
New Statistical Approach to the Analysis of Hierarchical Data
NASA Astrophysics Data System (ADS)
Neuman, S. P.; Guadagnini, A.; Riva, M.
2014-12-01
Many variables possess a hierarchical structure reflected in how their increments vary in space and/or time. Quite commonly the increments (a) fluctuate in a highly irregular manner; (b) possess symmetric, non-Gaussian frequency distributions characterized by heavy tails that often decay with separation distance or lag; (c) exhibit nonlinear power-law scaling of sample structure functions in a midrange of lags, with breakdown in such scaling at small and large lags; (d) show extended power-law scaling (ESS) at all lags; and (e) display nonlinear scaling of power-law exponent with order of sample structure function. Some interpret this to imply that the variables are multifractal, which explains neither breakdowns in power-law scaling nor ESS. We offer an alternative interpretation consistent with all above phenomena. It views data as samples from stationary, anisotropic sub-Gaussian random fields subordinated to truncated fractional Brownian motion (tfBm) or truncated fractional Gaussian noise (tfGn). The fields are scaled Gaussian mixtures with random variances. Truncation of fBm and fGn entails filtering out components below data measurement or resolution scale and above domain scale. Our novel interpretation of the data allows us to obtain maximum likelihood estimates of all parameters characterizing the underlying truncated sub-Gaussian fields. These parameters in turn make it possible to downscale or upscale all statistical moments to situations entailing smaller or larger measurement or resolution and sampling scales, respectively. They also allow one to perform conditional or unconditional Monte Carlo simulations of random field realizations corresponding to these scales. Aspects of our approach are illustrated on field and laboratory measured porous and fractured rock permeabilities, as well as soil texture characteristics and neural network estimates of unsaturated hydraulic parameters in a deep vadose zone near Phoenix, Arizona. We also use our approach
Eide, I; Zahlsen, K
1996-01-01
The paper describes experimental and statistical methods for toxicokinetic evaluation of mixtures in inhalation experiments. Synthetic mixtures of three C9 n-paraffinic, naphthenic and aromatic hydrocarbons (n-nonane, trimethylcyclohexane and trimethylbenzene, respectively) were studied in the rat after inhalation for 12 h. The hydrocarbons were mixed according to principles for statistical experimental design using mixture design at four vapour levels (75, 150, 300 and 450 ppm) to support an empirical model with linear, interaction and quadratic terms (Taylor polynomial). Immediately after exposure, concentrations of hydrocarbons were measured by head space gas chromatography in blood, brain, liver, kidneys and perirenal fat. Multivariate data analysis and modelling were performed with PLS (projections to latent structures). The best models were obtained after removing all interaction terms, suggesting that there were no interactions between the hydrocarbons with respect to absorption and distribution. Uptake of paraffins and particularly aromatics is best described by quadratic models, whereas the uptake of the naphthenic hydrocarbons is nearly linear. All models are good, with high correlation ($r^2$) and prediction properties ($Q^2$), the latter after cross validation. The concentrations of aromatics in blood were high compared to the other hydrocarbons. At concentrations below 250 ppm, the naphthene reached higher concentrations in the brain compared to the paraffin and the aromatic. Statistical experimental design, multivariate data analysis and modelling have proved useful for the evaluation of synthetic mixtures. The principles may also be used in the design of liquid mixtures, which may be evaporated partially or completely. PMID:8740533
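The empirical model with linear, interaction and quadratic terms can be sketched as a second-order polynomial. The notation is assumed for illustration: $x_1, x_2, x_3$ are the exposure levels of the three hydrocarbons, $y$ is a measured tissue concentration, and the $\beta$ coefficients are fitted. The exact coding of the mixture variables used in the paper is not stated in the abstract.

```latex
y = \beta_0
  + \sum_{i=1}^{3} \beta_i x_i
  + \sum_{1 \le i < j \le 3} \beta_{ij}\, x_i x_j
  + \sum_{i=1}^{3} \beta_{ii}\, x_i^{2}
  + \varepsilon
```

The finding that the best models omit all interaction terms corresponds to setting every $\beta_{ij} = 0$, and a nearly linear uptake corresponds to a quadratic coefficient $\beta_{ii}$ close to zero.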
NASA Astrophysics Data System (ADS)
Dralle, D.; Karst, N.; Thompson, S. E.
2015-12-01
Multiple competing theories suggest that power law behavior governs the observed first-order dynamics of streamflow recessions - the important process by which catchments dry-out via the stream network, altering the availability of surface water resources and in-stream habitat. Frequently modeled as $dq/dt = -aq^{b}$, recessions typically exhibit a high degree of variability, even within a single catchment, as revealed by significant shifts in the values of "a" and "b" across recession events. One potential source of this variability lies in underlying, hard-to-observe fluctuations in how catchment water storage is partitioned amongst distinct storage elements, each having different discharge behaviors. Testing this and competing hypotheses with widely available streamflow timeseries, however, has been hindered by a power law scaling artifact that obscures meaningful covariation between the recession parameters, "a" and "b". Here we briefly outline a technique that removes this artifact, revealing intriguing new patterns in the joint distribution of recession parameters. Using long-term flow data from catchments in Northern California, we explore temporal variations, and find that the "a" parameter varies strongly with catchment wetness. Then we explore how the "b" parameter changes with "a", and find that measures of its variation are maximized at intermediate "a" values. We propose an interpretation of this pattern based on statistical mechanics, meaning "b" can be viewed as an indicator of the catchment "microstate" - i.e. the partitioning of storage - and "a" as a measure of the catchment macrostate (i.e. the total storage). In statistical mechanics, entropy (i.e. microstate variance, that is the variance of "b") is maximized for intermediate values of extensive variables (i.e. wetness, "a"), as observed in the recession data. This interpretation of "a" and "b" was supported by model runs using a multiple-reservoir catchment toy model, and lends support to the
Quantitative interpretation of Great Lakes remote sensing data
NASA Technical Reports Server (NTRS)
Shook, D. F.; Salzman, J.; Svehla, R. A.; Gedney, R. T.
1980-01-01
The paper discusses the quantitative interpretation of Great Lakes remote sensing water quality data. Remote sensing using color information must take into account (1) the existence of many different organic and inorganic species throughout the Great Lakes, (2) the occurrence of a mixture of species in most locations, and (3) spatial variations in types and concentration of species. The radiative transfer model provides a potential method for an orderly analysis of remote sensing data and a physical basis for developing quantitative algorithms. Predictions and field measurements of volume reflectances are presented which show the advantage of using a radiative transfer model. Spectral absorptance and backscattering coefficients for two inorganic sediments are reported.
Interdisciplinary applications and interpretations of remotely sensed data
NASA Technical Reports Server (NTRS)
Peterson, G. W.; Mcmurtry, G. J.
1972-01-01
An interdisciplinary approach to using remote sensors for the inventory of natural resources is discussed. The areas under investigation are land use, determination of pollution sources and damage, and analysis of geologic structure and terrain. The geographical area of primary interest is the Susquehanna River Basin. Descriptions of the data obtained by aerial cameras, multiband cameras, optical mechanical scanners, and radar are included. The Earth Resources Technology Satellite and Skylab program are examined. Interpretations of spacecraft data to show specific areas of interest are developed.
Borehole seismic data processing and interpretation: New free software
NASA Astrophysics Data System (ADS)
Farfour, Mohammed; Yoon, Wang Jung
2015-12-01
Vertical Seismic Profile (VSP) surveying is a vital tool in subsurface imaging and reservoir characterization. The technique allows geophysicists to infer critical information that cannot be obtained otherwise. MVSP is a new MATLAB tool with a graphical user interface (GUI) for VSP shot modeling, data processing, and interpretation. The software handles VSP data from the loading and preprocessing stages to the final stage of corridor plotting and integration with well and seismic data. Several seismic and signal processing toolboxes are integrated and modified to suit and enrich the processing and display packages. The main motivation behind the development of the software is to provide new geoscientists and students in the geoscience fields with free software that brings together all VSP modules in one easy-to-use package. The software has several modules that allow the user to test, process, compare, visualize, and produce publication-quality results. The software is developed as a stand-alone MATLAB application that requires only MATLAB Compiler Runtime (MCR) to run with full functionality. We present a detailed description of MVSP and use the software to create synthetic VSP data. The data are then processed using different available tools. Next, real data are loaded and fully processed using the software. The data are then integrated with well data for more detailed analysis and interpretation. In order to evaluate the software processing flow accuracy, the same data are processed using commercial software. Comparison of the processing results shows that MVSP is able to process VSP data as efficiently as commercial software packages currently used in industry, and provides similar high-quality processed data.
Data relay system specifications for ERTS image interpretation
NASA Technical Reports Server (NTRS)
Daniel, J. F.
1970-01-01
Experiments with the Data Collection System (DCS) of the Earth Resources Technology Satellites (ERTS) have been developed to stress ERTS applications in the Earth Resources Observation Systems (EROS) Program. Active pursuit of this policy has resulted in the design of eight specific experiments requiring a total of 98 DCS ground-data platforms. Of these eight experiments, six are intended to make use of DCS data as an aid in image interpretation, while two make use of the capability to relay data from remote locations. Preliminary discussions regarding additional experiments indicate a need for at least 150 DCS platforms within the EROS Program for ERTS experimentation. Results from the experiments will be used to assess the DCS suitability for satellites providing on-line, real-time, data relay capability. The rationale of the total DCS network of ground platforms and the relationship of each experiment to that rationale are discussed.
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards.
Bayesian Statistics for Biological Data: Pedigree Analysis
ERIC Educational Resources Information Center
Stanfield, William D.; Carlton, Matthew A.
2004-01-01
Bayes' formula is applied to the biological problem of pedigree analysis to show that Bayes' formula and non-Bayesian or "classical" methods of probability calculation give different answers. First-year college biology students can be introduced to Bayesian statistics.
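To make the pedigree idea concrete, here is a minimal sketch of one standard textbook case, which is an assumption chosen for illustration and not necessarily the example used in the article: a woman has prior probability 1/2 of carrying an X-linked recessive allele, and each son of a carrier is unaffected with probability 1/2. Bayes' formula updates her carrier probability after observing unaffected sons. The function name `carrier_posterior` is hypothetical.

```python
from fractions import Fraction

def carrier_posterior(prior, n_unaffected_sons):
    """Posterior probability that a woman is a carrier of an X-linked
    recessive allele, given that all of her sons are unaffected.

    prior             : prior carrier probability (from the pedigree)
    n_unaffected_sons : number of sons observed, all unaffected
    """
    # Likelihood of the observation under each hypothesis.
    like_carrier = Fraction(1, 2) ** n_unaffected_sons
    like_noncarrier = Fraction(1)
    numerator = prior * like_carrier
    return numerator / (numerator + (1 - prior) * like_noncarrier)

# Prior of 1/2 (her mother is a known carrier), two unaffected sons.
print(carrier_posterior(Fraction(1, 2), 2))  # prints 1/5
```

A calculation that ignores the sons would leave the probability at 1/2, which illustrates how the Bayesian and "classical" answers can differ.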
Statistical Treatment of Looking-Time Data
ERIC Educational Resources Information Center
Csibra, Gergely; Hernik, Mikolaj; Mascaro, Olivier; Tatone, Denis; Lengyel, Máté
2016-01-01
Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is…
The Systematic Interpretation of Cosmic Ray Data (The Transport Project)
NASA Technical Reports Server (NTRS)
Guzik, T. Gregory
1997-01-01
The Transport project's primary goals were to: (1) Provide measurements of critical fragmentation cross sections; (2) Study the cross section systematics; (3) Improve the galactic cosmic ray propagation methodology; and (4) Use the new cross section measurements to improve the interpretation of cosmic ray data. To accomplish these goals a collaboration was formed consisting of researchers in the US at Louisiana State University (LSU), Lawrence Berkeley Laboratory (LBL), Goddard Space Flight Center (GSFC), the University of Minnesota (UM), New Mexico State University (NMSU), in France at the Centre d'Etudes de Saclay and in Italy at the Universita di Catania. The US institutions, led by LSU, were responsible for measuring new cross sections using the LBL HISS facility, analysis of these measurements and their application to interpreting cosmic ray data. France developed a liquid hydrogen target that was used in the HISS experiment and participated in the data interpretation. Italy developed a Multifunctional Neutron Spectrometer (MUFFINS) for the HISS runs to measure the energy spectra, angular distributions and multiplicities of neutrons emitted during the high energy interactions. The Transport Project was originally proposed to NASA during summer 1988 and funding began in January 1989. Transport was renewed twice (1991, 1994) and finally concluded at LSU on September 30, 1997. During the more than 8 years of effort we had two major experiment runs at LBL, obtained data on the interaction of twenty different beams with a liquid hydrogen target, completed the analysis of fifteen of these datasets obtaining 590 new cross section measurements, published nine journal articles as well as eighteen conference proceedings papers, and presented more than thirty conference talks.
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2012 CFR
2012-10-01
... Title 47 (Telecommunication), Federal Communications Commission, General Practice and Procedure, Hearing Proceedings, Evidence. § 1.363 Introduction of statistical data. (a) All statistical studies, offered in evidence in common carrier hearing...
Describing Middle School Students' Organization of Statistical Data.
ERIC Educational Resources Information Center
Johnson, Yolanda; Hofbauer, Pamela
The purpose of this study was to describe how middle school students physically arrange and organize statistical data. A case-study analysis was used to define and characterize the styles in which students handle, organize, and group statistical data. A series of four statistical tasks (Mooney, Langrall, Hofbauer, & Johnson, 2001) were given to…
Presentation and interpretation of chemical data for igneous rocks
Wright, T.L.
1974-01-01
Arguments are made in favor of using variation diagrams to plot analyses of igneous rocks and their derivatives and modeling differentiation processes by least-squares mixing procedures. These methods permit study of magmatic differentiation and related processes in terms of all of the chemical data available. Data are presented as they are reported by the chemist and specific processes may be modeled and either quantitatively described or rejected as inappropriate or too simple. Examples are given of the differing interpretations that can arise when data are plotted on an AEM ternary vs. the same data on a full set of MgO variation diagrams. Mixing procedures are illustrated with reference to basaltic lavas from the Columbia Plateau.
Statistical data of the uranium industry
1980-01-01
This document is a compilation of historical facts and figures through 1979. These statistics are based primarily on information provided voluntarily by the uranium exploration, mining, and milling companies. The production, reserves, drilling, and production capability information has been reported in a manner which avoids disclosure of proprietary information. Only the totals for the $1.5 reserves are reported. Because of increased interest in higher cost resources for long range planning purposes, a section covering the distribution of $100 per pound reserves statistics has been newly included. A table of mill recovery ranges for the January 1, 1980 reserves has also been added to this year's edition. The section on domestic uranium production capability has been deleted this year but will be included next year. The January 1, 1980 potential resource estimates are unchanged from the January 1, 1979 estimates.
Simple Hartmann test data interpretation for ophthalmic lenses
NASA Astrophysics Data System (ADS)
Salas-Peimbert, Didia Patricia; Trujillo-Schiaffino, Gerardo; González-Silva, Jorge Alberto; Almazán-Cuellar, Saúl; Malacara-Doblado, Daniel
2006-04-01
This article describes a simple Hartmann test data interpretation that can be used to evaluate the performance of ophthalmic lenses. By treating each spot of the Hartmann pattern as a single test ray and applying simple ray-tracing analysis, it is possible to calculate the power of the lens under test at the point corresponding to each spot. The values obtained by this procedure are used to plot the power distribution map of the entire lens. We present the results obtained by applying this method to single vision, bifocal, and progressive lenses.
Geological Interpretation of PSInSAR Data at Regional Scale
Meisina, Claudia; Zucca, Francesco; Notti, Davide; Colombo, Alessio; Cucchi, Anselmo; Savio, Giuliano; Giannico, Chiara; Bianchi, Marco
2008-01-01
Results of a PSInSAR™ project carried out by the Regional Agency for Environmental Protection (ARPA) in Piemonte Region (Northern Italy) are presented and discussed. A methodology is proposed for the interpretation of the PSInSAR™ data at the regional scale, easy to use by the public administrations and by civil protection authorities. Potential and limitations of the PSInSAR™ technique for ground movement detection on a regional scale and monitoring are then estimated in relationship with different geological processes and various geological environments.
Flexibility in data interpretation: effects of representational format
Braithwaite, David W.; Goldstone, Robert L.
2013-01-01
Graphs and tables differentially support performance on specific tasks. For tasks requiring reading off single data points, tables are as good as or better than graphs, while for tasks involving relationships among data points, graphs often yield better performance. However, the degree to which graphs and tables support flexibility across a range of tasks is not well-understood. In two experiments, participants detected main and interaction effects in line graphs and tables of bivariate data. Graphs led to more efficient performance, but also lower flexibility, as indicated by a larger discrepancy in performance across tasks. In particular, detection of main effects of variables represented in the graph legend was facilitated relative to detection of main effects of variables represented in the x-axis. Graphs may be a preferable representational format when the desired task or analytical perspective is known in advance, but may also induce greater interpretive bias than tables, necessitating greater care in their use and design. PMID:24427145
Interpretation methodology and analysis of in-flight lightning data
NASA Technical Reports Server (NTRS)
Rudolph, T.; Perala, R. A.
1982-01-01
A methodology is presented whereby electromagnetic measurements of inflight lightning stroke data can be understood and extended to other aircraft. Recent measurements made on the NASA F106B aircraft indicate that sophisticated numerical techniques and new developments in corona modeling are required to fully understand the data. Thus the problem is nontrivial and successful interpretation can lead to a significant understanding of the lightning/aircraft interaction event. This is of particular importance because of the problem of lightning induced transient upset of new technology low level microcircuitry which is being used in increasing quantities in modern and future avionics. Inflight lightning data is analyzed and lightning environments incident upon the F106B are determined.
Internet Data Analysis for the Undergraduate Statistics Curriculum
ERIC Educational Resources Information Center
Sanchez, Juana; He, Yan
2005-01-01
Statistics textbooks for undergraduates have not caught up with the enormous amount of analysis of Internet data that is taking place these days. Case studies that use Web server log data or Internet network traffic data are rare in undergraduate Statistics education. And yet these data provide numerous examples of skewed and bimodal…
Guidelines for Statistical Analysis of Percentage of Syllables Stuttered Data
ERIC Educational Resources Information Center
Jones, Mark; Onslow, Mark; Packman, Ann; Gebski, Val
2006-01-01
Purpose: The purpose of this study was to develop guidelines for the statistical analysis of percentage of syllables stuttered (%SS) data in stuttering research. Method: Data on %SS from various independent sources were used to develop a statistical model to describe this type of data. On the basis of this model, %SS data were simulated with…
Improved interpretation of satellite altimeter data using genetic algorithms
NASA Technical Reports Server (NTRS)
Messa, Kenneth; Lybanon, Matthew
1992-01-01
Genetic algorithms (GAs) are optimization techniques that are based on the mechanics of evolution and natural selection. They take advantage of the power of cumulative selection, in which successive incremental improvements in a solution structure become the basis for continued development. A GA is an iterative procedure that maintains a 'population' of 'organisms' (candidate solutions). Through successive 'generations' (iterations) the population as a whole improves in simulation of Darwin's 'survival of the fittest'. GAs have been shown to be successful where noise significantly reduces the ability of other search techniques to work effectively. Satellite altimetry provides useful information about oceanographic phenomena. It provides rapid global coverage of the oceans and is not as severely hampered by cloud cover as infrared imagery. Despite these and other benefits, several factors lead to significant difficulty in interpretation. The GA approach to the improved interpretation of satellite data involves the representation of the ocean surface model as a string of parameters or coefficients from the model. The GA searches, in parallel, a population of such representations (organisms) to obtain the individual that is best suited to 'survive', that is, the fittest as measured with respect to some 'fitness' function. The fittest organism is the one that best represents the ocean surface model with respect to the altimeter data.
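The loop described above (a population of parameter strings, a fitness function, selection, and successive generations) can be sketched as follows. This is a toy under stated assumptions, not the authors' implementation: the "model" is a hypothetical two-coefficient line rather than an ocean surface model, fitness is the negative sum of squared residuals, and the population size, mutation scale, and the name `evolve` are all illustrative choices.

```python
import random

def fitness(params, xs, ys):
    """Negative sum of squared residuals: larger means a better fit."""
    c0, c1 = params
    return -sum((c0 + c1 * x - y) ** 2 for x, y in zip(xs, ys))

def evolve(xs, ys, pop_size=40, generations=60, seed=0):
    """Evolve a population of (c0, c1) organisms toward the data."""
    rng = random.Random(seed)
    pop = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, xs, ys), reverse=True)
        history.append(fitness(pop[0], xs, ys))
        parents = pop[: pop_size // 2]   # selection: keep the fitter half
        children = [pop[0]]              # elitism: best organism survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: take one coefficient from each parent.
            child = (a[0], b[1]) if rng.random() < 0.5 else (b[0], a[1])
            # Mutation: small Gaussian perturbation of every coefficient.
            child = tuple(g + rng.gauss(0, 0.1) for g in child)
            children.append(child)
        pop = children
    best = max(pop, key=lambda p: fitness(p, xs, ys))
    history.append(fitness(best, xs, ys))
    return best, history

# Hypothetical "observations" generated from y = 1 + 2x.
xs = list(range(10))
ys = [1 + 2 * x for x in xs]
best, history = evolve(xs, ys)
print(best, history[0], history[-1])
```

Carrying the best organism forward unchanged means the best fitness can never get worse from one generation to the next, which is the "cumulative selection" property the abstract emphasizes.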
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
ERIC Educational Resources Information Center
Boysen, Guy A.
2015-01-01
Student evaluations of teaching are among the most accepted and important indicators of college teachers' performance. However, faculty and administrators can overinterpret small variations in mean teaching evaluations. The current research examined the effect of including statistical information on the interpretation of teaching evaluations.…
Plausible inference and the interpretation of quantitative data
Nakhleh, C.W.
1998-02-01
The analysis of quantitative data is central to scientific investigation. Probability theory, which is founded on two rules, the sum and product rules, provides the unique, logically consistent method for drawing valid inferences from quantitative data. This primer on the use of probability theory is meant to fulfill a pedagogical purpose. The discussion begins at the foundation of scientific inference by showing how the sum and product rules of probability theory follow from some very basic considerations of logical consistency. The authors then develop general methods of probability theory that are essential to the analysis and interpretation of data. They discuss how to assign probability distributions using the principle of maximum entropy, how to estimate parameters from data, how to handle nuisance parameters whose values are of little interest, and how to determine which of a set of models is most justified by a data set. All these methods are used together in most realistic data analyses. Examples are given throughout to illustrate the basic points.
Dielectric Property Measurements to Support Interpretation of Cassini Radar Data
NASA Astrophysics Data System (ADS)
Jamieson, Corey; Barmatz, M.
2012-10-01
Radar observations are useful for constraining surface and near-surface compositions and illuminating geologic processes on Solar System bodies. The interpretation of Cassini radiometric and radar data at 13.78 GHz (2.2 cm) of Titan and other Saturnian icy satellites is aided by laboratory measurements of the dielectric properties of relevant materials. However, existing dielectric measurements of candidate surface materials at microwave frequencies and low temperatures are sparse. We have set up a microwave cavity and cryogenic system to measure the complex dielectric properties of liquid hydrocarbons relevant to Titan, specifically methane, ethane and their mixtures to support the interpretation of spacecraft instrument and telescope radar observations. To perform these measurements, we excite and detect the TM020 mode in a custom-built cavity with small metal loop antennas powered by a Vector Network Analyzer. The hydrocarbon samples are condensed into a cylindrical quartz tube that is axially oriented in the cavity. Frequency sweeps through a resonance are performed with an empty cavity, an empty quartz tube inserted into the cavity, and with a sample-filled quartz tube in the cavity. These sweeps are fit by a Lorentzian line shape, from which we obtain the resonant frequency, f, and quality factor, Q, for each experimental arrangement. We then derive dielectric constants and loss tangents for our samples near 13.78 GHz using a new technique ideally suited for measuring liquid samples. We will present temperature-dependent, dielectric property measurements for liquid methane and ethane. The full interpretation of the radar and radiometry observations of Saturn’s icy satellites depends critically on understanding the dielectric properties of potential surface materials. By investigating relevant liquids and solids we will improve constraints on lake depths, volumes and compositions, which are important to understand Titan’s carbon/organic cycle and inevitably
Statistical Considerations of Data Processing in Giovanni Online Tool
NASA Astrophysics Data System (ADS)
Shen, S.; Leptoukh, G.; Acker, J.; Berrick, S.
2005-12-01
The GES DISC Interactive Online Visualization and Analysis Infrastructure (Giovanni) is a web-based interface for the rapid visualization and analysis of gridded data from a number of remote sensing instruments. The GES DISC currently employs several Giovanni instances to analyze various products, such as Ocean-Giovanni for ocean products from SeaWiFS and MODIS-Aqua; TOMS & OMI Giovanni for atmospheric chemical trace gases from TOMS and OMI, and MOVAS for aerosols from MODIS, etc. (http://giovanni.gsfc.nasa.gov) Foremost among the Giovanni statistical functions is data averaging. Two aspects of this function are addressed here. The first deals with the accuracy of averaging gridded mapped products vs. averaging from the ungridded Level 2 data. Some mapped products contain mean values only; others contain additional statistics, such as number of pixels (NP) for each grid, standard deviation, etc. Since NP varies spatially and temporally, averaging with or without weighting by NP will be different. In this paper, we address differences of various weighting algorithms for some datasets utilized in Giovanni. The second aspect is related to different averaging methods affecting data quality and interpretation for data with non-normal distribution. The present study demonstrates results of different spatial averaging methods using gridded SeaWiFS Level 3 mapped monthly chlorophyll a data. Spatial averages were calculated using three different methods: arithmetic mean (AVG), geometric mean (GEO), and maximum likelihood estimator (MLE). Biogeochemical data, such as chlorophyll a, are usually considered to have a log-normal distribution. The study determined that differences between methods tend to increase with increasing size of a selected coastal area, with no significant differences in most open oceans. The GEO method consistently produces values lower than AVG and MLE. The AVG method produces values larger than MLE in some cases, but smaller in other cases. Further
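The three spatial averages compared in this abstract can be illustrated with a short sketch. The abstract does not give formulas, so the definitions here are assumptions: AVG is the arithmetic mean, GEO is the exponential of the mean log value, and MLE is taken to be the maximum likelihood estimate of the mean of a log-normal distribution, exp(mu + sigma^2/2), with mu and sigma^2 estimated from the log values. The function name and the sample chlorophyll values are made up for the example.

```python
import math

def spatial_averages(values):
    """Return (AVG, GEO, MLE) for a list of positive grid-cell values.

    AVG : arithmetic mean
    GEO : geometric mean, exp(mean of logs)
    MLE : log-normal mean estimate, exp(mu + sigma^2 / 2), where mu and
          sigma^2 are the maximum likelihood estimates from the logs
    """
    n = len(values)
    logs = [math.log(v) for v in values]
    mu = sum(logs) / n
    var = sum((l - mu) ** 2 for l in logs) / n  # MLE variance (divide by n)
    avg = sum(values) / n
    geo = math.exp(mu)
    mle = math.exp(mu + var / 2)
    return avg, geo, mle

# Hypothetical chlorophyll a values (mg/m^3) with one high coastal cell.
avg, geo, mle = spatial_averages([0.1, 0.2, 0.4, 0.8, 5.0])
print(avg, geo, mle)
```

Under these definitions GEO can never exceed AVG (the arithmetic-geometric mean inequality) or MLE (since sigma^2 is non-negative), which agrees with the abstract's observation that GEO is consistently the lowest, while AVG and MLE can fall in either order.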
Statistical Considerations of Data Processing in Giovanni Online Tool
NASA Technical Reports Server (NTRS)
Suhung, Shen; Leptoukh, G.; Acker, J.; Berrick, S.
2005-01-01
The GES DISC Interactive Online Visualization and Analysis Infrastructure (Giovanni) is a web-based interface for the rapid visualization and analysis of gridded data from a number of remote sensing instruments. The GES DISC currently employs several Giovanni instances to analyze various products, such as Ocean-Giovanni for ocean products from SeaWiFS and MODIS-Aqua; TOMS & OMI Giovanni for atmospheric chemical trace gases from TOMS and OMI, and MOVAS for aerosols from MODIS, etc. (http://giovanni.gsfc.nasa.gov) Foremost among the Giovanni statistical functions is data averaging. Two aspects of this function are addressed here. The first deals with the accuracy of averaging gridded mapped products vs. averaging from the ungridded Level 2 data. Some mapped products contain mean values only; others contain additional statistics, such as number of pixels (NP) for each grid, standard deviation, etc. Since NP varies spatially and temporally, averaging with or without weighting by NP will be different. In this paper, we address differences of various weighting algorithms for some datasets utilized in Giovanni. The second aspect is related to different averaging methods affecting data quality and interpretation for data with non-normal distribution. The present study demonstrates results of different spatial averaging methods using gridded SeaWiFS Level 3 mapped monthly chlorophyll a data. Spatial averages were calculated using three different methods: arithmetic mean (AVG), geometric mean (GEO), and maximum likelihood estimator (MLE). Biogeochemical data, such as chlorophyll a, are usually considered to have a log-normal distribution. The study determined that differences between methods tend to increase with increasing size of a selected coastal area, with no significant differences in most open oceans. The GEO method consistently produces values lower than AVG and MLE. The AVG method produces values larger than MLE in some cases, but smaller in other cases. Further
Interpretation of evidence in data by untrained medical students: a scenario-based study
2010-01-01
Background: To determine which approach to assessment of evidence in data - statistical tests or likelihood ratios - comes closest to the interpretation of evidence by untrained medical students. Methods: Empirical study of medical students (N = 842), untrained in statistical inference or in the interpretation of diagnostic tests. They were asked to interpret a hypothetical diagnostic test, presented in four versions that differed in the distributions of test scores in diseased and non-diseased populations. Each student received only one version. The intuitive application of the statistical test approach would lead to rejecting the null hypothesis of no disease in version A, and to accepting the null in version B. Application of the likelihood ratio approach led to opposite conclusions - against the disease in A, and in favour of disease in B. Version C tested the importance of the p-value (A: 0.04 versus C: 0.08) and version D the importance of the likelihood ratio (C: 1/4 versus D: 1/8). Results: In version A, 7.5% concluded that the result was in favour of disease (compatible with p value), 43.6% ruled against the disease (compatible with likelihood ratio), and 48.9% were undecided. In version B, 69.0% were in favour of disease (compatible with likelihood ratio), 4.5% against (compatible with p value), and 26.5% undecided. Increasing the p value from 0.04 to 0.08 did not change the results. The change in the likelihood ratio from 1/4 to 1/8 increased the proportion of non-committed responses. Conclusions: Most untrained medical students appear to interpret evidence from data in a manner that is compatible with the use of likelihood ratios. PMID:20796297
Rapp, J.B.
1991-01-01
Q-mode factor analysis was used to quantitate the distribution of the major aliphatic hydrocarbon (n-alkanes, pristane, phytane) systems in sediments from a variety of marine environments. The compositions of the pure end members of the systems were obtained from factor scores and the distribution of the systems within each sample was obtained from factor loadings. All the data, from the diverse environments sampled (estuarine (San Francisco Bay), fresh-water (San Francisco Peninsula), polar-marine (Antarctica) and geothermal-marine (Gorda Ridge) sediments), were reduced to three major systems: a terrestrial system (mostly high molecular weight aliphatics with odd-numbered-carbon predominance), a mature system (mostly low molecular weight aliphatics without predominance) and a system containing mostly high molecular weight aliphatics with even-numbered-carbon predominance. With this statistical approach, it is possible to assign the percentage contribution from various sources to the observed distribution of aliphatic hydrocarbons in each sediment sample.
Hayashi, C
1986-04-01
The subjects which are often encountered in the statistical design and analysis of data in medical science studies were discussed. The five topics examined were: (I) Medical science and statistical methods; (II) So-called mathematical statistics and medical science; (III) Fundamentals of cross-tabulation analysis of statistical data and inference; (IV) Exploratory study by multidimensional data analyses; (V) Optimal process control of individual, medical science and informatics of statistical data. In I, the author's statistico-mathematical idea is characterized as the analysis of phenomena by statistical data. This is closely related to the logic, methodology and philosophy of science. This statistical concept and method are based on operational and pragmatic ideas. Self-examination of mathematical statistics is particularly focused in II and III. In II, the effectiveness of experimental design and statistical testing is thoroughly examined with regard to the study of medical science, and the limitation of its application is discussed. In III the apparent paradox of analysis of cross-tabulation of statistical data and statistical inference is shown. This is due to the operation of a simple two- or three-fold cross-tabulation analysis of (more than two or three) multidimensional data, apart from the sophisticated statistical test theory of association. In IV, the necessity of informatics of multidimensional data analysis in medical science is stressed. In V, the following point is discussed. The essential point of clinical trials is that they are not based on any simple statistical test in a traditional experimental design but on the optimal process control of individuals in the information space of the body and mind, which is based on a knowledge of medical science and the informatics of multidimensional statistical data analysis. PMID:3729436
Seismic data processing and interpretation on the loess plateau, Part 1: Seismic data processing
NASA Astrophysics Data System (ADS)
Jiang, Jiayu; Fu, Shouxian; Li, Jiuling
2005-12-01
Branching river channels and the coexistence of valleys, ridges, hills, and slopes as the result of long-term weathering and erosion form the unique loess topography. The Changqing Geophysical Company, working in these complex conditions, has established a suite of technologies for high-fidelity processing and fine interpretation of seismic data. This article introduces the processes involved in the data processing and interpretation and illustrates the results.
Traumatic Brain Injury (TBI) Data and Statistics
... data.cdc.gov . Emergency Department Visits, Hospitalizations, and Deaths Rates of TBI-related Emergency Department Visits, Hospitalizations, ... related Hospitalizations by Age Group and Injury Mechanism Deaths Rates of TBI-related Deaths by Sex Rates ...
Efficient statistical mapping of avian count data
Royle, J. Andrew; Wikle, C.K.
2005-01-01
We develop a spatial modeling framework for count data that is efficient to implement in high-dimensional prediction problems. We consider spectral parameterizations for the spatially varying mean of a Poisson model. The spectral parameterization of the spatial process is very computationally efficient, enabling effective estimation and prediction in large problems using Markov chain Monte Carlo techniques. We apply this model to creating avian relative abundance maps from North American Breeding Bird Survey (BBS) data. Variation in the ability of observers to count birds is modeled as spatially independent noise, resulting in over-dispersion relative to the Poisson assumption. This approach represents an improvement over existing approaches used for spatial modeling of BBS data which are either inefficient for continental scale modeling and prediction or fail to accommodate important distributional features of count data thus leading to inaccurate accounting of prediction uncertainty.
Yu, Victoria; Kishan, Amar U.; Cao, Minsong; Low, Daniel; Lee, Percy; Ruan, Dan
2014-03-15
Purpose: To demonstrate a new method of evaluating dose response of treatment-induced lung radiographic injury post-SBRT (stereotactic body radiotherapy) treatment and the discovery of bimodal dose behavior within clinically identified injury volumes. Methods: Follow-up CT scans at 3, 6, and 12 months were acquired from 24 patients treated with SBRT for stage-1 primary lung cancers or oligometastatic lesions. Injury regions in these scans were propagated to the planning CT coordinates by performing deformable registration of the follow-ups to the planning CTs. A bimodal behavior was repeatedly observed from the probability distribution for dose values within the deformed injury regions. Based on a mixture-Gaussian assumption, an Expectation-Maximization (EM) algorithm was used to obtain characteristic parameters for such distribution. Geometric analysis was performed to interpret such parameters and infer the critical dose level that is potentially inductive of post-SBRT lung injury. Results: The Gaussian mixture obtained from the EM algorithm closely approximates the empirical dose histogram within the injury volume with good consistency. The average Kullback-Leibler divergence values between the empirical differential dose volume histogram and the EM-obtained Gaussian mixture distribution were calculated to be 0.069, 0.063, and 0.092 for the 3, 6, and 12 month follow-up groups, respectively. The lower Gaussian component was located at approximately 70% prescription dose (35 Gy) for all three follow-up time points. The higher Gaussian component, contributed by the dose received by planning target volume, was located at around 107% of the prescription dose. Geometrical analysis suggests the mean of the lower Gaussian component, located at 35 Gy, as a possible indicator for a critical dose that induces lung injury after SBRT. Conclusions: An innovative and improved method for analyzing the correspondence between lung radiographic injury and SBRT treatment dose has
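The mixture fit described in this abstract can be illustrated with a minimal sketch of EM for a two-component, one-dimensional Gaussian mixture. This is not the authors' implementation: the function names, the initialisation, and the synthetic "dose" sample (drawn around 35 Gy and 53.5 Gy, i.e. 70% and 107% of an assumed 50 Gy prescription) are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, n_iter=200):
    """EM for a two-component 1-D Gaussian mixture.

    Returns (weights, means, sigmas), each a list of length 2.
    """
    xs = sorted(xs)
    half = len(xs) // 2
    # Crude initialisation (an assumption): split the sorted sample in half.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (len(xs) - half)]
    spread = (xs[-1] - xs[0]) / 4.0 or 1.0
    sigmas = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for every point.
        resp = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], sigmas[0])
            p1 = weights[1] * normal_pdf(x, means[1], sigmas[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean and std of each component.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - v for v in resp]
            nk = max(sum(r), 1e-12)
            weights[k] = nk / len(xs)
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)
    return weights, means, sigmas


# Synthetic, made-up sample: NOT data from the study.
rng = random.Random(0)
doses = [rng.gauss(35.0, 3.0) for _ in range(600)]
doses += [rng.gauss(53.5, 2.0) for _ in range(400)]
weights, means, sigmas = fit_two_gaussians(doses)
print(weights, means, sigmas)
```

On well-separated synthetic data such as this, the two fitted means should land close to the generating values; on real dose histograms the result depends on initialisation and on how well the two-Gaussian assumption holds.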
Interpretation of AMS-02 electrons and positrons data
Mauro, M. Di; Donato, F.; Fornengo, N.; Vittino, A.; Lineros, R.
2014-04-01
We perform a combined analysis of the recent AMS-02 data on electrons, positrons, electrons plus positrons and positron fraction, in a self-consistent framework where we realize a theoretical modeling of all the astrophysical components that can contribute to the observed fluxes in the whole energy range. The primary electron contribution is modeled through the sum of an average flux from distant sources and the fluxes from the local supernova remnants in the Green catalog. The secondary electron and positron fluxes originate from interactions on the interstellar medium of primary cosmic rays, for which we derive a novel determination by using AMS-02 proton and helium data. Primary positrons and electrons from pulsar wind nebulae in the ATNF catalog are included and studied in terms of their most significant (while loosely known) properties and under different assumptions (average contribution from the whole catalog, single dominant pulsar, a few dominant pulsars). We obtain a remarkable agreement between our various modeling and the AMS-02 data for all types of analysis, demonstrating that the whole AMS-02 leptonic data admit a self-consistent interpretation in terms of astrophysical contributions.
Revisiting the interpretation of casein micelle SAXS data.
Ingham, B; Smialowska, A; Erlangga, G D; Matia-Merino, L; Kirby, N M; Wang, C; Haverkamp, R G; Carr, A J
2016-08-17
An in-depth, critical review of model-dependent fitting of small-angle X-ray scattering (SAXS) data of bovine skim milk has led us to develop a new mathematical model for interpreting these data. Calcium-edge resonant soft X-ray scattering data provides unequivocal evidence as to the shape and location of the scattering due to colloidal calcium phosphate, which is manifested as a correlation peak centred at q = 0.035 Å⁻¹. In SAXS data this feature is seldom seen, although most literature studies attribute another feature centred at q = 0.08-0.1 Å⁻¹ to CCP. This work shows that the major SAXS features are due to protein arrangements: the casein micelle itself; internal regions approximately 20 nm in size, separated by water channels; and protein structures which are inhomogeneous on a 1-3 nm length scale. The assignment of these features is consistent with their behaviour under various conditions, including hydration time after reconstitution, addition of EDTA (a Ca-chelating agent), addition of urea, and reduction of pH. PMID:27491477
ERIC Educational Resources Information Center
Taffel, Selma
This report presents and interprets birth statistics for the United States with particular emphasis on changes that took place during the period 1970-73. Data for the report were based on information entered on birth certificates collected from all states. The majority of the document comprises graphs and tables of data, but there are four short…
Statistical analysis of life history calendar data.
Eerola, Mervi; Helske, Satu
2016-04-01
The life history calendar is a data-collection tool for obtaining reliable retrospective data about life events. To illustrate the analysis of such data, we compare the model-based probabilistic event history analysis and the model-free data mining method, sequence analysis. In event history analysis, we estimate the cumulative prediction probabilities of life events over the entire trajectory instead of transition hazards. In sequence analysis, we compare several dissimilarity metrics and contrast data-driven and user-defined substitution costs. As an example, we study young adults' transition to adulthood as a sequence of events in three life domains. The events define the multistate event history model and the parallel life domains in multidimensional sequence analysis. The relationship between life trajectories and excess depressive symptoms in middle age is further studied by their joint prediction in the multistate model and by regressing the symptom scores on individual-specific cluster indices. The two approaches complement each other in life course analysis; sequence analysis can effectively find typical and atypical life patterns while event history analysis is needed for causal inquiries. PMID:23117406
Novice Interpretations of Visual Representations of Geosciences Data
NASA Astrophysics Data System (ADS)
Burkemper, L. K.; Arthurs, L.
2013-12-01
Past cognition research on individuals' perception and comprehension of bar and line graphs is substantive enough to have produced graph design principles and graph comprehension theories; however, gaps remain in our understanding of how people process visual representations of data, especially of geologic and atmospheric data. This pilot project serves to build on others' prior research and begin filling the existing gaps. The primary objectives of this pilot project are to: (i) design a novel data collection protocol based on a combination of paper-based surveys, think-aloud interviews, and eye-tracking tasks to investigate student data handling skills of simple to complex visual representations of geologic and atmospheric data, (ii) demonstrate that the protocol yields results that shed light on student data handling skills, and (iii) generate preliminary findings upon which to base tentative but perhaps helpful recommendations on how to more effectively present these data to the non-scientist community and teach essential data handling skills. An effective protocol for the combined use of paper-based surveys, think-aloud interviews, and computer-based eye-tracking tasks for investigating cognitive processes involved in perceiving, comprehending, and interpreting visual representations of geologic and atmospheric data is instrumental to future research in this area. The outcomes of this pilot study provide the foundation upon which future more in-depth and scaled-up investigations can build. Furthermore, findings of this pilot project are sufficient for making, at least, tentative recommendations that can help inform (i) the design of physical attributes of visual representations of data, especially more complex representations, that may aid in improving students' data handling skills and (ii) instructional approaches that have the potential to aid students in more effectively handling visual representations of geologic and atmospheric data.
GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data
Carvalho, Paulo C; Fischer, Juliana SG; Chen, Emily I; Domont, Gilberto B; Carvalho, Maria GC; Degrave, Wim M; Yates, John R; Barbosa, Valmir C
2009-01-01
Background: Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Results: Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiforme cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. Conclusion: GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple-to-use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at . PMID:19239707
Statistical information of ASAR observations over wetland areas: An interaction model interpretation
NASA Astrophysics Data System (ADS)
Grings, F.; Salvia, M.; Karszenbaum, H.; Ferrazzoli, P.; Perna, P.; Barber, M.; Jacobo Berlles, J.
2010-01-01
This paper presents the results obtained after studying the relation between the statistical parameters that describe the backscattering distribution of junco marshes and their biophysical variables. The results are based on the texture analysis of a time series of Envisat ASAR C-band data (APP mode, VV+HH polarizations) acquired between October 2003 and January 2005 over the Lower Paraná River Delta, Argentina. The image power distributions were analyzed, and we show that the K distribution provides a good fit to SAR data extracted from wetland observations for both polarizations. We also show that the estimated values of the order parameter of the K distribution can be explained using fieldwork and reasonable assumptions. In order to explore these results, we introduce a radiative transfer based interaction model to simulate the junco marsh σ0 distribution. After analyzing model simulations, we found evidence that the order parameter is related to the junco plant density distribution inside the junco marsh patch. It is concluded that the order parameter of the K distribution could be a useful parameter to estimate the junco plant density. This result is important for basin hydrodynamic modeling, since marsh plant density is the most important parameter to estimate marsh water conductance.
Physics in Perspective Volume II, Part C, Statistical Data.
ERIC Educational Resources Information Center
National Academy of Sciences - National Research Council, Washington, DC. Physics Survey Committee.
Statistical data relating to the sociology and economics of the physics enterprise are presented and explained. The data are divided into three sections: manpower data, data on funding and costs, and data on the literature of physics. Each section includes numerous studies, with notes on the sources and types of data, gathering procedures, and…
Interpreting two-photon imaging data of lymphocyte motility.
Meyer-Hermann, Michael E; Maini, Philip K
2005-06-01
Recently, using two-photon imaging it has been found that the movement of B and T cells in lymph nodes can be described by a random walk with persistence of orientation in the range of 2 minutes. We interpret this new class of lymphocyte motility data within a theoretical model. The model considers cell movement to be composed of the movement of subunits of the cell membrane. In this way movement and deformation of the cell are correlated to each other. We find that, indeed, the lymphocyte movement in lymph nodes can best be described as a random walk with persistence of orientation. The assumption of motility induced cell elongation is consistent with the data. Within the framework of our model the two-photon data suggest that T and B cells are in a single velocity state with large stochastic width. The alternative of three different velocity states with frequent changes of their state and small stochastic width is less likely. Two velocity states can be excluded. PMID:16089770
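As a rough illustration of what "a random walk with persistence of orientation" means, the following sketch simulates a cell that keeps its direction for a fixed persistence time and then picks a new random direction. It is a deliberately simplified 2-D caricature, not the membrane-subunit model of the paper; the speed, time step, and function name are hypothetical, and only the 2-minute persistence time comes from the abstract.

```python
import math
import random


def persistent_walk(n_steps, dt_min, speed, persistence_min, rng):
    """Simulate a 2-D random walk with persistence of orientation.

    The walker moves at constant `speed` (length units per minute) and
    draws a new uniformly random direction every `persistence_min` minutes.
    Returns the list of (x, y) positions, starting at the origin.
    """
    x = y = 0.0
    track = [(x, y)]
    steps_per_turn = max(1, round(persistence_min / dt_min))
    angle = rng.uniform(0.0, 2.0 * math.pi)
    for i in range(n_steps):
        # Re-orient only at multiples of the persistence time.
        if i > 0 and i % steps_per_turn == 0:
            angle = rng.uniform(0.0, 2.0 * math.pi)
        x += speed * dt_min * math.cos(angle)
        y += speed * dt_min * math.sin(angle)
        track.append((x, y))
    return track


# Hypothetical parameters: 0.5 min sampling, speed 10 units/min,
# persistence of orientation 2 min (the value quoted in the abstract).
track = persistent_walk(60, 0.5, 10.0, 2.0, random.Random(1))
end_x, end_y = track[-1]
print("net displacement after 30 min:", math.hypot(end_x, end_y))
```

Within one persistence interval the path is a straight line, so displacement grows linearly with time; over many intervals the re-orientations make the motion diffusive. A fixed re-orientation interval is the simplest choice here; drawing the interval from a distribution would be an equally plausible variant.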
Enhancements to the Engine Data Interpretation System (EDIS)
NASA Astrophysics Data System (ADS)
Hofmann, Martin O.
1993-07-01
The Engine Data Interpretation System (EDIS) expert system project assists the data review personnel at NASA/MSFC in performing post-test data analysis and engine diagnosis of the Space Shuttle Main Engine (SSME). EDIS uses knowledge of the engine, its components, and simple thermodynamic principles instead of, and in addition to, heuristic rules gathered from the engine experts. EDIS reasons in cooperation with human experts, following roughly the pattern of logic exhibited by human experts. EDIS concentrates on steady-state static faults, such as small leaks, and component degradations, such as pump efficiencies. The objective of this contract was to complete the set of engine component models, integrate heuristic rules into EDIS, integrate the Power Balance Model into EDIS, and investigate modification of the qualitative reasoning mechanisms to allow 'fuzzy' value classification. The result of this contract is an operational version of EDIS. EDIS will become a module of the Post-Test Diagnostic System (PTDS) and will, in this context, provide system-level diagnostic capabilities which integrate component-specific findings provided by other modules.
Enhancements to the Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Hofmann, Martin O.
1993-01-01
The Engine Data Interpretation System (EDIS) expert system project assists the data review personnel at NASA/MSFC in performing post-test data analysis and engine diagnosis of the Space Shuttle Main Engine (SSME). EDIS uses knowledge of the engine, its components, and simple thermodynamic principles instead of, and in addition to, heuristic rules gathered from the engine experts. EDIS reasons in cooperation with human experts, following roughly the pattern of logic exhibited by human experts. EDIS concentrates on steady-state static faults, such as small leaks, and component degradations, such as pump efficiencies. The objective of this contract was to complete the set of engine component models, integrate heuristic rules into EDIS, integrate the Power Balance Model into EDIS, and investigate modification of the qualitative reasoning mechanisms to allow 'fuzzy' value classification. The result of this contract is an operational version of EDIS. EDIS will become a module of the Post-Test Diagnostic System (PTDS) and will, in this context, provide system-level diagnostic capabilities which integrate component-specific findings provided by other modules.
A Decision Tree Approach to the Interpretation of Multivariate Statistical Techniques.
ERIC Educational Resources Information Center
Fok, Lillian Y.; And Others
1995-01-01
Discusses the nature, power, and limitations of four multivariate techniques: factor analysis, multiple analysis of variance, multiple regression, and multiple discriminant analysis. Shows how decision trees assist in interpreting results. (SK)
Transformations on Data Sets and Their Effects on Descriptive Statistics
ERIC Educational Resources Information Center
Fox, Thomas B.
2005-01-01
The activity asks students to examine the effects on the descriptive statistics of a data set that has undergone either a translation or a scale change. They make conjectures relative to the effects on the statistics of a transformation on a data set and then they defend their conjectures and deductively verify several of them.
Using Data from Climate Science to Teach Introductory Statistics
ERIC Educational Resources Information Center
Witt, Gary
2013-01-01
This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…
The Empirical Nature and Statistical Treatment of Missing Data
ERIC Educational Resources Information Center
Tannenbaum, Christyn E.
2009-01-01
Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…
Tsitouridou, Roxani; Papazova, Petia; Simeonova, Pavlina; Simeonov, Vasil
2013-01-01
The size distribution of aerosol particles (PM0.015-PM18) in relation to their soluble inorganic species and total water soluble organic compounds (WSOC) was investigated at an urban site of Thessaloniki, Northern Greece. The sampling period was from February to July 2007. The determined compounds were compared with mass concentrations of the PM fractions for nano (N: 0.015 < Dp < 0.06), ultrafine (UFP: 0.015 < Dp < 0.125), fine (FP: 0.015 < Dp < 2.0) and coarse particles (CP: 2.0 < Dp < 8.0) in order to perform mass closure of the water soluble content for the respective fractions. Electrolytes were the dominant species in all fractions (24-27%), followed by WSOC (16-23%). The water soluble inorganic and organic content was found to account for 53% of the nanoparticle, 48% of the ultrafine particle, 45% of the fine particle and 44% of the coarse particle mass. Correlations between the analyzed species were performed and the effect of local and long-range transported emissions was examined by wind direction and backward air mass trajectories. Multivariate statistical analysis (cluster analysis and principal components analysis) of the collected data was performed in order to reveal the specific data structure. Possible sources of air pollution were identified and an attempt is made to find patterns of similarity between the different sized aerosols and the seasons of monitoring. It was proven that several major latent factors are responsible for the data structure despite the size of the aerosols - mineral (soil) dust, sea sprays, secondary emissions, combustion sources and industrial impact. The seasonal separation proved to be not very specific. PMID:24007436
Experimental uncertainty estimation and statistics for data having interval uncertainty.
Kreinovich, Vladik (Applied Biomathematics, Setauket, New York); Oberkampf, William Louis (Applied Biomathematics, Setauket, New York); Ginzburg, Lev (Applied Biomathematics, Setauket, New York); Ferson, Scott (Applied Biomathematics, Setauket, New York); Hajagos, Janos (Applied Biomathematics, Setauket, New York)
2007-05-01
This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
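To make the idea of descriptive statistics on interval data concrete, here is a minimal sketch for the two simplest cases. Because the mean and the median are both non-decreasing in every data value, their tightest bounds follow directly from the lower and upper endpoints. The function names and the sample intervals are illustrative assumptions, not taken from the report; bounds on the variance are considerably harder and are not attempted here.

```python
from statistics import mean, median


def interval_mean(intervals):
    """Bounds on the sample mean when each datum is known only as [lo, hi]."""
    lows = [lo for lo, hi in intervals]
    highs = [hi for lo, hi in intervals]
    return mean(lows), mean(highs)


def interval_median(intervals):
    """Bounds on the sample median, obtained from the endpoint medians."""
    lows = [lo for lo, hi in intervals]
    highs = [hi for lo, hi in intervals]
    return median(lows), median(highs)


# Made-up measurements, each reported as an interval.
measurements = [(1.0, 2.0), (2.5, 3.5), (3.0, 5.0)]
mean_bounds = interval_mean(measurements)
median_bounds = interval_median(measurements)
print("mean lies in", mean_bounds)
print("median lies in", median_bounds)
```

The width of each resulting interval reflects the measurement imprecision alone; sampling uncertainty would have to be layered on top, for example through confidence limits.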
Phase 1 report on sensor technology, data fusion and data interpretation for site characterization
Beckerman, M.
1991-10-01
In this report we discuss sensor technology, data fusion and data interpretation approaches of possible maximal usefulness for subsurface imaging and characterization of land-fill waste sites. Two sensor technologies, terrain conductivity using electromagnetic induction and ground penetrating radar, are described and the literature on the subject is reviewed. We identify the maximum entropy stochastic method as one providing a rigorously justifiable framework for fusing the sensor data, briefly summarize work done by us in this area, and examine some of the outstanding issues with regard to data fusion and interpretation. 25 refs., 17 figs.
Statistical analysis of marginal count failure data.
Karim, M R; Yamamoto, W; Suzuki, K
2001-06-01
Manufacturers want to assess the quality and reliability of their products. Specifically, they want to know the exact number of failures from the sales transacted during a particular month. Information available today is sometimes incomplete as many companies analyze their failure data simply comparing sales for a total month from a particular department with the total number of claims registered for that given month. This information--called marginal count data--is, thus, incomplete as it does not give the exact number of failures of the specific products that were sold in a particular month. In this paper we discuss nonparametric estimation of the mean numbers of failures for repairable products and the failure probabilities for nonrepairable products. We present a nonhomogeneous Poisson process model for repairable products and a multinomial model and its Poisson approximation for nonrepairable products. A numerical example is given and a simulation is carried out to evaluate the proposed methods of estimating failure probabilities under a number of possible situations. PMID:11458656
Statistical Modeling of Large-Scale Scientific Simulation Data
Eliassi-Rad, T; Baldwin, C; Abdulla, G; Critchlow, T
2003-11-15
With the advent of massively parallel computer systems, scientists are now able to simulate complex phenomena (e.g., explosions of stars). Such scientific simulations typically generate large-scale data sets over the spatio-temporal space. Unfortunately, the sheer sizes of the generated data sets make efficient exploration of them impossible. Constructing queriable statistical models is an essential step in helping scientists glean new insight from their computer simulations. We define queriable statistical models to be descriptive statistics that (1) summarize and describe the data within a user-defined modeling error, and (2) are able to answer complex range-based queries over the spatiotemporal dimensions. In this chapter, we describe systems that build queriable statistical models for large-scale scientific simulation data sets. In particular, we present our Ad-hoc Queries for Simulation (AQSim) infrastructure, which reduces the data storage requirements and query access times by (1) creating and storing queriable statistical models of the data at multiple resolutions, and (2) evaluating queries on these models of the data instead of the entire data set. Within AQSim, we focus on three simple but effective statistical modeling techniques. AQSim's first modeling technique (called univariate mean modeler) computes the "true" (unbiased) mean of systematic partitions of the data. AQSim's second statistical modeling technique (called univariate goodness-of-fit modeler) uses the Anderson-Darling goodness-of-fit method on systematic partitions of the data. Finally, AQSim's third statistical modeling technique (called multivariate clusterer) utilizes the cosine similarity measure to cluster the data into similar groups. Our experimental evaluations on several scientific simulation data sets illustrate the value of using these statistical models on large-scale simulation data sets.
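The idea of answering a range query from stored summaries rather than from the raw data can be sketched as follows. This is only an illustration of the general principle behind a per-partition mean model; AQSim's actual partitioning scheme, multi-resolution storage, and error control are not described in the abstract, and the function names and toy data below are assumptions.

```python
def build_mean_model(values, block):
    """Summarise consecutive blocks of `block` values as (count, mean) pairs."""
    model = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        model.append((len(chunk), sum(chunk) / len(chunk)))
    return model


def query_mean(model, first_block, last_block):
    """Mean over blocks first_block..last_block (inclusive), using summaries only."""
    parts = model[first_block:last_block + 1]
    n = sum(count for count, _ in parts)
    return sum(count * m for count, m in parts) / n


# Toy "simulation output": ten values, summarised in blocks of four.
values = [float(v) for v in range(10)]
model = build_mean_model(values, block=4)
answer = query_mean(model, 1, 2)
print(model)
print("mean of blocks 1..2:", answer)
```

Weighting each stored mean by its block count keeps the answer exact for block-aligned queries; a query that cuts through a block would need either finer-resolution summaries or an approximation.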
Control Statistics Process Data Base V4
Energy Science and Technology Software Center (ESTSC)
1998-05-07
The check standard database program, CSP_CB, is a menu-driven program that can acquire measurement data for check standards having a parameter dependence (such as frequency) or no parameter dependence (for example, mass measurements). The program may be run stand-alone or loaded as a subprogram to a Basic program already in memory. The software was designed to require little additional work on the part of the user. To facilitate this design goal, the program is entirely menu-driven. In addition, the user does have control of file names and parameters within a definition file which sets up the basic scheme of file names.
Identification and interpretation of patterns in rocket engine data
NASA Astrophysics Data System (ADS)
Lo, C. F.; Wu, K.; Whitehead, B. A.
1993-10-01
A prototype software system was constructed to detect anomalous Space Shuttle Main Engine (SSME) behavior in the early stages of fault development significantly earlier than the indication provided by either the redline detection mechanism or human expert analysis. The major task of the research project is to analyze ground test data, to identify patterns associated with the anomalous engine behavior, and to develop a pattern identification and detection system on the basis of this analysis. A prototype expert system which was developed on both a PC and a Symbolics 3670 Lisp machine for detecting anomalies in turbopump vibration data was checked with data from ground tests 902-473, 902-501, 902-519, and 904-097 of the Space Shuttle Main Engine. The neural networks method was also applied to supplement the statistical method utilized in the prototype system to investigate the feasibility of detecting anomalies in turbopump vibration of the SSME. In most cases the anomalies detected by the expert system agree with those reported by NASA. With the neural networks approach, the successful detection rate is higher than 95 percent for identifying either normal or abnormal running conditions, based on the experimental data as well as numerical simulation.
Identification and interpretation of patterns in rocket engine data
NASA Technical Reports Server (NTRS)
Lo, C. F.; Wu, K.; Whitehead, B. A.
1993-01-01
A prototype software system was constructed to detect anomalous Space Shuttle Main Engine (SSME) behavior in the early stages of fault development significantly earlier than the indication provided by either the redline detection mechanism or human expert analysis. The major task of the research project is to analyze ground test data, to identify patterns associated with the anomalous engine behavior, and to develop a pattern identification and detection system on the basis of this analysis. A prototype expert system which was developed on both a PC and a Symbolics 3670 Lisp machine for detecting anomalies in turbopump vibration data was checked with data from ground tests 902-473, 902-501, 902-519, and 904-097 of the Space Shuttle Main Engine. The neural networks method was also applied to supplement the statistical method utilized in the prototype system to investigate the feasibility of detecting anomalies in turbopump vibration of the SSME. In most cases the anomalies detected by the expert system agree with those reported by NASA. With the neural networks approach, the successful detection rate is higher than 95 percent for identifying either normal or abnormal running conditions, based on the experimental data as well as numerical simulation.
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
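The kind of "what if" analysis described here can be sketched in a few lines: hold the observed effect size fixed and ask how the p-value would change if the sample size were different. The paper works in Excel and R; the sketch below uses Python for consistency with the other examples in this listing, and the choice of a two-sided one-sample z-test with a standardized effect size of 0.2 is an assumption made purely for illustration.

```python
import math


def two_sided_p(effect_size, n):
    """Two-sided p-value of a one-sample z-test.

    `effect_size` is the standardized mean difference (mean / sd),
    so the test statistic is z = effect_size * sqrt(n).
    """
    z = abs(effect_size) * math.sqrt(n)
    # 2 * (1 - Phi(z)) written with the complementary error function.
    return math.erfc(z / math.sqrt(2.0))


# "What if" the same effect (d = 0.2) had been observed with other sample sizes?
sample_sizes = [10, 25, 50, 100, 200]
p_values = [two_sided_p(0.2, n) for n in sample_sizes]
for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:4d}  p = {p:.4f}")
```

The effect itself never changes in this table; only the sample size does, which is the point such analyses are meant to make about what a significance test actually measures.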
NASA Astrophysics Data System (ADS)
Bouzid, Mohamed; Sellaoui, Lotfi; Khalfaoui, Mohamed; Belmabrouk, Hafedh; Lamine, Abdelmottaleb Ben
2016-02-01
In this work, we studied the adsorption of ethanol on three types of activated carbon, namely parent Maxsorb III and two chemically modified activated carbons (H2-Maxsorb III and KOH-H2-Maxsorb III). This investigation has been conducted on the basis of the grand canonical formalism in statistical physics and on simplified assumptions. This led to three-parameter equations describing the adsorption of ethanol onto the three types of activated carbon. There was a good correlation between experimental data and results obtained by the new proposed equation. The parameters characterizing the adsorption isotherm were the number of adsorbed molecule(s) per site n, the density of the receptor sites per unit mass of the adsorbent Nm, and the energetic parameter p1/2. They were estimated for the studied systems by nonlinear least-squares regression. The results show that the ethanol molecules were adsorbed in perpendicular (or non-parallel) position to the adsorbent surface. The magnitude of the calculated adsorption energies reveals that ethanol is physisorbed onto activated carbon. Both van der Waals and hydrogen interactions were involved in the adsorption process. The calculated values of the specific surface AS proved that the three types of activated carbon have a highly microporous surface.
Eigenanalysis of SNP data with an identity by descent interpretation.
Zheng, Xiuwen; Weir, Bruce S
2016-02-01
Principal component analysis (PCA) is widely used in genome-wide association studies (GWAS), and the principal component axes often represent perpendicular gradients in geographic space. The explanation of PCA results is of major interest for geneticists to understand fundamental demographic parameters. Here, we provide an interpretation of PCA based on relatedness measures, which are described by the probability that sets of genes are identical-by-descent (IBD). An approximately linear transformation between ancestral proportions (AP) of individuals with multiple ancestries and their projections onto the principal components is found. In addition, a new method of eigenanalysis "EIGMIX" is proposed to estimate individual ancestries. EIGMIX is a method of moments with computational efficiency suitable for millions of SNP data, and it is not subject to the assumption of linkage equilibrium. With the assumptions of multiple ancestries and their surrogate ancestral samples, EIGMIX is able to infer ancestral proportions (APs) of individuals. The methods were applied to the SNP data from the HapMap Phase 3 project and the Human Genome Diversity Panel. The APs of individuals inferred by EIGMIX are consistent with the findings of the program ADMIXTURE. In conclusion, EIGMIX can be used to detect population structure and estimate genome-wide ancestral proportions with a relatively high accuracy. PMID:26482676
Chromosome microarrays in diagnostic testing: interpreting the genomic data.
Peters, Greg B; Pertile, Mark D
2014-01-01
DNA-based Chromosome MicroArrays (CMAs) are now well established as diagnostic tools in clinical genetics laboratories. Over the last decade, the primary application of CMAs has been the genome-wide detection of a particular class of mutation known as copy number variants (CNVs). Since 2010, CMA testing has been recommended as a first-tier test for detection of CNVs associated with intellectual disability, autism spectrum disorders, and/or multiple congenital anomalies…in the post-natal setting. CNVs are now regarded as pathogenic in 14-18% of patients referred for these (and related) disorders. Through consideration of clinical examples, and several microarray platforms, we attempt to provide an appreciation of microarray diagnostics, from the initial inspection of the microarray data, to the composing of the patient report. In CMA data interpretation, a major challenge comes from the high frequency of clinically irrelevant CNVs observed within "patient" and "normal" populations. As might be predicted, the more common and clinically insignificant CNVs tend to be the smaller ones, <100 kb in length, involving few or no known genes. However, this relationship is not at all straightforward: CNV length and gene content are only very imperfect indicators of CNV pathogenicity. Presently, there are no reliable means of separating, a priori, the benign from the pathological CNV classes. This chapter also considers sources of technical "noise" within CMA data sets. Some level of noise is inevitable in diagnostic genomics, given the very large number of data points generated in any one test. Noise further limits CMA resolution, and some miscalling of CNVs is unavoidable. In this, there is no ideal solution, but various strategies for handling noise are available. Even without solutions, consideration of these diagnostic problems per se is informative, as they afford critical insights into the biological and technical underpinnings of CNV discovery. These are indispensable
NASA Astrophysics Data System (ADS)
Lee, J.; Chang, H.
2001-12-01
In this research, we investigate the reciprocal influence between groundwater flow and groundwater salinization at two underground cavern sites, using major ion chemistry, PCA of chemical analysis data, and cross-correlation of various hydraulic data. The study areas are two underground LPG storage facilities constructed on the South Sea coast at Yosu and on the West Sea coast at Pyeongtaek, Korea. Considerably high concentrations of major cations and anions in the groundwater at both sites indicated brackish or saline water types. At the Yosu site, a large chemical difference between groundwater samples from the rainy and dry seasons was caused by temporal intrusion of highly saline water into the propane and butane cavern zone; this was not the case at the Pyeongtaek site. Cl/Br ratios and the δ18O-δD distribution, used to trace the source of the salinizing water at both sites, revealed that two kinds of saline water (seawater and halite-dissolved solution) could influence groundwater salinization at the Yosu site, whereas only seawater intrusion could affect the groundwater chemistry of the observation wells at the Pyeongtaek site. PCA using 8 and 10 chemical ions as statistical variables at the two sites showed that intensive intrusion of seawater through the butane cavern occurred at the Yosu site, while seawater-groundwater mixing was observed at some observation wells located in the marginal part of the Pyeongtaek site. Cross-correlation results revealed that the positive relationship between hydraulic head and cavern operating pressure was far more conspicuous in the propane cavern zone at both sites (correlation coefficients of 65-90%). According to the cross-correlation results for the Yosu site, a small change of head could provoke a massive influx of halite-dissolved solution from the surface through vertically developed fracture networks. However, at the Pyeongtaek site, the pressure-sensitive observation wells are not completely consistent with the seawater-mixed wells, and the hydraulic change of heads at these wells related to the
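The cross-correlation step mentioned above, relating hydraulic head to cavern operating pressure, can be illustrated with a short sketch. It assumes two equally sampled series and computes the Pearson correlation at several lags. The function names and the synthetic numbers are hypothetical and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for lag = 0..max_lag.

    A peak at a positive lag suggests that y responds to x after
    that many sampling intervals.
    """
    result = {}
    for lag in range(max_lag + 1):
        x_part = x[: len(x) - lag]  # drop the last `lag` samples of x
        y_part = y[lag:]            # drop the first `lag` samples of y
        result[lag] = pearson(x_part, y_part)
    return result


if __name__ == "__main__":
    # Synthetic example: head follows pressure with a delay of 2 samples.
    pressure = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 8.0, 7.0, 9.0]
    head = [10.0, 10.0] + [10.0 + 0.5 * p for p in pressure[:8]]
    for lag, r in cross_correlation(pressure, head, 3).items():
        print(f"lag {lag}: r = {r:.3f}")
```

In a field setting the lag with the largest coefficient indicates how quickly a well responds to pressure changes, and the size of the coefficient indicates how pressure-sensitive that well is.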
Importance of data management with statistical analysis set division.
Wang, Ling; Li, Chan-juan; Jiang, Zhi-wei; Xia, Jie-lai
2015-11-01
Hypothesis testing is affected by the division of statistical analysis sets, which is an important data management task before database lock. Objective division of the statistical analysis sets under blinding is the guarantee of a scientific trial conclusion. All subjects who have received the trial treatment at least once after randomization should be included in the safety set. The full analysis set should be as close as possible to the intention-to-treat principle. Division of the per-protocol set is the most difficult to control in blinded examination because it involves more subjectivity than the other two. The objectivity of statistical analysis set division must be guaranteed by accurate raw data, comprehensive data checks and scientific discussion, all of which are strict requirements of data management. Dividing the statistical analysis sets objectively and scientifically is an important approach to improving data management quality. PMID:26911044
Clinical interpretation of CNVs with cross-species phenotype data
Czeschik, Johanna Christina; Doelken, Sandra C; Hehir-Kwa, Jayne Y; Ibn-Salem, Jonas; Mungall, Christopher J; Smedley, Damian; Haendel, Melissa A; Robinson, Peter N
2015-01-01
Background Clinical evaluation of CNVs identified via techniques such as array comparative genome hybridisation (aCGH) involves the inspection of lists of known and unknown duplications and deletions with the goal of distinguishing pathogenic from benign CNVs. A key step in this process is the comparison of the individual's phenotypic abnormalities with those associated with Mendelian disorders of the genes affected by the CNV. However, because often there is not much known about these human genes, an additional source of data that could be used is model organism phenotype data. Currently, almost 6000 genes in mouse and zebrafish are, when knocked out, associated with a phenotype in the model organism, but no disease is known to be caused by mutations in the human ortholog. Yet, searching model organism databases and comparing model organism phenotypes with patient phenotypes for identifying novel disease genes and medical evaluation of CNVs is hindered by the difficulty in integrating phenotype information across species and the lack of appropriate software tools. Methods Here, we present an integrated ranking scheme based on phenotypic matching, degree of overlap with known benign or pathogenic CNVs and the haploinsufficiency score for the prioritisation of CNVs responsible for a patient's clinical findings. Results We show that this scheme leads to significant improvements compared with rankings that do not exploit phenotypic information. We provide a software tool called PhenogramViz, which supports phenotype-driven interpretation of aCGH findings based on multiple data sources, including the integrated cross-species phenotype ontology Uberpheno, in order to visualise gene-to-phenotype relations. Conclusions Integrating and visualising cross-species phenotype information on the affected genes may help in routine diagnostics of CNVs. PMID:25280750
Reading, storing and statistical calculation of weight data.
Schliack, M
1987-02-01
A BASIC program is described which reads and computes weight data. Tare and gross weight data are read from an analytical balance. The net weights are calculated and stored on a disc. A statistical test (e.g. unpaired t-test or unpaired Wilcoxon test) can then be carried out with the weight data. The program calculates descriptive statistics before the tests. PMID:3829652
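The workflow described above (net weight from tare and gross weight, descriptive statistics, then an unpaired test) can be sketched as follows. This is not the original BASIC program. It is a minimal Python illustration with hypothetical function names, and it computes only the pooled-variance t statistic and its degrees of freedom, since the standard library has no t distribution for the p-value.

```python
import math
from statistics import mean, stdev, variance


def net_weights(gross, tare):
    """Net weight of each sample: gross weight minus tare weight."""
    return [g - t for g, t in zip(gross, tare)]


def describe(values):
    """Descriptive statistics reported before the test."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


def unpaired_t_statistic(a, b):
    """Student's unpaired t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances
    in the two groups.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2


if __name__ == "__main__":
    # Illustrative weights in grams, not data from the paper.
    control = net_weights([12.1, 12.4, 12.9, 12.2], [2.0, 2.1, 2.0, 2.1])
    treated = net_weights([13.0, 13.6, 13.1, 13.8], [2.0, 2.0, 2.1, 2.1])
    print(describe(control))
    print(describe(treated))
    print(unpaired_t_statistic(control, treated))
```

The resulting statistic would then be compared against a t table (or a statistics package) at the returned degrees of freedom.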
Boyle temperature as a point of ideal gas in gentile statistics and its economic interpretation
NASA Astrophysics Data System (ADS)
Maslov, V. P.; Maslova, T. V.
2014-07-01
Boyle temperature is interpreted as the temperature at which the formation of dimers becomes impossible. To Irving Fisher's correspondence principle we assign two more quantities: the number of degrees of freedom, and credit. We determine the danger level of the mass of money M when the mutual trust between economic agents begins to fall.
The Galactic Center: possible interpretations of observational data.
NASA Astrophysics Data System (ADS)
Zakharov, Alexander
2015-08-01
There are not too many astrophysical cases where one really has an opportunity to check predictions of general relativity in the strong gravitational field limit. For this purpose the black hole at the Galactic Center is one of the most interesting cases, since it is the closest supermassive black hole. Gravitational lensing is a natural phenomenon based on the effect of light deflection in a gravitational field (isotropic geodesics are not straight lines in a gravitational field; in a weak gravitational field one has small corrections for light deflection, while the perturbative approach is not suitable for a strong gravitational field). There are now two basic observational techniques to investigate the gravitational potential at the Galactic Center, namely, a) monitoring the orbits of bright stars near the Galactic Center to reconstruct the gravitational potential; b) measuring the size and shape of the shadow around the black hole, which gives an alternative possibility to evaluate black hole parameters in the mm-band with the VLBI technique. At the moment one can use a small relativistic correction approach for stellar orbit analysis (however, in the future the approximation will not be precise enough due to the enormous progress of observational facilities), while for the analysis of the smallest structures in VLBI observations one really needs a strong gravitational field approximation. We discuss results of observations, their conventional interpretations, tensions between observations and models, and possible hints of new physics from the observational data and from tensions between observations and interpretations. References: 1. A.F. Zakharov, F. De Paolis, G. Ingrosso, and A. A. Nucita, New Astronomy Reviews, 56, 64 (2012). 2. D. Borka, P. Jovanovic, V. Borka Jovanovic and A.F. Zakharov, Physical Review D, 85, 124004 (2012). 3. D. Borka, P. Jovanovic, V. Borka Jovanovic and A.F. Zakharov, Journal of Cosmology and Astroparticle Physics, 11, 050 (2013). 4. A.F. Zakharov, Physical Review D 90
Explorations in Statistics: The Analysis of Ratios and Normalized Data
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2013-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This ninth installment of "Explorations in Statistics" explores the analysis of ratios and normalized--or standardized--data. As researchers, we compute a ratio--a numerator divided by a denominator--to compute a…
Textual Analysis and Data Mining: An Interpreting Research on Nursing.
De Caro, W; Mitello, L; Marucci, A R; Lancia, L; Sansoni, J
2016-01-01
Every day there is a data explosion on the web. In 2013, 5 exabytes of content were created each day. Every hour, internet networks carry a quantity of text equivalent to twenty billion books. To give an idea, it is a huge mass of information on the linguistic behavior of people and society that was unthinkable until a few years ago. It is an opportunity for valuable analysis to understand social phenomena, including in the nursing and health care sector. This poster shows the steps of an ideal strategy for textual statistical analysis and the process of extracting useful information about health care, especially nursing care, from journals and web information. We show the potential of web-based text mining tools (DTM, Wordle, Voyant Tools, Taltac 2.10, Treecloud and other web 2.0 apps) for analyzing text data and extracting information about sentiment, perception, scientific activities and the visibility of nursing. This specific analysis was conducted on "Repubblica", the first newspaper in Italy (years of analysis: 2012-14), and one Italian scientific nursing journal (years: 2012-14). PMID:27332424
Using Data Mining to Teach Applied Statistics and Correlation
ERIC Educational Resources Information Center
Hartnett, Jessica L.
2016-01-01
This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…
2013-01-01
Background High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (termed "contrast data" for short) in a comparable manner and extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.). Results To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are
Korjus, Kristjan; Hebart, Martin N; Vicente, Raul
2016-01-01
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier's generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term "Cross-validation and cross-testing" improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393
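The baseline that the abstract compares against, cross-validation for parameter selection followed by a single evaluation on a held-out test set, can be sketched as below. This illustrates only that standard procedure, not the proposed "cross-validation and cross-testing" method. The one-dimensional k-nearest-neighbour classifier, the synthetic data and all names are assumptions made for the example.

```python
import random


def knn_predict(train, x, k):
    """Majority vote of the k nearest 1-D training points (labels 0 or 1)."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > k else 0


def accuracy(train, test, k):
    """Fraction of test points classified correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def cross_validated_accuracy(data, k, folds=5):
    """Mean validation accuracy over interleaved folds."""
    scores = []
    for i in range(folds):
        validation = data[i::folds]
        training = [p for j, p in enumerate(data) if j % folds != i]
        scores.append(accuracy(training, validation, k))
    return sum(scores) / folds


def select_and_test(data, candidates, test_fraction=0.25, seed=0):
    """Pick k by cross-validation, then score once on held-out test data."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, development = shuffled[:n_test], shuffled[n_test:]
    best_k = max(candidates, key=lambda k: cross_validated_accuracy(development, k))
    # The test set is touched only here, after the parameter is fixed.
    return best_k, accuracy(development, test, best_k)


def make_data(n=80, seed=1):
    """Synthetic two-class data: class means 0 and 2, unit variance."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


if __name__ == "__main__":
    best_k, test_accuracy = select_and_test(make_data(), candidates=(1, 3, 5, 7))
    print(f"selected k = {best_k}, held-out accuracy = {test_accuracy:.2f}")
```

The trade-off the abstract describes is visible in `test_fraction`: a larger test set gives a more reliable estimate of generalization performance, but leaves fewer samples for choosing k and fitting the final model.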