A novel statistical analysis and interpretation of flow cytometry data
Banks, H.T.; Kapraun, D.F.; Thompson, W. Clayton; Peligero, Cristina; Argilaguet, Jordi; Meyerhans, Andreas
2013-01-01
A recently developed class of models incorporating the cyton model of population generation structure into a conservation-based model of intracellular label dynamics is reviewed. Statistical aspects of the data collection process are quantified and incorporated into a parameter estimation scheme. This scheme is then applied to experimental data for PHA-stimulated CD4+ T and CD8+ T cells collected from two healthy donors. This novel mathematical and statistical framework is shown to form the basis for accurate, meaningful analysis of cellular behaviour for a population of cells labelled with the dye carboxyfluorescein succinimidyl ester and stimulated to divide. PMID:23826744
A statistical model for interpreting computerized dynamic posturography data
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
Misuse of statistics in the interpretation of data on low-level radiation
Hamilton, L.D.
1982-01-01
Four misuses of statistics in the interpretation of data of low-level radiation are reviewed: (1) post-hoc analysis and aggregation of data leading to faulty conclusions in the reanalysis of genetic effects of the atomic bomb, and premature conclusions on the Portsmouth Naval Shipyard data; (2) inappropriate adjustment for age and ignoring differences between urban and rural areas leading to potentially spurious increase in incidence of cancer at Rocky Flats; (3) hazard of summary statistics based on ill-conditioned individual rates leading to spurious association between childhood leukemia and fallout in Utah; and (4) the danger of prematurely published preliminary work with inadequate consideration of epidemiological problems - censored data - leading to inappropriate conclusions, needless alarm at the Portsmouth Naval Shipyard, and diversion of scarce research funds.
Statistics Translated: A Step-by-Step Guide to Analyzing and Interpreting Data
ERIC Educational Resources Information Center
Terrell, Steven R.
2012-01-01
Written in a humorous and encouraging style, this text shows how the most common statistical tools can be used to answer interesting real-world questions, presented as mysteries to be solved. Engaging research examples lead the reader through a series of six steps, from identifying a researchable problem to stating a hypothesis, identifying…
Logical, epistemological and statistical aspects of nature-nurture data interpretation.
Kempthorne, O
1978-03-01
In this paper the nature of the reasoning processes applied to the nature-nurture question is discussed in general and with particular reference to mental and behavioral traits. The nature of data analysis and analysis of variance is discussed. Necessarily, the nature of causation is considered. The notion that mere data analysis can establish "real" causation is attacked. Logic of quantitative genetic theory is reviewed briefly. The idea that heritability is meaningful in the human mental and behavioral arena is attacked. The conclusion is that the heredity-IQ controversy has been a "tale full of sound and fury, signifying nothing". To suppose that one can establish effects of an intervention process when it does not occur in the data is plainly ludicrous. Mere observational studies can easily lead to stupidities, and it is suggested that this has happened in the heredity-IQ arena. The idea that there are racial-genetic differences in mental abilities and behavioral traits of humans is, at best, no more than idle speculation. PMID:637918
Statistical weld process monitoring with expert interpretation
Cook, G.E.; Barnett, R.J.; Strauss, A.M.; Thompson, F.M. Jr.
1996-12-31
A statistical weld process monitoring system is described. Using data of voltage, current, wire feed speed, gas flow rate, travel speed, and elapsed arc time collected while welding, the welding statistical process control (SPC) tool provides weld process quality control by implementing techniques of data trending analysis, tolerance analysis, and sequential analysis. For purposes of quality control, the control limits required for acceptance are specified in the weld procedure acceptance specifications. The control charts then provide quality assurance documentation for each weld. The statistical data trending analysis performed by the SPC program is not only valuable as a quality assurance monitoring and documentation system, it is also valuable in providing diagnostic assistance in troubleshooting equipment and material problems. Possible equipment/process problems are identified and matched with features of the SPC control charts. To aid in interpreting the voluminous statistical output generated by the SPC system, a large number of If-Then rules have been devised for providing computer-based expert advice for pinpointing problems based on out-of-limit variations of the control charts. The paper describes the SPC monitoring tool and the rule-based expert interpreter that has been developed for relating control chart trends to equipment/process problems.
NASA Astrophysics Data System (ADS)
Irving, J.; Knight, R.; Holliger, K.
2007-12-01
The distribution of subsurface water content can be an excellent indicator of soil texture, which strongly influences the unsaturated hydraulic properties controlling vadose zone contaminant transport. Characterizing the heterogeneity in subsurface water content for use in numerical transport models, however, is an extremely difficult task as conventional hydrological measurement techniques do not offer the combined high spatial resolution and coverage required for accurate simulations. A number of recent studies have shown that ground-penetrating radar (GPR) reflection images may contain useful information regarding the statistical structure of subsurface water content. Comparisons of the horizontal correlation structures of radar images and those obtained from water content measurements have shown that, in some cases, the statistical characteristics are remarkably similar. However, a key issue in these studies is that a reflection GPR image is primarily related to changes in subsurface water content, and not the water content distribution directly. As a result, statistics gathered on the reflection image have a very complex relationship with the statistics of the underlying water content distribution, this relationship depending on a number of factors including the frequency of the GPR antennas used. In this work, we attempt to address the above issue by posing the estimation of the statistical structure of water content from reflection GPR data as an inverse problem. Using a simple convolution model for a radar image, we first derive a forward model relating the statistical structure of a radar image to that of the underlying water content distribution. We then use this forward model to invert for the spatial statistics of the water content distribution, given the spatial statistics of the GPR reflection image as data. We do this within a framework of uncertainty, such that realistic statistical bounds can be placed on the information that is inferred. In other
Asfahani, Jamal
2014-02-01
Factor analysis technique is proposed in this research for interpreting the combination of nuclear well logging, including natural gamma ray, density and neutron-porosity, and the electrical well logging of long and short normal, in order to characterize the large extended basaltic areas in southern Syria. Kodana well logging data are used for testing and applying the proposed technique. The four resulting score logs enable to establish the lithological score cross-section of the studied well. The established cross-section clearly shows the distribution and the identification of four kinds of basalt which are hard massive basalt, hard basalt, pyroclastic basalt and the alteration basalt products, clay. The factor analysis technique is successfully applied on the Kodana well logging data in southern Syria, and can be used efficiently when several wells and huge well logging data with high number of variables are required to be interpreted. PMID:24296157
The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...
Statistical mechanics and the ontological interpretation
NASA Astrophysics Data System (ADS)
Bohm, D.; Hiley, B. J.
1996-06-01
To complete our ontological interpretation of quantum theory we have to conclude a treatment of quantum statistical mechanics. The basic concepts in the ontological approach are the particle and the wave function. The density matrix cannot play a fundamental role here. Therefore quantum statistical mechanics will require a further statistical distribution over wave functions in addition to the distribution of particles that have a specified wave function. Ultimately the wave function of the universe will he required, but we show that if the universe in not in thermodynamic equilibrium then it can he treated in terms of weakly interacting large scale constituents that are very nearly independent of each other. In this way we obtain the same results as those of the usual approach within the framework of the ontological interpretation.
Fadda, Valeria; Maratea, Dario; Trippoli, Sabrina; Gatto, Roberta; De Rosa, Mauro; Marinai, Claudio
2014-01-01
Background: No equivalence analysis has yet been conducted on the effectiveness of biologics in rheumatoid arthritis. Equivalence testing has a specific scientific interest, but can also be useful for deciding whether acquisition tenders are feasible for the pharmacological agents being compared. Methods: Our search covered the literature up to August 2014. Our methodology was a combination of standard pairwise meta-analysis, Bayesian network meta-analysis and equivalence testing. The agents examined for their potential equivalence were etanercept, adalimumab, golimumab, certolizumab, and tocilizumab, each in combination with methotrexate (MTX). The reference treatment was MTX monotherapy. The endpoint was ACR50 achievement at 12 months. Odds ratio was the outcome measure. The equivalence margins were established by analyzing the statistical power data of the trials. Results: Our search identified seven randomized controlled trials (2846 patients). No study was retrieved for tocilizumab, and so only four biologics were evaluable. The equivalence range was set at odds ratio from 0.56 to 1.78. There were 10 head-to-head comparisons (4 direct, 6 indirect). Bayesian network meta-analysis estimated the odds ratio (with 90% credible intervals) for each of these comparisons. Between-trial heterogeneity was marked. According to our results, all credible intervals of the 10 comparisons were wide and none of them satisfied the equivalence criterion. A superiority finding was confirmed for the treatment with MTX plus adalimumab or certolizumab in comparison with MTX monotherapy, but not for the other two biologics. Conclusion: Our results indicate that these four biologics improved the rates of ACR50 achievement, but there was an evident between-study heterogeneity. The head-to-head indirect comparisons between individual biologics showed no significant difference, but failed to demonstrate the proof of no difference (i.e. equivalence). This body of evidence presently
Nash, J. Thomas; Frishman, David
1983-01-01
Analytical results for 61 elements in 370 samples from the Ranger Mine area are reported. Most of the rocks come from drill core in the Ranger No. 1 and Ranger No. 3 deposits, but 20 samples are from unmineralized drill core more than 1 km from ore. Statistical tests show that the elements Mg, Fe, F, Be, Co, Li, Ni, Pb, Sc, Th, Ti, V, CI, As, Br, Au, Ce, Dy, La Sc, Eu, Tb, Yb, and Tb have positive association with uranium, and Si, Ca, Na, K, Sr, Ba, Ce, and Cs have negative association. For most lithologic subsets Mg, Fe, Li, Cr, Ni, Pb, V, Y, Sm, Sc, Eu, and Yb are significantly enriched in ore-bearing rocks, whereas Ca, Na, K, Sr, Ba, Mn, Ce, and Cs are significantly depleted. These results are consistent with petrographic observations on altered rocks. Lithogeochemistry can aid exploration, but for these rocks requires methods that are expensive and not amenable to routine use.
Interpreting Educational Research Using Statistical Software.
ERIC Educational Resources Information Center
Evans, Elizabeth A.
A live demonstration of how a typical set of educational data can be examined using quantitative statistical software was conducted. The topic of tutorial support was chosen. Setting up a hypothetical research scenario, the researcher created 300 cases from random data generation adjusted to correct obvious error. Each case represented a student…
NASA Astrophysics Data System (ADS)
Tadaki, Kohtaro
2010-12-01
The statistical mechanical interpretation of algorithmic information theory (AIT, for short) was introduced and developed by our former works [K. Tadaki, Local Proceedings of CiE 2008, pp. 425-434, 2008] and [K. Tadaki, Proceedings of LFCS'09, Springer's LNCS, vol. 5407, pp. 422-440, 2009], where we introduced the notion of thermodynamic quantities, such as partition function Z(T), free energy F(T), energy E(T), statistical mechanical entropy S(T), and specific heat C(T), into AIT. We then discovered that, in the interpretation, the temperature T equals to the partial randomness of the values of all these thermodynamic quantities, where the notion of partial randomness is a stronger representation of the compression rate by means of program-size complexity. Furthermore, we showed that this situation holds for the temperature T itself, which is one of the most typical thermodynamic quantities. Namely, we showed that, for each of the thermodynamic quantities Z(T), F(T), E(T), and S(T) above, the computability of its value at temperature T gives a sufficient condition for T (0,1) to satisfy the condition that the partial randomness of T equals to T. In this paper, based on a physical argument on the same level of mathematical strictness as normal statistical mechanics in physics, we develop a total statistical mechanical interpretation of AIT which actualizes a perfect correspondence to normal statistical mechanics. We do this by identifying a microcanonical ensemble in the framework of AIT. As a result, we clarify the statistical mechanical meaning of the thermodynamic quantities of AIT.
Use and interpretation of statistics in wildlife journals
Tacha, Thomas C.; Warde, William D.; Burnham, Kenneth P.
1982-01-01
Use and interpretation of statistics in wildlife journals are reviewed, and suggestions for improvement are offered. Populations from which inferences are to be drawn should be clearly defined, and conclusions should be limited to the range of the data analyzed. Authors should be careful to avoid improper methods of plotting data and should clearly define the use of estimates of variance, standard deviation, standard error, or confidence intervals. Biological and statistical significant are often confused by authors and readers. Statistical hypothesis testing is a tool, and not every question should be answered by hypothesis testing. Meeting assumptions of hypothesis tests is the responsibility of authors, and assumptions should be reviewed before a test is employed. The use of statistical tools should be considered carefully both before and after gathering data.
Integrating statistical rock physics and sedimentology for quantitative seismic interpretation
NASA Astrophysics Data System (ADS)
Avseth, Per; Mukerji, Tapan; Mavko, Gary; Gonzalez, Ezequiel
This paper presents an integrated approach for seismic reservoir characterization that can be applied both in petroleum exploration and in hydrological subsurface analysis. We integrate fundamental concepts and models of rock physics, sedimentology, statistical pattern recognition, and information theory, with seismic inversions and geostatistics. Rock physics models enable us to link seismic amplitudes to geological facies and reservoir properties. Seismic imaging brings indirect, noninvasive, but nevertheless spatially exhaustive information about the reservoir properties that are not available from well data alone. Classification and estimation methods based on computational statistical techniques such as nonparametric Bayesian classification, Monte Carlo simulations and bootstrap, help to quantitatively measure the interpretation uncertainty and the mis-classification risk at each spatial location. Geostatistical stochastic simulations incorporate the spatial correlation and the small scale variability which is hard to capture with only seismic information because of the limits of resolution. Combining deterministic physical models with statistical techniques has provided us with a successful way of performing quantitative interpretation and estimation of reservoir properties from seismic data. These formulations identify not only the most likely interpretation but also the uncertainty of the interpretation, and serve as a guide for quantitative decision analysis. The methodology shown in this article is applied successfully to map petroleum reservoirs, and the examples are from relatively deeply buried oil fields. However, we suggest that this approach can also be carried out for improved characterization of shallow hydrologic aquifers using shallow seismic or GPR data.
Tuberculosis Data and Statistics
... Organization Chart Advisory Groups Federal TB Task Force Data and Statistics Language: English Español (Spanish) Recommend on ... United States publication. PDF [6 MB] Interactive TB Data Tool Online Tuberculosis Information System (OTIS) OTIS is ...
As watershed groups in the state of Georgia form and develop, they have a need for collecting, managing, and analyzing data associated with their watershed. Possible sources of data for flow, water quality, biology, habitat, and watershed characteristics include the U.S. Geologic...
Data collection and interpretation.
Citerio, Giuseppe; Park, Soojin; Schmidt, J Michael; Moberg, Richard; Suarez, Jose I; Le Roux, Peter D
2015-06-01
Patient monitoring is routinely performed in all patients who receive neurocritical care. The combined use of monitors, including the neurologic examination, laboratory analysis, imaging studies, and physiological parameters, is common in a platform called multi-modality monitoring (MMM). However, the full potential of MMM is only beginning to be realized since for the most part, decision making historically has focused on individual aspects of physiology in a largely threshold-based manner. The use of MMM now is being facilitated by the evolution of bio-informatics in critical care including developing techniques to acquire, store, retrieve, and display integrated data and new analytic techniques for optimal clinical decision making. In this review, we will discuss the crucial initial steps toward data and information management, which in this emerging era of data-intensive science is already shifting concepts of care for acute brain injury and has the potential to both reshape how we do research and enhance cost-effective clinical care. PMID:25846711
The Statistical Interpretation of Entropy: An Activity
ERIC Educational Resources Information Center
Timmberlake, Todd
2010-01-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the…
Interpreting Data: The Hybrid Mind
ERIC Educational Resources Information Center
Heisterkamp, Kimberly; Talanquer, Vicente
2015-01-01
The central goal of this study was to characterize major patterns of reasoning exhibited by college chemistry students when analyzing and interpreting chemical data. Using a case study approach, we investigated how a representative student used chemical models to explain patterns in the data based on structure-property relationships. Our results…
The Statistical Interpretation of Entropy: An Activity
NASA Astrophysics Data System (ADS)
Timmberlake, Todd
2010-11-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the functioning of the second law and also provided evidence for the existence of atoms at a time when many scientists (like Ernst Mach and Wilhelm Ostwald) were skeptical.
Workplace statistical literacy for teachers: interpreting box plots
NASA Astrophysics Data System (ADS)
Pierce, Robyn; Chick, Helen
2013-06-01
As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and, yet it is not clear that they always have the appropriate knowledge and experience to interpret the graphs, tables and other data that they receive. This study examined the statistical literacy demands placed on teachers, with a particular focus on box plot representations. Although box plots summarise the data in a way that makes visual comparisons possible across sets of data, this study showed that teachers do not always have the necessary fluency with the representation to describe correctly how the data are distributed in the representation. In particular, a significant number perceived the size of the regions of the box plot to be depicting frequencies rather than density, and there were misconceptions associated with outlying data that were not displayed on the plot. As well, teachers' perceptions of box plots were found to relate to three themes: attitudes, perceived value and misconceptions.
For a statistical interpretation of Helmholtz' thermal displacement
NASA Astrophysics Data System (ADS)
Podio-Guidugli, Paolo
2016-05-01
On moving from the classic papers by Einstein and Langevin on Brownian motion, two consistent statistical interpretations are given for the thermal displacement, a scalar field formally introduced by Helmholtz, whose time derivative is by definition the absolute temperature.
Paleomicrobiology Data: Authentification and Interpretation.
Drancourt, Michel
2016-06-01
The authenticity of some of the very first works in the field of paleopathology has been questioned, and standards have been progressively established for the experiments and the interpretation of data. Whereas most problems initially arose from the contamination of ancient specimens with modern human DNA, the situation is different in the field of paleomicrobiology, in which the risk for contamination is well-known and adequately managed by any laboratory team with expertise in the routine diagnosis of modern-day infections. Indeed, the exploration of ancient microbiota and pathogens is best done by such laboratory teams, with research directed toward the discovery and implementation of new techniques and the interpretation of data. PMID:27337456
The Statistical Interpretation of Classical Thermodynamic Heating and Expansion Processes
ERIC Educational Resources Information Center
Cartier, Stephen F.
2011-01-01
A statistical model has been developed and applied to interpret thermodynamic processes typically presented from the macroscopic, classical perspective. Through this model, students learn and apply the concepts of statistical mechanics, quantum mechanics, and classical thermodynamics in the analysis of the (i) constant volume heating, (ii)…
Muscular Dystrophy: Data and Statistics
... Statistics Recommend on Facebook Tweet Share Compartir MD STAR net Data and Statistics The following data and ... research [ Read Article ] For more information on MD STAR net see Research and Tracking . Key Findings Feature ...
Linda Stetzenbach; Lauren Nemnich; Davor Novosel
2009-08-31
Three independent tasks had been performed (Stetzenbach 2008, Stetzenbach 2008b, Stetzenbach 2009) to measure a variety of parameters in normative buildings across the United States. For each of these tasks 10 buildings were selected as normative indoor environments. Task 1 focused on office buildings, Task 13 focused on public schools, and Task 0606 focused on high performance buildings. To perform this task it was necessary to restructure the database for the Indoor Environmental Quality (IEQ) data and the Sound measurement as several issues were identified and resolved prior to and during the transfer of these data sets into SPSS. During overview discussions with the statistician utilized in this task it was determined that because the selection of indoor zones (1-6) was independently selected within each task; zones were not related by location across tasks. Therefore, no comparison would be valid across zones for the 30 buildings so the by location (zone) data were limited to three analysis sets of the buildings within each task. In addition, differences in collection procedures for lighting were used in Task 0606 as compared to Tasks 01 & 13 to improve sample collection. Therefore, these data sets could not be merged and compared so effects by-day data were run separately for Task 0606 and only Task 01 & 13 data were merged. Results of the statistical analysis of the IEQ parameters show statistically significant differences were found among days and zones for all tasks, although no differences were found by-day for Draft Rate data from Task 0606 (p>0.05). Thursday measurements of IEQ parameters were significantly different from Tuesday, and most Wednesday measures for all variables of Tasks 1 & 13. Data for all three days appeared to vary for Operative Temperature, whereas only Tuesday and Thursday differed for Draft Rate 1m. Although no Draft Rate measures within Task 0606 were found to significantly differ by-day, Temperature measurements for Tuesday and
Spirakis, C.S.; Pierson, C.T.; Santos, E.S.; Fishman, N.S.
1983-01-01
Statistical treatment of analytical data from 106 samples of uranium-mineralized and unmineralized or weakly mineralized rocks of the Morrison Formation from the northeastern part of the Church Rock area of the Grants uranium region indicates that along with uranium, the deposits in the northeast Church Rock area are enriched in barium, sulfur, sodium, vanadium and equivalent uranium. Selenium and molybdenum are sporadically enriched in the deposits and calcium, manganese, strontium, and yttrium are depleted. Unlike the primary deposits of the San Juan Basin, the deposits in the northeast part of the Church Rock area contain little organic carbon and several elements that are characteristically enriched in the primary deposits are not enriched or are enriched to a much lesser degree in the Church Rock deposits. The suite of elements associated with the deposits in the northeast part of the Church Rock area is also different from the suite of elements associated with the redistributed deposits in the Ambrosia Lake district. This suggests that the genesis of the Church Rock deposits is different, at least in part, from the genesis of the primary deposits of the San Juan Basin or the redistributed deposits at Ambrosia Lake.
NASA Astrophysics Data System (ADS)
Tema, E.; Zanella, E.; Pavón-Carrasco, F. J.; Kondopoulou, D.; Pavlides, S.
2015-10-01
We present the results of palaeomagnetic analysis on Late Bronge Age pottery from Santorini carried out in order to estimate the thermal effect of the Minoan eruption on the pre-Minoan habitation level. A total of 170 specimens from 108 ceramic fragments have been studied. The ceramics were collected from the surface of the pre-Minoan palaeosol at six different sites, including also samples from the Akrotiri archaeological site. The deposition temperatures of the first pyroclastic products have been estimated by the maximum overlap of the re-heating temperature intervals given by the individual fragments at site level. A new statistical elaboration of the temperature data has also been proposed, calculating at 95 per cent of probability the re-heating temperatures at each site. The obtained results show that the precursor tephra layer and the first pumice fall of the eruption were hot enough to re-heat the underlying ceramics at temperatures 160-230 °C in the non-inhabited sites while the temperatures recorded inside the Akrotiri village are slightly lower, varying from 130 to 200 °C. The decrease of the temperatures registered in the human settlements suggests that there was some interaction between the buildings and the pumice fallout deposits while probably the buildings debris layer caused by the preceding and syn-eruption earthquakes has also contributed to the decrease of the recorded re-heating temperatures.
On Interpreting Test Scores as Social Indicators: Statistical Considerations.
ERIC Educational Resources Information Center
Spencer, Bruce D.
1983-01-01
Because test scores are ordinal not cordinal attributes, the average test score often is a misleading way to summarize the scores of a group of individuals. Similarly, correlation coefficients may be misleading summary measures of association between test scores. Proper, readily interpretable, summary statistics are developed from a theory of…
Comparing survival curves using an easy to interpret statistic.
Hess, Kenneth R
2010-10-15
Here, I describe a statistic for comparing two survival curves that has a clear and obvious meaning and has a long history in biostatistics. Suppose we are comparing survival times associated with two treatments A and B. The statistic operates in such a way that if it takes on the value 0.95, then the interpretation is that a randomly chosen patient treated with A has a 95% chance of surviving longer than a randomly chosen patient treated with B. This statistic was first described in the 1950s, and was generalized in the 1960s to work with right-censored survival times. It is a useful and convenient measure for assessing differences between survival curves. Software for computing the statistic is readily available on the Internet. PMID:20732962
Hahn, A.A.
1994-11-01
The complexity of instrumentation sometimes requires data analysis to be done before the result is presented to the control room. This tutorial reviews some of the theoretical assumptions underlying the more popular forms of data analysis and presents simple examples to illuminate the advantages and hazards of different techniques.
Adapting internal statistical models for interpreting visual cues to depth
Seydell, Anna; Knill, David C.; Trommershäuser, Julia
2010-01-01
The informativeness of sensory cues depends critically on statistical regularities in the environment. However, statistical regularities vary between different object categories and environments. We asked whether and how the brain changes the prior assumptions about scene statistics used to interpret visual depth cues when stimulus statistics change. Subjects judged the slants of stereoscopically presented figures by adjusting a virtual probe perpendicular to the surface. In addition to stereoscopic disparities, the aspect ratio of the stimulus in the image provided a “figural compression” cue to slant, whose reliability depends on the distribution of aspect ratios in the world. As we manipulated this distribution from regular to random and back again, subjects’ reliance on the compression cue relative to stereoscopic cues changed accordingly. When we randomly interleaved stimuli from shape categories (ellipses and diamonds) with different statistics, subjects gave less weight to the compression cue for figures from the category with more random aspect ratios. Our results demonstrate that relative cue weights vary rapidly as a function of recently experienced stimulus statistics, and that the brain can use different statistical models for different object categories. We show that subjects’ behavior is consistent with that of a broad class of Bayesian learning models. PMID:20465321
Workplace Statistical Literacy for Teachers: Interpreting Box Plots
ERIC Educational Resources Information Center
Pierce, Robyn; Chick, Helen
2013-01-01
As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and, yet it is not clear that they always have the…
Pass-Fail Testing: Statistical Requirements and Interpretations
Gilliam, David; Leigh, Stefan; Rukhin, Andrew; Strawderman, William
2009-01-01
Performance standards for detector systems often include requirements for probability of detection and probability of false alarm at a specified level of statistical confidence. This paper reviews the accepted definitions of confidence level and of critical value. It describes the testing requirements for establishing either of these probabilities at a desired confidence level. These requirements are computable in terms of functions that are readily available in statistical software packages and general spreadsheet applications. The statistical interpretations of the critical values are discussed. A table is included for illustration, and a plot is presented showing the minimum required numbers of pass-fail tests. The results given here are applicable to one-sided testing of any system with performance characteristics conforming to a binomial distribution. PMID:27504221
Spina Bifida Data and Statistics
... Materials About Us Information For... Media Policy Makers Data and Statistics Recommend on Facebook Tweet Share Compartir ... non-Hispanic white and non-Hispanic black women. Data from 12 state-based birth defects tracking programs ...
Birth Defects Data and Statistics
... Websites About Us Information For... Media Policy Makers Data & Statistics Language: English Español (Spanish) Recommend on Facebook ... of birth defects in the United States. For data on specific birth defects, please visit the specific ...
[Big data in official statistics].
Zwick, Markus
2015-08-01
The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany. PMID:26077871
Statistical Interpretation of Key Comparison Reference Value and Degrees of Equivalence
Kacker, R. N.; Datla, R. U.; Parr, A. C.
2003-01-01
Key comparisons carried out by the Consultative Committees (CCs) of the International Committee of Weights and Measures (CIPM) or the Bureau International des Poids et Mesures (BIPM) are referred to as CIPM key comparisons. The outputs of a statistical analysis of the data from a CIPM key comparison are the key comparison reference value, the degrees of equivalence, and their associated uncertainties. The BIPM publications do not discuss statistical interpretation of these outputs. We discuss their interpretation under the following three statistical models: nonexistent laboratory-effects model, random laboratory-effects model, and systematic laboratory-effects model.
Statistical Interpretation of Natural and Technological Hazards in China
NASA Astrophysics Data System (ADS)
Borthwick, Alistair, ,, Prof.; Ni, Jinren, ,, Prof.
2010-05-01
China is prone to catastrophic natural hazards from floods, droughts, earthquakes, storms, cyclones, landslides, epidemics, extreme temperatures, forest fires, avalanches, and even tsunami. This paper will list statistics related to the six worst natural disasters in China over the past 100 or so years, ranked according to number of fatalities. The corresponding data for the six worst natural disasters in China over the past decade will also be considered. [The data are abstracted from the International Disaster Database, Centre for Research on the Epidemiology of Disasters (CRED), Université Catholique de Louvain, Brussels, Belgium, http://www.cred.be/ where a disaster is defined as occurring if one of the following criteria is fulfilled: 10 or more people reported killed; 100 or more people reported affected; a call for international assistance; or declaration of a state of emergency.] The statistics include the number of occurrences of each type of natural disaster, the number of deaths, the number of people affected, and the cost in billions of US dollars. Over the past hundred years, the largest disasters may be related to the overabundance or scarcity of water, and to earthquake damage. However, there has been a substantial relative reduction in fatalities due to water related disasters over the past decade, even though the overall numbers of people affected remain huge, as does the economic damage. This change is largely due to the efforts put in by China's water authorities to establish effective early warning systems, the construction of engineering countermeasures for flood protection, the implementation of water pricing and other measures for reducing excessive consumption during times of drought. It should be noted that the dreadful death toll due to the Sichuan Earthquake dominates recent data. Joint research has been undertaken between the Department of Environmental Engineering at Peking University and the Department of Engineering Science at Oxford
STATISTICAL SAMPLING AND DATA ANALYSIS
Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...
The interpretation of spectral data
NASA Technical Reports Server (NTRS)
Holter, M. R.
1972-01-01
The characteristics and extent of data which is obtainable by electromagnetic spectrum sensing and the application to earth resources survey are discussed. The wavelength and frequency ranges of operation for various remote sensors are tabulated. The spectral sensitivities of various sensing instruments are diagrammed. Examples of aerial photography to show the effects of lighting and seasonal variations on earth resources data are provided. Specific examples of multiband photography and multispectral imagery to crop analysis are included.
Interpreting health statistics for policymaking: the story behind the headlines.
Walker, Neff; Bryce, Jennifer; Black, Robert E
2007-03-17
Politicians, policymakers, and public-health professionals make complex decisions on the basis of estimates of disease burden from different sources, many of which are "marketed" by skilled advocates. To help people who rely on such statistics make more informed decisions, we explain how health estimates are developed, and offer basic guidance on how to assess and interpret them. We describe the different levels of estimates used to quantify disease burden and its correlates; understanding how closely linked a type of statistic is to disease and death rates is crucial in designing health policies and programmes. We also suggest questions that people using such statistics should ask and offer tips to help separate advocacy from evidence-based positions. Global health agencies have a key role in communicating robust estimates of disease, as do policymakers at national and subnational levels where key public-health decisions are made. A common framework and standardised methods, building on the work of Child Health Epidemiology Reference Group (CHERG) and others, are urgently needed. PMID:17368157
Structural interpretation of seismic data and inherent uncertainties
NASA Astrophysics Data System (ADS)
Bond, Clare
2013-04-01
Geoscience is perhaps unique in its reliance on incomplete datasets and building knowledge from their interpretation. This interpretation basis for the science is fundamental at all levels; from creation of a geological map to interpretation of remotely sensed data. To teach and understand better the uncertainties in dealing with incomplete data we need to understand the strategies individual practitioners deploy that make them effective interpreters. The nature of interpretation is such that the interpreter needs to use their cognitive ability in the analysis of the data to propose a sensible solution in their final output that is both consistent not only with the original data but also with other knowledge and understanding. In a series of experiments Bond et al. (2007, 2008, 2011, 2012) investigated the strategies and pitfalls of expert and non-expert interpretation of seismic images. These studies focused on large numbers of participants to provide a statistically sound basis for analysis of the results. The outcome of these experiments showed that a wide variety of conceptual models were applied to single seismic datasets. Highlighting not only spatial variations in fault placements, but whether interpreters thought they existed at all, or had the same sense of movement. Further, statistical analysis suggests that the strategies an interpreter employs are more important than expert knowledge per se in developing successful interpretations. Experts are successful because of their application of these techniques. In a new set of experiments a small number of experts are focused on to determine how they use their cognitive and reasoning skills, in the interpretation of 2D seismic profiles. Live video and practitioner commentary were used to track the evolving interpretation and to gain insight on their decision processes. The outputs of the study allow us to create an educational resource of expert interpretation through online video footage and commentary with
Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1990-01-01
A prototype of an expert system was developed which applies qualitative or model-based reasoning to the task of post-test analysis and diagnosis of data resulting from a rocket engine firing. A combined component-based and process theory approach is adopted as the basis for system modeling. Such an approach provides a framework for explaining both normal and deviant system behavior in terms of individual component functionality. The diagnosis function is applied to digitized sensor time-histories generated during engine firings. The generic system is applicable to any liquid rocket engine but was adapted specifically in this work to the Space Shuttle Main Engine (SSME). The system is applied to idealized data resulting from turbomachinery malfunction in the SSME.
The broad topic of biomarker research has an often-overlooked component: the documentation and interpretation of the surrounding chemical environment and other meta-data, especially from visualization, analytical, and statistical perspectives (Pleil et al. 2014; Sobus et al. 2011...
Statistics by Example, Exploring Data.
ERIC Educational Resources Information Center
Mosteller, Frederick; And Others
Part of a series of four pamphlets providing real-life problems in probability and statistics for the secondary school level, this booklet shows how to organize data in tables and graphs in order to get and to exhibit messages. Elementary probability concepts are also introduced. Fourteen different problem situations arising from biology,…
STATISTICS AND DATA ANALYSIS WORKSHOP
On Janauary 15 and 16, 2003, a workshop for Tribal water resources staff on Statistics and Data Analysis was held at the Indian Springs Lodge on the Forest County Potowatomi Reservation near Wabeno, WI. The workshop was co-sponsored by the EPA, Sokaogon Chippewa (Mole Lake) Comm...
Interpretation of data from uphole refraction surveys
NASA Astrophysics Data System (ADS)
Franklin, A. G.
1980-06-01
The conventional interpretation of the data from an uphole refraction survey is based on the similarity between a plot of contours drawn on uphole arrival times and a wave-front diagram, which shows successive positions of the wave front produced by a single shot location at the ground surface. However, the two are alike only when the ground consists solely of homogeneous strata, oriented either horizontally or vertically. In this report, the term 'Meissner diagram' is used for the plot of arrival times from the uphole refraction survey in order to maintain the distinction between it and a true wave-front diagram. Where departures from the case of homogeneous, horizontal strata exist, the interpretation of the Meissner diagram is not straightforward, although a partial interpretation in terms of a horizontally stratified system is usually possible. A systematic approach to the interpretation problem, making use of such a partial interpretation, is proposed.
Interpretation of gamma-ray burst source count statistics
NASA Technical Reports Server (NTRS)
Petrosian, Vahe
1993-01-01
Ever since the discovery of gamma-ray bursts, the so-called log N-log S relation has been used for determination of their distances and distribution. This task has not been straightforward because of varying thresholds for the detection of bursts. Most of the current analyses of these data are couched in terms of ambiguous distributions, such as the distribution of Cp/Clim, the ratio of peak to threshold photon count rates, or the distribution of V/Vmax = (Cp/Clim) exp -3/2. It is shown that these distributions are not always a true reflection of the log N-log S relation. Some kind of deconvolution is required for obtaining the true log N-log S. Therefore, care is required in the interpretation of results of such analyses. A new method of analysis of these data is described, whereby the bivariate distribution of Cp and Clim is obtained directly from the data.
Statistical issues in the design, analysis and interpretation of animal carcinogenicity studies.
Haseman, J K
1984-01-01
Statistical issues in the design, analysis and interpretation of animal carcinogenicity studies are discussed. In the area of experimental design, issues that must be considered include randomization of animals, sample size considerations, dose selection and allocation of animals to experimental groups, and control of potentially confounding factors. In the analysis of tumor incidence data, survival differences among groups should be taken into account. It is important to try to distinguish between tumors that contribute to the death of the animal and "incidental" tumors discovered at autopsy in an animal dying of an unrelated cause. Life table analyses (appropriate for lethal tumors) and incidental tumor tests (appropriate for nonfatal tumors) are described, and the utilization of these procedures by the National Toxicology Program is discussed. Despite the fact that past interpretations of carcinogenicity data have tended to focus on pairwise comparisons in general and high-dose effects in particular, the importance of trend tests should not be overlooked, since these procedures are more sensitive than pairwise comparisons to the detection of carcinogenic effects. No rigid statistical "decision rule" should be employed in the interpretation of carcinogenicity data. Although the statistical significance of an observed tumor increase is perhaps the single most important piece of evidence used in the evaluation process, a number of biological factors must also be taken into account. The use of historical control data, the false-positive issue and the interpretation of negative trends are also discussed. PMID:6525993
Analysis of Visual Interpretation of Satellite Data
NASA Astrophysics Data System (ADS)
Svatonova, H.
2016-06-01
Millions of people of all ages and expertise are using satellite and aerial data as an important input for their work in many different fields. Satellite data are also gradually finding a new place in education, especially in the fields of geography and in environmental issues. The article presents the results of an extensive research in the area of visual interpretation of image data carried out in the years 2013 - 2015 in the Czech Republic. The research was aimed at comparing the success rate of the interpretation of satellite data in relation to a) the substrates (to the selected colourfulness, the type of depicted landscape or special elements in the landscape) and b) to selected characteristics of users (expertise, gender, age). The results of the research showed that (1) false colour images have a slightly higher percentage of successful interpretation than natural colour images, (2) colourfulness of an element expected or rehearsed by the user (regardless of the real natural colour) increases the success rate of identifying the element (3) experts are faster in interpreting visual data than non-experts, with the same degree of accuracy of solving the task, and (4) men and women are equally successful in the interpretation of visual image data.
Interpreting the flock algorithm from a statistical perspective.
Anderson, Eric C; Barry, Patrick D
2015-09-01
We show that the algorithm in the program flock (Duchesne & Turgeon 2009) can be interpreted as an estimation procedure based on a model essentially identical to the structure (Pritchard et al. 2000) model with no admixture and without correlated allele frequency priors. Rather than using MCMC, the flock algorithm searches for the maximum a posteriori estimate of this structure model via a simulated annealing algorithm with a rapid cooling schedule (namely, the exponent on the objective function →∞). We demonstrate the similarities between the two programs in a two-step approach. First, to enable rapid batch processing of many simulated data sets, we modified the source code of structure to use the flock algorithm, producing the program flockture. With simulated data, we confirmed that results obtained with flock and flockture are very similar (though flockture is some 200 times faster). Second, we simulated multiple large data sets under varying levels of population differentiation for both microsatellite and SNP genotypes. We analysed them with flockture and structure and assessed each program on its ability to cluster individuals to their correct subpopulation. We show that flockture yields results similar to structure albeit with greater variability from run to run. flockture did perform better than structure when genotypes were composed of SNPs and differentiation was moderate (FST= 0.022-0.032). When differentiation was low, structure outperformed flockture for both marker types. On large data sets like those we simulated, it appears that flock's reliance on inference rules regarding its 'plateau record' is not helpful. Interpreting flock's algorithm as a special case of the model in structure should aid in understanding the program's output and behaviour. PMID:25913195
Statistically significant relational data mining :
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.
2014-02-01
This report summarizes the work performed under the project (3z(BStatitically significant relational data mining.(3y (BThe goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concetrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second are statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Spatial Statistical Data Fusion (SSDF)
NASA Technical Reports Server (NTRS)
Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel
2013-01-01
As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, tech nological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect are different. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field, and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy, spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values, over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions, and is also computationally much faster. This method is
Statistical analysis of pyroshock data
NASA Astrophysics Data System (ADS)
Hughes, William O.
2002-05-01
The sample size of aerospace pyroshock test data is typically small. This often forces the engineer to make assumptions on its population distribution and to use conservative margins or methodologies in determining shock specifications. For example, the maximum expected environment is often derived by adding 3-6 dB to the maximum envelope of a limited amount of shock data. The recent availability of a large amount of pyroshock test data has allowed a rare statistical analysis to be performed. Findings and procedures from this analysis will be explained, including information on population distributions, procedures to properly combine families of test data, and methods of deriving appropriate shock specifications for a multipoint shock source.
A plug-and-play approach to automated data interpretation: the data interpretation module (DIM)
Hartog, B.K.D.; Elling, J.W.; Mniszewski, S.M.
1995-12-31
The Contaminant Analysis Automation (CAA) Project`s automated analysis laboratory provides a ``plug-and-play`` reusable infrastructure for many types of environmental assays. As a sample progresses through sample preparation to sample analysis and finally to data interpretation, increasing expertise and judgment are needed at each step. The Data Interpretation Module (DIM) echoes the automation`s plug-and-play philosophy as a reusable engine and architecture for handling both the uncertainty and knowledge required for interpreting contaminant sample data. This presentation describes the implementation and performance of the DIM in interpreting polychlorinated biphenyl (PCB) gas chromatogram and shows the DIM architecture`s reusability for other applications.
Data Interpretation in the Digital Age
Leonelli, Sabina
2014-01-01
The consultation of internet databases and the related use of computer software to retrieve, visualise and model data have become key components of many areas of scientific research. This paper focuses on the relation of these developments to understanding the biology of organisms, and examines the conditions under which the evidential value of data posted online is assessed and interpreted by the researchers who access them, in ways that underpin and guide the use of those data to foster discovery. I consider the types of knowledge required to interpret data as evidence for claims about organisms, and in particular the relevance of knowledge acquired through physical interaction with actual organisms to assessing the evidential value of data found online. I conclude that familiarity with research in vivo is crucial to assessing the quality and significance of data visualised in silico; and that studying how biological data are disseminated, visualised, assessed and interpreted in the digital age provides a strong rationale for viewing scientific understanding as a social and distributed, rather than individual and localised, achievement. PMID:25729262
de Irala, J; Fernandez-Crehuet Navajas, R; Serrano del Castillo, A
1997-03-01
This study describes the behavior of eight statistical programs (BMDP, EGRET, JMP, SAS, SPSS, STATA, STATISTIX, and SYSTAT) when performing a logistic regression with a simulated data set that contains a numerical problem created by the presence of a cell value equal to zero. The programs respond in different ways to this problem. Most of them give a warning, although many simultaneously present incorrect results, among which are confidence intervals that tend toward infinity. Such results can mislead the user. Various guidelines are offered for detecting these problems in actual analyses, and users are reminded of the importance of critical interpretation of the results of statistical programs. PMID:9162592
Data compression preserving statistical independence
NASA Technical Reports Server (NTRS)
Morduch, G. E.; Rice, W. M.
1973-01-01
The purpose of this study was to determine the optimum points of evaluation of data compressed by means of polynomial smoothing. It is shown that a set y of m statistically independent observations Y(t sub 1), Y(t sub 2), ... Y(t sub m) of a quantity X(t), which can be described by a (n-1)th degree polynomial in time, may be represented by a set Z of n statistically independent compressed observations Z (tau sub 1), Z (tau sub 2),...Z (tau sub n), such that The compressed set Z has the same information content as the observed set Y. the times tau sub 1, tau sub 2,.. tau sub n are the zeros of an nth degree polynomial P sub n, to whose definition and properties the bulk of this report is devoted. The polynomials P sub n are defined as functions of the observation times t sub 1, t sub 2,.. t sub n, and it is interesting to note that if the observation times are continuously distributed the polynomials P sub n degenerate to legendre polynomials. The proposed data compression scheme is a little more complex than those usually employed, but has the advantage of preserving all the information content of the original observations.
Interpreting genomic data via entropic dissection
Azad, Rajeev K.; Li, Jing
2013-01-01
Since the emergence of high-throughput genome sequencing platforms and more recently the next-generation platforms, the genome databases are growing at an astronomical rate. Tremendous efforts have been invested in recent years in understanding intriguing complexities beneath the vast ocean of genomic data. This is apparent in the spurt of computational methods for interpreting these data in the past few years. Genomic data interpretation is notoriously difficult, partly owing to the inherent heterogeneities appearing at different scales. Methods developed to interpret these data often suffer from their inability to adequately measure the underlying heterogeneities and thus lead to confounding results. Here, we present an information entropy-based approach that unravels the distinctive patterns underlying genomic data efficiently and thus is applicable in addressing a variety of biological problems. We show the robustness and consistency of the proposed methodology in addressing three different biological problems of significance—identification of alien DNAs in bacterial genomes, detection of structural variants in cancer cell lines and alignment-free genome comparison. PMID:23036836
Regional interpretation of Kansas aeromagnetic data
Yarger, H.L.
1982-01-01
The aeromagnetic mapping techniques used in a regional aeromagnetic survey of the state are documented and a qualitative regional interpretation of the magnetic basement is presented. Geothermal gradients measured and data from oil well records indicate that geothermal resources in Kansas are of a low-grade nature. However, considerable variation in the gradient is noted statewide within the upper 500 meters of the sedimentary section; this suggests the feasibility of using groundwater for space heating by means of heat pumps.
Using Statistics to Lie, Distort, and Abuse Data
ERIC Educational Resources Information Center
Bintz, William; Moore, Sara; Adams, Cheryll; Pierce, Rebecca
2009-01-01
Statistics is a branch of mathematics that involves organization, presentation, and interpretation of data, both quantitative and qualitative. Data do not lie, but people do. On the surface, quantitative data are basically inanimate objects, nothing more than lifeless and meaningless symbols that appear on a page, calculator, computer, or in one's…
The Lure of Statistics in Data Mining
ERIC Educational Resources Information Center
Grover, Lovleen Kumar; Mehra, Rajni
2008-01-01
The field of Data Mining like Statistics concerns itself with "learning from data" or "turning data into information". For statisticians the term "Data mining" has a pejorative meaning. Instead of finding useful patterns in large volumes of data as in the case of Statistics, data mining has the connotation of searching for data to fit preconceived…
Statistical Interpretation of the Local Field Inside Dielectrics.
ERIC Educational Resources Information Center
Berrera, Ruben G.; Mello, P. A.
1982-01-01
Compares several derivations of the Clausius-Mossotti relation to analyze consistently the nature of approximations used and their range of applicability. Also presents a statistical-mechanical calculation of the local field for classical system of harmonic oscillators interacting via the Coulomb potential. (Author/SK)
Confounded Statistical Analyses Hinder Interpretation of the NELP Report
ERIC Educational Resources Information Center
Paris, Scott G.; Luo, Serena Wenshu
2010-01-01
The National Early Literacy Panel (2008) report identified early predictors of reading achievement as good targets for instruction, and many of those skills are related to decoding. In this article, the authors suggest that the developmental trajectories of rapidly developing skills pose problems for traditional statistical analyses. Rapidly…
Statistical characteristics of MST radar echoes and its interpretation
NASA Technical Reports Server (NTRS)
Woodman, Ronald F.
1989-01-01
Two concepts of fundamental importance are reviewed: the autocorrelation function and the frequency power spectrum. In addition, some turbulence concepts, the relationship between radar signals and atmospheric medium statistics, partial reflection, and the characteristics of noise and clutter interference are discussed.
Interpretation of Quantitative Shotgun Proteomic Data.
Aasebø, Elise; Berven, Frode S; Selheim, Frode; Barsnes, Harald; Vaudel, Marc
2016-01-01
In quantitative proteomics, large lists of identified and quantified proteins are used to answer biological questions in a systemic approach. However, working with such extensive datasets can be challenging, especially when complex experimental designs are involved. Here, we demonstrate how to post-process large quantitative datasets, detect proteins of interest, and annotate the data with biological knowledge. The protocol presented can be achieved without advanced computational knowledge thanks to the user-friendly Perseus interface (available from the MaxQuant website, www.maxquant.org ). Various visualization techniques facilitating the interpretation of quantitative results in complex biological systems are also highlighted. PMID:26700055
Interpreting magnetic data by integral moments
NASA Astrophysics Data System (ADS)
Tontini, F. Caratori; Pedersen, L. B.
2008-09-01
The use of the integral moments for interpreting magnetic data is based on a very elegant property of potential fields, but in the past it has not been completely exploited due to problems concerning real data. We describe a new 3-D development of previous 2-D results aimed at determining the magnetization direction, extending the calculation to second-order moments to recover the centre of mass of the magnetization distribution. The method is enhanced to reduce the effects of the regional field that often alters the first-order solutions. Moreover, we introduce an iterative correction to properly assess the errors coming from finite-size surveys or interaction with neighbouring anomalies, which are the most important causes of the failing of the method for real data. We test the method on some synthetic examples, and finally, we show the results obtained by analysing the aeromagnetic anomaly of the Monte Vulture volcano in Southern Italy.
Need for Caution in Interpreting Extreme Weather Statistics
NASA Astrophysics Data System (ADS)
Sardeshmukh, P. D.; Compo, G. P.; Penland, M. C.
2011-12-01
Given the substantial anthropogenic contribution to 20th century global warming, it is tempting to seek an anthropogenic component in any unusual recent weather event, or more generally in any observed change in the statistics of extreme weather. This study cautions that such detection and attribution efforts may, however, very likely lead to wrong conclusions if the non-Gaussian aspects of the probability distributions of observed daily atmospheric variations, especially their skewness and heavy tails, are not explicitly taken into account. Departures of three or more standard deviations from the mean, although rare, are far more common in such a non-Gaussian world than they are in a Gaussian world. This exacerbates the already difficult problem of establishing the significance of changes in extreme value probabilities from historical climate records of limited length, using either raw histograms or Generalized Extreme Value (GEV) distributions fitted to the sample extreme values. A possible solution is suggested by the fact that the non-Gaussian aspects of the observed distributions are well captured by a general class of "Stochastically Generated Skewed distributions" (SGS distributions) recently introduced in the meteorological literature by Sardeshmukh and Sura (J. Climate 2009). These distributions arise from simple modifications to a red noise process and reduce to Gaussian distributions under appropriate limits. As such, they represent perhaps the simplest physically based non-Gaussian prototypes of the distributions of daily atmospheric variations. Fitting such SGS distributions to all (not just the extreme) values in 25, 50, or 100-yr daily records also yields corresponding extreme value distributions that are much less prone to sampling uncertainty than GEV distributions. For both of the above reasons, SGS distributions provide an attractive alternative for assessing the significance of changes in extreme weather statistics (including changes in the
Aerosol backscatter lidar calibration and data interpretation
NASA Technical Reports Server (NTRS)
Kavaya, M. J.; Menzies, R. T.
1984-01-01
A treatment of the various factors involved in lidar data acquisition and analysis is presented. This treatment highlights sources of fundamental, systematic, modeling, and calibration errors that may affect the accurate interpretation and calibration of lidar aerosol backscatter data. The discussion primarily pertains to ground based, pulsed CO2 lidars that probe the troposphere and are calibrated using large, hard calibration targets. However, a large part of the analysis is relevant to other types of lidar systems such as lidars operating at other wavelengths; continuous wave (CW) lidars; lidars operating in other regions of the atmosphere; lidars measuring nonaerosol elastic or inelastic backscatter; airborne or Earth-orbiting lidar platforms; and lidars employing combinations of the above characteristics.
Data interpretation in the Automated Laboratory
Klatt, L.N.; Elling, J.W.; Mniszewski, S.
1995-12-01
The Contaminant Analysis Automation project envisions the analytical chemistry laboratory of the future being assembled from automation submodules that can be integrated into complete analysis system through a plug-and-play strategy. In this automated system the reduction of instrumental data to knowledge required by the laboratory customer must also be accomplished in an automated way. This paper presents the concept of an automated Data Interpretation Module (DIM) within the context of the plug-and-play automation strategy. The DIM is an expert system driven software module. The DIM functions as a standard laboratory module controlled by the system task sequence controller. The DIM consists of knowledge base(s) that accomplish the data assessment, quality control, and data analysis tasks. The expert system knowledge base(s) encapsulate the training and experience of the analytical chemist. Analysis of instrumental data by the DIM requires the use of pattern recognition techniques. Laboratory data from the analysis of PCBs will be used to illustrate the DIM.
DATA ON YOUTH, 1967, A STATISTICAL DOCUMENT.
ERIC Educational Resources Information Center
SCHEIDER, GEORGE
THE DATA IN THIS REPORT ARE STATISTICS ON YOUTH THROUGHOUT THE UNITED STATES AND IN NEW YORK STATE. INCLUDED ARE DATA ON POPULATION, SCHOOL STATISTICS, EMPLOYMENT, FAMILY INCOME, JUVENILE DELINQUENCY AND YOUTH CRIME (INCLUDING NEW YORK CITY FIGURES), AND TRAFFIC ACCIDENTS. THE STATISTICS ARE PRESENTED IN THE TEXT AND IN TABLES AND CHARTS. (NH)
Tools for interpretation of multispectral data
NASA Astrophysics Data System (ADS)
Speckert, Glen; Carpenter, Loren C.; Russell, Mike; Bradstreet, John; Waite, Tom; Conklin, Charlie
1990-08-01
The large size and multiple bands of todays satellite data require increasingly powerful tools in order to display and interpret the acquired imagery in a timely fashion. Pixar has developed two major tools for use in this data interpretation. These tools are the Electronic Light Table (ELT), and an extensive image processing package, ChapiP. These tools operate on images limited only by disk volume size, currently 3 Gbytes. The Electronic Light Table package provides a fully windowed interface to these large 12 bit monochrome and multiband images, passing images through a software defined image interpretation pipeline in real time during an interactive roam. A virtual image software framework allows interactive modification of the visible image. The roam software pipeline consists of a seventh order polynomial warp, bicubic resampling, a user registration affine, histogram drop sampling, a 5x5 unsharp mask, and per window contrast controls. It is important to note that these functions are done in software, and various performance tradeoffs can be made for different applications within a family of hardware configurations. Special high spped zoom, rotate, sharpness, and contrast operators provide interactive region of interest manipulation. Double window operators provide for flicker, fade, shade, and difference of two parent windows in a chained fashion. Overlay graphics capability is provided in a PostScfipt* windowed environment (NeWS**). The image is stored on disk as a multi resolution image pyramid. This allows resampling and other image operations independent of the zoom level. A set of tools layered upon ChapIP allow manipulation of the entire pyramid file. Arbitrary combinations of bands can be computed for arbitrary sized images, as well as other image processing operations. ChapIP can also be used in conjunction with ELT to dynamically operate on the current roaming window to append the image processing function onto the roam pipeline. Multiple Chapi
Laterally constrained inversion for CSAMT data interpretation
NASA Astrophysics Data System (ADS)
Wang, Ruo; Yin, Changchun; Wang, Miaoyue; Di, Qingyun
2015-10-01
Laterally constrained inversion (LCI) has been successfully applied to the inversion of dc resistivity, TEM and airborne EM data. However, it hasn't been yet applied to the interpretation of controlled-source audio-frequency magnetotelluric (CSAMT) data. In this paper, we apply the LCI method for CSAMT data inversion by preconditioning the Jacobian matrix. We apply a weighting matrix to Jacobian to balance the sensitivity of model parameters, so that the resolution with respect to different model parameters becomes more uniform. Numerical experiments confirm that this can improve the convergence of the inversion. We first invert a synthetic dataset with and without noise to investigate the effect of LCI applications to CSAMT data, for the noise free data, the results show that the LCI method can recover the true model better compared to the traditional single-station inversion; and for the noisy data, the true model is recovered even with a noise level of 8%, indicating that LCI inversions are to some extent noise insensitive. Then, we re-invert two CSAMT datasets collected respectively in a watershed and a coal mine area in Northern China and compare our results with those from previous inversions. The comparison with the previous inversion in a coal mine shows that LCI method delivers smoother layer interfaces that well correlate to seismic data, while comparison with a global searching algorithm of simulated annealing (SA) in a watershed shows that though both methods deliver very similar good results, however, LCI algorithm presented in this paper runs much faster. The inversion results for the coal mine CSAMT survey show that a conductive water-bearing zone that was not revealed by the previous inversions has been identified by the LCI. This further demonstrates that the method presented in this paper works for CSAMT data inversion.
A data-management system for detailed areal interpretive data
Ferrigno, C.F.
1986-01-01
A data storage and retrieval system has been developed to organize and preserve areal interpretive data. This system can be used by any study where there is a need to store areal interpretive data that generally is presented in map form. This system provides the capability to grid areal interpretive data for input to groundwater flow models at any spacing and orientation. The data storage and retrieval system is designed to be used for studies that cover small areas such as counties. The system is built around a hierarchically structured data base consisting of related latitude-longitude blocks. The information in the data base can be stored at different levels of detail, with the finest detail being a block of 6 sec of latitude by 6 sec of longitude (approximately 0.01 sq mi). This system was implemented on a mainframe computer using a hierarchical data base management system. The computer programs are written in Fortran IV and PL/1. The design and capabilities of the data storage and retrieval system, and the computer programs that are used to implement the system are described. Supplemental sections contain the data dictionary, user documentation of the data-system software, changes that would need to be made to use this system for other studies, and information on the computer software tape. (Lantz-PTT)
Polarimetric radar data decomposition and interpretation
NASA Technical Reports Server (NTRS)
Sun, Guoqing; Ranson, K. Jon
1993-01-01
Significant efforts have been made to decompose polarimetric radar data into several simple scattering components. The components which are selected because of their physical significance can be used to classify SAR (Synthetic Aperture Radar) image data. If particular components can be related to forest parameters, inversion procedures may be developed to estimate these parameters from the scattering components. Several methods have been used to decompose an averaged Stoke's matrix or covariance matrix into three components representing odd (surface), even (double-bounce) and diffuse (volume) scatterings. With these decomposition techniques, phenomena, such as canopy-ground interactions, randomness of orientation, and size of scatters can be examined from SAR data. In this study we applied the method recently reported by van Zyl (1992) to decompose averaged backscattering covariance matrices extracted from JPL SAR images over forest stands in Maine, USA. These stands are mostly mixed stands of coniferous and deciduous trees. Biomass data have been derived from field measurements of DBH and tree density using allometric equations. The interpretation of the decompositions and relationships with measured stand biomass are presented in this paper.
Data Torturing and the Misuse of Statistical Tools
Abate, Marcey L.
1999-08-16
Statistical concepts, methods, and tools are often used in the implementation of statistical thinking. Unfortunately, statistical tools are all too often misused by not applying them in the context of statistical thinking that focuses on processes, variation, and data. The consequences of this misuse may be ''data torturing'' or going beyond reasonable interpretation of the facts due to a misunderstanding of the processes creating the data or the misinterpretation of variability in the data. In the hope of averting future misuse and data torturing, examples are provided where the application of common statistical tools, in the absence of statistical thinking, provides deceptive results by not adequately representing the underlying process and variability. For each of the examples, a discussion is provided on how applying the concepts of statistical thinking may have prevented the data torturing. The lessons learned from these examples will provide an increased awareness of the potential for many statistical methods to mislead and a better understanding of how statistical thinking broadens and increases the effectiveness of statistical tools.
Michigan Library Statistical Report, 1999 Edition. Reporting 1998 Statistical Data.
ERIC Educational Resources Information Center
Krefman, Naomi, Comp.; Dwyer, Molly, Comp.; Krueger, Beth, Comp.
This statistical report on Michigan's libraries presents data provided by academic libraries, public libraries, public library cooperatives, and those public libraries that that serve as regional or subregional outlets to provide services for blind and physically handicapped patrons. For academic libraries, data are compiled from the 1998 academic…
Data Integration for Interpretation of Near-Surface Geophysical Tomograms
NASA Astrophysics Data System (ADS)
Day-Lewis, F. D.; Singha, K.
2007-12-01
Traditionally, interpretation of geophysical tomograms for geologic structure or engineering properties has been either qualitative, or based on petrophysical or statistical mapping to convert tomograms of the geophysical parameter (e.g., seismic velocity, radar velocity, or electrical conductivity) to some hydraulic parameter or engineering property of interest (e.g., hydraulic conductivity, porosity, or shear strength). Standard approaches to petrophysical and statistical mapping do not account for variable geophysical resolution, and thus it is difficult to obtain reliable, quantitative estimates of hydrologic properties or to characterize hydrologic processes in situ. Recent research to understand the limitations of tomograms for quantitative estimation points to the need for data integration. We divide near-surface geophysical data integration into two categories: 'inversion-based' and 'post- inversion' approaches. The first category includes 'informed-inversion' strategies that integrate complementary information in the form of prior information; constraints; physically-based regularization or parameterization; or coupled inversion. Post-inversion approaches include probabilistic frameworks to map tomograms to models of engineering properties, while accounting for geophysical resolution, survey design, heterogeneity, and physical models for hydrologic processes. Here, we review recent research demonstrating the need for, and advantages of, data integration. We present examples of both inversion-based and post-inversion data integration to reduce uncertainty, improve interpretation of near-surface geophysical results, and produce more reliable predictive models.
Recent statistical methods for orientation data
NASA Technical Reports Server (NTRS)
Batschelet, E.
1972-01-01
The application of statistical methods for determining the areas of animal orientation and navigation are discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations and tables of data are developed to show the value of information obtained by statistical analysis.
Impact of Equity Models and Statistical Measures on Interpretations of Educational Reform
ERIC Educational Resources Information Center
Rodriguez, Idaykis; Brewe, Eric; Sawtelle, Vashti; Kramer, Laird H.
2012-01-01
We present three models of equity and show how these, along with the statistical measures used to evaluate results, impact interpretation of equity in education reform. Equity can be defined and interpreted in many ways. Most equity education reform research strives to achieve equity by closing achievement gaps between groups. An example is given…
Distributed data collection for a database of radiological image interpretations
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.
1997-01-01
The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window based 2048 X 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.
Shafieloo, Arman
2012-05-01
By introducing Crossing functions and hyper-parameters I show that the Bayesian interpretation of the Crossing Statistics [1] can be used trivially for the purpose of model selection among cosmological models. In this approach to falsify a cosmological model there is no need to compare it with other models or assume any particular form of parametrization for the cosmological quantities like luminosity distance, Hubble parameter or equation of state of dark energy. Instead, hyper-parameters of Crossing functions perform as discriminators between correct and wrong models. Using this approach one can falsify any assumed cosmological model without putting priors on the underlying actual model of the universe and its parameters, hence the issue of dark energy parametrization is resolved. It will be also shown that the sensitivity of the method to the intrinsic dispersion of the data is small that is another important characteristic of the method in testing cosmological models dealing with data with high uncertainties.
Statistics for characterizing data on the periphery
Theiler, James P; Hush, Donald R
2010-01-01
We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
NATIONAL VITAL STATISTICS SYSTEM - MORTALITY DATA
In the United States, State laws require death certificates to be completed for all deaths, and Federal law mandates national collection and publication of deaths and other vital statistics data. The National Vital Statistics System, the Federal compilation of this data, is the r...
Barber, Chris; Cayley, Alex; Hanser, Thierry; Harding, Alex; Heghes, Crina; Vessey, Jonathan D; Werner, Stephane; Weiner, Sandy K; Wichard, Joerg; Giddings, Amanda; Glowienke, Susanne; Parenty, Alexis; Brigo, Alessandro; Spirkl, Hans-Peter; Amberg, Alexander; Kemper, Ray; Greene, Nigel
2016-04-01
The relative wealth of bacterial mutagenicity data available in the public literature means that in silico quantitative/qualitative structure activity relationship (QSAR) systems can readily be built for this endpoint. A good means of evaluating the performance of such systems is to use private unpublished data sets, which generally represent a more distinct chemical space than publicly available test sets and, as a result, provide a greater challenge to the model. However, raw performance metrics should not be the only factor considered when judging this type of software since expert interpretation of the results obtained may allow for further improvements in predictivity. Enough information should be provided by a QSAR to allow the user to make general, scientifically-based arguments in order to assess and overrule predictions when necessary. With all this in mind, we sought to validate the performance of the statistics-based in vitro bacterial mutagenicity prediction system Sarah Nexus (version 1.1) against private test data sets supplied by nine different pharmaceutical companies. The results of these evaluations were then analysed in order to identify findings presented by the model which would be useful for the user to take into consideration when interpreting the results and making their final decision about the mutagenic potential of a given compound. PMID:26708083
Statistical analysis of scintillation data
Chua, S.; Noonan, J.P.; Basu, S.
1981-09-01
The Nakagami-m distribution has traditionally been used successfully to model the probability characteristics of ionospheric scintillations at UHF. This report investigates the distribution properties of scintillation data in the L-band range. Specifically, the appropriateness of the Nakagami-m and lognormal distributions is tested. Briefly the results confirm that the Nakagami-m is appropriate for UHF but not for L-band scintillations. The lognormal provides a better fit to the distribution of L-band scintillations and is an adequate model allowing for an error of + or - 0.1 or smaller in predicted probability with a sample size of 256.
... to Other Websites Information For... Media Policy Makers Data & Statistics Language: English Español (Spanish) Recommend on Facebook Tweet Share Compartir * The data on this page are from the article, “Venous ...
Data Mining: Going beyond Traditional Statistics
ERIC Educational Resources Information Center
Zhao, Chun-Mei; Luan, Jing
2006-01-01
The authors provide an overview of data mining, giving special attention to the relationship between data mining and statistics to unravel some misunderstandings about the two techniques. (Contains 1 figure.)
Data explorer: a prototype expert system for statistical analysis.
Aliferis, C.; Chao, E.; Cooper, G. F.
1993-01-01
The inadequate analysis of medical research data, due mainly to the unavailability of local statistical expertise, seriously jeopardizes the quality of new medical knowledge. Data Explorer is a prototype Expert System that builds on the versatility and power of existing statistical software, to provide automatic analyses and interpretation of medical data. The system draws much of its power by using belief network methods in place of more traditional, but difficult to automate, classical multivariate statistical techniques. Data Explorer identifies statistically significant relationships among variables, and using power-size analysis, belief network inference/learning and various explanatory techniques helps the user understand the importance of the findings. Finally the system can be used as a tool for the automatic development of predictive/diagnostic models from patient databases. PMID:8130501
NASA Astrophysics Data System (ADS)
Karuppiah, R.; Faldi, A.; Laurenzi, I.; Usadi, A.; Venkatesh, A.
2014-12-01
An increasing number of studies are focused on assessing the environmental footprint of different products and processes, especially using life cycle assessment (LCA). This work shows how combining statistical methods and Geographic Information Systems (GIS) with environmental analyses can help improve the quality of results and their interpretation. Most environmental assessments in literature yield single numbers that characterize the environmental impact of a process/product - typically global or country averages, often unchanging in time. In this work, we show how statistical analysis and GIS can help address these limitations. For example, we demonstrate a method to separately quantify uncertainty and variability in the result of LCA models using a power generation case study. This is important for rigorous comparisons between the impacts of different processes. Another challenge is lack of data that can affect the rigor of LCAs. We have developed an approach to estimate environmental impacts of incompletely characterized processes using predictive statistical models. This method is applied to estimate unreported coal power plant emissions in several world regions. There is also a general lack of spatio-temporal characterization of the results in environmental analyses. For instance, studies that focus on water usage do not put in context where and when water is withdrawn. Through the use of hydrological modeling combined with GIS, we quantify water stress on a regional and seasonal basis to understand water supply and demand risks for multiple users. Another example where it is important to consider regional dependency of impacts is when characterizing how agricultural land occupation affects biodiversity in a region. We developed a data-driven methodology used in conjuction with GIS to determine if there is a statistically significant difference between the impacts of growing different crops on different species in various biomes of the world.
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures
Udey, Ruth Norma
2013-01-01
Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
NASA Astrophysics Data System (ADS)
Kuić, Domagoj
2016-05-01
In this paper an alternative approach to statistical mechanics based on the maximum information entropy principle (MaxEnt) is examined, specifically its close relation with the Gibbs method of ensembles. It is shown that the MaxEnt formalism is the logical extension of the Gibbs formalism of equilibrium statistical mechanics that is entirely independent of the frequentist interpretation of probabilities only as factual (i.e. experimentally verifiable) properties of the real world. Furthermore, we show that, consistently with the law of large numbers, the relative frequencies of the ensemble of systems prepared under identical conditions (i.e. identical constraints) actually correspond to the MaxEnt probabilites in the limit of a large number of systems in the ensemble. This result implies that the probabilities in statistical mechanics can be interpreted, independently of the frequency interpretation, on the basis of the maximum information entropy principle.
Statistical Data Analysis in the Computer Age
NASA Astrophysics Data System (ADS)
Efron, Bradley; Tibshirani, Robert
1991-07-01
Most of our familiar statistical methods, such as hypothesis testing, linear regression, analysis of variance, and maximum likelihood estimation, were designed to be implemented on mechanical calculators. modern electronic computation has encouraged a host of new statistical methods that require fewer distributional assumptions than their predecessors and can be applied to more complicated statistical estimators. These methods allow the scientist to explore and describe data and draw valid statistical inferences without the usual concerns for mathematical tractability. This is possible because traditional methods of mathematical analysis are replaced by specially constructed computer algorithms. Mathematics has not disappeared from statistical theory. It is the main method for deciding which algorithms are correct and efficient tools for automating statistical inference.
Transit Spectroscopy: new data analysis techniques and interpretation
NASA Astrophysics Data System (ADS)
Tinetti, Giovanna; Waldmann, Ingo P.; Morello, Giuseppe; Tessenyi, Marcell; Varley, Ryan; Barton, Emma; Yurchenko, Sergey; Tennyson, Jonathan; Hollis, Morgan
2014-11-01
Planetary science beyond the boundaries of our Solar System is today in its infancy. Until a couple of decades ago, the detailed investigation of the planetary properties was restricted to objects orbiting inside the Kuiper Belt. Today, we cannot ignore that the number of known planets has increased by two orders of magnitude nor that these planets resemble anything but the objects present in our own Solar System. A key observable for planets is the chemical composition and state of their atmosphere. To date, two methods can be used to sound exoplanetary atmospheres: transit and eclipse spectroscopy, and direct imaging spectroscopy. Although the field of exoplanet spectroscopy has been very successful in past years, there are a few serious hurdles that need to be overcome to progress in this area: in particular instrument systematics are often difficult to disentangle from the signal, data are sparse and often not recorded simultaneously causing degeneracy of interpretation. We will present here new data analysis techniques and interpretation developed by the “ExoLights” team at UCL to address the above-mentioned issues. Said techniques include statistical tools, non-parametric, machine-learning algorithms, optimized radiative transfer models and spectroscopic line-lists. These new tools have been successfully applied to existing data recorded with space and ground instruments, shedding new light on our knowledge and understanding of these alien worlds.
Topology for statistical modeling of petascale data.
Pascucci, Valerio; Mascarenhas, Ajith Arthur; Rusek, Korben; Bennett, Janine Camille; Levine, Joshua; Pebay, Philippe Pierre; Gyulassy, Attila; Thompson, David C.; Rojas, Joseph Maurice
2011-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.
HistFitter software framework for statistical data analysis
NASA Astrophysics Data System (ADS)
Baak, M.; Besjes, G. J.; Côté, D.; Koutsman, A.; Lorenz, J.; Short, D.
2015-04-01
We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fit to data and interpreted with statistical tests. Internally HistFitter uses the statistics packages RooStats and HistFactory. A key innovation of HistFitter is its design, which is rooted in analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple models at once that describe the data, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication quality style through a simple command-line interface.
Extracting meaningful information from metabonomic data using multivariate statistics.
Bylesjö, Max
2015-01-01
Metabonomics aims to identify and quantify all small-molecule metabolites in biologically relevant samples using high-throughput techniques such as NMR and chromatography/mass spectrometry. This generates high-dimensional data sets with properties that require specialized approaches to data analysis. This chapter describes multivariate statistics and analysis tools to extract meaningful information from metabonomic data sets. The focus is on the use and interpretation of latent variable methods such as principal component analysis (PCA), partial least squares/projections to latent structures (PLS), and orthogonal PLS (OPLS). Descriptions of the key steps of the multivariate data analyses are provided with demonstrations from example data. PMID:25677152
Statistical treatment of fatigue test data
Raske, D.T.
1980-01-01
This report discussed several aspects of fatigue data analysis in order to provide a basis for the development of statistically sound design curves. Included is a discussion on the choice of the dependent variable, the assumptions associated with least squares regression models, the variability of fatigue data, the treatment of data from suspended tests and outlying observations, and various strain-life relations.
Statistical Tools for the Interpretation of Enzootic West Nile virus Transmission Dynamics.
Caillouët, Kevin A; Robertson, Suzanne
2016-01-01
Interpretation of enzootic West Nile virus (WNV) surveillance indicators requires little advanced mathematical skill, but greatly enhances the ability of public health officials to prescribe effective WNV management tactics. Stepwise procedures for the calculation of mosquito infection rates (IR) and vector index (VI) are presented alongside statistical tools that require additional computation. A brief review of advantages and important considerations for each statistic's use is provided. PMID:27188561
A spatial scan statistic for multinomial data
Jung, Inkyung; Kulldorff, Martin; Richard, Otukei John
2014-01-01
As a geographical cluster detection analysis tool, the spatial scan statistic has been developed for different types of data such as Bernoulli, Poisson, ordinal, exponential and normal. Another interesting data type is multinomial. For example, one may want to find clusters where the disease-type distribution is statistically significantly different from the rest of the study region when there are different types of disease. In this paper, we propose a spatial scan statistic for such data, which is useful for geographical cluster detection analysis for categorical data without any intrinsic order information. The proposed method is applied to meningitis data consisting of five different disease categories to identify areas with distinct disease-type patterns in two counties in the U.K. The performance of the method is evaluated through a simulation study. PMID:20680984
Seasonal variations of decay rate measurement data and their interpretation.
Schrader, Heinrich
2016-08-01
Measurement data of long-lived radionuclides, for example, (85)Kr, (90)Sr, (108m)Ag, (133)Ba, (152)Eu, (154)Eu and (226)Ra, and particularly the relative residuals of fitted raw data from current measurements of ionization chambers for half-life determination show small periodic seasonal variations with amplitudes of about 0.15%. The interpretation of these fluctuations is a matter of controversy whether the observed effect is produced by some interaction with the radionuclides themselves or is an artifact of the measuring chain. At the origin of such a discussion there is the exponential decay law of radioactive substances used for data fitting, one of the fundamentals of nuclear physics. Some groups of physicists use statistical methods and analyze correlations with various parameters of the measurement data and, for example, the Earth-Sun distance, as a basis of interpretation. In this article, data measured at the Physikalisch-Technische Bundesanstalt and published earlier are the subject of a correlation analysis using the corresponding time series of data with varying measurement conditions. An overview of these measurement conditions producing instrument instabilities is given and causality relations are discussed. The resulting correlation coefficients for various series of the same radionuclide using similar measurement conditions are in the order of 0.7, which indicates a high correlation, and for series of the same radionuclide using different measurement conditions and changes of the measuring chain of the order of -0.2 or even lower, which indicates an anti-correlation. These results provide strong arguments that the observed seasonal variations are caused by the measuring chain and, in particular, by the type of measuring electronics used. PMID:27258217
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
ERIC Educational Resources Information Center
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significant tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate unless "corrected" effect…
HistFitter - A flexible framework for statistical data analysis
NASA Astrophysics Data System (ADS)
Lorenz, J. M.; Baak, M.; Besjes, G. J.; Côté, D.; Koutsman, A.; Short, D.
2015-05-01
We present a software framework for statistical data analysis, called HistFitter, that has extensively been used in the ATLAS Collaboration to analyze data of proton-proton collisions produced by the Large Hadron Collider at CERN. Most notably, HistFitter has become a de-facto standard in searches for supersymmetric particles since 2012, with some usage for Exotic and Higgs boson physics. HistFitter coherently combines several statistics tools in a programmable and flexible framework that is capable of bookkeeping hundreds of data models under study using thousands of generated input histograms. The key innovations of HistFitter are to weave the concepts of control, validation and signal regions into its very fabric, and to treat them with rigorous statistical methods, while providing multiple tools to visualize and interpret the results through a simple configuration interface.
Dotto, G L; Pinto, L A A; Hachicha, M A; Knani, S
2015-03-15
In this work, statistical physics treatment was employed to study the adsorption of food dyes onto chitosan films, in order to obtain new physicochemical interpretations at molecular level. Experimental equilibrium curves were obtained for the adsorption of four dyes (FD&C red 2, FD&C yellow 5, FD&C blue 2, Acid Red 51) at different temperatures (298, 313 and 328 K). A statistical physics formula was used to interpret these curves, and the parameters such as, number of adsorbed dye molecules per site (n), anchorage number (n'), receptor sites density (NM), adsorbed quantity at saturation (N asat), steric hindrance (τ), concentration at half saturation (c1/2) and molar adsorption energy (ΔE(a)) were estimated. The relation of the above mentioned parameters with the chemical structure of the dyes and temperature was evaluated and interpreted. PMID:25308634
On the Interpretation of Running Trends as Summary Statistics for Time Series Analysis
NASA Astrophysics Data System (ADS)
Vigo, Isabel M.; Trottini, Mario; Belda, Santiago
2016-04-01
In recent years, running trends analysis (RTA) has been widely used in climate applied research as summary statistics for time series analysis. There is no doubt that RTA might be a useful descriptive tool, but despite its general use in applied research, precisely what it reveals about the underlying time series is unclear and, as a result, its interpretation is unclear too. This work contributes to such interpretation in two ways: 1) an explicit formula is obtained for the set of time series with a given series of running trends, making it possible to show that running trends, alone, perform very poorly as summary statistics for time series analysis; and 2) an equivalence is established between RTA and the estimation of a (possibly nonlinear) trend component of the underlying time series using a weighted moving average filter. Such equivalence provides a solid ground for RTA implementation and interpretation/validation.
Statistical data of the uranium industry
1983-01-01
This report is a compendium of information relating to US uranium reserves and potential resources and to exploration, mining, milling, and other activities of the uranium industry through 1982. The statistics are based primarily on data provided voluntarily by the uranium exploration, mining and milling companies. The compendium has been published annually since 1968 and reflects the basic programs of the Grand Junction Area Office of the US Department of Energy. Statistical data obtained from surveys conducted by the Energy Information Administration are included in Section IX. The production, reserves, and drilling data are reported in a manner which avoids disclosure of proprietary information.
Revisiting the statistical analysis of pyroclast density and porosity data
NASA Astrophysics Data System (ADS)
Bernard, B.; Kueppers, U.; Ortiz, H.
2015-07-01
Explosive volcanic eruptions are commonly characterized based on a thorough analysis of the generated deposits. Amongst other characteristics in physical volcanology, density and porosity of juvenile clasts are some of the most frequently used to constrain eruptive dynamics. In this study, we evaluate the sensitivity of density and porosity data to statistical methods and introduce a weighting parameter to correct issues raised by the use of frequency analysis. Results of textural investigation can be biased by clast selection. Using statistical tools as presented here, the meaningfulness of a conclusion can be checked for any data set easily. This is necessary to define whether or not a sample has met the requirements for statistical relevance, i.e. whether a data set is large enough to allow for reproducible results. Graphical statistics are used to describe density and porosity distributions, similar to those used for grain-size analysis. This approach helps with the interpretation of volcanic deposits. To illustrate this methodology, we chose two large data sets: (1) directed blast deposits of the 3640-3510 BC eruption of Chachimbiro volcano (Ecuador) and (2) block-and-ash-flow deposits of the 1990-1995 eruption of Unzen volcano (Japan). We propose the incorporation of this analysis into future investigations to check the objectivity of results achieved by different working groups and guarantee the meaningfulness of the interpretation.
Vocational Education Statistical Data Plans and Programs.
ERIC Educational Resources Information Center
Schwartz, Mark
This document provides information on the Data on Vocational Education (DOVE) plan, which has provided the National Center for Education Statistics (NCES) with a framework on which a viable data collection and dissemination program is being constructed for vocational education. A section on the status of DOVE discusses the attainment of the…
Topology for Statistical Modeling of Petascale Data.
Bennett, Janine Camille; Pebay, Philippe Pierre; Pascucci, Valerio; Levine, Joshua; Gyulassy, Attila; Rojas, Joseph Maurice
2014-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.
Interpretation of remotely sensed data and its applications in oceanography
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Tanaka, K.; Inostroza, H. M.; Verdesio, J. J.
1982-01-01
The methodology of interpretation of remote sensing data and its oceanographic applications are described. The elements of image interpretation for different types of sensors are discussed. The sensors utilized are the multispectral scanner of LANDSAT, and the thermal infrared of NOAA and geostationary satellites. Visual and automatic data interpretation in studies of pollution, the Brazil current system, and upwelling along the southeastern Brazilian coast are compared.
The Top 100: Interpreting the Data.
ERIC Educational Resources Information Center
Borden, Victor M. H.
1999-01-01
The sources and structure of data reported in the annual "Top 100" list of colleges and universities conferring the highest numbers of degrees to students of color are described, and the use of the data to make comparisons between historically black and traditionally white institutions is explained. Some trends in the eight-year history of the…
Analysis and Interpretation of Financial Data.
ERIC Educational Resources Information Center
Robinson, Daniel D.
1975-01-01
Understanding the financial reports of colleges and universities has long been a problem because of the lack of comparability of the data presented. Recently, there has been a move to agree on uniform standards for financial accounting and reporting for the field of higher education. In addition to comparable data, the efforts to make financial…
Engine Data Interpretation System (EDIS), phase 2
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1991-01-01
A prototype of an expert system was developed which applies qualitative constraint-based reasoning to the task of post-test analysis of data resulting from a rocket engine firing. Data anomalies are detected and corresponding faults are diagnosed. Engine behavior is reconstructed using measured data and knowledge about engine behavior. Knowledge about common faults guides but does not restrict the search for the best explanation in terms of hypothesized faults. The system contains domain knowledge about the behavior of common rocket engine components and was configured for use with the Space Shuttle Main Engine (SSME). A graphical user interface allows an expert user to intimately interact with the system during diagnosis. The system was applied to data taken during actual SSME tests where data anomalies were observed.
Redman-MacLaren, Michelle; Mills, Jane; Tommbe, Rachael
2014-01-01
Background Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. Objective To facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. Design A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or ‘chunks’ of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. Results New understandings of the data were evoked when women in interpretive focus groups analysed the data ‘chunks’. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Conclusions Interpretive focus groups offer opportunity to enhance trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action. PMID:25138532
Statistical data of the uranium industry
1981-01-01
Data are presented on US uranium reserves, potential resources, exploration, mining, drilling, milling, and other activities of the uranium industry through 1980. The compendium reflects the basic programs of the Grand Junction Office. Statistics are based primarily on information provided by the uranium exploration, mining, and milling companies. Data on commercial U/sub 3/O/sub 8/ sales and purchases are included. Data on non-US uranium production and resources are presented in the appendix. (DMC)
Statistical data of the uranium industry
1982-01-01
Statistical Data of the Uranium Industry is a compendium of information relating to US uranium reserves and potential resources and to exploration, mining, milling, and other activities of the uranium industry through 1981. The statistics are based primarily on data provided voluntarily by the uranium exploration, mining, and milling companies. The compendium has been published annually since 1968 and reflects the basic programs of the Grand Junction Area Office (GJAO) of the US Department of Energy. The production, reserves, and drilling information is reported in a manner which avoids disclosure of proprietary information.
The Top 100: Interpreting the Data.
ERIC Educational Resources Information Center
Borden, Victor M. H.
1999-01-01
The sources and structure of data reported in the annual "Top 100" list of colleges and universities conferring the highest numbers of degrees to students of color are described, including the way in which various student categories are reported. (MSE)
Telemetry Boards Interpret Rocket, Airplane Engine Data
NASA Technical Reports Server (NTRS)
2009-01-01
For all the data gathered by the space shuttle while in orbit, NASA engineers are just as concerned about the information it generates on the ground. From the moment the shuttle s wheels touch the runway to the break of its electrical umbilical cord at 0.4 seconds before its next launch, sensors feed streams of data about the status of the vehicle and its various systems to Kennedy Space Center s shuttle crews. Even while the shuttle orbiter is refitted in Kennedy s orbiter processing facility, engineers constantly monitor everything from power levels to the testing of the mechanical arm in the orbiter s payload bay. On the launch pad and up until liftoff, the Launch Control Center, attached to the large Vehicle Assembly Building, screens all of the shuttle s vital data. (Once the shuttle clears its launch tower, this responsibility shifts to Mission Control at Johnson Space Center, with Kennedy in a backup role.) Ground systems for satellite launches also generate significant amounts of data. At Cape Canaveral Air Force Station, across the Banana River from Kennedy s location on Merritt Island, Florida, NASA rockets carrying precious satellite payloads into space flood the Launch Vehicle Data Center with sensor information on temperature, speed, trajectory, and vibration. The remote measurement and transmission of systems data called telemetry is essential to ensuring the safe and successful launch of the Agency s space missions. When a launch is unsuccessful, as it was for this year s Orbiting Carbon Observatory satellite, telemetry data also provides valuable clues as to what went wrong and how to remedy any problems for future attempts. All of this information is streamed from sensors in the form of binary code: strings of ones and zeros. One small company has partnered with NASA to provide technology that renders raw telemetry data intelligible not only for Agency engineers, but also for those in the private sector.
Component fragilities. Data collection, analysis and interpretation
Bandyopadhyay, K.K.; Hofmayer, C.H.
1985-01-01
As part of the component fragility research program sponsored by the US NRC, BNL is involved in establishing seismic fragility levels for various nuclear power plant equipment with emphasis on electrical equipment. To date, BNL has reviewed approximately seventy test reports to collect fragility or high level test data for switchgears, motor control centers and similar electrical cabinets, valve actuators and numerous electrical and control devices, e.g., switches, transmitters, potentiometers, indicators, relays, etc., of various manufacturers and models. BNL has also obtained test data from EPRI/ANCO. Analysis of the collected data reveals that fragility levels can best be described by a group of curves corresponding to various failure modes. The lower bound curve indicates the initiation of malfunctioning or structural damage, whereas the upper bound curve corresponds to overall failure of the equipment based on known failure modes occurring separately or interactively. For some components, the upper and lower bound fragility levels are observed to vary appreciably depending upon the manufacturers and models. For some devices, testing even at the shake table vibration limit does not exhibit any failure. Failure of a relay is observed to be a frequent cause of failure of an electrical panel or a system. An extensive amount of additional fregility or high level test data exists.
The Top 100: Interpreting the Data.
ERIC Educational Resources Information Center
Borden, Victor M. H.
1998-01-01
Using data from federal surveys, the colleges and universities conferring the largest number of degrees on students of color are ranked. Tables include total minority degrees (bachelor's and associate) awarded; individual minority groups (African Americans, Hispanics, Asians, Native Americans); and individual disciplines (life sciences, business…
MICROARRAY DATA ANALYSIS USING MULTIPLE STATISTICAL MODELS
Microarray Data Analysis Using Multiple Statistical Models
Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2,Andrew I. Brooks3,4, ChiYi Chu3,Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...
Phenomenological approach to scatterometer data interpretation
NASA Technical Reports Server (NTRS)
Alzofon, F. E.
1970-01-01
A graphic method of analyzing radar scatterometer sea clutter data leading to linear relations between scattering cross sections and tan angle of incidence of the radiation is proposed. This relation permits formulation of simple analytic relations without reference to the ocean surface spectrum. Parameters introduced depend on the wavelength of the incident radiation and its polarization, and on wind and sea states. The simplicity of the expressions derived suggests a corresponding simplicity in the physical mechanism of radar sea clutter return.
Bayesian methods for interpreting plutonium urinalysis data
Miller, G.; Inkret, W.C.
1995-09-01
The authors discuss an internal dosimetry problem, where measurements of plutonium in urine are used to calculate radiation doses. The authors have developed an algorithm using the MAXENT method. The method gives reasonable results, however the role of the entropy prior distribution is to effectively fit the urine data using intakes occurring close in time to each measured urine result, which is unrealistic. A better approximation for the actual prior is the log-normal distribution; however, with the log-normal distribution another calculational approach must be used. Instead of calculating the most probable values, they turn to calculating expectation values directly from the posterior probability, which is feasible for a small number of intakes.
Stratigraphic interpretation of seismic data on the workstation
Bahorich, M.; Van Bemmel, P.
1994-12-31
Until recently, interpretation of seismic data in the workstation environment has been restricted primarily to horizon and attribute maps. Interpreters have not had the ability to make various types of notations on seismic data and subsequent map views as has been done for years on paper. New thinking in the industry is leading to the development of software which provides the geoscientist with a broader range of interpretive functionality on seismic and subsequent map views. This new functionality reduces the tedious bookkeeping tasks associated with seismic sequence stratigraphy and facies analysis. Interpreters may now perform stratigraphic analysis in more detail in less time by employing the power of the interpretive workstation. A data set over a deep-water fan illustrates the power of this technology.
Impact of equity models and statistical measures on interpretations of educational reform
NASA Astrophysics Data System (ADS)
Rodriguez, Idaykis; Brewe, Eric; Sawtelle, Vashti; Kramer, Laird H.
2012-12-01
We present three models of equity and show how these, along with the statistical measures used to evaluate results, impact interpretation of equity in education reform. Equity can be defined and interpreted in many ways. Most equity education reform research strives to achieve equity by closing achievement gaps between groups. An example is given by the study by Lorenzo et al. that shows that interactive engagement methods lead to increased gender equity. In this paper, we reexamine the results of Lorenzo et al. through three models of equity. We find that interpretation of the results strongly depends on the model of equity chosen. Further, we argue that researchers must explicitly state their model of equity as well as use effect size measurements to promote clarity in education reform.
MSL DAN Passive Data and Interpretations
NASA Astrophysics Data System (ADS)
Tate, C. G.; Moersch, J.; Jun, I.; Ming, D. W.; Mitrofanov, I. G.; Litvak, M. L.; Behar, A.; Boynton, W. V.; Drake, D.; Lisov, D.; Mischna, M. A.; Hardgrove, C. J.; Milliken, R.; Sanin, A. B.; Starr, R. D.; Martín-Torres, J.; Zorzano, M. P.; Fedosov, F.; Golovin, D.; Harshman, K.; Kozyrev, A.; Malakhov, A. V.; Mokrousov, M.; Nikiforov, S.; Varenikov, A.
2014-12-01
In its passive mode of operation, The Mars Science Laboratory Dynamic Albedo of Neutrons experiment (DAN) detects low energy neutrons that are produced by two different sources on Mars. Neutrons are produced by the rover's Multi-Mission Radioisotope Thermoelectric Generator (MMRTG) and by interactions of high energy galactic cosmic rays (GCR) within the atmosphere and regolith. As these neutrons propagate through the subsurface, their energies can be moderated by interactions with hydrogen nuclei. More hydrogen leads to greater moderation (thermalization) of the neutron population energies. The presence of high thermal neutron absorbing elements within the regolith also complicates the spectrum of the returning neutron population, as shown by Hardgrove et al. DAN measures the thermal and epithermal neutron populations leaking from the surface to infer the amount of water equivalent hydrogen (WEH) in the shallow regolith. Extensive modeling is performed using a Monte Carlo approach (MCNPX) to analyze DAN passive measurements at fixed locations and along rover traverse segments. DAN passive WEH estimates along Curiosity's traverse will be presented along with an analysis of trends in the data and a description of correlations between these results and the geologic characteristics of the surfaces traversed.
Statistical Treatment of Looking-Time Data
2016-01-01
Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is predicted. We analyzed data from 2 sources: an in-house set of LTs that included data from individual participants (47 experiments, 1,584 observations), and a representative set of published articles reporting group-level LT statistics (149 experiments from 33 articles). We established that LTs are log-normally distributed across participants, and therefore, should always be log-transformed before parametric statistical analyses. We estimated the typical size of significant effects in LT studies, which allowed us to make recommendations about setting sample sizes. We show how our estimate of the distribution of effect sizes of LT studies can be used to design experiments to be analyzed by Bayesian statistics, where the experimenter is required to determine in advance the predicted effect size rather than the sample size. We demonstrate the robustness of this method in both sets of LT experiments. PMID:26845505
Statistical treatment of looking-time data.
Csibra, Gergely; Hernik, Mikołaj; Mascaro, Olivier; Tatone, Denis; Lengyel, Máté
2016-04-01
Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is predicted. We analyzed data from 2 sources: an in-house set of LTs that included data from individual participants (47 experiments, 1,584 observations), and a representative set of published articles reporting group-level LT statistics (149 experiments from 33 articles). We established that LTs are log-normally distributed across participants, and therefore, should always be log-transformed before parametric statistical analyses. We estimated the typical size of significant effects in LT studies, which allowed us to make recommendations about setting sample sizes. We show how our estimate of the distribution of effect sizes of LT studies can be used to design experiments to be analyzed by Bayesian statistics, where the experimenter is required to determine in advance the predicted effect size rather than the sample size. We demonstrate the robustness of this method in both sets of LT experiments. (PsycINFO Database Record PMID:26845505
Simultaneous Statistical Inference for Epigenetic Data
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology. PMID:25965389
Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization
NASA Astrophysics Data System (ADS)
Eroglu, Sertac
2014-10-01
The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed as the statistical mechanical Menzerath-Altmann model. The derived model allows interpreting the model parameters in terms of physical concepts. We also propose that many organizations presenting the Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy associated states.
78 FR 10166 - Access Interpreting; Transfer of Data
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-13
... From the Federal Register Online via the Government Publishing Office ENVIRONMENTAL PROTECTION AGENCY Access Interpreting; Transfer of Data AGENCY: Environmental Protection Agency (EPA). ACTION: Notice. SUMMARY: This notice announces that pesticide related information submitted to EPA's Office...
Statistical modeling of space shuttle environmental data
NASA Technical Reports Server (NTRS)
Tubbs, J. D.; Brewer, D. W.
1983-01-01
Statistical models which use a class of bivariate gamma distribution are examined. Topics discussed include: (1) the ratio of positively correlated gamma varieties; (2) a method to determine if unequal shape parameters are necessary in bivariate gamma distribution; (3) differential equations for modal location of a family of bivariate gamma distribution; and (4) analysis of some wind gust data using the analytical results developed for modeling application.
HistFitter: a flexible framework for statistical data analysis
NASA Astrophysics Data System (ADS)
Besjes, G. J.; Baak, M.; Côté, D.; Koutsman, A.; Lorenz, J. M.; Short, D.
2015-12-01
HistFitter is a software framework for statistical data analysis that has been used extensively in the ATLAS Collaboration to analyze data of proton-proton collisions produced by the Large Hadron Collider at CERN. Most notably, HistFitter has become a de-facto standard in searches for supersymmetric particles since 2012, with some usage for Exotic and Higgs boson physics. HistFitter coherently combines several statistics tools in a programmable and flexible framework that is capable of bookkeeping hundreds of data models under study using thousands of generated input histograms. HistFitter interfaces with the statistics tools HistFactory and RooStats to construct parametric models and to perform statistical tests of the data, and extends these tools in four key areas. The key innovations are to weave the concepts of control, validation and signal regions into the very fabric of HistFitter, and to treat these with rigorous methods. Multiple tools to visualize and interpret the results through a simple configuration interface are also provided.
Models to interpret bedform geometries from cross-bed data
Luthi, S.M. ); Banavar, J.R. ); Bayer, U. )
1990-03-01
Semi-elliptical and sinusoidal bedform crestlines were modeled with curvature and sinuosity as parameters. Both bedform crestlines are propagated at various angles of migration over a finite area of deposition. Two computational approaches are used, a statistical random sampling (Monte Carlo) technique over the area of the deposit, and an analytical method based on topology and differential geometry. The resulting foreset azimuth distributions provide a catalogue for a variety of situations. The resulting thickness distributions have a simple shape and can be combined with the azimuth distributions to constrain further the cross-strata geometry. Paleocurrent directions obtained by these models can differ substantially from other methods, especially for obliquely migrating low-curvature bedforms. Interpretation of foreset azimuth data from outcrops and wells can be done either by visual comparison with the catalogued distributions, or by iterative computational fits. Studied examples include eolian cross-strata from the Permian Rotliegendes in the North Sea, fluvial dunes from the Devonian in the Catskills (New York State), the Triassic Schilfsandstein (West Germany) and the Paleozoic-Jurassic of the Western Desert (Egypt), as well as recent tidal dunes from the German coast of the North Sea and tidal cross-strata from the Devonian Koblentquartzit (West Germany). In all cases the semi-elliptical bedform model gave a good fit to the data, suggesting that it may be applicable over a wide range of bedforms. The data from the Western Desert could only be explained by data scatter due to channel sinuosity combining with the scatter attributed to the ellipticity of the bedform crestlines. These models, therefore, may also allow simulations of some hierarchically structured bedforms.
A computer system for interpreting blood glucose data.
Deutsch, T; Gergely, T; Trunov, V
2004-10-01
This paper presents an overview on the design and implementation of a computer system for the interpretation of home monitoring data of diabetic patients. The comprehensive methodology covers the major information processing steps leading from raw data to a concise summary of what has happened between two subsequent visits. It includes techniques for summarising and interpreting data, checking for inconsistency, identifying and diagnosing metabolic problems and learning from patient data. Data interpretation focuses on extracting trend patterns and classifying/clustering daily blood glucose (BG) profiles. The software helps clinicians to explore data recorded before the main meals and bedtime, and to identify problems in the patient's metabolic control which should be addressed either by educating the patient and/or adjusting the current management regimen. PMID:15313541
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2015-02-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: (1) P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. (2) Overemphasis on P values rather than on the actual size of the observed effect. (3) Overuse of statistical hypothesis testing, and being seduced by the word "significant". (4) Overreliance on standard errors, which are often misunderstood. PMID:25692012
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2014-10-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, however, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1) P-hacking, which is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want; 2) overemphasis on P values rather than on the actual size of the observed effect; 3) overuse of statistical hypothesis testing, and being seduced by the word "significant"; and 4) over-reliance on standard errors, which are often misunderstood. PMID:25204545
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2014-11-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason maybe that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1. P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. 2. Overemphasis on P values rather than on the actual size of the observed effect. 3. Overuse of statistical hypothesis testing, and being seduced by the word "significant". 4. Overreliance on standard errors, which are often misunderstood. PMID:25213136
The Statistical Literacy Needed to Interpret School Assessment Data
ERIC Educational Resources Information Center
Chick, Helen; Pierce, Robyn
2013-01-01
State-wide and national testing in areas such as literacy and numeracy produces reports containing graphs and tables illustrating school and individual performance. These are intended to inform teachers, principals, and education organisations about student and school outcomes, to guide change and improvement. Given the complexity of the…
Revisiting the statistical analysis of pyroclast density and porosity data
NASA Astrophysics Data System (ADS)
Bernard, B.; Kueppers, U.; Ortiz, H.
2015-03-01
Explosive volcanic eruptions are commonly characterized based on a thorough analysis of the generated deposits. Amongst other characteristics in physical volcanology, density and porosity of juvenile clasts are some of the most frequently used characteristics to constrain eruptive dynamics. In this study, we evaluate the sensitivity of density and porosity data and introduce a weighting parameter to correct issues raised by the use of frequency analysis. Results of textural investigation can be biased by clast selection. Using statistical tools as presented here, the meaningfulness of a conclusion can be checked for any dataset easily. This is necessary to define whether or not a sample has met the requirements for statistical relevance, i.e. whether a dataset is large enough to allow for reproducible results. Graphical statistics are used to describe density and porosity distributions, similar to those used for grain-size analysis. This approach helps with the interpretation of volcanic deposits. To illustrate this methodology we chose two large datasets: (1) directed blast deposits of the 3640-3510 BC eruption of Chachimbiro volcano (Ecuador) and (2) block-and-ash-flow deposits of the 1990-1995 eruption of Unzen volcano (Japan). We propose add the use of this analysis for future investigations to check the objectivity of results achieved by different working groups and guarantee the meaningfulness of the interpretation.
Energy statistics data finder. [Monograph; energy-related census data
Not Available
1980-08-01
Energy-related data collected by the Bureau of the Census covers economic and demographic areas and provides data on a regular basis to produce current estimates from survey programs. Series report numbers, a summary of subject content, geographic detail, and report frequency are identified under the following major publication title categories: Agriculture, Retail Trade, Wholesale Trade, Service Industries, Construction, Transportation, Enterprise Statistics, County Business Patterns, Foreign Trade, Governments, Manufacturers, Mineral Industries, 1980 Census of Population and Housing, Annual Housing Survey and Travel-to-Work Supplement, and Statistical Compendia. The data are also available on computer tapes, microfiche, and in special tabulations. (DCK)
Multivariate statistical mapping of spectroscopic imaging data.
Young, Karl; Govind, Varan; Sharma, Khema; Studholme, Colin; Maudsley, Andrew A; Schuff, Norbert
2010-01-01
For magnetic resonance spectroscopic imaging studies of the brain, it is important to measure the distribution of metabolites in a regionally unbiased way; that is, without restrictions to a priori defined regions of interest. Since magnetic resonance spectroscopic imaging provides measures of multiple metabolites simultaneously at each voxel, there is furthermore great interest in utilizing the multidimensional nature of magnetic resonance spectroscopic imaging for gains in statistical power. Voxelwise multivariate statistical mapping is expected to address both of these issues, but it has not been previously employed for spectroscopic imaging (SI) studies of brain. The aims of this study were to (1) develop and validate multivariate voxel-based statistical mapping for magnetic resonance spectroscopic imaging and (2) demonstrate that multivariate tests can be more powerful than univariate tests in identifying patterns of altered brain metabolism. Specifically, we compared multivariate to univariate tests in identifying known regional patterns in simulated data and regional patterns of metabolite alterations due to amyotrophic lateral sclerosis, a devastating brain disease of the motor neurons. PMID:19953514
NASA Technical Reports Server (NTRS)
Shewhart, Mark
1991-01-01
Statistical Process Control (SPC) charts are one of several tools used in quality control. Other tools include flow charts, histograms, cause and effect diagrams, check sheets, Pareto diagrams, graphs, and scatter diagrams. A control chart is simply a graph which indicates process variation over time. The purpose of drawing a control chart is to detect any changes in the process signalled by abnormal points or patterns on the graph. The Artificial Intelligence Support Center (AISC) of the Acquisition Logistics Division has developed a hybrid machine learning expert system prototype which automates the process of constructing and interpreting control charts.
Statistical challenges of high-dimensional data
Johnstone, Iain M.; Titterington, D. Michael
2009-01-01
Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue. PMID:19805443
Computer Simulation of Incomplete-Data Interpretation Exercise.
ERIC Educational Resources Information Center
Robertson, Douglas Frederick
1987-01-01
Described is a computer simulation that was used to help general education students enrolled in a large introductory geology course. The purpose of the simulation is to learn to interpret incomplete data. Students design a plan to collect bathymetric data for an area of the ocean. Procedures used by the students and instructor are included.…
Interpreting Survey Data to Inform Solid-Waste Education Programs
ERIC Educational Resources Information Center
McKeown, Rosalyn
2006-01-01
Few examples exist on how to use survey data to inform public environmental education programs. I suggest a process for interpreting statewide survey data with the four questions that give insights into local context and make it possible to gain insight into potential target audiences and community priorities. The four questions are: What…
Customizable tool for ecological data entry, assessment, monitoring, and interpretation
Technology Transfer Automated Retrieval System (TEKTRAN)
The Database for Inventory, Monitoring and Assessment (DIMA) is a highly customizable tool for data entry, assessment, monitoring, and interpretation. DIMA is a Microsoft Access database that can easily be used without Access knowledge and is available at no cost. Data can be entered for common, nat...
Securing cooperation from persons supplying statistical data.
AUBENQUE, M J; BLAIKLEY, R M; HARRIS, F F; LAL, R B; NEURDENBURG, M G; DE SHELLY HERNANDEZ, R
1954-01-01
Securing the co-operation of persons supplying information required for medical statistics is essentially a problem in human relations, and an understanding of the motivations, attitudes, and behaviour of the respondents is necessary.Before any new statistical survey is undertaken, it is suggested by Aubenque and Harris that a preliminary review be made so that the maximum use is made of existing information. Care should also be taken not to burden respondents with an overloaded questionnaire. Aubenque and Harris recommend simplified reporting. Complete population coverage is not necessary.Neurdenburg suggests that the co-operation and support of such organizations as medical associations and social security boards are important and that propaganda should be directed specifically to the groups whose co-operation is sought. Informal personal contacts are valuable and desirable, according to Blaikley, but may have adverse effects if the right kind of approach is not made.Financial payments as an incentive in securing co-operation are opposed by Neurdenburg, who proposes that only postage-free envelopes or similar small favours be granted. Blaikley and Harris, on the other hand, express the view that financial incentives may do much to gain the support of those required to furnish data; there are, however, other incentives, and full use should be made of the natural inclinations of respondents. Compulsion may be necessary in certain instances, but administrative rather than statutory measures should be adopted. Penalties, according to Aubenque, should be inflicted only when justified by imperative health requirements.The results of surveys should be made available as soon as possible to those who co-operated, and Aubenque and Harris point out that they should also be of practical value to the suppliers of the information.Greater co-operation can be secured from medical persons who have an understanding of the statistical principles involved; Aubenque and Neurdenburg
Weatherization Assistance Program - Background Data and Statistics
Eisenberg, Joel Fred
2010-03-01
This technical memorandum is intended to provide readers with information that may be useful in understanding the purposes, performance, and outcomes of the Department of Energy's (DOE's) Weatherization Assistance Program (Weatherization). Weatherization has been in operation for over thirty years and is the nation's largest single residential energy efficiency program. Its primary purpose, established by law, is 'to increase the energy efficiency of dwellings owned or occupied by low-income persons, reduce their total residential energy expenditures, and improve their health and safety, especially low-income persons who are particularly vulnerable such as the elderly, the handicapped, and children.' The American Reinvestment and Recovery Act PL111-5 (ARRA), passed and signed into law in February 2009, committed $5 Billion over two years to an expanded Weatherization Assistance Program. This has created substantial interest in the program, the population it serves, the energy and cost savings it produces, and its cost-effectiveness. This memorandum is intended to address the need for this kind of information. Statistically valid answers to many of the questions surrounding Weatherization and its performance require comprehensive evaluation of the program. DOE is undertaking precisely this kind of independent evaluation in order to ascertain program effectiveness and to improve its performance. Results of this evaluation effort will begin to emerge in late 2010 and 2011, but they require substantial time and effort. In the meantime, the data and statistics in this memorandum can provide reasonable and transparent estimates of key program characteristics. The memorandum is laid out in three sections. The first deals with some key characteristics describing low-income energy consumption and expenditures. The second section provides estimates of energy savings and energy bill reductions that the program can reasonably be presumed to be producing. The third section
Design, analysis, and interpretation of field quality-control data for water-sampling projects
Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.
2015-01-01
The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.
Statistical atlas based extrapolation of CT data
NASA Astrophysics Data System (ADS)
Chintalapani, Gouthami; Murphy, Ryan; Armiger, Robert S.; Lepisto, Jyri; Otake, Yoshito; Sugano, Nobuhiko; Taylor, Russell H.; Armand, Mehran
2010-02-01
We present a framework to estimate the missing anatomical details from a partial CT scan with the help of statistical shape models. The motivating application is periacetabular osteotomy (PAO), a technique for treating developmental hip dysplasia, an abnormal condition of the hip socket that, if untreated, may lead to osteoarthritis. The common goals of PAO are to reduce pain, joint subluxation and improve contact pressure distribution by increasing the coverage of the femoral head by the hip socket. While current diagnosis and planning is based on radiological measurements, because of significant structural variations in dysplastic hips, a computer-assisted geometrical and biomechanical planning based on CT data is desirable to help the surgeon achieve optimal joint realignments. Most of the patients undergoing PAO are young females, hence it is usually desirable to minimize the radiation dose by scanning only the joint portion of the hip anatomy. These partial scans, however, do not provide enough information for biomechanical analysis due to missing iliac region. A statistical shape model of full pelvis anatomy is constructed from a database of CT scans. The partial volume is first aligned with the statistical atlas using an iterative affine registration, followed by a deformable registration step and the missing information is inferred from the atlas. The atlas inferences are further enhanced by the use of X-ray images of the patient, which are very common in an osteotomy procedure. The proposed method is validated with a leave-one-out analysis method. Osteotomy cuts are simulated and the effect of atlas predicted models on the actual procedure is evaluated.
Soil VisNIR chemometric performance statistics should be interpreted as random variables
NASA Astrophysics Data System (ADS)
Brown, David J.; Gasch, Caley K.; Poggio, Matteo; Morgan, Cristine L. S.
2015-04-01
Chemometric models are normally evaluated using performance statistics such as the Standard Error of Prediction (SEP) or the Root Mean Squared Error of Prediction (RMSEP). These statistics are used to evaluate the quality of chemometric models relative to other published work on a specific soil property or to compare the results from different processing and modeling techniques (e.g. Partial Least Squares Regression or PLSR and random forest algorithms). Claims are commonly made about the overall success of an application or the relative performance of different modeling approaches assuming that these performance statistics are fixed population parameters. While most researchers would acknowledge that small differences in performance statistics are not important, rarely are performance statistics treated as random variables. Given that we are usually comparing modeling approaches for general application, and given that the intent of VisNIR soil spectroscopy is to apply chemometric calibrations to larger populations than are included in our soil-spectral datasets, it is more appropriate to think of performance statistics as random variables with variation introduced through the selection of samples for inclusion in a given study and through the division of samples into calibration and validation sets (including spiking approaches). Here we look at the variation in VisNIR performance statistics for the following soil-spectra datasets: (1) a diverse US Soil Survey soil-spectral library with 3768 samples from all 50 states and 36 different countries; (2) 389 surface and subsoil samples taken from US Geological Survey continental transects; (3) the Texas Soil Spectral Library (TSSL) with 3000 samples; (4) intact soil core scans of Texas soils with 700 samples; (5) approximately 400 in situ scans from the Pacific Northwest region; and (6) miscellaneous local datasets. We find the variation in performance statistics to be surprisingly large. This has important
Accessing seismic data through geological interpretation: Challenges and solutions
NASA Astrophysics Data System (ADS)
Butler, R. W.; Clayton, S.; McCaffrey, B.
2008-12-01
Between them, the world's research programs, national institutions and corporations, especially oil and gas companies, have acquired substantial volumes of seismic reflection data. Although the vast majority are proprietary and confidential, significant data are released and available for research, including those in public data libraries. The challenge now is to maximise use of these data, by providing routes to seismic not simply on the basis of acquisition or processing attributes but via the geology they image. The Virtual Seismic Atlas (VSA: www.seismicatlas.org) meets this challenge by providing an independent, free-to-use community based internet resource that captures and shares the geological interpretation of seismic data globally. Images and associated documents are explicitly indexed by extensive metadata trees, using not only existing survey and geographical data but also the geology they portray. The solution uses a Documentum database interrogated through Endeca Guided Navigation, to search, discover and retrieve images. The VSA allows users to compare contrasting interpretations of clean data thereby exploring the ranges of uncertainty in the geometric interpretation of subsurface structure. The metadata structures can be used to link reports and published research together with other data types such as wells. And the VSA can link to existing data libraries. Searches can take different paths, revealing arrays of geological analogues, new datasets while providing entirely novel insights and genuine surprises. This can then drive new creative opportunities for research and training, and expose the contents of seismic data libraries to the world.
Statistical Analysis of Cardiovascular Data from FAP
NASA Technical Reports Server (NTRS)
Sealey, Meghan
2016-01-01
pressure, etc.) to see which could best predict how long the subjects could tolerate the tilt tests. With this I plan to analyze an artificial gravity study in order to determine the effects of orthostatic intolerance during spaceflight. From these projects, I became efficient in using the statistical software Stata, which I had previously never used before. I learned new statistical methods, such as mixed-effects linear regression, maximum likelihood estimation on longitudinal data, and post model-fitting tests to see if certain parameters contribute significantly to the model, all of which will better my understanding for when I continue studying for my masters' degree. I was also able to demonstrate my knowledge of statistics by helping other students run statistical analyses for their own projects. After completing these projects, the experience and knowledge gained from completing this analysis exemplifies the type of work that I would like to pursue in the future. After completing my masters' degree, I plan to pursue a career in biostatistics, which is exactly the position that I interned as, and I plan to use this experience to contribute to that goal
NASA Astrophysics Data System (ADS)
Jha, Sanjeev Kumar; Comunian, Alessandro; Mariethoz, Gregoire; Kelly, Bryce F. J.
2014-10-01
We develop a stochastic approach to construct channelized 3-D geological models constrained to borehole measurements as well as geological interpretation. The methodology is based on simple 2-D geologist-provided sketches of fluvial depositional elements, which are extruded in the 3rd dimension. Multiple-point geostatistics (MPS) is used to impair horizontal variability to the structures by introducing geometrical transformation parameters. The sketches provided by the geologist are used as elementary training images, whose statistical information is expanded through randomized transformations. We demonstrate the applicability of the approach by applying it to modeling a fluvial valley filling sequence in the Maules Creek catchment, Australia. The facies models are constrained to borehole logs, spatial information borrowed from an analogue and local orientations derived from the present-day stream networks. The connectivity in the 3-D facies models is evaluated using statistical measures and transport simulations. Comparison with a statistically equivalent variogram-based model shows that our approach is more suited for building 3-D facies models that contain structures specific to the channelized environment and which have a significant influence on the transport processes.
Geologic interpretation of HCMM and aircraft thermal data
NASA Technical Reports Server (NTRS)
1982-01-01
Progress on the Heat Capacity Mapping Mission (HCMM) follow-on study is reported. Numerous image products for geologic interpretation of both HCMM and aircraft thermal data were produced. These include, among others, various combinations of the thermal data with LANDSAT and SEASAT data. The combined data sets were displayed using simple color composites, principal component color composites and black and white images, and hue, saturation intensity color composites. Algorithms for incorporating both atmospheric and elevation data simultaneously into the digital processing for creation of quantitatively correct thermal inertia images, are in the final development stage. A field trip to Death Valley was undertaken to field check the aircraft and HCMM data.
Shock Classication of Ordinary Chondrites: New Data and Interpretations
NASA Astrophysics Data System (ADS)
Stoffler, D.; Keil, K.; Scott, E. R. D.
1992-07-01
Introduction. The recently proposed classification system for shocked chondrites (1) is based on a microscopic survey of 76 non-Antarctic H, L, and LL chondrites. Obviously, a larger database is highly desirable in order to confirm earlier conclusions and to allow for a statistically relevant interpretation of the data. Here, we report the shock classification of an additional 54 ordinary chondrites and summarize implications based on a total of 130 samples. New observations on shock effects. Continued studies of those shock effects in olivine and plagioclase that are indicative of the shock stages S1 - S6 as defined in (1) revealed the following: Planar deformation features in olivine, considered typical of stage S5, occur occasionally in stage S3 and are common in stage S4. In some S4 chondrites plagioclase is not partially isotropic but still birefringent coexisting with a small fraction of S3 olivines. Opaque shock veins occur not only in shock stage S3 and above (1) but have now been found in a few chondrites of shock stage S2. Thermal annealing of shock effects. Planar fractures and planar deformation features in olivine persist up to the temperatures required for recrystallization of olivine (> ca. 900 degrees C). Shock history of breccias. In a number of petrologic types 3 and 4 chondrites without recognizable (polymict) breccia texture, we found chondrules and olivine fragments with different shock histories ranging from S1 to S3. Regolith and fragmental breccias are polymict with regard to lithology and shock. The intensity of the latest shock typically varies from S1 to S4 in the breccias studied so far. Frequency distribution of shock stages. A significant difference between H and L chondrites is emerging in contrast to our previous statistics (1), whereas the conspicuous lack of shock stages S5 and S6 in type 3 and 4 chondrites is clearly confirmed (Fig. 1). Correlation between shock and noble gas content. The concentration of radiogenic argon and of
Toxic substances and human risk: principles of data interpretation
Tardiff, R.G.; Rodricks, J.V.
1988-01-01
This book provides a comprehensive overview of the relationship between toxicology and risk assessment and identifying the principles that should be used to evaluate toxicological data for human risk assessment. The book opens by distinguishing between the practice of toxicology as a science (observational and data-gathering activities) and its practice as an art (predictive or risk-estimating activities). This dichotomous nature produces the two elemental problems with which users of toxicological data must grapple. First, how relevant are data provided by the science of toxicology to assessment of human health risks. Second, what methods of data interpretation should be used to formulate hypotheses or predictions regarding human health risk.
Parameter Interpretation and Reduction for a Unified Statistical Mechanical Surface Tension Model.
Boyer, Hallie; Wexler, Anthony; Dutcher, Cari S
2015-09-01
Surface properties of aqueous solutions are important for environments as diverse as atmospheric aerosols and biocellular membranes. Previously, we developed a surface tension model for both electrolyte and nonelectrolyte aqueous solutions across the entire solute concentration range (Wexler and Dutcher, J. Phys. Chem. Lett. 2013, 4, 1723-1726). The model differentiated between adsorption of solute molecules in the bulk and surface of solution using the statistical mechanics of multilayer sorption solution model of Dutcher et al. (J. Phys. Chem. A 2013, 117, 3198-3213). The parameters in the model had physicochemical interpretations, but remained largely empirical. In the current work, these parameters are related to solute molecular properties in aqueous solutions. For nonelectrolytes, sorption tendencies suggest a strong relation with molecular size and functional group spacing. For electrolytes, surface adsorption of ions follows ion surface-bulk partitioning calculations by Pegram and Record (J. Phys. Chem. B 2007, 111, 5411-5417). PMID:26275040
2014-01-01
Background A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model’s behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model’s behaviour for the specific query. Results Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. Conclusion This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development. PMID:24661325
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 47 Telecommunication 1 2011-10-01 2011-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 47 Telecommunication 1 2010-10-01 2010-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
Statistical mapping of count survey data
Royle, J. Andrew; Link, W.A.; Sauer, J.R.
2002-01-01
We apply a Poisson mixed model to the problem of mapping (or predicting) bird relative abundance from counts collected from the North American Breeding Bird Survey (BBS). The model expresses the logarithm of the Poisson mean as a sum of a fixed term (which may depend on habitat variables) and a random effect which accounts for remaining unexplained variation. The random effect is assumed to be spatially correlated, thus providing a more general model than the traditional Poisson regression approach. Consequently, the model is capable of improved prediction when data are autocorrelated. Moreover, formulation of the mapping problem in terms of a statistical model facilitates a wide variety of inference problems which are cumbersome or even impossible using standard methods of mapping. For example, assessment of prediction uncertainty, including the formal comparison of predictions at different locations, or through time, using the model-based prediction variance is straightforward under the Poisson model (not so with many nominally model-free methods). Also, ecologists may generally be interested in quantifying the response of a species to particular habitat covariates or other landscape attributes. Proper accounting for the uncertainty in these estimated effects is crucially dependent on specification of a meaningful statistical model. Finally, the model may be used to aid in sampling design, by modifying the existing sampling plan in a manner which minimizes some variance-based criterion. Model fitting under this model is carried out using a simulation technique known as Markov Chain Monte Carlo. Application of the model is illustrated using Mourning Dove (Zenaida macroura) counts from Pennsylvania BBS routes. We produce both a model-based map depicting relative abundance, and the corresponding map of prediction uncertainty. We briefly address the issue of spatial sampling design under this model. Finally, we close with some discussion of mapping in relation to
Design and coding considerations of the soil data language interpreter
NASA Astrophysics Data System (ADS)
Kollias, V. J.; Kollias, J. G.
A query language, named Soil Data Language (SDL), for retrieving information from a Soil. Data Bank, is part of the ARSIS (A Relational Soil Information System) system, currently being developed in Greece. The interpreter of the language accepts input programs, expressed as SDL commands, and outputs the requested information. This paper describes design principles employed during the coding of the interpreter of the language. The derived program can be modified to cover eventual alterations to the specifications of language or the content and structure of the data bank. The study may be seen as an initiative to the design of Generalized Soil and Land Information Systems that primarily are concerned with the easy adaptation to a variety of National processing requirements.
Implementation of ILLIAC 4 algorithms for multispectral image interpretation. [earth resources data
NASA Technical Reports Server (NTRS)
Ray, R. M.; Thomas, J. D.; Donovan, W. E.; Swain, P. H.
1974-01-01
Research has focused on the design and partial implementation of a comprehensive ILLIAC software system for computer-assisted interpretation of multispectral earth resources data such as that now collected by the Earth Resources Technology Satellite. Research suggests generally that the ILLIAC 4 should be as much as two orders of magnitude more cost effective than serial processing computers for digital interpretation of ERTS imagery via multivariate statistical classification techniques. The potential of the ARPA Network as a mechanism for interfacing geographically-dispersed users to an ILLIAC 4 image processing facility is discussed.
Rosenfield, G.H.
1986-01-01
Statistical analysis is conducted to determine the unique value of real- and synthetic-aperture side-looking airborne radar (SLAR) to detect interpreted structural elements. SLAR images were compared to standard and digitally enhanced Landsat multispectral scanner (MSS) images and to aerial photographs. After interpretation of the imagery, data were cumulated by total length in miles and by frequency of counts. Maximum uniqueness is obtained first from real-aperture SLAR, 58.3% of total, and, second, from digitally enhanced Landsat MSS images, 54.1% of total. ?? 1986 Plenum Publishing Corporation.
Statistical Analysis of DWPF ARG-1 Data
Harris, S.P.
2001-03-02
A statistical analysis of analytical results for ARG-1, an Analytical Reference Glass, blanks, and the associated calibration and bench standards has been completed. These statistics provide a means for DWPF to review the performance of their laboratory as well as identify areas of improvement.
Mobile Collection and Automated Interpretation of EEG Data
NASA Technical Reports Server (NTRS)
Mintz, Frederick; Moynihan, Philip
2007-01-01
A system that would comprise mobile and stationary electronic hardware and software subsystems has been proposed for collection and automated interpretation of electroencephalographic (EEG) data from subjects in everyday activities in a variety of environments. By enabling collection of EEG data from mobile subjects engaged in ordinary activities (in contradistinction to collection from immobilized subjects in clinical settings), the system would expand the range of options and capabilities for performing diagnoses. Each subject would be equipped with one of the mobile subsystems, which would include a helmet that would hold floating electrodes (see figure) in those positions on the patient s head that are required in classical EEG data-collection techniques. A bundle of wires would couple the EEG signals from the electrodes to a multi-channel transmitter also located in the helmet. Electronic circuitry in the helmet transmitter would digitize the EEG signals and transmit the resulting data via a multidirectional RF patch antenna to a remote location. At the remote location, the subject s EEG data would be processed and stored in a database that would be auto-administered by a newly designed relational database management system (RDBMS). In this RDBMS, in nearly real time, the newly stored data would be subjected to automated interpretation that would involve comparison with other EEG data and concomitant peer-reviewed diagnoses stored in international brain data bases administered by other similar RDBMSs.
Patton, Charles J.; Gilroy, Edward J.
1999-01-01
Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.
Reiber, Hansotto
2016-06-01
The physiological and biophysical knowledge base for interpretations of cerebrospinal fluid (CSF) data and reference ranges are essential for the clinical pathologist and neurochemist. With the popular description of the CSF flow dependent barrier function, the dynamics and concentration gradients of blood-derived, brain-derived and leptomeningeal proteins in CSF or the specificity-independent functions of B-lymphocytes in brain also the neurologist, psychiatrist, neurosurgeon as well as the neuropharmacologist may find essentials for diagnosis, research or development of therapies. This review may help to replace the outdated ideas like "leakage" models of the barriers, linear immunoglobulin Index Interpretations or CSF electrophoresis. Calculations, Interpretations and analytical pitfalls are described for albumin quotients, quantitation of immunoglobulin synthesis in Reibergrams, oligoclonal IgG, IgM analysis, the polyspecific ( MRZ- ) antibody reaction, the statistical treatment of CSF data and general quality assessment in the CSF laboratory. The diagnostic relevance is documented in an accompaning review. PMID:27332077
Interpretation of Landsat-4 Thematic Mapper and Multispectral Scanner data for forest surveys
NASA Technical Reports Server (NTRS)
Benson, A. S.; Degloria, S. D.
1985-01-01
Landsat-4 Thematic Mapper (TM) and Multispectral Scanner (MSS) data were evaluated by interpreting film and digital products and statistical data for selected forest cover types in California. Significant results were: (1) TM color image products should contain a spectral band in the visible (bands 1, 2, or 3), near infrared (band 4), and middle infrared (band 5) regions for maximizing the interpretability of vegetation types; (2) TM color composites should contain band 4 in all cases even at the expense of excluding band 5; and (3) MSS color composites were more interpretable than all TM color composites for certain cover types and for all cover types when band 4 was excluded from the TM composite.
NASA Astrophysics Data System (ADS)
Grzempowski, Piotr; Bac-Bronowicz, Joanna; Blachowski, Jan; Milczarek, Wojciech
2014-05-01
The increasing number of data made available on Open Source servers allows for interdisciplinary interpretations of deformation measurements at both the local and the continental scales. The openly available vector and raster models of topographic, geological, geophysical, geodetic, remote sensing data have different spatial and temporal resolutions and are of various quality. The reliability of deformation modelling results depend on the resolution and accuracy of the models describing factors and conditions, in which these deformations take place. The paper describes the structure of a system for integration and processing of data obtained from Open Source servers including topographic, geological, geophysical, seismic, geodetic, remote sensing and other data needed for interpretation of deformation measurements and development of statistical models. The system is based on GIS environment in the scope of data storage and fundamental spatial analyses and support of external expert software. In the paper the results of interpretations and statistical models in local and continental scale taking into account analysis of the data resolution and accuracy and their influence on the final result of the modelling have been presented. Example influence models taking into account quantitative and qualitative data have also been shown.
New Statistical Approach to the Analysis of Hierarchical Data
NASA Astrophysics Data System (ADS)
Neuman, S. P.; Guadagnini, A.; Riva, M.
2014-12-01
Many variables possess a hierarchical structure reflected in how their increments vary in space and/or time. Quite commonly the increments (a) fluctuate in a highly irregular manner; (b) possess symmetric, non-Gaussian frequency distributions characterized by heavy tails that often decay with separation distance or lag; (c) exhibit nonlinear power-law scaling of sample structure functions in a midrange of lags, with breakdown in such scaling at small and large lags; (d) show extended power-law scaling (ESS) at all lags; and (e) display nonlinear scaling of power-law exponent with order of sample structure function. Some interpret this to imply that the variables are multifractal, which explains neither breakdowns in power-law scaling nor ESS. We offer an alternative interpretation consistent with all above phenomena. It views data as samples from stationary, anisotropic sub-Gaussian random fields subordinated to truncated fractional Brownian motion (tfBm) or truncated fractional Gaussian noise (tfGn). The fields are scaled Gaussian mixtures with random variances. Truncation of fBm and fGn entails filtering out components below data measurement or resolution scale and above domain scale. Our novel interpretation of the data allows us to obtain maximum likelihood estimates of all parameters characterizing the underlying truncated sub-Gaussian fields. These parameters in turn make it possible to downscale or upscale all statistical moments to situations entailing smaller or larger measurement or resolution and sampling scales, respectively. They also allow one to perform conditional or unconditional Monte Carlo simulations of random field realizations corresponding to these scales. Aspects of our approach are illustrated on field and laboratory measured porous and fractured rock permeabilities, as well as soil texture characteristics and neural network estimates of unsaturated hydraulic parameters in a deep vadose zone near Phoenix, Arizona. We also use our approach
Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.
2009-01-01
In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high–throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
Eide, I; Zahlsen, K
1996-01-01
The paper describes experimental and statistical methods for toxicokinetic evaluation of mixtures in inhalation experiments. Synthetic mixtures of three C9 n-paraffinic, naphthenic and aromatic hydrocarbons (n-nonane, trimethylcyclohexane and trimethylbenzene, respectively) were studied in the rat after inhalation for 12h. The hydrocarbons were mixed according to principles for statistical experimental design using mixture design at four vapour levels (75, 150, 300 and 450 ppm) to support an empirical model with linear, interaction and quadratic terms (Taylor polynome). Immediately after exposure, concentrations of hydrocarbons were measured by head space gas chromatography in blood, brain, liver, kidneys and perirenal fat. Multivariate data analysis and modelling were performed with PLS (projections to latent structures). The best models were obtained after removing all interaction terms, suggesting that there were no interactions between the hydrocarbons with respect to absorption and distribution. Uptake of paraffins and particularly aromatics is best described by quadratic models, whereas the uptake of the naphthenic hydrocarbons is nearly linear. All models are good, with high correlation (r2) and prediction properties (Q2), the latter after cross validation. The concentrations of aromates in blood were high compared to the other hydrocarbons. At concentrations below 250 ppm, the naphthene reached higher concentrations in the brain compared to the paraffin and the aromate. Statistical experimental design, multivariate data analysis and modelling have proved useful for the evaluation of synthetic mixtures. The principles may also be used in the design of liquid mixtures, which may be evaporated partially or completely. PMID:8740533
Quantitative interpretation of Great Lakes remote sensing data
NASA Technical Reports Server (NTRS)
Shook, D. F.; Salzman, J.; Svehla, R. A.; Gedney, R. T.
1980-01-01
The paper discusses the quantitative interpretation of Great Lakes remote sensing water quality data. Remote sensing using color information must take into account (1) the existence of many different organic and inorganic species throughout the Great Lakes, (2) the occurrence of a mixture of species in most locations, and (3) spatial variations in types and concentration of species. The radiative transfer model provides a potential method for an orderly analysis of remote sensing data and a physical basis for developing quantitative algorithms. Predictions and field measurements of volume reflectances are presented which show the advantage of using a radiative transfer model. Spectral absorptance and backscattering coefficients for two inorganic sediments are reported.
Interdisciplinary applications and interpretations of remotely sensed data
NASA Technical Reports Server (NTRS)
Peterson, G. W.; Mcmurtry, G. J.
1972-01-01
An interdisciplinary approach to use remote sensor for the inventory of natural resources is discussed. The areas under investigation are land use, determination of pollution sources and damage, and analysis of geologic structure and terrain. The geographical area of primary interest is the Susquehanna River Basin. Descriptions of the data obtained by aerial cameras, multiband cameras, optical mechanical scanners, and radar are included. The Earth Resources Technology Satellite and Skylab program are examined. Interpretations of spacecraft data to show specific areas of interest are developed.
Borehole seismic data processing and interpretation: New free software
NASA Astrophysics Data System (ADS)
Farfour, Mohammed; Yoon, Wang Jung
2015-12-01
Vertical Seismic Profile (VSP) surveying is a vital tool in subsurface imaging and reservoir characterization. The technique allows geophysicists to infer critical information that cannot be obtained otherwise. MVSP is a new MATLAB tool with a graphical user interface (GUI) for VSP shot modeling, data processing, and interpretation. The software handles VSP data from the loading and preprocessing stages to the final stage of corridor plotting and integration with well and seismic data. Several seismic and signal processing toolboxes are integrated and modified to suit and enrich the processing and display packages. The main motivation behind the development of the software is to provide new geoscientists and students in the geoscience fields with free software that brings together all VSP modules in one easy-to-use package. The software has several modules that allow the user to test, process, compare, visualize, and produce publication-quality results. The software is developed as a stand-alone MATLAB application that requires only MATLAB Compiler Runtime (MCR) to run with full functionality. We present a detailed description of MVSP and use the software to create synthetic VSP data. The data are then processed using different available tools. Next, real data are loaded and fully processed using the software. The data are then integrated with well data for more detailed analysis and interpretation. In order to evaluate the software processing flow accuracy, the same data are processed using commercial software. Comparison of the processing results shows that MVSP is able to process VSP data as efficiently as commercial software packages currently used in industry, and provides similar high-quality processed data.
NASA Astrophysics Data System (ADS)
Dralle, D.; Karst, N.; Thompson, S. E.
2015-12-01
Multiple competing theories suggest that power law behavior governs the observed first-order dynamics of streamflow recessions - the important process by which catchments dry-out via the stream network, altering the availability of surface water resources and in-stream habitat. Frequently modeled as: dq/dt = -aqb, recessions typically exhibit a high degree of variability, even within a single catchment, as revealed by significant shifts in the values of "a" and "b" across recession events. One potential source of this variability lies in underlying, hard-to-observe fluctuations in how catchment water storage is partitioned amongst distinct storage elements, each having different discharge behaviors. Testing this and competing hypotheses with widely available streamflow timeseries, however, has been hindered by a power law scaling artifact that obscures meaningful covariation between the recession parameters, "a" and "b". Here we briefly outline a technique that removes this artifact, revealing intriguing new patterns in the joint distribution of recession parameters. Using long-term flow data from catchments in Northern California, we explore temporal variations, and find that the "a" parameter varies strongly with catchment wetness. Then we explore how the "b" parameter changes with "a", and find that measures of its variation are maximized at intermediate "a" values. We propose an interpretation of this pattern based on statistical mechanics, meaning "b" can be viewed as an indicator of the catchment "microstate" - i.e. the partitioning of storage - and "a" as a measure of the catchment macrostate (i.e. the total storage). In statistical mechanics, entropy (i.e. microstate variance, that is the variance of "b") is maximized for intermediate values of extensive variables (i.e. wetness, "a"), as observed in the recession data. This interpretation of "a" and "b" was supported by model runs using a multiple-reservoir catchment toy model, and lends support to the
Data relay system specifications for ERTS image interpretation
NASA Technical Reports Server (NTRS)
Daniel, J. F.
1970-01-01
Experiments with the Data Collection System (DCS) of the Earth Resources Technology Satellites (ERTS) have been developed to stress ERTS applications in the Earth Resources Observation Systems (EROS) Program. Active pursuit of this policy has resulted in the design of eight specific experiments requiring a total of 98 DCS ground-data platforms. Of these eight experiments, six are intended to make use of DCS data as an aid in image interpretation, while two make use of the capability to relay data from remote locations. Preliminary discussions regarding additional experiments indicate a need for at least 150 DCS platforms within the EROS Program for ERTS experimentation. Results from the experiments will be used to assess the DCS suitability for satellites providing on-line, real-time, data relay capability. The rationale of the total DCS network of ground platforms and the relationship of each experiment to that rationale are discussed.
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. ?? 2005 Elsevier Ltd. All rights reserved.
Statistical Treatment of Looking-Time Data
ERIC Educational Resources Information Center
Csibra, Gergely; Hernik, Mikolaj; Mascaro, Olivier; Tatone, Denis; Lengyel, Máté
2016-01-01
Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is…
Bayesian Statistics for Biological Data: Pedigree Analysis
ERIC Educational Resources Information Center
Stanfield, William D.; Carlton, Matthew A.
2004-01-01
The use of Bayes' formula is applied to the biological problem of pedigree analysis to show that the Bayes' formula and non-Bayesian or "classical" methods of probability calculation give different answers. First year college students of biology can be introduced to the Bayesian statistics.
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 47 Telecommunication 1 2012-10-01 2012-10-01 false Introduction of statistical data. 1.363 Section 1.363 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Hearing Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in evidence in common carrier hearing...
Describing Middle School Students' Organization of Statistical Data.
ERIC Educational Resources Information Center
Johnson, Yolanda; Hofbauer, Pamela
The purpose of this study was to describe how middle school students physically arrange and organize statistical data. A case-study analysis was used to define and characterize the styles in which students handle, organize, and group statistical data. A series of four statistical tasks (Mooney, Langrall, Hofbauer, & Johnson, 2001) were given to…
The Systematic Interpretation of Cosmic Ray Data (The Transport Project)
NASA Technical Reports Server (NTRS)
Guzik, T. Gregory
1997-01-01
The Transport project's primary goals were to: (1) Provide measurements of critical fragmentation cross sections; (2) Study the cross section systematics; (3) Improve the galactic cosmic ray propagation methodology; and (4) Use the new cross section measurements to improve the interpretation of cosmic ray data. To accomplish these goals a collaboration was formed consisting of researchers in the US at Louisiana State University (LSU), Lawrence Berkeley Laboratory (LBL), Goddard Space Flight Center (GSFC), the University of Minnesota (UM), New Mexico State University (NMSU), in France at the Centre d'Etudes de Saclay and in Italy at the Universita di Catania. The US institutions, lead by LSU, were responsible for measuring new cross sections using the LBL HISS facility, analysis of these measurements and their application to interpreting cosmic ray data. France developed a liquid hydrogen target that was used in the HISS experiment and participated in the data interpretation. Italy developed a Multifunctional Neutron Spectrometer (MUFFINS) for the HISS runs to measure the energy spectra, angular distributions and multiplicities of neutrons emitted during the high energy interactions. The Transport Project was originally proposed to NASA during Summer, 1988 and funding began January, 1989. Transport was renewed twice (1991, 1994) and finally concluded at LSU on September, 30, 1997. During the more than 8 years of effort we had two major experiment runs at LBL, obtained data on the interaction of twenty different beams with a liquid hydrogen target, completed the analysis of fifteen of these datasets obtaining 590 new cross section measurements, published nine journal articles as well as eighteen conference proceedings papers, and presented more than thirty conference talks.
Presentation and interpretation of chemical data for igneous rocks
Wright, T.L.
1974-01-01
Arguments are made in favor of using variation diagrams to plot analyses of igneous rocks and their derivatives and modeling differentiation processes by least-squares mixing procedures. These methods permit study of magmatic differentiation and related processes in terms of all of the chemical data available. Data are presented as they are reported by the chemist and specific processes may be modeled and either quantitatively described or rejected as inappropriate or too simple. Examples are given of the differing interpretations that can arise when data are plotted on an AEM ternary vs. the same data on a full set of MgO variation diagrams. Mixing procedures are illustrated with reference to basaltic lavas from the Columbia Plateau. ?? 1974 Springer-Verlag.
Statistical data of the uranium industry
1980-01-01
This document is a compilation of historical facts and figures through 1979. These statistics are based primarily on information provided voluntarily by the uranium exploration, mining, and milling companies. The production, reserves, drilling, and production capability information has been reported in a manner which avoids disclosure of proprietary information. Only the totals for the $1.5 reserves are reported. Because of increased interest in higher cost resources for long range planning purposes, a section covering the distribution of $100 per pound reserves statistics has been newly included. A table of mill recovery ranges for the January 1, 1980 reserves has also been added to this year's edition. The section on domestic uranium production capability has been deleted this year but will be included next year. The January 1, 1980 potential resource estimates are unchanged from the January 1, 1979 estimates.
Simple Hartmann test data interpretation for ophthalmic lenses
NASA Astrophysics Data System (ADS)
Salas-Peimbert, Didia Patricia; Trujillo-Schiaffino, Gerardo; González-Silva, Jorge Alberto; Almazán-Cuellar, Saúl; Malacara-Doblado, Daniel
2006-04-01
This article describes a simple Hartmann test data interpretation that can be used to evaluate the performance of ophthalmic lenses. Considering each spot of the Hartmann pattern such as a single test ray, using simple ray tracing analysis, it is possible to calculate the power values from the lens under test at the point corresponding with each spot. The values obtained by this procedure are used to plot the power distribution map of the entire lens. We present the results obtained applying this method with single vision, bifocal, and progressive lenses.
Geological Interpretation of PSInSAR Data at Regional Scale
Meisina, Claudia; Zucca, Francesco; Notti, Davide; Colombo, Alessio; Cucchi, Anselmo; Savio, Giuliano; Giannico, Chiara; Bianchi, Marco
2008-01-01
Results of a PSInSAR™ project carried out by the Regional Agency for Environmental Protection (ARPA) in Piemonte Region (Northern Italy) are presented and discussed. A methodology is proposed for the interpretation of the PSInSAR™ data at the regional scale, easy to use by the public administrations and by civil protection authorities. Potential and limitations of the PSInSAR™ technique for ground movement detection on a regional scale and monitoring are then estimated in relationship with different geological processes and various geological environments.
Internet Data Analysis for the Undergraduate Statistics Curriculum
ERIC Educational Resources Information Center
Sanchez, Juana; He, Yan
2005-01-01
Statistics textbooks for undergraduates have not caught up with the enormous amount of analysis of Internet data that is taking place these days. Case studies that use Web server log data or Internet network traffic data are rare in undergraduate Statistics education. And yet these data provide numerous examples of skewed and bimodal…
Guidelines for Statistical Analysis of Percentage of Syllables Stuttered Data
ERIC Educational Resources Information Center
Jones, Mark; Onslow, Mark; Packman, Ann; Gebski, Val
2006-01-01
Purpose: The purpose of this study was to develop guidelines for the statistical analysis of percentage of syllables stuttered (%SS) data in stuttering research. Method; Data on %SS from various independent sources were used to develop a statistical model to describe this type of data. On the basis of this model, %SS data were simulated with…
Flexibility in data interpretation: effects of representational format
Braithwaite, David W.; Goldstone, Robert L.
2013-01-01
Graphs and tables differentially support performance on specific tasks. For tasks requiring reading off single data points, tables are as good as or better than graphs, while for tasks involving relationships among data points, graphs often yield better performance. However, the degree to which graphs and tables support flexibility across a range of tasks is not well-understood. In two experiments, participants detected main and interaction effects in line graphs and tables of bivariate data. Graphs led to more efficient performance, but also lower flexibility, as indicated by a larger discrepancy in performance across tasks. In particular, detection of main effects of variables represented in the graph legend was facilitated relative to detection of main effects of variables represented in the x-axis. Graphs may be a preferable representational format when the desired task or analytical perspective is known in advance, but may also induce greater interpretive bias than tables, necessitating greater care in their use and design. PMID:24427145
Interpretation methodology and analysis of in-flight lightning data
NASA Technical Reports Server (NTRS)
Rudolph, T.; Perala, R. A.
1982-01-01
A methodology is presented whereby electromagnetic measurements of inflight lightning stroke data can be understood and extended to other aircraft. Recent measurements made on the NASA F106B aircraft indicate that sophisticated numerical techniques and new developments in corona modeling are required to fully understand the data. Thus the problem is nontrivial and successful interpretation can lead to a significant understanding of the lightning/aircraft interaction event. This is of particular importance because of the problem of lightning induced transient upset of new technology low level microcircuitry which is being used in increasing quantities in modern and future avionics. Inflight lightning data is analyzed and lightning environments incident upon the F106B are determined.
Improved interpretation of satellite altimeter data using genetic algorithms
NASA Technical Reports Server (NTRS)
Messa, Kenneth; Lybanon, Matthew
1992-01-01
Genetic algorithms (GA) are optimization techniques that are based on the mechanics of evolution and natural selection. They take advantage of the power of cumulative selection, in which successive incremental improvements in a solution structure become the basis for continued development. A GA is an iterative procedure that maintains a 'population' of 'organisms' (candidate solutions). Through successive 'generations' (iterations) the population as a whole improves in simulation of Darwin's 'survival of the fittest'. GA's have been shown to be successful where noise significantly reduces the ability of other search techniques to work effectively. Satellite altimetry provides useful information about oceanographic phenomena. It provides rapid global coverage of the oceans and is not as severely hampered by cloud cover as infrared imagery. Despite these and other benefits, several factors lead to significant difficulty in interpretation. The GA approach to the improved interpretation of satellite data involves the representation of the ocean surface model as a string of parameters or coefficients from the model. The GA searches in parallel, a population of such representations (organisms) to obtain the individual that is best suited to 'survive', that is, the fittest as measured with respect to some 'fitness' function. The fittest organism is the one that best represents the ocean surface model with respect to the altimeter data.
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
ERIC Educational Resources Information Center
Boysen, Guy A.
2015-01-01
Student evaluations of teaching are among the most accepted and important indicators of college teachers' performance. However, faculty and administrators can overinterpret small variations in mean teaching evaluations. The current research examined the effect of including statistical information on the interpretation of teaching evaluations.…
Plausible inference and the interpretation of quantitative data
Nakhleh, C.W.
1998-02-01
The analysis of quantitative data is central to scientific investigation. Probability theory, which is founded on two rules, the sum and product rules, provides the unique, logically consistent method for drawing valid inferences from quantitative data. This primer on the use of probability theory is meant to fulfill a pedagogical purpose. The discussion begins at the foundation of scientific inference by showing how the sum and product rules of probability theory follow from some very basic considerations of logical consistency. The authors then develop general methods of probability theory that are essential to the analysis and interpretation of data. They discuss how to assign probability distributions using the principle of maximum entropy, how to estimate parameters from data, how to handle nuisance parameters whose values are of little interest, and how to determine which of a set of models is most justified by a data set. All these methods are used together in most realistic data analyses. Examples are given throughout to illustrate the basic points.
Statistical Considerations of Data Processing in Giovanni Online Tool
NASA Technical Reports Server (NTRS)
Suhung, Shen; Leptoukh, G.; Acker, J.; Berrick, S.
2005-01-01
The GES DISC Interactive Online Visualization and Analysis Infrastructure (Giovanni) is a web-based interface for the rapid visualization and analysis of gridded data from a number of remote sensing instruments. The GES DISC currently employs several Giovanni instances to analyze various products, such as Ocean-Giovanni for ocean products from SeaWiFS and MODIS-Aqua; TOMS & OM1 Giovanni for atmospheric chemical trace gases from TOMS and OMI, and MOVAS for aerosols from MODIS, etc. (http://giovanni.gsfc.nasa.gov) Foremost among the Giovanni statistical functions is data averaging. Two aspects of this function are addressed here. The first deals with the accuracy of averaging gridded mapped products vs. averaging from the ungridded Level 2 data. Some mapped products contain mean values only; others contain additional statistics, such as number of pixels (NP) for each grid, standard deviation, etc. Since NP varies spatially and temporally, averaging with or without weighting by NP will be different. In this paper, we address differences of various weighting algorithms for some datasets utilized in Giovanni. The second aspect is related to different averaging methods affecting data quality and interpretation for data with non-normal distribution. The present study demonstrates results of different spatial averaging methods using gridded SeaWiFS Level 3 mapped monthly chlorophyll a data. Spatial averages were calculated using three different methods: arithmetic mean (AVG), geometric mean (GEO), and maximum likelihood estimator (MLE). Biogeochemical data, such as chlorophyll a, are usually considered to have a log-normal distribution. The study determined that differences between methods tend to increase with increasing size of a selected coastal area, with no significant differences in most open oceans. The GEO method consistently produces values lower than AVG and MLE. The AVG method produces values larger than MLE in some cases, but smaller in other cases. Further
Statistical Considerations of Data Processing in Giovanni Online Tool
NASA Astrophysics Data System (ADS)
Shen, S.; Leptoukh, G.; Acker, J.; Berrick, S.
2005-12-01
The GES DISC Interactive Online Visualization and Analysis Infrastructure (Giovanni) is a web-based interface for the rapid visualization and analysis of gridded data from a number of remote sensing instruments. The GES DISC currently employs several Giovanni instances to analyze various products, such as Ocean-Giovanni for ocean products from SeaWiFS and MODIS-Aqua; TOMS & OMI Giovanni for atmospheric chemical trace gases from TOMS and OMI, and MOVAS for aerosols from MODIS, etc. (http://giovanni.gsfc.nasa.gov) Foremost among the Giovanni statistical functions is data averaging. Two aspects of this function are addressed here. The first deals with the accuracy of averaging gridded mapped products vs. averaging from the ungridded Level 2 data. Some mapped products contain mean values only; others contain additional statistics, such as number of pixels (NP) for each grid, standard deviation, etc. Since NP varies spatially and temporally, averaging with or without weighting by NP will be different. In this paper, we address differences of various weighting algorithms for some datasets utilized in Giovanni. The second aspect is related to different averaging methods affecting data quality and interpretation for data with non-normal distribution. The present study demonstrates results of different spatial averaging methods using gridded SeaWiFS Level 3 mapped monthly chlorophyll a data. Spatial averages were calculated using three different methods: arithmetic mean (AVG), geometric mean (GEO), and maximum likelihood estimator (MLE). Biogeochemical data, such as chlorophyll a, are usually considered to have a log-normal distribution. The study determined that differences between methods tend to increase with increasing size of a selected coastal area, with no significant differences in most open oceans. The GEO method consistently produces values lower than AVG and MLE. The AVG method produces values larger than MLE in some cases, but smaller in other cases. Further
Dielectric Property Measurements to Support Interpretation of Cassini Radar Data
NASA Astrophysics Data System (ADS)
Jamieson, Corey; Barmatz, M.
2012-10-01
Radar observations are useful for constraining surface and near-surface compositions and illuminating geologic processes on Solar System bodies. The interpretation of Cassini radiometric and radar data at 13.78 GHz (2.2 cm) of Titan and other Saturnian icy satellites is aided by laboratory measurements of the dielectric properties of relevant materials. However, existing dielectric measurements of candidate surface materials at microwave frequencies and low temperatures is sparse. We have set up a microwave cavity and cryogenic system to measure the complex dielectric properties of liquid hydrocarbons relevant to Titan, specifically methane, ethane and their mixtures to support the interpretation of spacecraft instrument and telescope radar observations. To perform these measurements, we excite and detect the TM020 mode in a custom-built cavity with small metal loop antennas powered by a Vector Network Analyzer. The hydrocarbon samples are condensed into a cylindrical quartz tube that is axially oriented in the cavity. Frequency sweeps through a resonance are performed with an empty cavity, an empty quartz tube inserted into the cavity, and with a sample-filled quartz tube in the cavity. These sweeps are fit by a Lorentzian line shape, from which we obtain the resonant frequency, f, and quality factor, Q, for each experimental arrangement. We then derive dielectric constants and loss tangents for our samples near 13.78 GHz using a new technique ideally suited for measuring liquid samples. We will present temperature-dependent, dielectric property measurements for liquid methane and ethane. The full interpretation of the radar and radiometry observations of Saturn’s icy satellites depends critically on understanding the dielectric properties of potential surface materials. By investigating relevant liquids and solids we will improve constrains on lake depths, volumes and compositions, which are important to understand Titan’s carbon/organic cycle and inevitably
Interpretation of evidence in data by untrained medical students: a scenario-based study
2010-01-01
Background To determine which approach to assessment of evidence in data - statistical tests or likelihood ratios - comes closest to the interpretation of evidence by untrained medical students. Methods Empirical study of medical students (N = 842), untrained in statistical inference or in the interpretation of diagnostic tests. They were asked to interpret a hypothetical diagnostic test, presented in four versions that differed in the distributions of test scores in diseased and non-diseased populations. Each student received only one version. The intuitive application of the statistical test approach would lead to rejecting the null hypothesis of no disease in version A, and to accepting the null in version B. Application of the likelihood ratio approach led to opposite conclusions - against the disease in A, and in favour of disease in B. Version C tested the importance of the p-value (A: 0.04 versus C: 0.08) and version D the importance of the likelihood ratio (C: 1/4 versus D: 1/8). Results In version A, 7.5% concluded that the result was in favour of disease (compatible with p value), 43.6% ruled against the disease (compatible with likelihood ratio), and 48.9% were undecided. In version B, 69.0% were in favour of disease (compatible with likelihood ratio), 4.5% against (compatible with p value), and 26.5% undecided. Increasing the p value from 0.04 to 0.08 did not change the results. The change in the likelihood ratio from 1/4 to 1/8 increased the proportion of non-committed responses. Conclusions Most untrained medical students appear to interpret evidence from data in a manner that is compatible with the use of likelihood ratios. PMID:20796297
Rapp, J.B.
1991-01-01
Q-mode factor analysis was used to quantitate the distribution of the major aliphatic hydrocarbon (n-alkanes, pristane, phytane) systems in sediments from a variety of marine environments. The compositions of the pure end members of the systems were obtained from factor scores and the distribution of the systems within each sample was obtained from factor loadings. All the data, from the diverse environments sampled (estuarine (San Francisco Bay), fresh-water (San Francisco Peninsula), polar-marine (Antarctica) and geothermal-marine (Gorda Ridge) sediments), were reduced to three major systems: a terrestrial system (mostly high molecular weight aliphatics with odd-numbered-carbon predominance), a mature system (mostly low molecular weight aliphatics without predominance) and a system containing mostly high molecular weight aliphatics with even-numbered-carbon predominance. With this statistical approach, it is possible to assign the percentage contribution from various sources to the observed distribution of aliphatic hydrocarbons in each sediment sample. ?? 1991.
Hayashi, C
1986-04-01
The subjects which are often encountered in the statistical design and analysis of data in medical science studies were discussed. The five topics examined were: Medical science and statistical methods So-called mathematical statistics and medical science Fundamentals of cross-tabulation analysis of statistical data and inference Exploratory study by multidimensional data analyses Optimal process control of individual, medical science and informatics of statistical data In I, the author's statistico-mathematical idea is characterized as the analysis of phenomena by statistical data. This is closely related to the logic, methodology and philosophy of science. This statistical concept and method are based on operational and pragmatic ideas. Self-examination of mathematical statistics is particularly focused in II and III. In II, the effectiveness of experimental design and statistical testing is thoroughly examined with regard to the study of medical science, and the limitation of its application is discussed. In III the apparent paradox of analysis of cross-tabulation of statistical data and statistical inference is shown. This is due to the operation of a simple two- or three-fold cross-tabulation analysis of (more than two or three) multidimensional data, apart from the sophisticated statistical test theory of association. In IV, the necessity of informatics of multidimensional data analysis in medical science is stressed. In V, the following point is discussed. The essential point of clinical trials is that they are not based on any simple statistical test in a traditional experimental design but on the optimal process control of individuals in the information space of the body and mind, which is based on a knowledge of medical science and the informatics of multidimensional statistical data analysis. PMID:3729436
Traumatic Brain Injury (TBI) Data and Statistics
... data.cdc.gov . Emergency Department Visits, Hospitalizations, and Deaths Rates of TBI-related Emergency Department Visits, Hospitalizations, ... related Hospitalizations by Age Group and Injury Mechanism Deaths Rates of TBI-related Deaths by Sex Rates ...
Efficient statistical mapping of avian count data
Royle, J. Andrew; Wikle, C.K.
2005-01-01
We develop a spatial modeling framework for count data that is efficient to implement in high-dimensional prediction problems. We consider spectral parameterizations for the spatially varying mean of a Poisson model. The spectral parameterization of the spatial process is very computationally efficient, enabling effective estimation and prediction in large problems using Markov chain Monte Carlo techniques. We apply this model to creating avian relative abundance maps from North American Breeding Bird Survey (BBS) data. Variation in the ability of observers to count birds is modeled as spatially independent noise, resulting in over-dispersion relative to the Poisson assumption. This approach represents an improvement over existing approaches used for spatial modeling of BBS data which are either inefficient for continental scale modeling and prediction or fail to accommodate important distributional features of count data thus leading to inaccurate accounting of prediction uncertainty.
Seismic data processing and interpretation on the loess plateau, Part 1: Seismic data processing
NASA Astrophysics Data System (ADS)
Jiang, Jiayu; Fu, Shouxian; Li, Jiuling
2005-12-01
Branching river channels and the coexistence of valleys, ridges, hills, and slopes as the result of long-term weathering and erosion form the unique loess topography. The Changqing Geophysical Company, working in these complex conditions, has established a suite of technologies for high-fidelity processing and fine interpretation of seismic data. This article introduces the processes involved in the data processing and interpretation and illustrates the results.
Yu, Victoria; Kishan, Amar U.; Cao, Minsong; Low, Daniel; Lee, Percy; Ruan, Dan
2014-03-15
Purpose: To demonstrate a new method of evaluating dose response of treatment-induced lung radiographic injury post-SBRT (stereotactic body radiotherapy) treatment and the discovery of bimodal dose behavior within clinically identified injury volumes. Methods: Follow-up CT scans at 3, 6, and 12 months were acquired from 24 patients treated with SBRT for stage-1 primary lung cancers or oligometastic lesions. Injury regions in these scans were propagated to the planning CT coordinates by performing deformable registration of the follow-ups to the planning CTs. A bimodal behavior was repeatedly observed from the probability distribution for dose values within the deformed injury regions. Based on a mixture-Gaussian assumption, an Expectation-Maximization (EM) algorithm was used to obtain characteristic parameters for such distribution. Geometric analysis was performed to interpret such parameters and infer the critical dose level that is potentially inductive of post-SBRT lung injury. Results: The Gaussian mixture obtained from the EM algorithm closely approximates the empirical dose histogram within the injury volume with good consistency. The average Kullback-Leibler divergence values between the empirical differential dose volume histogram and the EM-obtained Gaussian mixture distribution were calculated to be 0.069, 0.063, and 0.092 for the 3, 6, and 12 month follow-up groups, respectively. The lower Gaussian component was located at approximately 70% prescription dose (35 Gy) for all three follow-up time points. The higher Gaussian component, contributed by the dose received by planning target volume, was located at around 107% of the prescription dose. Geometrical analysis suggests the mean of the lower Gaussian component, located at 35 Gy, as a possible indicator for a critical dose that induces lung injury after SBRT. Conclusions: An innovative and improved method for analyzing the correspondence between lung radiographic injury and SBRT treatment dose has
Statistical analysis of life history calendar data.
Eerola, Mervi; Helske, Satu
2016-04-01
The life history calendar is a data-collection tool for obtaining reliable retrospective data about life events. To illustrate the analysis of such data, we compare the model-based probabilistic event history analysis and the model-free data mining method, sequence analysis. In event history analysis, we estimate instead of transition hazards the cumulative prediction probabilities of life events in the entire trajectory. In sequence analysis, we compare several dissimilarity metrics and contrast data-driven and user-defined substitution costs. As an example, we study young adults' transition to adulthood as a sequence of events in three life domains. The events define the multistate event history model and the parallel life domains in multidimensional sequence analysis. The relationship between life trajectories and excess depressive symptoms in middle age is further studied by their joint prediction in the multistate model and by regressing the symptom scores on individual-specific cluster indices. The two approaches complement each other in life course analysis; sequence analysis can effectively find typical and atypical life patterns while event history analysis is needed for causal inquiries. PMID:23117406
Revisiting the interpretation of casein micelle SAXS data.
Ingham, B; Smialowska, A; Erlangga, G D; Matia-Merino, L; Kirby, N M; Wang, C; Haverkamp, R G; Carr, A J
2016-08-17
An in-depth, critical review of model-dependent fitting of small-angle X-ray scattering (SAXS) data of bovine skim milk has led us to develop a new mathematical model for interpreting these data. Calcium-edge resonant soft X-ray scattering data provides unequivocal evidence as to the shape and location of the scattering due to colloidal calcium phosphate, which is manifested as a correlation peak centred at q = 0.035 Å(-1). In SAXS data this feature is seldom seen, although most literature studies attribute another feature centred at q = 0.08-0.1 Å(-1) to CCP. This work shows that the major SAXS features are due to protein arrangements: the casein micelle itself; internal regions approximately 20 nm in size, separated by water channels; and protein structures which are inhomogeneous on a 1-3 nm length scale. The assignment of these features is consistent with their behaviour under various conditions, including hydration time after reconstitution, addition of EDTA (a Ca-chelating agent), addition of urea, and reduction of pH. PMID:27491477
Interpretation of AMS-02 electrons and positrons data
Mauro, M. Di; Donato, F.; Fornengo, N.; Vittino, A.; Lineros, R. E-mail: donato@to.infn.it E-mail: rlineros@ific.uv.es
2014-04-01
We perform a combined analysis of the recent AMS-02 data on electrons, positrons, electrons plus positrons and positron fraction, in a self-consistent framework where we realize a theoretical modeling of all the astrophysical components that can contribute to the observed fluxes in the whole energy range. The primary electron contribution is modeled through the sum of an average flux from distant sources and the fluxes from the local supernova remnants in the Green catalog. The secondary electron and positron fluxes originate from interactions on the interstellar medium of primary cosmic rays, for which we derive a novel determination by using AMS-02 proton and helium data. Primary positrons and electrons from pulsar wind nebulae in the ATNF catalog are included and studied in terms of their most significant (while loosely known) properties and under different assumptions (average contribution from the whole catalog, single dominant pulsar, a few dominant pulsars). We obtain a remarkable agreement between our various modeling and the AMS-02 data for all types of analysis, demonstrating that the whole AMS-02 leptonic data admit a self-consistent interpretation in terms of astrophysical contributions.
ERIC Educational Resources Information Center
Taffel, Selma
This report presents and interprets birth statistics for the United States with particular emphasis on changes that took place during the period 1970-73. Data for the report were based on information entered on birth certificates collected from all states. The majority of the document comprises graphs and tables of data, but there are four short…
Novice Interpretations of Visual Representations of Geosciences Data
NASA Astrophysics Data System (ADS)
Burkemper, L. K.; Arthurs, L.
2013-12-01
Past cognition research of individual's perception and comprehension of bar and line graphs are substantive enough that they have resulted in the generation of graph design principles and graph comprehension theories; however, gaps remain in our understanding of how people process visual representations of data, especially of geologic and atmospheric data. This pilot project serves to build on others' prior research and begin filling the existing gaps. The primary objectives of this pilot project include: (i) design a novel data collection protocol based on a combination of paper-based surveys, think-aloud interviews, and eye-tracking tasks to investigate student data handling skills of simple to complex visual representations of geologic and atmospheric data, (ii) demonstrate that the protocol yields results that shed light on student data handling skills, and (iii) generate preliminary findings upon which tentative but perhaps helpful recommendations on how to more effectively present these data to the non-scientist community and teach essential data handling skills. An effective protocol for the combined use of paper-based surveys, think-aloud interviews, and computer-based eye-tracking tasks for investigating cognitive processes involved in perceiving, comprehending, and interpreting visual representations of geologic and atmospheric data is instrumental to future research in this area. The outcomes of this pilot study provide the foundation upon which future more in depth and scaled up investigations can build. Furthermore, findings of this pilot project are sufficient for making, at least, tentative recommendations that can help inform (i) the design of physical attributes of visual representations of data, especially more complex representations, that may aid in improving students' data handling skills and (ii) instructional approaches that have the potential to aid students in more effectively handling visual representations of geologic and atmospheric data
GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data
Carvalho, Paulo C; Fischer, Juliana SG; Chen, Emily I; Domont, Gilberto B; Carvalho, Maria GC; Degrave, Wim M; Yates, John R; Barbosa, Valmir C
2009-01-01
Background Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Results Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiform cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. Conclusion GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple to use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at . PMID:19239707
Statistical information of ASAR observations over wetland areas: An interaction model interpretation
NASA Astrophysics Data System (ADS)
Grings, F.; Salvia, M.; Karszenbaum, H.; Ferrazzoli, P.; Perna, P.; Barber, M.; Jacobo Berlles, J.
2010-01-01
This paper presents the results obtained after studying the relation between the statistical parameters that describe the backscattering distribution of junco marshes and their biophysical variables. The results are based on the texture analysis of a time series of Envisat ASAR C-band data (APP mode, V V +HH polarizations) acquired between October 2003 and January 2005 over the Lower Paraná River Delta, Argentina. The image power distributions were analyzed, and we show that the K distribution provides a good fitting of SAR data extracted from wetland observations for both polarizations. We also show that the estimated values of the order parameter of the K distribution can be explained using fieldwork and reasonable assumptions. In order to explore these results, we introduce a radiative transfer based interaction model to simulate the junco marsh σ0 distribution. After analyzing model simulations, we found evidence that the order parameter is related to the junco plant density distribution inside the junco marsh patch. It is concluded that the order parameter of the K distribution could be a useful parameter to estimate the junco plant density. This result is important for basin hydrodynamic modeling, since marsh plant density is the most important parameter to estimate marsh water conductance.
Physics in Perspective Volume II, Part C, Statistical Data.
ERIC Educational Resources Information Center
National Academy of Sciences - National Research Council, Washington, DC. Physics Survey Committee.
Statistical data relating to the sociology and economics of the physics enterprise are presented and explained. The data are divided into three sections: manpower data, data on funding and costs, and data on the literature of physics. Each section includes numerous studies, with notes on the sources and types of data, gathering procedures, and…
Transformations on Data Sets and Their Effects on Descriptive Statistics
ERIC Educational Resources Information Center
Fox, Thomas B.
2005-01-01
The activity asks students to examine the effects on the descriptive statistics of a data set that has undergone either a translation or a scale change. They make conjectures relative to the effects on the statistics of a transformation on a data set and then they defend their conjectures and deductively verify several of them.
Using Data from Climate Science to Teach Introductory Statistics
ERIC Educational Resources Information Center
Witt, Gary
2013-01-01
This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…
The Empirical Nature and Statistical Treatment of Missing Data
ERIC Educational Resources Information Center
Tannenbaum, Christyn E.
2009-01-01
Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…
Enhancements to the Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Hofmann, Martin O.
1993-01-01
The Engine Data Interpretation System (EDIS) expert system project assists the data review personnel at NASA/MSFC in performing post-test data analysis and engine diagnosis of the Space Shuttle Main Engine (SSME). EDIS uses knowledge of the engine, its components, and simple thermodynamic principles instead of, and in addition to, heuristic rules gathered from the engine experts. EDIS reasons in cooperation with human experts, following roughly the pattern of logic exhibited by human experts. EDIS concentrates on steady-state static faults, such as small leaks, and component degradations, such as pump efficiencies. The objective of this contract was to complete the set of engine component models, integrate heuristic rules into EDIS, integrate the Power Balance Model into EDIS, and investigate modification of the qualitative reasoning mechanisms to allow 'fuzzy' value classification. The result of this contract is an operational version of EDIS. EDIS will become a module of the Post-Test Diagnostic System (PTDS) and will, in this context, provide system-level diagnostic capabilities which integrate component-specific findings provided by other modules.
Enhancements to the Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Hofmann, Martin O.
1993-01-01
The Engine Data Interpretation System (EDIS) expert system project assists the data review personnel at NASA/MSFC in performing post-test data analysis and engine diagnosis of the Space Shuttle Main Engine (SSME). EDIS uses knowledge of the engine, its components, and simple thermodynamic principles instead of, and in addition to, heuristic rules gathered from the engine experts. EDIS reasons in cooperation with human experts, following roughly the pattern of logic exhibited by human experts. EDIS concentrates on steady-state static faults, such as small leaks, and component degradations, such as pump efficiencies. The objective of this contract was to complete the set of engine component models, integrate heuristic rules into EDIS, integrate the Power Balance Model into EDIS, and investigate modification of the qualitative reasoning mechanisms to allow 'fuzzy' value classification. The results of this contract is an operational version of EDIS. EDIS will become a module of the Post-Test Diagnostic System (PTDS) and will, in this context, provide system-level diagnostic capabilities which integrate component-specific findings provided by other modules.
Interpreting two-photon imaging data of lymphocyte motility.
Meyer-Hermann, Michael E; Maini, Philip K
2005-06-01
Recently, using two-photon imaging it has been found that the movement of B and T cells in lymph nodes can be described by a random walk with persistence of orientation in the range of 2 minutes. We interpret this new class of lymphocyte motility data within a theoretical model. The model considers cell movement to be composed of the movement of subunits of the cell membrane. In this way movement and deformation of the cell are correlated to each other. We find that, indeed, the lymphocyte movement in lymph nodes can best be described as a random walk with persistence of orientation. The assumption of motility induced cell elongation is consistent with the data. Within the framework of our model the two-photon data suggest that T and B cells are in a single velocity state with large stochastic width. The alternative of three different velocity states with frequent changes of their state and small stochastic width is less likely. Two velocity states can be excluded. PMID:16089770
Enhancements to the Engine Data Interpretation System (EDIS)
NASA Astrophysics Data System (ADS)
Hofmann, Martin O.
1993-07-01
The Engine Data Interpretation System (EDIS) expert system project assists the data review personnel at NASA/MSFC in performing post-test data analysis and engine diagnosis of the Space Shuttle Main Engine (SSME). EDIS uses knowledge of the engine, its components, and simple thermodynamic principles instead of, and in addition to, heuristic rules gathered from the engine experts. EDIS reasons in cooperation with human experts, following roughly the pattern of logic exhibited by human experts. EDIS concentrates on steady-state static faults, such as small leaks, and component degradations, such as pump efficiencies. The objective of this contract was to complete the set of engine component models, integrate heuristic rules into EDIS, integrate the Power Balance Model into EDIS, and investigate modification of the qualitative reasoning mechanisms to allow 'fuzzy' value classification. The result of this contract is an operational version of EDIS. EDIS will become a module of the Post-Test Diagnostic System (PTDS) and will, in this context, provide system-level diagnostic capabilities which integrate component-specific findings provided by other modules.
A Decision Tree Approach to the Interpretation of Multivariate Statistical Techniques.
ERIC Educational Resources Information Center
Fok, Lillian Y.; And Others
1995-01-01
Discusses the nature, power, and limitations of four multivariate techniques: factor analysis, multiple analysis of variance, multiple regression, and multiple discriminant analysis. Shows how decision trees assist in interpreting results. (SK)
Experimental uncertainty estimation and statistics for data having interval uncertainty.
Kreinovich, Vladik (Applied Biomathematics, Setauket, New York); Oberkampf, William Louis (Applied Biomathematics, Setauket, New York); Ginzburg, Lev (Applied Biomathematics, Setauket, New York); Ferson, Scott (Applied Biomathematics, Setauket, New York); Hajagos, Janos (Applied Biomathematics, Setauket, New York)
2007-05-01
This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
Tsitouridou, Roxani; Papazova, Petia; Simeonova, Pavlina; Simeonov, Vasil
2013-01-01
The size distribution of aerosol particles (PM0.015-PM18) in relation to their soluble inorganic species and total water soluble organic compounds (WSOC) was investigated at an urban site of Thessaloniki, Northern Greece. The sampling period was from February to July 2007. The determined compounds were compared with mass concentrations of the PM fractions for nano (N: 0.015 < Dp < 0.06), ultrafine (UFP: 0.015 < Dp < 0.125), fine (FP: 0.015 < Dp < 2.0) and coarse particles (CP: 2.0 < Dp < 8.0) in order to perform mass closure of the water soluble content for the respective fractions. Electrolytes were the dominant species in all fractions (24-27%), followed by WSOC (16-23%). The water soluble inorganic and organic content was found to account for 53% of the nanoparticle, 48% of the ultrafine particle, 45% of the fine particle and 44% of the coarse particle mass. Correlations between the analyzed species were performed and the effect of local and long-range transported emissions was examined by wind direction and backward air mass trajectories. Multivariate statistical analysis (cluster analysis and principal components analysis) of the collected data was performed in order to reveal the specific data structure. Possible sources of air pollution were identified and an attempt is made to find patterns of similarity between the different sized aerosols and the seasons of monitoring. It was proven that several major latent factors are responsible for the data structure despite the size of the aerosols - mineral (soil) dust, sea sprays, secondary emissions, combustion sources and industrial impact. The seasonal separation proved to be not very specific. PMID:24007436
Statistical analysis of marginal count failure data.
Karim, M R; Yamamoto, W; Suzuki, K
2001-06-01
Manufacturers want to assess the quality and reliability of their products. Specifically, they want to know the exact number of failures from the sales transacted during a particular month. Information available today is sometimes incomplete as many companies analyze their failure data simply comparing sales for a total month from a particular department with the total number of claims registered for that given month. This information--called marginal count data--is, thus, incomplete as it does not give the exact number of failures of the specific products that were sold in a particular month. In this paper we discuss nonparametric estimation of the mean numbers of failures for repairable products and the failure probabilities for nonrepairable products. We present a nonhomogeneous Poisson process model for repairable products and a multinomial model and its Poisson approximation for nonrepairable products. A numerical example is given and a simulation is carried out to evaluate the proposed methods of estimating failure probabilities under a number of possible situations. PMID:11458656
Statistical Modeling of Large-Scale Scientific Simulation Data
Eliassi-Rad, T; Baldwin, C; Abdulla, G; Critchlow, T
2003-11-15
With the advent of massively parallel computer systems, scientists are now able to simulate complex phenomena (e.g., explosions of a stars). Such scientific simulations typically generate large-scale data sets over the spatio-temporal space. Unfortunately, the sheer sizes of the generated data sets make efficient exploration of them impossible. Constructing queriable statistical models is an essential step in helping scientists glean new insight from their computer simulations. We define queriable statistical models to be descriptive statistics that (1) summarize and describe the data within a user-defined modeling error, and (2) are able to answer complex range-based queries over the spatiotemporal dimensions. In this chapter, we describe systems that build queriable statistical models for large-scale scientific simulation data sets. In particular, we present our Ad-hoc Queries for Simulation (AQSim) infrastructure, which reduces the data storage requirements and query access times by (1) creating and storing queriable statistical models of the data at multiple resolutions, and (2) evaluating queries on these models of the data instead of the entire data set. Within AQSim, we focus on three simple but effective statistical modeling techniques. AQSim's first modeling technique (called univariate mean modeler) computes the ''true'' (unbiased) mean of systematic partitions of the data. AQSim's second statistical modeling technique (called univariate goodness-of-fit modeler) uses the Andersen-Darling goodness-of-fit method on systematic partitions of the data. Finally, AQSim's third statistical modeling technique (called multivariate clusterer) utilizes the cosine similarity measure to cluster the data into similar groups. Our experimental evaluations on several scientific simulation data sets illustrate the value of using these statistical models on large-scale simulation data sets.
Phase 1 report on sensor technology, data fusion and data interpretation for site characterization
Beckerman, M.
1991-10-01
In this report we discuss sensor technology, data fusion and data interpretation approaches of possible maximal usefulness for subsurface imaging and characterization of land-fill waste sites. Two sensor technologies, terrain conductivity using electromagnetic induction and ground penetrating radar, are described and the literature on the subject is reviewed. We identify the maximum entropy stochastic method as one providing a rigorously justifiable framework for fusing the sensor data, briefly summarize work done by us in this area, and examine some of the outstanding issues with regard to data fusion and interpretation. 25 refs., 17 figs.
Control Statistics Process Data Base V4
1998-05-07
The check standard database program, CSP_CB, is a menu-driven program that can acquire measurement data for check standards having a parameter dependence (such as frequency) or no parameter dependence (for example, mass measurements). The program may be run stand-alone or leaded as a subprogram to a Basic program already in memory. The software was designed to require little additional work on the part of the user. The facilitate this design goal, the program is entirelymore » menu-driven. In addition, the user does have control of file names and parameters within a definition file which sets up the basic scheme of file names.« less
Identification and interpretation of patterns in rocket engine data
NASA Astrophysics Data System (ADS)
Lo, C. F.; Wu, K.; Whitehead, B. A.
1993-10-01
A prototype software system was constructed to detect anomalous Space Shuttle Main Engine (SSME) behavior in the early stages of fault development significantly earlier than the indication provided by either redline detection mechanism or human expert analysis. The major task of the research project is to analyze ground test data, to identify patterns associated with the anomalous engine behavior, and to develop a pattern identification and detection system on the basis of this analysis. A prototype expert system which was developed on both PC and Symbolics 3670 lisp machine for detecting anomalies in turbopump vibration data was checked with data from ground tests 902-473, 902-501, 902-519, and 904-097 of the Space Shuttle Main Engine. The neural networks method was also applied to supplement the statistical method utilized in the prototype system to investigate the feasibility in detecting anomalies in turbopump vibration of SSME. In most cases the anomalies detected by the expert system agree with those reported by NASA. On the neural networks approach, the results are given the successful detection rate higher than 95 percent to identify either normal or abnormal running condition based on the experimental data as well as numerical simulation.
Identification and interpretation of patterns in rocket engine data
NASA Technical Reports Server (NTRS)
Lo, C. F.; Wu, K.; Whitehead, B. A.
1993-01-01
A prototype software system was constructed to detect anomalous Space Shuttle Main Engine (SSME) behavior in the early stages of fault development significantly earlier than the indication provided by either redline detection mechanism or human expert analysis. The major task of the research project is to analyze ground test data, to identify patterns associated with the anomalous engine behavior, and to develop a pattern identification and detection system on the basis of this analysis. A prototype expert system which was developed on both PC and Symbolics 3670 lisp machine for detecting anomalies in turbopump vibration data was checked with data from ground tests 902-473, 902-501, 902-519, and 904-097 of the Space Shuttle Main Engine. The neural networks method was also applied to supplement the statistical method utilized in the prototype system to investigate the feasibility in detecting anomalies in turbopump vibration of SSME. In most cases the anomalies detected by the expert system agree with those reported by NASA. On the neural networks approach, the results are given the successful detection rate higher than 95 percent to identify either normal or abnormal running condition based on the experimental data as well as numerical simulation.
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
NASA Astrophysics Data System (ADS)
Bouzid, Mohamed; Sellaoui, Lotfi; Khalfaoui, Mohamed; Belmabrouk, Hafedh; Lamine, Abdelmottaleb Ben
2016-02-01
In this work, we studied the adsorption of ethanol on three types of activated carbon, namely parent Maxsorb III and two chemically modified activated carbons (H2-Maxsorb III and KOH-H2-Maxsorb III). This investigation has been conducted on the basis of the grand canonical formalism in statistical physics and on simplified assumptions. This led to three parameter equations describing the adsorption of ethanol onto the three types of activated carbon. There was a good correlation between experimental data and results obtained by the new proposed equation. The parameters characterizing the adsorption isotherm were the number of adsorbed molecules (s) per site n, the density of the receptor sites per unit mass of the adsorbent Nm, and the energetic parameter p1/2. They were estimated for the studied systems by a non linear least square regression. The results show that the ethanol molecules were adsorbed in perpendicular (or non parallel) position to the adsorbent surface. The magnitude of the calculated adsorption energies reveals that ethanol is physisorbed onto activated carbon. Both van der Waals and hydrogen interactions were involved in the adsorption process. The calculated values of the specific surface AS, proved that the three types of activated carbon have a highly microporous surface.
Eigenanalysis of SNP data with an identity by descent interpretation.
Zheng, Xiuwen; Weir, Bruce S
2016-02-01
Principal component analysis (PCA) is widely used in genome-wide association studies (GWAS), and the principal component axes often represent perpendicular gradients in geographic space. The explanation of PCA results is of major interest for geneticists to understand fundamental demographic parameters. Here, we provide an interpretation of PCA based on relatedness measures, which are described by the probability that sets of genes are identical-by-descent (IBD). An approximately linear transformation between ancestral proportions (AP) of individuals with multiple ancestries and their projections onto the principal components is found. In addition, a new method of eigenanalysis "EIGMIX" is proposed to estimate individual ancestries. EIGMIX is a method of moments with computational efficiency suitable for millions of SNP data, and it is not subject to the assumption of linkage equilibrium. With the assumptions of multiple ancestries and their surrogate ancestral samples, EIGMIX is able to infer ancestral proportions (APs) of individuals. The methods were applied to the SNP data from the HapMap Phase 3 project and the Human Genome Diversity Panel. The APs of individuals inferred by EIGMIX are consistent with the findings of the program ADMIXTURE. In conclusion, EIGMIX can be used to detect population structure and estimate genome-wide ancestral proportions with a relatively high accuracy. PMID:26482676
Chromosome microarrays in diagnostic testing: interpreting the genomic data.
Peters, Greg B; Pertile, Mark D
2014-01-01
DNA-based Chromosome MicroArrays (CMAs) are now well established as diagnostic tools in clinical genetics laboratories. Over the last decade, the primary application of CMAs has been the genome-wide detection of a particular class of mutation known as copy number variants (CNVs). Since 2010, CMA testing has been recommended as a first-tier test for detection of CNVs associated with intellectual disability, autism spectrum disorders, and/or multiple congenital anomalies…in the post-natal setting. CNVs are now regarded as pathogenic in 14-18 % of patients referred for these (and related) disorders.Through consideration of clinical examples, and several microarray platforms, we attempt to provide an appreciation of microarray diagnostics, from the initial inspection of the microarray data, to the composing of the patient report. In CMA data interpretation, a major challenge comes from the high frequency of clinically irrelevant CNVs observed within "patient" and "normal" populations. As might be predicted, the more common and clinically insignificant CNVs tend to be the smaller ones <100 kb in length, involving few or no known genes. However, this relationship is not at all straightforward: CNV length and gene content are only very imperfect indicators of CNV pathogenicity. Presently, there are no reliable means of separating, a priori, the benign from the pathological CNV classes.This chapter also considers sources of technical "noise" within CMA data sets. Some level of noise is inevitable in diagnostic genomics, given the very large number of data points generated in any one test. Noise further limits CMA resolution, and some miscalling of CNVs is unavoidable. In this, there is no ideal solution, but various strategies for handling noise are available. Even without solutions, consideration of these diagnostic problems per se is informative, as they afford critical insights into the biological and technical underpinnings of CNV discovery. These are indispensable
NASA Astrophysics Data System (ADS)
Lee, J.; Chang, H.
2001-12-01
In this research, we investigate the reciprocal influence between groundwater flow and its salinization occurred in two underground cavern sites, using major ion chemistry, PCA for chemical analysis data, and cross-correlation for various hydraulic data. The study areas are two underground LPG storage facilities constructed in South Sea coast, Yosu, and West Sea coastal regions, Pyeongtaek, Korea. Considerably high concentration of major cations and anions of groundwaters at both sites showed brackish or saline water types. In Yosu site, some great chemical difference of groundwater samples between rainy and dry season was caused by temporal intrusion of high-saline water into propane and butane cavern zone, but not in Pyeongtaek site. Cl/Br ratios and δ 18O- δ D distribution for tracing of salinization source water in both sites revealed that two kind of saline water (seawater and halite-dissolved solution) could influence the groundwater salinization in Yosu site, whereas only seawater intrusion could affect the groundwater chemistry of the observation wells in Pyeongtaek site. PCA performed by 8 and 10 chemical ions as statistical variables in both sites showed that intensive intrusion of seawater through butane cavern was occurred at Yosu site while seawater-groundwater mixing was observed at some observation wells located in the marginal part of Pyeongtaek site. Cross-correlation results revealed that the positive relationship between hydraulic head and cavern operating pressure was far more conspicuous at propane cavern zone in both sites (65 ~90% of correlation coefficients). According to the cross-correlation results of Yosu site, small change of head could provoke massive influx of halite-dissolved solution from surface through vertically developed fracture networks. However in Pyeongtaek site, the pressure-sensitive observation wells are not completely consistent with seawater-mixed wells, and the hydraulic change of heads at these wells related to the
Importance of data management with statistical analysis set division.
Wang, Ling; Li, Chan-juan; Jiang, Zhi-wei; Xia, Jie-lai
2015-11-01
Testing of hypothesis was affected by statistical analysis set division which was an important data management work before data base lock-in. Objective division of statistical analysis set under blinding was the guarantee of scientific trial conclusion. All the subjects having accepted at least once trial treatment after randomization should be concluded in safety set. Full analysis set should be close to the intention-to-treat as far as possible. Per protocol set division was the most difficult to control in blinded examination because of more subjectivity than the other two. The objectivity of statistical analysis set division must be guaranteed by the accurate raw data, the comprehensive data check and the scientific discussion, all of which were the strict requirement of data management. Proper division of statistical analysis set objectively and scientifically is an important approach to improve the data management quality. PMID:26911044
Clinical interpretation of CNVs with cross-species phenotype data
Czeschik, Johanna Christina; Doelken, Sandra C; Hehir-Kwa, Jayne Y; Ibn-Salem, Jonas; Mungall, Christopher J; Smedley, Damian; Haendel, Melissa A; Robinson, Peter N
2015-01-01
Background Clinical evaluation of CNVs identified via techniques such as array comparative genome hybridisation (aCGH) involves the inspection of lists of known and unknown duplications and deletions with the goal of distinguishing pathogenic from benign CNVs. A key step in this process is the comparison of the individual's phenotypic abnormalities with those associated with Mendelian disorders of the genes affected by the CNV. However, because often there is not much known about these human genes, an additional source of data that could be used is model organism phenotype data. Currently, almost 6000 genes in mouse and zebrafish are, when knocked out, associated with a phenotype in the model organism, but no disease is known to be caused by mutations in the human ortholog. Yet, searching model organism databases and comparing model organism phenotypes with patient phenotypes for identifying novel disease genes and medical evaluation of CNVs is hindered by the difficulty in integrating phenotype information across species and the lack of appropriate software tools. Methods Here, we present an integrated ranking scheme based on phenotypic matching, degree of overlap with known benign or pathogenic CNVs and the haploinsufficiency score for the prioritisation of CNVs responsible for a patient's clinical findings. Results We show that this scheme leads to significant improvements compared with rankings that do not exploit phenotypic information. We provide a software tool called PhenogramViz, which supports phenotype-driven interpretation of aCGH findings based on multiple data sources, including the integrated cross-species phenotype ontology Uberpheno, in order to visualise gene-to-phenotype relations. Conclusions Integrating and visualising cross-species phenotype information on the affected genes may help in routine diagnostics of CNVs. PMID:25280750
Reading, storing and statistical calculation of weight data.
Schliack, M
1987-02-01
A BASIC program is described which reads and computes weight data. Tara and gross weight data are read from an analytical balance. The net weights are calculated and stored on a disc. A statistical test (e.g. unpaired t-test or unpaired Wilcoxon test) can then be carried out with the weight data. The program calculates a descriptive statistic before the tests. PMID:3829652
Boyle temperature as a point of ideal gas in gentile statistics and its economic interpretation
NASA Astrophysics Data System (ADS)
Maslov, V. P.; Maslova, T. V.
2014-07-01
Boyle temperature is interpreted as the temperature at which the formation of dimers becomes impossible. To Irving Fisher's correspondence principle we assign two more quantities: the number of degrees of freedom, and credit. We determine the danger level of the mass of money M when the mutual trust between economic agents begins to fall.
The Galactic Center: possible interpretations of observational data.
NASA Astrophysics Data System (ADS)
Zakharov, Alexander
2015-08-01
There are not too many astrophysical cases where one really has an opportunity to check predictions of general relativity in the strong gravitational field limit. For these aims the black hole at the Galactic Center is one of the most interesting cases since it is the closest supermassive black hole. Gravitational lensing is a natural phenomenon based on the effect of light deflection in a gravitational field (isotropic geodesics are not straight lines in gravitational field and in a weak gravitational field one has small corrections for light deflection while the perturbative approach is not suitable for a strong gravitational field). Now there are two basic observational techniques to investigate a gravitational potential at the Galactic Center, namely, a) monitoring the orbits of bright stars near the Galactic Center to reconstruct a gravitational potential; b) measuring a size and a shape of shadows around black hole giving an alternative possibility to evaluate black hole parameters in mm-band with VLBI-technique. At the moment one can use a small relativistic correction approach for stellar orbit analysis (however, in the future the approximation will not be not precise enough due to enormous progress of observational facilities) while now for smallest structure analysis in VLBI observations one really needs a strong gravitational field approximation. We discuss results of observations, their conventional interpretations, tensions between observations and models and possible hints for a new physics from the observational data and tensions between observations and interpretations.References1. A.F. Zakharov, F. De Paolis, G. Ingrosso, and A. A. Nucita, New Astronomy Reviews, 56, 64 (2012).2. D. Borka, P. Jovanovic, V. Borka Jovanovic and A.F. Zakharov, Physical Reviews D, 85, 124004 (2012).3. D. Borka, P. Jovanovic, V. Borka Jovanovic and A.F. Zakharov, Journal of Cosmology and Astroparticle Physics, 11, 050 (2013).4. A.F. Zakharov, Physical Reviews D 90
Explorations in Statistics: The Analysis of Ratios and Normalized Data
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2013-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This ninth installment of "Explorations in Statistics" explores the analysis of ratios and normalized--or standardized--data. As researchers, we compute a ratio--a numerator divided by a denominator--to compute a…
Using Data Mining to Teach Applied Statistics and Correlation
ERIC Educational Resources Information Center
Hartnett, Jessica L.
2016-01-01
This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…
Textual Analysis and Data Mining: An Interpreting Research on Nursing.
De Caro, W; Mitello, L; Marucci, A R; Lancia, L; Sansoni, J
2016-01-01
Every day there is a data explosion on the web. In 2013, 5 exabytes of content were created each day. Every hour internet networks carries a quantity of texts equivalent to twenty billion books. For idea Iit is a huge mass of information on the linguistic behavior of people and society that was unthinkable until a few years ago. It is an opportunity for valuable analysis for understanding social phenomena, also in nursing and health care sector.This poster shows the the steps of an idealy strategy for textual statistical analysis and the process of extracting useful information about health care, referring expecially nursing care from journal and web information. We show the potential of web tools of Text Mining applications (DTM, Wordle, Voyant Tools, Taltac 2.10, Treecloud and other web 2.0 app) analyzing text data and information extraction about sentiment, perception, scientific activites and visibility of nursing. This specific analysis is conduct analyzing "Repubblica", first newspaper in Italy (years of analisys: 2012-14) and one italian scientific nursing journal (years: 2012-14). PMID:27332424
2013-01-01
Background High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (shortly termed as ” contrast data”) in a comparable manner and extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.). Results To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are
Korjus, Kristjan; Hebart, Martin N; Vicente, Raul
2016-01-01
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier's generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term "Cross-validation and cross-testing" improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393
Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul
2016-01-01
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term “Cross-validation and cross-testing” improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393
MacKinnon, David P.; Pirlott, Angela G.
2016-01-01
Statistical mediation methods provide valuable information about underlying mediating psychological processes, but the ability to infer that the mediator variable causes the outcome variable is more complex than widely known. Researchers have recently emphasized how violating assumptions about confounder bias severely limits causal inference of the mediator to dependent variable relation. Our article describes and addresses these limitations by drawing on new statistical developments in causal mediation analysis. We first review the assumptions underlying causal inference and discuss three ways to examine the effects of confounder bias when assumptions are violated. We then describe four approaches to address the influence of confounding variables and enhance causal inference, including comprehensive structural equation models, instrumental variable methods, principal stratification, and inverse probability weighting. Our goal is to further the adoption of statistical methods to enhance causal inference in mediation studies. PMID:25063043
Human milk biomonitoring data: interpretation and risk assessment issues.
LaKind, Judy S; Brent, Robert L; Dourson, Michael L; Kacew, Sam; Koren, Gideon; Sonawane, Babasaheb; Tarzian, Anita J; Uhl, Kathleen
2005-10-22
Biomonitoring data can, under certain conditions, be used to describe potential risks to human health (for example, blood lead levels used to determine children's neurodevelopmental risk). At present, there are very few chemical exposures at low levels for which sufficient data exist to state with confidence the link between levels of environmental chemicals in a person's body and his or her risk of adverse health effects. Human milk biomonitoring presents additional complications. Human milk can be used to obtain information on both the levels of environmental chemicals in the mother and her infant's exposure to an environmental chemical. However, in terms of the health of the mother, there are little to no extant data that can be used to link levels of most environmental chemicals in human milk to a particular health outcome in the mother. This is because, traditionally, risks are estimated based on dose, rather than on levels of environmental chemicals in the body, and the relationship between dose and human tissue levels is complex. On the other hand, for the infant, some information on dose is available because the infant is exposed to environmental chemicals in milk as a "dose" from which risk estimates can be derived. However, the traditional risk assessment approach is not designed to consider the benefits to the infant associated with breastfeeding and is complicated by the relatively short-term exposures to the infant from breastfeeding. A further complexity derives from the addition of in utero exposures, which complicates interpretation of epidemiological research on health outcomes of breastfeeding infants. Thus, the concept of "risk assessment" as it applies to human milk biomonitoring is not straightforward, and methodologies for undertaking this type of assessment have not yet been fully developed. This article describes the deliberations of the panel convened for the Technical Workshop on Human Milk Surveillance and Biomonitoring for Environmental
Antweiler, R.C.; Taylor, H.E.
2008-01-01
The main classes of statistical treatment of below-detection limit (left-censored) environmental data for the determination of basic statistics that have been used in the literature are substitution methods, maximum likelihood, regression on order statistics (ROS), and nonparametric techniques. These treatments, along with using all instrument-generated data (even those below detection), were evaluated by examining data sets in which the true values of the censored data were known. It was found that for data sets with less than 70% censored data, the best technique overall for determination of summary statistics was the nonparametric Kaplan-Meier technique. ROS and the two substitution methods of assigning one-half the detection limit value to censored data or assigning a random number between zero and the detection limit to censored data were adequate alternatives. The use of these two substitution methods, however, requires a thorough understanding of how the laboratory censored the data. The technique of employing all instrument-generated data - including numbers below the detection limit - was found to be less adequate than the above techniques. At high degrees of censoring (greater than 70% censored data), no technique provided good estimates of summary statistics. Maximum likelihood techniques were found to be far inferior to all other treatments except substituting zero or the detection limit value to censored data.
Statistical methods of combining information: Applications to sensor data fusion
Burr, T.
1996-12-31
This paper reviews some statistical approaches to combining information from multiple sources. Promising new approaches will be described, and potential applications to combining not-so-different data sources such as sensor data will be discussed. Experiences with one real data set are described.
The IUE data bank: Statistics and future aspects
NASA Technical Reports Server (NTRS)
Schmitz, Marion; Barylak, Michael
1988-01-01
The data exchange policy between Goddard Space Flight Center and ESA's Villafranca (Spain) station is described. The IUE data banks and their uses are outlined. Statistical information on objects observed, the quantity of data distributed and retrieved from the archives, together with a detailed design of the final format of the IUE merged log are also given.
Joint interpretation of geophysical data using Image Fusion techniques
NASA Astrophysics Data System (ADS)
Karamitrou, A.; Tsokas, G.; Petrou, M.
2013-12-01
Joint interpretation of geophysical data produced from different methods is a challenging area of research in a wide range of applications. In this work we apply several image fusion approaches to combine maps of electrical resistivity, electromagnetic conductivity, vertical gradient of the magnetic field, magnetic susceptibility, and ground penetrating radar reflections, in order to detect archaeological relics. We utilize data gathered from Arkansas University, with the support of the U.S. Department of Defense, through the Strategic Environmental Research and Development Program (SERDP-CS1263). The area of investigation is the Army City, situated in Riley Country of Kansas, USA. The depth of the relics is estimated about 30 cm from the surface, yet the surface indications of its existence are limited. We initially register the images from the different methods to correct from random offsets due to the use of hand-held devices during the measurement procedure. Next, we apply four different image fusion approaches to create combined images, using fusion with mean values, wavelet decomposition, curvelet transform, and curvelet transform enhancing the images along specific angles. We create seven combinations of pairs between the available geophysical datasets. The combinations are such that for every pair at least one high-resolution method (resistivity or magnetic gradiometry) is included. Our results indicate that in almost every case the method of mean values produces satisfactory fused images that corporate the majority of the features of the initial images. However, the contrast of the final image is reduced, and in some cases the averaging process nearly eliminated features that are fade in the original images. Wavelet based fusion outputs also good results, providing additional control in selecting the feature wavelength. Curvelet based fusion is proved the most effective method in most of the cases. The ability of curvelet domain to unfold the image in
ERIC Educational Resources Information Center
Maltese, Adam V.; Svetina, Dubravka; Harsh, Joseph A.
2015-01-01
In the STEM fields, adequate proficiency in reading and interpreting graphs is widely held as a central element for scientific literacy given the importance of data visualizations to succinctly present complex information. Although prior research espouses methods to improve graphing proficiencies, there is little understanding about when and how…
Statistical Modeling of Large-Scale Simulation Data
Eliassi-Rad, T; Critchlow, T; Abdulla, G
2002-02-22
With the advent of fast computer systems, Scientists are now able to generate terabytes of simulation data. Unfortunately, the shear size of these data sets has made efficient exploration of them impossible. To aid scientists in gathering knowledge from their simulation data, we have developed an ad-hoc query infrastructure. Our system, called AQSim (short for Ad-hoc Queries for Simulation) reduces the data storage requirements and access times in two stages. First, it creates and stores mathematical and statistical models of the data. Second, it evaluates queries on the models of the data instead of on the entire data set. In this paper, we present two simple but highly effective statistical modeling techniques for simulation data. Our first modeling technique computes the true mean of systematic partitions of the data. It makes no assumptions about the distribution of the data and uses a variant of the root mean square error to evaluate a model. In our second statistical modeling technique, we use the Andersen-Darling goodness-of-fit method on systematic partitions of the data. This second method evaluates a model by how well it passes the normality test on the data. Both of our statistical models summarize the data so as to answer range queries in the most effective way. We calculate precision on an answer to a query by scaling the one-sided Chebyshev Inequalities with the original mesh's topology. Our experimental evaluations on two scientific simulation data sets illustrate the value of using these statistical modeling techniques on large simulation data sets.
Smolders, R.; Den Hond, E.; Koppen, G.; Govarts, E.; Willems, H.; Casteleyn, L.; Kolossa-Gehring, M.; Fiddicke, U.; Castaño, A.; Koch, H.M.; Angerer, J.; Esteban, M.; Sepai, O.; Exley, K.; Bloemen, L.; Horvat, M.; Knudsen, L.E.; Joas, A.; Joas, R.; Biot, P.; and others
2015-08-15
In 2011 and 2012, the COPHES/DEMOCOPHES twin projects performed the first ever harmonized human biomonitoring survey in 17 European countries. In more than 1800 mother–child pairs, individual lifestyle data were collected and cadmium, cotinine and certain phthalate metabolites were measured in urine. Total mercury was determined in hair samples. While the main goal of the COPHES/DEMOCOPHES twin projects was to develop and test harmonized protocols and procedures, the goal of the current paper is to investigate whether the observed differences in biomarker values among the countries implementing DEMOCOPHES can be interpreted using information from external databases on environmental quality and lifestyle. In general, 13 countries having implemented DEMOCOPHES provided high-quality data from external sources that were relevant for interpretation purposes. However, some data were not available for reporting or were not in line with predefined specifications. Therefore, only part of the external information could be included in the statistical analyses. Nonetheless, there was a highly significant correlation between national levels of fish consumption and mercury in hair, the strength of antismoking legislation was significantly related to urinary cotinine levels, and we were able to show indications that also urinary cadmium levels were associated with environmental quality and food quality. These results again show the potential of biomonitoring data to provide added value for (the evaluation of) evidence-informed policy making. - Highlights: • External data was collected to interpret HBM data from DEMOCOPHES. • Hg in hair could be related to fish consumption across different countries. • Urinary cotinine was related to strictness of anti-smoking legislation. • Urinary Cd was borderline significantly related to air and food quality. • Lack of comparable data among countries hampered the analysis.
Choosing from Plausible Alternatives in Interpreting Qualitative Data.
ERIC Educational Resources Information Center
Donmoyer, Robert
This paper addresses a variation of the traditional validity question asked of qualitative researchers. Here the question is not "How do we know the qualitative researcher's question is valid?" but rather, "How does the qualitative researcher choose from among a multitude of apparently valid or at least plausible interpretations?" As early as…
Interpreting School Satisfaction Data from a Marketing Perspective.
ERIC Educational Resources Information Center
Pandiani, John A.; James, Brad C.; Banks, Steven M.
This paper presents results of a customer satisfaction survey of Vermont elementary and secondary public schools concerning satisfaction with mental health services during the 1996-97 school year. Analysis of completed questionnaires (N=233) are interpreted from a marketing perspective. Findings are reported for: (1) treated prevalence of…
Clinical Analysis and Interpretation of Cancer Genome Data
Van Allen, Eliezer M.; Wagle, Nikhil; Levy, Mia A.
2013-01-01
The scale of tumor genomic profiling is rapidly outpacing human cognitive capacity to make clinical decisions without the aid of tools. New frameworks are needed to help researchers and clinicians process the information emerging from the explosive growth in both the number of tumor genetic variants routinely tested and the respective knowledge to interpret their clinical significance. We review the current state, limitations, and future trends in methods to support the clinical analysis and interpretation of cancer genomes. This includes the processes of genome-scale variant identification, including tools for sequence alignment, tumor–germline comparison, and molecular annotation of variants. The process of clinical interpretation of tumor variants includes classification of the effect of the variant, reporting the results to clinicians, and enabling the clinician to make a clinical decision based on the genomic information integrated with other clinical features. We describe existing knowledge bases, databases, algorithms, and tools for identification and visualization of tumor variants and their actionable subsets. With the decreasing cost of tumor gene mutation testing and the increasing number of actionable therapeutics, we expect the methods for analysis and interpretation of cancer genomes to continue to evolve to meet the needs of patient-centered clinical decision making. The science of computational cancer medicine is still in its infancy; however, there is a clear need to continue the development of knowledge bases, best practices, tools, and validation experiments for successful clinical implementation in oncology. PMID:23589549
Imputing historical statistics, soils information, and other land-use data to crop area
NASA Technical Reports Server (NTRS)
Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.
1982-01-01
In foreign crop condition monitoring, satellite acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map represent, at 60 deg latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.
Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?
Zhu, Yeyi; Hernandez, Ladia M.; Mueller, Peter; Dong, Yongquan; Forman, Michele R.
2013-01-01
The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study. PMID:24511148
Statistical interpretation of joint multiplicity distributions of neutrons and charged particles
NASA Astrophysics Data System (ADS)
Tõke, J.; Agnihotri, D. K.; Skulski, W.; Schröder, W. U.
2001-02-01
Experimental joint multiplicity distributions of neutrons and charged particles provide a striking signal of the characteristic decay processes of nuclear systems following energetic nuclear reactions. They present, therefore, a valuable tool for testing theoretical models for such decay processes. The power of this experimental tool is demonstrated by a comparison of an experimental joint multiplicity distribution to the predictions of different theoretical models of statistical decay of excited nuclear systems. It is shown that, while generally phase-space based models offer a quantitative description of the observed correlation pattern of such an experimental multiplicity distribution, some models of nuclear multifragmentation fail to account for salient features of the observed correlation.
Statistical inference for exploratory data analysis and model diagnostics.
Buja, Andreas; Cook, Dianne; Hofmann, Heike; Lawrence, Michael; Lee, Eun-Kyung; Swayne, Deborah F; Wickham, Hadley
2009-11-13
We propose to furnish visual statistical methods with an inferential framework and protocol, modelled on confirmatory statistical testing. In this framework, plots take on the role of test statistics, and human cognition the role of statistical tests. Statistical significance of 'discoveries' is measured by having the human viewer compare the plot of the real dataset with collections of plots of simulated datasets. A simple but rigorous protocol that provides inferential validity is modelled after the 'lineup' popular from criminal legal procedures. Another protocol modelled after the 'Rorschach' inkblot test, well known from (pop-)psychology, will help analysts acclimatize to random variability before being exposed to the plot of the real data. The proposed protocols will be useful for exploratory data analysis, with reference datasets simulated by using a null assumption that structure is absent. The framework is also useful for model diagnostics in which case reference datasets are simulated from the model in question. This latter point follows up on previous proposals. Adopting the protocols will mean an adjustment in working procedures for data analysts, adding more rigour, and teachers might find that incorporating these protocols into the curriculum improves their students' statistical thinking. PMID:19805449
Data Warehousing: How To Make Your Statistics Meaningful.
ERIC Educational Resources Information Center
Flaherty, William
2001-01-01
Examines how one school district found a way to turn data collection from a disparate mountain of statistics into more useful information by using their Instructional Decision Support System. System software is explained as is how the district solved some data management challenges. (GR)
Using Carbon Emissions Data to "Heat Up" Descriptive Statistics
ERIC Educational Resources Information Center
Brooks, Robert
2012-01-01
This article illustrates using carbon emissions data in an introductory statistics assignment. The carbon emissions data has desirable characteristics including: choice of measure; skewness; and outliers. These complexities allow research and public policy debate to be introduced. (Contains 4 figures and 2 tables.)
Smolders, R; Den Hond, E; Koppen, G; Govarts, E; Willems, H; Casteleyn, L; Kolossa-Gehring, M; Fiddicke, U; Castaño, A; Koch, H M; Angerer, J; Esteban, M; Sepai, O; Exley, K; Bloemen, L; Horvat, M; Knudsen, L E; Joas, A; Joas, R; Biot, P; Aerts, D; Katsonouri, A; Hadjipanayis, A; Cerna, M; Krskova, A; Schwedler, G; Seiwert, M; Nielsen, J K S; Rudnai, P; Közepesy, S; Evans, D S; Ryan, M P; Gutleb, A C; Fischer, M E; Ligocka, D; Jakubowski, M; Reis, M F; Namorado, S; Lupsa, I-R; Gurzau, A E; Halzlova, K; Fabianova, E; Mazej, D; Tratnik Snoj, J; Gomez, S; González, S; Berglund, M; Larsson, K; Lehmann, A; Crettaz, P; Schoeters, G
2015-08-01
In 2011 and 2012, the COPHES/DEMOCOPHES twin projects performed the first ever harmonized human biomonitoring survey in 17 European countries. In more than 1800 mother-child pairs, individual lifestyle data were collected and cadmium, cotinine and certain phthalate metabolites were measured in urine. Total mercury was determined in hair samples. While the main goal of the COPHES/DEMOCOPHES twin projects was to develop and test harmonized protocols and procedures, the goal of the current paper is to investigate whether the observed differences in biomarker values among the countries implementing DEMOCOPHES can be interpreted using information from external databases on environmental quality and lifestyle. In general, 13 countries having implemented DEMOCOPHES provided high-quality data from external sources that were relevant for interpretation purposes. However, some data were not available for reporting or were not in line with predefined specifications. Therefore, only part of the external information could be included in the statistical analyses. Nonetheless, there was a highly significant correlation between national levels of fish consumption and mercury in hair, the strength of antismoking legislation was significantly related to urinary cotinine levels, and we were able to show indications that also urinary cadmium levels were associated with environmental quality and food quality. These results again show the potential of biomonitoring data to provide added value for (the evaluation of) evidence-informed policy making. PMID:25440294
ERIC Educational Resources Information Center
Olsen, Robert J.
2008-01-01
I describe how data pooling and data visualization can be employed in the first-semester general chemistry laboratory to introduce core statistical concepts such as central tendency and dispersion of a data set. The pooled data are plotted as a 1-D scatterplot, a purpose-designed number line through which statistical features of the data are…
Double precision errors in the logistic map: statistical study and dynamical interpretation.
Oteo, J A; Ros, J
2007-09-01
The nature of the round-off errors that occur in the usual double precision computation of the logistic map is studied in detail. Different iterative regimes from the whole panoply of behaviors exhibited in the bifurcation diagram are examined, histograms of errors in trajectories given, and for the case of fully developed chaos an explicit formula is found. It is shown that the statistics of the largest double precision error as a function of the map parameter is characterized by jumps whose location is determined by certain boundary crossings in the bifurcation diagram. Both jumps and locations seem to present geometric convergence characterized by the two first Feigenbaum constants. Eventually, a comparison with Benford's law for the distribution of the leading digit of compilation of numbers is discussed. PMID:17930330
Statistical interpretation of transient current power-law decay in colloidal quantum dot arrays
NASA Astrophysics Data System (ADS)
Sibatov, R. T.
2011-08-01
A new statistical model of the charge transport in colloidal quantum dot arrays is proposed. It takes into account Coulomb blockade forbidding multiple occupancy of nanocrystals and the influence of energetic disorder of interdot space. The model explains power-law current transients and the presence of the memory effect. The fractional differential analogue of the Ohm law is found phenomenologically for nanocrystal arrays. The model combines ideas that were considered as conflicting by other authors: the Scher-Montroll idea about the power-law distribution of waiting times in localized states for disordered semiconductors is applied taking into account Coulomb blockade; Novikov's condition about the asymptotic power-law distribution of time intervals between successful current pulses in conduction channels is fulfilled; and the carrier injection blocking predicted by Ginger and Greenham (2000 J. Appl. Phys. 87 1361) takes place.
Occupational exposure decisions: can limited data interpretation training help improve accuracy?
Logan, Perry; Ramachandran, Gurumurthy; Mulhausen, John; Hewett, Paul
2009-06-01
Accurate exposure assessments are critical for ensuring that potentially hazardous exposures are properly identified and controlled. The availability and accuracy of exposure assessments can determine whether resources are appropriately allocated to engineering and administrative controls, medical surveillance, personal protective equipment and other programs designed to protect workers. A desktop study was performed using videos, task information and sampling data to evaluate the accuracy and potential bias of participants' exposure judgments. Desktop exposure judgments were obtained from occupational hygienists for material handling jobs with small air sampling data sets (0-8 samples) and without the aid of computers. In addition, data interpretation tests (DITs) were administered to participants where they were asked to estimate the 95th percentile of an underlying log-normal exposure distribution from small data sets. Participants were presented with an exposure data interpretation or rule of thumb training which included a simple set of rules for estimating 95th percentiles for small data sets from a log-normal population. DIT was given to each participant before and after the rule of thumb training. Results of each DIT and qualitative and quantitative exposure judgments were compared with a reference judgment obtained through a Bayesian probabilistic analysis of the sampling data to investigate overall judgment accuracy and bias. There were a total of 4386 participant-task-chemical judgments for all data collections: 552 qualitative judgments made without sampling data and 3834 quantitative judgments with sampling data. The DITs and quantitative judgments were significantly better than random chance and much improved by the rule of thumb training. In addition, the rule of thumb training reduced the amount of bias in the DITs and quantitative judgments. The mean DIT % correct scores increased from 47 to 64% after the rule of thumb training (P < 0.001). The
testing the regional application of site statistics with satellite data
NASA Astrophysics Data System (ADS)
Kinne, S.; Paradise, S.
2009-04-01
Remote sensing from ground sites is often associated with a time record length and/or accuracy to its data, which is superior to that of remote sensing from space. Thus, in those cases ground measurement (network-) data are applied to constrain retrieval assumptions and/or to extend satellite data in time. Alternately, this combination can be and has been used to explore the potential application of local site statistics to surrounding regions. As a demonstrator MISR sensor statistical maps of the retrieved aerosol optical depth are applied the test the regional representation of site statistics for aerosol optical depth detected at AERONET (sub-photometer) and EARLINET (lidar) sites. The regional representation tests explore local applicability for regions from 100 to 1000 km in diameter based on an analysis of averages for relative error and relative bias.
Method of interpretation of remotely sensed data and applications to land use
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Dossantos, A. P.; Foresti, C.; Demoraesnovo, E. M. L.; Niero, M.; Lombardo, M. A.
1981-01-01
Instructional material describing a methodology of remote sensing data interpretation and examples of applicatons to land use survey are presented. The image interpretation elements are discussed for different types of sensor systems: aerial photographs, radar, and MSS/LANDSAT. Visual and automatic LANDSAT image interpretation is emphasized.
A Flexible Approach for the Statistical Visualization of Ensemble Data
Potter, K.; Wilson, A.; Bremer, P.; Williams, Dean N.; Pascucci, V.; Johnson, C.
2009-09-29
Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.
ERIC Educational Resources Information Center
Mickler, J. Ernest
This 60th annual report on collegiate enrollments in the United States is based on data received from 1,635 four-year institutions in the U.S., Puerto Rico, and the U.S. Territories. General notes, survey methodology notes, and a summary of findings are presented. Detailed statistical charts present institutional data on men and women students and…
Estimation of context for statistical classification of multispectral image data
NASA Technical Reports Server (NTRS)
Tilton, J. C.; Vardeman, S. B.; Swain, P. H.
1982-01-01
Recent investigations have demonstrated the effectiveness of a contextual classifier that combines spatial and spectral information employing a general statistical approach. This statistical classification algorithm exploits the tendency of certain ground cover classes to occur more frequently in some spatial contexts than in others. Indeed, a key input to this algorithm is a statistical characterization of the context: the context function. An unbiased estimator of the context function is discussed which, besides having the advantage of statistical unbiasedness, has the additional advantage over other estimation techniques of being amenable to an adaptive implementation in which the context-function estimate varies according to local contextual information. Results from applying the unbiased estimator to the contextual classification of three real Landsat data sets are presented and contrasted with results from noncontextual classifications and from contextual classifications utilizing other context-function estimation techniques.
Data analysis using the Gnu R system for statistical computation
Simone, James; /Fermilab
2011-07-01
R is a language system for statistical computation. It is widely used in statistics, bioinformatics, machine learning, data mining, quantitative finance, and the analysis of clinical drug trials. Among the advantages of R are: it has become the standard language for developing statistical techniques, it is being actively developed by a large and growing global user community, it is open source software, it is highly portable (Linux, OS-X and Windows), it has a built-in documentation system, it produces high quality graphics and it is easily extensible with over four thousand extension library packages available covering statistics and applications. This report gives a very brief introduction to R with some examples using lattice QCD simulation results. It then discusses the development of R packages designed for chi-square minimization fits for lattice n-pt correlation functions.
A Laboratory Exercise in Statistical Analysis of Data
NASA Astrophysics Data System (ADS)
Vitha, Mark F.; Carr, Peter W.
1997-08-01
An undergraduate laboratory exercise in statistical analysis of data has been developed based on facile weighings of vitamin E pills. The use of electronic top-loading balances allows for very rapid data collection. Therefore, students obtain a sufficiently large number of replicates to provide statistically meaningful data sets. Through this exercise, students explore the effects of sample size and different types of sample averaging on the standard deviation of the average weight per pill. An emphasis is placed on the difference between the standard deviation of the mean and the standard deviation of the population. Students also perform the Q-test and t-test and are introduced to the X2-test. In this report, the class data from two consecutive offerings of the course are compared and reveal a statistically significant increase in the average weight per pill, presumably due to the absorption of water over time. Histograms of the class data are shown and used to illustrate the importance of plotting the data. Overall, through this brief laboratory exercise, students are exposed to many important statistical tests and concepts which are then used and further developed throughout the remainder of the course.
A Comprehensive Statistically-Based Method to Interpret Real-Time Flowing Measurements
Keita Yoshioka; Pinan Dawkrajai; Analis A. Romero; Ding Zhu; A. D. Hill; Larry W. Lake
2007-01-15
With the recent development of temperature measurement systems, continuous temperature profiles can be obtained with high precision. Small temperature changes can be detected by modern temperature measuring instruments such as fiber optic distributed temperature sensor (DTS) in intelligent completions and will potentially aid the diagnosis of downhole flow conditions. In vertical wells, since elevational geothermal changes make the wellbore temperature sensitive to the amount and the type of fluids produced, temperature logs can be used successfully to diagnose the downhole flow conditions. However, geothermal temperature changes along the wellbore being small for horizontal wells, interpretations of a temperature log become difficult. The primary temperature differences for each phase (oil, water, and gas) are caused by frictional effects. Therefore, in developing a thermal model for horizontal wellbore, subtle temperature changes must be accounted for. In this project, we have rigorously derived governing equations for a producing horizontal wellbore and developed a prediction model of the temperature and pressure by coupling the wellbore and reservoir equations. Also, we applied Ramey's model (1962) to the build section and used an energy balance to infer the temperature profile at the junction. The multilateral wellbore temperature model was applied to a wide range of cases at varying fluid thermal properties, absolute values of temperature and pressure, geothermal gradients, flow rates from each lateral, and the trajectories of each build section. With the prediction models developed, we present inversion studies of synthetic and field examples. These results are essential to identify water or gas entry, to guide flow control devices in intelligent completions, and to decide if reservoir stimulation is needed in particular horizontal sections. This study will complete and validate these inversion studies.
A Comprehensive Statistically-Based Method to Interpret Real-Time Flowing Measurements
Pinan Dawkrajai; Keita Yoshioka; Analis A. Romero; Ding Zhu; A.D. Hill; Larry W. Lake
2005-10-01
This project is motivated by the increasing use of distributed temperature sensors for real-time monitoring of complex wells (horizontal, multilateral and multi-branching wells) to infer the profiles of oil, gas, and water entry. Measured information can be used to interpret flow profiles along the wellbore including junction and build section. In this second project year, we have completed a forward model to predict temperature and pressure profiles in complex wells. As a comprehensive temperature model, we have developed an analytical reservoir flow model which takes into account Joule-Thomson effects in the near well vicinity and multiphase non-isothermal producing wellbore model, and couples those models accounting mass and heat transfer between them. For further inferences such as water coning or gas evaporation, we will need a numerical non-isothermal reservoir simulator, and unlike existing (thermal recovery, geothermal) simulators, it should capture subtle temperature change occurring in a normal production. We will show the results from the analytical coupled model (analytical reservoir solution coupled with numerical multi-segment well model) to infer the anomalous temperature or pressure profiles under various conditions, and the preliminary results from the numerical coupled reservoir model which solves full matrix including wellbore grids. We applied Ramey's model to the build section and used an enthalpy balance to infer the temperature profile at the junction. The multilateral wellbore temperature model was applied to a wide range of cases varying fluid thermal properties, absolute values of temperature and pressure, geothermal gradients, flow rates from each lateral, and the trajectories of each build section.
A COMPREHENSIVE STATISTICALLY-BASED METHOD TO INTERPRET REAL-TIME FLOWING MEASUREMENTS
Pinan Dawkrajai; Analis A. Romero; Keita Yoshioka; Ding Zhu; A.D. Hill; Larry W. Lake
2004-10-01
In this project, we are developing new methods for interpreting measurements in complex wells (horizontal, multilateral and multi-branching wells) to determine the profiles of oil, gas, and water entry. These methods are needed to take full advantage of ''smart'' well instrumentation, a technology that is rapidly evolving to provide the ability to continuously and permanently monitor downhole temperature, pressure, volumetric flow rate, and perhaps other fluid flow properties at many locations along a wellbore; and hence, to control and optimize well performance. In this first year, we have made considerable progress in the development of the forward model of temperature and pressure behavior in complex wells. In this period, we have progressed on three major parts of the forward problem of predicting the temperature and pressure behavior in complex wells. These three parts are the temperature and pressure behaviors in the reservoir near the wellbore, in the wellbore or laterals in the producing intervals, and in the build sections connecting the laterals, respectively. Many models exist to predict pressure behavior in reservoirs and wells, but these are almost always isothermal models. To predict temperature behavior we derived general mass, momentum, and energy balance equations for these parts of the complex well system. Analytical solutions for the reservoir and wellbore parts for certain special conditions show the magnitude of thermal effects that could occur. Our preliminary sensitivity analyses show that thermal effects caused by near-wellbore reservoir flow can cause temperature changes that are measurable with smart well technology. This is encouraging for the further development of the inverse model.
Statistics for correlated data: phylogenies, space, and time.
Ives, Anthony R; Zhu, Jun
2006-02-01
Here we give an introduction to the growing number of statistical techniques for analyzing data that are not independent realizations of the same sampling process--in other words, correlated data. We focus on regression problems, in which the value of a given variable depends linearly on the value of another variable. To illustrate different types of processes leading to correlated data, we analyze four simulated examples representing diverse problems arising in ecological studies. The first example is a comparison among species to determine the relationship between home-range area and body size; because species are phylogenetically related, they do not represent independent samples. The second example addresses spatial variation in net primary production and how this might be affected by soil nitrogen; because nearby locations are likely to have similar net primary productivity for reasons other than soil nitrogen, spatial correlation is likely. In the third example, we consider a time-series model to ask whether the decrease in density of a butterfly species is the result of decreases in its host-plant density; because the population density of a species in one generation is likely to affect the density in the following generation, time-series data are often correlated. The fourth example combines both spatial and temporal correlation in an experiment in which prey densities are manipulated to determine the response of predators to their food supply. For each of these examples, we use a different statistical approach for analyzing models of correlated data. Our goal is to give an overview of conceptual issues surrounding correlated data, rather than a detailed tutorial in how to apply different statistical techniques. By dispelling some of the mystery behind correlated data, we hope to encourage ecologists to learn about statistics that could be useful in their own work. Although at first encounter these techniques might seem complicated, they have the power to
Normalization and extraction of interpretable metrics from raw accelerometry data
Bai, Jiawei; He, Bing; Shou, Haochang; Zipunnikov, Vadim; Glass, Thomas A.; Crainiceanu, Ciprian M.
2014-01-01
We introduce an explicit set of metrics for human activity based on high-density acceleration recordings from a hip-worn tri-axial accelerometer. These metrics are based on two concepts: (i) Time Active, a measure of the length of time when activity is distinguishable from rest and (ii) AI, a measure of the relative amplitude of activity relative to rest. All measurements are normalized (have the same interpretation across subjects and days), easy to explain and implement, and reproducible across platforms and software implementations. Metrics were validated by visual inspection of results and quantitative in-lab replication studies, and by an association study with health outcomes PMID:23999141
Methodology of remote sensing data interpretation and geological applications. [Brazil
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Veneziani, P.; Dosanjos, C. E.
1982-01-01
Elements of photointerpretation discussed include the analysis of photographic texture and structure as well as film tonality. The method used is based on conventional techniques developed for interpreting aerial black and white photographs. By defining the properties which characterize the form and individuality of dual images, homologous zones can be identified. Guy's logic method (1966) was adapted and used on functions of resolution, scale, and spectral characteristics of remotely sensed products. Applications of LANDSAT imagery are discussed for regional geological mapping, mineral exploration, hydrogeology, and geotechnical engineering in Brazil.
Hysteresis model and statistical interpretation of energy losses in non-oriented steels
NASA Astrophysics Data System (ADS)
Mănescu (Păltânea), Veronica; Păltânea, Gheorghe; Gavrilă, Horia
2016-04-01
In this paper the hysteresis energy losses in two non-oriented industrial steels (M400-65A and M800-65A) were determined, by means of an efficient classical Preisach model, which is based on the Pescetti-Biorci method for the identification of the Preisach density. The excess and the total energy losses were also determined, using a statistical framework, based on magnetic object theory. The hysteresis energy losses, in a non-oriented steel alloy, depend on the peak magnetic polarization and they can be computed using a Preisach model, due to the fact that in these materials there is a direct link between the elementary rectangular loops and the discontinuous character of the magnetization process (Barkhausen jumps). To determine the Preisach density it was necessary to measure the normal magnetization curve and the saturation hysteresis cycle. A system of equations was deduced and the Preisach density was calculated for a magnetic polarization of 1.5 T; then the hysteresis cycle was reconstructed. Using the same pattern for the Preisach distribution, it was computed the hysteresis cycle for 1 T. The classical losses were calculated using a well known formula and the excess energy losses were determined by means of the magnetic object theory. The total energy losses were mathematically reconstructed and compared with those, measured experimentally.
Interpreting and Reporting Radiological Water-Quality Data
McCurdy, David E.; Garbarino, John R.; Mullin, Ann H.
2008-01-01
This document provides information to U.S. Geological Survey (USGS) Water Science Centers on interpreting and reporting radiological results for samples of environmental matrices, most notably water. The information provided is intended to be broadly useful throughout the United States, but it is recommended that scientists who work at sites containing radioactive hazardous wastes need to consult additional sources for more detailed information. The document is largely based on recognized national standards and guidance documents for radioanalytical sample processing, most notably the Multi-Agency Radiological Laboratory Analytical Protocols Manual (MARLAP), and on documents published by the U.S. Environmental Protection Agency and the American National Standards Institute. It does not include discussion of standard USGS practices including field quality-control sample analysis, interpretive report policies, and related issues, all of which shall always be included in any effort by the Water Science Centers. The use of 'shall' in this report signifies a policy requirement of the USGS Office of Water Quality.
Kim, Kyoung-Ho; Yun, Seong-Taek; Choi, Byoung-Young; Chae, Gi-Tak; Joo, Yongsung; Kim, Kangjoo; Kim, Hyoung-Soo
2009-07-21
Hydrochemical and multivariate statistical interpretations of 16 physicochemical parameters of 45 groundwater samples from a riverside alluvial aquifer underneath an agricultural area in Osong, central Korea, were performed in this study to understand the spatial controls of nitrate concentrations in terms of biogeochemical processes occurring near oxbow lakes within a fluvial plain. Nitrate concentrations in groundwater showed a large variability from 0.1 to 190.6 mg/L (mean=35.0 mg/L) with significantly lower values near oxbow lakes. The evaluation of hydrochemical data indicated that the groundwater chemistry (especially, degree of nitrate contamination) is mainly controlled by two competing processes: 1) agricultural contamination and 2) redox processes. In addition, results of factorial kriging, consisting of two steps (i.e., co-regionalization and factor analysis), reliably showed a spatial control of the concentrations of nitrate and other redox-sensitive species; in particular, significant denitrification was observed restrictedly near oxbow lakes. The results of this study indicate that sub-oxic conditions in an alluvial groundwater system are developed geologically and geochemically in and near oxbow lakes, which can effectively enhance the natural attenuation of nitrate before the groundwater discharges to nearby streams. This study also demonstrates the usefulness of multivariate statistical analysis in groundwater study as a supplementary tool for interpretation of complex hydrochemical data sets. PMID:19524319
Interpreting the Results of Weighted Least-Squares Regression: Caveats for the Statistical Consumer.
ERIC Educational Resources Information Center
Willett, John B.; Singer, Judith D.
In research, data sets often occur in which the variance of the distribution of the dependent variable at given levels of the predictors is a function of the values of the predictors. In this situation, the use of weighted least-squares (WLS) or techniques is required. Weights suitable for use in a WLS regression analysis must be estimated. A…
Mathematical and statistical approaches for interpreting biomarker compounds in exhaled human breath
The various instrumental techniques, human studies, and diagnostic tests that produce data from samples of exhaled breath have one thing in common: they all need to be put into a context wherein a posed question can actually be answered. Exhaled breath contains numerous compoun...
Eichler, Gabriel S; Reimers, Mark; Kane, David; Weinstein, John N
2007-01-01
Interpretation of microarray data remains a challenge, and most methods fail to consider the complex, nonlinear regulation of gene expression. To address that limitation, we introduce Learner of Functional Enrichment (LeFE), a statistical/machine learning algorithm based on Random Forest, and demonstrate it on several diverse datasets: smoker/never smoker, breast cancer classification, and cancer drug sensitivity. We also compare it with previously published algorithms, including Gene Set Enrichment Analysis. LeFE regularly identifies statistically significant functional themes consistent with known biology. PMID:17845722
Statistical Physics in the Era of Big Data
ERIC Educational Resources Information Center
Wang, Dashun
2013-01-01
With the wealth of data provided by a wide range of high-throughout measurement tools and technologies, statistical physics of complex systems is entering a new phase, impacting in a meaningful fashion a wide range of fields, from cell biology to computer science to economics. In this dissertation, by applying tools and techniques developed in…
DATA MANAGEMENT, STATISTICS AND COMMUNITY IMPACT MODELING CORE
EPA GRANT NUMBER: R832141C007
Title: Data Management, Statistics and Community Impact Modeling Core
Investigator: Frederica P Perera
Institution: Columbia University
EPA Project Officer: Nigel Fields
Project Period: No...
Data Desk Professional: Statistical Analysis for the Macintosh.
ERIC Educational Resources Information Center
Wise, Steven L.; Kutish, Gerald W.
This review of Data Desk Professional, a statistical software package for Macintosh microcomputers, includes information on: (1) cost and the amount and allocation of memory; (2) usability (documentation quality, ease of use); (3) running programs; (4) program output (quality of graphics); (5) accuracy; and (6) user services. In conclusion, it is…
Quick Access: Find Statistical Data on the Internet.
ERIC Educational Resources Information Center
Su, Di
1999-01-01
Provides an annotated list of Internet sources (World Wide Web, ftp, and gopher sites) for current and historical statistical business data, including selected interest rates, the Consumer Price Index, the Producer Price Index, foreign currency exchange rates, noon buying rates, per diem rates, the special drawing right, stock quotes, and mutual…
Exploring Foundation Concepts in Introductory Statistics Using Dynamic Data Points
ERIC Educational Resources Information Center
Ekol, George
2015-01-01
This paper analyses introductory statistics students' verbal and gestural expressions as they interacted with a dynamic sketch (DS) designed using "Sketchpad" software. The DS involved numeric data points built on the number line whose values changed as the points were dragged along the number line. The study is framed on aggregate…
Nebraska Public Library Profile. 1994-1995 Statistical Data.
ERIC Educational Resources Information Center
Nebraska Library Commission, Lincoln.
This summary of Nebraska Library statistics for the 1994-1995 fiscal year is a compilation of data reported by 234 of the 256 public libraries in the state. The major urban libraries of Omaha and Lincoln serve 47% of the state population, while 75% of libraries in the state serve populations under 2,500. Even with limited resources, Nebraska…
STATISTICAL ANALYSIS OF THE LOS ANGELES CATALYST STUDY DATA
This research was initiated to perform statistical analyses of the data from the Los Angeles Catalyst Study. The objective is to determine the effects of the introduction of the catalytic converter upon the atmospheric concentration levels of a number of air pollutants. This repo...
ERIC Educational Resources Information Center
Knirk, Frederick G.
Designed to assist educational researchers in utilizing microcomputers, this paper presents information on four types of computer software: writing tools for educators, statistical software designed to perform analyses of small and moderately large data sets, project management tools, and general education/research oriented information services…
Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology
NASA Astrophysics Data System (ADS)
Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.
2015-12-01
Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second order tensor. This tensor can be permuted into a five dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.
NASA Astrophysics Data System (ADS)
Stuart, N.; Cameron, I.; Viergever, K. M.; Moss, D.; Wallington, E.; Woodhouse, I.
2006-10-01
Satellite SAR data offers land managers an affordable, all-weather capability for detailed land cover mapping. Visual classification of these data may be more appropriate to the resource base in many developing countries and human interpreters can often overcome problems of speckle more effectively than automated classification procedures. We report work in progress on the visual interpretation of SAR data to classify land cover types within tropical savannas. Airborne L-band SAR data for a region in Belize, Central America is degraded to approximate the single polarisation hh and dual polarization hh/hv data that is expected from the ALOS PALSAR satellite sensor. Interpretations of these two types of data by multiple interpreters were compared to explore how the number of polarizations, the effective spatial resolution and the visual presentation of the SAR data affected the ability of interpreters to classify land cover. An average classification accuracy of 78% for hh and 85% for hh/hv data were achieved for all classes and interpreters. Denser high forest areas were accurately interpreted using both data sets, whilst a red-green colour composite of the hh/hv data allowed grass dominated areas to be separated from areas of savanna woodland. Conclusions are drawn about the benefits of certain presentations of backscatter data to assist visual interpretation.
Jenney, Anne; MacBeath, Gavin; Sorger, Peter K.
2014-01-01
Functional interpretation of genomic variation is critical to understanding human disease but it remains difficult to predict the effects of specific mutations on protein interaction networks and the phenotypes they regulate. We describe an analytical framework based on multiscale statistical mechanics that integrates genomic and biophysical data to model the human SH2-phosphoprotein network in normal and cancer cells. We apply our approach to data in The Cancer Genome Atlas (TCGA) and test model predictions experimentally. We find that mutations in phosphoproteins often create new interactions but that mutations in SH2 domains result almost exclusively in loss of interactions. Some of these mutations eliminate all interactions but many cause more selective loss, thereby rewiring specific edges in highly connected subnetworks. Moreover, idiosyncratic mutations appear to be as functionally consequential as recurrent mutations. By synthesizing genomic, structural, and biochemical data our framework represents a new approach to the interpretation of genetic variation. PMID:25362484
Using demographic data to better interpret pitfall trap catches
Matalin, Andrey V.; Makarov, Kirill V.
2011-01-01
Abstract The results of pitfall trapping are often interpreted as abundance in a particular habitat. At the same time, there are numerous cases of almost unrealistically high catches of ground beetles in seemingly unsuitable sites. The correlation of catches by pitfall trapping with the true distribution and abundance of Carabidae needs corroboration. During a full year survey in 2006/07 in the Lake Elton region (Volgograd Area, Russia), 175 species of ground beetles were trapped. Considering the differences in demographic structure of the local populations, and not their abundances, three groups of species were recognized: residents, migrants and sporadic. In residents, the demographic structure of local populations is complete, and their habitats can be considered “residential”. In migrants and sporadic species, the demographic structure of the local populations is incomplete, and their habitats can be considered “transit”. Residents interact both with their prey and with each other in a particular habitat. Sporadic species are hardly important to a carabid community because of their low abundances. The contribution of migrants to the structure of carabid communities is not apparent and requires additional research. Migrants and sporadic species represent a “labile” component in ground beetles communities, as opposed to a “stable” component, represented by residents. The variability of the labile component substantially limits our interpretation of species diversity in carabid communities. Thus, the criteria for determining the most abundant, or dominant species inevitably vary because the abundance of migrants in some cases can be one order of magnitude higher than that of residents. The results of pitfall trapping adequately reflect the state of carabid communities only in zonal habitats, while azonal and disturbed habitats are merely transit ones for many species of ground beetles. A study of the demographic structure of local populations and
Collegiate Enrollments in the U.S., 1981-82. Statistics, Interpretations, and Trends.
ERIC Educational Resources Information Center
Mickler, J. Ernest
Data and narrative information are presented on college enrollments, based on a survey of institutions in the United States, Puerto Rico, and U.S. Territories. The total four-year college enrollment for fall 1981 was 7,530,013, of which 5,306,832 were full-time and 2,223,181 were part-time. The total two-year college enrollment for fall 1981 was…
Statistical summaries of fatigue data for design purposes
NASA Technical Reports Server (NTRS)
Wirsching, P. H.
1983-01-01
Two methods are discussed for constructing a design curve on the safe side of fatigue data. Both the tolerance interval and equivalent prediction interval (EPI) concepts provide such a curve while accounting for both the distribution of the estimators in small samples and the data scatter. The EPI is also useful as a mechanism for providing necessary statistics on S-N data for a full reliability analysis which includes uncertainty in all fatigue design factors. Examples of statistical analyses of the general strain life relationship are presented. The tolerance limit and EPI techniques for defining a design curve are demonstrated. Examples usng WASPALOY B and RQC-100 data demonstrate that a reliability model could be constructed by considering the fatigue strength and fatigue ductility coefficients as two independent random variables. A technique given for establishing the fatigue strength for high cycle lives relies on an extrapolation technique and also accounts for "runners." A reliability model or design value can be specified.
Feature-based statistical analysis of combustion simulation data.
Bennett, Janine C; Krishnamoorthy, Vaidyanathan; Liu, Shusen; Grout, Ray W; Hawkes, Evatt R; Chen, Jacqueline H; Shepherd, Jason; Pascucci, Valerio; Bremer, Peer-Timo
2011-12-01
We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence, fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Density Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion
Feature-Based Statistical Analysis of Combustion Simulation Data
Bennett, J; Krishnamoorthy, V; Liu, S; Grout, R; Hawkes, E; Chen, J; Pascucci, V; Bremer, P T
2011-11-18
We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence, fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Density Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion
Statistical Quality Control of Moisture Data in GEOS DAS
NASA Technical Reports Server (NTRS)
Dee, D. P.; Rukhovets, L.; Todling, R.
1999-01-01
A new statistical quality control algorithm was recently implemented in the Goddard Earth Observing System Data Assimilation System (GEOS DAS). The final step in the algorithm consists of an adaptive buddy check that either accepts or rejects outlier observations based on a local statistical analysis of nearby data. A basic assumption in any such test is that the observed field is spatially coherent, in the sense that nearby data can be expected to confirm each other. However, the buddy check resulted in excessive rejection of moisture data, especially during the Northern Hemisphere summer. The analysis moisture variable in GEOS DAS is water vapor mixing ratio. Observational evidence shows that the distribution of mixing ratio errors is far from normal. Furthermore, spatial correlations among mixing ratio errors are highly anisotropic and difficult to identify. Both factors contribute to the poor performance of the statistical quality control algorithm. To alleviate the problem, we applied the buddy check to relative humidity data instead. This variable explicitly depends on temperature and therefore exhibits a much greater spatial coherence. As a result, reject rates of moisture data are much more reasonable and homogeneous in time and space.
Statistical Software for spatial analysis of stratigraphic data sets
2003-04-08
Stratistics s a tool for statistical analysis of spatially explicit data sets and model output for description and for model-data comparisons. lt is intended for the analysis of data sets commonly used in geology, such as gamma ray logs and lithologic sequences, as well as 2-D data such as maps. Stratistics incorporates a far wider range of spatial analysis methods drawn from multiple disciplines, than are currently available in other packages. These include incorporation of techniques from spatial and landscape ecology, fractal analysis, and mathematical geology. Its use should substantially reduce the risk associated with the use of predictive models
Statistical Software for spatial analysis of stratigraphic data sets
2003-04-08
Stratistics s a tool for statistical analysis of spatially explicit data sets and model output for description and for model-data comparisons. lt is intended for the analysis of data sets commonly used in geology, such as gamma ray logs and lithologic sequences, as well as 2-D data such as maps. Stratistics incorporates a far wider range of spatial analysis methods drawn from multiple disciplines, than are currently available in other packages. These include incorporation ofmore » techniques from spatial and landscape ecology, fractal analysis, and mathematical geology. Its use should substantially reduce the risk associated with the use of predictive models« less
A decision-theory approach to interpretable set analysis for high-dimensional data.
Boca, Simina M; Bravo, Héctor Céorrada; Caffo, Brian; Leek, Jeffrey T; Parmigiani, Giovanni
2013-09-01
A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses. PMID:23909925
Analyzing and interpreting genome data at the network level with ConsensusPathDB.
Herwig, Ralf; Hardt, Christopher; Lienhard, Matthias; Kamburov, Atanas
2016-10-01
ConsensusPathDB consists of a comprehensive collection of human (as well as mouse and yeast) molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. This protocol describes the use of ConsensusPathDB (http://consensuspathdb.org) with respect to the functional and network-based characterization of biomolecules (genes, proteins and metabolites) that are submitted to the system either as a priority list or together with associated experimental data such as RNA-seq. The tool reports interaction network modules, biochemical pathways and functional information that are significantly enriched by the user's input, applying computational methods for statistical over-representation, enrichment and graph analysis. The results of this protocol can be observed within a few minutes, even with genome-wide data. The resulting network associations can be used to interpret high-throughput data mechanistically, to characterize and prioritize biomarkers, to integrate different omics levels, to design follow-up functional assay experiments and to generate topology for kinetic models at different scales. PMID:27606777
STATISTICS. The reusable holdout: Preserving validity in adaptive data analysis.
Dwork, Cynthia; Feldman, Vitaly; Hardt, Moritz; Pitassi, Toniann; Reingold, Omer; Roth, Aaron
2015-08-01
Misapplication of statistical data analysis is a common cause of spurious discoveries in scientific research. Existing approaches to ensuring the validity of inferences drawn from data assume a fixed procedure to be performed, selected before the data are examined. In common practice, however, data analysis is an intrinsically adaptive process, with new analyses generated on the basis of data exploration, as well as the results of previous analyses on the same data. We demonstrate a new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis. As an application, we show how to safely reuse a holdout data set many times to validate the results of adaptively chosen analyses. PMID:26250683
Data and statistical methods for analysis of trends and patterns
Atwood, C.L.; Gentillon, C.D.; Wilson, G.E.
1992-11-01
This report summarizes topics considered at a working meeting on data and statistical methods for analysis of trends and patterns in US commercial nuclear power plants. This meeting was sponsored by the Office of Analysis and Evaluation of Operational Data (AEOD) of the Nuclear Regulatory Commission (NRC). Three data sets are briefly described: Nuclear Plant Reliability Data System (NPRDS), Licensee Event Report (LER) data, and Performance Indicator data. Two types of study are emphasized: screening studies, to see if any trends or patterns appear to be present; and detailed studies, which are more concerned with checking the analysis assumptions, modeling any patterns that are present, and searching for causes. A prescription is given for a screening study, and ideas are suggested for a detailed study, when the data take of any of three forms: counts of events per time, counts of events per demand, and non-event data.
Biosurveillance applying scan statistics with multiple, disparate data sources.
Burkom, Howard S
2003-06-01
Researchers working on the Department of Defense Global Emerging Infections System (DoD-GEIS) pilot system, the Electronic Surveillance System for the Early Notification of Community-Based Epidemics (ESSENCE), have applied scan statistics for early outbreak detection using both traditional and nontraditional data sources. These sources include medical data indexed by International Classification of Disease, 9th Revision (ICD-9) diagnosis codes, as well as less-specific, but potentially timelier, indicators such as records of over-the-counter remedy sales and of school absenteeism. Early efforts employed the Kulldorff scan statistic as implemented in the SaTScan software of the National Cancer Institute. A key obstacle to this application is that the input data streams are typically based on time-varying factors, such as consumer behavior, rather than simply on the populations of the component subregions. We have used both modeling and recent historical data distributions to obtain background spatial distributions. Data analyses have provided guidance on how to condition and model input data to avoid excessive clustering. We have used this methodology in combining data sources for both retrospective studies of known outbreaks and surveillance of high-profile events of concern to local public health authorities. We have integrated the scan statistic capability into a Microsoft Access-based system in which we may include or exclude data sources, vary time windows separately for different data sources, censor data from subsets of individual providers or subregions, adjust the background computation method, and run retrospective or simulated studies. PMID:12791780
Statistical Treatment of Earth Observing System Pyroshock Separation Test Data
NASA Technical Reports Server (NTRS)
McNelis, Anne M.; Hughes, William O.
1998-01-01
The Earth Observing System (EOS) AM-1 spacecraft for NASA's Mission to Planet Earth is scheduled to be launched on an Atlas IIAS vehicle in June of 1998. One concern is that the instruments on the EOS spacecraft are sensitive to the shock-induced vibration produced when the spacecraft separates from the launch vehicle. By employing unique statistical analysis to the available ground test shock data, the NASA Lewis Research Center found that shock-induced vibrations would not be as great as the previously specified levels of Lockheed Martin. The EOS pyroshock separation testing, which was completed in 1997, produced a large quantity of accelerometer data to characterize the shock response levels at the launch vehicle/spacecraft interface. Thirteen pyroshock separation firings of the EOS and payload adapter configuration yielded 78 total measurements at the interface. The multiple firings were necessary to qualify the newly developed Lockheed Martin six-hardpoint separation system. Because of the unusually large amount of data acquired, Lewis developed a statistical methodology to predict the maximum expected shock levels at the interface between the EOS spacecraft and the launch vehicle. Then, this methodology, which is based on six shear plate accelerometer measurements per test firing at the spacecraft/launch vehicle interface, was used to determine the shock endurance specification for EOS. Each pyroshock separation test of the EOS spacecraft simulator produced its own set of interface accelerometer data. Probability distributions, histograms, the median, and higher order moments (skew and kurtosis) were analyzed. The data were found to be lognormally distributed, which is consistent with NASA pyroshock standards. Each set of lognormally transformed test data produced was analyzed to determine if the data should be combined statistically. Statistical testing of the data's standard deviations and means (F and t testing, respectively) determined if data sets were
A Geophysical Atlas for Interpretation of Satellite-derived Data
NASA Technical Reports Server (NTRS)
Lowman, P. D., Jr. (Editor); Frey, H. V. (Editor); Davis, W. M.; Greenberg, A. P.; Hutchinson, M. K.; Langel, R. A.; Lowrey, B. E.; Marsh, J. G.; Mead, G. D.; Okeefe, J. A.
1979-01-01
A compilation of maps of global geophysical and geological data plotted on a common scale and projection is presented. The maps include satellite gravity, magnetic, seismic, volcanic, tectonic activity, and mantle velocity anomaly data. The Bibliographic references for all maps are included.
Helping Students Interpret Large-Scale Data Tables
ERIC Educational Resources Information Center
Prodromou, Theodosia
2016-01-01
New technologies have completely altered the ways that citizens can access data. Indeed, emerging online data sources give citizens access to an enormous amount of numerical information that provides new sorts of evidence used to influence public opinion. In this new environment, two trends have had a significant impact on our increasingly…
Summary of Quantitative Interpretation of Image Far Ultraviolet Auroral Data
NASA Technical Reports Server (NTRS)
Frey, H. U.; Immel, T. J.; Mende, S. B.; Gerard, J.-C.; Hubert, B.; Habraken, S.; Span, J.; Gladstone, G. R.; Bisikalo, D. V.; Shematovich, V. I.; Six, N. Frank (Technical Monitor)
2002-01-01
Direct imaging of the magnetosphere by instruments on the IMAGE spacecraft is supplemented by simultaneous observations of the global aurora in three far ultraviolet (FUV) wavelength bands. The purpose of the multi-wavelength imaging is to study the global auroral particle and energy input from thc magnetosphere into the atmosphere. This paper describes provides the method for quantitative interpretation of FUV measurements. The Wide-Band Imaging Camera (WIC) provides broad band ultraviolet images of the aurora with maximum spatial and temporal resolution by imaging the nitrogen lines and bands between 140 and 180 nm wavelength. The Spectrographic Imager (SI), a dual wavelength monochromatic instrument, images both Doppler-shifted Lyman alpha emissions produced by precipitating protons, in the SI-12 channel and OI 135.6 nm emissions in the SI-13 channel. From the SI-12 Doppler shifted Lyman alpha images it is possible to obtain the precipitating proton flux provided assumptions are made regarding the mean energy of the protons. Knowledge of the proton (flux and energy) component allows the calculation of the contribution produced by protons in the WIC and SI-13 instruments. Comparison of the corrected WIC and SI-13 signals provides a measure of the electron mean energy, which can then be used to determine the electron energy fluxun-. To accomplish this reliable modeling emission modeling and instrument calibrations are required. In-flight calibration using early-type stars was used to validate the pre-flight laboratory calibrations and determine long-term trends in sensitivity. In general, very reasonable agreement is found between in-situ measurements and remote quantitative determinations.
A statistical analysis of sea temperature data. A statistical analysis of sea temperature data
NASA Astrophysics Data System (ADS)
Lorentzen, Torbjørn
2015-02-01
The paper analyzes sea temperature series measured at two geographical locations along the coast of Norway. We address the question whether the series are stable over the sample period 1936-2012 and whether we can measure any signal of climate change in the regional data. We use nonstandard supF, OLS-based CUSUM, RE, and Chow tests in combination with the Bai-Perron's structural break test to identify potential changes in the temperature. The augmented Dickey-Fuller, the KPSS, and the nonparametric Phillips-Perron tests are in addition applied in the evaluation of the stochastic properties of the series. The analysis indicates that both series undergo similar structural instabilities in the form of small shifts in the temperature level. The temperature at Lista (58° 06' N, 06° 38' E) shifts downward about 1962 while the Skrova series (68° 12' N, 14° 10' E) shifts to a lower level about 1977. Both series shift upward about 1987, and after a period of increasing temperature, both series start leveling off about the turn of the millennium. The series have no significant stochastic or deterministic trend. The analysis indicates that the mean temperature has moved upward in decadal, small steps since the 1980s. The result is in accordance with recent analyses of sea temperatures in the North Atlantic. The findings are also related to the so-called hiatus phenomenon where natural variation in climate can mask global warming processes. The paper contributes to the discussion of applying objective methods in measuring climate change.
Statistical Analysis of Strength Data for an Aerospace Aluminum Alloy
NASA Technical Reports Server (NTRS)
Neergaard, Lynn; Malone, Tina; Gentz, Steven J. (Technical Monitor)
2000-01-01
Aerospace vehicles are produced in limited quantities that do not always allow development of MIL-HDBK-5 A-basis design allowables. One method of examining production and composition variations is to perform 100% lot acceptance testing for aerospace Aluminum (Al) alloys. This paper discusses statistical trends seen in strength data for one Al alloy. A four-step approach reduced the data to residuals, visualized residuals as a function of time, grouped data with quantified scatter, and conducted analysis of variance (ANOVA).
Statistical Analysis of Strength Data for an Aerospace Aluminum Alloy
NASA Technical Reports Server (NTRS)
Neergaard, L.; Malone, T.
2001-01-01
Aerospace vehicles are produced in limited quantities that do not always allow development of MIL-HDBK-5 A-basis design allowables. One method of examining production and composition variations is to perform 100% lot acceptance testing for aerospace Aluminum (Al) alloys. This paper discusses statistical trends seen in strength data for one Al alloy. A four-step approach reduced the data to residuals, visualized residuals as a function of time, grouped data with quantified scatter, and conducted analysis of variance (ANOVA).
Interpreting Microarray Data to Build Models of Microbial Genetic Regulation Networks
Sokhansanj, B; Garnham, J B; Fitch, J P
2002-01-23
Microarrays and DNA chips are an efficient, high-throughput technology for measuring temporal changes in the expression of message RNA (mRNA) from thousands of genes (often the entire genome of an organism) in a single experiment. A crucial drawback of microarray experiments is that results are inherently qualitative: data are generally neither quantitatively repeatable, nor may microarray spot intensities be calibrated to in vivo mRNA concentrations. Nevertheless, microarrays represent by the far the cheapest and fastest way to obtain information about a cells global genetic regulatory networks. Besides poor signal characteristics, the massive number of data produced by microarray experiments poses challenges for visualization, interpretation and model building. Towards initial model development, we have developed a Java tool for visualizing the spatial organization of gene expression in bacteria. We are also developing an approach to inferring and testing qualitative fuzzy logic models of gene regulation using microarray data. Because we are developing and testing qualitative hypotheses that do not require quantitative precision, our statistical evaluation of experimental data is limited to checking for validity and consistency. Our goals are to maximize the impact of inexpensive microarray technology, bearing in mind that biological models and hypotheses are typically qualitative.
Spatial Statistical Estimation for Massive Sea Surface Temperature Data
NASA Astrophysics Data System (ADS)
Marchetti, Y.; Vazquez, J.; Nguyen, H.; Braverman, A. J.
2015-12-01
We combine several large remotely sensed sea surface temperature (SST) datasets to create a single high-resolution SST dataset that has no missing data and provides an uncertainty associated with each value. This high resolution dataset will optimize estimates of SST in critical parts of the world's oceans, such as coastal upwelling regions. We use Spatial Statistical Data Fusion (SSDF), a statistical methodology for predicting global spatial fields by exploiting spatial correlations in the data. The main advantages of SSDF over spatial smoothing methodologies include the provision of probabilistic uncertainties, the ability to incorporate multiple datasets with varying footprints, measurement errors and biases, and estimation at any desired resolution. In order to accommodate massive input and output datasets, we introduce two modifications of the existing SSDF algorithm. First, we compute statistical model parameters based on coarse resolution aggregated data. Second, we use an adaptive spatial grid that allows us to perform estimation in a specified region of interest, but incorporate spatial dependence between locations in that region and all locations globally. Finally, we demonstrate with a case study involving estimations on the full globe at coarse resolution grid (30 km) and a high resolution (1 km) inset for the Gulf Stream region.
Statistical analysis of spectral data for vegetation detection
NASA Astrophysics Data System (ADS)
Love, Rafael; Cathcart, J. Michael
2006-05-01
Identification and reduction of false alarms provide a critical component in the detection of landmines. Research at Georgia Tech over the past several years has focused on this problem through an examination of the signature characteristics of various background materials. These efforts seek to understand the physical basis and features of these signatures as an aid to the development of false target identification techniques. The investigation presented in this paper deal concentrated on the detection of foliage in long wave infrared imagery. Data collected by a hyperspectral long-wave infrared sensor provided the background signatures used in this study. These studies focused on an analysis of the statistical characteristics of both the intensity signature and derived emissivity data. Results from these studies indicate foliage signatures possess unique characteristics that can be exploited to enable detection of vegetation in LWIR images. This paper will present review of the approach and results of the statistical analysis.
Probability and Statistics in Astronomical Machine Learning and Data Minin
NASA Astrophysics Data System (ADS)
Scargle, Jeffrey
2012-03-01
Statistical issues peculiar to astronomy have implications for machine learning and data mining. It should be obvious that statistics lies at the heart of machine learning and data mining. Further it should be no surprise that the passive observational nature of astronomy, the concomitant lack of sampling control, and the uniqueness of its realm (the whole universe!) lead to some special statistical issues and problems. As described in the Introduction to this volume, data analysis technology is largely keeping up with major advances in astrophysics and cosmology, even driving many of them. And I realize that there are many scientists with good statistical knowledge and instincts, especially in the modern era I like to call the Age of Digital Astronomy. Nevertheless, old impediments still lurk, and the aim of this chapter is to elucidate some of them. Many experiences with smart people doing not-so-smart things (cf. the anecdotes collected in the Appendix here) have convinced me that the cautions given here need to be emphasized. Consider these four points: 1. Data analysis often involves searches of many cases, for example, outcomes of a repeated experiment, for a feature of the data. 2. The feature comprising the goal of such searches may not be defined unambiguously until the search is carried out, or perhaps vaguely even then. 3. The human visual system is very good at recognizing patterns in noisy contexts. 4. People are much easier to convince of something they want to believe, or already believe, as opposed to unpleasant or surprising facts. One can argue that all four are good things during the initial, exploratory phases of most data analysis. They represent the curiosity and creativity of the scientific process, especially during the exploration of data collections from new observational programs such as all-sky surveys in wavelengths not accessed before or sets of images of a planetary surface not yet explored. On the other hand, confirmatory scientific
Interpreting Disasters From Limited Data Availability: A Guatemalan Study Case
NASA Astrophysics Data System (ADS)
Soto Gomez, A.
2012-12-01
Guatemala is located in a geographical area exposed to multiple natural hazards. Although Guatemalan populations live in hazardous conditions, limited scientific research is being focused in this particular geographical area. Thorough studies are needed to understand the disasters occurring in the country and consequently enable decision makers and professionals to plan future actions, yet available data is limited. Data comprised in the available data sources is limited by their timespan or the size of the events included and therefore is insufficient to provide the whole picture of the disasters in the country. This study proposes a methodology to use the available data within one of the most important catchments in the country, the Samala River basin, to look for answers to what kind of disasters occurs? Where such events happen? And, why do they happen? Three datasets from different source agencies -one global, one regional, and one local- have been analyzed numerically and spatially using spreadsheets, numerical computing software, and geographic information systems. Analyses results have been coupled in order to search for possible answers to the established questions. It has been found a relation between the compositions of data of two of the three datasets analyzed. The third has shown a very different composition probably because the inclusion criteria of the dataset exclude smaller but more frequent disasters in its records. In all the datasets the most frequent type of disasters are those caused by hydrometeorological hazards i.e. floods and landslides. It has been found a relation between the occurrences of disasters and the records of precipitation in the area, but this relation is not strong enough to affirm that the disasters are the direct result of rain in the area and further studies must be carried out to explore other potential causes. Analyzing the existing data contributes to identify what kind of data is needed and this would be useful to
Identification and interpretation of patterns in rocket engine data
NASA Technical Reports Server (NTRS)
Lo, Ching F.
1993-01-01
The goal of our research is to analyze ground test data, to identify patterns associated with the anomalous engine behavior. On the basis of this analysis, it is the task of our project to develop a Pattern Identification and Detection System which detects anomalous engine behavior in the early stages of fault development significantly earlier than the indication provided by either redline detection mechanism or human expert analysis. Early detection of these anomalies is challenging because of the large amount of noise presence in the data. In the presence of this noise, early indication of anomalies becomes even more difficult to distinguish from fluctuations in normal steady state operation.
Artefact Mobile Data Model to Support Cultural Heritage Data Collection and Interpretation
NASA Astrophysics Data System (ADS)
Mohamed-Ghouse, Z. S.; Kelly, D.; Costello, A.; Edmonds, V.
2012-07-01
This paper discusses the limitation of existing data structures in mobile mapping applications to support archaeologists to manage the artefact (any object made or modified by a human culture, and later recovered by an archaeological endeavor) details excavated at a cultural heritage site. Current limitations of data structure in the mobile mapping application allow archeologist to record only one artefact per test pit location. In reality, more than one artefact can be excavated from the same test pit location. A spatial data model called Artefact Mobile Data Model (AMDM) was developed applying existing Relational Data Base Management System (RDBMS) technique to overcome the limitation. The data model was implemented in a mobile database environment called SprintDB Pro which was in turn connected to ArcPad 7.1 mobile mapping application through Open Data Base Connectivity (ODBC). In addition, the design of a user friendly application built on top of AMDM to interpret and record the technology associated with each artefact excavated in the field is also discussed in the paper. In summary, the paper discusses the design and implementation of a data model to facilitate the collection of artefacts in the field using integrated mobile mapping and database approach.
Computational and Statistical Analysis of Protein Mass Spectrometry Data
Noble, William Stafford; MacCoss, Michael J.
2012-01-01
High-throughput proteomics experiments involving tandem mass spectrometry produce large volumes of complex data that require sophisticated computational analyses. As such, the field offers many challenges for computational biologists. In this article, we briefly introduce some of the core computational and statistical problems in the field and then describe a variety of outstanding problems that readers of PLoS Computational Biology might be able to help solve. PMID:22291580
Analysis , interpretation, and avoidance of difficult data in bioassay
Technology Transfer Automated Retrieval System (TEKTRAN)
Problem data in biological assay of insect pathogens results from a number of difficult experimental design challenges, including: 1) screening of large numbers of pathogen isolates, 2) limited availability of inocula or test animals, 3) instability of pathogen preparations, 4) high variability in s...
A fast inversion method for interpreting borehole electromagnetic data
NASA Astrophysics Data System (ADS)
Kim, H. J.; Lee, K. H.; Wilt, M.
2003-05-01
A fast and stable inversion scheme has been developed using the localized nonlinear (LN) approximation to analyze electromagnetic fields obtained in a borehole. The medium is assumed to be cylindrically symmetric about the borehole, and to maintain the symmetry a vertical magnetic dipole is used as a source. The efficiency and robustness of an inversion scheme is very much dependent on the proper use of Lagrange multiplier, which is often provided manually to achieve a desired convergence. We utilize an automatic Lagrange multiplier selection scheme, which enhances the utility of the inversion scheme in handling field data. In this selection scheme, the integral equation (IE) method is quite attractive in speed because Green's functions, the most time consuming part in IE methods, are repeatedly re-usable throughout the selection procedure. The inversion scheme using the LN approximation has been tested to show its stability and efficiency using synthetic and field data. The inverted result from the field data is successfully compared with induction logging data measured in the same borehole.
Interpreting Temperature Strain Data from Meso-Scale Clathrate Experiments
Leeman, John R; Rawn, Claudia J; Ulrich, Shannon M; Elwood Madden, Megan; Phelps, Tommy Joe
2012-01-01
Gas hydrates are important in global climate change, carbon sequestra- tion, and seafloor stability. Currently, formation and dissociation pathways are poorly defined. We present a new approach for processing large amounts of data from meso-scale experiments, such as the LUNA distributed sensing system (DSS) in the seafloor process simulator (SPS) at Oak Ridge National Laboratory. The DSS provides a proxy for temperature measurement with a high spatial resolution allowing the heat of reaction during gas hydrate formation/dissociation to aid in locating clathrates in the vessel. The DSS fibers are placed in the sediment following an Archimedean spiral design and then the position of each sensor is solved by iterating over the arc length formula with Newtons method. The data is then gridded with 1 a natural neighbor interpolation algorithm to allow contouring of the data. The solution of the sensor locations is verified with hot and cold stimulus in known locations. An experiment was preformed with a vertically split column of sand and silt. The DSS system clearly showed hydrate forming in the sand first, then slowly creeping into the silt. Similar systems and data processing techniques could be used for monitoring of hydrates in natural environments or in any situation where a hybrid temperature/strain index is useful. Further ad- vances in fiber technology allow the fiber to be applied in any configuration and the position of each sensor to be precisely determined making practical applications easier.
The GEOS Ozone Data Assimilation System: Specification of Error Statistics
NASA Technical Reports Server (NTRS)
Stajner, Ivanka; Riishojgaard, Lars Peter; Rood, Richard B.
2000-01-01
A global three-dimensional ozone data assimilation system has been developed at the Data Assimilation Office of the NASA/Goddard Space Flight Center. The Total Ozone Mapping Spectrometer (TOMS) total ozone and the Solar Backscatter Ultraviolet (SBUV) or (SBUV/2) partial ozone profile observations are assimilated. The assimilation, into an off-line ozone transport model, is done using the global Physical-space Statistical Analysis Scheme (PSAS). This system became operational in December 1999. A detailed description of the statistical analysis scheme, and in particular, the forecast and observation error covariance models is given. A new global anisotropic horizontal forecast error correlation model accounts for a varying distribution of observations with latitude. Correlations are largest in the zonal direction in the tropics where data is sparse. Forecast error variance model is proportional to the ozone field. The forecast error covariance parameters were determined by maximum likelihood estimation. The error covariance models are validated using x squared statistics. The analyzed ozone fields in the winter 1992 are validated against independent observations from ozone sondes and HALOE. There is better than 10% agreement between mean Halogen Occultation Experiment (HALOE) and analysis fields between 70 and 0.2 hPa. The global root-mean-square (RMS) difference between TOMS observed and forecast values is less than 4%. The global RMS difference between SBUV observed and analyzed ozone between 50 and 3 hPa is less than 15%.
Mathematical and statistical aids to evaluate data from renal patients.
Knapp, M S; Smith, A F; Trimble, I M; Pownall, R; Gordon, K
1983-10-01
The monitoring of renal patients and the making of many decisions during their management involves consideration of sequences of numerical data. Renal function results after renal transplantation were used as an example of how graphical presentations, simple mathematical transforms, statistical evaluation and adjustments to the data, to take into account other biological and technical sources of error, can all contribute to better understanding. Experience with a statistical technique, the 4-state Kalman filter not often used in the biological sciences, was summarized and its use suggested as a method to quantitate some traditionally subjective decisions about individual patients, for example, the onset of allograft rejection. The method has identified in retrospect and in prospect events after transplantation earlier than did experienced clinicians. Other statistical techniques to set the sensitivity and specificity of monitoring methods, to detect change points and to quantitate rhythmic sequences of clinical data were discussed, with examples, and with increasing access to computers, these can be used more easily by nephrologists, transplant surgeons, and others. PMID:6358637
Statistical Analysis of Longitudinal Psychiatric Data with Dropouts
Mazumdar, Sati; Tang, Gong; Houck, Patricia R.; Dew, Mary Amanda; Begley, Amy E.; Scott, John; Mulsant, Benoit H.; Reynolds, Charles F.
2007-01-01
Longitudinal studies are used in psychiatric research to address outcome changes over time within and between individuals. However, because participants may drop out of a study prematurely, ignoring the nature of dropout often leads to biased inference, and in turn, wrongful conclusions. The purpose of the present paper is: (1) to review several dropout processes, corresponding inferential issues and recent methodological advances; (2) to evaluate the impact of assumptions regarding the dropout processes on inference by simulation studies and an illustrative example using psychiatric data; and (3) to provide a general strategy for practitioners to perform analyses of longitudinal data with dropouts, using software available commercially or in the public domain. The statistical methods used in this paper are maximum likelihood, multiple imputation and semi-parametric regression methods for inference, as well as Little’s test and ISNI (Index of Sensitivity to Nonignorability) for assessing statistical dropout mechanisms. We show that accounting for the nature of the dropout process influences results and that sensitivity analysis is useful in assessing the robustness of parameter estimates and related uncertainties. We conclude that recording the causes of dropouts should be an integral part of any statistical analysis with longitudinal psychiatric data, and we recommend performing a sensitivity analysis when the exact nature of the dropout process cannot be discerned. PMID:17092516
The statistical analysis of multivariate serological frequency data.
Reyment, Richard A
2005-11-01
Data occurring in the form of frequencies are common in genetics-for example, in serology. Examples are provided by the AB0 group, the Rhesus group, and also DNA data. The statistical analysis of tables of frequencies is carried out using the available methods of multivariate analysis with usually three principal aims. One of these is to seek meaningful relationships between the components of a data set, the second is to examine relationships between populations from which the data have been obtained, the third is to bring about a reduction in dimensionality. This latter aim is usually realized by means of bivariate scatter diagrams using scores computed from a multivariate analysis. The multivariate statistical analysis of tables of frequencies cannot safely be carried out by standard multivariate procedures because they represent compositions and are therefore embedded in simplex space, a subspace of full space. Appropriate procedures for simplex space are compared and contrasted with simple standard methods of multivariate analysis ("raw" principal component analysis). The study shows that the differences between a log-ratio model and a simple logarithmic transformation of proportions may not be very great, particularly as regards graphical ordinations, but important discrepancies do occur. The divergencies between logarithmically based analyses and raw data are, however, great. Published data on Rhesus alleles observed for Italian populations are used to exemplify the subject. PMID:16024067
Data analysis and interpretation of lunar dust exosphere
NASA Technical Reports Server (NTRS)
Andrews, George A., Jr.
1992-01-01
The lunar horizon glow observed by Apollo astronauts and captured on film during the Surveyor mission is believed to result from the scattering of sunlight off lunar fines suspended in a dust layer over the lunar surface. For scale heights on the order of tens of kilometers, it is anticipated that the size of the dust particles will be small enough to admit Rayleigh scattering. Such events would result in scattered light which is polarized to a degree which is a function of observation angle and produce spectra containing large high frequency components ('bluing'). Believing these signatures to be observable from ground based telescopes, observational data has been collected from McDonald Observatory and the task of reduction and analysis of this data is the focus of the present report.
Market Available Virgin Nickel Analysis Data Summary Interpretation Report
Hampson, Steve; Volpe, John
2004-10-01
Collection, analysis, and assessment of market available nickel samples for their radionuclide content is being conducted to support efforts of the Purchase Area Community Reuse Organization (PACRO) to identify and implement a decontamination method that will allow for the sale and recycling of contaminated Paducah Gaseous Diffusion Plant (PGDP) nickel-metal stockpiles. The objectives of the Nickel Project address the lack of radionuclide data in market available nickel metal. The lack of radionuclide data for commercial-recycled nickel metal or commercial-virgin nickel metal has been detrimental to assessments of the potential impacts of the free-release of recycled PGDP nickel on public health. The nickel project, to date, has only evaluated "virgin" nickel metal which is derived form non-recycled sources.
Data analysis and interpretation of lunar dust exosphere
NASA Technical Reports Server (NTRS)
Andrews, George A., Jr.
1993-01-01
The lunar horizon glow observed by Apollo astronauts and recorded during the Surveyor missions is believed to result from the scattering of sunlight off lunar fines suspended in a dust layer above the lunar surface. For scale heights of tens of kilometers, theory and astronaut's observations suggest that the size of the dust particles will be smaller than 0.1 microns in radius and will act as Rayleigh scatters. This means that the dust scattered light will be 100 percent polarized at a 90 degree scattering angle and will depend on wavelength to the inverse fourth power ('bluing'). Believing these signatures to be observable from ground based telescopes, observational data in the form of CCD images has been collected from McDonald Observatory's 36 in. telescope, and the reduction and analysis of this data is the focus of the present report.
Methods for Quantitative Interpretation of Retarding Field Analyzer Data
Calvey, J.R.; Crittenden, J.A.; Dugan, G.F.; Palmer, M.A.; Furman, M.; Harkay, K.
2011-03-28
Over the course of the CesrTA program at Cornell, over 30 Retarding Field Analyzers (RFAs) have been installed in the CESR storage ring, and a great deal of data has been taken with them. These devices measure the local electron cloud density and energy distribution, and can be used to evaluate the efficacy of different cloud mitigation techniques. Obtaining a quantitative understanding of RFA data requires use of cloud simulation programs, as well as a detailed model of the detector itself. In a drift region, the RFA can be modeled by postprocessing the output of a simulation code, and one can obtain best fit values for important simulation parameters with a chi-square minimization method.
Using fuzzy sets for data interpretation in natural analogue studies
De Lemos, F.L.; Sullivan, T.; Hellmuth, K.H.
2008-07-01
Natural analogue studies can play a key role in deep geological radioactive disposal systems safety assessment. These studies can help develop a better understanding of complex natural processes and, therefore, provide valuable means of confidence building in the safety assessment. In evaluation of natural analogues, there are, however, several sources of uncertainties that stem from factors such as complexity; lack of data; and ignorance. Often, analysts have to simplify the mathematical models in order to cope with the various sources of complexity and this ads uncertainty to the model results. The uncertainties reflected in model predictions must be addressed to understand their impact on safety assessment and therefore, the utility of natural analogues. Fuzzy sets can be used to represent the information regarding the natural processes and their mutual connections. With this methodology we are able to quantify and propagate the epistemic uncertainties in both processes and, thereby, assign degrees of truth to the similarities between them. An example calculation with literature data is provided. In conclusion: Fuzzy sets are an effective way of quantifying semi-quantitative information such as natural analogues data. Epistemic uncertainty that stems from complexity and lack of knowledge regarding natural processes are represented by the degrees of membership. It also facilitates the propagation of this uncertainty throughout the performance assessment by the extension principle. This principle allows calculation with fuzzy numbers, where fuzzy input results in fuzzy output. This may be one of the main applications of fuzzy sets theory to radioactive waste disposal facility performance assessment. Through the translation of natural data into fuzzy numbers, the effect of parameters in important processes in one site can be quantified and compared to processes in other sites with different conditions. The approach presented in this paper can be extended to
Linear and nonlinear interpretation of CV-580 lightning data
NASA Technical Reports Server (NTRS)
Ng, Poh H.; Rudolph, Terence H.; Perala, Rodney A.
1988-01-01
Numerical models developed for the study of lightning strike data acquired by in-flight aircraft are applied to the data measured on the CV-580. The basic technique used is the three dimensional time domain finite difference solution of Maxwell's equations. Both linear and nonlinear models are used in the analysis. In the linear model, the lightning channel and the aircraft are assumed to form a linear time invariant system. A transfer function technique can then be used to study the response of the aircraft to a given lightning strike current. Conversely, the lightning current can be inferred from the measured response. In the nonlinear model, the conductivity of air in the vicinity of the aircraft is calculated and incorporated into the solution of the Maxwell's equations. The nonlinear model thus simulates corona formation and air breakdown. Results obtained from the models are in reasonable agreement with the measured data. This study provides another validation of the models and increases confidence that the models may be used to predict aircraft response to any general lightning strike.
Interpretation of Cluster WBD frequency conversion mode data
NASA Astrophysics Data System (ADS)
Pickett, J. S.; Christopher, I. W.; Kirchner, D. L.
2013-08-01
The Cluster Wide-Band Data (WBD) plasma wave receiver mounted on each of the four Cluster spacecraft obtains high time resolution waveform data in the frequency range of ~70 Hz to 577 kHz. In order to make measurements above 77 kHz, it uses frequency conversion to sample the higher frequency waves at one of three different conversion frequencies (~125, 250 and 500 kHz, where these frequencies are the base frequency of the frequency range being sampled) in one of three different filter bandwidths (9.5, 19 and 77 kHz). Within the WBD instrument a down conversion technique, built around quadrature mixing, is used to convert these data to baseband (0 kHz) in order to reduce the sample rate for telemetry to the ground. We describe this down conversion technique and illustrate it through data obtained in space. Because these down converted data sometimes contain pulses, which can be indicative of nonlinear physical structures (e.g., electron phase space holes and electron density enhancements and depletions), it is necessary to understand what effects mixing and down conversion have on them. We present simulations using constructed signals containing pulses, nonlinear wave packets, sinusoids and noise. We show that the pulses and impulsive wave packets, if of sufficient amplitude and of appropriate width, survive the down conversion process, sometimes with the same pulse shape but usually with reduced amplitude, and have time scales consistent with the filter bandwidth at the base frequency. Although we cannot infer the actual time scale of the pulses and impulsive wave packets as originally recorded by the WBD instrument before mixing and down conversion, their presence indicates nonlinear processes occurring at or somewhat near the location of the measurement. Sinusoidal waves are represented in the down conversion time scale as sinusoids of nearly the same amplitude and at frequencies adjusted down by the conversion frequency. The original input waveforms, regardless
Interpretation of Cluster WBD frequency conversion mode data
NASA Astrophysics Data System (ADS)
Pickett, J. S.; Christopher, I. W.; Kirchner, D. L.
2014-02-01
The Cluster wide-band data (WBD) plasma wave receiver mounted on each of the four Cluster spacecraft obtains high time resolution waveform data in the frequency range of ~70 Hz to 577 kHz. In order to make measurements above 77 kHz, it uses frequency conversion to sample the higher frequency waves at one of three different conversion frequencies (~125, 250 and 500 kHz, these frequencies being the possible options for setting the base frequency of the frequency range being sampled) in one of three different filter bandwidths (9.5, 19 and 77 kHz). Within the WBD instrument, a down-conversion technique, built around quadrature mixing, is used to convert these data to baseband (0 kHz) in order to reduce the sample rate for telemetry to the ground. We describe this down-conversion technique and illustrate it through data obtained in space. Because these down-converted data sometimes contain pulses, which can be indicative of nonlinear physical structures (e.g., electron phase-space holes and electron density enhancements and depletions), it is necessary to understand what effects mixing and down conversion have on them. We present simulations using constructed signals containing pulses, nonlinear wave packets, sinusoids and noise. We show that the pulses and impulsive wave packets, if of sufficient amplitude and of appropriate width, survive the down-conversion process, sometimes with the same pulse shape but usually with reduced amplitude, and have timescales consistent with the filter bandwidth at the base frequency. Although we cannot infer the actual timescale of the pulses and impulsive wave packets as originally recorded by the WBD instrument before mixing and down conversion, their presence indicates nonlinear processes occurring at or somewhat near the location of the measurement. Sinusoidal waves are represented in the down-conversion timescale as sinusoids of nearly the same amplitude and at frequencies adjusted down by the conversion frequency. The original
Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data
Hu, Ming; Deng, Ke; Qin, Zhaohui; Liu, Jun S.
2015-01-01
Understanding how chromosomes fold provides insights into the transcription regulation, hence, the functional state of the cell. Using the next generation sequencing technology, the recently developed Hi-C approach enables a global view of spatial chromatin organization in the nucleus, which substantially expands our knowledge about genome organization and function. However, due to multiple layers of biases, noises and uncertainties buried in the protocol of Hi-C experiments, analyzing and interpreting Hi-C data poses great challenges, and requires novel statistical methods to be developed. This article provides an overview of recent Hi-C studies and their impacts on biomedical research, describes major challenges in statistical analysis of Hi-C data, and discusses some perspectives for future research. PMID:26124977
Skylab 2 ground winds data reduction and statistical analysis
NASA Technical Reports Server (NTRS)
1974-01-01
A ground winds test was conducted on the Skylab 2 spacecraft in a subsonic wind tunnel and the results were tape recorded for analysis. The data reduction system used to analyze the tapes for full scale, first and second mode bending moments, or acceleration plots versus dynamic pressure or wind velocity is explained. Portions of the Skylab 2 tape data were analyzed statistically in the form of power spectral densities, autocorrelations, and cross correlations to introduce a concept of using system response decay as a measure of linear system damping.
Statistical Inference for Big Data Problems in Molecular Biophysics
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia; Quinn, Shannon; Agarwal, Pratul K; Chennubhotla, Chakra
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technologi- cal and algorithmic improvements in computation have brought molecular simu- lations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex reaching petabyte scales presently, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mech- anistic basis of cellular homeostasis.
Automatic interpretation of ERTS data for forest management
NASA Technical Reports Server (NTRS)
Kirvida, L.; Johnson, G. R.
1973-01-01
Automatic stratification of forested land from ERTS-1 data provides a valuable tool for resource management. The results are useful for wood product yield estimates, recreation and wild life management, forest inventory and forest condition monitoring. Automatic procedures based on both multi-spectral and spatial features are evaluated. With five classes, training and testing on the same samples, classification accuracy of 74% was achieved using the MSS multispectral features. When adding texture computed from 8 x 8 arrays, classification accuracy of 99% was obtained.
SXLSQA. For the Interpretation of Solvent Extraction Data
Baes, C.F. Jr; Moyer, B.A.
1992-01-01
SXLSQA models solvent extraction systems involving an acidic and/or a neutral reagent in an organic solvent that can extract from an aqueous solution one or two cations in addition to H plus and one or two anions in addition to OH minus. In modelling data, any number of product species can be assumed to form in either phase. Activity coefficients of species in the aqueous phase can be calculated by the Pitzer treatment and in the organic phase by Hildebrand-Scott treatment.
Undergraduate non-science majors' descriptions and interpretations of scientific data visualizations
NASA Astrophysics Data System (ADS)
Swenson, Sandra Signe
Professionally developed and freely accessible through the Internet, scientific data maps have great potential for teaching and learning with data in the science classroom. Solving problems or developing ideas while using data maps of Earth phenomena in the science classroom may help students to understand the nature and process of science. Little is known about how students perceive and interpret scientific data visualizations. This study was an in-depth exploration of descriptions and interpretations of topographic and bathymetric data maps made by a population of 107 non-science majors at an urban public college. Survey, interviews, and artifacts were analyzed within an epistemological framework for understanding data collected about the Earth, by examining representational strategies used to understand maps, and by examining student interpretations using Bloom's Taxonomy of Educational Objectives. The findings suggest that the majority of students interpret data maps by assuming iconicity that was not intended by the maps creator; that students do not appear to have a robust understanding of how data is collected about Earth phenomena; and that while most students are able to make some kinds of interpretations of the data maps, often their interpretations are not based upon the actual data the map is representing. This study provided baseline information of student understanding of data maps from which educators may design curriculum for teaching and learning about Earth phenomena.
Biosimilar insulins: guidance for data interpretation by clinicians and users.
Heinemann, L; Home, P D; Hompesch, M
2015-10-01
Biosimilar insulins are approved copies of insulins outside patent protection. Advantages may include greater market competition and potential cost reduction, but clinicians and users lack a clear perspective on 'biosimilarity' for insulins. The manufacturing processes for biosimilar insulins are manufacturer-specific and, although these are reviewed by regulators there are few public data available to allow independent assessment or review of issues such as intrinsic quality or batch-to-batch variation. Preclinical measures used to assess biosimilarity, such as tissue and cellular studies of metabolic activity, physico-chemical stability and animal studies of pharmacodynamics, pharmacokinetics and immunogenicity may be insufficiently sensitive to differences, and are often not formally published. Pharmacokinetic and pharmacodynamic studies (glucose clamps) with humans, although core assessments, have problems of precision which are relevant for accurate insulin dosing. Studies that assess clinical efficacy and safety and device compatibility are limited by current outcome measures, such as glycated haemoblobin levels and hypoglycaemia, which are insensitive to differences between insulins. To address these issues, we suggest that all comparative data are put in the public domain, and that systematic clinical studies are performed to address batch-to-batch variability, delivery devices, interchangeability in practice and long-term efficacy and safety. Despite these challenges biosimilar insulins are a welcome addition to diabetes therapy and, with a transparent approach, should provide useful benefit to insulin users. PMID:25974131
Paleomagnetic data from Alaska: reliability, interpretation and terrane trajectories
NASA Astrophysics Data System (ADS)
Harbert, William
1990-11-01
Virtually the entire body of paleomagnetic data collected from southern Alaska depicts a clear decrease in paleolatitude with increasing age, strongly suggesting that southern Alaska represents a displaced terrane. In this paper, paleomagnetic studies from southern Alaska have been classified with respect to a Quality Index that is based on four criteria. These criteria are the presence of both polarities of magnetic remanence, stepwise thermal or alternating field demagnetization of specimens, principal component analysis of demagnetization data, and a successful fold test. Of the 51 studies compiled, only four from southern Alaska and one from western Canada are demonstrated to satisfy all criteria and fall therefore in the category of Group 3, ("highly reliable"). Two studies from southern Alaska satisfy three of the four criteria, lacking both polarities of characteristic remanence, and are judged to be of Group 2 ("reliable"). Two of these paleomagnetic studies constrain the accretion time of the southern Alaska terrane to the relatively stationary region of central Alaska north of the Denali fault. Four paleomagnetic studies from the southern Alaska terrane show a distinct paleolatitude anomaly when compared with their expected paleolatitudes from the North American apparent polar wander path. Using the model of Engebretson et al. (1984), a series of models are presented to best fit these highly reliable and reliable paleomagnetic studies. The model preferred in this article assumes an accretion time with North America of 50 Ma, and documents pre-50 Ma displacement of the southern Alaska terrane on the Kula plate. If the Ghost Rocks paleomagnetic magnetizations (Plumley et al., 1983) are assumed to be of earliest Tertiary age, this model fits all of the low paleolatitudes observed in southern Alaska. Models incorporating coastwise translation of the southern Alaska terrane along the western boundary of the North America plate and a 50 Ma suturing age of this
Spatial Statistical Procedures to Validate Input Data in Energy Models
Lawrence Livermore National Laboratory
2006-01-27
Energy modeling and analysis often relies on data collected for other purposes such as census counts, atmospheric and air quality observations, economic trends, and other primarily non-energy-related uses. Systematic collection of empirical data solely for regional, national, and global energy modeling has not been established as in the above-mentioned fields. Empirical and modeled data relevant to energy modeling is reported and available at various spatial and temporal scales that might or might not be those needed and used by the energy modeling community. The incorrect representation of spatial and temporal components of these data sets can result in energy models producing misleading conclusions, especially in cases of newly evolving technologies with spatial and temporal operating characteristics different from the dominant fossil and nuclear technologies that powered the energy economy over the last two hundred years. Increased private and government research and development and public interest in alternative technologies that have a benign effect on the climate and the environment have spurred interest in wind, solar, hydrogen, and other alternative energy sources and energy carriers. Many of these technologies require much finer spatial and temporal detail to determine optimal engineering designs, resource availability, and market potential. This paper presents exploratory and modeling techniques in spatial statistics that can improve the usefulness of empirical and modeled data sets that do not initially meet the spatial and/or temporal requirements of energy models. In particular, we focus on (1) aggregation and disaggregation of spatial data, (2) predicting missing data, and (3) merging spatial data sets. In addition, we introduce relevant statistical software models commonly used in the field for various sizes and types of data sets.
Spatial Statistical Procedures to Validate Input Data in Energy Models
Johannesson, G.; Stewart, J.; Barr, C.; Brady Sabeff, L.; George, R.; Heimiller, D.; Milbrandt, A.
2006-01-01
Energy modeling and analysis often relies on data collected for other purposes such as census counts, atmospheric and air quality observations, economic trends, and other primarily non-energy related uses. Systematic collection of empirical data solely for regional, national, and global energy modeling has not been established as in the abovementioned fields. Empirical and modeled data relevant to energy modeling is reported and available at various spatial and temporal scales that might or might not be those needed and used by the energy modeling community. The incorrect representation of spatial and temporal components of these data sets can result in energy models producing misleading conclusions, especially in cases of newly evolving technologies with spatial and temporal operating characteristics different from the dominant fossil and nuclear technologies that powered the energy economy over the last two hundred years. Increased private and government research and development and public interest in alternative technologies that have a benign effect on the climate and the environment have spurred interest in wind, solar, hydrogen, and other alternative energy sources and energy carriers. Many of these technologies require much finer spatial and temporal detail to determine optimal engineering designs, resource availability, and market potential. This paper presents exploratory and modeling techniques in spatial statistics that can improve the usefulness of empirical and modeled data sets that do not initially meet the spatial and/or temporal requirements of energy models. In particular, we focus on (1) aggregation and disaggregation of spatial data, (2) predicting missing data, and (3) merging spatial data sets. In addition, we introduce relevant statistical software models commonly used in the field for various sizes and types of data sets.
Statistical methods for handling unwanted variation in metabolomics data
Sysi-Aho, Marko; Jacob, Laurent; Gagnon-Bartsch, Johann A.; Castillo, Sandra; Simpson, Julie A; Speed, Terence P.
2015-01-01
Metabolomics experiments are inevitably subject to a component of unwanted variation, due to factors such as batch effects, long runs of samples, and confounding biological variation. Although the removal of this unwanted variation is a vital step in the analysis of metabolomics data, it is considered a gray area in which there is a recognised need to develop a better understanding of the procedures and statistical methods required to achieve statistically relevant optimal biological outcomes. In this paper, we discuss the causes of unwanted variation in metabolomics experiments, review commonly used metabolomics approaches for handling this unwanted variation, and present a statistical approach for the removal of unwanted variation to obtain normalized metabolomics data. The advantages and performance of the approach relative to several widely-used metabolomics normalization approaches are illustrated through two metabolomics studies, and recommendations are provided for choosing and assessing the most suitable normalization method for a given metabolomics experiment. Software for the approach is made freely available online. PMID:25692814
Statistics of Optical Coherence Tomography Data From Human Retina
de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo
2010-01-01
Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small, but significant correlation between neighbor pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, fit parameters of this model are relatively constant along retinal layers, but varies across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733
Interpretation of Pennsylvania agricultural land use from ERTS-1 data
NASA Technical Reports Server (NTRS)
Mcmurtry, G. J.; Petersen, G. W. (Principal Investigator); Wilson, A. D.
1974-01-01
The author has identified the following significant results. To study the complex agricultural patterns in Pennsylvania, a portion of an ERTS scene was selected for detailed analysis. Various photographic products were made and were found to be only of limited value. This necessitated the digital processing of the ERTS data. Using an unsupervised classification procedure, it was possible to delineate the following categories: (1) forest land with a northern aspect, (2) forest land with a southern aspect, (3) valley trees, (4) wheat, (5) corn, (6) alfalfa, grass, pasture, (7) disturbed land, (8) builtup land, (9) strip mines, and (10) water. These land use categories were delineated at a scale of approximately 1:20,000 on the line printer output. Land use delineations were also made using the General Electric IMAGE 100 interactive analysis system.
Common misconceptions about data analysis and statistics1
Motulsky, Harvey J
2015-01-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: (1) P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. (2) Overemphasis on P values rather than on the actual size of the observed effect. (3) Overuse of statistical hypothesis testing, and being seduced by the word “significant”. (4) Overreliance on standard errors, which are often misunderstood. PMID:25692012
Filaments Data Since 1919: A Basis for Statistics
NASA Astrophysics Data System (ADS)
Aboudarham, J.; Renié, C.
2016-04-01
From 1919 to 2002, Paris-Meudon Observatory published synoptic maps of the Solar activity. Together with maps, tables were provided, containing some information concerning at least filaments. The board of Paris Observatory funded a data capture program concerning the 680 000 basic informations available in those tables. On the other hand, in the frame of the FP7 European project HELIO, a Heliophysics Feature Catalogue (HFC) has been developed, which contains also filaments data from 1996 up to now. We now pool all these data in order to give access to a filaments database for nearly a century of observations. This allows to make statistical studies of those Solar features, and try to correlate them with other information such as sunspot number. We present here the data available for this long period of time.
Statistical comparison of the AGDISP model with deposit data
NASA Astrophysics Data System (ADS)
Duan, Baozhong; Yendol, William G.; Mierzejewski, Karl
An aerial spray Agricultural Dispersal (AGDISP) model was tested against quantitative field data. The microbial pesticide Bacillus thuringiensis (Bt) was sprayed as fine spray from a helicopted over a flat site in various meteorological conditions. Droplet deposition on evenly spaced Kromekote cards, 0.15 m above the ground, was measured with image analysis equipment. Six complete data sets out of the 12 trials were selected for data comparison. A set of statistical parameters suggested by the American Meteorological Society and other authors was applied for comparisons of the model prediction with the ground deposit data. The results indicated that AGDISP tended to overpredict the average volume deposition by a factor of two. The sensitivity test of the AGDISP model to the input wind direction showed that the model may not be sensitive to variations in wind direction within 10 degrees relative to aircraft flight path.
Statistical mechanics of complex neural systems and high dimensional data
NASA Astrophysics Data System (ADS)
Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya
2013-03-01
Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.
How to Measure and Interpret Quality Improvement Data.
McQuillan, Rory Francis; Silver, Samuel Adam; Harel, Ziv; Weizman, Adam; Thomas, Alison; Bell, Chaim; Chertow, Glenn M; Chan, Christopher T; Nesrallah, Gihad
2016-05-01
This article will demonstrate how to conduct a quality improvement project using the change idea generated in "How To Use Quality Improvement Tools in Clinical Practice: How To Diagnose Solutions to a Quality of Care Problem" by Dr. Ziv Harel and colleagues in this Moving Points feature. This change idea involves the introduction of a nurse educator into a CKD clinic with a goal of increasing rates of patients performing dialysis independently at home (home hemodialysis or peritoneal dialysis). Using this example, we will illustrate a Plan-Do-Study-Act (PDSA) cycle in action and highlight the principles of rapid cycle change methodology. We will then discuss the selection of outcome, process, and balancing measures, and the practicalities of collecting these data in the clinic environment. We will also introduce the PDSA worksheet as a practical way to oversee the progress of a quality improvement project. Finally, we will demonstrate how run charts are used to visually illustrate improvement in real time, and how this information can be used to validate achievement, respond appropriately to challenges the project may encounter, and prove the significance of results. This article aims to provide readers with a clear and practical framework upon which to trial their own ideas for quality improvement in the clinical setting. PMID:27016496
Rifting of Continental Interiors: Some New Geophysical Data and Interpretations
NASA Astrophysics Data System (ADS)
Keller, G. R.
2005-12-01
Rifting is one of the major processes that affect the evolution of the continents. This process sometimes leads to continental breakup and the formation of new oceans, but more often does not. This is presumably due to extension not progressing sufficiently to form a new plate margin resulting in a structure, which remains isolated in an intra-plate environment. The Southern Oklahoma aulacogen is such a feature, and the continental portion of the East African rift system may be a modern example. As more detailed geophysical and geological studies of rifts have become available in recent years, a complex picture of rift structure and evolution has emerged. Global patterns that reveal the connections between lithospheric structure (deep and shallow), magmatism (amount and style), amount of extension, uplift, and older structures remain elusive. However, our geophysical studies of modern and paleo rifts in North America, East Africa, and Europe makes it possible to make some general observations: 1). Magmatism in rifts is modest without the presence of a (pre-existing?) thermal anomaly in the mantle. 2). Magmatic modification of the crust takes many forms which probably depend on the nature of older structures present and the state of the lithosphere when rifting is initiated (i.e. cold vs. hot; fertility), 3) There is no clear relation between amount of extension and the amount of magmatic modification of the crust. 4) Brittle deformation in the upper crustal is complex, often asymmetrical and older features often play important roles in focusing deformation. However on a lithospheric scale, rift structure is usually symmetrical. 5) A better understanding of rift processes is emerging as we achieve higher levels of integration of a wide variety of geoscience data.
NASA Astrophysics Data System (ADS)
Lacelle, Serge; Hwang, Son-Jong; Gerstein, Bernard C.
1993-12-01
The conventional method of data analysis and interpretation of time-resolved multiple quantum (MQ) nuclear magnetic resonance (NMR) spectra of solids is closely examined. Intensity profiles of experimental 1H MQ NMR spectra of polycrystalline adamantane and hexamethylbenzene serve to test the Gaussian statistical model approach. Consequences of this model are explored with a least-squares fitting procedure, transformation of data to yield linear plots, and a scaling analysis. Non-Gaussian behavior of the MQ NMR spectral intensity profiles, as a function of order of coherences, is demonstrated with all these methods of analysis. A heuristic argument, based on the multiplicative properties of dipolar coupling constants in the equation of motion of the density operator, leads to the prediction of exponentially decaying MQ NMR spectral intensity profiles. Scaling analysis and semilog plots of experimental time-resolved MQ NMR spectra of adamantane and hexamethylbenzene support this deduction. Dynamical scale invariance in the growth process of multiple spin coherences is revealed with this new approach. The validity of spin counting in solids with MQ NMR is discussed in light of the present results.
Lacelle, S. ); Hwang, S. ); Gerstein, B.C. )
1993-12-01
The conventional method of data analysis and interpretation of time-resolved multiple quantum (MQ) nuclear magnetic resonance (NMR) spectra of solids is closely examined. Intensity profiles of experimental [sup 1]H MQ NMR spectra of polycrystalline adamantane and hexamethylbenzene serve to test the Gaussian statistical model approach. Consequences of this model are explored with a least-squares fitting procedure, transformation of data to yield linear plots, and a scaling analysis. Non-Gaussian behavior of the MQ NMR spectral intensity profiles, as a function of order of coherences, is demonstrated with all these methods of analysis. A heuristic argument, based on the multiplicative properties of dipolar coupling constants in the equation of motion of the density operator, leads to the prediction of exponentially decaying MQ NMR spectral intensity profiles. Scaling analysis and semilog plots of experimental time-resolved MQ NMR spectra of adamantane and hexamethylbenzene support this deduction. Dynamical scale invariance in the growth process of multiple spin coherences is revealed with this new approach. The validity of spin counting in solids with MQ NMR is discussed in light of the present results.
Searching the Heavens: Astronomy, Computation, Statistics, Data Mining and Philosophy
NASA Astrophysics Data System (ADS)
Glymour, Clark
2012-03-01
Our first and purest science, the mother of scientific methods, sustained by sheer curiosity, searching the heavens we cannot manipulate. From the beginning, astronomy has combined mathematical idealization, technological ingenuity, and indefatigable data collection with procedures to search through assembled data for the processes that govern the cosmos. Astronomers are, and ever have been, data miners, and for that reason astronomical methods (but not astronomical discoveries) have often been despised by statisticians and philosophers. Epithets laced the statistical literature: Ransacking! Data dredging! Double Counting! Statistical disdain was usually directed at social scientists and biologists, rarely if ever at astronomers, but the methodological attitudes and goals that many twentieth-century philosophers and statisticians rejected were creations of the astronomical tradition. The philosophical criticisms were earlier and more direct. In the shadow (or in Alexander Popeâs phrasing, the light) cast on nature in the eighteenth century by the Newtonian triumph, David Hume revived arguments from the ancient Greeks to challenge the very possibility of coming to know what causes what. His conclusion was endorsed in the twentieth century by many philosophers who found talk of causation unnecessary or unacceptably metaphysical, and absorbed by many statisticians as a general suspicion of causal claims, except possibly when they are founded on experimental manipulation. And yet in the hands of a mathematician, Thomas Bayes, and another mathematician and philosopher, Richard Price, Humeâs essays prompted the development of a new kind of statistics, the kind we now call "Bayesian." The computer and new data acquisition methods have begun to dissolve the antipathy between astronomy, philosophy, and statistics. But the resolution is practical, without much reflection on the arguments or the course of events. So, I offer a largely unoriginal history
Summary Statistics for Homemade ?Play Dough? -- Data Acquired at LLNL
Kallman, J S; Morales, K E; Whipple, R E; Huber, R D; Martz, A; Brown, W D; Smith, J A; Schneberk, D J; Martz, Jr., H E; White, III, W T
2010-03-11
Using x-ray computerized tomography (CT), we have characterized the x-ray linear attenuation coefficients (LAC) of a homemade Play Dough{trademark}-like material, designated as PDA. Table 1 gives the first-order statistics for each of four CT measurements, estimated with a Gaussian kernel density estimator (KDE) analysis. The mean values of the LAC range from a high of about 2700 LMHU{sub D} 100kVp to a low of about 1200 LMHUD at 300kVp. The standard deviation of each measurement is around 10% to 15% of the mean. The entropy covers the range from 6.0 to 7.4. Ordinarily, we would model the LAC of the material and compare the modeled values to the measured values. In this case, however, we did not have the detailed chemical composition of the material and therefore did not model the LAC. Using a method recently proposed by Lawrence Livermore National Laboratory (LLNL), we estimate the value of the effective atomic number, Z{sub eff}, to be near 10. LLNL prepared about 50mL of the homemade 'Play Dough' in a polypropylene vial and firmly compressed it immediately prior to the x-ray measurements. We used the computer program IMGREC to reconstruct the CT images. The values of the key parameters used in the data capture and image reconstruction are given in this report. Additional details may be found in the experimental SOP and a separate document. To characterize the statistical distribution of LAC values in each CT image, we first isolated an 80% central-core segment of volume elements ('voxels') lying completely within the specimen, away from the walls of the polypropylene vial. All of the voxels within this central core, including those comprised of voids and inclusions, are included in the statistics. We then calculated the mean value, standard deviation and entropy for (a) the four image segments and for (b) their digital gradient images. (A digital gradient image of a given image was obtained by taking the absolute value of the difference between the initial image
Summary Statistics for Fun Dough Data Acquired at LLNL
Kallman, J S; Morales, K E; Whipple, R E; Huber, R D; Brown, W D; Smith, J A; Schneberk, D J; Martz, Jr., H E; White, III, W T
2010-03-11
Using x-ray computerized tomography (CT), we have characterized the x-ray linear attenuation coefficients (LAC) of a Play Dough{trademark}-like product, Fun Dough{trademark}, designated as PD. Table 1 gives the first-order statistics for each of four CT measurements, estimated with a Gaussian kernel density estimator (KDE) analysis. The mean values of the LAC range from a high of about 2100 LMHU{sub D} at 100kVp to a low of about 1100 LMHU{sub D} at 300kVp. The standard deviation of each measurement is around 1% of the mean. The entropy covers the range from 3.9 to 4.6. Ordinarily, we would model the LAC of the material and compare the modeled values to the measured values. In this case, however, we did not have the composition of the material and therefore did not model the LAC. Using a method recently proposed by Lawrence Livermore National Laboratory (LLNL), we estimate the value of the effective atomic number, Z{sub eff}, to be near 8.5. LLNL prepared about 50mL of the Fun Dough{trademark} in a polypropylene vial and firmly compressed it immediately prior to the x-ray measurements. Still, layers can plainly be seen in the reconstructed images, indicating that the bulk density of the material in the container is affected by voids and bubbles. We used the computer program IMGREC to reconstruct the CT images. The values of the key parameters used in the data capture and image reconstruction are given in this report. Additional details may be found in the experimental SOP and a separate document. To characterize the statistical distribution of LAC values in each CT image, we first isolated an 80% central-core segment of volume elements ('voxels') lying completely within the specimen, away from the walls of the polypropylene vial. All of the voxels within this central core, including those comprised of voids and inclusions, are included in the statistics. We then calculated the mean value, standard deviation and entropy for (a) the four image segments and for (b
Statistical significance of climate sensitivity predictors obtained by data mining
NASA Astrophysics Data System (ADS)
Caldwell, Peter M.; Bretherton, Christopher S.; Zelinka, Mark D.; Klein, Stephen A.; Santer, Benjamin D.; Sanderson, Benjamin M.
2014-03-01
Several recent efforts to estimate Earth's equilibrium climate sensitivity (ECS) focus on identifying quantities in the current climate which are skillful predictors of ECS yet can be constrained by observations. This study automates the search for observable predictors using data from phase 5 of the Coupled Model Intercomparison Project. The primary focus of this paper is assessing statistical significance of the resulting predictive relationships. Failure to account for dependence between models, variables, locations, and seasons is shown to yield misleading results. A new technique for testing the field significance of data-mined correlations which avoids these problems is presented. Using this new approach, all 41,741 relationships we tested were found to be explainable by chance. This leads us to conclude that data mining is best used to identify potential relationships which are then validated or discarded using physically based hypothesis testing.
Inferential Statistics from Black Hispanic Breast Cancer Survival Data
Khan, Hafiz M. R.; Saxena, Anshul; Ross, Elizabeth
2014-01-01
In this paper we test the statistical probability models for breast cancer survival data for race and ethnicity. Data was collected from breast cancer patients diagnosed in United States during the years 1973–2009. We selected a stratified random sample of Black Hispanic female patients from the Surveillance Epidemiology and End Results (SEER) database to derive the statistical probability models. We used three common model building criteria which include Akaike Information Criteria (AIC), Bayesian Information Criteria (BIC), and Deviance Information Criteria (DIC) to measure the goodness of fit tests and it was found that Black Hispanic female patients survival data better fit the exponentiated exponential probability model. A novel Bayesian method was used to derive the posterior density function for the model parameters as well as to derive the predictive inference for future response. We specifically focused on Black Hispanic race. Markov Chain Monte Carlo (MCMC) method was used for obtaining the summary results of posterior parameters. Additionally, we reported predictive intervals for future survival times. These findings would be of great significance in treatment planning and healthcare resource allocation. PMID:24678273
Data collection, computation and statistical analysis in psychophysiological experiments.
Buzzi, R; Wespi, J; Zwimpfer, J
1982-01-01
The system was designed to allow simultaneous monitoring of eight bioelectrical signals together with the necessary event markers. The data inputs are pulse code modulated, recorded on magnetic tape, and then read into a minicomputer. The computer permits the determination of parameters for the following signals: electrocardiogram (ECG), respiration (RESP), skin conductance changes (SCC), electromyogram (EMG), plethysmogram (PLET), pulse transmission time (PTT), and electroencephalogram (EEG). These parameters are determined for time blocks of selectable duration and read into a mainframe computer for further statistical analysis. PMID:7183101
Predictive data modeling of human type II diabetes related statistics
NASA Astrophysics Data System (ADS)
Jaenisch, Kristina L.; Jaenisch, Holger M.; Handley, James W.; Albritton, Nathaniel G.
2009-04-01
During the course of routine Type II treatment of one of the authors, it was decided to derive predictive analytical Data Models of the daily sampled vital statistics: namely weight, blood pressure, and blood sugar, to determine if the covariance among the observed variables could yield a descriptive equation based model, or better still, a predictive analytical model that could forecast the expected future trend of the variables and possibly eliminate the number of finger stickings required to montior blood sugar levels. The personal history and analysis with resulting models are presented.
Statistical mechanics of lossy data compression using a nonmonotonic perceptron
NASA Astrophysics Data System (ADS)
Hosaka, Tadaaki; Kabashima, Yoshiyuki; Nishimori, Hidetoshi
2002-12-01
The performance of a lossy data compression scheme for uniformly biased Boolean messages is investigated via methods of statistical mechanics. Inspired by a formal similarity to the storage capacity problem in neural network research, we utilize a perceptron of which the transfer function is appropriately designed in order to compress and decode the messages. Employing the replica method, we analytically show that our scheme can achieve the optimal performance known in the framework of lossy compression in most cases when the code length becomes infinite. The validity of the obtained results is numerically confirmed.
Statistical assessment of model fit for synthetic aperture radar data
NASA Astrophysics Data System (ADS)
DeVore, Michael D.; O'Sullivan, Joseph A.
2001-08-01
Parametric approaches to problems of inference from observed data often rely on assumed probabilistic models for the data which may be based on knowledge of the physics of the data acquisition. Given a rich enough collection of sample data, the validity of those assumed models can be assessed in a statistical hypothesis testing framework using any of a number of goodness-of-fit tests developed over the last hundred years for this purpose. Such assessments can be used both to compare alternate models for observed data and to help determine the conditions under which a given model breaks down. We apply three such methods, the (chi) 2 test of Karl Pearson, Kolmogorov's goodness-of-fit test, and the D'Agostino-Pearson test for normality, to quantify how well the data fit various models for synthetic aperture radar (SAR) images. The results of these tests are used to compare a conditionally Gaussian model for complex-valued SAR pixel values, a conditionally log-normal model for SAR pixel magnitudes, and a conditionally normal model for SAR pixel quarter-power values. Sample data for these tests are drawn from the publicly released MSTAR dataset.
StegoWall: blind statistical detection of hidden data
NASA Astrophysics Data System (ADS)
Voloshynovskiy, Sviatoslav V.; Herrigel, Alexander; Rytsar, Yuri B.; Pun, Thierry
2002-04-01
Novel functional possibilities, provided by recent data hiding technologies, carry out the danger of uncontrolled (unauthorized) and unlimited information exchange that might be used by people with unfriendly interests. The multimedia industry as well as the research community recognize the urgent necessity for network security and copyright protection, or rather the lack of adequate law for digital multimedia protection. This paper advocates the need for detecting hidden data in digital and analog media as well as in electronic transmissions, and for attempting to identify the underlying hidden data. Solving this problem calls for the development of an architecture for blind stochastic hidden data detection in order to prevent unauthorized data exchange. The proposed architecture is called StegoWall; its key aspects are the solid investigation, the deep understanding, and the prediction of possible tendencies in the development of advanced data hiding technologies. The basic idea of our complex approach is to exploit all information about hidden data statistics to perform its detection based on a stochastic framework. The StegoWall system will be used for four main applications: robust watermarking, secret communications, integrity control and tamper proofing, and internet/network security.
Interpreting Low Spatial Resolution Thermal Data from Active Volcanoes on Io and the Earth
NASA Technical Reports Server (NTRS)
Keszthelyi, L.; Harris, A. J. L.; Flynn, L.; Davies, A. G.; McEwen, A.
2001-01-01
The style of volcanism was successfully determined at a number of active volcanoes on Io and the Earth using the same techniques to interpret thermal remote sensing data. Additional information is contained in the original extended abstract.
Interpretation of complex glacial geology from AEM data using a knowledge-driven cognitive approach
NASA Astrophysics Data System (ADS)
Jørgensen, Flemming; Sandersen, Peter B. E.
2014-05-01
Existing borehole data are seldom sufficient for detailed subsurface geological interpretation and 3D modelling due to geological complexity. If geology is not too complex and the amount of borehole data is high, experienced geologists may be able to construct coarse models based on boreholes and by using their geological expert knowledge. But very often supplementary data is needed, and this is one of the reasons for the growing use of geophysical methods like Airborne ElectroMagnetic methods (AEM). New developments in AEM technology offer new opportunities for spatially dense subsurface mapping. The new AEM data enable high-quality mapping of detailed geology, providing new and improved geological knowledge and understanding of surveyed areas. When AEM data is geologically interpreted, it is the measured electrical resistivity that is being used. The translation of resistivity into geology/lithology is a complicated task, but without this translation, lithological properties and the structural composition of the subsurface cannot be properly assessed. The translation can only be successfully done if a series of limiting issues about the methodology are carefully considered and implemented in the interpretation. An automated conversion/interpretation routine is therefore difficult to establish. In order to end up with the best interpretation that makes full use of the collected data and at the same time improves the geological understanding of the area, we recommend knowledge-driven cognitive interpretation approaches. Cognitive interpretation ensures a high degree of incorporated geological background knowledge such as the understanding of sedimentary processes, structural geology or sequence stratigraphy, but also that the limitations of the method mentioned above are taken into account. We will present cases where AEM data combined with seismic and borehole data have been successfully interpreted, and we will show how they have brought new insight into local
Lindsey, David A.
2001-01-01
Pebble count data from Quaternary gravel deposits north of Denver, Colo., were analyzed by multivariate statistical methods to identify lithologic factors that might affect aggregate quality. The pebble count data used in this analysis were taken from the map by Colton and Fitch (1974) and are supplemented by data reported by the Front Range Infrastructure Resources Project. This report provides data tables and results of the statistical analysis. The multivariate statistical analysis used here consists of log-contrast principal components analysis (method of Reyment and Savazzi, 1999) followed by rotation of principal components and factor interpretation. Three lithologic factors that might affect aggregate quality were identified: 1) granite and gneiss versus pegmatite, 2) quartz + quartzite versus total volcanic rocks, and 3) total sedimentary rocks (mainly sandstone) versus granite. Factor 1 (grain size of igneous and metamorphic rocks) may represent destruction during weathering and transport or varying proportions of rocks in source areas. Factor 2 (resistant source rocks) represents the dispersion shadow of metaquartzite detritus, perhaps enhanced by resistance of quartz and quartzite during weathering and transport. Factor 3 (proximity to sandstone source) represents dilution of gravel by soft sedimentary rocks (mainly sandstone), which are exposed mainly in hogbacks near the mountain front. Factor 1 probably does not affect aggregate quality. Factor 2 would be expected to enhance aggregate quality as measured by the Los Angeles degradation test. Factor 3 may diminish aggregate quality.
Right-sizing statistical models for longitudinal data.
Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M
2015-12-01
Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. PMID:26237507
Models to interpret bed-form geometries from cross-bed data
Luthi, S.M. ); Banavar, J.R. ); Bayer, U. )
1990-05-01
To improve the understanding of the relation of cross-bed azimuth distributions to bed-forms, geometric models were developed for migrating bed forms using a minimum number of parameters. Semielliptical and sinusoidal bed-form crestlines were modeled with curvature and sinuosity as parameters. Both bedform crestlines are propagated at various angles of migration over a finite area of deposition. Two computational approaches are used, a statistical random sampling (Monte Carlo) technique over the area of the deposit, and an analytical method based on topology and differential geometry. The resulting foreset azimuth distributions provide a catalog for a variety of simulations. The resulting thickness distributions have a simple shape and can be combined with the azimuth distributions to further constrain the cross-strata geometry. Paleocurrent directions obtained by these models can differ substantially from other methods, especially for obliquely migrating low-curvature bed forms. Interpretation of foreset azimuth data from outcrops and wells can be done either by visual comparison with the cataloged distributions, or by iterative computational fits. Studied examples include eolian cross-strata from the Permian Rotliegendes in the North Sea, fluvial dunes from the Devonian in the Catskills (New York state), the Triassic Schilfsandstein (Federal Republic of Germany), and the Paleozoic-Jurassic of the Western Desert (Egypt), as well as recent tidal dunes from the German coast of the North Sea and tidal cross-strata from the Devonian Koblentzquartzit (Federal Republic of Germany). In all cases the semi-elliptical bed-form model gave a good fit to the data, suggesting that it may be applicable over a wide range of bed forms. The data from the Western Desert could be explained only by data scatter due to channel sinuosity combined with the scatter attributed to the ellipticity of the bed-form crestlines.
Securing co-operation from persons supplying statistical data
Aubenque, M. J.; Blaikley, R. M.; Harris, F. Fraser; Lal, R. B.; Neurdenburg, M. G.; Hernández, R. de Shelly
1954-01-01
Securing the co-operation of persons supplying information required for medical statistics is essentially a problem in human relations, and an understanding of the motivations, attitudes, and behaviour of the respondents is necessary. Before any new statistical survey is undertaken, it is suggested by Aubenque and Harris that a preliminary review be made so that the maximum use is made of existing information. Care should also be taken not to burden respondents with an overloaded questionnaire. Aubenque and Harris recommend simplified reporting. Complete population coverage is not necessary. Neurdenburg suggests that the co-operation and support of such organizations as medical associations and social security boards are important and that propaganda should be directed specifically to the groups whose co-operation is sought. Informal personal contacts are valuable and desirable, according to Blaikley, but may have adverse effects if the right kind of approach is not made. Financial payments as an incentive in securing co-operation are opposed by Neurdenburg, who proposes that only postage-free envelopes or similar small favours be granted. Blaikley and Harris, on the other hand, express the view that financial incentives may do much to gain the support of those required to furnish data; there are, however, other incentives, and full use should be made of the natural inclinations of respondents. Compulsion may be necessary in certain instances, but administrative rather than statutory measures should be adopted. Penalties, according to Aubenque, should be inflicted only when justified by imperative health requirements. The results of surveys should be made available as soon as possible to those who co-operated, and Aubenque and Harris point out that they should also be of practical value to the suppliers of the information. Greater co-operation can be secured from medical persons who have an understanding of the statistical principles involved; Aubenque and
Estimation of descriptive statistics for multiply censored water quality data
Helsel, D.R.; Cohn, T.A.
1988-01-01
This paper extends the work of Gilliom and Helsel on procedures for estimating descriptive statistics of water quality data than contain "less than' observations. Previously, procedures were evaluated when only one detection limit was present. Here the performance of estimators for data that have multiple detection limits is investigated. Probability plotting and maximum likelihood methods perform substantially better than simple substitution procedures now commonly in use. Therefore simple substitution procedures eg substitution of the detection limit should be avoided. Probability plotting methods are more robust than maximum likelihood methods to misspecification of the parent distribution and their use should be encouraged in the typical situation where the parent distribution is unknown. When utilized correctly, less than values frequently contain nearly as much information for estimating population moments and quantiles as would the same observations had the detection limit been below them. -Authors
Accidents in Malaysian construction industry: statistical data and court cases.
Chong, Heap Yih; Low, Thuan Siang
2014-01-01
Safety and health issues remain critical to the construction industry due to its working environment and the complexity of working practises. This research attempts to adopt 2 research approaches using statistical data and court cases to address and identify the causes and behavior underlying construction safety and health issues in Malaysia. Factual data on the period of 2000-2009 were retrieved to identify the causes and agents that contributed to health issues. Moreover, court cases were tabulated and analyzed to identify legal patterns of parties involved in construction site accidents. Approaches of this research produced consistent results and highlighted a significant reduction in the rate of accidents per construction project in Malaysia. PMID:25189753
Multivariate statistical analysis of atom probe tomography data
Parish, Chad M; Miller, Michael K
2010-01-01
The application of spectrum imaging multivariate statistical analysis methods, specifically principal component analysis (PCA), to atom probe tomography (APT) data has been investigated. The mathematical method of analysis is described and the results for two example datasets are analyzed and presented. The first dataset is from the analysis of a PM 2000 Fe-Cr-Al-Ti steel containing two different ultrafine precipitate populations. PCA properly describes the matrix and precipitate phases in a simple and intuitive manner. A second APT example is from the analysis of an irradiated reactor pressure vessel steel. Fine, nm-scale Cu-enriched precipitates having a core-shell structure were identified and qualitatively described by PCA. Advantages, disadvantages, and future prospects for implementing these data analysis methodologies for APT datasets, particularly with regard to quantitative analysis, are also discussed.
Multivariate statistical analysis of atom probe tomography data.
Parish, Chad M; Miller, Michael K
2010-10-01
The application of spectrum imaging multivariate statistical analysis methods, specifically principal component analysis (PCA), to atom probe tomography (APT) data has been investigated. The mathematical method of analysis is described and the results for two example datasets are analyzed and presented. The first dataset is from the analysis of a PM 2000 Fe-Cr-Al-Ti steel containing two different ultrafine precipitate populations. PCA properly describes the matrix and precipitate phases in a simple and intuitive manner. A second APT example is from the analysis of an irradiated reactor pressure vessel steel. Fine, nm-scale Cu-enriched precipitates having a core-shell structure were identified and qualitatively described by PCA. Advantages, disadvantages, and future prospects for implementing these data analysis methodologies for APT datasets, particularly with regard to quantitative analysis, are also discussed. PMID:20650566
Statistical analysis of test data for APM rod issue
Edwards, T.B.; Harris, S.P.; Reeve, C.P.
1992-05-01
The uncertainty associated with the use of the K-Reactor axial power monitors (APMs) to measure roof-top-ratios is investigated in this report. Internal heating test data acquired under both DC-flow conditions and AC-flow conditions have been analyzed. These tests were conducted to simulate gamma heating at the lower power levels planned for reactor operation. The objective of this statistical analysis is to investigate the relationship between the observed and true roof-top-ratio (RTR) values and associated uncertainties at power levels within this lower operational range. Conditional on a given, known power level, a prediction interval for the true RTR value corresponding to a new, observed RTR is given. This is done for a range of power levels. Estimates of total system uncertainty are also determined by combining the analog-to-digital converter uncertainty with the results from the test data.
Advances and Pitfalls in the Analysis and Interpretation of Resting-State FMRI Data
Cole, David M.; Smith, Stephen M.; Beckmann, Christian F.
2010-01-01
The last 15 years have witnessed a steady increase in the number of resting-state functional neuroimaging studies. The connectivity patterns of multiple functional, distributed, large-scale networks of brain dynamics have been recognised for their potential as useful tools in the domain of systems and other neurosciences. The application of functional connectivity methods to areas such as cognitive psychology, clinical diagnosis and treatment progression has yielded promising preliminary results, but is yet to be fully realised. This is due, in part, to an array of methodological and interpretative issues that remain to be resolved. We here present a review of the methods most commonly applied in this rapidly advancing field, such as seed-based correlation analysis and independent component analysis, along with examples of their use at the individual subject and group analysis levels and a discussion of practical and theoretical issues arising from this data ‘explosion’. We describe the similarities and differences across these varied statistical approaches to processing resting-state functional magnetic resonance imaging signals, and conclude that further technical optimisation and experimental refinement is required in order to fully delineate and characterise the gross complexity of the human neural functional architecture. PMID:20407579
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
Teschendorff, Andrew E; Sollich, Peter; Kuehn, Reimer
2014-06-01
A key challenge in systems biology is the elucidation of the underlying principles, or fundamental laws, which determine the cellular phenotype. Understanding how these fundamental principles are altered in diseases like cancer is important for translating basic scientific knowledge into clinical advances. While significant progress is being made, with the identification of novel drug targets and treatments by means of systems biological methods, our fundamental systems level understanding of why certain treatments succeed and others fail is still lacking. We here advocate a novel methodological framework for systems analysis and interpretation of molecular omic data, which is based on statistical mechanical principles. Specifically, we propose the notion of cellular signalling entropy (or uncertainty), as a novel means of analysing and interpreting omic data, and more fundamentally, as a means of elucidating systems-level principles underlying basic biology and disease. We describe the power of signalling entropy to discriminate cells according to differentiation potential and cancer status. We further argue the case for an empirical cellular entropy-robustness correlation theorem and demonstrate its existence in cancer cell line drug sensitivity data. Specifically, we find that high signalling entropy correlates with drug resistance and further describe how entropy could be used to identify the achilles heels of cancer cells. In summary, signalling entropy is a deep and powerful concept, based on rigorous statistical mechanical principles, which, with improved data quality and coverage, will allow a much deeper understanding of the systems biological principles underlying normal and disease physiology. PMID:24675401
Rigby, Sean P; Edler, Karen J
2002-06-01
The use of a semi-empirical alternative to the standard Washburn equation for the interpretation of raw mercury porosimetry data has been advocated. The alternative expression takes account of variations in both mercury contact angle and surface tension with pore size, for both advancing and retreating mercury meniscii. The semi-empirical equation presented was ultimately derived from electron microscopy data, obtained for controlled pore glasses by previous workers. It has been found that this equation is also suitable for the interpretation of raw data for sol-gel silica spheres. Interpretation of mercury porosimetry data using the alternative to the standard Washburn equation was found to give rise to pore sizes similar to those obtained from corresponding SAXS data. The interpretation of porosimetry data, for both whole and finely powdered silica spheres, using the alternative expression has demonstrated that the hysteresis and mercury entrapment observed for whole samples does not occur for fragmented samples. Therefore, for these materials, the structural hysteresis and overall level of mercury entrapment is caused by the macroscopic (> approximately 30 microm), and not the microscopic (< approximately 30 microm), properties of the porous medium. This finding suggested that mercury porosimetry may be used to obtain a statistical characterization of sample macroscopic structure similar to that obtained using MRI. In addition, from a comparison of the pore size distribution from porosimetry with that obtained using complementary nitrogen sorption data, it was found that, even in the absence of hysteresis and mercury entrapment, pore shielding effects were still present. This observation suggested that the mercury extrusion process does not occur by a piston-type retraction mechanism and, therefore, the usual method for the application of percolation concepts to mercury retraction is flawed. PMID:16290649
Generation of dense statistical connectomes from sparse morphological data
Egger, Robert; Dercksen, Vincent J.; Udvary, Daniel; Hege, Hans-Christian; Oberlaender, Marcel
2014-01-01
Sensory-evoked signal flow, at cellular and network levels, is primarily determined by the synaptic wiring of the underlying neuronal circuitry. Measurements of synaptic innervation, connection probabilities and subcellular organization of synaptic inputs are thus among the most active fields of research in contemporary neuroscience. Methods to measure these quantities range from electrophysiological recordings over reconstructions of dendrite-axon overlap at light-microscopic levels to dense circuit reconstructions of small volumes at electron-microscopic resolution. However, quantitative and complete measurements at subcellular resolution and mesoscopic scales to obtain all local and long-range synaptic in/outputs for any neuron within an entire brain region are beyond present methodological limits. Here, we present a novel concept, implemented within an interactive software environment called NeuroNet, which allows (i) integration of sparsely sampled (sub)cellular morphological data into an accurate anatomical reference frame of the brain region(s) of interest, (ii) up-scaling to generate an average dense model of the neuronal circuitry within the respective brain region(s) and (iii) statistical measurements of synaptic innervation between all neurons within the model. We illustrate our approach by generating a dense average model of the entire rat vibrissal cortex, providing the required anatomical data, and illustrate how to measure synaptic innervation statistically. Comparing our results with data from paired recordings in vitro and in vivo, as well as with reconstructions of synaptic contact sites at light- and electron-microscopic levels, we find that our in silico measurements are in line with previous results. PMID:25426033
Dziurkowska, Ewelina; Wesolowski, Marek
2015-01-01
Multivariate statistical analysis is widely used in medical studies as a profitable tool facilitating diagnosis of some diseases, for instance, cancer, allergy, pneumonia, or Alzheimer's and psychiatric diseases. Taking this in consideration, the aim of this study was to use two multivariate techniques, hierarchical cluster analysis (HCA) and principal component analysis (PCA), to disclose the relationship between the drugs used in the therapy of major depressive disorder and the salivary cortisol level and the period of hospitalization. The cortisol contents in saliva of depressed women were quantified by HPLC with UV detection day-to-day during the whole period of hospitalization. A data set with 16 variables (e.g., the patients' age, multiplicity and period of hospitalization, initial and final cortisol level, highest and lowest hormone level, mean contents, and medians) characterizing 97 subjects was used for HCA and PCA calculations. Multivariate statistical analysis reveals that various groups of antidepressants affect at the varying degree the salivary cortisol level. The SSRIs, SNRIs, and the polypragmasy reduce most effectively the hormone secretion. Thus, both unsupervised pattern recognition methods, HCA and PCA, can be used as complementary tools for interpretation of the results obtained by laboratory diagnostic methods. PMID:26380376
The International Coal Statistics Data Base operations guide
Not Available
1991-04-01
The International Coal Statistics Data base (ICSD) is a micro- computer based system which contains informations related to international coal trade. This includes coal production, consumption, imports and exports information. The ICSD is a secondary data base, meaning that information contained therein is derived entirely from other primary sources. It uses dBase 3+ and Lotus 1-2-3 to locate, report and display data. The system is used for analysis in preparing the Annual Prospects for World Coal Trade (DOE/EIA-0363) publication. The ICSD system is menu driven, and also permits the user who is familiar with dBase and Lotus operations to leave the menu structure to perform independent queries. Documentation for the ICSD consists of three manuals -- the User's Guide, the Operations Manual and the Program Maintenance Manual. This Operations Manual explains how to install the programs, how to obtain reports on coal trade, what systems requirements apply, and how to update the major data files. It also explains file naming conventions, what each file does, and the programming procedures used to make the system work. The Operations Manual explains how to make the system respond to customized queries. It is organized around the ICSD menu structure and describes what each selection will do. Sample reports and graphs generated from individual menu selection are provided to acquaint the user with the various types of output. 17 figs.
Fuzzy logic and image processing techniques for the interpretation of seismic data
NASA Astrophysics Data System (ADS)
Orozco-del-Castillo, M. G.; Ortiz-Alemán, C.; Urrutia-Fucugauchi, J.; Rodríguez-Castellanos, A.
2011-06-01
Since interpretation of seismic data is usually a tedious and repetitive task, the ability to do so automatically or semi-automatically has become an important objective of recent research. We believe that the vagueness and uncertainty in the interpretation process makes fuzzy logic an appropriate tool to deal with seismic data. In this work we developed a semi-automated fuzzy inference system to detect the internal architecture of a mass transport complex (MTC) in seismic images. We propose that the observed characteristics of a MTC can be expressed as fuzzy if-then rules consisting of linguistic values associated with fuzzy membership functions. The constructions of the fuzzy inference system and various image processing techniques are presented. We conclude that this is a well-suited problem for fuzzy logic since the application of the proposed methodology yields a semi-automatically interpreted MTC which closely resembles the MTC from expert manual interpretation.
Statistical analysis of heartbeat data with wavelet techniques
NASA Astrophysics Data System (ADS)
Pazsit, Imre
2004-05-01
The purpose of this paper is to demonstrate the use of some methods of signal analysis, performed on ECG and in some cases blood pressure signals, for the classification of the health status of the heart of mice and rats. Spectral and wavelet analysis were performed on the raw signals. FFT-based coherence and phase was also calculated between blood pressure and raw ECG signals. Finally, RR-intervals were deduced from the ECG signals and an analysis of the fractal dimensions was performed. The analysis was made on data from mice and rats. A correlation was found between the health status of the mice and the rats and some of the statistical descriptors, most notably the phase of the cross-spectra between ECG and blood pressure, and the fractal properties and dimensions of the interbeat series (RR-interval fluctuations).
Incorporating spatial context into statistical classification of multidimensional image data
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Tilton, J. C.; Swain, P. H.
1981-01-01
Compound decision theory is employed to develop a general statistical model for classifying image data using spatial context. The classification algorithm developed from this model exploits the tendency of certain ground-cover classes to occur more frequently in some spatial contexts than in others. A key input to this contextural classifier is a quantitative characterization of this tendency: the context function. Several methods for estimating the context function are explored, and two complementary methods are recommended. The contextural classifier is shown to produce substantial improvements in classification accuracy compared to the accuracy produced by a non-contextural uniform-priors maximum likelihood classifier when these methods of estimating the context function are used. An approximate algorithm, which cuts computational requirements by over one-half, is presented. The search for an optimal implementation is furthered by an exploration of the relative merits of using spectral classes or information classes for classification and/or context function estimation.
Statistical approach to the analysis of cell desynchronization data
NASA Astrophysics Data System (ADS)
Milotti, Edoardo; Del Fabbro, Alessio; Dalla Pellegrina, Chiara; Chignola, Roberto
2008-07-01
Experimental measurements on semi-synchronous tumor cell populations show that after a few cell cycles they desynchronize completely, and this desynchronization reflects the intercell variability of cell-cycle duration. It is important to identify the sources of randomness that desynchronize a population of cells living in a homogeneous environment: for example, being able to reduce randomness and induce synchronization would aid in targeting tumor cells with chemotherapy or radiotherapy. Here we describe a statistical approach to the analysis of the desynchronization measurements that is based on minimal modeling hypotheses, and can be derived from simple heuristics. We use the method to analyze existing desynchronization data and to draw conclusions on the randomness of cell growth and proliferation.
Multispectral data acquisition and classification - Statistical models for system design
NASA Technical Reports Server (NTRS)
Huck, F. O.; Park, S. K.
1978-01-01
In this paper we relate the statistical processes that are involved in multispectral data acquisition and classification to a simple radiometric model of the earth surface and atmosphere. If generalized, these formulations could provide an analytical link between the steadily improving models of our environment and the performance characteristics of rapidly advancing device technology. This link is needed to bring system analysis tools to the task of optimizing remote sensing and (real-time) signal processing systems as a function of target and atmospheric properties, remote sensor spectral bands and system topology (e.g., image-plane processing), radiometric sensitivity and calibration accuracy, compensation for imaging conditions (e.g., atmospheric effects), and classification rates and errors.
Statistical Analysis of Data with Non-Detectable Values
Frome, E.L.
2004-08-26
Environmental exposure measurements are, in general, positive and may be subject to left censoring, i.e. the measured value is less than a ''limit of detection''. In occupational monitoring, strategies for assessing workplace exposures typically focus on the mean exposure level or the probability that any measurement exceeds a limit. A basic problem of interest in environmental risk assessment is to determine if the mean concentration of an analyte is less than a prescribed action level. Parametric methods, used to determine acceptable levels of exposure, are often based on a two parameter lognormal distribution. The mean exposure level and/or an upper percentile (e.g. the 95th percentile) are used to characterize exposure levels, and upper confidence limits are needed to describe the uncertainty in these estimates. In certain situations it is of interest to estimate the probability of observing a future (or ''missed'') value of a lognormal variable. Statistical methods for random samples (without non-detects) from the lognormal distribution are well known for each of these situations. In this report, methods for estimating these quantities based on the maximum likelihood method for randomly left censored lognormal data are described and graphical methods are used to evaluate the lognormal assumption. If the lognormal model is in doubt and an alternative distribution for the exposure profile of a similar exposure group is not available, then nonparametric methods for left censored data are used. The mean exposure level, along with the upper confidence limit, is obtained using the product limit estimate, and the upper confidence limit on the 95th percentile (i.e. the upper tolerance limit) is obtained using a nonparametric approach. All of these methods are well known but computational complexity has limited their use in routine data analysis with left censored data. The recent development of the R environment for statistical data analysis and graphics has greatly
NASA Technical Reports Server (NTRS)
Reder, F.; Hargrave, J.; Crouchley, J.
1972-01-01
The specific applications of very low frequency phase tracking are described. The requirements for correct interpretation of very low frequency phase data are defined. The effects of the lower ionosphere and the ground along the path of signal propagation are analyzed. The following subjects are discussed: (1) interpretation equipment, (2) representation of very low frequency waves, (3) diurnal effects and mode interference phenomena, (4) antipodal interference, and (5) overall effects resulting from solar flares, galactic X-rays, and geomagnetic parameters.
ERIC Educational Resources Information Center
Barner, David; Snedeker, Jesse
2008-01-01
Four experiments investigated 4-year-olds' understanding of adjective-noun compositionality and their sensitivity to statistics when interpreting scalar adjectives. In Experiments 1 and 2, children selected "tall" and "short" items from 9 novel objects called "pimwits" (1-9 in. in height) or from this array plus 4 taller or shorter distractor…
Chen, Chih-Hao; Hsu, Chueh-Lin; Huang, Shih-Hao; Chen, Shih-Yuan; Hung, Yi-Lin; Chen, Hsiao-Rong; Wu, Yu-Chung
2015-01-01
Although genome-wide expression analysis has become a routine tool for gaining insight into molecular mechanisms, extraction of information remains a major challenge. It has been unclear why standard statistical methods, such as the t-test and ANOVA, often lead to low levels of reproducibility, how likely applying fold-change cutoffs to enhance reproducibility is to miss key signals, and how adversely using such methods has affected data interpretations. We broadly examined expression data to investigate the reproducibility problem and discovered that molecular heterogeneity, a biological property of genetically different samples, has been improperly handled by the statistical methods. Here we give a mathematical description of the discovery and report the development of a statistical method, named HTA, for better handling molecular heterogeneity. We broadly demonstrate the improved sensitivity and specificity of HTA over the conventional methods and show that using fold-change cutoffs has lost much information. We illustrate the especial usefulness of HTA for heterogeneous diseases, by applying it to existing data sets of schizophrenia, bipolar disorder and Parkinson’s disease, and show it can abundantly and reproducibly uncover disease signatures not previously detectable. Based on 156 biological data sets, we estimate that the methodological issue has affected over 96% of expression studies and that HTA can profoundly correct 86% of the affected data interpretations. The methodological advancement can better facilitate systems understandings of biological processes, render biological inferences that are more reliable than they have hitherto been and engender translational medical applications, such as identifying diagnostic biomarkers and drug prediction, which are more robust. PMID:25793610
The International Coal Statistics Data Base program maintenance guide
Not Available
1991-06-01
The International Coal Statistics Data Base (ICSD) is a microcomputer-based system which contains information related to international coal trade. This includes coal production, consumption, imports and exports information. The ICSD is a secondary data base, meaning that information contained therein is derived entirely from other primary sources. It uses dBase III+ and Lotus 1-2-3 to locate, report and display data. The system is used for analysis in preparing the Annual Prospects for World Coal Trade (DOE/EIA-0363) publication. The ICSD system is menu driven and also permits the user who is familiar with dBase and Lotus operations to leave the menu structure to perform independent queries. Documentation for the ICSD consists of three manuals -- the User's Guide, the Operations Manual, and the Program Maintenance Manual. This Program Maintenance Manual provides the information necessary to maintain and update the ICSD system. Two major types of program maintenance documentation are presented in this manual. The first is the source code for the dBase III+ routines and related non-dBase programs used in operating the ICSD. The second is listings of the major component database field structures. A third important consideration for dBase programming, the structure of index files, is presented in the listing of source code for the index maintenance program. 1 fig.
Data Analysis & Statistical Methods for Command File Errors
NASA Technical Reports Server (NTRS)
Meshkat, Leila; Waggoner, Bruce; Bryant, Larry
2014-01-01
This paper explains current work on modeling for managing the risk of command file errors. It is focused on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors, the number of files radiated, including the number commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We have assessed these data using different curve fitting and distribution fitting techniques, such as multiple regression analysis, and maximum likelihood estimation to see how much of the variability in the error rates can be explained with these. We have also used goodness of fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on the what these statistics bore out as critical drivers to the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as anticipate future error rates.
Slow and fast solar wind - data selection and statistical analysis
NASA Astrophysics Data System (ADS)
Wawrzaszek, Anna; Macek, Wiesław M.; Bruno, Roberto; Echim, Marius
2014-05-01
In this work we consider the important problem of selection of slow and fast solar wind data measured in-situ by the Ulysses spacecraft during two solar minima (1995-1997, 2007-2008) and solar maximum (1999-2001). To recognise different types of solar wind we use a set of following parameters: radial velocity, proton density, proton temperature, the distribution of charge states of oxygen ions, and compressibility of magnetic field. We present how this idea of the data selection works on Ulysses data. In the next step we consider the chosen intervals for fast and slow solar wind and perform statistical analysis of the fluctuating magnetic field components. In particular, we check the possibility of identification of inertial range by considering the scale dependence of the third and fourth orders scaling exponents of structure function. We try to verify the size of inertial range depending on the heliographic latitudes, heliocentric distance and phase of the solar cycle. Research supported by the European Community's Seventh Framework Programme (FP7/2007 - 2013) under grant agreement no 313038/STORM.
Statistical characterization of actigraphy data during sleep and wakefulness states.
Adamec, Ondrej; Domingues, Alexandre; Paiva, Teresa; Sanches, J Miguel
2010-01-01
Human activity can be measured with actimetry sensors used by the subjects in several locations such as the wrists or legs. Actigraphy data is used in different contexts such as sports training or tele-medicine monitoring. In the diagnosis of sleep disorders, the actimetry sensor, which is basically a 3D axis accelerometer, is used by the patient in the non dominant wrist typically during an entire week. In this paper the actigraphy data is described by a weighted mixture of two distributions where the weight evolves along the day according to the patient circadian cycle. Thus, one of the distributions is mainly associated with the wakefulness state while the other is associated with the sleep state. Actigraphy data, acquired from 20 healthy patients and manually segmented by trained technicians, is used to characterize the acceleration magnitude during sleep and wakefulness states. Several mixture combinations are tested and statistically validated with conformity measures. It is shown that both distributions can co-exist at a certain time with varying importance along the circadian cycle. PMID:21097022
The International Coal Statistics Data Base user's guide
Not Available
1991-06-01
The ICSD is a microcomputer-based system which presents four types of data: (1) the quantity of coal traded between importers and exporters, (2) the price of particular ranks of coal and the cost of shipping it in world trade, (3) a detailed look at coal shipments entering and leaving the United States, and (4) the context for world coal trade in the form of data on how coal and other primary energy sources are used now and are projected to be used in the future, especially by major industrial economies. The ICSD consists of more than 140 files organized into a rapid query system for coal data. It can operate on any IBM-compatible microcomputer with 640 kilobytes memory and a hard disk drive with at least 8 megabytes of available space. The ICSD is: 1. A menu-driven, interactive data base using Dbase 3+ and Lotus 1-2-3. 2. Inputs include official and commercial statistics on international coal trade volumes and consumption. 3. Outputs include dozens of reports and color graphic displays. Output report type include Lotus worksheets, dBase data bases, ASCII text files, screen displays, and printed reports. 4. Flexible design permits user to follow structured query system or design his own queries using either Lotus or dBase procedures. 5. Incudes maintenance programs to configure the system, correct indexing errors, back-up work, restore corrupted files, annotate user-created files and update system programs, use DOS shells, and much more. Forecasts and other information derived from the ICSD are published in EIA's Annual Prospects for World Coal Trade (DOE/EIA-0363).
ERIC Educational Resources Information Center
Neumann, David L.; Hood, Michelle; Neumann, Michelle M.
2013-01-01
Many teachers of statistics recommend using real-life data during class lessons. However, there has been little systematic study of what effect this teaching method has on student engagement and learning. The present study examined this question in a first-year university statistics course. Students (n = 38) were interviewed and their reflections…
Statistical characteristics of ionospheric variability using oblique sounding data
NASA Astrophysics Data System (ADS)
Kurkin, Vladimir; Polekh, Nelya; Ivanova, Vera; Dumbrava, Zinaida; Podelsky, Igor
Using data from oblique sounding obtained over two paths Magadan-Irkutsk and Khabarovsk-Irkutsk in the 2006-2011 the statistical parameters of ionospheric variability are studied during equinox and the winter solstice. It was shown that the probability of maximum observed frequency registration with average standard deviations from the median in the range 5-10% in winter is 0.43, in spring and autumn - 0.64 over Magadan-Irkutsk path. In winter during daytime standard deviation does not exceed 10%, and at night it reaches 20% or more. During the equinox the daytime standard deviation increases to 12%, and at night it does not exceed 16%. This may be due to changes in lighting conditions at the midpoint of the path (58.2(°) N, 124.2(°) E). As far Khabarovsk-Irkutsk path standard deviations from their median less than the ones obtained for Magadan-Irkutsk path. The estimations are consistent with previously obtained results deduced from the vertical sounding data. The study was done under RF President Grant of Public Support for RF Leading Scientific Schools (NSh-2942.2014.5) and RFBR Grant No 14-05-00259.
Statistical attribution of temporal variability in global gridded temperature data
NASA Astrophysics Data System (ADS)
Pisoft, P.; Miksovsky, J.
2015-12-01
Spatiotemporal variability within the climate system results from a complex interaction of various exogenous and endogenous factors, yet the understanding of the specific role of individual climate-forming agents is still incomplete. In this contribution, near-surface monthly temperature anomalies from several gridded datasets (GISTEMP, Berkeley Earth, MLOST, HadCRUT4, 20th Century Reanalysis) are investigated for presence of components attributable to external forcings (anthropogenic, solar and volcanic) as well as to internal forcings related to major climate variability modes (El Niño / Southern Oscillation, North Atlantic Oscillation, Atlantic Multidecadal Oscillation and Pacific Decadal Oscillation). Statistical methodology based on multiple linear regression is employed, and applied to monthly temperature data for the 1901-2010 period. The results presented illustrate the spatial fingerprints of individual forcing factors and their robustness (or lack thereof) among individual temperature datasets. Particular attention is devoted to the specific features of the 20th Century Reanalysis: It is demonstrated that while most of the response patterns are represented similarly in the reanalysis data and in their analysis-type counterparts, some distinctions appear, especially for the components associated with anthropogenic forcing and volcanic activity.
Interpreters in cross-cultural interviews: a three-way coconstruction of data.
Björk Brämberg, Elisabeth; Dahlberg, Karin
2013-02-01
Our focus in this article is research interviews that involve two languages. We present an epistemological and methodological analysis of the meaning of qualitative interviewing with an interpreter. The results of the analysis show that such interviewing is not simply exchanging words between two languages, but means understanding, grasping the essential meanings of the spoken words, which requires an interpreter to bridge the different horizons of understanding. Consequently, a research interview including an interpreter means a three-way coconstruction of data. We suggest that interpreters be thoroughly introduced into the research process and research interview technique, that they take part in the preparations for the interview event, and evaluate the translation process with the researcher and informant after the interview. PMID:23258420
Kuo, Kuan-Liang; Fuh, Chiou-Shann
2011-12-01
Health examinations can obtain relatively complete health information and thus are important for the personal and public health management. For clinicians, one of the most important works in the health examinations is to interpret the health examination results. Continuously interpreting numerous health examination results of healthcare receivers is tedious and error-prone. This paper proposes a clinical decision support system to assist solving above problems. In order to customize the clinical decision support system intuitively and flexibly, this paper also proposes the rule syntax to implement computer-interpretable logic for health examinations. It is our purpose in this paper to describe the methodology of the proposed clinical decision support system. The evaluation was performed by the implementation and execution of decision rules on health examination results and a survey on clinical decision support system users. It reveals the efficiency and user satisfaction of proposed clinical decision support system. Positive impact of clinical data interpretation is also noted. PMID:20703517
Analysis of statistical model properties from discrete nuclear structure data
NASA Astrophysics Data System (ADS)
Firestone, Richard B.
2012-02-01
Experimental M1, E1, and E2 photon strengths have been compiled from experimental data in the Evaluated Nuclear Structure Data File (ENSDF) and the Evaluated Gamma-ray Activation File (EGAF). Over 20,000 Weisskopf reduced transition probabilities were recovered from the ENSDF and EGAF databases. These transition strengths have been analyzed for their dependence on transition energies, initial and final level energies, spin/parity dependence, and nuclear deformation. ENSDF BE1W values were found to increase exponentially with energy, possibly consistent with the Axel-Brink hypothesis, although considerable excess strength observed for transitions between 4-8 MeV. No similar energy dependence was observed in EGAF or ARC data. BM1W average values were nearly constant at all energies above 1 MeV with substantial excess strength below 1 MeV and between 4-8 MeV. BE2W values decreased exponentially by a factor of 1000 from 0 to 16 MeV. The distribution of ENSDF transition probabilities for all multipolarities could be described by a lognormal statistical distribution. BE1W, BM1W, and BE2W strengths all increased substantially for initial transition level energies between 4-8 MeV possibly due to dominance of spin-flip and Pygmy resonance transitions at those excitations. Analysis of the average resonance capture data indicated no transition probability dependence on final level spins or energies between 0-3 MeV. The comparison of favored to unfavored transition probabilities for odd-A or odd-Z targets indicated only partial support for the expected branching intensity ratios with many unfavored transitions having nearly the same strength as favored ones. Average resonance capture BE2W transition strengths generally increased with greater deformation. Analysis of ARC data suggest that there is a large E2 admixture in M1 transitions with the mixing ratio δ ≈ 1.0. The ENSDF reduced transition strengths were considerably stronger than those derived from capture gamma ray
77 FR 65585 - Renewal of the Bureau of Labor Statistics Data Users Advisory Committee
Federal Register 2010, 2011, 2012, 2013, 2014
2012-10-29
... of Labor Statistics Renewal of the Bureau of Labor Statistics Data Users Advisory Committee The... determined that the renewal of the Bureau of Labor Statistics Data Users Advisory Committee (the ``Committee... of Labor Statistics by 29 U.S.C. 1 and 2. This determination follows consultation with the...
ERIC Educational Resources Information Center
Singamsetti, Rao
2007-01-01
In this paper an attempt is made to highlight some issues of interpretation of statistical concepts and interpretation of results as taught in undergraduate Business statistics courses. The use of modern technology in the class room is shown to have increased the efficiency and the ease of learning and teaching in statistics. The importance of…
Dejaegher, Bieke; Heyden, Yvan Vander
2011-09-10
In this review, the set-up and data interpretation of experimental designs (screening, response surface, and mixture designs) are discussed. Advanced set-ups considered are the application of D-optimal and supersaturated designs as screening designs. Advanced data interpretation approaches discussed are an adaptation of the algorithm of Dong and the estimation of factor effects from supersaturated design results. Finally, some analytical applications in separation science, on the one hand, and formulation-, product-, or process optimization, on the other, are discussed. PMID:21632194
When outside the norm is normal: interpreting lab data in the aged.
Cavalieri, T A; Chopra, A; Bryman, P N
1992-05-01
Laboratory tests that are frequently ordered in a primary care practice include complete blood count, arterial blood gases, erythrocyte sedimentation rate, creatinine clearance, and glucose tolerance. Yet, the impact of age-associated physiologic changes on interpretation of laboratory data has only recently been elucidated. With advancing age, many laboratory parameters increase or decrease, some remain unchanged, and the effect of age on still others remains unclear. Interpreting lab data in the elderly is further confounded by the multiple disease states, polypharmacy, and atypical disease presentations commonly found in this population. Additional clinical research is needed to better establish reference laboratory values in elderly patients. PMID:1577283
Quantum of area {Delta}A=8{pi}l{sub P}{sup 2} and a statistical interpretation of black hole entropy
Ropotenko, Kostiantyn
2010-08-15
In contrast to alternative values, the quantum of area {Delta}A=8{pi}l{sub P}{sup 2} does not follow from the usual statistical interpretation of black hole entropy; on the contrary, a statistical interpretation follows from it. This interpretation is based on the two concepts: nonadditivity of black hole entropy and Landau quantization. Using nonadditivity a microcanonical distribution for a black hole is found and it is shown that the statistical weight of a black hole should be proportional to its area. By analogy with conventional Landau quantization, it is shown that quantization of a black hole is nothing but the Landau quantization. The Landau levels of a black hole and their degeneracy are found. The degree of degeneracy is equal to the number of ways to distribute a patch of area 8{pi}l{sub P}{sup 2} over the horizon. Taking into account these results, it is argued that the black hole entropy should be of the form S{sub bh}=2{pi}{center_dot}{Delta}{Gamma}, where the number of microstates is {Delta}{Gamma}=A/8{pi}l{sub P}{sup 2}. The nature of the degrees of freedom responsible for black hole entropy is elucidated. The applications of the new interpretation are presented. The effect of noncommuting coordinates is discussed.
Interpretation of Ground Penetrating Radar data at the Hanford Site, Richland, Washington
Bergstrom, K.A.; Mitchell, T.H.; Kunk, J.R.
1993-07-01
Ground Penetrating Radar (GPR) is being used extensively during characterization and remediation of chemical and radioactive waste sites at the Hanford Site in Washington State. Time and money for GPR investigations are often not included during the planning and budgeting phase. Therefore GPR investigations must be inexpensive and quick to minimize impact on already established budgets and schedules. An approach to survey design, data collection, and interpretation has been developed which emphasizes speed and budget with minimal impact on the integrity of the interpretation or quality of the data. The following simple rules of thumb can be applied: (1) Assemble as much pre-survey information as possible, (2) Clearly define survey objectives prior to designing the survey and determine which combination of geophysical methods will best meet the objectives, (3) Continuously communicate with the client, before, during and after the investigation, (4) Only experienced GPR interpreters should acquire the field data, (5) Use real-time monitoring of the data to determine where and how much data to collect and assist in the interpretation, (6) Always ``error`` in favor of collecting too much data, (7) Surveys should have closely spaced (preferably 5 feet, no more than 10 feet), orthogonal profiles, (8) When possible, pull the antenna by hand.
Geochemical portray of the Pacific Ridge: New isotopic data and statistical techniques
NASA Astrophysics Data System (ADS)
Hamelin, Cédric; Dosso, Laure; Hanan, Barry B.; Moreira, Manuel; Kositsky, Andrew P.; Thomas, Marion Y.
2011-02-01
Samples collected during the PACANTARCTIC 2 cruise fill a sampling gap from 53° to 41° S along the Pacific Antarctic Ridge (PAR). Analysis of Sr, Nd, Pb, Hf, and He isotope compositions of these new samples is shown together with published data from 66°S to 53°S and from the EPR. The recent advance in analytical mass spectrometry techniques generates a spectacular increase in the number of multidimensional isotopic data for oceanic basalts. Working with such multidimensional datasets generates a new approach for the data interpretation, preferably based on statistical analysis techniques. Principal Component Analysis (PCA) is a powerful mathematical tool to study this type of datasets. The purpose of PCA is to reduce the number of dimensions by keeping only those characteristics that contribute most to its variance. Using this technique, it becomes possible to have a statistical picture of the geochemical variations along the entire Pacific Ridge from 70°S to 10°S. The incomplete sampling of the ridge led previously to the identification of a large-scale division of the south Pacific mantle at the latitude of Easter Island. The PCA method applied here to the completed dataset reveals a different geochemical profile. Along the Pacific Ridge, a large-scale bell-shaped variation with an extremum at about 38°S of latitude is interpreted as a progressive change in the geochemical characteristics of the depleted matrix of the mantle. This Pacific Isotopic Bump (PIB) is also noticeable in the He isotopic ratio along-axis variation. The linear correlation observed between He and heavy radiogenic isotopes, together with the result of the PCA calculation, suggests that the large-scale variation is unrelated to the plume-ridge interactions in the area and should rather be attributed to the partial melting of a marble-cake assemblage.
STATISTICAL ESTIMATION AND VISUALIZATION OF GROUND-WATER CONTAMINATION DATA
This work presents methods of visualizing and animating statistical estimates of ground water and/or soil contamination over a region from observations of the contaminant for that region. The primary statistical methods used to produce the regional estimates are nonparametric re...
Applying Statistical Process Control to Clinical Data: An Illustration.
ERIC Educational Resources Information Center
Pfadt, Al; And Others
1992-01-01
Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
What defines an Expert? - Uncertainty in the interpretation of seismic data
NASA Astrophysics Data System (ADS)
Bond, C. E.
2008-12-01
Studies focusing on the elicitation of information from experts are concentrated primarily in economics and world markets, medical practice and expert witness testimonies. Expert elicitation theory has been applied in the natural sciences, most notably in the prediction of fluid flow in hydrological studies. In the geological sciences expert elicitation has been limited to theoretical analysis with studies focusing on the elicitation element, gaining expert opinion rather than necessarily understanding the basis behind the expert view. In these cases experts are defined in a traditional sense, based for example on: standing in the field, no. of years of experience, no. of peer reviewed publications, the experts position in a company hierarchy or academia. Here traditional indicators of expertise have been compared for significance on affective seismic interpretation. Polytomous regression analysis has been used to assess the relative significance of length and type of experience on the outcome of a seismic interpretation exercise. Following the initial analysis the techniques used by participants to interpret the seismic image were added as additional variables to the analysis. Specific technical skills and techniques were found to be more important for the affective geological interpretation of seismic data than the traditional indicators of expertise. The results of a seismic interpretation exercise, the techniques used to interpret the seismic and the participant's prior experience have been combined and analysed to answer the question - who is and what defines an expert?
Improving the Statistical Methodology of Astronomical Data Analysis
NASA Astrophysics Data System (ADS)
Feigelson, Eric D.; Babu, Gutti Jogesh
Contemporary observational astronomers are generally unfamiliar with the extensive advances made in mathematical and applied statistics during the past several decades. Astronomical problems can often be addressed by methods developed in statistical fields such as spatial point processes, density estimation, Bayesian statistics, and sampling theory. The common problem of bivariate linear regression illustrates the need for sophisticated methods. Astronomical problems often require combinations of ordinary least-squares lines, double-weighted and errors-in-variables models, censored and truncated regressions, each with its own error analysis procedure. The recent conference Statistical Challenges in Modern Astronomy highlighted issues of mutual interest to statisticians and astronomers including clustering of point processes and time series analysis. We conclude with advice on how the astronomical community can advance its statistical methodology with improvements in education of astrophysicists, collaboration and consultation with professional statisticians, and acquisition of new software.
EBprot: Statistical analysis of labeling-based quantitative proteomics data.
Koh, Hiromi W L; Swa, Hannah L F; Fermin, Damian; Ler, Siok Ghee; Gunaratne, Jayantha; Choi, Hyungwon
2015-08-01
Labeling-based proteomics is a powerful method for detection of differentially expressed proteins (DEPs). The current data analysis platform typically relies on protein-level ratios, which is obtained by summarizing peptide-level ratios for each protein. In shotgun proteomics, however, some proteins are quantified with more peptides than others, and this reproducibility information is not incorporated into the differential expression (DE) analysis. Here, we propose a novel probabilistic framework EBprot that directly models the peptide-protein hierarchy and rewards the proteins with reproducible evidence of DE over multiple peptides. To evaluate its performance with known DE states, we conducted a simulation study to show that the peptide-level analysis of EBprot provides better receiver-operating characteristic and more accurate estimation of the false discovery rates than the methods based on protein-level ratios. We also demonstrate superior classification performance of peptide-level EBprot analysis in a spike-in dataset. To illustrate the wide applicability of EBprot in different experimental designs, we applied EBprot to a dataset for lung cancer subtype analysis with biological replicates and another dataset for time course phosphoproteome analysis of EGF-stimulated HeLa cells with multiplexed labeling. Through these examples, we show that the peptide-level analysis of EBprot is a robust alternative to the existing statistical methods for the DE analysis of labeling-based quantitative datasets. The software suite is freely available on the Sourceforge website http://ebprot.sourceforge.net/. All MS data have been deposited in the ProteomeXchange with identifier PXD001426 (http://proteomecentral.proteomexchange.org/dataset/PXD001426/). PMID:25913743
Statistical methods for detecting periodic fragments in DNA sequence data
2011-01-01
Background Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed. Results We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~ 21% and ~ 19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). Conclusions For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the
Characterization of spatial statistics of distributed targets in SAR data. [applied to sea-ice data
NASA Technical Reports Server (NTRS)
Rignot, E.; Kwok, R.
1993-01-01
A statistical approach to the analysis of spatial statistics in polarimetric multifrequency SAR data, which is aimed at extracting the intrinsic variability of the target by removing variability from other sources, is presented. An image model, which takes into account three sources of spatial variability, namely, image speckle, system noise, and the intrinsic spatial variability of the target or texture, is described. It is shown that the presence of texture increases the image variance-to-mean square ratio and introduces deviations of the image autocovariance function from the expected SAR system response. The approach is exemplified by sea-ice SAR imagery acquired by the Jet Propulsion Laboratory three-frequency polarimetric airborne SAR. Data obtained indicate that, for different sea-ice types, the spatial statistics seem to vary more across frequency than across polarization and the observed differences increase in magnitude with decreasing frequency.
ISSUES IN THE STATISTICAL ANALYSIS OF SMALL-AREA HEALTH DATA. (R825173)
The availability of geographically indexed health and population data, with advances in computing, geographical information systems and statistical methodology, have opened the way for serious exploration of small area health statistics based on routine data. Such analyses may be...
Lin, Meng Kuan; Nicolini, Oliver; Waxenegger, Harald; Galloway, Graham J.; Ullmann, Jeremy F. P.; Janke, Andrew L.
2013-01-01
Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and DIP service, called M-DIP. The objective of the system is to (1) automate the direct data tiling, conversion, pre-tiling of brain images from Medical Imaging NetCDF (MINC), Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurement; and (3) display high-level of images with three dimensions in real world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation using web-based protocols. M-DIP implements three levels of architecture with a relational middle-layer database, a stand-alone DIP server, and a mobile application logic middle level realizing user interpretation for direct querying and communication. This imaging software has the ability to display biological imaging data at multiple zoom levels and to increase its quality to meet users’ expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed by any network environment, such as a portable mobile or tablet device. In addition, this system and combination with mobile applications are establishing a virtualization tool in the neuroinformatics field to speed interpretation services. PMID:23847587
Lin, Meng Kuan; Nicolini, Oliver; Waxenegger, Harald; Galloway, Graham J; Ullmann, Jeremy F P; Janke, Andrew L
2013-01-01
Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and DIP service, called M-DIP. The objective of the system is to (1) automate the direct data tiling, conversion, pre-tiling of brain images from Medical Imaging NetCDF (MINC), Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurement; and (3) display high-level of images with three dimensions in real world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation using web-based protocols. M-DIP implements three levels of architecture with a relational middle-layer database, a stand-alone DIP server, and a mobile application logic middle level realizing user interpretation for direct querying and communication. This imaging software has the ability to display biological imaging data at multiple zoom levels and to increase its quality to meet users' expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed by any network environment, such as a portable mobile or tablet device. In addition, this system and combination with mobile applications are establishing a virtualization tool in the neuroinformatics field to speed interpretation services. PMID:23847587
GoMiner: a resource for biological interpretation of genomic and proteomic data
Zeeberg, Barry R; Feng, Weimin; Wang, Geoffrey; Wang, May D; Fojo, Anthony T; Sunshine, Margot; Narasimhan, Sudarshan; Kane, David W; Reinhold, William C; Lababidi, Samir; Bussey, Kimberly J; Riss, Joseph; Barrett, J Carl; Weinstein, John N
2003-01-01
We have developed GoMiner, a program package that organizes lists of 'interesting' genes (for example, under- and overexpressed genes from a microarray experiment) for biological interpretation in the context of the Gene Ontology. GoMiner provides quantitative and statistical output files and two useful visualizations. The first is a tree-like structure analogous to that in the AmiGO browser and the second is a compact, dynamically interactive 'directed acyclic graph'. Genes displayed in GoMiner are linked to major public bioinformatics resources. PMID:12702209
Robust statistical approaches to assess the degree of agreement of clinical data
NASA Astrophysics Data System (ADS)
Grilo, Luís M.; Grilo, Helena L.
2016-06-01
To analyze the blood of patients who took vitamin B12 for a period of time, two different medicine measurement methods were used (one is the established method, with more human intervention, and the other method uses essentially machines). Given the non-normality of the differences between both measurement methods, the limits of agreement are estimated using also a non-parametric approach to assess the degree of agreement of the clinical data. The bootstrap resampling method is applied in order to obtain robust confidence intervals for mean and median of differences. The approaches used are easy to apply, running a friendly software, and their outputs are also easy to interpret. In this case study the results obtained with (non)parametric approaches lead us to different statistical conclusions, but the decision whether agreement is acceptable or not is always a clinical judgment.
Global statistical analysis of TOPEX and POSEIDON data
NASA Astrophysics Data System (ADS)
Le Traon, P. Y.; Stum, J.; Dorandeu, J.; Gaspar, P.; Vincent, P.
1994-12-01
A global statistical analysis of the first 10 months of TOPEX/POSEIDON merged geophysical data records is presented. The global crossover analysis using the Cartwright and Ray (1990) (CR) tide model and Gaspar et al. (this issue) electromagnetic bias parameterization yields a sea level RMS crossover difference of 10.05 cm, 10.15 cm, and 10.15 cm for TOPEX-TOPEX, POSEIDON-POSEIDON, and TOPEX-POSEIDON crossovers, respectively. All geophysical corrections give reductions in the crossover differences, the most significant being with respect to ocean tides, solid earth tide, and inverse barometer effect. Based on TOPEX-POSEIDON crossovers and repeat-track differences, we estimate the relative bias between TOPEX and POSEIDON at about -15.5 +/- 1 cm. This value is dependent on electromagnetic bias corrections used. An orbit error reduction method based on global minimization of crossover differences over one cycle yields an orbit error of about 3 cm root mean square (RMS). This is probably an upper estimate of the orbit error since the estimation absorbs other altimetric signals. The RMS crossover difference is reduced to 8.8 cm after adjustment. A repeat-track analysis is then performed using the CR tide model. In regions of high mesoscale variability, the RMS sea level variability agrees well with the Geosat results. Tidal errors are also clearly evidenced. A recent tide model (Ma et al., this issue) determined from TOPEX/POSEIDON data considerably improves the RMS sea level variability. The reduction of sea level variance is (4 cm) sqaured on average but can reach (8 cm) squared in the southeast Pacific, southeast Atlantic, and Indian Oceans. The RMS sea level variability thus decreases from 6 cm to only 4 cm in quiet ocean regions. The large-scale sea level variations over these first 10 months most likely show for the first time the global annual cycle of sea level. We analyze the TOPEX and POSEIDON sea level anomaly wavenumber spectral characteristics. TOPEX and
ERIC Educational Resources Information Center
Cruce, Ty M.
2009-01-01
This methodological note illustrates how a commonly used calculation of the Delta-p statistic is inappropriate for categorical independent variables, and this note provides users of logistic regression with a revised calculation of the Delta-p statistic that is more meaningful when studying the differences in the predicted probability of an…
PANDA: pathway and annotation explorer for visualizing and interpreting gene-centric data.
Hart, Steven N; Moore, Raymond M; Zimmermann, Michael T; Oliver, Gavin R; Egan, Jan B; Bryce, Alan H; Kocher, Jean-Pierre A
2015-01-01
Objective. Bringing together genomics, transcriptomics, proteomics, and other -omics technologies is an important step towards developing highly personalized medicine. However, instrumentation has advances far beyond expectations and now we are able to generate data faster than it can be interpreted. Materials and Methods. We have developed PANDA (Pathway AND Annotation) Explorer, a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways like KEGG, BioCarta, and PharmGKB. PANDA represents data/annotations as icons in the graph while maintaining the other data elements (i.e., other columns for the table of annotations). Custom pathways from underrepresented diseases can be imported when existing data sources are inadequate. PANDA also allows sharing annotations among collaborators. Results. In our first use case, we show how easy it is to view supplemental data from a manuscript in the context of a user's own data. Another use-case is provided describing how PANDA was leveraged to design a treatment strategy from the somatic variants found in the tumor of a patient with metastatic sarcomatoid renal cell carcinoma. Conclusion. PANDA facilitates the interpretation of gene-centric annotations by visually integrating this information with context of biological pathways. The application can be downloaded or used directly from our website: http://bioinformaticstools.mayo.edu/research/panda-viewer/. PMID:26038725
NASA Technical Reports Server (NTRS)
Taylor, P. T.; Kis, K. I.; Wittmann, G.
2013-01-01
The ESA SWARM mission will have three earth orbiting magnetometer bearing satellites one in a high orbit and two side-by-side in lower orbits. These latter satellites will record a horizontal magnetic gradient. In order to determine how we can use these gradient measurements for interpretation of large geologic units we used ten years of CHAMP data to compute a horizontal gradient map over a section of southeastern Europe with our goal to interpret these data over the Pannonian Basin of Hungary.
Qualitative Data Analysis and Interpretation in Counseling Psychology: Strategies for Best Practices
ERIC Educational Resources Information Center
Yeh, Christine J.; Inman, Arpana G.
2007-01-01
This article presents an overview of various strategies and methods of engaging in qualitative data interpretations and analyses in counseling psychology. The authors explore the themes of self, culture, collaboration, circularity, trustworthiness, and evidence deconstruction from multiple qualitative methodologies. Commonalities and differences…
ERIC Educational Resources Information Center
Walther, Joachim; Sochacka, Nicola W.; Pawley, Alice L.
2016-01-01
This article explores challenges and opportunities associated with sharing qualitative data in engineering education research. This exploration is theoretically informed by an existing framework of interpretive research quality with a focus on the concept of Communicative Validation. Drawing on practice anecdotes from the authors' work, the…
Computational Approaches and Tools for Exposure Prioritization and Biomonitoring Data Interpretation
The ability to describe the source-environment-exposure-dose-response continuum is essential for identifying exposures of greater concern to prioritize chemicals for toxicity testing or risk assessment, as well as for interpreting biomarker data for better assessment of exposure ...
ERIC Educational Resources Information Center
Plantec, Peter M.; And Others
Three papers by experts in the field of education for gifted children present interpretive comments on one of three research efforts (a School Staffing Survey) whose data were analytically studied in a companion volume (EC 040 763). The School Staffing Survey, which included a representative sample of elementary and secondary schools, gathered…
The use of formal and informal expert judgments when interpreting data for performance assessments
Rechard, R.P.; Trauth, K.M.; Tierney, M.S.; Rath, J.S.; Guzowski, R.V.; Hora, S.C.
1993-12-31
This paper discusses the general process by which data and information are compiled and used for defining modeling parameters. These modeling parameters are input for the mathematical models that are used in performance assessments of the Waste Isolation Pilot Plant (WIPP), near Carlsbad, NM, which is designed to safely manage, store, and dispose of transuranic radioactive wastes. The physical and temporal scales, and the difficulty of obtaining measurements in geologic media made interpretation of measured data, including the difficulty of obtaining measurements in geologic media make interpretation of measured data, including the difference between laboratory experiment scale and repository scale, important tasks. In most instances, standard scientific practices can ensure consistency of data use. To illustrate this point, an example is provided of the interpretation of field measurements of intrinsic permeability for use in computational models using the bootstrap technique. In some cases, sufficient information can never be collected, interpretation of the information is controversial, or information from diverse disciplines must be used. A procedure that formalizes the standard scientific practices under these conditions has been developed. An example is provided of how this procedure has been applied in eliciting expert judgments on markers to deter inadvertant human intrusion into the WIPP.
Computer Search Center Statistics on Users and Data Bases
ERIC Educational Resources Information Center
Schipma, Peter B.
1974-01-01
Statistics gathered over five years of operation by the IIT Research Institute's Computer Search Center are summarized for profile terms and lists, use of truncation modes, use of logic operators, some characteristics of CA Condensates, etc. (Author/JB)
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2014 CFR
2014-10-01
... description of the experimental design shall be set forth, including a specification of the controlled... following items shall be set forth clearly: The formulas used for statistical estimates, standard errors...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2013 CFR
2013-10-01
... description of the experimental design shall be set forth, including a specification of the controlled... following items shall be set forth clearly: The formulas used for statistical estimates, standard errors...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 42 Public Health 3 2011-10-01 2011-10-01 false Adequate financial records, statistical data, and... financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination of costs payable by...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 42 Public Health 3 2010-10-01 2010-10-01 false Adequate financial records, statistical data, and... financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination of costs payable by...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 42 Public Health 3 2012-10-01 2012-10-01 false Adequate financial records, statistical data, and....568 Adequate financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 42 Public Health 3 2013-10-01 2013-10-01 false Adequate financial records, statistical data, and....568 Adequate financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 42 Public Health 3 2014-10-01 2014-10-01 false Adequate financial records, statistical data, and....568 Adequate financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination...
Nair, G. Jaya
2013-01-01
Medical coding and dictionaries for clinical trials have seen a wave of change over the past decade where emphasis on more standardized tools for coding and reporting clinical data has taken precedence. Coding personifies the backbone of clinical reporting as safety data reports primarily depend on the coded data. Hence, maintaining an optimum quality of coding is quintessential to the accurate analysis and interpretation of critical clinical data. The perception that medical coding is merely a process of assigning numeric/alphanumeric codes to clinical data needs to be revisited. The significance of quality coding and its impact on clinical reporting has been highlighted in this article. PMID:24010060
Nair, G Jaya
2013-07-01
Medical coding and dictionaries for clinical trials have seen a wave of change over the past decade where emphasis on more standardized tools for coding and reporting clinical data has taken precedence. Coding personifies the backbone of clinical reporting as safety data reports primarily depend on the coded data. Hence, maintaining an optimum quality of coding is quintessential to the accurate analysis and interpretation of critical clinical data. The perception that medical coding is merely a process of assigning numeric/alphanumeric codes to clinical data needs to be revisited. The significance of quality coding and its impact on clinical reporting has been highlighted in this article. PMID:24010060
3D interpretation of SHARAD radargram data using seismic processing routines
NASA Astrophysics Data System (ADS)
Kleuskens, M. H. P.; Oosthoek, J. H. P.
2009-04-01
Ground penetrating radar on board a satellite has entered the field of planetary geology. Two radars enable subsurface observations of Mars. In 2003, ESA launched the Mars Express equipped with MARSIS, a low frequency radar which was able to detect only the base of the ice caps. Since December 2006, the Shallow Radar (SHARAD) of Agenzia Spaziale Italiana (ASI) on board the NASA Mars Reconnaissance Orbiter (MRO) is active in orbit around Mars. The SHARAD radar covers the frequency band between 15 and 25 MHz. The vertical resolution is about 15 m in free space. The horizontal resolution is 300-1000 m along track and 1500-8000 m across track. The radar penetrates the subsurface of Mars up to 2 km deep, and is capable of detecting multiple reflections in the ice caps of Mars. Considering the scarcity of planetary data relative to terrestrial data, it is essential to combine all available types of data of an area of interest. Up to now SHARAD data has only been interpreted separately as 2D radargrams. The Geological Survey of the Netherlands has decades of experience in interpreting 2D and 3D seismic data of the Dutch subsurface, especially for the 3D interpretation of reservoir characteristics of the deeper subsurface. In this abstract we present a methodology which can be used for 3D interpretation of SHARAD data combined with surface data using state-of-the art seismic software applied in the oil and gas industry. We selected a region that would be most suitable to demonstrate 3D interpretation. The Titania Lobe of the North Polar ice cap was selected based on the abundancy of radar data and the complexity of the ice lobe. SHARAD data is released to the scientific community via the Planetary Data System. It includes ‘Reduced Data Records' (RDR) data, a binary format which contains the radargram. First the binary radargram data and corresponding coordinates were combined and converted to the commonly used seismic seg-y format. Second, we used the reservoir
Teaching for Statistical Literacy: Utilising Affordances in Real-World Data
ERIC Educational Resources Information Center
Chick, Helen L.; Pierce, Robyn
2012-01-01
It is widely held that context is important in teaching mathematics and statistics. Consideration of context is central to statistical thinking, and any teaching of statistics must incorporate this aspect. Indeed, it has been advocated that real-world data sets can motivate the learning of statistical principles. It is not, however, a…
Interpretation Of Multifrequency Crosswell Electromagnetic Data With Frequency Dependent Core Data
Kirkendall, B; Roberts, J
2005-06-07
Interpretation of cross-borehole electromagnetic (EM) images acquired at enhanced oil recovery (EOR) sites has proven to be difficult due to the typically complex subsurface geology. Significant problems in image interpretation include correlation of specific electrical conductivity values with oil saturations, the time-dependent electrical variation of the subsurface during EOR, and the non-unique electrical conductivity relationship with subsurface conditions. In this study we perform laboratory electrical properties measurements of core samples from the EOR site to develop an interpretation approach that combines field images and petrophysical results. Cross-borehole EM images from the field indicate resistivity increases in EOR areas--behavior contrary to the intended waterflooding design. Laboratory measurements clearly show a decrease in resistivity with increasing effective pressure and are attributed to increased grain-to-grain contact enhancing a strong surface conductance. We also observe a resistivity increase for some samples during brine injection. These observations possibly explain the contrary behavior observed in the field images. Possible mechanisms for increasing the resistivity in the region include (1) increased oil content as injectate sweeps oil toward the plane of the observation wells; (2) lower conductance pore fluid displacing the high-conductivity brine; (3) degradation of grain-to-grain contacts of the initially conductive matrix; and (4) artifacts of the complicated resistivity/time history similar to that observed in the laboratory experiments.
Van Ness, Peter H.; Fried, Terri R.; Gill, Thomas M.
2012-01-01
This article’s main objective is to demonstrate that data analysis, including quantitative data analysis, is a process of interpretation involving basic hermeneutic principles that philosophers have identified in the interpretive process as applied to other, mainly literary, creations. Such principles include a version of the hermeneutic circle, an insistence on interpretive presuppositions, and a resistance to reducing the discovery of truth to the application of inductive methods. The importance of interpretation becomes especially evident when qualitative and quantitative methods are combined in a single clinical research project and when the data being analyzed are longitudinal. Study objectives will be accomplished by showing that three major hermeneutic principles make practical methodological contributions to an insightful, illustrative mixed methods analysis of a qualitative study of changes in functional disability over time embedded in the Precipitating Events Project—a major longitudinal, quantitative study of functional disability among older persons. Mixed methods, especially as shaped by hermeneutic insights such as the importance of empathetic understanding, are potentially valuable resources for scientific investigations of the experience of aging: a practical aim of this article is to articulate and demonstrate this contention. PMID:22582035
Chagovets, Vtaliy; Kononikhin, Aleksey; Starodubtseva, Nataliia; Kostyukevich, Yury; Popov, Igor; Frankevich, Vladimir; Nikolaev, Eugene
2016-01-01
The importance of high-resolution mass spectrometry for the correct data interpretation of a direct tissue analysis is demonstrated with an example of its clinical application for an endometriosis study. Multivariate analysis of the data discovers lipid species differentially expressed in different tissues under investigation. High-resolution mass spectrometry allows unambiguous separation of peaks with close masses that correspond to proton and sodium adducts of phosphatidylcholines and to phosphatidylcholines differing in double bond number. PMID:27553733
ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization
Antcheva, I.; Ballintijn, M.; Bellenot, B.; Biskup, M.; Brun, R.; Buncic, N.; Canal, Ph.; Casadei, D.; Couet, O.; Fine, V.; Franco, L.; /CERN /CERN
2009-01-01
ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can chose out of a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. A central piece in these analysis tools are the histogram classes which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like Postscript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally
Quantum Correlations from the Conditional Statistics of Incomplete Data.
Sperling, J; Bartley, T J; Donati, G; Barbieri, M; Jin, X-M; Datta, A; Vogel, W; Walmsley, I A
2016-08-19
We study, in theory and experiment, the quantum properties of correlated light fields measured with click-counting detectors providing incomplete information on the photon statistics. We establish a correlation parameter for the conditional statistics, and we derive the corresponding nonclassicality criteria for detecting conditional quantum correlations. Classical bounds for Pearson's correlation parameter are formulated that allow us, once they are violated, to determine nonclassical correlations via the joint statistics. On the one hand, we demonstrate nonclassical correlations in terms of the joint click statistics of light produced by a parametric down-conversion source. On the other hand, we verify quantum correlations of a heralded, split single-photon state via the conditional click statistics together with a generalization to higher-order moments. We discuss the performance of the presented nonclassicality criteria to successfully discern joint and conditional quantum correlations. Remarkably, our results are obtained without making any assumptions on the response function, quantum efficiency, and dark-count rate of photodetectors. PMID:27588857
Interdisciplinary application and interpretation of EREP data within the Susquehanna River Basin
NASA Technical Reports Server (NTRS)
Mcmurtry, G. J.; Petersen, G. W. (Principal Investigator)
1975-01-01
The author has identified the following significant results. It has become that lineaments seen on Skylab and ERTS images are not equally well defined, and that the clarity of definition of a particular lineament is recorded somewhat differently by different interpreters. In an effort to determine the extent of these variations, a semi-quantitative classification scheme was devised. In the field, along the crest of Bald Eagle Mountain in central Pennsylvania, statistical techniques borrowed from sedimentary petrography (point counting) were used to determine the existence and location of intensely fractured float rock. Verification of Skylab and ERTS detected lineaments on aerial photography at different scales indicated that the brecciated zones appear to occur at one margin of the 1 km zone of brecciation defined as a lineament. In the Lock Haven area, comparison of the film types from the SL4 S190A sensor revealed the black and white Pan X photography to be superior in quality for general interpretation to the black and white IR film. Also, the color positive film is better for interpretation than the color IR film.
NASA Astrophysics Data System (ADS)
Watkins, Hannah; Bond, Clare; Butler, Rob
2016-04-01
Geological mapping techniques have advanced significantly in recent years from paper fieldslips to Toughbook, smartphone and tablet mapping; but how do the methods used to create a geological map affect the thought processes that result in the final map interpretation? Geological maps have many key roles in the field of geosciences including understanding geological processes and geometries in 3D, interpreting geological histories and understanding stratigraphic relationships in 2D and 3D. Here we consider the impact of the methods used to create a map on the thought processes that result in the final geological map interpretation. As mapping technology has advanced in recent years, the way in which we produce geological maps has also changed. Traditional geological mapping is undertaken using paper fieldslips, pencils and compass clinometers. The map interpretation evolves through time as data is collected. This interpretive process that results in the final geological map is often supported by recording in a field notebook, observations, ideas and alternative geological models explored with the use of sketches and evolutionary diagrams. In combination the field map and notebook can be used to challenge the map interpretation and consider its uncertainties. These uncertainties and the balance of data to interpretation are often lost in the creation of published 'fair' copy geological maps. The advent of Toughbooks, smartphones and tablets in the production of geological maps has changed the process of map creation. Digital data collection, particularly through the use of inbuilt gyrometers in phones and tablets, has changed smartphones into geological mapping tools that can be used to collect lots of geological data quickly. With GPS functionality this data is also geospatially located, assuming good GPS connectivity, and can be linked to georeferenced infield photography. In contrast line drawing, for example for lithological boundary interpretation and sketching
Heat-Passing Framework for Robust Interpretation of Data in Networks
Fang, Yi; Sun, Mengtian; Ramani, Karthik
2015-01-01
Researchers are regularly interested in interpreting the multipartite structure of data entities according to their functional relationships. Data is often heterogeneous with intricately hidden inner structure. With limited prior knowledge, researchers are likely to confront the problem of transforming this data into knowledge. We develop a new framework, called heat-passing, which exploits intrinsic similarity relationships within noisy and incomplete raw data, and constructs a meaningful map of the data. The proposed framework is able to rank, cluster, and visualize the data all at once. The novelty of this framework is derived from an analogy between the process of data interpretation and that of heat transfer, in which all data points contribute simultaneously and globally to reveal intrinsic similarities between regions of data, meaningful coordinates for embedding the data, and exemplar data points that lie at optimal positions for heat transfer. We demonstrate the effectiveness of the heat-passing framework for robustly partitioning the complex networks, analyzing the globin family of proteins and determining conformational states of macromolecules in the presence of high levels of noise. The results indicate that the methodology is able to reveal functionally consistent relationships in a robust fashion with no reference to prior knowledge. The heat-passing framework is very general and has the potential for applications to a broad range of research fields, for example, biological networks, social networks and semantic analysis of documents. PMID:25668316
NASA Technical Reports Server (NTRS)
Smith, G. L.; Green, R. N.; Young, G. R.
1974-01-01
The NIMBUS-G environmental monitoring satellite has an instrument (a gas correlation spectrometer) onboard for measuring the mass of a given pollutant within a gas volume. The present paper treats the problem: How can this type measurement be used to estimate the distribution of pollutant levels in a metropolitan area. Estimation methods are used to develop this distribution. The pollution concentration caused by a point source is modeled as a Gaussian plume. The uncertainty in the measurements is used to determine the accuracy of estimating the source strength, the wind velocity, diffusion coefficients and source location.
Fitzgerald, Peter; Laughter, Mark D; Martyn, Rose; Richardson, Dave; Rowe, Nathan C; Pickett, Chris A; Younkin, James R; Shephard, Adam M
2010-01-01
Accountability scale data from the Global Nuclear Fuels (GNF) fuel fabrication facility in Wilmington, NC has been collected and analyzed as a part of the Cylinder Accountability and Tracking System (CATS) field trial in 2009. The purpose of the data collection was to demonstrate an authentication method for safeguards applications, and the use of load cell data in cylinder accountability. The scale data was acquired using a commercial off-the-shelf communication server with authentication and encryption capabilities. The authenticated weight data was then analyzed to determine facility operating activities. The data allowed for the determination of the number of full and empty cylinders weighed and the respective weights along with other operational activities. Data authentication concepts, practices and methods, the details of the GNF weight data authentication implementation and scale data interpretation results will be presented.
Complex data types and a data manipulation language for scientific and statistical databases
Brown, V.A.
1982-01-01
Existing database management systems (DBMSs) support only a few primitive data types: simple numerics and character strings. For scientific and statistical databases (SSDs), which require extensive data manipulation, many more data types need to be recognized. Currently, an SSD user must perform the necessary operations through application programs outside of the DBMS. Not only is this a burden to the user, but the system has no control over whether the operations performed are semantically valid. A solution to this problem is to directly support an extended set of data types. A complex data type (CDT) is a structured data type which corresponds to an abstract object commonly found in the user's view of data. We propose that a DBMS for SSD processing should recognize the following CDTs: set, vector, ordered set, matrix, time, time series, text, and generalized relation. In this thesis CDTs are defined and a language is developed specifically for SSD processing using CDTs as a basis.
Statistical Analysis of CMC Constituent and Processing Data
NASA Technical Reports Server (NTRS)
Fornuff, Jonathan
2004-01-01
Ceramic Matrix Composites (CMCs) are the next "big thing" in high-temperature structural materials. In the case of jet engines, it is widely believed that the metallic superalloys currently being utilized for hot structures (combustors, shrouds, turbine vanes and blades) are nearing their potential limits of improvement. In order to allow for increased turbine temperatures to increase engine efficiency, material scientists have begun looking toward advanced CMCs and SiC/SiC composites in particular. Ceramic composites provide greater strength-to-weight ratios at higher temperatures than metallic alloys, but at the same time require greater challenges in micro-structural optimization that in turn increases the cost of the material as well as increases the risk of variability in the material s thermo-structural behavior. to model various potential CMC engine materials and examines the current variability in these properties due to variability in component processing conditions and constituent materials; then, to see how processing and constituent variations effect key strength, stiffness, and thermal properties of the finished components. Basically, this means trying to model variations in the component s behavior by knowing what went into creating it. inter-phase and manufactured by chemical vapor infiltration (CVI) and melt infiltration (MI) were considered. Examinations of: (1) the percent constituents by volume, (2) the inter-phase thickness, (3) variations in the total porosity, and (4) variations in the chemical composition of the Sic fiber are carried out and modeled using various codes used here at NASA-Glenn (PCGina, NASALife, CEMCAN, etc...). The effects of these variations and the ranking of their respective influences on the various thermo-mechanical material properties are studied and compared to available test data. The properties of the materials as well as minor changes to geometry are then made to the computer model and the detrimental effects
Statistical data of X-ray emission from laboratory sparks
NASA Astrophysics Data System (ADS)
Kochkin, P.; Deursen, D. V.
2011-12-01
In this study we present a summary of the data of 1331 long laboratory sparks in atmospheric pressure intended for a statistical analysis. A 2 MV, 17kJ Marx generator were used to generate 1.2/52μs shape pulses positive and negative polarity. The generator was connected to a spark gap with cone-shaped electrodes. The distance between high-voltage and grounded electrodes was 1.08 meters. Breakdown voltage between electrodes was about 1MV. X-rays have been detected during the development of the discharge channel. The currents through the grounded electrode and through the high-voltage electrode were recorded separately and simultaneously with the voltage and the X-ray signal. X-rays were registered by two LaBr3(Ce+) scintillation detectors in different positions with respect to the forming discharge channel. Detector D1 was placed immediately under the grounded electrode at 15cm distance. Detector D2 was placed at horizontal distances of 143cm and 210cm, at mid-gap height. We also used lead shields of 1.5, 3, and 4 mm thickness for radiation attenuation measurements. For detector collimation we used shields up to 2 cm thickness. Also no metallic objects with pointed surfaces were present within 2 m from the spark gap. Typical plot of positive discharge presented in Figure 1a. Table 1 shows the summary of the X-ray registrations. Signal detection occurred significantly more for positive polarity discharges than for negative. This dependence was observed for both detectors. For detector D2 the probability of X-ray registration decreased proportional to 1/d2 with increasing the distance d to the breakdown gap from 1m43 to 2m10. Detailed energy spectra and time distribution of X-ray emission were obtained; see for example Fig. 1b. For both polarities of the high voltage, the X-rays only occurred when there was a current at the cathode.
A Novel Approach to Asynchronous MVP Data Interpretation Based on Elliptical-Vectors
NASA Astrophysics Data System (ADS)
Kruglyakov, M.; Trofimov, I.; Korotaev, S.; Shneyer, V.; Popova, I.; Orekhova, D.; Scshors, Y.; Zhdanov, M. S.
2014-12-01
We suggest a novel approach to asynchronous magnetic-variation profiling (MVP) data interpretation. Standard method in MVP is based on the interpretation of the coefficients of linear relation between vertical and horizontal components of the measured magnetic field.From mathematical point of view this pair of linear coefficients is not a vector which leads to significant difficulties in asynchronous data interpretation. Our approach allows us to actually treat such a pair of complex numbers as a special vector called an ellipse-vector (EV). By choosing the particular definitions of complex length and direction, the basic relation of MVP can be considered as the dot product. This considerably simplifies the interpretation of asynchronous data. The EV is described by four real numbers: the values of major and minor semiaxes, the angular direction of the major semiaxis and the phase. The notation choice is motivated by historical reasons. It is important that different EV's components have different sensitivity with respect to the field sources and the local heterogeneities. Namely, the value of major semiaxis and the angular direction are mostly determined by the field source and the normal cross-section. On the other hand, the value of minor semiaxis and the phase are responsive to local heterogeneities. Since the EV is the general form of complex vector, the traditional Schmucker vectors can be explicitly expressed through its components.The proposed approach was successfully applied to interpretation the results of asynchronous measurements that had been obtained in the Arctic Ocean at the drift stations "North Pole" in 1962-1976.
Simple Data Sets for Distinct Basic Summary Statistics
ERIC Educational Resources Information Center
Lesser, Lawrence M.
2011-01-01
It is important to avoid ambiguity with numbers because unfortunate choices of numbers can inadvertently make it possible for students to form misconceptions or make it difficult for teachers to tell if students obtained the right answer for the right reason. Therefore, it is important to make sure when introducing basic summary statistics that…
Building Basic Statistical Literacy with U.S. Census Data
ERIC Educational Resources Information Center
Sheffield, Caroline C.; Karp, Karen S.; Brown, E. Todd
2010-01-01
The world is filled with information delivered through graphical representations--everything from voting trends to economic projections to health statistics. Whether comparing incomes of individuals by their level of education, tracking the rise and fall of state populations, or researching home ownership in different geographical areas, basic…
Performance Data Gathering and Representation from Fixed-Size Statistical Data
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Jin, Haoqiang H.; Schmidt, Melisa A.; Kutler, Paul (Technical Monitor)
1997-01-01
The two commonly-used performance data types in the super-computing community, statistics and event traces, are discussed and compared. Statistical data are much more compact but lack the probative power event traces offer. Event traces, on the other hand, are unbounded and can easily fill up the entire file system during program execution. In this paper, we propose an innovative methodology for performance data gathering and representation that offers a middle ground. Two basic ideas are employed: the use of averages to replace recording data for each instance and 'formulae' to represent sequences associated with communication and control flow. The user can trade off tracing overhead, trace data size with data quality incrementally. In other words, the user will be able to limit the amount of trace data collected and, at the same time, carry out some of the analysis event traces offer using space-time views. With the help of a few simple examples, we illustrate the use of these techniques in performance tuning and compare the quality of the traces we collected with event traces. We found that the trace files thus obtained are, indeed, small, bounded and predictable before program execution, and that the quality of the space-time views generated from these statistical data are excellent. Furthermore, experimental results showed that the formulae proposed were able to capture all the sequences associated with 11 of the 15 applications tested. The performance of the formulae can be incrementally improved by allocating more memory at runtime to learn longer sequences.
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2013 CFR
2013-10-01
... 49 Transportation 8 2013-10-01 2013-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 29 Labor 5 2013-07-01 2013-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2014 CFR
2014-10-01
... 49 Transportation 8 2014-10-01 2014-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 29 Labor 5 2012-07-01 2012-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2012 CFR
2012-10-01
... 49 Transportation 8 2012-10-01 2012-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 29 Labor 5 2014-07-01 2014-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
3 CFR - Enhanced Collection of Relevant Data and Statistics Relating to Women
Code of Federal Regulations, 2012 CFR
2012-01-01
... 3 The President 1 2012-01-01 2012-01-01 false Enhanced Collection of Relevant Data and Statistics... Collection of Relevant Data and Statistics Relating to Women Memorandum for the Heads of Executive... light of available statistical evidence. It will also assist the work of the nongovernmental...
Radar Derived Spatial Statistics of Summer Rain. Volume 2; Data Reduction and Analysis
NASA Technical Reports Server (NTRS)
Konrad, T. G.; Kropfli, R. A.
1975-01-01
Data reduction and analysis procedures are discussed along with the physical and statistical descriptors used. The statistical modeling techniques are outlined and examples of the derived statistical characterization of rain cells in terms of the several physical descriptors are presented. Recommendations concerning analyses which can be pursued using the data base collected during the experiment are included.
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 29 Labor 5 2010-07-01 2010-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2010 CFR
2010-10-01
... 49 Transportation 8 2010-10-01 2010-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 29 Labor 5 2011-07-01 2011-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
Data mining and well logging interpretation: application to a conglomerate reservoir
NASA Astrophysics Data System (ADS)
Shi, Ning; Li, Hong-Qi; Luo, Wei-Ping
2015-06-01
Data mining is the process of extracting implicit but potentially useful information from incomplete, noisy, and fuzzy data. Data mining offers excellent nonlinear modeling and self-organized learning, and it can play a vital role in the interpretation of well logging data of complex reservoirs. We used data mining to identify the lithologies in a complex reservoir. The reservoir lithologies served as the classification task target and were identified using feature extraction, feature selection, and modeling of data streams. We used independent component analysis to extract information from well curves. We then used the branch-and-bound algorithm to look for the optimal feature subsets and eliminate redundant information. Finally, we used the C5.0 decision-tree algorithm to set up disaggregated models of the well logging curves. The modeling and actual logging data were in good agreement, showing the usefulness of data mining methods in complex reservoirs.
Presentation and interpretation of food intake data: factors affecting comparability across studies.
Faber, Mieke; Wenhold, Friede A M; Macintyre, Una E; Wentzel-Viljoen, Edelweiss; Steyn, Nelia P; Oldewage-Theron, Wilna H
2013-01-01
Non-uniform, unclear, or incomplete presentation of food intake data limits interpretation, usefulness, and comparisons across studies. In this contribution, we discuss factors affecting uniform reporting of food intake across studies. The amount of food eaten can be reported as mean portion size, number of servings or total amount of food consumed per day; the absolute intake value for the specific study depends on the denominator used because food intake data can be presented as per capita intake or for consumers only. To identify the foods mostly consumed, foods are reported and ranked according to total number of times consumed, number of consumers, total intake, or nutrient contribution by individual foods or food groups. Presentation of food intake data primarily depends on a study's aim; reported data thus often are not comparable across studies. Food intake data further depend on the dietary assessment methodology used and foods in the database consulted; and are influenced by the inherent limitations of all dietary assessments. Intake data can be presented as either single foods or as clearly defined food groups. Mixed dishes, reported as such or in terms of ingredients and items added during food preparation remain challenging. Comparable presentation of food consumption data is not always possible; presenting sufficient information will assist valid interpretation and optimal use of the presented data. A checklist was developed to strengthen the reporting of food intake data in science communication. PMID:23800564
Gönci, Balázs; Németh, Valéria; Balogh, Emeric; Szabó, Bálint; Dénes, Ádám; Környei, Zsuzsanna; Vicsek, Tamás
2010-01-01
Because of its relevance to everyday life, the spreading of viral infections has been of central interest in a variety of scientific communities involved in fighting, preventing and theoretically interpreting epidemic processes. Recent large scale observations have resulted in major discoveries concerning the overall features of the spreading process in systems with highly mobile susceptible units, but virtually no data are available about observations of infection spreading for a very large number of immobile units. Here we present the first detailed quantitative documentation of percolation-type viral epidemics in a highly reproducible in vitro system consisting of tens of thousands of virtually motionless cells. We use a confluent astroglial monolayer in a Petri dish and induce productive infection in a limited number of cells with a genetically modified herpesvirus strain. This approach allows extreme high resolution tracking of the spatio-temporal development of the epidemic. We show that a simple model is capable of reproducing the basic features of our observations, i.e., the observed behaviour is likely to be applicable to many different kinds of systems. Statistical physics inspired approaches to our data, such as fractal dimension of the infected clusters as well as their size distribution, seem to fit into a percolation theory based interpretation. We suggest that our observations may be used to model epidemics in more complex systems, which are difficult to study in isolation. PMID:21187920
Comparison of Grammar-Based and Statistical Language Models Trained on the Same Data
NASA Technical Reports Server (NTRS)
Hockey, Beth Ann; Rfayner, Manny
2005-01-01
This paper presents a methodologically sound comparison of the performance of grammar-based (GLM) and statistical-based (SLM) recognizer architectures using data from the Clarissa procedure navigator domain. The Regulus open source packages make this possible with a method for constructing a grammar-based language model by training on a corpus. We construct grammar-based and statistical language models from the same corpus for comparison, and find that the grammar-based language models provide better performance in this domain. The best SLM version has a semantic error rate of 9.6%, while the best GLM version has an error rate of 6.0%. Part of this advantage is accounted for by the superior WER and Sentence Error Rate (SER) of the GLM (WER 7.42% versus 6.27%, and SER 12.41% versus 9.79%). The rest is most likely accounted for by the fact that the GLM architecture is able to use logical-form-based features, which permit tighter integration of recognition and semantic interpretation.
Cheung, Yvonne Y; Jung, Boyoun; Sohn, Jae Ho; Ogrinc, Greg
2012-01-01
Quality improvement (QI) projects are an integral part of today's radiology practice, helping identify opportunities for improving outcomes by refining work processes. QI projects are typically driven by outcome measures, but the data can be difficult to interpret: The numbers tend to fluctuate even before a process is altered, and after a QI intervention takes place, it may be even more difficult to determine the cause of such vacillations. Control chart analysis helps the QI project team identify variations that should be targeted for intervention and avoid tampering in processes in which variation is random or harmless. Statistical control charts make it possible to distinguish among random variation or noise in the data, outlying tendencies that should be targeted for future intervention, and changes that signify the success of previous intervention. The data on control charts are plotted over time and integrated with various graphic devices that represent statistical reasoning (eg, control limits) to allow visualization of the intensity and overall effect-negative or positive-of variability. Even when variability has no substantial negative effect, appropriate intervention based on the results of control chart analysis can help increase the efficiency of a process by optimizing the central tendency of the outcome measure. Different types of control charts may be used to analyze the same outcome dataset: For example, paired charts of individual values (x) and the moving range (mR) allow robust and reliable analyses of most types of data from radiology QI projects. Many spreadsheet programs and templates are available for use in creating x-mR charts and other types of control charts. PMID:23150861
Statistical analysis of sparse data: Space plasma measurements
NASA Astrophysics Data System (ADS)
Roelof, Edmond C.
2012-05-01
Some operating space plasma instruments, e.g., ACE/SWICS, can have low counting rates (<1 count/sample). A novel approach has been suggested [1] that estimates counting rates (x) from ``strings'' of samples with (k) zeros followed by a non-zero count (n>=1) using x' = n/(k+1) for each string. We apply Poisson statistics to obtain the ensemble-averaged expectation value of R' and its standard deviation (s.d.) as a function of the (unknown) true rate (x). We find that x'>x for all true rates (particularly for x<1), but interestingly that the s.d. of x' can be less than that of the usual Poisson s.d. from (k+1) samples. We suggest a statistical theoretical ``correction'' for each bin rate that will, on average, compensate for this sampling bias.
Morse, V.C.; Johnson, J.H.; Crittenden, J.L.; Anderson, T.D.
1986-05-01
There are successes and failures in recording and interpreting a single seismic line across the South Owl Creek Mountain fault on the west flank of the Casper arch. Information obtained from this type of work should help explorationists who are exploring structurally complex areas. A depth cross section lacks a subthrust prospect, but is illustrated to show that the South Owl Creek Mountain fault is steeper with less apparent displacement than in areas to the north. This cross section is derived from two-dimensional seismic modeling, using data processing methods specifically for modeling. A flat horizon and balancing technique helps confirm model accuracy. High-quality data were acquired using specifically designed seismic field parameters. The authors concluded that the methodology used is valid, and an interactive modeling program in addition to cross-line control can improve seismic interpretations in structurally complex areas.
Software for Automated Interpretation of Mass Spectrometry Data from Glycans and Glycopeptides
Woodin, Carrie L.; Maxon, Morgan; Desaire, Heather
2013-01-01
The purpose of this review is to provide those interested in glycosylation analysis with the most updated information on the availability of automated tools for MS characterization of N-linked and O-linked glycosylation types. Specifically, this review describes software tools that facilitate elucidation of glycosylation from MS data on the basis of mass alone, as well as software designed to speed the interpretation of glycan and glycopeptide fragmentation from MS/MS data. This review focuses equally on software designed to interpret the composition of released glycans and on tools to characterize N-linked and O-linked glycopeptides. Several websites have been compiled and described that will be helpful to the reader who is interested in further exploring the described tools. PMID:23293784
Air pollutant interactions with vegetation: research needs in data acquisition and interpretation
Lindberg, S. E.; McLauglin, S. B.
1980-01-01
The objective of this discussion is to consider problems involved in the acquisition, interpretation, and application of data collected in studies of air pollutant interactions with the terrestrial environment. Emphasis will be placed on a critical evaluation of current deficiencies and future research needs by addressing the following questions: (1) which pollutants are either sufficiently toxic, pervasive, or persistent to warrant the expense of monitoring and effects research; (2) what are the interactions of multiple pollutants during deposition and how do these influence toxicity; (3) how de we collect, report, and interpret deposition and air quality data to ensure its maximum utility in assessment of potential regional environmental effects; (4) what processes do we study, and how are they measured to most efficiently describe the relationship between air quality dose and ultimate impacts on terrestrial ecosystems; and (5) how do we integrate site-specific studies into regional estimates of present and potential environmental degradation (or benefit).
Network and Data Integration for Biomarker Signature Discovery via Network Smoothed T-Statistics
Cun, Yupeng; Fröhlich, Holger
2013-01-01
Predictive, stable and interpretable gene signatures are generally seen as an important step towards a better personalized medicine. During the last decade various methods have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinics is the typical low reproducibility of signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. We here propose a technique that integrates network information as well as different kinds of experimental data (here exemplified by mRNA and miRNA expression) into one classifier. This is done by smoothing t-statistics of individual genes or miRNAs over the structure of a combined protein-protein interaction (PPI) and miRNA-target gene network. A permutation test is conducted to select features in a highly consistent manner, and subsequently a Support Vector Machine (SVM) classifier is trained. Compared to several other competing methods our algorithm reveals an overall better prediction performance for early versus late disease relapse and a higher signature stability. Moreover, obtained gene lists can be clearly associated to biological knowledge, such as known disease genes and KEGG pathways. We demonstrate that our data integration strategy can improve classification performance compared to using a single data source only. Our method, called stSVM, is available in R-package netClass on CRAN (http://cran.r-project.org). PMID:24019896
Network and data integration for biomarker signature discovery via network smoothed T-statistics.
Cun, Yupeng; Fröhlich, Holger
2013-01-01
Predictive, stable and interpretable gene signatures are generally seen as an important step towards a better personalized medicine. During the last decade various methods have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinics is the typical low reproducibility of signatures combined with the difficulty to achieve a clear biological interpretation. For that purpose in the last years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. We here propose a technique that integrates network information as well as different kinds of experimental data (here exemplified by mRNA and miRNA expression) into one classifier. This is done by smoothing t-statistics of individual genes or miRNAs over the structure of a combined protein-protein interaction (PPI) and miRNA-target gene network. A permutation test is conducted to select features in a highly consistent manner, and subsequently a Support Vector Machine (SVM) classifier is trained. Compared to several other competing methods our algorithm reveals an overall better prediction performance for early versus late disease relapse and a higher signature stability. Moreover, obtained gene lists can be clearly associated to biological knowledge, such as known disease genes and KEGG pathways. We demonstrate that our data integration strategy can improve classification performance compared to using a single data source only. Our method, called stSVM, is available in R-package netClass on CRAN (http://cran.r-project.org). PMID:24019896
Interpretation of Lidar and Satellite Data Sets Using a Global Photochemical Model
NASA Technical Reports Server (NTRS)
Zenker, Thomas; Chyba, Thomas
1999-01-01
A primary goal of the NASA Tropospheric Chemistry Program (TCP) is to "contribute substantially to scientific understanding of human impacts on the global troposphere". In order to analyze global or regional trends and factors of the troposphere chemistry, for example, its oxidation capacity or composition, a continuous global/regional data coverage as well as model simulations are needed. The Global Tropospheric Experiment (GTE), a major component of the TCP, provides data vital to these questions via aircraft measurement of key trace chemical species in various remote regions of the world. Another component in NASA's effort are satellite projects for exploration of tropospheric chemistry and dynamics. A unique data product is the Tropospheric Ozone Residual (TOR) utilizing global tropospheric ozone data. Another key research tool are simulation studies of atmospheric chemistry and dynamics for the theoretical understanding of the atmosphere, the extrapolation of observed trends, and for sensitivity studies assessing a changing anthropogenic impact to air chemistry and climate. In the context with model simulations, field data derived from satellites or (airborne) field missions are needed for two purposes: 1. To initialize and validate model simulations, and 2., to interpret field data by comparison to model simulation results in order to analyze global or regional trends and deviations from standard tropospheric chemistry and transport conditions as defined by the simulations. Currently, there is neither a sufficient global data coverage available nor are existing well established global circulation models. The NASA LARC CTM model is currently not yet in a state to accomplish a sufficient tropospheric chemistry simulation, so that the current research under this cooperative agreement focuses on utilizing field data products for direct interpretation. They will be also available for model testing and a later interpretation with a finally utilized model.
Pomeau, Yves; Louët, Sabine
2016-06-01
During the StatPhys Conference on 20th July 2016 in Lyon, France, Yves Pomeau and Daan Frenkel will be awarded the most important prize in the field of Statistical Mechanics: the 2016 Boltzmann Medal, named after the Austrian physicist and philosopher Ludwig Boltzmann. The award recognises Pomeau's key contributions to the Statistical Physics of non-equilibrium phenomena in general. And, in particular, for developing our modern understanding of fluid mechanics, instabilities, pattern formation and chaos. He is recognised as an outstanding theorist bridging disciplines from applied mathematics to statistical physics with a profound impact on the neighbouring fields of turbulence and mechanics. In the article Sabine Louët interviews Pomeau, who is an Editor for the European Physical Journal Special Topics. He shares his views and tells how he experienced the rise of Statistical Mechanics in the past few decades. He also touches upon the need to provide funding to people who have the rare ability to discover new things and ideas, and not just those who are good at filling in grant application forms. PMID:27349556
Interpretation and display of the NURE data base using computer graphics
Koller, G R
1980-01-01
Computer graphics not only is an integral part of data reduction and interpretation, it is also a fundamental aid in the planning and forecasting of the National Uranium Resource Evaluation program at Savannah River Laboratory. Computer graphics not only allows more rapid execution of tasks which could be performed manually, but also presents scientists with new capabilities which would be exceedingly impractical to apply were it not for the application of computer graphics to a problem.
Bias and Sensitivity in the Placement of Fossil Taxa Resulting from Interpretations of Missing Data
Sansom, Robert S.
2015-01-01
The utility of fossils in evolutionary contexts is dependent on their accurate placement in phylogenetic frameworks, yet intrinsic and widespread missing data make this problematic. The complex taphonomic processes occurring during fossilization can make it difficult to distinguish absence from non-preservation, especially in the case of exceptionally preserved soft-tissue fossils: is a particular morphological character (e.g., appendage, tentacle, or nerve) missing from a fossil because it was never there (phylogenetic absence), or just happened to not be preserved (taphonomic loss)? Missing data have not been tested in the context of interpretation of non-present anatomy nor in the context of directional shifts and biases in affinity. Here, complete taxa, both simulated and empirical, are subjected to data loss through the replacement of present entries (1s) with either missing (?s) or absent (0s) entries. Both cause taxa to drift down trees, from their original position, toward the root. Absolute thresholds at which downshift is significant are extremely low for introduced absences (two entries replaced, 6% of present characters). The opposite threshold in empirical fossil taxa is also found to be low; two absent entries replaced with presences causes fossil taxa to drift up trees. As such, only a few instances of non-preserved characters interpreted as absences will cause fossil organisms to be erroneously interpreted as more primitive than they were in life. This observed sensitivity to coding non-present morphology presents a problem for all evolutionary studies that attempt to use fossils to reconstruct rates of evolution or unlock sequences of morphological change. Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist. Absent characters therefore require explicit justification and taphonomic
Interpretation of Voyager 1 data on low energy cosmic rays in galactic wind model
NASA Astrophysics Data System (ADS)
Ptuskin, V. S.; Seo, E. S.; Zirakashvili, V. N.
2015-08-01
The local interstellar energy spectra of galactic cosmic rays down to a few MeV/nucleon were directly measured in the experiment on the board of the Voyager 1 spacecraft. We suggest interpretation of these data based on our models of cosmic ray acceleration in supernova remnants and the diffusion in galactic wind where diffusion coefficient is determined by the cosmic ray streaming instability. The dependence of wind velocity on distance above the Galactic disk is determined.