Equivalent statistics and data interpretation.
Francis, Gregory
2016-10-14
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
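The equivalence described in this abstract can be illustrated for the simplest pair of statistics. The following is a minimal sketch, assuming the standard pooled-variance two-sample t test, in which t = d / sqrt(1/n1 + 1/n2): once the sample sizes are known, Cohen's d and t can each be recovered from the other. The function names and the sample values are illustrative only and do not come from the paper.

```python
import math
from statistics import mean, variance


def pooled_sd(x, y):
    """Pooled standard deviation of two samples (equal-variance assumption)."""
    n1, n2 = len(x), len(y)
    return math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))


def cohens_d(x, y):
    """Standardized mean difference computed from the raw data."""
    return (mean(x) - mean(y)) / pooled_sd(x, y)


def t_statistic(x, y):
    """Two-sample t statistic computed from the raw data."""
    n1, n2 = len(x), len(y)
    return (mean(x) - mean(y)) / (pooled_sd(x, y) * math.sqrt(1 / n1 + 1 / n2))


def d_from_t(t, n1, n2):
    """Recover Cohen's d from t and the sample sizes alone."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def t_from_d(d, n1, n2):
    """Recover t from Cohen's d and the sample sizes alone."""
    return d / math.sqrt(1 / n1 + 1 / n2)


# Made-up measurements, for illustration only.
group_a = [5.1, 4.9, 6.2, 5.8, 5.5]
group_b = [4.2, 4.8, 5.0, 4.1, 4.6, 4.4]
t = t_statistic(group_a, group_b)
d = cohens_d(group_a, group_b)
print(t, d, d_from_t(t, len(group_a), len(group_b)))
```

Because the conversion uses nothing but n1 and n2, reporting t or d conveys the same information about the data; what differs is the inference each statistic is used to support.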
A statistical model for interpreting computerized dynamic posturography data
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
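The mixed discrete-continuous response described above can be made concrete with a small simulation. This is a sketch under stated assumptions, not the authors' model: the latent score distribution (a scaled Beta), the logistic form of the fall probability, and all parameter values are hypothetical choices made only to show how a latent continuous ES combined with conditional falls produces a point mass at zero.

```python
import math
import random


def simulate_observed_es(n_trials, seed=0):
    """Simulate observed equilibrium scores with falls recorded as zero."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n_trials):
        # Hypothetical skewed latent ES on (0, 100); it always exists.
        latent = 100.0 * rng.betavariate(5, 2)
        # Hypothetical fall probability, decreasing in the latent ES.
        p_fall = 1.0 / (1.0 + math.exp((latent - 40.0) / 8.0))
        fell = rng.random() < p_fall
        # On a fall the latent score is unobserved and zero is recorded.
        observed.append(0.0 if fell else latent)
    return observed


scores = simulate_observed_es(2000)
n_falls = sum(1 for s in scores if s == 0.0)
print(n_falls, len(scores))
```

A histogram of `scores` would show a spike at zero next to a skewed continuous part, which is the feature that undermines standard regression or analysis of variance on the raw ES.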
Correlation-based interpretations of paleoclimate data - where statistics meet past climates
NASA Astrophysics Data System (ADS)
Hu, Jun; Emile-Geay, Julien; Partin, Judson
2017-02-01
Correlation analysis is omnipresent in paleoclimatology, and often serves to support the proposed climatic interpretation of a given proxy record. However, this analysis presents several statistical challenges, each of which is sufficient to nullify the interpretation: the loss of degrees of freedom due to serial correlation, the test multiplicity problem in connection with a climate field, and the presence of age uncertainties. While these issues have long been known to statisticians, they are not widely appreciated by the wider paleoclimate community; yet they can have a first-order impact on scientific conclusions. Here we use three examples from the recent paleoclimate literature to highlight how spurious correlations affect the published interpretations of paleoclimate proxies, and suggest that future studies should address these issues to strengthen their conclusions. In some cases, correlations that were previously claimed to be significant are found insignificant, thereby challenging published interpretations. In other cases, minor adjustments can be made to safeguard against these concerns. Because such problems arise so commonly with paleoclimate data, we provide open-source code to address them. Ultimately, we conclude that statistics alone cannot ground-truth a proxy, and recommend establishing a mechanistic understanding of a proxy signal as a sounder basis for interpretation.
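The first challenge named above, the loss of degrees of freedom due to serial correlation, can be sketched numerically. The example below assumes a commonly used lag-1 (AR(1)) adjustment, in which the effective sample size is n(1 - r_x r_y)/(1 + r_x r_y) with r_x and r_y the lag-1 autocorrelations of the two series. It is not necessarily the method implemented in the authors' open-source code, and the function names are hypothetical.

```python
from statistics import mean


def lag1_autocorr(series):
    """Lag-1 autocorrelation estimate of a series."""
    m = mean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series[:-1], series[1:]))
    den = sum((a - m) ** 2 for a in series)
    return num / den


def effective_sample_size(x, y):
    """Effective number of independent pairs under the AR(1) adjustment."""
    n = len(x)
    r = lag1_autocorr(x) * lag1_autocorr(y)
    return n * (1 - r) / (1 + r)


# Two strongly trending (hence serially correlated) made-up series.
trend_x = [float(i) for i in range(50)]
trend_y = [2.0 * v + 1.0 for v in trend_x]
n_eff = effective_sample_size(trend_x, trend_y)
print(len(trend_x), n_eff)
```

With only about three effective observations instead of 50, a correlation that looks highly significant under the nominal sample size may no longer be significant.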
Misuse of statistics in the interpretation of data on low-level radiation
Hamilton, L.D.
1982-01-01
Four misuses of statistics in the interpretation of data of low-level radiation are reviewed: (1) post-hoc analysis and aggregation of data leading to faulty conclusions in the reanalysis of genetic effects of the atomic bomb, and premature conclusions on the Portsmouth Naval Shipyard data; (2) inappropriate adjustment for age and ignoring differences between urban and rural areas leading to potentially spurious increase in incidence of cancer at Rocky Flats; (3) hazard of summary statistics based on ill-conditioned individual rates leading to spurious association between childhood leukemia and fallout in Utah; and (4) the danger of prematurely published preliminary work with inadequate consideration of epidemiological problems - censored data - leading to inappropriate conclusions, needless alarm at the Portsmouth Naval Shipyard, and diversion of scarce research funds.
Phoenix, S.L.; Wu, E.M.
1983-03-01
This paper presents some new data on the strength and stress-rupture of Kevlar-49 fibers, fiber/epoxy strands and pressure vessels, and consolidated data obtained at LLNL over the past 10 years. These data are interpreted by using recent theoretical results from a micromechanical model of the statistical failure process, thereby gaining understanding of the roles of the epoxy matrix and ultraviolet radiation on long term lifetime.
Onisko, Agnieszka; Druzdzel, Marek J.; Austin, R. Marshall
2016-01-01
Background: Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. Aim: The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. Materials and Methods: This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan–Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. Results: The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Conclusion: Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches. PMID:28163973
QC Metrics from CPTAC Raw LC-MS/MS Data Interpreted through Multivariate Statistics
2015-01-01
Shotgun proteomics experiments integrate a complex sequence of processes, any of which can introduce variability. Quality metrics computed from LC-MS/MS data have relied upon identifying MS/MS scans, but a new mode for the QuaMeter software produces metrics that are independent of identifications. Rather than evaluating each metric independently, we have created a robust multivariate statistical toolkit that accommodates the correlation structure of these metrics and allows for hierarchical relationships among data sets. The framework enables visualization and structural assessment of variability. Study 1 for the Clinical Proteomics Technology Assessment for Cancer (CPTAC), which analyzed three replicates of two common samples at each of two time points among 23 mass spectrometers in nine laboratories, provided the data to demonstrate this framework, and CPTAC Study 5 provided data from complex lysates under Standard Operating Procedures (SOPs) to complement these findings. Identification-independent quality metrics enabled the differentiation of sites and run-times through robust principal components analysis and subsequent factor analysis. Dissimilarity metrics revealed outliers in performance, and a nested ANOVA model revealed the extent to which all metrics or individual metrics were impacted by mass spectrometer and run time. Study 5 data revealed that even when SOPs have been applied, instrument-dependent variability remains prominent, although it may be reduced, while within-site variability is reduced significantly. Finally, identification-independent quality metrics were shown to be predictive of identification sensitivity in these data sets. QuaMeter and the associated multivariate framework are available from http://fenchurch.mc.vanderbilt.edu and http://homepages.uc.edu/~wang2x7/, respectively. PMID:24494671
Tasker, Gary D.; Granato, Gregory E.
2000-01-01
Decision makers need viable methods for the interpretation of local, regional, and national-highway runoff and urban-stormwater data including flows, concentrations and loads of chemical constituents and sediment, potential effects on receiving waters, and the potential effectiveness of various best management practices (BMPs). Valid (useful for intended purposes), current, and technically defensible stormwater-runoff models are needed to interpret data collected in field studies, to support existing highway and urban-runoff planning processes, to meet National Pollutant Discharge Elimination System (NPDES) requirements, and to provide methods for computation of Total Maximum Daily Loads (TMDLs) systematically and economically. Historically, conceptual, simulation, empirical, and statistical models of varying levels of detail, complexity, and uncertainty have been used to meet various data-quality objectives in the decision-making processes necessary for the planning, design, construction, and maintenance of highways and for other land-use applications. Water-quality simulation models attempt a detailed representation of the physical processes and mechanisms at a given site. Empirical and statistical regional water-quality assessment models provide a more general picture of water quality or changes in water quality over a region. All these modeling techniques share one common aspect: their predictive ability is poor without suitable site-specific data for calibration. To properly apply the correct model, one must understand the classification of variables, the unique characteristics of water-resources data, and the concept of population structure and analysis. Classifying variables being used to analyze data may determine which statistical methods are appropriate for data analysis. An understanding of the characteristics of water-resources data is necessary to evaluate the applicability of different statistical methods, to interpret the results of these techniques
Asfahani, Jamal
2014-02-01
A factor analysis technique is proposed in this research for interpreting the combination of nuclear well logging, including natural gamma ray, density and neutron-porosity, and the electrical well logging of long and short normal, in order to characterize the large extended basaltic areas in southern Syria. Kodana well logging data are used for testing and applying the proposed technique. The four resulting score logs make it possible to establish the lithological score cross-section of the studied well. The established cross-section clearly shows the distribution and the identification of four kinds of basalt: hard massive basalt, hard basalt, pyroclastic basalt, and the alteration basalt products (clay). The factor analysis technique is successfully applied to the Kodana well logging data in southern Syria, and can be used efficiently when several wells and large well logging data sets with a high number of variables need to be interpreted.
Statistical interpretation of traveltime fluctuations
NASA Astrophysics Data System (ADS)
Roth, Michael
1997-02-01
A ray-theoretical relation between the autocorrelation functions of traveltime and slowness fluctuations is established for recording profiles with arbitrary angles to the propagation direction of a plane wave. From this relation it follows that the variance of traveltime fluctuations is independent of the profile orientation and proportional to the variance, ɛ², of slowness fluctuations, to the correlation distance, a, and to the propagation distance, L. The halfwidth of the autocorrelation function of traveltime fluctuations is proportional to a and decreases with increasing profile angle. This relationship allows us to estimate the statistical parameters ɛ and a from observed traveltime fluctuations. Numerical experiments for spatially isotropic random media characterized by a Gaussian autocorrelation function show that the statistical parameters can be reproduced successfully if L/a ≤ 10. For larger L/a the correlation distance is overestimated and the standard deviation is underestimated. However, the results of the numerical experiments provide empirical factors to correct for these effects. The theory is applied to observed traveltime fluctuations of the Pg phase on a profile of the BABEL project. For the upper crust east of Øland (Sweden) slowness fluctuations with standard deviation ɛ = 2.2-5% and correlation distance a = 330-600 m are found.
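The stated proportionality of the traveltime variance to the slowness variance, the correlation distance a, and the propagation distance L can be checked with a short calculation. This is a sketch under stated assumptions rather than the paper's derivation: it assumes straight rays, a slowness autocorrelation of the form N(ξ) = ɛ² exp(-ξ²/a²) with ɛ the standard deviation of the slowness fluctuations in absolute units, and L much larger than a, in which case integrating N along the ray gives a variance of approximately sqrt(π) ɛ² a L. The paper's own normalization may differ, and the parameter values are arbitrary.

```python
import math


def traveltime_variance(eps, a, L, steps=20000):
    """Var(T) = 2 * integral_0^L (L - xi) * N(xi) d(xi), by the midpoint rule.

    N(xi) = eps**2 * exp(-(xi / a)**2) is the assumed Gaussian
    autocorrelation of the slowness fluctuations along a straight ray.
    """
    h = L / steps
    total = 0.0
    for i in range(steps):
        xi = (i + 0.5) * h
        total += (L - xi) * eps ** 2 * math.exp(-((xi / a) ** 2))
    return 2.0 * total * h


def asymptotic_variance(eps, a, L):
    """Large-L/a approximation: sqrt(pi) * eps**2 * a * L."""
    return math.sqrt(math.pi) * eps ** 2 * a * L


eps, a, L = 0.03, 500.0, 50000.0  # arbitrary illustrative values
numeric = traveltime_variance(eps, a, L)
approx = asymptotic_variance(eps, a, L)
print(numeric / approx)
```

Under these assumptions, doubling L roughly doubles the variance, which is the linear dependence on propagation distance described in the abstract.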
Interpretation and use of statistics in nursing research.
Giuliano, Karen K; Polanowicz, Michelle
2008-01-01
A working understanding of the major fundamentals of statistical analysis is required to incorporate the findings of empirical research into nursing practice. The primary focus of this article is to describe common statistical terms, present some common statistical tests, and explain the interpretation of results from inferential statistics in nursing research. An overview of major concepts in statistics, including the distinction between parametric and nonparametric statistics, different types of data, and the interpretation of statistical significance, is reviewed. Examples of some of the most common statistical techniques used in nursing research, such as the Student independent t test, analysis of variance, and regression, are also discussed. Nursing knowledge based on empirical research plays a fundamental role in the development of evidence-based nursing practice. The ability to interpret and use quantitative findings from nursing research is an essential skill for advanced practice nurses to ensure provision of the best care possible for our patients.
The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...
Data Interpretation: Using Probability
ERIC Educational Resources Information Center
Drummond, Gordon B.; Vowler, Sarah L.
2011-01-01
Experimental data are analysed statistically to allow researchers to draw conclusions from a limited set of measurements. The hard fact is that researchers can never be certain that measurements from a sample will exactly reflect the properties of the entire group of possible candidates available to be studied (although using a sample is often the…
Local statistical interpretation for water structure
NASA Astrophysics Data System (ADS)
Sun, Qiang
2013-05-01
In this Letter, Raman spectroscopy is employed to study supercooled water down to a temperature of 248 K at ambient pressure. Based on our interpretation of the Raman OH stretching band, decreasing temperature mainly leads to a structural transition from the single donor-single acceptor (DA) to the double donor-double acceptor (DDAA) hydrogen bonding motif. Additionally, a local statistical interpretation of the water structure is proposed, which reveals that a water molecule interacts with molecules in the first shell through various local hydrogen-bonded networks. From this, a local structure order parameter is proposed to explain the short-range order and long-range disorder.
Nash, J. Thomas; Frishman, David
1983-01-01
Analytical results for 61 elements in 370 samples from the Ranger Mine area are reported. Most of the rocks come from drill core in the Ranger No. 1 and Ranger No. 3 deposits, but 20 samples are from unmineralized drill core more than 1 km from ore. Statistical tests show that the elements Mg, Fe, F, Be, Co, Li, Ni, Pb, Sc, Th, Ti, V, Cl, As, Br, Au, Ce, Dy, La, Sc, Eu, Tb, Yb, and Tb have positive association with uranium, and Si, Ca, Na, K, Sr, Ba, Ce, and Cs have negative association. For most lithologic subsets Mg, Fe, Li, Cr, Ni, Pb, V, Y, Sm, Sc, Eu, and Yb are significantly enriched in ore-bearing rocks, whereas Ca, Na, K, Sr, Ba, Mn, Ce, and Cs are significantly depleted. These results are consistent with petrographic observations on altered rocks. Lithogeochemistry can aid exploration, but for these rocks requires methods that are expensive and not amenable to routine use.
Interpreting statistics of small lunar craters
NASA Technical Reports Server (NTRS)
Schultz, P. H.; Gault, D.; Greeley, R.
1977-01-01
Some of the wide variations in the crater-size distributions in lunar photography and in the resulting statistics were interpreted as different degradation rates on different surfaces, different scaling laws in different targets, and a possible population of endogenic craters. These possibilities are reexamined for statistics of 26 different regions. In contrast to most other studies, crater diameters as small as 5 m were measured from enlarged Lunar Orbiter framelets. According to the results of the reported analysis, the different crater distribution types appear to be most consistent with the hypotheses of differential degradation and a superposed crater population. Differential degradation can account for the low level of equilibrium in incompetent materials such as ejecta deposits, mantle deposits, and deep regoliths where scaling law changes and catastrophic processes introduce contradictions with other observations.
Hemophilia Data and Statistics
... at a very young age. Based on CDC data, the median age at diagnosis is 36 months ...
Use and interpretation of statistics in wildlife journals
Tacha, Thomas C.; Warde, William D.; Burnham, Kenneth P.
1982-01-01
Use and interpretation of statistics in wildlife journals are reviewed, and suggestions for improvement are offered. Populations from which inferences are to be drawn should be clearly defined, and conclusions should be limited to the range of the data analyzed. Authors should be careful to avoid improper methods of plotting data and should clearly define the use of estimates of variance, standard deviation, standard error, or confidence intervals. Biological and statistical significance are often confused by authors and readers. Statistical hypothesis testing is a tool, and not every question should be answered by hypothesis testing. Meeting assumptions of hypothesis tests is the responsibility of authors, and assumptions should be reviewed before a test is employed. The use of statistical tools should be considered carefully both before and after gathering data.
As watershed groups in the state of Georgia form and develop, they have a need for collecting, managing, and analyzing data associated with their watershed. Possible sources of data for flow, water quality, biology, habitat, and watershed characteristics include the U.S. Geologic...
Data collection and interpretation.
Citerio, Giuseppe; Park, Soojin; Schmidt, J Michael; Moberg, Richard; Suarez, Jose I; Le Roux, Peter D
2015-06-01
Patient monitoring is routinely performed in all patients who receive neurocritical care. The combined use of monitors, including the neurologic examination, laboratory analysis, imaging studies, and physiological parameters, is common in a platform called multi-modality monitoring (MMM). However, the full potential of MMM is only beginning to be realized since for the most part, decision making historically has focused on individual aspects of physiology in a largely threshold-based manner. The use of MMM now is being facilitated by the evolution of bio-informatics in critical care including developing techniques to acquire, store, retrieve, and display integrated data and new analytic techniques for optimal clinical decision making. In this review, we will discuss the crucial initial steps toward data and information management, which in this emerging era of data-intensive science is already shifting concepts of care for acute brain injury and has the potential to both reshape how we do research and enhance cost-effective clinical care.
Interpreting Shock Tube Ignition Data
2003-10-01
times only for high concentrations (of order 1% fuel or greater). The requirements of engine (IC, HCCI, CI and SI) modelers also present a different ... Paper 03F-61 Interpreting Shock Tube Ignition Data D. F. Davidson and R. K. Hanson Mechanical Engineering ... Engineering Department Stanford University, Stanford CA 94305 Abstract Chemical kinetic modelers make extensive use of shock tube ignition data
Interpreting Data: The Hybrid Mind
ERIC Educational Resources Information Center
Heisterkamp, Kimberly; Talanquer, Vicente
2015-01-01
The central goal of this study was to characterize major patterns of reasoning exhibited by college chemistry students when analyzing and interpreting chemical data. Using a case study approach, we investigated how a representative student used chemical models to explain patterns in the data based on structure-property relationships. Our results…
Rossell, David
2016-01-01
Big Data brings unprecedented power to address scientific, economic and societal issues, but also amplifies the possibility of certain pitfalls. These include using purely data-driven approaches that disregard understanding the phenomenon under study, aiming at a dynamically moving target, ignoring critical data collection issues, summarizing or preprocessing the data inadequately and mistaking noise for signal. We review some success stories and illustrate how statistical principles can help obtain more reliable information from data. We also touch upon current challenges that require active methodological research, such as strategies for efficient computation, integration of heterogeneous data, extending the underlying theory to increasingly complex questions and, perhaps most importantly, training a new generation of scientists to develop and deploy these strategies. PMID:27722040
The Statistical Interpretation of Entropy: An Activity
ERIC Educational Resources Information Center
Timberlake, Todd
2010-01-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the…
The Statistical Interpretation of Entropy: An Activity
NASA Astrophysics Data System (ADS)
Timberlake, Todd
2010-11-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the functioning of the second law and also provided evidence for the existence of atoms at a time when many scientists (like Ernst Mach and Wilhelm Ostwald) were skeptical.
The Statistical Interpretation of Classical Thermodynamic Heating and Expansion Processes
ERIC Educational Resources Information Center
Cartier, Stephen F.
2011-01-01
A statistical model has been developed and applied to interpret thermodynamic processes typically presented from the macroscopic, classical perspective. Through this model, students learn and apply the concepts of statistical mechanics, quantum mechanics, and classical thermodynamics in the analysis of the (i) constant volume heating, (ii)…
Muscular Dystrophy: Data and Statistics
... MD STARnet Data and Statistics The following data and ... research ... For more information on MD STARnet see Research and Tracking. ...
Linda Stetzenbach; Lauren Nemnich; Davor Novosel
2009-08-31
Three independent tasks had been performed (Stetzenbach 2008, Stetzenbach 2008b, Stetzenbach 2009) to measure a variety of parameters in normative buildings across the United States. For each of these tasks 10 buildings were selected as normative indoor environments. Task 1 focused on office buildings, Task 13 focused on public schools, and Task 0606 focused on high performance buildings. To perform this task it was necessary to restructure the database for the Indoor Environmental Quality (IEQ) data and the Sound measurement, as several issues were identified and resolved prior to and during the transfer of these data sets into SPSS. During overview discussions with the statistician utilized in this task it was determined that, because the indoor zones (1-6) were selected independently within each task, zones were not related by location across tasks. Therefore, no comparison would be valid across zones for the 30 buildings, so the by-location (zone) data were limited to three analysis sets of the buildings within each task. In addition, different collection procedures for lighting were used in Task 0606 as compared to Tasks 01 & 13 to improve sample collection. Therefore, these data sets could not be merged and compared, so by-day effects were run separately for Task 0606 and only Task 01 & 13 data were merged. Results of the statistical analysis of the IEQ parameters show that statistically significant differences were found among days and zones for all tasks, although no differences were found by-day for Draft Rate data from Task 0606 (p>0.05). Thursday measurements of IEQ parameters were significantly different from Tuesday, and most Wednesday measures for all variables of Tasks 1 & 13. Data for all three days appeared to vary for Operative Temperature, whereas only Tuesday and Thursday differed for Draft Rate 1m. Although no Draft Rate measures within Task 0606 were found to significantly differ by-day, Temperature measurements for Tuesday and
Spirakis, C.S.; Pierson, C.T.; Santos, E.S.; Fishman, N.S.
1983-01-01
Statistical treatment of analytical data from 106 samples of uranium-mineralized and unmineralized or weakly mineralized rocks of the Morrison Formation from the northeastern part of the Church Rock area of the Grants uranium region indicates that along with uranium, the deposits in the northeast Church Rock area are enriched in barium, sulfur, sodium, vanadium and equivalent uranium. Selenium and molybdenum are sporadically enriched in the deposits and calcium, manganese, strontium, and yttrium are depleted. Unlike the primary deposits of the San Juan Basin, the deposits in the northeast part of the Church Rock area contain little organic carbon and several elements that are characteristically enriched in the primary deposits are not enriched or are enriched to a much lesser degree in the Church Rock deposits. The suite of elements associated with the deposits in the northeast part of the Church Rock area is also different from the suite of elements associated with the redistributed deposits in the Ambrosia Lake district. This suggests that the genesis of the Church Rock deposits is different, at least in part, from the genesis of the primary deposits of the San Juan Basin or the redistributed deposits at Ambrosia Lake.
NASA Astrophysics Data System (ADS)
Tema, E.; Zanella, E.; Pavón-Carrasco, F. J.; Kondopoulou, D.; Pavlides, S.
2015-10-01
We present the results of palaeomagnetic analysis on Late Bronze Age pottery from Santorini carried out in order to estimate the thermal effect of the Minoan eruption on the pre-Minoan habitation level. A total of 170 specimens from 108 ceramic fragments have been studied. The ceramics were collected from the surface of the pre-Minoan palaeosol at six different sites, including also samples from the Akrotiri archaeological site. The deposition temperatures of the first pyroclastic products have been estimated by the maximum overlap of the re-heating temperature intervals given by the individual fragments at site level. A new statistical elaboration of the temperature data has also been proposed, calculating at 95 per cent of probability the re-heating temperatures at each site. The obtained results show that the precursor tephra layer and the first pumice fall of the eruption were hot enough to re-heat the underlying ceramics at temperatures 160-230 °C in the non-inhabited sites while the temperatures recorded inside the Akrotiri village are slightly lower, varying from 130 to 200 °C. The decrease of the temperatures registered in the human settlements suggests that there was some interaction between the buildings and the pumice fallout deposits while probably the building debris layer caused by the preceding and syn-eruption earthquakes has also contributed to the decrease of the recorded re-heating temperatures.
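The "maximum overlap" estimate mentioned above can be read as finding the temperature range shared by the largest number of per-fragment re-heating intervals. The sketch below shows one way to compute such a range with a sweep over interval endpoints. It is an illustration of the general technique, not the authors' procedure; the function name and the interval values are hypothetical and are not data from the study.

```python
def max_overlap(intervals):
    """Return (low, high, count) for the range covered by the most intervals.

    Intervals are closed (low, high) pairs. If several ranges tie for the
    maximum count, the one with the lowest temperature is returned.
    """
    events = []
    for low, high in intervals:
        events.append((low, 0, 1))    # interval opens; sorts before closings
        events.append((high, 1, -1))  # interval closes
    events.sort()

    best_count = 0
    best_low = best_high = None
    count = 0
    for idx, (temp, _, delta) in enumerate(events):
        count += delta
        if delta > 0 and count > best_count:
            best_count = count
            best_low = temp
            # The count stays at this level until the next event.
            best_high = events[idx + 1][0]
    return best_low, best_high, best_count


# Made-up re-heating intervals (degrees C) for four fragments at one site.
fragments = [(150, 250), (180, 300), (100, 220), (180, 260)]
site_estimate = max_overlap(fragments)
print(site_estimate)
```

Here all four hypothetical fragments agree on the range from 180 to 220 °C, which would be taken as the site-level estimate under this reading.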
Interpreting NHANES biomonitoring data, cadmium.
Ruiz, Patricia; Mumtaz, Moiz; Osterloh, John; Fisher, Jeffrey; Fowler, Bruce A
2010-09-15
Cadmium (Cd) occurs naturally in the environment and the general population's exposure to it is predominantly through diet. Chronic Cd exposure is a public health concern because Cd is a known carcinogen; it accumulates in the body and causes kidney damage. The National Health and Nutritional Examination Survey (NHANES) has measured urinary Cd; the 2003-2004 NHANES survey cycle reported estimates for 2257 persons aged 6 years and older in the Fourth National Report on Human Exposure to Environmental Chemicals. As part of translational research to make computerized models accessible to health risk assessors we re-coded a cadmium model in Berkeley Madonna simulation language. This model was used in our computational toxicology laboratory to predict the urinary excretion of cadmium. The model simulated the NHANES-measured data very well from ages 6 to 60+ years. An unusual increase in Cd urinary excretion was observed among 6-11-year-olds, followed by a continuous monotonic rise into the seventh decade of life. This observation was also made in earlier studies and could be life stage-related, a function of anatomical and physiological changes occurring during this period of life. Urinary excretion of Cd was approximately twofold higher among females than males in all age groups. The model describes Cd's cumulative nature in humans and accommodates the observed variation in exposure/uptake over the course of a lifetime. Such models may be useful for interpreting biomonitoring data and risk assessment.
Hahn, A.A.
1994-11-01
The complexity of instrumentation sometimes requires data analysis to be done before the result is presented to the control room. This tutorial reviews some of the theoretical assumptions underlying the more popular forms of data analysis and presents simple examples to illuminate the advantages and hazards of different techniques.
Comparing survival curves using an easy to interpret statistic.
Hess, Kenneth R
2010-10-15
Here, I describe a statistic for comparing two survival curves that has a clear and obvious meaning and has a long history in biostatistics. Suppose we are comparing survival times associated with two treatments A and B. The statistic operates in such a way that if it takes on the value 0.95, then the interpretation is that a randomly chosen patient treated with A has a 95% chance of surviving longer than a randomly chosen patient treated with B. This statistic was first described in the 1950s, and was generalized in the 1960s to work with right-censored survival times. It is a useful and convenient measure for assessing differences between survival curves. Software for computing the statistic is readily available on the Internet.
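The interpretation described above can be illustrated with a minimal sketch that estimates the probability that a randomly chosen patient on treatment A outlives a randomly chosen patient on treatment B. It assumes fully observed (uncensored) survival times and counts ties as one half; the function name and the sample values are hypothetical, and the generalization to right-censored times mentioned in the abstract is not implemented here.

```python
from itertools import product


def prob_a_outlives_b(times_a, times_b):
    """Estimate P(T_A > T_B) from uncensored survival times.

    Every (A, B) pair is compared once; a tie contributes 0.5.
    """
    wins = 0.0
    for a, b in product(times_a, times_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(times_a) * len(times_b))


# Hypothetical survival times in months (made-up values for illustration).
treatment_a = [12, 15, 20, 24, 30]
treatment_b = [6, 9, 14, 18, 22]
print(prob_a_outlives_b(treatment_a, treatment_b))
```

A value near 0.5 would indicate no difference between the two groups, while values near 1 favour treatment A.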
Spina Bifida Data and Statistics
... non-Hispanic white and non-Hispanic black women. Data from 12 state-based birth defects tracking programs ...
[Big data in official statistics].
Zwick, Markus
2015-08-01
The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.
Workplace Statistical Literacy for Teachers: Interpreting Box Plots
ERIC Educational Resources Information Center
Pierce, Robyn; Chick, Helen
2013-01-01
As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and, yet it is not clear that they always have the…
Statistical Interpretation of Natural and Technological Hazards in China
NASA Astrophysics Data System (ADS)
Borthwick, Alistair; Ni, Jinren
2010-05-01
China is prone to catastrophic natural hazards from floods, droughts, earthquakes, storms, cyclones, landslides, epidemics, extreme temperatures, forest fires, avalanches, and even tsunami. This paper will list statistics related to the six worst natural disasters in China over the past 100 or so years, ranked according to number of fatalities. The corresponding data for the six worst natural disasters in China over the past decade will also be considered. [The data are abstracted from the International Disaster Database, Centre for Research on the Epidemiology of Disasters (CRED), Université Catholique de Louvain, Brussels, Belgium, http://www.cred.be/ where a disaster is defined as occurring if one of the following criteria is fulfilled: 10 or more people reported killed; 100 or more people reported affected; a call for international assistance; or declaration of a state of emergency.] The statistics include the number of occurrences of each type of natural disaster, the number of deaths, the number of people affected, and the cost in billions of US dollars. Over the past hundred years, the largest disasters may be related to the overabundance or scarcity of water, and to earthquake damage. However, there has been a substantial relative reduction in fatalities due to water related disasters over the past decade, even though the overall numbers of people affected remain huge, as does the economic damage. This change is largely due to the efforts put in by China's water authorities to establish effective early warning systems, the construction of engineering countermeasures for flood protection, the implementation of water pricing and other measures for reducing excessive consumption during times of drought. It should be noted that the dreadful death toll due to the Sichuan Earthquake dominates recent data. Joint research has been undertaken between the Department of Environmental Engineering at Peking University and the Department of Engineering Science at Oxford
Statistical description for survival data
2016-01-01
Statistical description is always the first step in data analysis. It gives the investigator a general impression of the data at hand. Traditionally, data are described in terms of central tendency and deviation. However, this framework does not fit survival data (also termed time-to-event data). This data type contains two components: the survival time and the status. Researchers are usually interested in the probability of an event at a given survival time point. The hazard function, cumulative hazard function and survival function are commonly used to describe survival data. The survival function can be estimated using the Kaplan-Meier estimator, which is also the default method in most statistical packages. Alternatively, the Nelson-Aalen estimator is available to estimate the survival function. Survival functions of subgroups can be compared using the log-rank test. Furthermore, the article also introduces how to describe time-to-event data with parametric modeling. PMID:27867953
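As a minimal sketch of the Kaplan-Meier estimator mentioned above, the following pure-Python function computes the product-limit estimate from the two components of survival data (time and status). The function name and the example values are hypothetical; a statistical package would normally be used in practice.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : observed follow-up times
    events : 1 if the event was observed at that time, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set.
        n_at_risk -= removed
    return curve


# Hypothetical data: five subjects, two of them censored.
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Censored subjects reduce the number at risk without lowering the curve, which is why the estimate steps down only at times 2, 3 and 5 in this example.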
Structural interpretation of seismic data and inherent uncertainties
NASA Astrophysics Data System (ADS)
Bond, Clare
2013-04-01
Geoscience is perhaps unique in its reliance on incomplete datasets and building knowledge from their interpretation. This interpretation basis for the science is fundamental at all levels; from creation of a geological map to interpretation of remotely sensed data. To teach and understand better the uncertainties in dealing with incomplete data we need to understand the strategies individual practitioners deploy that make them effective interpreters. The nature of interpretation is such that the interpreter needs to use their cognitive ability in the analysis of the data to propose a sensible solution in their final output that is consistent not only with the original data but also with other knowledge and understanding. In a series of experiments Bond et al. (2007, 2008, 2011, 2012) investigated the strategies and pitfalls of expert and non-expert interpretation of seismic images. These studies focused on large numbers of participants to provide a statistically sound basis for analysis of the results. The outcome of these experiments showed that a wide variety of conceptual models were applied to single seismic datasets, highlighting not only spatial variations in fault placements but also whether interpreters thought the faults existed at all or had the same sense of movement. Further, statistical analysis suggests that the strategies an interpreter employs are more important than expert knowledge per se in developing successful interpretations. Experts are successful because of their application of these techniques. In a new set of experiments a small number of experts are focused on to determine how they use their cognitive and reasoning skills, in the interpretation of 2D seismic profiles. Live video and practitioner commentary were used to track the evolving interpretation and to gain insight on their decision processes. The outputs of the study allow us to create an educational resource of expert interpretation through online video footage and commentary with
Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1990-01-01
A prototype of an expert system was developed which applies qualitative or model-based reasoning to the task of post-test analysis and diagnosis of data resulting from a rocket engine firing. A combined component-based and process theory approach is adopted as the basis for system modeling. Such an approach provides a framework for explaining both normal and deviant system behavior in terms of individual component functionality. The diagnosis function is applied to digitized sensor time-histories generated during engine firings. The generic system is applicable to any liquid rocket engine but was adapted specifically in this work to the Space Shuttle Main Engine (SSME). The system is applied to idealized data resulting from turbomachinery malfunction in the SSME.
The broad topic of biomarker research has an often-overlooked component: the documentation and interpretation of the surrounding chemical environment and other meta-data, especially from visualization, analytical, and statistical perspectives (Pleil et al. 2014; Sobus et al. 2011...
Statistically significant relational data mining
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.
2014-02-01
This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second comprises statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Spatial Statistical Data Fusion (SSDF)
NASA Technical Reports Server (NTRS)
Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel
2013-01-01
As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, technological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect are different. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field, and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy, spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values, over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions, and is also computationally much faster. This method is
Redshift data and statistical inference
NASA Technical Reports Server (NTRS)
Newman, William I.; Haynes, Martha P.; Terzian, Yervant
1994-01-01
Frequency histograms and the 'power spectrum analysis' (PSA) method, the latter developed by Yu & Peebles (1969), have been widely employed as techniques for establishing the existence of periodicities. We provide a formal analysis of these two classes of methods, including controlled numerical experiments, to better understand their proper use and application. In particular, we note that typical published applications of frequency histograms commonly employ far greater numbers of class intervals or bins than is advisable by statistical theory, sometimes giving rise to the appearance of spurious patterns. The PSA method generates a sequence of random numbers from observational data which, it is claimed, is exponentially distributed with unit mean and variance, essentially independent of the distribution of the original data. We show that the derived random process is nonstationary and produces a small but systematic bias in the usual estimate of the mean and variance. Although the derived variable may be reasonably described by an exponential distribution, the tail of the distribution is far removed from that of an exponential, thereby rendering statistical inference and confidence testing based on the tail of the distribution completely unreliable. Finally, we examine a number of astronomical examples wherein these methods have been used, giving rise to widespread acceptance of statistically unconfirmed conclusions.
Analysis of Visual Interpretation of Satellite Data
NASA Astrophysics Data System (ADS)
Svatonova, H.
2016-06-01
Millions of people of all ages and expertise are using satellite and aerial data as an important input for their work in many different fields. Satellite data are also gradually finding a new place in education, especially in the fields of geography and in environmental issues. The article presents the results of extensive research in the area of visual interpretation of image data carried out in the years 2013-2015 in the Czech Republic. The research was aimed at comparing the success rate of the interpretation of satellite data in relation to (a) the substrates (the selected colourfulness, the type of depicted landscape or special elements in the landscape) and (b) selected characteristics of users (expertise, gender, age). The results of the research showed that (1) false colour images have a slightly higher percentage of successful interpretation than natural colour images, (2) colourfulness of an element expected or rehearsed by the user (regardless of the real natural colour) increases the success rate of identifying the element, (3) experts are faster in interpreting visual data than non-experts, with the same degree of accuracy of solving the task, and (4) men and women are equally successful in the interpretation of visual image data.
Statistical analysis of pyroshock data
NASA Astrophysics Data System (ADS)
Hughes, William O.
2002-05-01
The sample size of aerospace pyroshock test data is typically small. This often forces the engineer to make assumptions on its population distribution and to use conservative margins or methodologies in determining shock specifications. For example, the maximum expected environment is often derived by adding 3-6 dB to the maximum envelope of a limited amount of shock data. The recent availability of a large amount of pyroshock test data has allowed a rare statistical analysis to be performed. Findings and procedures from this analysis will be explained, including information on population distributions, procedures to properly combine families of test data, and methods of deriving appropriate shock specifications for a multipoint shock source.
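The conventional margin approach mentioned above (adding 3-6 dB to the maximum envelope) can be sketched as follows. This is an illustration only: the function name and the spectra are hypothetical, and it assumes the shock data are amplitude quantities (for example, response in g at matching frequencies), so that a margin in dB corresponds to a factor of 10^(dB/20).

```python
def max_expected_environment(spectra, margin_db=3.0):
    """Envelope several shock spectra and add a margin in dB.

    spectra   : list of spectra, each a list of amplitudes sampled at
                the same frequencies (assumed amplitude units, e.g. g)
    margin_db : margin added on top of the maximum envelope
    """
    # Maximum envelope: largest amplitude observed at each frequency.
    envelope = [max(values) for values in zip(*spectra)]
    # Convert the dB margin to a linear amplitude factor.
    factor = 10 ** (margin_db / 20.0)
    return [value * factor for value in envelope]


# Hypothetical amplitudes from two tests at two frequencies.
tests = [[100.0, 200.0], [150.0, 180.0]]
print(max_expected_environment(tests, margin_db=6.0))
```

With only a handful of tests the envelope itself is a noisy estimate, which is the motivation the abstract gives for analysing a larger data set statistically.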
Interpreting the flock algorithm from a statistical perspective.
Anderson, Eric C; Barry, Patrick D
2015-09-01
We show that the algorithm in the program flock (Duchesne & Turgeon 2009) can be interpreted as an estimation procedure based on a model essentially identical to the structure (Pritchard et al. 2000) model with no admixture and without correlated allele frequency priors. Rather than using MCMC, the flock algorithm searches for the maximum a posteriori estimate of this structure model via a simulated annealing algorithm with a rapid cooling schedule (namely, the exponent on the objective function →∞). We demonstrate the similarities between the two programs in a two-step approach. First, to enable rapid batch processing of many simulated data sets, we modified the source code of structure to use the flock algorithm, producing the program flockture. With simulated data, we confirmed that results obtained with flock and flockture are very similar (though flockture is some 200 times faster). Second, we simulated multiple large data sets under varying levels of population differentiation for both microsatellite and SNP genotypes. We analysed them with flockture and structure and assessed each program on its ability to cluster individuals to their correct subpopulation. We show that flockture yields results similar to structure albeit with greater variability from run to run. flockture did perform better than structure when genotypes were composed of SNPs and differentiation was moderate (FST= 0.022-0.032). When differentiation was low, structure outperformed flockture for both marker types. On large data sets like those we simulated, it appears that flock's reliance on inference rules regarding its 'plateau record' is not helpful. Interpreting flock's algorithm as a special case of the model in structure should aid in understanding the program's output and behaviour.
Alternative interpretations of statistics on health effects of low-level radiation
Hamilton, L.D.
1983-11-01
Four examples of the interpretation of statistics of data on low-level radiation are reviewed: (a) genetic effects of the atomic bombs at Hiroshima and Nagasaki, (b) cancer at Rocky Flats, (c) childhood leukemia and fallout in Utah, and (d) cancer among workers at the Portsmouth Naval Shipyard. Aggregation of data, adjustment for age, and other problems related to the determination of health effects of low-level radiation are discussed. Troublesome issues related to post hoc analysis are considered.
Data Interpretation in the Digital Age
Leonelli, Sabina
2014-01-01
The consultation of internet databases and the related use of computer software to retrieve, visualise and model data have become key components of many areas of scientific research. This paper focuses on the relation of these developments to understanding the biology of organisms, and examines the conditions under which the evidential value of data posted online is assessed and interpreted by the researchers who access them, in ways that underpin and guide the use of those data to foster discovery. I consider the types of knowledge required to interpret data as evidence for claims about organisms, and in particular the relevance of knowledge acquired through physical interaction with actual organisms to assessing the evidential value of data found online. I conclude that familiarity with research in vivo is crucial to assessing the quality and significance of data visualised in silico; and that studying how biological data are disseminated, visualised, assessed and interpreted in the digital age provides a strong rationale for viewing scientific understanding as a social and distributed, rather than individual and localised, achievement. PMID:25729262
Interpreting genomic data via entropic dissection
Azad, Rajeev K.; Li, Jing
2013-01-01
Since the emergence of high-throughput genome sequencing platforms and more recently the next-generation platforms, the genome databases are growing at an astronomical rate. Tremendous efforts have been invested in recent years in understanding intriguing complexities beneath the vast ocean of genomic data. This is apparent in the spurt of computational methods for interpreting these data in the past few years. Genomic data interpretation is notoriously difficult, partly owing to the inherent heterogeneities appearing at different scales. Methods developed to interpret these data often suffer from their inability to adequately measure the underlying heterogeneities and thus lead to confounding results. Here, we present an information entropy-based approach that unravels the distinctive patterns underlying genomic data efficiently and thus is applicable in addressing a variety of biological problems. We show the robustness and consistency of the proposed methodology in addressing three different biological problems of significance—identification of alien DNAs in bacterial genomes, detection of structural variants in cancer cell lines and alignment-free genome comparison. PMID:23036836
Data Systems and Reports as Active Participants in Data Interpretation
ERIC Educational Resources Information Center
Rankin, Jenny Grant
2016-01-01
Most data-informed decision-making in education is undermined by flawed interpretations. Educator-driven interventions to improve data use are beneficial but not omnipotent, as data misunderstandings persist at schools and school districts commended for ideal data use support. Meanwhile, most data systems and reports display figures without…
Using Statistics to Lie, Distort, and Abuse Data
ERIC Educational Resources Information Center
Bintz, William; Moore, Sara; Adams, Cheryll; Pierce, Rebecca
2009-01-01
Statistics is a branch of mathematics that involves organization, presentation, and interpretation of data, both quantitative and qualitative. Data do not lie, but people do. On the surface, quantitative data are basically inanimate objects, nothing more than lifeless and meaningless symbols that appear on a page, calculator, computer, or in one's…
Regional interpretation of Kansas aeromagnetic data
Yarger, H.L.
1982-01-01
The aeromagnetic mapping techniques used in a regional aeromagnetic survey of the state are documented and a qualitative regional interpretation of the magnetic basement is presented. Geothermal gradients measured and data from oil well records indicate that geothermal resources in Kansas are of a low-grade nature. However, considerable variation in the gradient is noted statewide within the upper 500 meters of the sedimentary section; this suggests the feasibility of using groundwater for space heating by means of heat pumps.
Data interpretation in breath biomarker research: pitfalls and directions.
Miekisch, Wolfram; Herbig, Jens; Schubert, Jochen K
2012-09-01
Most--if not all--potential diagnostic applications in breath research involve different marker concentrations rather than unique breath markers which only occur in the diseased state. Hence, data interpretation is a crucial step in breath analysis. To avoid artificial significance in breath testing every effort should be made to implement method validation, data cross-testing and statistical validation along this process. The most common data analysis related problems can be classified into three groups: confounding variables (CVs), which have a real correlation with both the diseased state and a breath marker but lead to the erroneous conclusion that disease and breath are in a causal relationship; voodoo correlations (VCs), which can be understood as statistically true correlations that arise coincidentally in the vast number of measured variables; and statistical misconceptions in the study design (SMSD). CV: Typical confounding variables are environmental and medical history, host factors such as gender, age, weight, etc and parameters that could affect the quality of breath data such as subject breathing mode, effects of breath sampling and effects of the analytical technique itself. VC: The number of measured variables quickly overwhelms the number of samples that can feasibly be taken. As a consequence, the chances of finding coincidental 'voodoo' correlations grow proportionally. VCs can typically be expected in the following scenarios: insufficient number of patients, (too) many measurement variables, the use of advanced statistical data mining methods, and non-independent data for validation. SMSD: Non-prospective, non-blinded and non-randomized trials, a priori biased study populations or group selection with unrealistically high disease prevalence typically represent misconception of study design. In this paper important data interpretation issues are discussed, common pitfalls are addressed and directions for sound data processing and interpretation
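The growth of coincidental correlations with the number of measured variables can be made concrete with a small sketch. It assumes independent tests at a fixed significance level, which is a simplification (breath markers are often correlated); the function names are hypothetical.

```python
def expected_false_positives(n_variables, alpha=0.05):
    """Expected number of chance 'significant' results among true nulls."""
    return n_variables * alpha


def prob_at_least_one_false_positive(n_variables, alpha=0.05):
    """Probability of at least one chance finding, assuming independence."""
    return 1.0 - (1.0 - alpha) ** n_variables


for m in (1, 10, 100, 1000):
    print(m,
          expected_false_positives(m),
          round(prob_at_least_one_false_positive(m), 3))
```

Under these assumptions the expected count of spurious findings grows in proportion to the number of variables, while the probability of at least one spurious finding approaches 1 quickly.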
The Lure of Statistics in Data Mining
ERIC Educational Resources Information Center
Grover, Lovleen Kumar; Mehra, Rajni
2008-01-01
The field of Data Mining like Statistics concerns itself with "learning from data" or "turning data into information". For statisticians the term "Data mining" has a pejorative meaning. Instead of finding useful patterns in large volumes of data as in the case of Statistics, data mining has the connotation of searching for data to fit preconceived…
Confounded Statistical Analyses Hinder Interpretation of the NELP Report
ERIC Educational Resources Information Center
Paris, Scott G.; Luo, Serena Wenshu
2010-01-01
The National Early Literacy Panel (2008) report identified early predictors of reading achievement as good targets for instruction, and many of those skills are related to decoding. In this article, the authors suggest that the developmental trajectories of rapidly developing skills pose problems for traditional statistical analyses. Rapidly…
Interpretation of Statistical Significance Testing: A Matter of Perspective.
ERIC Educational Resources Information Center
McClure, John; Suen, Hoi K.
1994-01-01
This article compares three models that have been the foundation for approaches to the analysis of statistical significance in early childhood research--the Fisherian and the Neyman-Pearson models (both considered "classical" approaches), and the Bayesian model. The article concludes that all three models have a place in the analysis of research…
Statistical characteristics of MST radar echoes and its interpretation
NASA Technical Reports Server (NTRS)
Woodman, Ronald F.
1989-01-01
Two concepts of fundamental importance are reviewed: the autocorrelation function and the frequency power spectrum. In addition, some turbulence concepts, the relationship between radar signals and atmospheric medium statistics, partial reflection, and the characteristics of noise and clutter interference are discussed.
Interpreting magnetic data by integral moments
NASA Astrophysics Data System (ADS)
Tontini, F. Caratori; Pedersen, L. B.
2008-09-01
The use of the integral moments for interpreting magnetic data is based on a very elegant property of potential fields, but in the past it has not been completely exploited due to problems concerning real data. We describe a new 3-D development of previous 2-D results aimed at determining the magnetization direction, extending the calculation to second-order moments to recover the centre of mass of the magnetization distribution. The method is enhanced to reduce the effects of the regional field that often alters the first-order solutions. Moreover, we introduce an iterative correction to properly assess the errors coming from finite-size surveys or interaction with neighbouring anomalies, which are the most important causes of the failing of the method for real data. We test the method on some synthetic examples, and finally, we show the results obtained by analysing the aeromagnetic anomaly of the Monte Vulture volcano in Southern Italy.
Evaluating bifactor models: Calculating and interpreting statistical indices.
Rodriguez, Anthony; Reise, Steven P; Haviland, Mark G
2016-06-01
Bifactor measurement models are increasingly being applied to personality and psychopathology measures (Reise, 2012). In this work, authors generally have emphasized model fit, and their typical conclusion is that a bifactor model provides a superior fit relative to alternative subordinate models. Often unexplored, however, are important statistical indices that can substantially improve the psychometric analysis of a measure. We provide a review of the particularly valuable statistical indices one can derive from bifactor models. They include omega reliability coefficients, factor determinacy, construct reliability, explained common variance, and percentage of uncontaminated correlations. We describe how these indices can be calculated and used to inform: (a) the quality of unit-weighted total and subscale score composites, as well as factor score estimates, and (b) the specification and quality of a measurement model in structural equation modeling.
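Three of the indices named above (omega, omega hierarchical, and explained common variance) can be sketched from standardized loadings using their commonly cited formulas. The sketch assumes an orthogonal bifactor model in which each item loads on the general factor and on exactly one group factor, so that an item's uniqueness is one minus the sum of its squared loadings; the function name and the loadings are hypothetical.

```python
def bifactor_indices(items):
    """Compute omega, omega hierarchical and ECV.

    items : list of (general_loading, group_label, group_loading)
            with standardized loadings from an orthogonal bifactor model
    """
    general_sum = sum(g for g, _, _ in items)
    group_sums = {}
    for _, label, s in items:
        group_sums[label] = group_sums.get(label, 0.0) + s
    # Uniqueness of each item under orthogonal factors.
    unique_var = sum(1.0 - g ** 2 - s ** 2 for g, _, s in items)

    general_part = general_sum ** 2
    group_part = sum(total ** 2 for total in group_sums.values())
    total_var = general_part + group_part + unique_var

    general_common = sum(g ** 2 for g, _, _ in items)
    group_common = sum(s ** 2 for _, _, s in items)
    return {
        "omega": (general_part + group_part) / total_var,
        "omega_h": general_part / total_var,
        "ecv": general_common / (general_common + group_common),
    }


# Hypothetical loadings: six items, two group factors of three items each.
example = [(0.6, "A", 0.4)] * 3 + [(0.6, "B", 0.4)] * 3
print(bifactor_indices(example))
```

A large gap between omega and omega hierarchical would suggest that group factors contribute substantially to the reliable variance of the unit-weighted total score.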
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures.
Aerosol backscatter lidar calibration and data interpretation
NASA Technical Reports Server (NTRS)
Kavaya, M. J.; Menzies, R. T.
1984-01-01
A treatment of the various factors involved in lidar data acquisition and analysis is presented. This treatment highlights sources of fundamental, systematic, modeling, and calibration errors that may affect the accurate interpretation and calibration of lidar aerosol backscatter data. The discussion primarily pertains to ground based, pulsed CO2 lidars that probe the troposphere and are calibrated using large, hard calibration targets. However, a large part of the analysis is relevant to other types of lidar systems such as lidars operating at other wavelengths; continuous wave (CW) lidars; lidars operating in other regions of the atmosphere; lidars measuring nonaerosol elastic or inelastic backscatter; airborne or Earth-orbiting lidar platforms; and lidars employing combinations of the above characteristics.
Data interpretation in the Automated Laboratory
Klatt, L.N.; Elling, J.W.; Mniszewski, S.
1995-12-01
The Contaminant Analysis Automation project envisions the analytical chemistry laboratory of the future being assembled from automation submodules that can be integrated into a complete analysis system through a plug-and-play strategy. In this automated system the reduction of instrumental data to knowledge required by the laboratory customer must also be accomplished in an automated way. This paper presents the concept of an automated Data Interpretation Module (DIM) within the context of the plug-and-play automation strategy. The DIM is an expert system driven software module. The DIM functions as a standard laboratory module controlled by the system task sequence controller. The DIM consists of knowledge base(s) that accomplish the data assessment, quality control, and data analysis tasks. The expert system knowledge base(s) encapsulate the training and experience of the analytical chemist. Analysis of instrumental data by the DIM requires the use of pattern recognition techniques. Laboratory data from the analysis of PCBs will be used to illustrate the DIM.
Systematic interpretation of differential capacitance data
NASA Astrophysics Data System (ADS)
Gavish, Nir; Promislow, Keith
2015-07-01
Differential capacitance (DC) data have been widely used to characterize the structure of electrolyte solutions near charged interfaces and as experimental validation of models for electrolyte structure. Fixing a large class of models of electrolyte free energy that incorporate finite-volume effects, a reduction is identified which permits the identification of all free energies within that class that return identical DC data. The result is an interpretation of DC data through the equivalence classes of nonideality terms, and associated boundary layer structures, that cannot be differentiated by DC data. Specifically, for binary salts, DC data, even if measured over a range of ionic concentrations, are unable to distinguish among models which exhibit charge asymmetry, charge reversal, and even ion crowding. The reduction applies to capacitors which are much wider than the associated Debye length and to finite-volume terms that are algebraic in charge density. However, within these restrictions the free energy is shown to be uniquely identified if the DC data are supplemented with measurements of the excess chemical potential of the system in the bulk state.
Smart Interpretation - Application of Machine Learning in Geological Interpretation of AEM Data
NASA Astrophysics Data System (ADS)
Bach, T.; Gulbrandsen, M. L.; Jacobsen, R.; Pallesen, T. M.; Jørgensen, F.; Høyer, A. S.; Hansen, T. M.
2015-12-01
When using airborne geophysical measurements in e.g. groundwater mapping, an overwhelming amount of data is collected. Increasingly larger survey areas, denser data collection and limited resources combine to create a growing problem of building geological models that use all the available data in a manner that is consistent with the geologist's knowledge about the geology of the survey area. In the ERGO project, funded by The Danish National Advanced Technology Foundation, we address this problem by developing new, usable tools that enable the geologist to utilize her geological knowledge directly in the interpretation of the AEM data, and thereby handle the large amount of data. In the project we have developed the mathematical basis for capturing geological expertise in a statistical model. Based on this, we have implemented new algorithms that have been operationalized and embedded in user friendly software. In this software, the machine learning algorithm, Smart Interpretation, enables the geologist to use the system as an assistant in the geological modelling process. As the software 'learns' the geology from the geologist, the system suggests new modelling features in the data. In this presentation we demonstrate the application of the results from the ERGO project, including the proposed modelling workflow utilized on a variety of data examples.
Statistical Analysis of Geotechnical Data.
1987-09-01
The Data of Fig. 2a. Figure 4. Probability Paper Plot of Compaction Data. Figure 5. Scatter Plot of Compaction Control Data Showing Water...Autocorrelation Function of Water Content Over Small Interval of San Francisco Bay Mud. Figure 22. Autocorrelation Function of Water Content Over Large Interval...A Copper Porphyry. Figure 25. Autocorrelation Function of Compacted Water Content in Clay Core of an Embankment Dam. Figure 26. Autocorrelation
Tools for interpretation of multispectral data
NASA Astrophysics Data System (ADS)
Speckert, Glen; Carpenter, Loren C.; Russell, Mike; Bradstreet, John; Waite, Tom; Conklin, Charlie
1990-08-01
The large size and multiple bands of today's satellite data require increasingly powerful tools in order to display and interpret the acquired imagery in a timely fashion. Pixar has developed two major tools for use in this data interpretation. These tools are the Electronic Light Table (ELT), and an extensive image processing package, ChapIP. These tools operate on images limited only by disk volume size, currently 3 Gbytes. The Electronic Light Table package provides a fully windowed interface to these large 12 bit monochrome and multiband images, passing images through a software defined image interpretation pipeline in real time during an interactive roam. A virtual image software framework allows interactive modification of the visible image. The roam software pipeline consists of a seventh order polynomial warp, bicubic resampling, a user registration affine, histogram drop sampling, a 5x5 unsharp mask, and per window contrast controls. It is important to note that these functions are done in software, and various performance tradeoffs can be made for different applications within a family of hardware configurations. Special high speed zoom, rotate, sharpness, and contrast operators provide interactive region of interest manipulation. Double window operators provide for flicker, fade, shade, and difference of two parent windows in a chained fashion. Overlay graphics capability is provided in a PostScript* windowed environment (NeWS**). The image is stored on disk as a multi resolution image pyramid. This allows resampling and other image operations independent of the zoom level. A set of tools layered upon ChapIP allows manipulation of the entire pyramid file. Arbitrary combinations of bands can be computed for arbitrary sized images, as well as other image processing operations. ChapIP can also be used in conjunction with ELT to dynamically operate on the current roaming window to append the image processing function onto the roam pipeline. Multiple Chapi
Data Torturing and the Misuse of Statistical Tools
Abate, Marcey L.
1999-08-16
Statistical concepts, methods, and tools are often used in the implementation of statistical thinking. Unfortunately, statistical tools are all too often misused by not applying them in the context of statistical thinking that focuses on processes, variation, and data. The consequences of this misuse may be "data torturing" or going beyond reasonable interpretation of the facts due to a misunderstanding of the processes creating the data or the misinterpretation of variability in the data. In the hope of averting future misuse and data torturing, examples are provided where the application of common statistical tools, in the absence of statistical thinking, provides deceptive results by not adequately representing the underlying process and variability. For each of the examples, a discussion is provided on how applying the concepts of statistical thinking may have prevented the data torturing. The lessons learned from these examples will provide an increased awareness of the potential for many statistical methods to mislead and a better understanding of how statistical thinking broadens and increases the effectiveness of statistical tools.
A t-statistic for objective interpretation of comparative genomic hybridization (CGH) profiles.
Moore, D H; Pallavicini, M; Cher, M L; Gray, J W
1997-07-01
An objective method for interpreting comparative genomic hybridization (CGH) is described and compared with current methods of interpretation. The method is based on a two-sample t-statistic in which composite test:reference and reference:reference CGH profiles are compared at each point along the genome to detect regions of significant differences. Composite profiles are created by combining CGH profiles measured from several metaphase chromosomes for each type of chromosome in the normal human karyotype. Composites for both test:reference and reference:reference CGH analyses are used to generate mean CGH profiles and information about the variance therein. The utility of the method is demonstrated through analysis of aneusomies and partial gain and loss of DNA sequence in a myeloid leukemia specimen. Banding analyses of this specimen indicated inv (3)(q21q26), del (5)(q2?q35), -7, +8 and add (17)(p11.2). The t-statistic analyses of CGH data indicated rev ish enh (8) and rev ish dim (5q31.1q33.1,7q11.23qter). The undetected gain on 17p was small and confined to a single band (17p11.2). Thus, the t-statistic is an objective and effective method for defining significant differences between test and reference CGH profiles.
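The point-by-point comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `pointwise_t`, the input layout (one list of ratio values per measured chromosome copy) and the use of the unequal-variance (Welch) form of the two-sample t-statistic are all assumptions, since the abstract does not say which variant was used.

```python
import math
from statistics import mean, variance


def pointwise_t(test_profiles, ref_profiles):
    """Two-sample t-statistic at each position along a set of profiles.

    test_profiles / ref_profiles: lists of equal-length profiles, one per
    measured chromosome copy (hypothetical layout for illustration).
    Uses the unequal-variance (Welch) standard error, which is an assumption.
    """
    n_points = len(test_profiles[0])
    t_values = []
    for i in range(n_points):
        a = [profile[i] for profile in test_profiles]
        b = [profile[i] for profile in ref_profiles]
        # Standard error of the difference between the two composite means
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        t_values.append((mean(a) - mean(b)) / se if se > 0 else 0.0)
    return t_values


# Toy data: position 0 is balanced, position 1 shows a gain in the test sample
test = [[1.0, 1.5], [1.1, 1.6], [0.9, 1.4]]
ref = [[1.0, 1.0], [1.1, 1.1], [0.9, 0.9]]
t = pointwise_t(test, ref)
```

Positions where the absolute t-value exceeds a chosen cutoff would be reported as gains or losses. The cutoff itself is a separate choice that this sketch does not make.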
Polarimetric radar data decomposition and interpretation
NASA Technical Reports Server (NTRS)
Sun, Guoqing; Ranson, K. Jon
1993-01-01
Significant efforts have been made to decompose polarimetric radar data into several simple scattering components. The components which are selected because of their physical significance can be used to classify SAR (Synthetic Aperture Radar) image data. If particular components can be related to forest parameters, inversion procedures may be developed to estimate these parameters from the scattering components. Several methods have been used to decompose an averaged Stokes matrix or covariance matrix into three components representing odd (surface), even (double-bounce) and diffuse (volume) scatterings. With these decomposition techniques, phenomena such as canopy-ground interactions, randomness of orientation, and size of scatterers can be examined from SAR data. In this study we applied the method recently reported by van Zyl (1992) to decompose averaged backscattering covariance matrices extracted from JPL SAR images over forest stands in Maine, USA. These stands are mostly mixed stands of coniferous and deciduous trees. Biomass data have been derived from field measurements of DBH and tree density using allometric equations. The interpretation of the decompositions and relationships with measured stand biomass are presented in this paper.
Recent statistical methods for orientation data
NASA Technical Reports Server (NTRS)
Batschelet, E.
1972-01-01
The application of statistical methods for determining the areas of animal orientation and navigation is discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations and tables of data are developed to show the value of information obtained by statistical analysis.
[Blood proteins in African trypanosomiasis: variations and statistical interpretations].
Cailliez, M; Poupin, F; Pages, J P; Savel, J
1982-01-01
Estimation of blood orosomucoid, haptoglobin, C-reactive protein and immunoglobulin levels has enabled us to demonstrate a specific protein profile in human African trypanosomiasis, as compared with that of other parasitic diseases and with a healthy African reference group. Computerized data processing by principal components analysis provides a valuable tool for epidemiological surveys.
Statistics for characterizing data on the periphery
Theiler, James P; Hush, Donald R
2010-01-01
We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
Example of scattering noise in radar data interpretation
Canavan, G.H.
1996-10-01
Radar data interpretation typically assumes well behaved, known particle distributions. Those assumptions are at variance with the unknown angular scattering characteristics of the particles measured. This note gives a simple example of how those characteristics complicate data interpretation.
Statistical Literacy: Data Tell a Story
ERIC Educational Resources Information Center
Sole, Marla A.
2016-01-01
Every day, students collect, organize, and analyze data to make decisions. In this data-driven world, people need to assess how much trust they can place in summary statistics. The results of every survey and the safety of every drug that undergoes a clinical trial depend on the correct application of appropriate statistics. Recognizing the…
Data Mining: Going beyond Traditional Statistics
ERIC Educational Resources Information Center
Zhao, Chun-Mei; Luan, Jing
2006-01-01
The authors provide an overview of data mining, giving special attention to the relationship between data mining and statistics to unravel some misunderstandings about the two techniques. (Contains 1 figure.)
Distributed data collection for a database of radiological image interpretations
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.
1997-01-01
The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window based 2048 X 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.
Woźnicka, U; Jarzyna, J; Krynicka, E
2005-05-01
Measurements of various physical quantities in a borehole by geophysical well logging tools are designed to determine these quantities for underground geological formations. Then, the raw data (logs) are combined in a comprehensive interpretation to obtain values of geological parameters. Estimating the uncertainty of calculated geological parameters, interpreted in such a way, is difficult, often impossible, when classical statistical methods are used. The method presented here permits an estimate of the uncertainty of a quantity to be obtained. The discussion of the dependence between the uncertainty of nuclear and acoustic tool responses, and the estimated uncertainty of the interpreted geological parameters (among others: porosity, water saturation, clay content) is presented.
Data explorer: a prototype expert system for statistical analysis.
Aliferis, C.; Chao, E.; Cooper, G. F.
1993-01-01
The inadequate analysis of medical research data, due mainly to the unavailability of local statistical expertise, seriously jeopardizes the quality of new medical knowledge. Data Explorer is a prototype Expert System that builds on the versatility and power of existing statistical software, to provide automatic analyses and interpretation of medical data. The system draws much of its power by using belief network methods in place of more traditional, but difficult to automate, classical multivariate statistical techniques. Data Explorer identifies statistically significant relationships among variables, and using power-size analysis, belief network inference/learning and various explanatory techniques helps the user understand the importance of the findings. Finally the system can be used as a tool for the automatic development of predictive/diagnostic models from patient databases. PMID:8130501
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures
Udey, Ruth Norma
2013-01-01
Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
Topology for statistical modeling of petascale data.
Pascucci, Valerio; Mascarenhas, Ajith Arthur; Rusek, Korben; Bennett, Janine Camille; Levine, Joshua; Pebay, Philippe Pierre; Gyulassy, Attila; Thompson, David C.; Rojas, Joseph Maurice
2011-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.
Statistical analysis principles for Omics data.
Dunkler, Daniela; Sánchez-Cabo, Fátima; Heinze, Georg
2011-01-01
In Omics experiments, typically thousands of hypotheses are tested simultaneously, each based on very few independent replicates. Traditional tests like the t-test were shown to perform poorly with this new type of data. Furthermore, simultaneous consideration of many hypotheses, each prone to a decision error, requires powerful adjustments for this multiple testing situation. After a general introduction to statistical testing, we present the moderated t-statistic, the SAM statistic, and the RankProduct statistic which have been developed to evaluate hypotheses in typical Omics experiments. We also provide an introduction to the multiple testing problem and discuss some state-of-the-art procedures to address this issue. The presented test statistics are subjected to a comparative analysis of a microarray experiment comparing tissue samples of two groups of tumors. All calculations can be done using the freely available statistical software R. Accompanying, commented code is available at: http://www.meduniwien.ac.at/msi/biometrie/MIMB.
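To make the multiple testing adjustment mentioned above concrete, here is a minimal sketch of one widely used procedure, the Benjamini-Hochberg step-up rule for controlling the false discovery rate. The abstract does not name the procedures it covers, so the choice of Benjamini-Hochberg is an assumption, and the function name and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at false discovery rate alpha.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= alpha * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes
    return sorted(order[:k])


# Illustrative p-values, e.g. one per gene
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)
```

With ten tests, only the two smallest p-values fall below their rank-specific thresholds (0.005 and 0.01), so only those two hypotheses are rejected even though five p-values are below 0.05.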
Lin, K K
2000-11-01
The U.S. Food and Drug Administration (FDA) is in the process of preparing a draft Guidance for Industry document on the statistical aspects of carcinogenicity studies of pharmaceuticals for public comment. The purpose of the document is to provide statistical guidance for the design of carcinogenicity experiments, methods of statistical analysis of study data, interpretation of study results, presentation of data and results in reports, and submission of electronic study data. This article covers the genesis of the guidance document and some statistical methods in study design, data analysis, and interpretation of results included in the draft FDA guidance document.
NASA Astrophysics Data System (ADS)
Karuppiah, R.; Faldi, A.; Laurenzi, I.; Usadi, A.; Venkatesh, A.
2014-12-01
An increasing number of studies are focused on assessing the environmental footprint of different products and processes, especially using life cycle assessment (LCA). This work shows how combining statistical methods and Geographic Information Systems (GIS) with environmental analyses can help improve the quality of results and their interpretation. Most environmental assessments in literature yield single numbers that characterize the environmental impact of a process/product - typically global or country averages, often unchanging in time. In this work, we show how statistical analysis and GIS can help address these limitations. For example, we demonstrate a method to separately quantify uncertainty and variability in the result of LCA models using a power generation case study. This is important for rigorous comparisons between the impacts of different processes. Another challenge is lack of data that can affect the rigor of LCAs. We have developed an approach to estimate environmental impacts of incompletely characterized processes using predictive statistical models. This method is applied to estimate unreported coal power plant emissions in several world regions. There is also a general lack of spatio-temporal characterization of the results in environmental analyses. For instance, studies that focus on water usage do not put in context where and when water is withdrawn. Through the use of hydrological modeling combined with GIS, we quantify water stress on a regional and seasonal basis to understand water supply and demand risks for multiple users. Another example where it is important to consider regional dependency of impacts is when characterizing how agricultural land occupation affects biodiversity in a region. We developed a data-driven methodology used in conjunction with GIS to determine if there is a statistically significant difference between the impacts of growing different crops on different species in various biomes of the world.
Systematic interpretation of microarray data using experiment annotations
Fellenberg, Kurt; Busold, Christian H; Witt, Olaf; Bauer, Andrea; Beckmann, Boris; Hauser, Nicole C; Frohme, Marcus; Winter, Stefan; Dippon, Jürgen; Hoheisel, Jörg D
2006-01-01
Background Up to now, microarray data are mostly assessed in context with only one or few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data, when available in a statistically accessible format. Results We provide means to preprocess these additional data, and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well-suited for mapping such extracted traits. It visualizes associations both among and between the traits, the hereby annotated experiments, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single channel) and two-channel data, stemming from model organisms such as yeast and drosophila up to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design. Conclusion Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details. PMID:17181856
HistFitter software framework for statistical data analysis
NASA Astrophysics Data System (ADS)
Baak, M.; Besjes, G. J.; Côté, D.; Koutsman, A.; Lorenz, J.; Short, D.
2015-04-01
We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fit to data and interpreted with statistical tests. Internally HistFitter uses the statistics packages RooStats and HistFactory. A key innovation of HistFitter is its design, which is rooted in analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple models at once that describe the data, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication quality style through a simple command-line interface.
Statistical treatment of fatigue test data
Raske, D.T.
1980-01-01
This report discusses several aspects of fatigue data analysis in order to provide a basis for the development of statistically sound design curves. Included is a discussion on the choice of the dependent variable, the assumptions associated with least squares regression models, the variability of fatigue data, the treatment of data from suspended tests and outlying observations, and various strain-life relations.
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
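The sample-splitting construction described in this abstract can be sketched roughly as follows. This is only an illustration of the partitioning logic: the function names, the interleaved way of forming the V sub-samples, and the toy choice of target (the column with the largest mean) are assumptions, and the sketch does not implement the paper's estimator or its central limit theorem.

```python
from statistics import mean


def sample_split_estimate(rows, choose_target, estimate, V=5):
    """Average over V splits of an estimate of a data-adaptively chosen target.

    choose_target(sample) -> target : run on the parameter-generating sample
    estimate(target, sample) -> float : run on the held-out estimation sample
    Both callables are hypothetical placeholders supplied by the user.
    """
    folds = [rows[v::V] for v in range(V)]  # V roughly equal sub-samples
    estimates = []
    for v in range(V):
        estimation = folds[v]
        generating = [r for u, fold in enumerate(folds) if u != v for r in fold]
        target = choose_target(generating)  # the target never sees the estimation sample
        estimates.append(estimate(target, estimation))
    return mean(estimates)


# Toy target: pick the column with the largest mean, then estimate that column's mean
def choose_column(sample):
    return max(range(len(sample[0])), key=lambda j: mean(r[j] for r in sample))


def column_mean(j, sample):
    return mean(r[j] for r in sample)


rows = [(i % 3, 10 + i % 2) for i in range(20)]
result = sample_split_estimate(rows, choose_column, column_mean, V=5)
```

Because each target is selected on the complementary sample and evaluated on the held-out sub-sample, the selection step does not reuse the observations on which the estimate is computed.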
Transit Spectroscopy: new data analysis techniques and interpretation
NASA Astrophysics Data System (ADS)
Tinetti, Giovanna; Waldmann, Ingo P.; Morello, Giuseppe; Tessenyi, Marcell; Varley, Ryan; Barton, Emma; Yurchenko, Sergey; Tennyson, Jonathan; Hollis, Morgan
2014-11-01
Planetary science beyond the boundaries of our Solar System is today in its infancy. Until a couple of decades ago, the detailed investigation of the planetary properties was restricted to objects orbiting inside the Kuiper Belt. Today, we cannot ignore that the number of known planets has increased by two orders of magnitude nor that these planets resemble anything but the objects present in our own Solar System. A key observable for planets is the chemical composition and state of their atmosphere. To date, two methods can be used to sound exoplanetary atmospheres: transit and eclipse spectroscopy, and direct imaging spectroscopy. Although the field of exoplanet spectroscopy has been very successful in past years, there are a few serious hurdles that need to be overcome to progress in this area: in particular instrument systematics are often difficult to disentangle from the signal, data are sparse and often not recorded simultaneously causing degeneracy of interpretation. We will present here new data analysis techniques and interpretation developed by the “ExoLights” team at UCL to address the above-mentioned issues. Said techniques include statistical tools, non-parametric, machine-learning algorithms, optimized radiative transfer models and spectroscopic line-lists. These new tools have been successfully applied to existing data recorded with space and ground instruments, shedding new light on our knowledge and understanding of these alien worlds.
A spatial scan statistic for multinomial data.
Jung, Inkyung; Kulldorff, Martin; Richard, Otukei John
2010-08-15
As a geographical cluster detection analysis tool, the spatial scan statistic has been developed for different types of data such as Bernoulli, Poisson, ordinal, exponential and normal. Another interesting data type is multinomial. For example, one may want to find clusters where the disease-type distribution is statistically significantly different from the rest of the study region when there are different types of disease. In this paper, we propose a spatial scan statistic for such data, which is useful for geographical cluster detection analysis for categorical data without any intrinsic order information. The proposed method is applied to meningitis data consisting of five different disease categories to identify areas with distinct disease-type patterns in two counties in the U.K. The performance of the method is evaluated through a simulation study.
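As a rough illustration of how a scan statistic scores a single candidate zone for multinomial data, the sketch below computes a log-likelihood ratio that compares separate category proportions inside and outside the zone against one common set of proportions. The abstract does not state the formula, so this form, the function name and the counts are assumptions. Scanning over many zones and judging the maximum for significance are left out.

```python
import math


def multinomial_llr(zone_counts, total_counts):
    """Log-likelihood ratio for one candidate zone.

    zone_counts[k]  : cases of category k inside the zone
    total_counts[k] : cases of category k in the whole study region
    Compares 'different category proportions inside vs. outside' with
    'same proportions everywhere' (assumed form, see lead-in).
    """
    def loglik(counts):
        n = sum(counts)
        # Maximized multinomial log-likelihood, dropping the constant term
        return sum(c * math.log(c / n) for c in counts if c > 0)

    outside = [t - z for z, t in zip(zone_counts, total_counts)]
    return loglik(zone_counts) + loglik(outside) - loglik(total_counts)


# Zone with the same category mix as the region, and one dominated by category 0
same_mix = multinomial_llr([10, 20, 30], [20, 40, 60])
distinct = multinomial_llr([30, 0, 0], [40, 30, 30])
```

A zone whose category mix matches the rest of the region scores essentially zero, while a zone dominated by one category scores well above zero.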
Topology for Statistical Modeling of Petascale Data
Bennett, Janine Camille; Pebay, Philippe Pierre; Pascucci, Valerio; Levine, Joshua; Gyulassy, Attila; Rojas, Maurice
2014-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.
Revisiting the statistical analysis of pyroclast density and porosity data
NASA Astrophysics Data System (ADS)
Bernard, B.; Kueppers, U.; Ortiz, H.
2015-07-01
Explosive volcanic eruptions are commonly characterized based on a thorough analysis of the generated deposits. Amongst other characteristics in physical volcanology, density and porosity of juvenile clasts are some of the most frequently used to constrain eruptive dynamics. In this study, we evaluate the sensitivity of density and porosity data to statistical methods and introduce a weighting parameter to correct issues raised by the use of frequency analysis. Results of textural investigation can be biased by clast selection. Using statistical tools as presented here, the meaningfulness of a conclusion can be checked for any data set easily. This is necessary to define whether or not a sample has met the requirements for statistical relevance, i.e. whether a data set is large enough to allow for reproducible results. Graphical statistics are used to describe density and porosity distributions, similar to those used for grain-size analysis. This approach helps with the interpretation of volcanic deposits. To illustrate this methodology, we chose two large data sets: (1) directed blast deposits of the 3640-3510 BC eruption of Chachimbiro volcano (Ecuador) and (2) block-and-ash-flow deposits of the 1990-1995 eruption of Unzen volcano (Japan). We propose the incorporation of this analysis into future investigations to check the objectivity of results achieved by different working groups and guarantee the meaningfulness of the interpretation.
Teacher Perception of Tasks That Enhance Data Interpretation
ERIC Educational Resources Information Center
Wolfe, Gretchen L.
2012-01-01
The purpose of this study is to provide an account of teacher perception of core practice tasks in data use, particularly data interpretation. Data interpretation is critical to professional practice in planning instructional adjustments for student learning. This is a case study of four elementary teachers who provide numerous task-specific…
Topology for Statistical Modeling of Petascale Data
Pascucci, Valerio; Levine, Joshua; Gyulassy, Attila; Bremer, P. -T.
2013-10-31
Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, the approach of the entire team involving all three institutions is based on the complementary techniques of combinatorial topology and statistical modelling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modelling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. The overall technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modelling, and (3) new integrated topological and statistical methods. Roughly speaking, the division of labor between our three groups (Sandia Labs in Livermore, Texas A&M in College Station, and U Utah in Salt Lake City) is as follows: the Sandia group focuses on statistical methods and their formulation in algebraic terms, and finds the application problems (and data sets) most relevant to this project; the Texas A&M group develops new algebraic geometry algorithms, in particular with fewnomial theory; and the Utah group develops new algorithms in computational topology via Discrete Morse Theory. However, we hasten to point out that our three groups stay in tight contact via videoconference every 2 weeks, so there is much synergy of ideas between the groups. The remainder of this document focuses on the contributions that had greater direct involvement from the team at the University of Utah in Salt Lake City.
Statistical data of the uranium industry
1981-01-01
Data are presented on US uranium reserves, potential resources, exploration, mining, drilling, milling, and other activities of the uranium industry through 1980. The compendium reflects the basic programs of the Grand Junction Office. Statistics are based primarily on information provided by the uranium exploration, mining, and milling companies. Data on commercial U₃O₈ sales and purchases are included. Data on non-US uranium production and resources are presented in the appendix. (DMC)
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
ERIC Educational Resources Information Center
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…
Dotto, G L; Pinto, L A A; Hachicha, M A; Knani, S
2015-03-15
In this work, a statistical physics treatment was employed to study the adsorption of food dyes onto chitosan films, in order to obtain new physicochemical interpretations at the molecular level. Experimental equilibrium curves were obtained for the adsorption of four dyes (FD&C red 2, FD&C yellow 5, FD&C blue 2, Acid Red 51) at different temperatures (298, 313 and 328 K). A statistical physics formula was used to interpret these curves, and parameters such as the number of adsorbed dye molecules per site (n), anchorage number (n'), receptor sites density (NM), adsorbed quantity at saturation (N asat), steric hindrance (τ), concentration at half saturation (c1/2) and molar adsorption energy (ΔE(a)) were estimated. The relation of the above-mentioned parameters with the chemical structure of the dyes and temperature was evaluated and interpreted.
On the Interpretation of Running Trends as Summary Statistics for Time Series Analysis
NASA Astrophysics Data System (ADS)
Vigo, Isabel M.; Trottini, Mario; Belda, Santiago
2016-04-01
In recent years, running trends analysis (RTA) has been widely used in climate applied research as summary statistics for time series analysis. There is no doubt that RTA might be a useful descriptive tool, but despite its general use in applied research, precisely what it reveals about the underlying time series is unclear and, as a result, its interpretation is unclear too. This work contributes to such interpretation in two ways: 1) an explicit formula is obtained for the set of time series with a given series of running trends, making it possible to show that running trends, alone, perform very poorly as summary statistics for time series analysis; and 2) an equivalence is established between RTA and the estimation of a (possibly nonlinear) trend component of the underlying time series using a weighted moving average filter. Such equivalence provides a solid ground for RTA implementation and interpretation/validation.
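The link between running trends and linear filtering can be illustrated with an elementary fact: the least-squares slope fitted in each window is a fixed weighted sum of the values in that window. The sketch below is not the derivation from the paper. It assumes equally spaced observations, and the function names and the toy series are hypothetical.

```python
def ols_slope(window):
    """Least-squares slope of `window` against time steps 0, 1, ..., n-1."""
    n = len(window)
    t_mean = (n - 1) / 2
    y_mean = sum(window) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(window))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def trend_filter_weights(n):
    """Weights w_t = (t - t_mean) / sum_s (s - t_mean)^2; they sum to zero."""
    t_mean = (n - 1) / 2
    den = sum((t - t_mean) ** 2 for t in range(n))
    return [(t - t_mean) / den for t in range(n)]


def running_trends(series, n):
    """Running trends computed by fitting a line in every window."""
    return [ols_slope(series[i:i + n]) for i in range(len(series) - n + 1)]


def running_trends_as_filter(series, n):
    """The same running trends computed as a sliding weighted sum."""
    w = trend_filter_weights(n)
    return [sum(wi * yi for wi, yi in zip(w, series[i:i + n]))
            for i in range(len(series) - n + 1)]


series = [0.3, 1.1, 0.8, 2.0, 2.4, 1.9, 3.1, 3.5, 3.0, 4.2]
direct = running_trends(series, 5)
filtered = running_trends_as_filter(series, 5)

# Shifting the whole series by a constant leaves every running trend unchanged,
# one simple way in which different series share the same running trends.
shifted = running_trends([y + 10.0 for y in series], 5)
```

Because the weights sum to zero, any constant offset is filtered out, which is consistent with the abstract's point that running trends alone do not pin down the underlying series.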
Interpretation of remotely sensed data and its applications in oceanography
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Tanaka, K.; Inostroza, H. M.; Verdesio, J. J.
1982-01-01
The methodology of interpretation of remote sensing data and its oceanographic applications are described. The elements of image interpretation for different types of sensors are discussed. The sensors utilized are the multispectral scanner of LANDSAT, and the thermal infrared of NOAA and geostationary satellites. Visual and automatic data interpretation in studies of pollution, the Brazil current system, and upwelling along the southeastern Brazilian coast are compared.
Statistical considerations when analyzing biomarker data.
Beam, Craig A
2015-11-01
Biomarkers have become, and will continue to become, increasingly important to clinical immunology research. Yet, biomarkers often present new problems and raise new statistical and study design issues to scientists working in clinical immunology. In this paper I discuss statistical considerations related to the important biomarker problems of: 1) The design and analysis of clinical studies which seek to determine whether changes from baseline in a biomarker are associated with changes in a metabolic outcome; 2) The conditions that are required for a biomarker to be considered a "surrogate"; 3) Considerations that arise when analyzing whether or not a predictive biomarker could act as a surrogate endpoint; 4) Biomarker timing relative to the clinical endpoint; 5) The problem of analyzing studies that measure many biomarkers from few subjects; and, 6) The use of statistical models when analyzing biomarker data arising from count data.
Microarray Data Analysis Using Multiple Statistical Models
Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2,Andrew I. Brooks3,4, ChiYi Chu3,Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...
Engine Data Interpretation System (EDIS), phase 2
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1991-01-01
A prototype of an expert system was developed which applies qualitative constraint-based reasoning to the task of post-test analysis of data resulting from a rocket engine firing. Data anomalies are detected and corresponding faults are diagnosed. Engine behavior is reconstructed using measured data and knowledge about engine behavior. Knowledge about common faults guides but does not restrict the search for the best explanation in terms of hypothesized faults. The system contains domain knowledge about the behavior of common rocket engine components and was configured for use with the Space Shuttle Main Engine (SSME). A graphical user interface allows an expert user to intimately interact with the system during diagnosis. The system was applied to data taken during actual SSME tests where data anomalies were observed.
Telemetry Boards Interpret Rocket, Airplane Engine Data
NASA Technical Reports Server (NTRS)
2009-01-01
For all the data gathered by the space shuttle while in orbit, NASA engineers are just as concerned about the information it generates on the ground. From the moment the shuttle's wheels touch the runway to the break of its electrical umbilical cord at 0.4 seconds before its next launch, sensors feed streams of data about the status of the vehicle and its various systems to Kennedy Space Center's shuttle crews. Even while the shuttle orbiter is refitted in Kennedy's orbiter processing facility, engineers constantly monitor everything from power levels to the testing of the mechanical arm in the orbiter's payload bay. On the launch pad and up until liftoff, the Launch Control Center, attached to the large Vehicle Assembly Building, screens all of the shuttle's vital data. (Once the shuttle clears its launch tower, this responsibility shifts to Mission Control at Johnson Space Center, with Kennedy in a backup role.) Ground systems for satellite launches also generate significant amounts of data. At Cape Canaveral Air Force Station, across the Banana River from Kennedy's location on Merritt Island, Florida, NASA rockets carrying precious satellite payloads into space flood the Launch Vehicle Data Center with sensor information on temperature, speed, trajectory, and vibration. The remote measurement and transmission of systems data, called telemetry, is essential to ensuring the safe and successful launch of the Agency's space missions. When a launch is unsuccessful, as it was for this year's Orbiting Carbon Observatory satellite, telemetry data also provides valuable clues as to what went wrong and how to remedy any problems for future attempts. All of this information is streamed from sensors in the form of binary code: strings of ones and zeros. One small company has partnered with NASA to provide technology that renders raw telemetry data intelligible not only for Agency engineers, but also for those in the private sector.
Cho, Kyung Hwa; Park, Yongeun; Kang, Joo-Hyon; Ki, Seo Jin; Cha, Sungmin; Lee, Seung Won; Kim, Joon Ha
2009-01-01
The Yeongsan (YS) Reservoir is an estuarine reservoir which provides surrounding areas with public goods, such as water supply for agricultural and industrial areas and flood control. Beneficial uses of the YS Reservoir, however, have recently been threatened by enriched non-point and point source inputs. A series of multivariate statistical approaches including principal component analysis (PCA) were applied to extract significant characteristics contained in a large suite of water quality data (18 variables recorded monthly for 5 years), and thereby to provide the information needed for establishing effective water resource management plans for the YS Reservoir. The PCA results identified the five most important principal components (PCs), explaining 71% of the total variance of the original data set. The five PCs were interpreted as hydro-meteorological effect, nitrogen loading, phosphorus loading, primary production of phytoplankton, and fecal indicator bacteria (FIB) loading. Furthermore, hydro-meteorological effect and nitrogen loading could be characterized by a yearly periodicity whereas FIB loading showed an increasing trend with respect to time. The study results presented here might be useful to establish preliminary strategies for abating water quality degradation in the YS Reservoir.
Biau, David Jean; Kernéis, Solen; Porcher, Raphaël
2008-09-01
The increasing volume of research by the medical community often leads to increasing numbers of contradictory findings and conclusions. Although the differences observed may represent true differences, the results also may differ because of sampling variability as all studies are performed on a limited number of specimens or patients. When planning a study reporting differences among groups of patients or describing some variable in a single group, sample size should be considered because it allows the researcher to control for the risk of reporting a false-negative finding (Type II error) or to estimate the precision his or her experiment will yield. Equally important, readers of medical journals should understand sample size because such understanding is essential to interpret the relevance of a finding with regard to their own patients. At the time of planning, the investigator must establish (1) a justifiable level of statistical significance, (2) the chances of detecting a difference of given magnitude between the groups compared, ie, the power, (3) this targeted difference (ie, effect size), and (4) the variability of the data (for quantitative data). We believe correct planning of experiments is an ethical issue of concern to the entire community.
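The four planning quantities listed above combine in a standard textbook approximation for comparing two group means. The sketch below is an illustration under stated assumptions, not a procedure taken from the article: it assumes equal group sizes, a two-sided test, a common known standard deviation, and a normal approximation. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided comparison of two means.

    delta: targeted difference between the group means (effect size).
    sigma: assumed common standard deviation of the data.
    alpha: significance level; power: chance of detecting `delta`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


n_half_sd = n_per_group(delta=0.5, sigma=1.0)      # half a standard deviation
n_quarter_sd = n_per_group(delta=0.25, sigma=1.0)  # a smaller targeted difference
```

Halving the targeted difference roughly quadruples the required sample size, which is why the targeted difference needs as much justification as the significance level and the power.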
Multivariate Statistical Mapping of Spectroscopic Imaging Data
Young, K.; Govind, V.; Sharma, K.; Studholme, C.; Maudsley, A.A; Schuff, N.
2010-01-01
For magnetic resonance spectroscopic imaging (MRSI) studies of the brain it is important to measure the distribution of metabolites in a regionally unbiased way, that is, without restrictions to a priori defined regions of interest (ROI). Since MRSI provides measures of multiple metabolites simultaneously at each voxel, there is furthermore great interest in utilizing the multidimensional nature of MRSI for gains in statistical power. Voxelwise multivariate statistical mapping is expected to address both of these issues but it has not been previously employed for SI studies of the brain. The aims of this study were to: 1) develop and validate multivariate voxel based statistical mapping for MRSI and 2) demonstrate that multivariate tests can be more powerful than univariate tests in identifying patterns of altered brain metabolism. Specifically, we compared multivariate to univariate tests in identifying known regional patterns in simulated data and regional patterns of metabolite alterations due to amyotrophic lateral sclerosis, a devastating brain disease of the motor neurons. PMID:19953514
Critical analysis of adsorption data statistically
NASA Astrophysics Data System (ADS)
Kaushal, Achla; Singh, S. K.
2016-09-01
Experimental data can be presented, computed, and critically analysed in a different way using statistics. A variety of statistical tests are used to make decisions about the significance and validity of the experimental data. In the present study, adsorption was carried out to remove zinc ions from contaminated aqueous solution using mango leaf powder. The experimental data were analysed statistically by hypothesis testing, applying the t test, paired t test and Chi-square test to (a) test the optimum value of the process pH, (b) verify the success of the experiment and (c) study the effect of adsorbent dose on zinc ion removal from aqueous solutions. Comparison of calculated and tabulated values of t and χ² showed the results in favour of the data collected from the experiment, and this has been shown on probability charts. The K value for the Langmuir isotherm was 0.8582 and the m value obtained for the Freundlich adsorption isotherm was 0.725; both are <1, indicating favourable isotherms. Karl Pearson's correlation coefficient values for the Langmuir and Freundlich adsorption isotherms were 0.99 and 0.95 respectively, which show a high degree of correlation between the variables. This validates the data obtained for adsorption of zinc ions from the contaminated aqueous solution with the help of mango leaf powder.
Component fragilities. Data collection, analysis and interpretation
Bandyopadhyay, K.K.; Hofmayer, C.H.
1985-01-01
As part of the component fragility research program sponsored by the US NRC, BNL is involved in establishing seismic fragility levels for various nuclear power plant equipment with emphasis on electrical equipment. To date, BNL has reviewed approximately seventy test reports to collect fragility or high level test data for switchgears, motor control centers and similar electrical cabinets, valve actuators and numerous electrical and control devices, e.g., switches, transmitters, potentiometers, indicators, relays, etc., of various manufacturers and models. BNL has also obtained test data from EPRI/ANCO. Analysis of the collected data reveals that fragility levels can best be described by a group of curves corresponding to various failure modes. The lower bound curve indicates the initiation of malfunctioning or structural damage, whereas the upper bound curve corresponds to overall failure of the equipment based on known failure modes occurring separately or interactively. For some components, the upper and lower bound fragility levels are observed to vary appreciably depending upon the manufacturers and models. For some devices, testing even at the shake table vibration limit does not exhibit any failure. Failure of a relay is observed to be a frequent cause of failure of an electrical panel or a system. An extensive amount of additional fragility or high level test data exists.
Statistical analysis of the lithospheric magnetic anomaly data
NASA Astrophysics Data System (ADS)
Pavon-Carrasco, Fco Javier; de Santis, Angelo; Ferraccioli, Fausto; Catalán, Manuel; Ishihara, Takemi
2013-04-01
Different analyses carried out on the lithospheric magnetic anomaly data from GEODAS DVD v5.0.10 database (World Digital Magnetic Anomaly Map, WDMAM) show that the data distribution is not Gaussian, but Laplacian. Although this behaviour has been formerly pointed out in other works (e.g., Walker and Jackson, Geophys. J. Int, 143, 799-808, 2000), they have not given any explanation about this statistical property of the magnetic anomalies. In this work, we perform different statistical tests to confirm that the lithospheric magnetic anomaly data follow indeed a Laplacian distribution and we also give a possible interpretation of this behavior providing a model of magnetization which depends on the variation of the geomagnetic field and both induced and remanent magnetizations in the terrestrial lithosphere.
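One simple way to compare a Laplacian with a Gaussian description of a sample is to fit both by maximum likelihood and compare the maximized log-likelihoods. The sketch below only illustrates that idea on synthetic numbers. It does not reproduce the tests used in the study, the function names are hypothetical, and no magnetic anomaly data are involved.

```python
import math
import random
import statistics


def gaussian_loglik(xs):
    """Maximized Gaussian log-likelihood (mean and variance fitted by ML)."""
    n = len(xs)
    mu = statistics.fmean(xs)
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def laplace_loglik(xs):
    """Maximized Laplace log-likelihood (location = median, scale = mean |x - median|)."""
    n = len(xs)
    med = statistics.median(xs)
    b = sum(abs(x - med) for x in xs) / n
    return -n * (math.log(2 * b) + 1)


rng = random.Random(0)
# The difference of two independent Exp(1) draws follows a Laplace(0, 1) law.
laplace_sample = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(5000)]
gauss_sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]

laplace_prefers_laplace = laplace_loglik(laplace_sample) > gaussian_loglik(laplace_sample)
gauss_prefers_gauss = gaussian_loglik(gauss_sample) > laplace_loglik(gauss_sample)
```

Both models have two fitted parameters, so the raw log-likelihoods can be compared directly without a complexity penalty.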
Simultaneous statistical inference for epigenetic data.
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.
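The basic mechanism behind permutation tests for a relative effect can be shown in the simplest univariate case. The sketch below is a deliberately reduced illustration, not the multivariate multiple testing procedure described above: it handles one variable and two small groups, enumerates every relabelling exactly, and uses hypothetical function names and toy data.

```python
from itertools import combinations


def relative_effect(xs, ys):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two samples."""
    score = sum(1.0 if x < y else 0.5 if x == y else 0.0
                for x in xs for y in ys)
    return score / (len(xs) * len(ys))


def exact_permutation_pvalue(xs, ys):
    """Two-sided exact permutation p-value for H0: relative effect = 0.5."""
    pooled = list(xs) + list(ys)
    observed = abs(relative_effect(xs, ys) - 0.5)
    extreme = total = 0
    # Every way of choosing which pooled values are labelled as the first group.
    for idx in combinations(range(len(pooled)), len(xs)):
        chosen = set(idx)
        first = [pooled[i] for i in idx]
        second = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(relative_effect(first, second) - 0.5) >= observed - 1e-12:
            extreme += 1
    return extreme / total


p_separated = exact_permutation_pvalue([1, 2, 3, 4], [10, 11, 12, 13])
p_identical = exact_permutation_pvalue([1, 2], [1, 2])
```

With four observations per group there are 70 relabellings, and only the observed split and its mirror image are as extreme as the data, so the smallest attainable two-sided p-value is 2/70. Exact enumeration is practical only for small samples; larger problems usually rely on random permutations.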
Data series embedding and scale invariant statistics.
Michieli, I; Medved, B; Ristov, S
2010-06-01
Data sequences acquired from bio-systems such as human gait data, heart rate interbeat data, or DNA sequences exhibit complex dynamics that are frequently described by a long-memory or power-law decay of the autocorrelation function. One way of characterizing these dynamics is through scale invariant statistics or "fractal-like" behavior. For quantifying scale invariant parameters of physiological signals several methods have been proposed. Among them the most common are detrended fluctuation analysis, sample mean variance analyses, power spectral density analysis, R/S analysis, and recently, in the realm of the multifractal approach, wavelet analysis. In this paper it is demonstrated that embedding the time series data in the high-dimensional pseudo-phase space reveals scale invariant statistics in a simple fashion. The procedure is applied to different stride interval data sets from human gait measurement time series (Physio-Bank data library). Results show that the introduced mapping adequately separates long-memory from random behavior. Smaller gait data sets were analyzed and scale-free trends for limited scale intervals were successfully detected. The method was verified on artificially produced time series with known scaling behavior and with varying noise content. The possibility for the method to falsely detect long-range dependence in artificially generated short-range dependence series was investigated.
Szabolcsi, Zoltán; Farkas, Zsuzsa; Borbély, Andrea; Bárány, Gusztáv; Varga, Dániel; Heinrich, Attila; Völgyi, Antónia; Pamjav, Horolma
2015-11-01
When the DNA profile from a crime scene matches that of a suspect, the weight of DNA evidence depends on the unbiased estimation of the match probability of the profiles. For this reason, it is necessary to establish and expand the databases that reflect the actual allele frequencies in the relevant population. 21,473 complete DNA profiles from Databank samples were used to establish the allele frequency database to represent the population of Hungarian suspects. We used fifteen STR loci (PowerPlex ESI16) including five new ESS loci. The aim was to calculate the statistical, forensic efficiency parameters for the Databank samples and compare the newly detected data to the earlier report. The population substructure caused by relatedness may influence the estimated frequency of profiles. As our Databank profiles were considered non-random samples, possible relationships between the suspects can be assumed. Therefore, the population inbreeding effect was estimated using the FIS calculation. The overall inbreeding parameter was found to be 0.0106. Furthermore, we tested the impact of the two allele frequency datasets on 101 randomly chosen STR profiles, including full and partial profiles. The 95% confidence interval estimates for the profile frequencies (pM) resulted in a tighter range when we used the new dataset compared to the previously published ones. We found that the FIS had less effect on frequency values in the 21,473 samples than the application of minimum allele frequency. No genetic substructure was detected by STRUCTURE analysis. Due to the low level of inbreeding effect and the high number of samples, the new dataset provides unbiased and precise estimates of LR for statistical interpretation of forensic casework and allows us to use lower allele frequencies.
Interpretation of genomic data: questions and answers.
Simon, Richard
2008-07-01
Using a question and answer format we describe important aspects of using genomic technologies in cancer research. The main challenges are not managing the mass of data, but rather the design, analysis, and accurate reporting of studies that result in increased biological knowledge and medical utility. Many analysis issues address the use of expression microarrays but are also applicable to other whole genome assays. Microarray-based clinical investigations have generated both unrealistic hype and excessive skepticism. Genomic technologies are tremendously powerful and will play instrumental roles in elucidating the mechanisms of oncogenesis and in bringing on an era of predictive medicine in which treatments are tailored to individual tumors. Achieving these goals involves challenges in rethinking many paradigms for the conduct of basic and clinical cancer research and for the organization of interdisciplinary collaboration.
Statistical modeling of space shuttle environmental data
NASA Technical Reports Server (NTRS)
Tubbs, J. D.; Brewer, D. W.
1983-01-01
Statistical models which use a class of bivariate gamma distributions are examined. Topics discussed include: (1) the ratio of positively correlated gamma variates; (2) a method to determine if unequal shape parameters are necessary in a bivariate gamma distribution; (3) differential equations for modal location of a family of bivariate gamma distributions; and (4) analysis of some wind gust data using the analytical results developed for modeling application.
Thoth: Software for data visualization & statistics
NASA Astrophysics Data System (ADS)
Laher, R. R.
2016-10-01
Thoth is a standalone software application with a graphical user interface for making it easy to query, display, visualize, and analyze tabular data stored in relational databases and data files. From imported data tables, it can create pie charts, bar charts, scatter plots, and many other kinds of data graphs with simple menus and mouse clicks (no programming required), by leveraging the open-source JFreeChart library. It also computes useful table-column data statistics. A mature tool, having undergone development and testing over several years, it is written in the Java computer language, and hence can be run on any computing platform that has a Java Virtual Machine and graphical-display capability. It can be downloaded and used by anyone free of charge, and has general applicability in science, engineering, medical, business, and other fields. Special tools and features for common tasks in astronomy and astrophysical research are included in the software.
The seismic analyzer: interpreting and illustrating 2D seismic data.
Patel, Daniel; Giertsen, Christopher; Thurmond, John; Gjelberg, John; Gröller, M Eduard
2008-01-01
We present a toolbox for quickly interpreting and illustrating 2D slices of seismic volumetric reflection data. Searching for oil and gas involves creating a structural overview of seismic reflection data to identify hydrocarbon reservoirs. We improve the search of seismic structures by precalculating the horizon structures of the seismic data prior to interpretation. We improve the annotation of seismic structures by applying novel illustrative rendering algorithms tailored to seismic data, such as deformed texturing and line and texture transfer functions. The illustrative rendering results in multi-attribute and scale invariant visualizations where features are represented clearly in both highly zoomed in and zoomed out views. Thumbnail views in combination with interactive appearance control allow for a quick overview of the data before detailed interpretation takes place. These techniques help reduce the work of seismic illustrators and interpreters.
78 FR 10166 - Access Interpreting; Transfer of Data
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-13
... From the Federal Register Online via the Government Publishing Office ENVIRONMENTAL PROTECTION AGENCY Access Interpreting; Transfer of Data AGENCY: Environmental Protection Agency (EPA). ACTION: Notice. SUMMARY: This notice announces that pesticide related information submitted to EPA's Office...
Interpreting New Data from the High Energy Frontier
Thaler, Jesse
2016-09-26
This is the final technical report for DOE grant DE-SC0006389, "Interpreting New Data from the High Energy Frontier", describing research accomplishments by the PI in the field of theoretical high energy physics.
Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization
NASA Astrophysics Data System (ADS)
Eroglu, Sertac
2014-10-01
The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed the statistical mechanical Menzerath-Altmann model. The derived model allows the model parameters to be interpreted in terms of physical concepts. We also propose that many organizations presenting the Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy-associated states.
Some statistical issues in modelling pharmacokinetic data.
Lindsey, J K; Jones, B; Jarvis, P
A fundamental assumption underlying pharmacokinetic compartment modelling is that each subject has a different individual curve. To some extent this runs counter to the statistical principle that similar individuals will have similar curves, thus making inferences to a wider population possible. In population pharmacokinetics, the compromise is to use random effects. We recommend that such models also be used in data rich situations instead of independently fitting individual curves. However, the additional information available in such studies shows that random effects are often not sufficient; generally, an autoregressive process is also required. This has the added advantage that it provides a means of tracking each individual, yielding predictions for the next observation. The compartment model curve being fitted may also be distorted in other ways. A widely held assumption is that most, if not all, pharmacokinetic concentration data follow a log-normal distribution. By examples, we show that this is not generally true, with the gamma distribution often being more suitable. When extreme individuals are present, a heavy-tailed distribution, such as the log Cauchy, can often provide more robust results. Finally, other assumptions that can distort the results include a direct dependence of the variance, or other dispersion parameter, on the mean and setting non-detectable values to some arbitrarily small value instead of treating them as censored. By pointing out these problems with standard methods of statistical modelling of pharmacokinetic data, we hope that commercial software will soon make more flexible and suitable models available.
Revisiting the statistical analysis of pyroclast density and porosity data
NASA Astrophysics Data System (ADS)
Bernard, B.; Kueppers, U.; Ortiz, H.
2015-03-01
Explosive volcanic eruptions are commonly characterized based on a thorough analysis of the generated deposits. Amongst other characteristics in physical volcanology, density and porosity of juvenile clasts are some of the most frequently used characteristics to constrain eruptive dynamics. In this study, we evaluate the sensitivity of density and porosity data and introduce a weighting parameter to correct issues raised by the use of frequency analysis. Results of textural investigation can be biased by clast selection. Using statistical tools as presented here, the meaningfulness of a conclusion can be checked for any dataset easily. This is necessary to define whether or not a sample has met the requirements for statistical relevance, i.e. whether a dataset is large enough to allow for reproducible results. Graphical statistics are used to describe density and porosity distributions, similar to those used for grain-size analysis. This approach helps with the interpretation of volcanic deposits. To illustrate this methodology we chose two large datasets: (1) directed blast deposits of the 3640-3510 BC eruption of Chachimbiro volcano (Ecuador) and (2) block-and-ash-flow deposits of the 1990-1995 eruption of Unzen volcano (Japan). We propose the use of this analysis in future investigations to check the objectivity of results achieved by different working groups and guarantee the meaningfulness of the interpretation.
A Novel Statistical Analysis and Interpretation of Flow Cytometry Data
2013-07-05
Center for Research in Scientific Computation and Center for Quantitative Sciences in Biomedicine, North Carolina State University, Raleigh, NC 27695-8212...dependent compartmental model for computing cell numbers in CFSE-based lymphocyte proliferation assays, Math. Biosci. Eng. 9 (2012), pp. 699–736. CRSC-TR12...USA; ICREA Infection Biology Laboratory, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain (Received
A Novel Statistical Analysis and Interpretation of Flow Cytometry Data
2013-03-31
Scientific Computation and Center for Quantitative Sciences in Biomedicine North Carolina State University, Raleigh, NC 27695-8212 Cristina Peligero...Jordi Argilaguet, and Andreas Meyerhans ICREA Infection Biology Lab, Department of Experimental and Health Sciences Universitat Pompeu Fabra, 08003...the fast computational approaches as described in [27]. It is also shown how the new model can be compared with older label-structured models such as
The Statistical Literacy Needed to Interpret School Assessment Data
ERIC Educational Resources Information Center
Chick, Helen; Pierce, Robyn
2013-01-01
State-wide and national testing in areas such as literacy and numeracy produces reports containing graphs and tables illustrating school and individual performance. These are intended to inform teachers, principals, and education organisations about student and school outcomes, to guide change and improvement. Given the complexity of the…
Regional interpretation of water-quality monitoring data
Smith, R.A.; Schwarz, G.E.; Alexander, R.B.
1997-01-01
We describe a method for using spatially referenced regressions of contaminant transport on watershed attributes (SPARROW) in regional water-quality assessment. The method is designed to reduce the problems of data interpretation caused by sparse sampling, network bias, and basin heterogeneity. The regression equation relates measured transport rates in streams to spatially referenced descriptors of pollution sources and land-surface and stream-channel characteristics. Regression models of total phosphorus (TP) and total nitrogen (TN) transport are constructed for a region defined as the nontidal conterminous United States. Observed TN and TP transport rates are derived from water-quality records for 414 stations in the National Stream Quality Accounting Network. Nutrient sources identified in the equations include point sources, applied fertilizer, livestock waste, nonagricultural land, and atmospheric deposition (TN only). Surface characteristics found to be significant predictors of land-water delivery include soil permeability, stream density, and temperature (TN only). Estimated instream decay coefficients for the two contaminants decrease monotonically with increasing stream size. TP transport is found to be significantly reduced by reservoir retention. Spatial referencing of basin attributes in relation to the stream channel network greatly increases their statistical significance and model accuracy. The method is used to estimate the proportion of watersheds in the conterminous United States (i.e., hydrologic cataloging units) with outflow TP concentrations less than the criterion of 0.1 mg/L, and to classify cataloging units according to local TN yield (kg/km2/yr).
Regional interpretation of water-quality monitoring data
NASA Astrophysics Data System (ADS)
Smith, Richard A.; Schwarz, Gregory E.; Alexander, Richard B.
1997-12-01
We describe a method for using spatially referenced regressions of contaminant transport on watershed attributes (SPARROW) in regional water-quality assessment. The method is designed to reduce the problems of data interpretation caused by sparse sampling, network bias, and basin heterogeneity. The regression equation relates measured transport rates in streams to spatially referenced descriptors of pollution sources and land-surface and stream-channel characteristics. Regression models of total phosphorus (TP) and total nitrogen (TN) transport are constructed for a region defined as the nontidal conterminous United States. Observed TN and TP transport rates are derived from water-quality records for 414 stations in the National Stream Quality Accounting Network. Nutrient sources identified in the equations include point sources, applied fertilizer, livestock waste, nonagricultural land, and atmospheric deposition (TN only). Surface characteristics found to be significant predictors of land-water delivery include soil permeability, stream density, and temperature (TN only). Estimated instream decay coefficients for the two contaminants decrease monotonically with increasing stream size. TP transport is found to be significantly reduced by reservoir retention. Spatial referencing of basin attributes in relation to the stream channel network greatly increases their statistical significance and model accuracy. The method is used to estimate the proportion of watersheds in the conterminous United States (i.e., hydrologic cataloging units) with outflow TP concentrations less than the criterion of 0.1 mg/L, and to classify cataloging units according to local TN yield (kg/km2/yr).
Statistical methods and computing for big data
Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing
2016-01-01
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay. PMID:27695593
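The "online updating for stream data" class mentioned in the abstract above can be illustrated with a minimal sketch. It is not the article's method: it assumes the simplest possible case, a one-predictor least-squares fit, and keeps only running sums so that each incoming chunk can be discarded after it is processed. The class name OnlineSimpleRegression and the toy data are hypothetical.

```python
class OnlineSimpleRegression:
    """Least-squares fit of y = a + b*x, updated one data chunk at a time.

    Only five running totals are stored, so memory use does not grow
    with the length of the stream.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        # Fold a new chunk into the running sufficient statistics.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        # Closed-form simple-regression solution from the running sums.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


# Toy stream (made-up values lying exactly on y = 1 + 2x), fed in two chunks.
model = OnlineSimpleRegression()
model.update([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
model.update([3.0, 4.0], [7.0, 9.0])
intercept, slope = model.coefficients()
print(intercept, slope)
```

Because the estimate depends on the data only through the accumulated sums, the result after the second chunk is the same as a fit on all five points at once.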
Quantitative interpretation of airborne gravity gradiometry data for mineral exploration
NASA Astrophysics Data System (ADS)
Martinez, Cericia D.
In the past two decades, commercialization of previously classified instrumentation has provided the ability to rapidly collect quality gravity gradient measurements for resource exploration. In the near future, next-generation instrumentation is expected to further advance acquisition of higher-quality data not subject to pre-processing regulations. Conversely, the ability to process and interpret gravity gradiometry data has not kept pace with innovations occurring in data acquisition systems. The purpose of the research presented in this thesis is to contribute to the understanding, development, and application of processing and interpretation techniques available for airborne gravity gradiometry in resource exploration. In particular, this research focuses on the utility of 3D inversion of gravity gradiometry for interpretation purposes. Towards this goal, I investigate the requisite components for an integrated interpretation workflow. In addition to practical 3D inversions, components of the workflow include estimation of density for terrain correction, processing of multi-component data using equivalent source for denoising, quantification of noise level, and component conversion. The objective is to produce high quality density distributions for subsequent geological interpretation. I then investigate the use of the inverted density model in orebody imaging, lithology differentiation, and resource evaluation. The systematic and sequential approach highlighted in the thesis addresses some of the challenges facing the use of gravity gradiometry as an exploration tool, while elucidating a procedure for incorporating gravity gradient interpretations into the lifecycle of not only resource exploration, but also resource modeling.
Fordyce, James A.
2010-01-01
Background Phylogenetic hypotheses are increasingly being used to elucidate historical patterns of diversification rate-variation. Hypothesis testing is often conducted by comparing the observed vector of branching times to a null, pure-birth expectation. A popular method for inferring a decrease in speciation rate, which might suggest an early burst of diversification followed by a decrease in diversification rate is the γ statistic. Methodology Using simulations under varying conditions, I examine the sensitivity of γ to the distribution of the most recent branching times. Using an exploratory data analysis tool for lineages through time plots, tree deviation, I identified trees with a significant γ statistic that do not appear to have the characteristic early accumulation of lineages consistent with an early, rapid rate of cladogenesis. I further investigated the sensitivity of the γ statistic to recent diversification by examining the consequences of failing to simulate the full time interval following the most recent cladogenic event. The power of γ to detect rate decrease at varying times was assessed for simulated trees with an initial high rate of diversification followed by a relatively low rate. Conclusions The γ statistic is extraordinarily sensitive to recent diversification rates, and does not necessarily detect early bursts of diversification. This was true for trees of various sizes and completeness of taxon sampling. The γ statistic had greater power to detect recent diversification rate decreases compared to early bursts of diversification. Caution should be exercised when interpreting the γ statistic as an indication of early, rapid diversification. PMID:20668707
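For readers unfamiliar with the statistic discussed in the abstract above, the following is a small sketch of how γ is commonly computed from the internode intervals of an ultrametric tree, following the usual definition attributed to Pybus and Harvey (2000). The function name and the toy interval lists are hypothetical, and nothing here reproduces the simulations of the study itself.

```python
from math import sqrt


def gamma_statistic(intervals):
    """Gamma statistic from internode intervals g_2, ..., g_n.

    intervals[0] is the time during which 2 lineages exist, intervals[1]
    the time with 3 lineages, and so on up to n lineages (n tips).
    """
    n = len(intervals) + 1
    if n < 3:
        raise ValueError("need at least three tips")
    weighted = [k * g for k, g in zip(range(2, n + 1), intervals)]
    total = sum(weighted)  # T = sum_{j=2..n} j * g_j
    running = 0.0
    sum_of_partials = 0.0
    for w in weighted[:-1]:  # partial sums for i = 2 .. n-1
        running += w
        sum_of_partials += running
    numerator = sum_of_partials / (n - 2) - total / 2.0
    return numerator / (total * sqrt(1.0 / (12.0 * (n - 2))))


# Intervals proportional to 1/k (the pure-birth expectation) give gamma near 0.
pure_birth_like = gamma_statistic([1.0 / k for k in range(2, 7)])
# Short early intervals followed by long recent ones give a negative gamma.
early_burst_like = gamma_statistic([0.1, 0.1, 0.1, 1.0, 1.0])
print(pure_birth_like, early_burst_like)
```

The second toy call also hints at the abstract's point: the value is driven largely by the long recent intervals, not only by what happened near the root.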
Weatherization Assistance Program - Background Data and Statistics
Eisenberg, Joel Fred
2010-03-01
This technical memorandum is intended to provide readers with information that may be useful in understanding the purposes, performance, and outcomes of the Department of Energy's (DOE's) Weatherization Assistance Program (Weatherization). Weatherization has been in operation for over thirty years and is the nation's largest single residential energy efficiency program. Its primary purpose, established by law, is 'to increase the energy efficiency of dwellings owned or occupied by low-income persons, reduce their total residential energy expenditures, and improve their health and safety, especially low-income persons who are particularly vulnerable such as the elderly, the handicapped, and children.' The American Recovery and Reinvestment Act PL111-5 (ARRA), passed and signed into law in February 2009, committed $5 billion over two years to an expanded Weatherization Assistance Program. This has created substantial interest in the program, the population it serves, the energy and cost savings it produces, and its cost-effectiveness. This memorandum is intended to address the need for this kind of information. Statistically valid answers to many of the questions surrounding Weatherization and its performance require comprehensive evaluation of the program. DOE is undertaking precisely this kind of independent evaluation in order to ascertain program effectiveness and to improve its performance. Results of this evaluation effort will begin to emerge in late 2010 and 2011, but they require substantial time and effort. In the meantime, the data and statistics in this memorandum can provide reasonable and transparent estimates of key program characteristics. The memorandum is laid out in three sections. The first deals with some key characteristics describing low-income energy consumption and expenditures. The second section provides estimates of energy savings and energy bill reductions that the program can reasonably be presumed to be producing. The third section
Statistical atlas based extrapolation of CT data
NASA Astrophysics Data System (ADS)
Chintalapani, Gouthami; Murphy, Ryan; Armiger, Robert S.; Lepisto, Jyri; Otake, Yoshito; Sugano, Nobuhiko; Taylor, Russell H.; Armand, Mehran
2010-02-01
We present a framework to estimate the missing anatomical details from a partial CT scan with the help of statistical shape models. The motivating application is periacetabular osteotomy (PAO), a technique for treating developmental hip dysplasia, an abnormal condition of the hip socket that, if untreated, may lead to osteoarthritis. The common goals of PAO are to reduce pain and joint subluxation and to improve contact pressure distribution by increasing the coverage of the femoral head by the hip socket. While current diagnosis and planning is based on radiological measurements, because of significant structural variations in dysplastic hips, a computer-assisted geometrical and biomechanical planning based on CT data is desirable to help the surgeon achieve optimal joint realignments. Most of the patients undergoing PAO are young females, hence it is usually desirable to minimize the radiation dose by scanning only the joint portion of the hip anatomy. These partial scans, however, do not provide enough information for biomechanical analysis due to the missing iliac region. A statistical shape model of full pelvis anatomy is constructed from a database of CT scans. The partial volume is first aligned with the statistical atlas using an iterative affine registration, followed by a deformable registration step, and the missing information is inferred from the atlas. The atlas inferences are further enhanced by the use of X-ray images of the patient, which are very common in an osteotomy procedure. The proposed method is validated with a leave-one-out analysis method. Osteotomy cuts are simulated and the effect of atlas predicted models on the actual procedure is evaluated.
NASA Technical Reports Server (NTRS)
Shewhart, Mark
1991-01-01
Statistical Process Control (SPC) charts are one of several tools used in quality control. Other tools include flow charts, histograms, cause and effect diagrams, check sheets, Pareto diagrams, graphs, and scatter diagrams. A control chart is simply a graph which indicates process variation over time. The purpose of drawing a control chart is to detect any changes in the process signalled by abnormal points or patterns on the graph. The Artificial Intelligence Support Center (AISC) of the Acquisition Logistics Division has developed a hybrid machine learning expert system prototype which automates the process of constructing and interpreting control charts.
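The idea of detecting "abnormal points" on a control chart, as described in the abstract above, can be sketched as follows. This is a deliberately simplified illustration and not the AISC prototype: it assumes limits at the baseline mean plus or minus three sample standard deviations (many individuals charts estimate spread from moving ranges instead), and the function names and measurements are made up.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from in-control baseline data."""
    center = mean(baseline)
    spread = stdev(baseline)  # simplification: sample standard deviation
    return center - k * spread, center, center + k * spread


def out_of_control(points, lower, upper):
    """Indices of points that fall outside the control limits."""
    return [i for i, x in enumerate(points) if x < lower or x > upper]


# Hypothetical baseline measurements from a stable process.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
lower, center, upper = control_limits(baseline)

# New observations; the third one is far from the baseline.
flagged = out_of_control([10.1, 9.9, 12.5, 10.0], lower, upper)
print(lower, center, upper, flagged)
```

A fuller chart interpreter would also look for patterns such as runs on one side of the center line, which this sketch does not attempt.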
Encoding Dissimilarity Data for Statistical Model Building.
Wahba, Grace
2010-12-01
We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding new objects into this space. This allows the dissimilarity information to be incorporated into a Smoothing Spline ANOVA penalized likelihood model, a Support Vector Machine, or any model that will admit Reproducing Kernel Hilbert Space components, for nonparametric regression, supervised learning, or semi-supervised learning. Future work and open questions are discussed. The papers are: F. Lu, S. Keles, S. Wright and G. Wahba 2005. A framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences 102, 12332-1233. G. Corrada Bravo, G. Wahba, K. Lee, B. Klein, R. Klein and S. Iyengar 2009. Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences 106, 8128-8133. F. Lu, Y. Lin and G. Wahba. Robust manifold unfolding with kernel regularization. TR 1008, Department of Statistics, University of Wisconsin-Madison.
Computer Simulation of Incomplete-Data Interpretation Exercise.
ERIC Educational Resources Information Center
Robertson, Douglas Frederick
1987-01-01
Described is a computer simulation that was used to help general education students enrolled in a large introductory geology course. The purpose of the simulation is to learn to interpret incomplete data. Students design a plan to collect bathymetric data for an area of the ocean. Procedures used by the students and instructor are included.…
Interpreting Survey Data to Inform Solid-Waste Education Programs
ERIC Educational Resources Information Center
McKeown, Rosalyn
2006-01-01
Few examples exist on how to use survey data to inform public environmental education programs. I suggest a process for interpreting statewide survey data with the four questions that give insights into local context and make it possible to gain insight into potential target audiences and community priorities. The four questions are: What…
Customizable tool for ecological data entry, assessment, monitoring, and interpretation
Technology Transfer Automated Retrieval System (TEKTRAN)
The Database for Inventory, Monitoring and Assessment (DIMA) is a highly customizable tool for data entry, assessment, monitoring, and interpretation. DIMA is a Microsoft Access database that can easily be used without Access knowledge and is available at no cost. Data can be entered for common, nat...
2010-01-01
Background The null hypothesis significance test (NHST) is the most frequently used statistical method, although its inferential validity has been widely criticized since its introduction. In 1988, the International Committee of Medical Journal Editors (ICMJE) warned against sole reliance on NHST to substantiate study conclusions and suggested supplementary use of confidence intervals (CI). Our objective was to evaluate the extent and quality in the use of NHST and CI, both in English and Spanish language biomedical publications between 1995 and 2006, taking into account the International Committee of Medical Journal Editors recommendations, with particular focus on the accuracy of the interpretation of statistical significance and the validity of conclusions. Methods Original articles published in three English and three Spanish biomedical journals in three fields (General Medicine, Clinical Specialties and Epidemiology - Public Health) were considered for this study. Papers published in 1995-1996, 2000-2001, and 2005-2006 were selected through a systematic sampling method. After excluding the purely descriptive and theoretical articles, analytic studies were evaluated for their use of NHST with P-values and/or CI for interpretation of statistical "significance" and "relevance" in study conclusions. Results Among 1,043 original papers, 874 were selected for detailed review. The exclusive use of P-values was less frequent in English language publications as well as in Public Health journals; overall such use decreased from 41% in 1995-1996 to 21% in 2005-2006. While the use of CI increased over time, the "significance fallacy" (to equate statistical and substantive significance) appeared very often, mainly in journals devoted to clinical specialties (81%). In papers originally written in English and Spanish, 15% and 10%, respectively, mentioned statistical significance in their conclusions. Conclusions Overall, results of our review show some improvements in
Design, analysis, and interpretation of field quality-control data for water-sampling projects
Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.
2015-01-01
The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.
Accessing seismic data through geological interpretation: Challenges and solutions
NASA Astrophysics Data System (ADS)
Butler, R. W.; Clayton, S.; McCaffrey, B.
2008-12-01
Between them, the world's research programs, national institutions and corporations, especially oil and gas companies, have acquired substantial volumes of seismic reflection data. Although the vast majority are proprietary and confidential, significant data are released and available for research, including those in public data libraries. The challenge now is to maximise use of these data, by providing routes to seismic not simply on the basis of acquisition or processing attributes but via the geology they image. The Virtual Seismic Atlas (VSA: www.seismicatlas.org) meets this challenge by providing an independent, free-to-use community based internet resource that captures and shares the geological interpretation of seismic data globally. Images and associated documents are explicitly indexed by extensive metadata trees, using not only existing survey and geographical data but also the geology they portray. The solution uses a Documentum database interrogated through Endeca Guided Navigation, to search, discover and retrieve images. The VSA allows users to compare contrasting interpretations of clean data thereby exploring the ranges of uncertainty in the geometric interpretation of subsurface structure. The metadata structures can be used to link reports and published research together with other data types such as wells. And the VSA can link to existing data libraries. Searches can take different paths, revealing arrays of geological analogues, new datasets while providing entirely novel insights and genuine surprises. This can then drive new creative opportunities for research and training, and expose the contents of seismic data libraries to the world.
2-D Versus 3-D Magnetotelluric Data Interpretation
NASA Astrophysics Data System (ADS)
Ledo, Juanjo
2005-09-01
In recent years, the number of publications dealing with the mathematical and physical 3-D aspects of the magnetotelluric method has increased drastically. However, field experiments on a grid are often impractical and surveys are frequently restricted to single or widely separated profiles. So, in many cases we find ourselves with the following question: is the applicability of the 2-D hypothesis valid to extract geoelectric and geological information from real 3-D environments? The aim of this paper is to explore a few instructive but general situations to understand the basics of a 2-D interpretation of 3-D magnetotelluric data and to determine which data subset (TE-mode or TM-mode) is best for obtaining the electrical conductivity distribution of the subsurface using 2-D techniques. A review of the mathematical and physical fundamentals of the electromagnetic fields generated by a simple 3-D structure allows us to prioritise the choice of modes in a 2-D interpretation of responses influenced by 3-D structures. This analysis is corroborated by numerical results from synthetic models and by real data acquired by other authors. One important result of this analysis is that the mode most unaffected by 3-D effects depends on the position of the 3-D structure with respect to the regional 2-D strike direction. When the 3-D body is normal to the regional strike, the TE-mode is affected mainly by galvanic effects, while the TM-mode is affected by galvanic and inductive effects. In this case, a 2-D interpretation of the TM-mode is prone to error. When the 3-D body is parallel to the regional 2-D strike the TE-mode is affected by galvanic and inductive effects and the TM-mode is affected mainly by galvanic effects, making it more suitable for 2-D interpretation. In general, a wise 2-D interpretation of 3-D magnetotelluric data can be a guide to a reasonable geological interpretation.
Soil VisNIR chemometric performance statistics should be interpreted as random variables
NASA Astrophysics Data System (ADS)
Brown, David J.; Gasch, Caley K.; Poggio, Matteo; Morgan, Cristine L. S.
2015-04-01
Chemometric models are normally evaluated using performance statistics such as the Standard Error of Prediction (SEP) or the Root Mean Squared Error of Prediction (RMSEP). These statistics are used to evaluate the quality of chemometric models relative to other published work on a specific soil property or to compare the results from different processing and modeling techniques (e.g. Partial Least Squares Regression or PLSR and random forest algorithms). Claims are commonly made about the overall success of an application or the relative performance of different modeling approaches assuming that these performance statistics are fixed population parameters. While most researchers would acknowledge that small differences in performance statistics are not important, rarely are performance statistics treated as random variables. Given that we are usually comparing modeling approaches for general application, and given that the intent of VisNIR soil spectroscopy is to apply chemometric calibrations to larger populations than are included in our soil-spectral datasets, it is more appropriate to think of performance statistics as random variables with variation introduced through the selection of samples for inclusion in a given study and through the division of samples into calibration and validation sets (including spiking approaches). Here we look at the variation in VisNIR performance statistics for the following soil-spectra datasets: (1) a diverse US Soil Survey soil-spectral library with 3768 samples from all 50 states and 36 different countries; (2) 389 surface and subsoil samples taken from US Geological Survey continental transects; (3) the Texas Soil Spectral Library (TSSL) with 3000 samples; (4) intact soil core scans of Texas soils with 700 samples; (5) approximately 400 in situ scans from the Pacific Northwest region; and (6) miscellaneous local datasets. We find the variation in performance statistics to be surprisingly large. This has important
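The central point of the abstract above, that a statistic such as RMSEP varies with how samples are divided into calibration and validation sets, can be illustrated with a small simulation. This is only a sketch under strong assumptions: the data are synthetic, a one-predictor least-squares line stands in for a chemometric model such as PLSR, and all names (fit_line, rmsep, split_rmsep) are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmsep(intercept, slope, xs, ys):
    """Root mean squared error of prediction on held-out data."""
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sqrt(mean(errors))


def split_rmsep(xs, ys, rng, calibration_fraction=0.7):
    """RMSEP for one random calibration/validation split."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    cut = int(calibration_fraction * len(idx))
    cal, val = idx[:cut], idx[cut:]
    a, b = fit_line([xs[i] for i in cal], [ys[i] for i in cal])
    return rmsep(a, b, [xs[i] for i in val], [ys[i] for i in val])


# Synthetic data set (not real soil spectra): linear signal plus noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

# The same data and model, evaluated over 200 different random splits.
rmseps = [split_rmsep(xs, ys, rng) for _ in range(200)]
print(f"mean={mean(rmseps):.3f} sd={stdev(rmseps):.3f} "
      f"min={min(rmseps):.3f} max={max(rmseps):.3f}")
```

Reporting the spread of the 200 values, and not a single RMSEP, is one simple way to treat the performance statistic as a random variable.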
Geologic interpretation of HCMM and aircraft thermal data
NASA Technical Reports Server (NTRS)
1982-01-01
Progress on the Heat Capacity Mapping Mission (HCMM) follow-on study is reported. Numerous image products for geologic interpretation of both HCMM and aircraft thermal data were produced. These include, among others, various combinations of the thermal data with LANDSAT and SEASAT data. The combined data sets were displayed using simple color composites, principal component color composites and black and white images, and hue, saturation intensity color composites. Algorithms for incorporating both atmospheric and elevation data simultaneously into the digital processing for creation of quantitatively correct thermal inertia images, are in the final development stage. A field trip to Death Valley was undertaken to field check the aircraft and HCMM data.
Statistical mapping of count survey data
Royle, J. Andrew; Link, W.A.; Sauer, J.R.; Scott, J. Michael; Heglund, Patricia J.; Morrison, Michael L.; Haufler, Jonathan B.; Wall, William A.
2002-01-01
We apply a Poisson mixed model to the problem of mapping (or predicting) bird relative abundance from counts collected from the North American Breeding Bird Survey (BBS). The model expresses the logarithm of the Poisson mean as a sum of a fixed term (which may depend on habitat variables) and a random effect which accounts for remaining unexplained variation. The random effect is assumed to be spatially correlated, thus providing a more general model than the traditional Poisson regression approach. Consequently, the model is capable of improved prediction when data are autocorrelated. Moreover, formulation of the mapping problem in terms of a statistical model facilitates a wide variety of inference problems which are cumbersome or even impossible using standard methods of mapping. For example, assessment of prediction uncertainty, including the formal comparison of predictions at different locations, or through time, using the model-based prediction variance is straightforward under the Poisson model (not so with many nominally model-free methods). Also, ecologists may generally be interested in quantifying the response of a species to particular habitat covariates or other landscape attributes. Proper accounting for the uncertainty in these estimated effects is crucially dependent on specification of a meaningful statistical model. Finally, the model may be used to aid in sampling design, by modifying the existing sampling plan in a manner which minimizes some variance-based criterion. Model fitting under this model is carried out using a simulation technique known as Markov Chain Monte Carlo. Application of the model is illustrated using Mourning Dove (Zenaida macroura) counts from Pennsylvania BBS routes. We produce both a model-based map depicting relative abundance, and the corresponding map of prediction uncertainty. We briefly address the issue of spatial sampling design under this model. Finally, we close with some discussion of mapping in relation to
Building software tools to help contextualize and interpret monitoring data
Technology Transfer Automated Retrieval System (TEKTRAN)
Even modest monitoring efforts at landscape scales produce large volumes of data. These are most useful if they can be interpreted relative to land potential or other similar sites. However, for many ecological systems reference conditions may not be defined or are poorly described, which hinders und...
Statistical Analysis of DWPF ARG-1 Data
Harris, S.P.
2001-03-02
A statistical analysis of analytical results for ARG-1, an Analytical Reference Glass, blanks, and the associated calibration and bench standards has been completed. These statistics provide a means for DWPF to review the performance of their laboratory as well as identify areas of improvement.
NASA Astrophysics Data System (ADS)
Bellac, Michel Le
2014-11-01
Although nobody can question the practical efficiency of quantum mechanics, there remains the serious question of its interpretation. As Valerio Scarani puts it, "We do not feel at ease with the indistinguishability principle (that is, the superposition principle) and some of its consequences." Indeed, this principle which pervades the quantum world is in stark contradiction with our everyday experience. From the very beginning of quantum mechanics, a number of physicists--but not the majority of them!--have asked the question of its "interpretation". One may simply deny that there is a problem: according to proponents of the minimalist interpretation, quantum mechanics is self-sufficient and needs no interpretation. The point of view held by a majority of physicists, that of the Copenhagen interpretation, will be examined in Section 10.1. The crux of the problem lies in the status of the state vector introduced in the preceding chapter to describe a quantum system, which is no more than a symbolic representation for the Copenhagen school of thought. Conversely, one may try to attribute some "external reality" to this state vector, that is, a correspondence between the mathematical description and the physical reality. In this latter case, it is the measurement problem which is brought to the fore. In 1932, von Neumann was first to propose a global approach, in an attempt to build a purely quantum theory of measurement examined in Section 10.2. This theory still underlies modern approaches, among them those grounded on decoherence theory, or on the macroscopic character of the measuring apparatus: see Section 10.3. Finally, there are non-standard interpretations such as Everett's many worlds theory or the hidden variables theory of de Broglie and Bohm (Section 10.4). Note, however, that this variety of interpretations has no bearing whatsoever on the practical use of quantum mechanics. There is no controversy on the way we should use quantum mechanics!
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 47 Telecommunication 1 2010-10-01 2010-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 47 Telecommunication 1 2011-10-01 2011-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 47 Telecommunication 1 2014-10-01 2014-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 47 Telecommunication 1 2013-10-01 2013-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall...
Toxic substances and human risk: principles of data interpretation
Tardiff, R.G.; Rodricks, J.V.
1988-01-01
This book provides a comprehensive overview of the relationship between toxicology and risk assessment and identifies the principles that should be used to evaluate toxicological data for human risk assessment. The book opens by distinguishing between the practice of toxicology as a science (observational and data-gathering activities) and its practice as an art (predictive or risk-estimating activities). This dichotomous nature produces the two elemental problems with which users of toxicological data must grapple. First, how relevant are data provided by the science of toxicology to the assessment of human health risks? Second, what methods of data interpretation should be used to formulate hypotheses or predictions regarding human health risk?
Statistical modelling for falls count data.
Ullah, Shahid; Finch, Caroline F; Day, Lesley
2010-03-01
Falls and their injury outcomes have count distributions that are highly skewed toward the right with clumping at zero, posing analytical challenges. Different modelling approaches have been used in the published literature to describe falls count distributions, often without consideration of the underlying statistical and modelling assumptions. This paper compares the use of modified Poisson and negative binomial (NB) models as alternatives to Poisson (P) regression, for the analysis of fall outcome counts. Four different count-based regression models (P, NB, zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB)) were each individually fitted to four separate fall count datasets from Australia, New Zealand and United States. The finite mixtures of P and NB regression models were also compared to the standard NB model. Both analytical (F, Vuong and bootstrap tests) and graphical approaches were used to select and compare models. Simulation studies assessed the size and power of each model fit. This study confirms that falls count distributions are over-dispersed, but not dispersed due to excess zero counts or heterogeneous population. Accordingly, the P model generally provided the poorest fit to all datasets. The fit improved significantly with NB and both zero-inflated models. The fit was also improved with the NB model, compared to finite mixtures of both P and NB regression models. Although there was little difference in fit between NB and ZINB models, in the interests of parsimony it is recommended that future studies involving modelling of falls count data routinely use the NB models in preference to the P or ZINB or finite mixture distribution. The fact that these conclusions apply across four separate datasets from four different samples of older people participating in studies of different methodology, adds strength to this general guiding principle.
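The comparison of Poisson and negative binomial fits described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: the `falls` counts are made up, the function names are hypothetical, and the negative binomial parameters are obtained by the method of moments rather than by the regression fitting used in the paper.

```python
import math

def dispersion_index(counts):
    """Return (mean, sample variance, variance/mean) for a list of counts."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, var, var / mean

def poisson_loglik(counts, mean):
    """Log-likelihood of the counts under a Poisson model with the given mean."""
    return sum(c * math.log(mean) - mean - math.lgamma(c + 1) for c in counts)

def negbin_loglik(counts, mean, var):
    """Log-likelihood under a negative binomial with moment-matched parameters.

    Assumes var > mean (over-dispersion); r is the size parameter and
    p the success probability, so that mean = r(1-p)/p.
    """
    r = mean ** 2 / (var - mean)
    p = r / (r + mean)
    return sum(
        math.lgamma(c + r) - math.lgamma(r) - math.lgamma(c + 1)
        + r * math.log(p) + c * math.log(1 - p)
        for c in counts
    )

# Hypothetical fall counts for 100 people: many zeros and a long right tail.
falls = [0] * 50 + [1] * 20 + [2] * 12 + [3] * 8 + [4] * 4 + [6] * 3 + [9] * 2 + [12]

mean, var, ratio = dispersion_index(falls)
ll_poisson = poisson_loglik(falls, mean)
ll_negbin = negbin_loglik(falls, mean, var)

print(f"mean={mean:.2f} variance={var:.2f} variance/mean={ratio:.2f}")
print(f"Poisson log-likelihood: {ll_poisson:.1f}")
print(f"NB log-likelihood (moment estimates): {ll_negbin:.1f}")
```

A variance-to-mean ratio well above 1 signals over-dispersion. Because the moment estimates cannot beat the maximum-likelihood negative binomial fit, a moment-based log-likelihood that already exceeds the Poisson value is a conservative indication that the negative binomial fits these counts better.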
New Statistical Approach to the Analysis of Hierarchical Data
NASA Astrophysics Data System (ADS)
Neuman, S. P.; Guadagnini, A.; Riva, M.
2014-12-01
Many variables possess a hierarchical structure reflected in how their increments vary in space and/or time. Quite commonly the increments (a) fluctuate in a highly irregular manner; (b) possess symmetric, non-Gaussian frequency distributions characterized by heavy tails that often decay with separation distance or lag; (c) exhibit nonlinear power-law scaling of sample structure functions in a midrange of lags, with breakdown in such scaling at small and large lags; (d) show extended power-law scaling (ESS) at all lags; and (e) display nonlinear scaling of power-law exponent with order of sample structure function. Some interpret this to imply that the variables are multifractal, which explains neither breakdowns in power-law scaling nor ESS. We offer an alternative interpretation consistent with all above phenomena. It views data as samples from stationary, anisotropic sub-Gaussian random fields subordinated to truncated fractional Brownian motion (tfBm) or truncated fractional Gaussian noise (tfGn). The fields are scaled Gaussian mixtures with random variances. Truncation of fBm and fGn entails filtering out components below data measurement or resolution scale and above domain scale. Our novel interpretation of the data allows us to obtain maximum likelihood estimates of all parameters characterizing the underlying truncated sub-Gaussian fields. These parameters in turn make it possible to downscale or upscale all statistical moments to situations entailing smaller or larger measurement or resolution and sampling scales, respectively. They also allow one to perform conditional or unconditional Monte Carlo simulations of random field realizations corresponding to these scales. Aspects of our approach are illustrated on field and laboratory measured porous and fractured rock permeabilities, as well as soil texture characteristics and neural network estimates of unsaturated hydraulic parameters in a deep vadose zone near Phoenix, Arizona. We also use our approach
Implementation of ILLIAC 4 algorithms for multispectral image interpretation. [earth resources data]
NASA Technical Reports Server (NTRS)
Ray, R. M.; Thomas, J. D.; Donovan, W. E.; Swain, P. H.
1974-01-01
Research has focused on the design and partial implementation of a comprehensive ILLIAC software system for computer-assisted interpretation of multispectral earth resources data such as that now collected by the Earth Resources Technology Satellite. Research suggests generally that the ILLIAC 4 should be as much as two orders of magnitude more cost effective than serial processing computers for digital interpretation of ERTS imagery via multivariate statistical classification techniques. The potential of the ARPA Network as a mechanism for interfacing geographically-dispersed users to an ILLIAC 4 image processing facility is discussed.
Mobile Collection and Automated Interpretation of EEG Data
NASA Technical Reports Server (NTRS)
Mintz, Frederick; Moynihan, Philip
2007-01-01
A system that would comprise mobile and stationary electronic hardware and software subsystems has been proposed for collection and automated interpretation of electroencephalographic (EEG) data from subjects in everyday activities in a variety of environments. By enabling collection of EEG data from mobile subjects engaged in ordinary activities (in contradistinction to collection from immobilized subjects in clinical settings), the system would expand the range of options and capabilities for performing diagnoses. Each subject would be equipped with one of the mobile subsystems, which would include a helmet that would hold floating electrodes (see figure) in those positions on the patient's head that are required in classical EEG data-collection techniques. A bundle of wires would couple the EEG signals from the electrodes to a multi-channel transmitter also located in the helmet. Electronic circuitry in the helmet transmitter would digitize the EEG signals and transmit the resulting data via a multidirectional RF patch antenna to a remote location. At the remote location, the subject's EEG data would be processed and stored in a database that would be auto-administered by a newly designed relational database management system (RDBMS). In this RDBMS, in nearly real time, the newly stored data would be subjected to automated interpretation that would involve comparison with other EEG data and concomitant peer-reviewed diagnoses stored in international brain data bases administered by other similar RDBMSs.
Statistics: The Shape of the Data. Used Numbers: Real Data in the Classroom. Grades 4-6.
ERIC Educational Resources Information Center
Russell, Susan Jo; Corwin, Rebecca B.
A unit of study that introduces collecting, representing, describing, and interpreting data is presented. Suitable for students in grades 4 through 6, it provides a foundation for further work in statistics and data analysis. The investigations may extend from one to four class sessions and are grouped into three parts: "Introduction to Data…
Metal Complexes of EDTA: An Exercise in Data Interpretation
NASA Astrophysics Data System (ADS)
Mitchell, Philip C. H.
1997-10-01
Stability constants of metal complexes of EDTA with main group and transition metals are correlated with properties of the elements and cations (ion charge, atomic and ionic radii, ionization energies and electronegativities) and interpreted with an ionic bonding model including a covalent contribution. Enthalpy and entropy contributions are discussed. It is shown how chemists recognize patterns in data with the help of a general theory and so develop a model.
Amplitude interpretation and visualization of three-dimensional reflection data
Enachescu, M.E.
1994-07-01
Digital recording and processing of modern three-dimensional surveys allow for relatively good preservation and correct spatial positioning of seismic reflection amplitude. A four-dimensional seismic reflection field matrix R (x,y,t,A), which can be computer visualized (i.e., real-time interactively rendered, edited, and animated), is now available to the interpreter. The amplitude contains encoded geological information indirectly related to lithologies and reservoir properties. The magnitude of the amplitude depends not only on the acoustic impedance contrast across a boundary, but is also strongly affected by the shape of the reflective boundary. This allows the interpreter to image subtle tectonic and structural elements not obvious on time-structure maps. The use of modern workstations allows for appropriate color coding of the total available amplitude range, routine on-screen time/amplitude extraction, and late display of horizon amplitude maps (horizon slices) or complex amplitude-structure spatial visualization. Stratigraphic, structural, tectonic, fluid distribution, and paleogeographic information are commonly obtained by displaying the amplitude variation A = A(x,y,t) associated with a particular reflective surface or seismic interval. As illustrated with several case histories, traditional structural and stratigraphic interpretation combined with a detailed amplitude study generally greatly enhances extraction of subsurface geological information from a reflection data volume. In the context of three-dimensional seismic surveys, the horizon amplitude map (horizon slice), amplitude attachment to structure, and "bright clouds" displays are very powerful tools available to the interpreter.
Bayesian Statistics for Biological Data: Pedigree Analysis
ERIC Educational Resources Information Center
Stanfield, William D.; Carlton, Matthew A.
2004-01-01
Bayes' formula is applied to the biological problem of pedigree analysis to show that Bayes' formula and non-Bayesian or "classical" methods of probability calculation give different answers. First-year college biology students can be introduced to Bayesian statistics.
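A small worked example shows how Bayes' formula changes a pedigree probability. The scenario is a textbook-style illustration chosen here, not one taken from the article: a woman has prior probability 1/2 of carrying an X-linked recessive allele, each son of a carrier is unaffected with probability 1/2, and every son of a non-carrier is unaffected (new mutations are ignored). The function name `posterior_carrier` is hypothetical.

```python
from fractions import Fraction

def posterior_carrier(prior, n_unaffected_sons):
    """Posterior probability that a woman is a carrier of an X-linked
    recessive allele, given that all of her sons so far are unaffected.

    Assumes each son of a carrier is unaffected with probability 1/2 and
    that sons of a non-carrier are always unaffected.
    """
    half = Fraction(1, 2)
    likelihood_carrier = half ** n_unaffected_sons   # P(data | carrier)
    likelihood_noncarrier = Fraction(1)              # P(data | non-carrier)
    numerator = prior * likelihood_carrier
    evidence = numerator + (1 - prior) * likelihood_noncarrier
    return numerator / evidence

prior = Fraction(1, 2)
for sons in range(4):
    print(sons, "unaffected sons ->", posterior_carrier(prior, sons))
```

Under these assumptions the pedigree-only answer stays at 1/2 however many sons are born, while the Bayesian posterior falls to 1/(2^n + 1) after n unaffected sons, which is the kind of disagreement between the two calculations that the abstract refers to.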
Statistical Treatment of Looking-Time Data
ERIC Educational Resources Information Center
Csibra, Gergely; Hernik, Mikolaj; Mascaro, Olivier; Tatone, Denis; Lengyel, Máté
2016-01-01
Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is…
Patton, Charles J.; Gilroy, Edward J.
1999-01-01
Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.
Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.
2009-01-01
In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
Practical Considerations in Clinical Pathology Data Interpretation and Description.
Hall, Robert L
2017-02-01
Although interpretation and description of clinical pathology test results for any preclinical safety assessment study should employ a consistent standard approach, companies differ regarding that approach and the appearance of the end product. Some rely heavily on statistical analysis, others do not. Some believe reference intervals are important, most do not. Some prefer severity of effects be described by percentage differences from, or multiples of, baseline or control, others prefer only word modifiers. Some expect a definitive decision for every potential effect, others accept uncertainty. This commentary addresses these differences and underscores the need for flexibility in a "consistent standard approach" because the conditions of every study are unique. This article constitutes an overview of material originally presented at Session 2 of the 2016 Society of Toxicologic Pathology Annual Symposium.
Statistical data of the uranium industry
1980-01-01
This document is a compilation of historical facts and figures through 1979. These statistics are based primarily on information provided voluntarily by the uranium exploration, mining, and milling companies. The production, reserves, drilling, and production capability information has been reported in a manner which avoids disclosure of proprietary information. Only the totals for the $1.5 reserves are reported. Because of increased interest in higher cost resources for long range planning purposes, a section covering the distribution of $100 per pound reserves statistics has been newly included. A table of mill recovery ranges for the January 1, 1980 reserves has also been added to this year's edition. The section on domestic uranium production capability has been deleted this year but will be included next year. The January 1, 1980 potential resource estimates are unchanged from the January 1, 1979 estimates.
Quantitative interpretation of Great Lakes remote sensing data
NASA Technical Reports Server (NTRS)
Shook, D. F.; Salzman, J.; Svehla, R. A.; Gedney, R. T.
1980-01-01
The paper discusses the quantitative interpretation of Great Lakes remote sensing water quality data. Remote sensing using color information must take into account (1) the existence of many different organic and inorganic species throughout the Great Lakes, (2) the occurrence of a mixture of species in most locations, and (3) spatial variations in types and concentration of species. The radiative transfer model provides a potential method for an orderly analysis of remote sensing data and a physical basis for developing quantitative algorithms. Predictions and field measurements of volume reflectances are presented which show the advantage of using a radiative transfer model. Spectral absorptance and backscattering coefficients for two inorganic sediments are reported.
Interdisciplinary applications and interpretations of remotely sensed data
NASA Technical Reports Server (NTRS)
Peterson, G. W.; Mcmurtry, G. J.
1972-01-01
An interdisciplinary approach to the use of remote sensors for the inventory of natural resources is discussed. The areas under investigation are land use, determination of pollution sources and damage, and analysis of geologic structure and terrain. The geographical area of primary interest is the Susquehanna River Basin. Descriptions of the data obtained by aerial cameras, multiband cameras, optical mechanical scanners, and radar are included. The Earth Resources Technology Satellite and Skylab program are examined. Interpretations of spacecraft data to show specific areas of interest are developed.
Laboratory study supporting the interpretation of Solar Dynamics Observatory data
Trabert, E.; Beiersdorfer, P.
2015-01-29
High-resolution extreme ultraviolet spectra of ions in an electron beam ion trap are investigated as a laboratory complement of the moderate-resolution observation bands of the AIA experiment on board the Solar Dynamics Observatory (SDO) spacecraft. Here, the latter observations depend on dominant iron lines of various charge states which in combination yield temperature information on the solar plasma. Our measurements suggest additions to the spectral models that are used in the SDO data interpretation. In the process, we also note a fair number of inconsistencies among the wavelength reference data bases.
Borehole seismic data processing and interpretation: New free software
NASA Astrophysics Data System (ADS)
Farfour, Mohammed; Yoon, Wang Jung
2015-12-01
Vertical Seismic Profile (VSP) surveying is a vital tool in subsurface imaging and reservoir characterization. The technique allows geophysicists to infer critical information that cannot be obtained otherwise. MVSP is a new MATLAB tool with a graphical user interface (GUI) for VSP shot modeling, data processing, and interpretation. The software handles VSP data from the loading and preprocessing stages to the final stage of corridor plotting and integration with well and seismic data. Several seismic and signal processing toolboxes are integrated and modified to suit and enrich the processing and display packages. The main motivation behind the development of the software is to provide new geoscientists and students in the geoscience fields with free software that brings together all VSP modules in one easy-to-use package. The software has several modules that allow the user to test, process, compare, visualize, and produce publication-quality results. The software is developed as a stand-alone MATLAB application that requires only MATLAB Compiler Runtime (MCR) to run with full functionality. We present a detailed description of MVSP and use the software to create synthetic VSP data. The data are then processed using different available tools. Next, real data are loaded and fully processed using the software. The data are then integrated with well data for more detailed analysis and interpretation. In order to evaluate the software processing flow accuracy, the same data are processed using commercial software. Comparison of the processing results shows that MVSP is able to process VSP data as efficiently as commercial software packages currently used in industry, and provides similar high-quality processed data.
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
Interpretation of Absorption Bands in Airborne Hyperspectral Radiance Data
Szekielda, Karl H.; Bowles, Jeffrey H.; Gillis, David B.; Miller, W. David
2009-01-01
It is demonstrated that hyperspectral imagery can be used, without atmospheric correction, to determine the presence of accessory phytoplankton pigments in coastal waters using derivative techniques. However, care must be taken not to confuse other absorptions for those caused by the presence of pigments. Atmospheric correction, usually the first step to making products from hyperspectral data, may not completely remove Fraunhofer lines and atmospheric absorption bands and these absorptions may interfere with identification of phytoplankton accessory pigments. Furthermore, the ability to resolve absorption bands depends on the spectral resolution of the spectrometer, which for a fixed spectral range also determines the number of observed bands. Based on this information, a study was undertaken to determine under what circumstances a hyperspectral sensor may determine the presence of pigments. As part of the study a hyperspectral imager was used to take high spectral resolution data over two different water masses. In order to avoid the problems associated with atmospheric correction this data was analyzed as radiance data without atmospheric correction. Here, the purpose was to identify spectral regions that might be diagnostic for photosynthetic pigments. Two well-proven techniques were used to aid in absorption band recognition, the continuum removal of the spectra and the fourth derivative. The findings in this study suggest that interpretation of absorption bands in remote sensing data, whether atmospherically corrected or not, has to be carefully reviewed when they are interpreted in terms of photosynthetic pigments. PMID:22574053
A derivation of the statistical characteristics of SAR imagery data. [Rayleigh speckle statistics]
NASA Technical Reports Server (NTRS)
Wu, C.
1981-01-01
Basic statistical properties of the speckle effect and the associated spatial correlation of SAR image data are discussed. Statistics of SAR sensed measurement and their relationships to the surface mean power reflectivity are derived. The Rayleigh speckle model is reviewed. Applications of the derived statistics to SAR radiometric measures and image processing are considered.
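The Rayleigh speckle model mentioned above can be illustrated with a short simulation. This is a generic sketch of fully developed single-look speckle, not the derivation in the report: each pixel is modeled as a complex Gaussian return whose real and imaginary parts are independent with equal variance, and the names `simulate_speckle` and `coefficient_of_variation` and the chosen mean power are assumptions made for the example.

```python
import math
import random
import statistics

def simulate_speckle(n_pixels, mean_power, seed=0):
    """Simulate single-look speckle for a uniform surface.

    Real and imaginary parts are independent zero-mean Gaussians, so the
    amplitude is Rayleigh distributed and the intensity (amplitude squared)
    is exponentially distributed with expectation mean_power.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(mean_power / 2.0)  # per-component standard deviation
    amplitudes, intensities = [], []
    for _ in range(n_pixels):
        re = rng.gauss(0.0, sigma)
        im = rng.gauss(0.0, sigma)
        intensity = re * re + im * im
        intensities.append(intensity)
        amplitudes.append(math.sqrt(intensity))
    return amplitudes, intensities

def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

amplitudes, intensities = simulate_speckle(n_pixels=20000, mean_power=2.0)
mean_intensity = statistics.fmean(intensities)
cv_intensity = coefficient_of_variation(intensities)
cv_amplitude = coefficient_of_variation(amplitudes)

print(f"mean intensity       : {mean_intensity:.3f}")
print(f"intensity std / mean : {cv_intensity:.3f}")
print(f"amplitude std / mean : {cv_amplitude:.3f}")
```

For this model the mean intensity estimates the surface mean power, the intensity standard deviation equals its mean, and the amplitude ratio approaches sqrt(4/pi - 1), about 0.52, which is why single-look speckle is so noisy relative to the underlying reflectivity.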
NASA Astrophysics Data System (ADS)
Dralle, D.; Karst, N.; Thompson, S. E.
2015-12-01
Multiple competing theories suggest that power law behavior governs the observed first-order dynamics of streamflow recessions - the important process by which catchments dry out via the stream network, altering the availability of surface water resources and in-stream habitat. Frequently modeled as: dq/dt = -aq^b, recessions typically exhibit a high degree of variability, even within a single catchment, as revealed by significant shifts in the values of "a" and "b" across recession events. One potential source of this variability lies in underlying, hard-to-observe fluctuations in how catchment water storage is partitioned amongst distinct storage elements, each having different discharge behaviors. Testing this and competing hypotheses with widely available streamflow timeseries, however, has been hindered by a power law scaling artifact that obscures meaningful covariation between the recession parameters, "a" and "b". Here we briefly outline a technique that removes this artifact, revealing intriguing new patterns in the joint distribution of recession parameters. Using long-term flow data from catchments in Northern California, we explore temporal variations, and find that the "a" parameter varies strongly with catchment wetness. Then we explore how the "b" parameter changes with "a", and find that measures of its variation are maximized at intermediate "a" values. We propose an interpretation of this pattern based on statistical mechanics, meaning "b" can be viewed as an indicator of the catchment "microstate" - i.e. the partitioning of storage - and "a" as a measure of the catchment macrostate (i.e. the total storage). In statistical mechanics, entropy (i.e. microstate variance, that is the variance of "b") is maximized for intermediate values of extensive variables (i.e. wetness, "a"), as observed in the recession data. This interpretation of "a" and "b" was supported by model runs using a multiple-reservoir catchment toy model, and lends support to the
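The power-law recession model in this abstract is commonly fitted by regressing log(-dq/dt) on log(q), since the slope then estimates the exponent and the intercept estimates the logarithm of the coefficient. The sketch below shows only that basic fit on a synthetic recession; it does not implement the artifact-removal technique the authors describe, and the parameter values and function names are assumptions made for illustration.

```python
import math

def simulate_recession(a, b, q0, dt, n_steps):
    """Analytic solution of dq/dt = -a * q**b for b > 1, sampled every dt."""
    return [
        (q0 ** (1 - b) + a * (b - 1) * i * dt) ** (-1.0 / (b - 1))
        for i in range(n_steps)
    ]

def fit_recession(q, dt):
    """Estimate (a, b) by least squares on log(-dq/dt) versus log(q).

    The derivative is approximated by finite differences and paired with
    the midpoint discharge of each time step.
    """
    xs, ys = [], []
    for q_now, q_next in zip(q[:-1], q[1:]):
        rate = (q_now - q_next) / dt          # -dq/dt, positive in a recession
        if rate > 0:
            xs.append(math.log(0.5 * (q_now + q_next)))
            ys.append(math.log(rate))
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b_hat = sxy / sxx
    a_hat = math.exp(y_mean - b_hat * x_mean)
    return a_hat, b_hat

# Synthetic 30-day recession with assumed parameters a = 0.1, b = 1.5.
flow = simulate_recession(a=0.1, b=1.5, q0=10.0, dt=0.1, n_steps=300)
a_hat, b_hat = fit_recession(flow, dt=0.1)
print(f"a_hat={a_hat:.4f} b_hat={b_hat:.4f}")
```

On noise-free synthetic data the fit recovers the assumed parameters closely. With real streamflow, measurement noise and the choice of units affect the estimates, and the units dependence of "a" is one reason its covariation with "b" needs careful handling.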
Empirical approach to interpreting card-sorting data
NASA Astrophysics Data System (ADS)
Wolf, Steven F.; Dougherty, Daniel P.; Kortemeyer, Gerd
2012-06-01
Since it was first published 30 years ago, the seminal paper of Chi et al. on expert and novice categorization of introductory problems led to a plethora of follow-up studies within and outside of the area of physics [Cogn. Sci. 5, 121 (1981), doi:10.1207/s15516709cog0502_2]. These studies frequently encompass “card-sorting” exercises whereby the participants group problems. While this technique certainly allows insights into problem solving approaches, simple descriptive statistics more often than not fail to find significant differences between experts and novices. In moving beyond descriptive statistics, we describe a novel microscopic approach that takes into account the individual identity of the cards and uses graph theory and models to visualize, analyze, and interpret problem categorization experiments. We apply these methods to an introductory physics (mechanics) problem categorization experiment, and find that most of the variation in sorting outcome is not due to the sorter being an expert versus a novice, but rather due to an independent characteristic that we named “stacker” versus “spreader.” The fact that the expert-novice distinction only accounts for a smaller amount of the variation may explain the frequent null results when conducting these experiments.
The Systematic Interpretation of Cosmic Ray Data (The Transport Project)
NASA Technical Reports Server (NTRS)
Guzik, T. Gregory
1997-01-01
The Transport project's primary goals were to: (1) Provide measurements of critical fragmentation cross sections; (2) Study the cross section systematics; (3) Improve the galactic cosmic ray propagation methodology; and (4) Use the new cross section measurements to improve the interpretation of cosmic ray data. To accomplish these goals a collaboration was formed consisting of researchers in the US at Louisiana State University (LSU), Lawrence Berkeley Laboratory (LBL), Goddard Space Flight Center (GSFC), the University of Minnesota (UM), New Mexico State University (NMSU), in France at the Centre d'Etudes de Saclay and in Italy at the Universita di Catania. The US institutions, led by LSU, were responsible for measuring new cross sections using the LBL HISS facility, analysis of these measurements and their application to interpreting cosmic ray data. France developed a liquid hydrogen target that was used in the HISS experiment and participated in the data interpretation. Italy developed a Multifunctional Neutron Spectrometer (MUFFINS) for the HISS runs to measure the energy spectra, angular distributions and multiplicities of neutrons emitted during the high energy interactions. The Transport Project was originally proposed to NASA during Summer, 1988 and funding began January, 1989. Transport was renewed twice (1991, 1994) and finally concluded at LSU on September 30, 1997. During the more than 8 years of effort we had two major experiment runs at LBL, obtained data on the interaction of twenty different beams with a liquid hydrogen target, completed the analysis of fifteen of these datasets obtaining 590 new cross section measurements, published nine journal articles as well as eighteen conference proceedings papers, and presented more than thirty conference talks.
Internet Data Analysis for the Undergraduate Statistics Curriculum
ERIC Educational Resources Information Center
Sanchez, Juana; He, Yan
2005-01-01
Statistics textbooks for undergraduates have not caught up with the enormous amount of analysis of Internet data that is taking place these days. Case studies that use Web server log data or Internet network traffic data are rare in undergraduate Statistics education. And yet these data provide numerous examples of skewed and bimodal…
Guidelines for Statistical Analysis of Percentage of Syllables Stuttered Data
ERIC Educational Resources Information Center
Jones, Mark; Onslow, Mark; Packman, Ann; Gebski, Val
2006-01-01
Purpose: The purpose of this study was to develop guidelines for the statistical analysis of percentage of syllables stuttered (%SS) data in stuttering research. Method: Data on %SS from various independent sources were used to develop a statistical model to describe this type of data. On the basis of this model, %SS data were simulated with…
Urovi, V; Jimenez-Del-Toro, O; Dubosson, F; Ruiz Torres, A; Schumacher, M I
2017-02-01
This paper describes a novel temporal logic-based framework for reasoning with continuous data collected from wearable sensors. The work is motivated by the Metabolic Syndrome, a cluster of conditions which are linked to obesity and unhealthy lifestyle. We assume that, by interpreting the physiological parameters of continuous monitoring, we can identify which patients have a higher risk of Metabolic Syndrome. We define temporal patterns for reasoning with continuous data and specify the coordination mechanisms for combining different sets of clinical guidelines that relate to this condition. The proposed solution is tested with data provided by twenty subjects, who used sensors for four days of continuous monitoring. The results are compared to the gold standard. The novelty of the framework lies in extending a temporal logic formalism, namely the Event Calculus, with temporal patterns. These patterns are helpful to specify the rules for reasoning with continuous data and in combining new knowledge into one consistent outcome that is tailored to the patient's profile. The overall approach opens new possibilities for delivering patient-tailored interventions and educational material before the patients present the symptoms of the disease.
Presentation and interpretation of chemical data for igneous rocks
Wright, T.L.
1974-01-01
Arguments are made in favor of using variation diagrams to plot analyses of igneous rocks and their derivatives and modeling differentiation processes by least-squares mixing procedures. These methods permit study of magmatic differentiation and related processes in terms of all of the chemical data available. Data are presented as they are reported by the chemist and specific processes may be modeled and either quantitatively described or rejected as inappropriate or too simple. Examples are given of the differing interpretations that can arise when data are plotted on an AEM ternary vs. the same data on a full set of MgO variation diagrams. Mixing procedures are illustrated with reference to basaltic lavas from the Columbia Plateau. © 1974 Springer-Verlag.
Reliability of travel time data computed from interpreted migrated events
NASA Astrophysics Data System (ADS)
Jannaud, L. R.
1995-02-01
In the Sequential Migration Aided Reflection Tomography (SMART) method, travel times used by reflection tomography are computed by tracing rays which propagate with the migration velocity and reflect from reflectors picked on migrated images. Because of limits of migration resolution, this picking involves inaccuracies, to which computed travel times are unfortunately very sensitive. The objective of this paper is to predict a priori the confidence we can have in emergence data, i.e., emergence point location and travel time, from the statistical information that describes the uncertainties of the reflectors. (These reflectors can be obtained by picking on migrated images as explained above or by any other method). The proposed method relies on a linearization of each step of the ray computation, allowing one to deduce, from the statistical properties of reflector fluctuations, the statistical properties of ray-tracing outputs. The computed confidences and correlations give access to a more realistic analysis of emergence data. Moreover, they can be used as inputs for reflection tomography to compute models that match travel times according to the confidence we have in the reflector. Applications on real data show that the uncertainties are generally large and, what is much more interesting, strongly varying from one ray to another. Taking them into account is therefore very important for both a better understanding of the kinematic information in the data and the computation of a model that matches these travel times.
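The linearization step described in this abstract can be written compactly as first-order uncertainty propagation. The sketch below is only an illustration under assumed notation (x for the reflector parameters, y for the emergence data, f for the ray-tracing map, C for covariance matrices); none of these symbols come from the paper itself.

```latex
% Hypothetical notation: x = reflector parameters, y = emergence data
% (emergence point location and travel time), f = ray-tracing map.
\begin{align}
  y &= f(x), \qquad x = \bar{x} + \delta x, \\
  \delta y &\approx J\,\delta x, \qquad
  J = \left.\frac{\partial f}{\partial x}\right|_{x=\bar{x}}, \\
  C_y &\approx J\, C_x\, J^{\mathsf{T}}.
\end{align}
```

Under this approximation, the diagonal of C_y gives the variance (confidence) of each emergence datum, and the off-diagonal terms give the correlations that the abstract proposes feeding into reflection tomography.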
Autonomous image data reduction by analysis and interpretation
NASA Technical Reports Server (NTRS)
Eberlein, Susan; Yates, Gigi; Ritter, Niles
1988-01-01
Image data is a critical component of the scientific information acquired by space missions. Compression of image data is required due to the limited bandwidth of the data transmission channel and limited memory space on the acquisition vehicle. This need becomes more pressing when dealing with multispectral data where each pixel may comprise 300 or more bytes. An autonomous, real time, on-board image analysis system for an exploratory vehicle such as a Mars Rover is developed. The completed system will be capable of interpreting image data to produce reduced representations of the image, and of making decisions regarding the importance of data based on current scientific goals. Data from multiple sources, including stereo images, color images, and multispectral data, are fused into single image representations. Analysis techniques emphasize artificial neural networks. Clusters are described by their outlines and class values. These analysis and compression techniques are coupled with decision making capacity for determining importance of each image region. Areas determined to be noise or uninteresting can be discarded in favor of more important areas. Thus limited resources for data storage and transmission are allocated to the most significant images.
Infrared spectroscopy for geologic interpretation of TIMS data
NASA Technical Reports Server (NTRS)
Bartholomew, Mary Jane
1986-01-01
The Portable Field Emission Spectrometer (PFES) was designed to collect meaningful spectra in the field under climatic, thermal, and sky conditions that approximate those at the time of the overflight. The specifications and procedures of PFES are discussed. Laboratory reflectance measurements of rocks and minerals were examined for the purpose of interpreting Thermal Infrared Multispectral Scanner (TIMS) data. The capability is currently being developed to perform direct laboratory measurement of the normal spectral radiance of Earth surface materials at low temperatures (20 to 30 C) at the Jet Propulsion Laboratory.
Interpretation of solar extinction data for stratospheric aerosols
NASA Technical Reports Server (NTRS)
Pepin, T. J.
1980-01-01
This paper discusses the inversion problem for aerosols using the solar extinction method. A series of numerical experiments is described in which solar extinction measurement systems are modeled. A numerical model of a solar extinction measurement system has been coupled with model atmospheres that exhibit fine scale structures to produce numerically generated data signals. These signals were then inverted to study the effect that measurement errors and desired vertical resolution produce in the inverted results. Knowledge of the trade-off between vertical resolution and the accuracy of inversion aids in the interpretation of the inverted results.
About the problems to interpret spectroscopic data from plasmas
NASA Astrophysics Data System (ADS)
Rosmej, F. B.; Guedda, E. H.; Lisitsa, V. S.; Capes, H.; Stamm, R.
2006-01-01
Continued developments in quantitative spectroscopy and related atomic physics are originating from inertial and magnetic fusion research. In almost all experimental facilities, non-equilibrium phenomena are now a central issue and the interpretation of related spectroscopic data is a great challenge. We discuss new general diagnostic/spectroscopic approaches and usual points of view: high density methods and high density atomic physics for magnetic fusion research like ITER and the Virtual Contour Shape Kinetic Theory (VCSKT), which unifies low and high density plasma regimes and therefore allows the use of complex satellite transitions in non-equilibrium, non-LTE and non-Coronal plasmas.
Geological Interpretation of PSInSAR Data at Regional Scale
Meisina, Claudia; Zucca, Francesco; Notti, Davide; Colombo, Alessio; Cucchi, Anselmo; Savio, Giuliano; Giannico, Chiara; Bianchi, Marco
2008-01-01
Results of a PSInSAR™ project carried out by the Regional Agency for Environmental Protection (ARPA) in Piemonte Region (Northern Italy) are presented and discussed. A methodology is proposed for the interpretation of the PSInSAR™ data at the regional scale, easy to use by the public administrations and by civil protection authorities. Potential and limitations of the PSInSAR™ technique for ground movement detection on a regional scale and monitoring are then estimated in relationship with different geological processes and various geological environments. PMID:27873940
Preliminary Interpretation of the MSL REMS Pressure Data
NASA Astrophysics Data System (ADS)
Haberle, Robert; Gómez-Elvira, Javier; de la Torre Juárez, Manuel; Harri, Ari-Matti; Hollingsworth, Jeffery; Kahanpää, Henrik; Kahre, Melinda; Martin-Torres, Javier; Mischna, Michael; Newman, Claire; Rafkin, Scot; Rennó, Nilton; Richardson, Mark; Rodríguez-Manfredi, Jose; Vasavada, Ashwin; Zorzano, Maria-Paz; REMS/MSL Science Teams
2013-04-01
The Rover Environmental Monitoring Station (REMS) on the Mars Science Laboratory (MSL) Curiosity rover consists of a suite of meteorological instruments that measure pressure, temperature (air and ground), wind (speed and direction), relative humidity, and the UV flux. A detailed description of the REMS sensors and their performance can be found in Gómez-Elvira et al. [2012, Space Science Reviews, 170(1-4), 583-640]. Here we focus on interpreting the first 100 sols of REMS operations with a particular emphasis on the pressure data. A unique feature of pressure data is that they reveal information on meteorological phenomena with time scales from seconds to years and spatial scales from local to global. From a single station we can learn about dust devils, regional circulations, thermal tides, synoptic weather systems, the CO2 cycle, dust storms, and interannual variability. Thus far MSL's REMS pressure sensor, provided by the Finnish Meteorological Institute and integrated into the REMS payload by Centro de Astrobiología, is performing flawlessly and our preliminary interpretation of its data includes the discovery of relatively dust-free convective vortices; a regional circulation system significantly modified by Gale crater and its central mound; the strongest thermal tides yet measured from the surface of Mars whose amplitudes and phases are very sensitive to fluctuations in global dust loading; and the classical signature of the seasonal cycling of carbon dioxide into and out of the polar caps.
Interpretation methodology and analysis of in-flight lightning data
NASA Technical Reports Server (NTRS)
Rudolph, T.; Perala, R. A.
1982-01-01
A methodology is presented whereby electromagnetic measurements of inflight lightning stroke data can be understood and extended to other aircraft. Recent measurements made on the NASA F106B aircraft indicate that sophisticated numerical techniques and new developments in corona modeling are required to fully understand the data. Thus the problem is nontrivial and successful interpretation can lead to a significant understanding of the lightning/aircraft interaction event. This is of particular importance because of the problem of lightning induced transient upset of new technology low level microcircuitry which is being used in increasing quantities in modern and future avionics. Inflight lightning data is analyzed and lightning environments incident upon the F106B are determined.
Flexibility in data interpretation: effects of representational format
Braithwaite, David W.; Goldstone, Robert L.
2013-01-01
Graphs and tables differentially support performance on specific tasks. For tasks requiring reading off single data points, tables are as good as or better than graphs, while for tasks involving relationships among data points, graphs often yield better performance. However, the degree to which graphs and tables support flexibility across a range of tasks is not well-understood. In two experiments, participants detected main and interaction effects in line graphs and tables of bivariate data. Graphs led to more efficient performance, but also lower flexibility, as indicated by a larger discrepancy in performance across tasks. In particular, detection of main effects of variables represented in the graph legend was facilitated relative to detection of main effects of variables represented in the x-axis. Graphs may be a preferable representational format when the desired task or analytical perspective is known in advance, but may also induce greater interpretive bias than tables, necessitating greater care in their use and design. PMID:24427145
Statistical Considerations of Data Processing in Giovanni Online Tool
NASA Technical Reports Server (NTRS)
Suhung, Shen; Leptoukh, G.; Acker, J.; Berrick, S.
2005-01-01
The GES DISC Interactive Online Visualization and Analysis Infrastructure (Giovanni) is a web-based interface for the rapid visualization and analysis of gridded data from a number of remote sensing instruments. The GES DISC currently employs several Giovanni instances to analyze various products, such as Ocean-Giovanni for ocean products from SeaWiFS and MODIS-Aqua; TOMS & OMI Giovanni for atmospheric chemical trace gases from TOMS and OMI, and MOVAS for aerosols from MODIS, etc. (http://giovanni.gsfc.nasa.gov). Foremost among the Giovanni statistical functions is data averaging. Two aspects of this function are addressed here. The first deals with the accuracy of averaging gridded mapped products vs. averaging from the ungridded Level 2 data. Some mapped products contain mean values only; others contain additional statistics, such as number of pixels (NP) for each grid, standard deviation, etc. Since NP varies spatially and temporally, averaging with or without weighting by NP will be different. In this paper, we address differences of various weighting algorithms for some datasets utilized in Giovanni. The second aspect is related to different averaging methods affecting data quality and interpretation for data with non-normal distribution. The present study demonstrates results of different spatial averaging methods using gridded SeaWiFS Level 3 mapped monthly chlorophyll a data. Spatial averages were calculated using three different methods: arithmetic mean (AVG), geometric mean (GEO), and maximum likelihood estimator (MLE). Biogeochemical data, such as chlorophyll a, are usually considered to have a log-normal distribution. The study determined that differences between methods tend to increase with increasing size of a selected coastal area, with no significant differences in most open oceans. The GEO method consistently produces values lower than AVG and MLE. The AVG method produces values larger than MLE in some cases, but smaller in other cases. Further
Efficient statistical mapping of avian count data
Royle, J. Andrew; Wikle, C.K.
2005-01-01
We develop a spatial modeling framework for count data that is efficient to implement in high-dimensional prediction problems. We consider spectral parameterizations for the spatially varying mean of a Poisson model. The spectral parameterization of the spatial process is very computationally efficient, enabling effective estimation and prediction in large problems using Markov chain Monte Carlo techniques. We apply this model to creating avian relative abundance maps from North American Breeding Bird Survey (BBS) data. Variation in the ability of observers to count birds is modeled as spatially independent noise, resulting in over-dispersion relative to the Poisson assumption. This approach represents an improvement over existing approaches used for spatial modeling of BBS data which are either inefficient for continental scale modeling and prediction or fail to accommodate important distributional features of count data thus leading to inaccurate accounting of prediction uncertainty.
Traumatic Brain Injury (TBI) Data and Statistics
... data.cdc.gov . Emergency Department Visits, Hospitalizations, and Deaths Rates of TBI-related Emergency Department Visits, Hospitalizations, ... related Hospitalizations by Age Group and Injury Mechanism Deaths Rates of TBI-related Deaths by Sex Rates ...
Vapor Pressure Data Analysis and Statistics
2016-12-01
several assumptions that are not exact. These are, primarily, that heat of vaporization (the slope of the vapor pressure curve) does not vary with...account the variation in heat of vaporization with temperature, and accurately describes data over broad experimental ranges, thereby enabling...units; however, the fit determined using one unit system will only correspond to that using the same data in another unit system if unrounded values
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Statistical analysis of life history calendar data.
Eerola, Mervi; Helske, Satu
2016-04-01
The life history calendar is a data-collection tool for obtaining reliable retrospective data about life events. To illustrate the analysis of such data, we compare the model-based probabilistic event history analysis and the model-free data mining method, sequence analysis. In event history analysis, we estimate instead of transition hazards the cumulative prediction probabilities of life events in the entire trajectory. In sequence analysis, we compare several dissimilarity metrics and contrast data-driven and user-defined substitution costs. As an example, we study young adults' transition to adulthood as a sequence of events in three life domains. The events define the multistate event history model and the parallel life domains in multidimensional sequence analysis. The relationship between life trajectories and excess depressive symptoms in middle age is further studied by their joint prediction in the multistate model and by regressing the symptom scores on individual-specific cluster indices. The two approaches complement each other in life course analysis; sequence analysis can effectively find typical and atypical life patterns while event history analysis is needed for causal inquiries.
Improved interpretation of satellite altimeter data using genetic algorithms
NASA Technical Reports Server (NTRS)
Messa, Kenneth; Lybanon, Matthew
1992-01-01
Genetic algorithms (GAs) are optimization techniques that are based on the mechanics of evolution and natural selection. They take advantage of the power of cumulative selection, in which successive incremental improvements in a solution structure become the basis for continued development. A GA is an iterative procedure that maintains a 'population' of 'organisms' (candidate solutions). Through successive 'generations' (iterations) the population as a whole improves in simulation of Darwin's 'survival of the fittest'. GAs have been shown to be successful where noise significantly reduces the ability of other search techniques to work effectively. Satellite altimetry provides useful information about oceanographic phenomena. It provides rapid global coverage of the oceans and is not as severely hampered by cloud cover as infrared imagery. Despite these and other benefits, several factors lead to significant difficulty in interpretation. The GA approach to the improved interpretation of satellite data involves the representation of the ocean surface model as a string of parameters or coefficients from the model. The GA searches, in parallel, a population of such representations (organisms) to obtain the individual that is best suited to 'survive', that is, the fittest as measured with respect to some 'fitness' function. The fittest organism is the one that best represents the ocean surface model with respect to the altimeter data.
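The population-and-fitness loop described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the two-parameter linear model stands in for the (unspecified) ocean surface model, the synthetic `DATA` stands in for altimeter data, and the names `fitness` and `evolve` are hypothetical, not taken from the paper.

```python
import random

def fitness(params, data):
    """Negative sum of squared errors of a toy model a + b*x.

    The linear model is a hypothetical stand-in for the ocean surface model.
    """
    a, b = params
    return -sum((a + b * x - y) ** 2 for x, y in data)

def evolve(data, pop_size=40, generations=200, sigma=0.1, seed=0):
    """Evolve a population of parameter strings; return the fittest one."""
    rng = random.Random(seed)
    # Initial population: random parameter strings (organisms).
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged (elitism).
        population.sort(key=lambda p: fitness(p, data), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            # Crossover: each gene comes from one parent; mutation adds noise.
            child = tuple(rng.choice(pair) + rng.gauss(0, sigma)
                          for pair in zip(mom, dad))
            children.append(child)
        population = survivors + children
    return max(population, key=lambda p: fitness(p, data))

# Synthetic, noise-free data generated from a = 1, b = 2 (illustrative only).
DATA = [(x, 1.0 + 2.0 * x) for x in range(10)]
best = evolve(DATA)
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population can never decrease from one generation to the next; this is one simple way to realize the "cumulative selection" the abstract refers to.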
Statistical inference for serial dilution assay data.
Lee, M L; Whitmore, G A
1999-12-01
Serial dilution assays are widely employed for estimating substance concentrations and minimum inhibitory concentrations. The Poisson-Bernoulli model for such assays is appropriate for count data but not for continuous measurements that are encountered in applications involving substance concentrations. This paper presents practical inference methods based on a log-normal model and illustrates these methods using a case application involving bacterial toxins.
Statistical Analysis of Japanese Structural Damage Data
1977-01-01
Calculated Peak Overpressure Data; Figure 4. Frequency Functions for Assumed Damage Laws; Figure 5. Conversions to Value of...Buildings; Figure 20. Effect of Damage Law on Confidence Regions; Figure 21. Comparison of Confidence Limits on...Value of ad (Cumulative Log Normal Damage Law); Figure 22. Comparison of Confidence Limits on Value of ad (Cumulative Log
... People with Blood Clots at Risk of Permanent Work-Related Disability CDC collaborated on a study of individuals who had participated in two previous ... VTE subsequently received a disability pension due to work-related disability. (Published ... Study Findings Multiple data sources needed for accurate reporting ...
Interpretation of MINOS data in terms of nonstandard neutrino interactions
NASA Astrophysics Data System (ADS)
Kopp, Joachim; Machado, Pedro A. N.; Parke, Stephen J.
2010-12-01
The MINOS experiment at Fermilab has recently reported a tension between the oscillation results for neutrinos and antineutrinos. We show that this tension, if it persists, can be understood in the framework of nonstandard neutrino interactions (NSI). While neutral current NSI (nonstandard matter effects) are disfavored by atmospheric neutrinos, a new charged current coupling between tau neutrinos and nucleons can fit the MINOS data without violating other constraints. In particular, we show that loop-level contributions to flavor-violating τ decays are sufficiently suppressed. However, conflicts with existing bounds could arise once the effective theory considered here is embedded into a complete renormalizable model. We predict the future sensitivity of the T2K and NOνA experiments to the NSI parameter region favored by the MINOS fit, and show that both experiments are excellent tools to test the NSI interpretation of the MINOS data.
Plausible inference and the interpretation of quantitative data
Nakhleh, C.W.
1998-02-01
The analysis of quantitative data is central to scientific investigation. Probability theory, which is founded on two rules, the sum and product rules, provides the unique, logically consistent method for drawing valid inferences from quantitative data. This primer on the use of probability theory is meant to fulfill a pedagogical purpose. The discussion begins at the foundation of scientific inference by showing how the sum and product rules of probability theory follow from some very basic considerations of logical consistency. The authors then develop general methods of probability theory that are essential to the analysis and interpretation of data. They discuss how to assign probability distributions using the principle of maximum entropy, how to estimate parameters from data, how to handle nuisance parameters whose values are of little interest, and how to determine which of a set of models is most justified by a data set. All these methods are used together in most realistic data analyses. Examples are given throughout to illustrate the basic points.
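For reference, the two rules named in this abstract, and the standard results that follow from them, can be written as below. The notation (propositions A and B, background information I, hypothesis H, data D, parameter of interest θ, nuisance parameter φ) is assumed here for illustration and is not taken from the primer.

```latex
% Hypothetical notation; I denotes background information.
\begin{align}
  \text{Sum rule:} \quad & P(A \mid I) + P(\bar{A} \mid I) = 1, \\
  \text{Product rule:} \quad & P(A, B \mid I) = P(A \mid B, I)\, P(B \mid I), \\
  \text{Bayes' theorem:} \quad & P(H \mid D, I)
      = \frac{P(D \mid H, I)\, P(H \mid I)}{P(D \mid I)}, \\
  \text{Marginalization:} \quad & P(\theta \mid D, I)
      = \int P(\theta, \phi \mid D, I)\, \mathrm{d}\phi .
\end{align}
```

Bayes' theorem follows from applying the product rule to P(H, D | I) in both orders, and marginalization (the usual treatment of nuisance parameters) follows from combining the sum and product rules.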
Transforming Graph Data for Statistical Relational Learning
2012-10-01
Lichtenwalter et al. (2010) investigate several supervised methods for link prediction in sparsely labeled networks, using many of the metrics from Table...for specific tasks such as social tagging (Lu, Hu, Chen, & ran Park, 2010) or temporal data (Huh & Fienberg, 2010; He & Parker, 2010). 6.3 Node...Proceedings of VLDB, pp. 102–114. He, D., & Parker, D. (2010). Topic Dynamics: an alternative model of ’Bursts’ in Streams of Topics. In Proceeding of the 16th
ERIC Educational Resources Information Center
Boysen, Guy A.
2015-01-01
Student evaluations of teaching are among the most accepted and important indicators of college teachers' performance. However, faculty and administrators can overinterpret small variations in mean teaching evaluations. The current research examined the effect of including statistical information on the interpretation of teaching evaluations.…
Interpretation of evidence in data by untrained medical students: a scenario-based study
2010-01-01
Background To determine which approach to assessment of evidence in data - statistical tests or likelihood ratios - comes closest to the interpretation of evidence by untrained medical students. Methods Empirical study of medical students (N = 842), untrained in statistical inference or in the interpretation of diagnostic tests. They were asked to interpret a hypothetical diagnostic test, presented in four versions that differed in the distributions of test scores in diseased and non-diseased populations. Each student received only one version. The intuitive application of the statistical test approach would lead to rejecting the null hypothesis of no disease in version A, and to accepting the null in version B. Application of the likelihood ratio approach led to opposite conclusions - against the disease in A, and in favour of disease in B. Version C tested the importance of the p-value (A: 0.04 versus C: 0.08) and version D the importance of the likelihood ratio (C: 1/4 versus D: 1/8). Results In version A, 7.5% concluded that the result was in favour of disease (compatible with p value), 43.6% ruled against the disease (compatible with likelihood ratio), and 48.9% were undecided. In version B, 69.0% were in favour of disease (compatible with likelihood ratio), 4.5% against (compatible with p value), and 26.5% undecided. Increasing the p value from 0.04 to 0.08 did not change the results. The change in the likelihood ratio from 1/4 to 1/8 increased the proportion of non-committed responses. Conclusions Most untrained medical students appear to interpret evidence from data in a manner that is compatible with the use of likelihood ratios. PMID:20796297
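The disagreement between the two approaches in version A (a p value of 0.04 alongside a likelihood ratio of 1/4) can be reproduced with a small sketch. The abstract does not give the score distributions used in the study, so the unit-variance normal distributions, the observed score of 1.75, and all variable names below are assumptions chosen only to yield numbers of the same size.

```python
import math
from statistics import NormalDist

# Hypothetical setup: test scores are N(0, 1) in the non-diseased population.
non_diseased = NormalDist(mu=0.0, sigma=1.0)
score = 1.75  # assumed observed test score

# One-sided p value under the null hypothesis of no disease.
p_value = 1.0 - non_diseased.cdf(score)

# Choose the diseased-population mean so that the likelihood ratio
# (diseased : non-diseased) at this score equals 1/4. For unit-variance
# normals, LR = exp(mu * score - mu**2 / 2), which gives a quadratic in mu.
target_lr = 0.25
mu_diseased = score + math.sqrt(score ** 2 - 2.0 * math.log(target_lr))
diseased = NormalDist(mu=mu_diseased, sigma=1.0)

likelihood_ratio = diseased.pdf(score) / non_diseased.pdf(score)

print(f"p value          = {p_value:.3f}")
print(f"likelihood ratio = {likelihood_ratio:.3f}")
```

In this constructed case the score is unusual for a healthy person (small p value) but even more unusual for a diseased one, so the likelihood ratio points against disease while a significance test would reject the null of no disease.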
Component outage data analysis methods. Volume 2: Basic statistical methods
NASA Astrophysics Data System (ADS)
Marshall, J. A.; Mazumdar, M.; McCutchan, D. A.
1981-08-01
Statistical methods for analyzing outage data on major power system components such as generating units, transmission lines, and transformers are identified. The analysis methods produce outage statistics from component failure and repair data that help in understanding the failure causes and failure modes of various types of components. Methods for forecasting outage statistics for those components used in the evaluation of system reliability are emphasized.
Statistical approach for evaluation of contraceptive data.
Tripathi, Vriyesh
2008-04-01
This article will define how best to analyse data collected from a longitudinal follow up on contraceptive use and discontinuation, with special consideration to the needs of developing countries. Accessibility and acceptability of contraceptives at the ground level remain low and it is an overlooked area of research. The author presents a set of propositions that are closer in spirit to practical recommendations than to formal theorems. We will comment specifically on issues of model validation through bootstrapping techniques. The paper presents a multivariate model to assess the rate of discontinuation of contraception, while accounting for the possibility that there may be factors that influence both a couple's choice of provider and their probability of discontinuation.
ERIC Educational Resources Information Center
McArthur, David; Chou, Chih-Ping
Diagnostic testing confronts several challenges at once, among which are issues of test interpretation and immediate modification of the test itself in response to the interpretation. Several methods are available for administering and evaluating a test in real-time, towards optimizing the examiner's chances of isolating a persistent pattern of…
Rapp, J.B.
1991-01-01
Q-mode factor analysis was used to quantitate the distribution of the major aliphatic hydrocarbon (n-alkanes, pristane, phytane) systems in sediments from a variety of marine environments. The compositions of the pure end members of the systems were obtained from factor scores and the distribution of the systems within each sample was obtained from factor loadings. All the data, from the diverse environments sampled (estuarine (San Francisco Bay), fresh-water (San Francisco Peninsula), polar-marine (Antarctica) and geothermal-marine (Gorda Ridge) sediments), were reduced to three major systems: a terrestrial system (mostly high molecular weight aliphatics with odd-numbered-carbon predominance), a mature system (mostly low molecular weight aliphatics without predominance) and a system containing mostly high molecular weight aliphatics with even-numbered-carbon predominance. With this statistical approach, it is possible to assign the percentage contribution from various sources to the observed distribution of aliphatic hydrocarbons in each sediment sample. © 1991.
Using Data from Climate Science to Teach Introductory Statistics
ERIC Educational Resources Information Center
Witt, Gary
2013-01-01
This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…
The Empirical Nature and Statistical Treatment of Missing Data
ERIC Educational Resources Information Center
Tannenbaum, Christyn E.
2009-01-01
Introduction. Missing data is a common problem in research and can produce severely misleading analyses, including biased estimates of statistical parameters, and erroneous conclusions. In its 1999 report, the APA Task Force on Statistical Inference encouraged authors to report complications such as missing data and discouraged the use of…
Interpretation of AMS-02 electrons and positrons data
Mauro, M. Di; Donato, F.; Fornengo, N.; Vittino, A.; Lineros, R.
2014-04-01
We perform a combined analysis of the recent AMS-02 data on electrons, positrons, electrons plus positrons and positron fraction, in a self-consistent framework where we realize a theoretical modeling of all the astrophysical components that can contribute to the observed fluxes in the whole energy range. The primary electron contribution is modeled through the sum of an average flux from distant sources and the fluxes from the local supernova remnants in the Green catalog. The secondary electron and positron fluxes originate from interactions on the interstellar medium of primary cosmic rays, for which we derive a novel determination by using AMS-02 proton and helium data. Primary positrons and electrons from pulsar wind nebulae in the ATNF catalog are included and studied in terms of their most significant (while loosely known) properties and under different assumptions (average contribution from the whole catalog, single dominant pulsar, a few dominant pulsars). We obtain a remarkable agreement between our various modeling and the AMS-02 data for all types of analysis, demonstrating that the whole AMS-02 leptonic data admit a self-consistent interpretation in terms of astrophysical contributions.
Experimental uncertainty estimation and statistics for data having interval uncertainty.
Kreinovich, Vladik (Applied Biomathematics, Setauket, New York); Oberkampf, William Louis (Applied Biomathematics, Setauket, New York); Ginzburg, Lev (Applied Biomathematics, Setauket, New York); Ferson, Scott (Applied Biomathematics, Setauket, New York); Hajagos, Janos (Applied Biomathematics, Setauket, New York)
2007-05-01
This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
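As a rough illustration of descriptive statistics on interval-valued data, the sketch below bounds the sample mean and median when each measurement is known only as a closed interval. It is not taken from the report: the function names and the sample data are hypothetical, and it relies only on the fact that the mean and the median never decrease when any single datum increases, so their extremes are reached at the interval endpoints.

```python
from statistics import median

def interval_mean(intervals):
    """Bounds on the sample mean when each datum is only known as [lo, hi]."""
    n = len(intervals)
    lo = sum(a for a, _ in intervals) / n
    hi = sum(b for _, b in intervals) / n
    return lo, hi

def interval_median(intervals):
    """Bounds on the sample median, obtained from the endpoint medians."""
    lo = median(a for a, _ in intervals)
    hi = median(b for _, b in intervals)
    return lo, hi

# Hypothetical measurements; the last one is a point value (zero-width interval).
data = [(1.0, 2.0), (2.5, 3.0), (0.5, 1.5), (4.0, 4.0)]
mean_bounds = interval_mean(data)
median_bounds = interval_median(data)
print("mean lies in", mean_bounds)
print("median lies in", median_bounds)
```

Endpoint substitution of this kind does not carry over to statistics such as the variance, which is not monotone in each datum; that is one reason computability depends on the characteristics of the intervals.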
Novice Interpretations of Visual Representations of Geosciences Data
NASA Astrophysics Data System (ADS)
Burkemper, L. K.; Arthurs, L.
2013-12-01
Past cognition research on individuals' perception and comprehension of bar and line graphs is substantive enough to have produced graph design principles and graph comprehension theories; however, gaps remain in our understanding of how people process visual representations of data, especially of geologic and atmospheric data. This pilot project serves to build on others' prior research and begin filling the existing gaps. The primary objectives of this pilot project include: (i) design a novel data collection protocol based on a combination of paper-based surveys, think-aloud interviews, and eye-tracking tasks to investigate student data handling skills of simple to complex visual representations of geologic and atmospheric data, (ii) demonstrate that the protocol yields results that shed light on student data handling skills, and (iii) generate preliminary findings upon which to base tentative but perhaps helpful recommendations on how to more effectively present these data to the non-scientist community and teach essential data handling skills. An effective protocol for the combined use of paper-based surveys, think-aloud interviews, and computer-based eye-tracking tasks for investigating cognitive processes involved in perceiving, comprehending, and interpreting visual representations of geologic and atmospheric data is instrumental to future research in this area. The outcomes of this pilot study provide the foundation upon which future more in-depth and scaled-up investigations can build. Furthermore, findings of this pilot project are sufficient for making, at least, tentative recommendations that can help inform (i) the design of physical attributes of visual representations of data, especially more complex representations, that may aid in improving students' data handling skills and (ii) instructional approaches that have the potential to aid students in more effectively handling visual representations of geologic and atmospheric data.
Yu, Victoria; Kishan, Amar U.; Cao, Minsong; Low, Daniel; Lee, Percy; Ruan, Dan
2014-03-15
Purpose: To demonstrate a new method of evaluating dose response of treatment-induced lung radiographic injury post-SBRT (stereotactic body radiotherapy) treatment and the discovery of bimodal dose behavior within clinically identified injury volumes. Methods: Follow-up CT scans at 3, 6, and 12 months were acquired from 24 patients treated with SBRT for stage-1 primary lung cancers or oligometastatic lesions. Injury regions in these scans were propagated to the planning CT coordinates by performing deformable registration of the follow-ups to the planning CTs. A bimodal behavior was repeatedly observed from the probability distribution for dose values within the deformed injury regions. Based on a mixture-Gaussian assumption, an Expectation-Maximization (EM) algorithm was used to obtain characteristic parameters for such distribution. Geometric analysis was performed to interpret such parameters and infer the critical dose level that is potentially inductive of post-SBRT lung injury. Results: The Gaussian mixture obtained from the EM algorithm closely approximates the empirical dose histogram within the injury volume with good consistency. The average Kullback-Leibler divergence values between the empirical differential dose volume histogram and the EM-obtained Gaussian mixture distribution were calculated to be 0.069, 0.063, and 0.092 for the 3, 6, and 12 month follow-up groups, respectively. The lower Gaussian component was located at approximately 70% prescription dose (35 Gy) for all three follow-up time points. The higher Gaussian component, contributed by the dose received by planning target volume, was located at around 107% of the prescription dose. Geometrical analysis suggests the mean of the lower Gaussian component, located at 35 Gy, as a possible indicator for a critical dose that induces lung injury after SBRT. Conclusions: An innovative and improved method for analyzing the correspondence between lung radiographic injury and SBRT treatment dose has
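The mixture-Gaussian fitting step can be illustrated with a minimal EM sketch for a two-component, one-dimensional Gaussian mixture. This is not the authors' implementation: the function names, the initialisation, the fixed iteration count, and the synthetic dose values are all assumptions. The synthetic samples are drawn around 35 Gy and 53.5 Gy only to mimic the 70% and 107% levels of a 50 Gy prescription implied by the abstract; the spreads are made up.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_gaussians(xs, iters=200):
    """Fit weights, means and stds of a 2-component 1-D Gaussian mixture."""
    xs = sorted(xs)
    n = len(xs)
    half = n // 2
    # Crude initialisation (an assumption): lower and upper halves of the data.
    mu = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    spread = (xs[-1] - xs[0]) / 4.0 or 1.0
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1] or 1e-300
            resp.append((p[0] / total, p[1] / total))
        # M-step: responsibility-weighted updates of the parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
            weight[k] = nk / n
    return weight, mu, sigma

# Synthetic, made-up dose samples (Gy) with two modes.
random.seed(0)
doses = [random.gauss(35.0, 3.0) for _ in range(300)]
doses += [random.gauss(53.5, 2.0) for _ in range(300)]
weight, mu, sigma = em_two_gaussians(doses)
print("weights:", weight)
print("means (Gy):", mu)
print("stds (Gy):", sigma)
```

A production fit would normally monitor the log-likelihood for convergence and try several initialisations; the fixed iteration count here only keeps the sketch short.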
Enhancements to the Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Hofmann, Martin O.
1993-01-01
The Engine Data Interpretation System (EDIS) expert system project assists the data review personnel at NASA/MSFC in performing post-test data analysis and engine diagnosis of the Space Shuttle Main Engine (SSME). EDIS uses knowledge of the engine, its components, and simple thermodynamic principles instead of, and in addition to, heuristic rules gathered from the engine experts. EDIS reasons in cooperation with human experts, following roughly the pattern of logic exhibited by human experts. EDIS concentrates on steady-state static faults, such as small leaks, and component degradations, such as pump efficiencies. The objective of this contract was to complete the set of engine component models, integrate heuristic rules into EDIS, integrate the Power Balance Model into EDIS, and investigate modification of the qualitative reasoning mechanisms to allow 'fuzzy' value classification. The result of this contract is an operational version of EDIS. EDIS will become a module of the Post-Test Diagnostic System (PTDS) and will, in this context, provide system-level diagnostic capabilities which integrate component-specific findings provided by other modules.
Phase 1 report on sensor technology, data fusion and data interpretation for site characterization
Beckerman, M.
1991-10-01
In this report we discuss sensor technology, data fusion and data interpretation approaches of possible maximal usefulness for subsurface imaging and characterization of land-fill waste sites. Two sensor technologies, terrain conductivity using electromagnetic induction and ground penetrating radar, are described and the literature on the subject is reviewed. We identify the maximum entropy stochastic method as one providing a rigorously justifiable framework for fusing the sensor data, briefly summarize work done by us in this area, and examine some of the outstanding issues with regard to data fusion and interpretation. 25 refs., 17 figs.
Statistical information of ASAR observations over wetland areas: An interaction model interpretation
NASA Astrophysics Data System (ADS)
Grings, F.; Salvia, M.; Karszenbaum, H.; Ferrazzoli, P.; Perna, P.; Barber, M.; Jacobo Berlles, J.
2010-01-01
This paper presents the results obtained after studying the relation between the statistical parameters that describe the backscattering distribution of junco marshes and their biophysical variables. The results are based on the texture analysis of a time series of Envisat ASAR C-band data (APP mode, VV + HH polarizations) acquired between October 2003 and January 2005 over the Lower Paraná River Delta, Argentina. The image power distributions were analyzed, and we show that the K distribution provides a good fit to SAR data extracted from wetland observations for both polarizations. We also show that the estimated values of the order parameter of the K distribution can be explained using fieldwork and reasonable assumptions. In order to explore these results, we introduce a radiative transfer based interaction model to simulate the junco marsh σ0 distribution. After analyzing model simulations, we found evidence that the order parameter is related to the junco plant density distribution inside the junco marsh patch. It is concluded that the order parameter of the K distribution could be a useful parameter to estimate the junco plant density. This result is important for basin hydrodynamic modeling, since marsh plant density is the most important parameter to estimate marsh water conductance.
Interpretations of the OSCAR data for reactive-gas scavenging
Easter, R.C.; Hales, J.M.
1982-11-01
A description is given of the application of a reactive scavenging model for the interpretation of data from the Oxidation and Scavenging Characteristics of April Rains (OSCAR) field study to evaluate scavenging mechanisms. The OSCAR experiment, conducted during April 1982, was a cooperative field investigation of wet removal by cyclonic storms. A part of the experiment involved intensive measurements at a site in NE Indiana and was designed to provide needed inputs for diagnostic scavenging models. Sequential precipitation chemistry, surface and airborne air chemistry, cloud physics, and meteorological measurements were performed. The model application reported here involves a single storm event at the Indiana site. Although the work presented involves the analysis of only a single precipitation event over a limited geographical area (10^4 km^2), the data utilized have considerable uncertainties, and the model contains numerous approximations, it is nevertheless concluded that the ability of the model to reproduce much of the observed precipitation chemistry behavior for the event is quite encouraging.
Eigenanalysis of SNP data with an identity by descent interpretation.
Zheng, Xiuwen; Weir, Bruce S
2016-02-01
Principal component analysis (PCA) is widely used in genome-wide association studies (GWAS), and the principal component axes often represent perpendicular gradients in geographic space. The explanation of PCA results is of major interest for geneticists to understand fundamental demographic parameters. Here, we provide an interpretation of PCA based on relatedness measures, which are described by the probability that sets of genes are identical-by-descent (IBD). An approximately linear transformation between ancestral proportions (AP) of individuals with multiple ancestries and their projections onto the principal components is found. In addition, a new method of eigenanalysis "EIGMIX" is proposed to estimate individual ancestries. EIGMIX is a method of moments with computational efficiency suitable for millions of SNP data, and it is not subject to the assumption of linkage equilibrium. With the assumptions of multiple ancestries and their surrogate ancestral samples, EIGMIX is able to infer ancestral proportions (APs) of individuals. The methods were applied to the SNP data from the HapMap Phase 3 project and the Human Genome Diversity Panel. The APs of individuals inferred by EIGMIX are consistent with the findings of the program ADMIXTURE. In conclusion, EIGMIX can be used to detect population structure and estimate genome-wide ancestral proportions with a relatively high accuracy.
Identification and interpretation of patterns in rocket engine data
NASA Technical Reports Server (NTRS)
Lo, C. F.; Wu, K.; Whitehead, B. A.
1993-01-01
A prototype software system was constructed to detect anomalous Space Shuttle Main Engine (SSME) behavior in the early stages of fault development, significantly earlier than the indication provided by either the redline detection mechanism or human expert analysis. The major task of the research project is to analyze ground test data, to identify patterns associated with the anomalous engine behavior, and to develop a pattern identification and detection system on the basis of this analysis. A prototype expert system, developed on both a PC and a Symbolics 3670 Lisp machine for detecting anomalies in turbopump vibration data, was checked with data from ground tests 902-473, 902-501, 902-519, and 904-097 of the Space Shuttle Main Engine. The neural networks method was also applied to supplement the statistical method utilized in the prototype system, in order to investigate the feasibility of detecting anomalies in SSME turbopump vibration. In most cases the anomalies detected by the expert system agree with those reported by NASA. With the neural networks approach, the successful detection rate was higher than 95 percent in identifying either normal or abnormal running conditions, based on experimental data as well as numerical simulation.
Using Data Mining to Teach Applied Statistics and Correlation
ERIC Educational Resources Information Center
Hartnett, Jessica L.
2016-01-01
This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…
NASA Astrophysics Data System (ADS)
Bouzid, Mohamed; Sellaoui, Lotfi; Khalfaoui, Mohamed; Belmabrouk, Hafedh; Lamine, Abdelmottaleb Ben
2016-02-01
In this work, we studied the adsorption of ethanol on three types of activated carbon, namely parent Maxsorb III and two chemically modified activated carbons (H2-Maxsorb III and KOH-H2-Maxsorb III). This investigation has been conducted on the basis of the grand canonical formalism in statistical physics and on simplified assumptions. This led to three-parameter equations describing the adsorption of ethanol onto the three types of activated carbon. There was a good correlation between experimental data and results obtained by the new proposed equation. The parameters characterizing the adsorption isotherm were the number of adsorbed molecule(s) per site n, the density of the receptor sites per unit mass of the adsorbent Nm, and the energetic parameter p1/2. They were estimated for the studied systems by a nonlinear least-squares regression. The results show that the ethanol molecules were adsorbed in perpendicular (or non-parallel) position to the adsorbent surface. The magnitude of the calculated adsorption energies reveals that ethanol is physisorbed onto activated carbon. Both van der Waals and hydrogen interactions were involved in the adsorption process. The calculated values of the specific surface AS proved that the three types of activated carbon have a highly microporous surface.
Bayesian Analysis of Order-Statistics Models for Ranking Data.
ERIC Educational Resources Information Center
Yu, Philip L. H.
2000-01-01
Studied the order-statistics models, extending the usual normal order-statistics model into one in which the underlying random variables followed a multivariate normal distribution. Used a Bayesian approach and the Gibbs sampling technique. Applied the proposed method to analyze presidential election data from the American Psychological…
Explorations in Statistics: The Analysis of Ratios and Normalized Data
ERIC Educational Resources Information Center
Curran-Everett, Douglas
2013-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This ninth installment of "Explorations in Statistics" explores the analysis of ratios and normalized--or standardized--data. As researchers, we compute a ratio--a numerator divided by a denominator--to compute a…
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
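A "what if" analysis of this kind can be sketched in a few lines: hold a standardized effect size fixed and watch the p value change as the sample size grows. The example below is an illustrative assumption, not the paper's Excel or R material. It considers two equal-sized groups, uses a normal approximation in place of the t distribution, and the function names are hypothetical.

```python
import math

def two_sided_p_from_d(d, n_per_group):
    """Approximate two-sided p value for a two-group mean difference.

    d is the standardized mean difference; with n per group the test
    statistic is roughly z = d * sqrt(n / 2). The normal approximation
    is used instead of the t distribution (an assumption of this sketch).
    """
    z = d * math.sqrt(n_per_group / 2.0)
    return math.erfc(abs(z) / math.sqrt(2.0))

def smallest_n_for_significance(d, alpha=0.05, n_max=100000):
    """Smallest per-group n at which the approximate p drops below alpha."""
    for n in range(2, n_max + 1):
        if two_sided_p_from_d(d, n) < alpha:
            return n
    return None

# What if the same effect (d = 0.3) had been observed with other sample sizes?
for n in (10, 20, 50, 100, 200):
    print(n, round(two_sided_p_from_d(0.3, n), 4))
print("per-group n needed:", smallest_n_for_significance(0.3))
```

The same effect size moves from clearly non-significant to clearly significant purely through n, which is the point such exercises are meant to make to students.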
NASA Astrophysics Data System (ADS)
Lee, J.; Chang, H.
2001-12-01
In this research, we investigate the reciprocal influence between groundwater flow and the salinization that occurred at two underground cavern sites, using major ion chemistry, PCA for chemical analysis data, and cross-correlation for various hydraulic data. The study areas are two underground LPG storage facilities constructed in the South Sea coastal region, Yosu, and the West Sea coastal region, Pyeongtaek, Korea. Considerably high concentrations of major cations and anions in groundwaters at both sites indicated brackish or saline water types. At the Yosu site, a large chemical difference between groundwater samples from the rainy and dry seasons was caused by temporal intrusion of high-saline water into the propane and butane cavern zone, but this was not the case at the Pyeongtaek site. Cl/Br ratios and the δ18O-δD distribution, used to trace the salinization source water at both sites, revealed that two kinds of saline water (seawater and halite-dissolved solution) could influence the groundwater salinization at the Yosu site, whereas only seawater intrusion could affect the groundwater chemistry of the observation wells at the Pyeongtaek site. PCA performed with 8 and 10 chemical ions as statistical variables at the two sites showed that intensive intrusion of seawater through the butane cavern occurred at the Yosu site, while seawater-groundwater mixing was observed at some observation wells located in the marginal part of the Pyeongtaek site. Cross-correlation results revealed that the positive relationship between hydraulic head and cavern operating pressure was far more conspicuous at the propane cavern zone at both sites (correlation coefficients of 65-90%). According to the cross-correlation results for the Yosu site, a small change of head could provoke massive influx of halite-dissolved solution from the surface through vertically developed fracture networks. However, at the Pyeongtaek site, the pressure-sensitive observation wells are not completely consistent with seawater-mixed wells, and the hydraulic change of heads at these wells related to the
NASA Astrophysics Data System (ADS)
Borradaile, Graham J.; Werner, Tomasz; Lagroix, France
2003-02-01
The Kapuskasing Structural Zone (KSZ) reveals a section through the Archean lower crustal granoblastic gneisses. Our new paleomagnetic data largely agree with previous work but we show that interpretations vary according to the choices of statistical, demagnetization and field-correction techniques. First, where the orientation distribution of characteristic remanence directions on the sphere is not symmetrically circular, the commonly used statistical model is invalid [Fisher, R.A., Proc. R. Soc. A217 (1953) 295]. Any tendency to form an elliptical distribution indicates that the sample is drawn from a Bingham-type population [Bingham, C., 1964. Distributions on the sphere and on the projective plane. PhD thesis, Yale University]. Fisher and Bingham statistics produce different confidence estimates from the same data and the traditionally defined mean vector may differ from the maximum eigenvector of an orthorhombic Bingham distribution. It seems prudent to apply both models wherever a non-Fisher population is suspected and that may be appropriate in any tectonized rocks. Non-Fisher populations require larger sample sizes so that focussing on individual sites may not be the most effective policy in tectonized rocks. More dispersed sampling across tectonic structures may be more productive. Second, from the same specimens, mean vectors isolated by thermal and alternating field (AF) demagnetization differ. Which treatment gives more meaningful results is difficult to decipher, especially in metamorphic rocks where the history of the magnetic minerals is not easily related to the ages of tectonic and petrological events. In this study, thermal demagnetization gave lower inclinations for paleomagnetic vectors and thus more distant paleopoles. Third, of more parochial significance, tilt corrections may be unnecessary in the KSZ because magnetic fabrics and thrust ramp are constant in orientation to the depth at which they level off, at approximately 15-km depth. With
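For reference, the Fisher (1953) summary statistics mentioned above can be computed as follows. This is a minimal sketch, not the authors' processing code: the function names and the four sample directions are hypothetical, and it uses the usual approximation k = (N - 1)/(N - R) for the precision parameter together with the standard expression for the semi-angle of the 95% confidence cone. It does not cover the Bingham case, which would require the eigenvectors of the orientation matrix.

```python
import math

def to_unit_vector(dec_deg, inc_deg):
    """Convert declination/inclination (degrees) to north, east, down components."""
    dec, inc = math.radians(dec_deg), math.radians(inc_deg)
    return (math.cos(inc) * math.cos(dec),
            math.cos(inc) * math.sin(dec),
            math.sin(inc))

def fisher_mean(directions, p=0.05):
    """Fisher mean direction, resultant length R, precision k and confidence cone."""
    n = len(directions)
    vecs = [to_unit_vector(d, i) for d, i in directions]
    sx = sum(v[0] for v in vecs)
    sy = sum(v[1] for v in vecs)
    sz = sum(v[2] for v in vecs)
    r = math.sqrt(sx * sx + sy * sy + sz * sz)
    mean_dec = math.degrees(math.atan2(sy, sx)) % 360.0
    mean_inc = math.degrees(math.asin(sz / r))
    k = (n - 1) / (n - r)  # undefined if all directions coincide (r == n)
    cos_a = 1.0 - ((n - r) / r) * ((1.0 / p) ** (1.0 / (n - 1)) - 1.0)
    alpha = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    return mean_dec, mean_inc, r, k, alpha

# Hypothetical characteristic remanence directions (declination, inclination).
sample = [(350.0, 60.0), (10.0, 60.0), (0.0, 50.0), (0.0, 70.0)]
mean_dec, mean_inc, r, k, alpha95 = fisher_mean(sample)
print("mean dec/inc:", mean_dec, mean_inc)
print("R:", r, "k:", k, "alpha95:", alpha95)
```

The confidence cone is circular by construction, which is exactly why it can mislead when the directions form an elongated, elliptical cluster.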
Integrative Analyses of Cancer Data: A Review from a Statistical Perspective
Wei, Yingying
2015-01-01
It has become increasingly common for large-scale public data repositories and clinical settings to have multiple types of data, including high-dimensional genomics, epigenomics, and proteomics data as well as survival data, measured simultaneously for the same group of biological samples, which provides unprecedented opportunities to understand cancer mechanisms from a more comprehensive scope and to develop new cancer therapies. Nevertheless, how to interpret a wealth of data into biologically and clinically meaningful information remains very challenging. In this paper, I review recent development in statistics for integrative analyses of cancer data. Topics will cover meta-analysis of homogeneous type of data across multiple studies, integrating multiple heterogeneous genomic data types, survival analysis with high-or ultrahigh-dimensional genomic profiles, and cross-data-type prediction where both predictors and responses are high-or ultrahigh-dimensional vectors. I compare existing statistical methods and comment on potential future research problems. PMID:26041968
Analysis of Accelerants in Fire Debris - Data Interpretation.
Bertsch, W
1997-06-01
Analysis of accelerants in fire debris involves the isolation of residual volatiles from the matrix and the analysis of these volatiles, usually by gas chromatography (GC). The resulting chromatograms are interpreted by comparing to a library of accelerant chromatograms obtained under similar conditions. This review first mentions ASTM's system in classifying fire accelerants into light petroleum distillates, gasoline, medium petroleum distillates, kerosene, heavy petroleum distillates, and unclassified compounds. Chromatograms with well-resolved n-alkane homolog patterns are most recognizable. Chromatograms that are inadequately resolved can be improved by columns having higher efficiency or selectivity, while those with too much interference can be improved by physical removal or reduction of these interfering compounds or selective detection. Using a mass spectrometer (MS) as the detector in GC/MS applications allows the display of common ions shared by compounds with similar structural features, thus greatly facilitating pattern recognition practices. Computer algorithms are now available for automated recognition of patterns possessed by various categories of accelerants. The state-of-the-art in forensic laboratories' analysis of accelerants in fire debris is presented as an appendix to this review. Data generated in annual proficiency tests over an 8-year period (1987-1995) revealed increased use of GC/MS instrumentation and some persisting problems, which include false positives and difficulties associated with component discrimination in the sample preparation process and recognition of partially evaporated distillates.
The Galactic Center: possible interpretations of observational data.
NASA Astrophysics Data System (ADS)
Zakharov, Alexander
2015-08-01
There are not too many astrophysical cases where one really has an opportunity to check predictions of general relativity in the strong gravitational field limit. For these aims the black hole at the Galactic Center is one of the most interesting cases since it is the closest supermassive black hole. Gravitational lensing is a natural phenomenon based on the effect of light deflection in a gravitational field (isotropic geodesics are not straight lines in a gravitational field, and in a weak gravitational field one has small corrections for light deflection, while the perturbative approach is not suitable for a strong gravitational field). Now there are two basic observational techniques to investigate a gravitational potential at the Galactic Center, namely, a) monitoring the orbits of bright stars near the Galactic Center to reconstruct a gravitational potential; b) measuring the size and shape of shadows around the black hole, giving an alternative possibility to evaluate black hole parameters in the mm-band with the VLBI technique. At the moment one can use a small relativistic correction approach for stellar orbit analysis (however, in the future the approximation will not be precise enough due to enormous progress of observational facilities), while for the smallest structure analysis in VLBI observations one really needs a strong gravitational field approximation. We discuss results of observations, their conventional interpretations, tensions between observations and models, and possible hints for new physics from the observational data and tensions between observations and interpretations.
Boyle temperature as a point of ideal gas in gentile statistics and its economic interpretation
NASA Astrophysics Data System (ADS)
Maslov, V. P.; Maslova, T. V.
2014-07-01
Boyle temperature is interpreted as the temperature at which the formation of dimers becomes impossible. To Irving Fisher's correspondence principle we assign two more quantities: the number of degrees of freedom, and credit. We determine the danger level of the mass of money M when the mutual trust between economic agents begins to fall.
Antweiler, R.C.; Taylor, H.E.
2008-01-01
The main classes of statistical treatment of below-detection limit (left-censored) environmental data for the determination of basic statistics that have been used in the literature are substitution methods, maximum likelihood, regression on order statistics (ROS), and nonparametric techniques. These treatments, along with using all instrument-generated data (even those below detection), were evaluated by examining data sets in which the true values of the censored data were known. It was found that for data sets with less than 70% censored data, the best technique overall for determination of summary statistics was the nonparametric Kaplan-Meier technique. ROS and the two substitution methods of assigning one-half the detection limit value to censored data or assigning a random number between zero and the detection limit to censored data were adequate alternatives. The use of these two substitution methods, however, requires a thorough understanding of how the laboratory censored the data. The technique of employing all instrument-generated data - including numbers below the detection limit - was found to be less adequate than the above techniques. At high degrees of censoring (greater than 70% censored data), no technique provided good estimates of summary statistics. Maximum likelihood techniques were found to be far inferior to all other treatments except substituting zero or the detection limit value to censored data.
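As a rough illustration of one of the substitution methods named above (assigning one-half the detection limit to censored values), the following sketch computes a mean from a small data set. The function names, the use of `None` to mark a censored value, and the sample numbers are all assumptions made for this example; they do not come from the study, and the Kaplan-Meier and ROS treatments are not shown.

```python
def substitute_half_dl(values, detection_limit):
    """Replace censored entries (marked None) with half the detection limit."""
    return [detection_limit / 2 if v is None else v for v in values]


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


# Hypothetical concentrations; None marks a result reported as below detection.
measurements = [0.8, None, 1.2, None, 2.0]
filled = substitute_half_dl(measurements, detection_limit=0.5)
summary_mean = mean(filled)
```

As the abstract cautions, a substitution like this is only defensible when it is clear how the laboratory censored the data.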
2013-01-01
Background High-throughput omics technologies such as microarrays and next-generation sequencing (NGS) have become indispensable tools in biological research. Computational analysis and biological interpretation of omics data can pose significant challenges due to a number of factors, in particular the systems integration required to fully exploit and compare data from different studies and/or technology platforms. In transcriptomics, the identification of differentially expressed genes when studying effect(s) or contrast(s) of interest constitutes the starting point for further downstream computational analysis (e.g. gene over-representation/enrichment analysis, reverse engineering) leading to mechanistic insights. Therefore, it is important to systematically store the full list of genes with their associated statistical analysis results (differential expression, t-statistics, p-value) corresponding to one or more effect(s) or contrast(s) of interest (termed "contrast data" for short) in a comparable manner and extract gene sets in order to efficiently support downstream analyses and further leverage data on a long-term basis. Filling this gap would open new research perspectives for biologists to discover disease-related biomarkers and to support the understanding of molecular mechanisms underlying specific biological perturbation effects (e.g. disease, genetic, environmental, etc.). Results To address these challenges, we developed Confero, a contrast data and gene set platform for downstream analysis and biological interpretation of omics data. The Confero software platform provides storage of contrast data in a simple and standard format, data transformation to enable cross-study and platform data comparison, and automatic extraction and storage of gene sets to build new a priori knowledge which is leveraged by integrated and extensible downstream computational analysis tools. Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are
Statistical methods of combining information: Applications to sensor data fusion
Burr, T.
1996-12-31
This paper reviews some statistical approaches to combining information from multiple sources. Promising new approaches will be described, and potential applications to combining not-so-different data sources such as sensor data will be discussed. Experiences with one real data set are described.
Promoting Statistical Thinking in Schools with Road Injury Data
ERIC Educational Resources Information Center
Woltman, Marie
2017-01-01
Road injury is an immediately relevant topic for 9-19 year olds. Current availability of Open Data makes it increasingly possible to find locally relevant data. Statistical lessons developed from these data can mutually reinforce life lessons about minimizing risk on the road. Devon County Council demonstrate how a wide array of statistical…
Statistical summaries of selected Iowa streamflow data through September 2013
Eash, David A.; O'Shea, Padraic S.; Weber, Jared R.; Nguyen, Kevin T.; Montgomery, Nicholas L.; Simonson, Adrian J.
2016-01-04
Statistical summaries of streamflow data collected at 184 streamgages in Iowa are presented in this report. All streamgages included for analysis have at least 10 years of continuous record collected before or through September 2013. This report is an update to two previously published reports that presented statistical summaries of selected Iowa streamflow data through September 1988 and September 1996. The statistical summaries include (1) monthly and annual flow durations, (2) annual exceedance probabilities of instantaneous peak discharges (flood frequencies), (3) annual exceedance probabilities of high discharges, and (4) annual nonexceedance probabilities of low discharges and seasonal low discharges. Also presented for each streamgage are graphs of the annual mean discharges, mean annual mean discharges, 50-percent annual flow-duration discharges (median flows), harmonic mean flows, mean daily mean discharges, and flow-duration curves. Two sets of statistical summaries are presented for each streamgage, which include (1) long-term statistics for the entire period of streamflow record and (2) recent-term statistics for or during the 30-year period of record from 1984 to 2013. The recent-term statistics are only calculated for streamgages with streamflow records pre-dating the 1984 water year and with at least 10 years of record during 1984–2013. The streamflow statistics in this report are not adjusted for the effects of water use; although some of this water is used consumptively, most of it is returned to the streams.
Social inequality: from data to statistical physics modeling
NASA Astrophysics Data System (ADS)
Chatterjee, Arnab; Ghosh, Asim; Inoue, Jun-ichi; Chakrabarti, Bikas K.
2015-09-01
Social inequality has been a topic of interest for ages, and has attracted researchers across disciplines to ponder its origin, manifestation, characteristics, consequences, and finally, the question of how to cope with it. It is manifested across different strata of human existence, and is quantified in several ways. In this review we discuss the origins of social inequality and the historical and commonly used non-entropic measures such as the Lorenz curve, the Gini index and the recently introduced k index. We also discuss some analytical tools that aid in understanding and characterizing them. Finally, we argue how statistical physics modeling helps in reproducing the results and interpreting them.
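To make one of the measures mentioned above concrete, here is a minimal sketch of the Gini index computed from a list of incomes, using the standard rank-weighted formula for a finite sample. The function name and the sample values are assumptions for illustration; the review's own treatment, including the k index, is not reproduced here.

```python
def gini(incomes):
    """Gini index of a finite sample of non-negative incomes.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_i are sorted in ascending order and i runs from 1 to n.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one income and a positive total")
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


equal_society = gini([1, 1, 1, 1])    # everyone holds the same amount
unequal_society = gini([0, 0, 0, 1])  # one person holds everything
```

With this finite-sample formula, perfect equality gives 0 and complete concentration in one of n individuals gives (n - 1) / n rather than exactly 1.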
Professional judgment and the interpretation of viable mold air sampling data.
Johnson, David; Thompson, David; Clinkenbeard, Rodney; Redus, Jason
2008-10-01
Although mold air sampling is technically straightforward, interpreting the results to decide if there is an indoor source is not. Applying formal statistical tests to mold sampling data is an error-prone practice due to the extreme data variability. With neither established exposure limits nor useful statistical techniques, indoor air quality investigators often must rely on their professional judgment, but the lack of a consensus "decision strategy" incorporating explicit decision criteria requires professionals to establish their own personal set of criteria when interpreting air sampling data. This study examined the level of agreement among indoor air quality practitioners in their evaluation of airborne mold sampling data and explored differences in inter-evaluator assessments. Eighteen investigators independently judged 30 sets of viable mold air sampling results to indicate: "definite indoor mold source," "likely indoor mold source," "not enough information to decide," "likely no indoor mold source," or "definitely no indoor mold source." Kappa coefficient analysis indicated weak inter-observer reliability, and comparison of evaluator mean scores showed clear inter-evaluator differences in their overall scoring patterns. The responses were modeled on indicator "traits" of the data sets using a generalized, linear mixed model approach and showed several traits to be associated with respondents' ratings, but they also demonstrated distinct and divergent inter-evaluator response patterns. Conclusions were that there was only weak overall agreement in evaluation of the mold sampling data, that particular traits of the data were associated with the conclusions reached, and that there were substantial inter-evaluator differences that were likely due to differences in the personal decision criteria employed by the individual evaluators. The overall conclusion was that there is a need for additional work to rigorously explore the constellation of decision criteria
Measuring and interpretation of three-component borehole magnetic data
NASA Astrophysics Data System (ADS)
Virgil, C.; Ehmann, S.; Hördt, A.; Leven, M.; Steveling, E.
2012-04-01
Three-component borehole magnetics provides important additional information compared with total field or horizontal and vertical measurements. The "Göttinger Bohrloch Magnetometer" (GBM) is capable of recording the vector of the magnetic field along with the orientation of the tool using three fluxgate magnetometers and fibre-optic gyros. The GBM was successfully applied in the Outokumpu Deep Drill Hole (OKU R2500), Finland in September 2008 and in the Louisville Seamount Trail (IODP Expedition 330) from December 2010 until February 2011, and in several shallower boreholes. With the declination of the magnetic field, the GBM provides additional information compared to conventional tools, which reduces the ambiguity for structural interpretation. The position of ferromagnetic objects in the vicinity of the borehole can be computed with higher accuracy. In the case of drilled-through structures, three-component borehole magnetics allow the computation of the vector of magnetization. Using supplementary susceptibility data, the natural remanent magnetization (NRM) vector can be derived, which yields information about the apparent polar wander curve and/or about the structural evolution of the rock units. The NRM vector can further be used to reorient core samples in regions of strong magnetization. The most important aspect in three-component borehole magnetics is the knowledge of the orientation of the probe along the drillhole. With the GBM we use three fibre-optic gyros (FOG), which are aligned orthogonal to each other. These instruments record the turning rate about the three main axes of the probe. The FOGs benefit from a high resolution (< 9 · 10⁻⁴ °) and a low drift (< 2 °/h). However, to reach optimal results, extensive data processing and calibration measurements are necessary. Properties to be taken into account are the misalignment, scaling factors and offsets of the fluxgate and FOG triplet, temperature dependent drift of the FOGs, misalignment of the
Mars Geological Province Designations for the Interpretation of GRS Data
NASA Technical Reports Server (NTRS)
Dohm, J. M.; Kerry, K.; Baker, V. R.; Boynton, W.; Maruyama, Shige; Anderson, R. C.
2005-01-01
elemental information, we have defined geologic provinces that represent significant windows into the geological evolution of Mars, unfolding the GEOMARS Theory and forming the basis for interpreting GRS data.
MacKinnon, David P; Pirlott, Angela G
2015-02-01
Statistical mediation methods provide valuable information about underlying mediating psychological processes, but the ability to infer that the mediator variable causes the outcome variable is more complex than widely known. Researchers have recently emphasized how violating assumptions about confounder bias severely limits causal inference of the mediator to dependent variable relation. Our article describes and addresses these limitations by drawing on new statistical developments in causal mediation analysis. We first review the assumptions underlying causal inference and discuss three ways to examine the effects of confounder bias when assumptions are violated. We then describe four approaches to address the influence of confounding variables and enhance causal inference, including comprehensive structural equation models, instrumental variable methods, principal stratification, and inverse probability weighting. Our goal is to further the adoption of statistical methods to enhance causal inference in mediation studies.
Influence of heterogeneity on the interpretation of pumping test data in leaky aquifers
NASA Astrophysics Data System (ADS)
Copty, Nadim K.; Trinchero, Paolo; Sanchez-Vila, Xavier; Sarioglu, Murat Savas; Findikakis, Angelos N.
2008-11-01
Pumping tests are routinely interpreted from the analysis of drawdown data and their derivatives. These interpretations result in a small number of apparent parameter values which lump the underlying heterogeneous structure of the aquifer. Key questions in such interpretations are (1) what is the physical meaning of those lumped parameters and (2) whether it is possible to infer some information about the spatial variability of the hydraulic parameters. The system analyzed in this paper consists of an aquifer separated from a second recharging aquifer by means of an aquitard. The natural log transforms of the transmissivity, ln T, and the vertical conductance of the aquitard, ln C, are modeled as two independent second-order stationary spatial random functions (SRFs). The Monte Carlo approach is used to simulate the time-dependent drawdown at a suite of observation points for different values of the statistical parameters defining the SRFs. Drawdown data at each observation point are independently used to estimate hydraulic parameters using three existing methods: (1) the inflection-point method, (2) curve-fitting, and (3) the double inflection-point method. The resulting estimated parameters are shown to be space dependent and vary with the interpretation method since each method gives different emphasis to different parts of the time-drawdown data. Moreover, the heterogeneity in the pumped aquifer or the aquitard influences the estimates in distinct manners. Finally, we show that, by combining the parameter estimates obtained from the different analysis procedures, information about the heterogeneity of the leaky aquifer system may be inferred.
Estimation of global network statistics from incomplete data.
Bliss, Catherine A; Danforth, Christopher M; Dodds, Peter Sheridan
2014-01-01
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week.
ERIC Educational Resources Information Center
Maltese, Adam V.; Svetina, Dubravka; Harsh, Joseph A.
2015-01-01
In the STEM fields, adequate proficiency in reading and interpreting graphs is widely held as a central element for scientific literacy given the importance of data visualizations to succinctly present complex information. Although prior research espouses methods to improve graphing proficiencies, there is little understanding about when and how…
Imputing historical statistics, soils information, and other land-use data to crop area
NASA Technical Reports Server (NTRS)
Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.
1982-01-01
In foreign crop condition monitoring, satellite acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map represent, at 60 deg latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.
Nonparametric statistical testing of EEG- and MEG-data.
Maris, Eric; Oostenveld, Robert
2007-08-15
In this paper, we show how ElectroEncephaloGraphic (EEG) and MagnetoEncephaloGraphic (MEG) data can be analyzed statistically using nonparametric techniques. Nonparametric statistical tests offer complete freedom to the user with respect to the test statistic by means of which the experimental conditions are compared. This freedom provides a straightforward way to solve the multiple comparisons problem (MCP) and it allows one to incorporate biophysically motivated constraints in the test statistic, which may drastically increase the sensitivity of the statistical test. The paper is written for two audiences: (1) empirical neuroscientists looking for the most appropriate data analysis method, and (2) methodologists interested in the theoretical concepts behind nonparametric statistical tests. For the empirical neuroscientist, a large part of the paper is written in a tutorial-like fashion, enabling neuroscientists to construct their own statistical test, maximizing the sensitivity to the expected effect. For the methodologist, it is explained why the nonparametric test is formally correct. This means that we formulate a null hypothesis (identical probability distribution in the different experimental conditions) and show that the nonparametric test controls the false alarm rate under this null hypothesis.
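The freedom to choose the test statistic can be illustrated with a generic two-sample permutation test, sketched below. This is a simplified stand-in, not the authors' EEG/MEG procedure: the function names, the absolute mean difference used as the statistic, the number of random permutations, and the sample values are all assumptions for this example.

```python
import random


def permutation_p_value(a, b, statistic, n_permutations=2000, seed=0):
    """Monte Carlo permutation p-value for a user-supplied two-sample statistic.

    Under the null hypothesis of identical distributions, condition labels are
    exchangeable, so the observed statistic is compared with its values on
    randomly relabelled data.
    """
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if statistic(perm_a, perm_b) >= observed:
            at_least_as_extreme += 1
    # Counting the observed labelling itself keeps the p-value above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def abs_mean_difference(a, b):
    """Example statistic: absolute difference of the two sample means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


# Hypothetical measurements from two experimental conditions.
condition_1 = [10.1, 10.4, 9.8, 10.9, 10.2, 10.6, 9.9, 10.3]
condition_2 = [0.2, 0.5, -0.1, 0.4, 0.0, 0.3, 0.6, 0.1]
p_separated = permutation_p_value(condition_1, condition_2, abs_mean_difference)
p_identical = permutation_p_value([1, 1, 1], [1, 1, 1], abs_mean_difference)
```

Any function of the two samples can be passed as `statistic`, which is the property the abstract highlights; a cluster-based statistic would be plugged in the same way.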
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data.
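For readers unfamiliar with the statistic itself, the sketch below computes the ordinary Cohen's kappa point estimate for two sets of paired ratings. It deliberately ignores clustering and does not implement the variance estimator proposed in the paper; the function name and the sample ratings are assumptions for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(ratings_a)
    # Proportion of pairs on which the two procedures agree.
    p_observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from the two marginal distributions.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical dichotomous results from two procedures on four subjects.
kappa_partial = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
kappa_perfect = cohen_kappa([1, 0, 1, 0], [1, 0, 1, 0])
```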
Data flow language and interpreter for a reconfigurable distributed data processor
Hurt, A.D.; Heath, J.R.
1982-01-01
An analytic language and an interpreter whereby an applications data flow graph may serve as an input to a reconfigurable distributed data processor is proposed. The architecture considered consists of a number of loosely coupled computing elements (CES) which may be linked to data and file memories through fully nonblocking interconnect networks. The real-time performance of such an architecture depends upon its ability to alter its topology in response to changes in application, asynchronous data rates and faults. Such a data flow language enhances the versatility of a reconfigurable architecture by allowing the user to specify the machine's topology at a very high level. 11 references.
Smolders, R.; Den Hond, E.; Koppen, G.; Govarts, E.; Willems, H.; Casteleyn, L.; Kolossa-Gehring, M.; Fiddicke, U.; Castaño, A.; Koch, H.M.; Angerer, J.; Esteban, M.; Sepai, O.; Exley, K.; Bloemen, L.; Horvat, M.; Knudsen, L.E.; Joas, A.; Joas, R.; Biot, P.; and others
2015-08-15
In 2011 and 2012, the COPHES/DEMOCOPHES twin projects performed the first ever harmonized human biomonitoring survey in 17 European countries. In more than 1800 mother–child pairs, individual lifestyle data were collected and cadmium, cotinine and certain phthalate metabolites were measured in urine. Total mercury was determined in hair samples. While the main goal of the COPHES/DEMOCOPHES twin projects was to develop and test harmonized protocols and procedures, the goal of the current paper is to investigate whether the observed differences in biomarker values among the countries implementing DEMOCOPHES can be interpreted using information from external databases on environmental quality and lifestyle. In general, 13 countries having implemented DEMOCOPHES provided high-quality data from external sources that were relevant for interpretation purposes. However, some data were not available for reporting or were not in line with predefined specifications. Therefore, only part of the external information could be included in the statistical analyses. Nonetheless, there was a highly significant correlation between national levels of fish consumption and mercury in hair, the strength of antismoking legislation was significantly related to urinary cotinine levels, and we were able to show indications that also urinary cadmium levels were associated with environmental quality and food quality. These results again show the potential of biomonitoring data to provide added value for (the evaluation of) evidence-informed policy making. - Highlights: • External data was collected to interpret HBM data from DEMOCOPHES. • Hg in hair could be related to fish consumption across different countries. • Urinary cotinine was related to strictness of anti-smoking legislation. • Urinary Cd was borderline significantly related to air and food quality. • Lack of comparable data among countries hampered the analysis.
Healthcare-Associated Infections (HAIs) Data and Statistics
... bring increased attention to HAIs and prevention. HAI Data Sources CDC's National Healthcare Safety Network (NHSN) CDC's ...
Data Warehousing: How To Make Your Statistics Meaningful.
ERIC Educational Resources Information Center
Flaherty, William
2001-01-01
Examines how one school district found a way to turn data collection from a disparate mountain of statistics into more useful information by using their Instructional Decision Support System. System software is explained as is how the district solved some data management challenges. (GR)
Using Carbon Emissions Data to "Heat Up" Descriptive Statistics
ERIC Educational Resources Information Center
Brooks, Robert
2012-01-01
This article illustrates using carbon emissions data in an introductory statistics assignment. The carbon emissions data has desirable characteristics including: choice of measure; skewness; and outliers. These complexities allow research and public policy debate to be introduced. (Contains 4 figures and 2 tables.)
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2012 CFR
2012-10-01
... studies, so as to be available upon request. In the case of experimental analyses, a clear and complete... adjustments, if any, to observed data shall be described. In the case of every kind of statistical study, the... input data shall be made available. (b) In the case of all studies and analyses offered in evidence...
Interpreting School Satisfaction Data from a Marketing Perspective.
ERIC Educational Resources Information Center
Pandiani, John A.; James, Brad C.; Banks, Steven M.
This paper presents results of a customer satisfaction survey of Vermont elementary and secondary public schools concerning satisfaction with mental health services during the 1996-97 school year. Analysis of completed questionnaires (N=233) are interpreted from a marketing perspective. Findings are reported for: (1) treated prevalence of…
ERIC Educational Resources Information Center
Olsen, Robert J.
2008-01-01
I describe how data pooling and data visualization can be employed in the first-semester general chemistry laboratory to introduce core statistical concepts such as central tendency and dispersion of a data set. The pooled data are plotted as a 1-D scatterplot, a purpose-designed number line through which statistical features of the data are…
Estimation of context for statistical classification of multispectral image data
NASA Technical Reports Server (NTRS)
Tilton, J. C.; Vardeman, S. B.; Swain, P. H.
1982-01-01
Recent investigations have demonstrated the effectiveness of a contextual classifier that combines spatial and spectral information employing a general statistical approach. This statistical classification algorithm exploits the tendency of certain ground cover classes to occur more frequently in some spatial contexts than in others. Indeed, a key input to this algorithm is a statistical characterization of the context: the context function. An unbiased estimator of the context function is discussed which, besides having the advantage of statistical unbiasedness, has the additional advantage over other estimation techniques of being amenable to an adaptive implementation in which the context-function estimate varies according to local contextual information. Results from applying the unbiased estimator to the contextual classification of three real Landsat data sets are presented and contrasted with results from noncontextual classifications and from contextual classifications utilizing other context-function estimation techniques.
Data analysis using the Gnu R system for statistical computation
Simone, James; /Fermilab
2011-07-01
R is a language system for statistical computation. It is widely used in statistics, bioinformatics, machine learning, data mining, quantitative finance, and the analysis of clinical drug trials. Among the advantages of R are: it has become the standard language for developing statistical techniques, it is being actively developed by a large and growing global user community, it is open source software, it is highly portable (Linux, OS-X and Windows), it has a built-in documentation system, it produces high quality graphics and it is easily extensible with over four thousand extension library packages available covering statistics and applications. This report gives a very brief introduction to R with some examples using lattice QCD simulation results. It then discusses the development of R packages designed for chi-square minimization fits for lattice n-pt correlation functions.
Statistics for correlated data: phylogenies, space, and time.
Ives, Anthony R; Zhu, Jun
2006-02-01
Here we give an introduction to the growing number of statistical techniques for analyzing data that are not independent realizations of the same sampling process--in other words, correlated data. We focus on regression problems, in which the value of a given variable depends linearly on the value of another variable. To illustrate different types of processes leading to correlated data, we analyze four simulated examples representing diverse problems arising in ecological studies. The first example is a comparison among species to determine the relationship between home-range area and body size; because species are phylogenetically related, they do not represent independent samples. The second example addresses spatial variation in net primary production and how this might be affected by soil nitrogen; because nearby locations are likely to have similar net primary productivity for reasons other than soil nitrogen, spatial correlation is likely. In the third example, we consider a time-series model to ask whether the decrease in density of a butterfly species is the result of decreases in its host-plant density; because the population density of a species in one generation is likely to affect the density in the following generation, time-series data are often correlated. The fourth example combines both spatial and temporal correlation in an experiment in which prey densities are manipulated to determine the response of predators to their food supply. For each of these examples, we use a different statistical approach for analyzing models of correlated data. Our goal is to give an overview of conceptual issues surrounding correlated data, rather than a detailed tutorial in how to apply different statistical techniques. By dispelling some of the mystery behind correlated data, we hope to encourage ecologists to learn about statistics that could be useful in their own work. Although at first encounter these techniques might seem complicated, they have the power to
Advanced petrophysical interpretation of nuclear well logging data
NASA Astrophysics Data System (ADS)
Kozhevnikov, D. A.; Lazutkina, N. Ye.
1995-04-01
A new approach to rock component analyses using “adaptive petrophysical tuning” provides three crucially new benefits: an original method for interpreting well logs; an algorithm for adaptive tuning and a reliable method of isolating reservoirs within a section. The latter can be regarded as a kind of “petrophysical filtration” based on using the dynamic porosity. Some results of component analyses of terrigenous deposits of the Tyumen suite (West Siberia) are presented.
Online Updating of Statistical Inference in the Big Data Setting.
Schifano, Elizabeth D; Wu, Jing; Wang, Chun; Yan, Jun; Chen, Ming-Hui
2016-01-01
We present statistical methods for big data arising from online analytical processing, where large amounts of data arrive in streams and require fast analysis without storage/access to the historical data. In particular, we develop iterative estimating algorithms and statistical inferences for linear models and estimating equations that update as new data arrive. These algorithms are computationally efficient, minimally storage-intensive, and allow for possible rank deficiencies in the subset design matrices due to rare-event covariates. Within the linear model setting, the proposed online-updating framework leads to predictive residual tests that can be used to assess the goodness-of-fit of the hypothesized model. We also propose a new online-updating estimator under the estimating equation setting. Theoretical properties of the goodness-of-fit tests and proposed estimators are examined in detail. In simulation studies and real data applications, our estimator compares favorably with competing approaches under the estimating equation setting.
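The core idea of updating estimates as data arrive, without storing the historical data, can be sketched for the simplest case of a one-predictor linear model by keeping only running sums. This is an illustrative simplification under stated assumptions: the class name and sample batches are hypothetical, and the paper's handling of rank deficiencies, estimating equations, and predictive residual tests is not reproduced.

```python
class OnlineSimpleRegression:
    """Least-squares fit of y = intercept + slope * x from streamed batches.

    Only five running sums are stored, so memory use does not grow with the
    number of observations seen.
    """

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0

    def update(self, xs, ys):
        """Fold a new batch into the running sums; the batch can then be discarded."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y

    def coefficients(self):
        """Return (intercept, slope) based on all data seen so far."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return intercept, slope


# Two hypothetical batches drawn from y = 1 + 2x, arriving one after the other.
model = OnlineSimpleRegression()
model.update([0, 1, 2], [1, 3, 5])
model.update([3, 4], [7, 9])
intercept, slope = model.coefficients()
```

The estimates after the second batch match what a fit on the full data set would give, because the running sums are sufficient statistics for this model.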
A note on the kappa statistic for clustered dichotomous data.
Zhou, Ming; Yang, Zhao
2014-06-30
The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data and two simulated clustered physician-patients dichotomous data are analyzed.
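For orientation, the point estimate of kappa for two raters and a dichotomous outcome can be sketched as follows. The function name and the counts are illustrative assumptions, not data from the paper, and the sketch deliberately ignores clustering: it gives the estimate only, not the cluster-aware variance estimators the abstract compares.

```python
def cohen_kappa(a, b, c, d):
    """Kappa for a 2x2 agreement table.

    a = both raters say yes, b = rater 1 yes / rater 2 no,
    c = rater 1 no / rater 2 yes, d = both raters say no.
    """
    n = a + b + c + d
    p_obs = (a + d) / n                       # observed agreement
    p_yes = ((a + b) / n) * ((a + c) / n)     # chance agreement on "yes"
    p_no = ((c + d) / n) * ((b + d) / n)      # chance agreement on "no"
    p_exp = p_yes + p_no
    # Undefined (division by zero) when chance agreement is exactly 1.
    return (p_obs - p_exp) / (1 - p_exp)


kappa = cohen_kappa(20, 5, 10, 15)  # hypothetical counts
```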
Method of interpretation of remotely sensed data and applications to land use
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Dossantos, A. P.; Foresti, C.; Demoraesnovo, E. M. L.; Niero, M.; Lombardo, M. A.
1981-01-01
Instructional material describing a methodology of remote sensing data interpretation and examples of applications to land use survey are presented. The image interpretation elements are discussed for different types of sensor systems: aerial photographs, radar, and MSS/LANDSAT. Visual and automatic LANDSAT image interpretation is emphasized.
NASA Astrophysics Data System (ADS)
Samfira, Ionel; Boldea, Marius; Popescu, Cosmin
2012-09-01
Significant parameters of permanent grasslands are represented by the pastoral value and Shannon and Simpson biodiversity indices. The dynamics of these parameters has been studied in several plant associations in Banat Plain, Romania. From the point of view of their typology, these permanent grasslands belong to the steppe area, series Festuca pseudovina, type Festuca pseudovina-Achillea millefolium, subtype Lolium perenne. The methods used for the purpose of this research included plant cover analysis (double meter method, calculation of Shannon and Simpson indices), and statistical methods of regression and correlation. The results show that, in the permanent grasslands in the plain region, when the pastoral value is average to low, the level of interspecific biodiversity is on the increase.
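The two biodiversity indices named above can be computed from species cover or abundance counts roughly as in this sketch. It assumes the natural-log Shannon index and the Gini-Simpson form (1 minus the sum of squared proportions); the abstract does not say which variants the authors used, and the counts are invented.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2); other Simpson variants exist."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


counts = [10, 10, 10, 10]  # four equally abundant species (hypothetical)
h = shannon_index(counts)
d = simpson_index(counts)
```

For a perfectly even community of S species, H equals ln S, which gives a quick sanity check on any implementation.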
Statistical interpretation of transient current power-law decay in colloidal quantum dot arrays
NASA Astrophysics Data System (ADS)
Sibatov, R. T.
2011-08-01
A new statistical model of the charge transport in colloidal quantum dot arrays is proposed. It takes into account Coulomb blockade forbidding multiple occupancy of nanocrystals and the influence of energetic disorder of interdot space. The model explains power-law current transients and the presence of the memory effect. The fractional differential analogue of the Ohm law is found phenomenologically for nanocrystal arrays. The model combines ideas that were considered as conflicting by other authors: the Scher-Montroll idea about the power-law distribution of waiting times in localized states for disordered semiconductors is applied taking into account Coulomb blockade; Novikov's condition about the asymptotic power-law distribution of time intervals between successful current pulses in conduction channels is fulfilled; and the carrier injection blocking predicted by Ginger and Greenham (2000 J. Appl. Phys. 87 1361) takes place.
Simpson's Paradox in the Interpretation of "Leaky Pipeline" Data
ERIC Educational Resources Information Center
Walton, Paul H.; Walton, Daniel J.
2016-01-01
The traditional "leaky pipeline" plots are widely used to inform gender equality policy and practice. Herein, we demonstrate how a statistical phenomenon known as Simpson's paradox can obscure trends in gender "leaky pipeline" plots. Our approach has been to use Excel spreadsheets to generate hypothetical "leaky…
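A small numerical sketch shows how Simpson's paradox can arise in this kind of progression data. All counts below are invented for illustration and are not taken from the paper: one group has the higher progression rate within every field yet the lower rate once the fields are pooled, because it is concentrated in the field with the lower overall rate.

```python
# (number progressing, number of applicants) per field; hypothetical counts
women = {"field_A": (9, 10), "field_B": (20, 100)}
men = {"field_A": (80, 100), "field_B": (1, 10)}


def rate(successes, total):
    return successes / total


def pooled_rate(groups):
    """Rate after summing successes and totals across all fields."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return successes / total


# Women have the higher rate in each field taken separately ...
women_ahead_in_each_field = all(rate(*women[f]) > rate(*men[f]) for f in women)
# ... but the lower rate when the fields are aggregated.
women_ahead_pooled = pooled_rate(women) > pooled_rate(men)
```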
Exploring Foundation Concepts in Introductory Statistics Using Dynamic Data Points
ERIC Educational Resources Information Center
Ekol, George
2015-01-01
This paper analyses introductory statistics students' verbal and gestural expressions as they interacted with a dynamic sketch (DS) designed using "Sketchpad" software. The DS involved numeric data points built on the number line whose values changed as the points were dragged along the number line. The study is framed on aggregate…
Data Desk Professional: Statistical Analysis for the Macintosh.
ERIC Educational Resources Information Center
Wise, Steven L.; Kutish, Gerald W.
This review of Data Desk Professional, a statistical software package for Macintosh microcomputers, includes information on: (1) cost and the amount and allocation of memory; (2) usability (documentation quality, ease of use); (3) running programs; (4) program output (quality of graphics); (5) accuracy; and (6) user services. In conclusion, it is…
Statistical Physics in the Era of Big Data
ERIC Educational Resources Information Center
Wang, Dashun
2013-01-01
With the wealth of data provided by a wide range of high-throughput measurement tools and technologies, statistical physics of complex systems is entering a new phase, impacting in a meaningful fashion a wide range of fields, from cell biology to computer science to economics. In this dissertation, by applying tools and techniques developed in…
Introduction to Statistics and Data Analysis With Computer Applications I.
ERIC Educational Resources Information Center
Morris, Carl; Rolph, John
This document consists of unrevised lecture notes for the first half of a 20-week in-house graduate course at Rand Corporation. The chapter headings are: (1) Histograms and descriptive statistics; (2) Measures of dispersion, distance and goodness of fit; (3) Using JOSS for data analysis; (4) Binomial distribution and normal approximation; (5)…
Statistical Modeling for Radiation Hardness Assurance: Toward Bigger Data
NASA Technical Reports Server (NTRS)
Ladbury, R.; Campola, M. J.
2015-01-01
New approaches to statistical modeling in radiation hardness assurance are discussed. These approaches yield quantitative bounds on flight-part radiation performance even in the absence of conventional data sources. This allows the analyst to bound radiation risk at all stages and for all decisions in the RHA process. It also allows optimization of RHA procedures for the project's risk tolerance.
Quick Access: Find Statistical Data on the Internet.
ERIC Educational Resources Information Center
Su, Di
1999-01-01
Provides an annotated list of Internet sources (World Wide Web, ftp, and gopher sites) for current and historical statistical business data, including selected interest rates, the Consumer Price Index, the Producer Price Index, foreign currency exchange rates, noon buying rates, per diem rates, the special drawing right, stock quotes, and mutual…
ERIC Educational Resources Information Center
Knirk, Frederick G.
Designed to assist educational researchers in utilizing microcomputers, this paper presents information on four types of computer software: writing tools for educators, statistical software designed to perform analyses of small and moderately large data sets, project management tools, and general education/research oriented information services…
Using Non-Linear Statistical Methods with Laboratory Kinetic Data
NASA Technical Reports Server (NTRS)
Anicich, Vincent
1997-01-01
This paper will demonstrate the usefulness of standard non-linear statistical analysis on ICR and SIFT kinetic data. The specific systems used in the demonstration are the isotopic and charge transfer reactions in the system of H2O+/D2O, H3O+/D2O, and other permutations.
Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology
NASA Astrophysics Data System (ADS)
Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.
2015-12-01
Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second order tensor. This tensor can be permuted into a five dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.
Statistical summaries of fatigue data for design purposes
NASA Technical Reports Server (NTRS)
Wirsching, P. H.
1983-01-01
Two methods are discussed for constructing a design curve on the safe side of fatigue data. Both the tolerance interval and equivalent prediction interval (EPI) concepts provide such a curve while accounting for both the distribution of the estimators in small samples and the data scatter. The EPI is also useful as a mechanism for providing necessary statistics on S-N data for a full reliability analysis which includes uncertainty in all fatigue design factors. Examples of statistical analyses of the general strain life relationship are presented. The tolerance limit and EPI techniques for defining a design curve are demonstrated. Examples using WASPALOY B and RQC-100 data demonstrate that a reliability model could be constructed by considering the fatigue strength and fatigue ductility coefficients as two independent random variables. A technique given for establishing the fatigue strength for high cycle lives relies on an extrapolation technique and also accounts for "runners." A reliability model or design value can be specified.
Methodology of remote sensing data interpretation and geological applications. [Brazil
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Veneziani, P.; Dosanjos, C. E.
1982-01-01
Elements of photointerpretation discussed include the analysis of photographic texture and structure as well as film tonality. The method used is based on conventional techniques developed for interpreting aerial black and white photographs. By defining the properties which characterize the form and individuality of dual images, homologous zones can be identified. Guy's logic method (1966) was adapted and used on functions of resolution, scale, and spectral characteristics of remotely sensed products. Applications of LANDSAT imagery are discussed for regional geological mapping, mineral exploration, hydrogeology, and geotechnical engineering in Brazil.
A COMPREHENSIVE STATISTICALLY-BASED METHOD TO INTERPRET REAL-TIME FLOWING MEASUREMENTS
Pinan Dawkrajai; Analis A. Romero; Keita Yoshioka; Ding Zhu; A.D. Hill; Larry W. Lake
2004-10-01
In this project, we are developing new methods for interpreting measurements in complex wells (horizontal, multilateral and multi-branching wells) to determine the profiles of oil, gas, and water entry. These methods are needed to take full advantage of "smart" well instrumentation, a technology that is rapidly evolving to provide the ability to continuously and permanently monitor downhole temperature, pressure, volumetric flow rate, and perhaps other fluid flow properties at many locations along a wellbore; and hence, to control and optimize well performance. In this first year, we have made considerable progress in the development of the forward model of temperature and pressure behavior in complex wells. In this period, we have progressed on three major parts of the forward problem of predicting the temperature and pressure behavior in complex wells. These three parts are the temperature and pressure behaviors in the reservoir near the wellbore, in the wellbore or laterals in the producing intervals, and in the build sections connecting the laterals, respectively. Many models exist to predict pressure behavior in reservoirs and wells, but these are almost always isothermal models. To predict temperature behavior we derived general mass, momentum, and energy balance equations for these parts of the complex well system. Analytical solutions for the reservoir and wellbore parts for certain special conditions show the magnitude of thermal effects that could occur. Our preliminary sensitivity analyses show that thermal effects caused by near-wellbore reservoir flow can cause temperature changes that are measurable with smart well technology. This is encouraging for the further development of the inverse model.
A Comprehensive Statistically-Based Method to Interpret Real-Time Flowing Measurements
Keita Yoshioka; Pinan Dawkrajai; Analis A. Romero; Ding Zhu; A. D. Hill; Larry W. Lake
2007-01-15
With the recent development of temperature measurement systems, continuous temperature profiles can be obtained with high precision. Small temperature changes can be detected by modern temperature measuring instruments such as fiber optic distributed temperature sensor (DTS) in intelligent completions and will potentially aid the diagnosis of downhole flow conditions. In vertical wells, since elevational geothermal changes make the wellbore temperature sensitive to the amount and the type of fluids produced, temperature logs can be used successfully to diagnose the downhole flow conditions. However, because geothermal temperature changes along the wellbore are small for horizontal wells, interpretation of a temperature log becomes difficult. The primary temperature differences for each phase (oil, water, and gas) are caused by frictional effects. Therefore, in developing a thermal model for a horizontal wellbore, subtle temperature changes must be accounted for. In this project, we have rigorously derived governing equations for a producing horizontal wellbore and developed a prediction model of the temperature and pressure by coupling the wellbore and reservoir equations. Also, we applied Ramey's model (1962) to the build section and used an energy balance to infer the temperature profile at the junction. The multilateral wellbore temperature model was applied to a wide range of cases at varying fluid thermal properties, absolute values of temperature and pressure, geothermal gradients, flow rates from each lateral, and the trajectories of each build section. With the prediction models developed, we present inversion studies of synthetic and field examples. These results are essential to identify water or gas entry, to guide flow control devices in intelligent completions, and to decide if reservoir stimulation is needed in particular horizontal sections. This study will complete and validate these inversion studies.
A Comprehensive Statistically-Based Method to Interpret Real-Time Flowing Measurements
Pinan Dawkrajai; Keita Yoshioka; Analis A. Romero; Ding Zhu; A.D. Hill; Larry W. Lake
2005-10-01
This project is motivated by the increasing use of distributed temperature sensors for real-time monitoring of complex wells (horizontal, multilateral and multi-branching wells) to infer the profiles of oil, gas, and water entry. Measured information can be used to interpret flow profiles along the wellbore including junction and build section. In this second project year, we have completed a forward model to predict temperature and pressure profiles in complex wells. As a comprehensive temperature model, we have developed an analytical reservoir flow model which takes into account Joule-Thomson effects in the near well vicinity and a multiphase non-isothermal producing wellbore model, and couples those models, accounting for mass and heat transfer between them. For further inferences such as water coning or gas evaporation, we will need a numerical non-isothermal reservoir simulator, and unlike existing (thermal recovery, geothermal) simulators, it should capture the subtle temperature changes occurring in normal production. We will show the results from the analytical coupled model (analytical reservoir solution coupled with numerical multi-segment well model) to infer the anomalous temperature or pressure profiles under various conditions, and the preliminary results from the numerical coupled reservoir model which solves full matrix including wellbore grids. We applied Ramey's model to the build section and used an enthalpy balance to infer the temperature profile at the junction. The multilateral wellbore temperature model was applied to a wide range of cases varying fluid thermal properties, absolute values of temperature and pressure, geothermal gradients, flow rates from each lateral, and the trajectories of each build section.
ERIC Educational Resources Information Center
Mickler, J. Ernest
This 60th annual report on collegiate enrollments in the United States is based on data received from 1,635 four-year institutions in the U.S., Puerto Rico, and the U.S. Territories. General notes, survey methodology notes, and a summary of findings are presented. Detailed statistical charts present institutional data on men and women students and…
Feature-Based Statistical Analysis of Combustion Simulation Data
Bennett, J; Krishnamoorthy, V; Liu, S; Grout, R; Hawkes, E; Chen, J; Pascucci, V; Bremer, P T
2011-11-18
We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence, fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Density Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion
Statistical Quality Control of Moisture Data in GEOS DAS
NASA Technical Reports Server (NTRS)
Dee, D. P.; Rukhovets, L.; Todling, R.
1999-01-01
A new statistical quality control algorithm was recently implemented in the Goddard Earth Observing System Data Assimilation System (GEOS DAS). The final step in the algorithm consists of an adaptive buddy check that either accepts or rejects outlier observations based on a local statistical analysis of nearby data. A basic assumption in any such test is that the observed field is spatially coherent, in the sense that nearby data can be expected to confirm each other. However, the buddy check resulted in excessive rejection of moisture data, especially during the Northern Hemisphere summer. The analysis moisture variable in GEOS DAS is water vapor mixing ratio. Observational evidence shows that the distribution of mixing ratio errors is far from normal. Furthermore, spatial correlations among mixing ratio errors are highly anisotropic and difficult to identify. Both factors contribute to the poor performance of the statistical quality control algorithm. To alleviate the problem, we applied the buddy check to relative humidity data instead. This variable explicitly depends on temperature and therefore exhibits a much greater spatial coherence. As a result, reject rates of moisture data are much more reasonable and homogeneous in time and space.
Interpreting and Reporting Radiological Water-Quality Data
McCurdy, David E.; Garbarino, John R.; Mullin, Ann H.
2008-01-01
This document provides information to U.S. Geological Survey (USGS) Water Science Centers on interpreting and reporting radiological results for samples of environmental matrices, most notably water. The information provided is intended to be broadly useful throughout the United States, but scientists who work at sites containing radioactive hazardous wastes should consult additional sources for more detailed information. The document is largely based on recognized national standards and guidance documents for radioanalytical sample processing, most notably the Multi-Agency Radiological Laboratory Analytical Protocols Manual (MARLAP), and on documents published by the U.S. Environmental Protection Agency and the American National Standards Institute. It does not include discussion of standard USGS practices including field quality-control sample analysis, interpretive report policies, and related issues, all of which shall always be included in any effort by the Water Science Centers. The use of 'shall' in this report signifies a policy requirement of the USGS Office of Water Quality.
Hysteresis model and statistical interpretation of energy losses in non-oriented steels
NASA Astrophysics Data System (ADS)
Mănescu (Păltânea), Veronica; Păltânea, Gheorghe; Gavrilă, Horia
2016-04-01
In this paper the hysteresis energy losses in two non-oriented industrial steels (M400-65A and M800-65A) were determined by means of an efficient classical Preisach model, which is based on the Pescetti-Biorci method for the identification of the Preisach density. The excess and the total energy losses were also determined, using a statistical framework based on magnetic object theory. The hysteresis energy losses in a non-oriented steel alloy depend on the peak magnetic polarization, and they can be computed using a Preisach model because in these materials there is a direct link between the elementary rectangular loops and the discontinuous character of the magnetization process (Barkhausen jumps). To determine the Preisach density it was necessary to measure the normal magnetization curve and the saturation hysteresis cycle. A system of equations was deduced and the Preisach density was calculated for a magnetic polarization of 1.5 T; then the hysteresis cycle was reconstructed. Using the same pattern for the Preisach distribution, the hysteresis cycle for 1 T was computed. The classical losses were calculated using a well-known formula and the excess energy losses were determined by means of the magnetic object theory. The total energy losses were mathematically reconstructed and compared with those measured experimentally.
Data and statistical methods for analysis of trends and patterns
Atwood, C.L.; Gentillon, C.D.; Wilson, G.E.
1992-11-01
This report summarizes topics considered at a working meeting on data and statistical methods for analysis of trends and patterns in US commercial nuclear power plants. This meeting was sponsored by the Office of Analysis and Evaluation of Operational Data (AEOD) of the Nuclear Regulatory Commission (NRC). Three data sets are briefly described: Nuclear Plant Reliability Data System (NPRDS), Licensee Event Report (LER) data, and Performance Indicator data. Two types of study are emphasized: screening studies, to see if any trends or patterns appear to be present; and detailed studies, which are more concerned with checking the analysis assumptions, modeling any patterns that are present, and searching for causes. A prescription is given for a screening study, and ideas are suggested for a detailed study, when the data take any of three forms: counts of events per time, counts of events per demand, and non-event data.
Statistical analysis of the seasonal variation in demographic data.
Fellman, J; Eriksson, A W
2000-10-01
There has been little agreement as to whether reproduction or similar demographic events occur seasonally and, especially, whether there is any universal seasonal pattern. One reason is that the seasonal pattern may vary in different populations and at different times. Another reason is that different statistical methods have been used. Every statistical model is based on certain assumed conditions and hence is designed to identify specific components of the seasonal pattern. Therefore, the statistical method applied should be chosen with due consideration. In this study we present, develop, and compare different statistical methods for the study of seasonal variation. Furthermore, we stress that the methods are applicable for the analysis of many kinds of demographic data. The first approaches in the literature were based on monthly frequencies, on the simple sine curve, and on the approximation that the months are of equal length. Later, "the population at risk" and the fact that the months have different lengths were considered. Under these later assumptions the targets of the statistical analyses are the rates. In this study we present and generalize the earlier models. Furthermore, we use trigonometric regression methods. The trigonometric regression model in its simplest form corresponds to the sine curve. We compare the regression methods with the earlier models and reanalyze some data. Our results show that models for rates eliminate the disturbing effects of the varying length of the months, including the effect of leap years, and of the seasonal pattern of the population at risk. Therefore, they give the purest analysis of the seasonal pattern of the demographic data in question, e.g., rates of general births, twin maternities, neural tube defects, and mortality. Our main finding is that the trigonometric regression methods are more flexible and easier to handle than the earlier methods, particularly when the data differ from the simple sine curve.
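The simplest trigonometric regression mentioned above, a single annual sine curve fitted to monthly rates, can be sketched as follows. The sketch assumes twelve rates placed at equally spaced angles, so it inherits the equal-month-length approximation that the abstract criticises when applied to raw frequencies; the function name and the synthetic data are illustrative only.

```python
import math


def fit_annual_sine(monthly_rates):
    """Least-squares fit of rate_m = a + b*cos(theta_m) + c*sin(theta_m),
    with theta_m = 2*pi*m/12 for m = 0..11."""
    if len(monthly_rates) != 12:
        raise ValueError("expected 12 monthly rates")
    thetas = [2 * math.pi * m / 12 for m in range(12)]
    a = sum(monthly_rates) / 12
    # With 12 equally spaced angles the regressors are mutually orthogonal,
    # so each least-squares coefficient has a closed form.
    b = sum(y * math.cos(t) for y, t in zip(monthly_rates, thetas)) / 6
    c = sum(y * math.sin(t) for y, t in zip(monthly_rates, thetas)) / 6
    amplitude = math.hypot(b, c)
    # Position of the fitted maximum, in months after month 0.
    peak_month = (math.atan2(c, b) % (2 * math.pi)) * 12 / (2 * math.pi)
    return a, b, c, amplitude, peak_month


# Synthetic rates generated from a known curve, to check the fit recovers it.
rates = [10 + 2 * math.cos(2 * math.pi * m / 12) + math.sin(2 * math.pi * m / 12)
         for m in range(12)]
a, b, c, amplitude, peak_month = fit_annual_sine(rates)
```

Adding cosine and sine terms at higher harmonics would give the more flexible models the abstract favours when the data depart from a simple sine curve.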
Statistical Treatment of Earth Observing System Pyroshock Separation Test Data
NASA Technical Reports Server (NTRS)
McNelis, Anne M.; Hughes, William O.
1998-01-01
The Earth Observing System (EOS) AM-1 spacecraft for NASA's Mission to Planet Earth is scheduled to be launched on an Atlas IIAS vehicle in June of 1998. One concern is that the instruments on the EOS spacecraft are sensitive to the shock-induced vibration produced when the spacecraft separates from the launch vehicle. By employing unique statistical analysis to the available ground test shock data, the NASA Lewis Research Center found that shock-induced vibrations would not be as great as the previously specified levels of Lockheed Martin. The EOS pyroshock separation testing, which was completed in 1997, produced a large quantity of accelerometer data to characterize the shock response levels at the launch vehicle/spacecraft interface. Thirteen pyroshock separation firings of the EOS and payload adapter configuration yielded 78 total measurements at the interface. The multiple firings were necessary to qualify the newly developed Lockheed Martin six-hardpoint separation system. Because of the unusually large amount of data acquired, Lewis developed a statistical methodology to predict the maximum expected shock levels at the interface between the EOS spacecraft and the launch vehicle. Then, this methodology, which is based on six shear plate accelerometer measurements per test firing at the spacecraft/launch vehicle interface, was used to determine the shock endurance specification for EOS. Each pyroshock separation test of the EOS spacecraft simulator produced its own set of interface accelerometer data. Probability distributions, histograms, the median, and higher order moments (skew and kurtosis) were analyzed. The data were found to be lognormally distributed, which is consistent with NASA pyroshock standards. Each set of lognormally transformed test data produced was analyzed to determine if the data should be combined statistically. Statistical testing of the data's standard deviations and means (F and t testing, respectively) determined if data sets were
Mathematical and statistical approaches for interpreting biomarker compounds in exhaled human breath
The various instrumental techniques, human studies, and diagnostic tests that produce data from samples of exhaled breath have one thing in common: they all need to be put into a context wherein a posed question can actually be answered. Exhaled breath contains numerous compoun...
Interpreting the Results of Weighted Least-Squares Regression: Caveats for the Statistical Consumer.
ERIC Educational Resources Information Center
Willett, John B.; Singer, Judith D.
In research, data sets often occur in which the variance of the distribution of the dependent variable at given levels of the predictors is a function of the values of the predictors. In this situation, the use of weighted least-squares (WLS) techniques is required. Weights suitable for use in a WLS regression analysis must be estimated. A…
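A minimal sketch of a weighted least-squares line fit may help fix ideas. It assumes the error variance at each observation is already known and uses inverse-variance weights; in practice, as the abstract stresses, the weights have to be estimated, which is where the caveats arise. The function name and data are hypothetical.

```python
def weighted_line_fit(xs, ys, variances):
    """Fit y = intercept + slope*x by weighted least squares,
    weighting each point by 1/variance (variances assumed known here)."""
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum  # weighted mean of x
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum  # weighted mean of y
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 8.0, 11.0, 14.0]        # points on y = 2 + 3x
variances = [1.0, 4.0, 9.0, 16.0]  # variance growing with x (hypothetical)
intercept, slope = weighted_line_fit(xs, ys, variances)
```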
Using demographic data to better interpret pitfall trap catches
Matalin, Andrey V.; Makarov, Kirill V.
2011-01-01
The results of pitfall trapping are often interpreted as abundance in a particular habitat. At the same time, there are numerous cases of almost unrealistically high catches of ground beetles in seemingly unsuitable sites. The correlation of catches by pitfall trapping with the true distribution and abundance of Carabidae needs corroboration. During a full year survey in 2006/07 in the Lake Elton region (Volgograd Area, Russia), 175 species of ground beetles were trapped. Considering the differences in demographic structure of the local populations, and not their abundances, three groups of species were recognized: residents, migrants and sporadic. In residents, the demographic structure of local populations is complete, and their habitats can be considered “residential”. In migrants and sporadic species, the demographic structure of the local populations is incomplete, and their habitats can be considered “transit”. Residents interact both with their prey and with each other in a particular habitat. Sporadic species are hardly important to a carabid community because of their low abundances. The contribution of migrants to the structure of carabid communities is not apparent and requires additional research. Migrants and sporadic species represent a “labile” component in ground beetle communities, as opposed to a “stable” component, represented by residents. The variability of the labile component substantially limits our interpretation of species diversity in carabid communities. Thus, the criteria for determining the most abundant, or dominant species inevitably vary because the abundance of migrants in some cases can be one order of magnitude higher than that of residents. The results of pitfall trapping adequately reflect the state of carabid communities only in zonal habitats, while azonal and disturbed habitats are merely transit ones for many species of ground beetles. A study of the demographic structure of local populations and
Kim, Kyoung-Ho; Yun, Seong-Taek; Choi, Byoung-Young; Chae, Gi-Tak; Joo, Yongsung; Kim, Kangjoo; Kim, Hyoung-Soo
2009-07-21
Hydrochemical and multivariate statistical interpretations of 16 physicochemical parameters of 45 groundwater samples from a riverside alluvial aquifer underneath an agricultural area in Osong, central Korea, were performed in this study to understand the spatial controls of nitrate concentrations in terms of biogeochemical processes occurring near oxbow lakes within a fluvial plain. Nitrate concentrations in groundwater showed a large variability from 0.1 to 190.6 mg/L (mean=35.0 mg/L) with significantly lower values near oxbow lakes. The evaluation of hydrochemical data indicated that the groundwater chemistry (especially, degree of nitrate contamination) is mainly controlled by two competing processes: 1) agricultural contamination and 2) redox processes. In addition, results of factorial kriging, consisting of two steps (i.e., co-regionalization and factor analysis), reliably showed a spatial control of the concentrations of nitrate and other redox-sensitive species; in particular, significant denitrification was observed restrictedly near oxbow lakes. The results of this study indicate that sub-oxic conditions in an alluvial groundwater system are developed geologically and geochemically in and near oxbow lakes, which can effectively enhance the natural attenuation of nitrate before the groundwater discharges to nearby streams. This study also demonstrates the usefulness of multivariate statistical analysis in groundwater study as a supplementary tool for interpretation of complex hydrochemical data sets.
Kissling, Grace E; Haseman, Joseph K; Zeiger, Errol
2015-09-02
A recent article by Gaus (2014) demonstrates a serious misunderstanding of the NTP's statistical analysis and interpretation of rodent carcinogenicity data as reported in Technical Report 578 (Ginkgo biloba) (NTP, 2013), as well as a failure to acknowledge the abundant literature on false positive rates in rodent carcinogenicity studies. The NTP reported Ginkgo biloba extract to be carcinogenic in mice and rats. Gaus claims that, in this study, 4800 statistical comparisons were possible, and that 209 of them were statistically significant (p<0.05) compared with 240 (4800×0.05) expected by chance alone; thus, the carcinogenicity of Ginkgo biloba extract cannot be definitively established. However, his assumptions and calculations are flawed since he incorrectly assumes that the NTP uses no correction for multiple comparisons, and that significance tests for discrete data operate at exactly the nominal level. He also misrepresents the NTP's decision making process, overstates the number of statistical comparisons made, and ignores the fact that the mouse liver tumor effects were so striking (e.g., p<0.0000000000001) that it is virtually impossible that they could be false positive outcomes. Gaus' conclusion that such obvious responses merely "generate a hypothesis" rather than demonstrate a real carcinogenic effect has no scientific credibility. Moreover, his claims regarding the high frequency of false positive outcomes in carcinogenicity studies are misleading because of his methodological misconceptions and errors.
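The chance-expectation arithmetic attributed to Gaus in the abstract above can be reproduced in a few lines. This is only a sketch of the naive calculation the authors criticize, assuming every comparison is tested at exactly the nominal level with no correction for multiple comparisons; the function name is hypothetical and the code does not represent the NTP's procedure.

```python
def expected_false_positives(n_comparisons, alpha):
    """Expected count of significant results if every null hypothesis is true
    and each test operates at exactly the nominal level alpha (naive assumption)."""
    return n_comparisons * alpha


if __name__ == "__main__":
    expected = expected_false_positives(4800, 0.05)
    observed = 209  # significant comparisons quoted in the abstract
    print(f"expected by chance: {expected:.0f}, observed: {observed}")
```

The abstract's point is that this comparison is misleading: tests on discrete data generally operate below the nominal level, and the number of comparisons actually made was overstated.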
NASA Astrophysics Data System (ADS)
Abraham, J. D.; Ball, L. B.; Bedrosian, P. A.; Cannia, J. C.; Deszcz-Pan, M.; Minsley, B. J.; Peterson, S. M.; Smith, B. D.
2009-12-01
contacts between hydrostratigraphic units. This provides a 3D image of the hydrostratigraphic units interpreted from the electrical resistivity derived from the HEM tied to statistical confidences on the picked contacts. The interpreted 2D and 3D data provides the groundwater modeler with a high-resolution hydrogeologic framework and a solid understanding of the uncertainty in the information it provides. This interpretation facilitates more informed modeling decisions, more accurate groundwater models, and development of more effective water-resources management strategies.
Collegiate Enrollments in the U.S., 1981-82. Statistics, Interpretations, and Trends.
ERIC Educational Resources Information Center
Mickler, J. Ernest
Data and narrative information are presented on college enrollments, based on a survey of institutions in the United States, Puerto Rico, and U.S. Territories. The total four-year college enrollment for fall 1981 was 7,530,013, of which 5,306,832 were full-time and 2,223,181 were part-time. The total two-year college enrollment for fall 1981 was…
Statistical Analysis of Strength Data for an Aerospace Aluminum Alloy
NASA Technical Reports Server (NTRS)
Neergaard, Lynn; Malone, Tina; Gentz, Steven J. (Technical Monitor)
2000-01-01
Aerospace vehicles are produced in limited quantities that do not always allow development of MIL-HDBK-5 A-basis design allowables. One method of examining production and composition variations is to perform 100% lot acceptance testing for aerospace Aluminum (Al) alloys. This paper discusses statistical trends seen in strength data for one Al alloy. A four-step approach reduced the data to residuals, visualized residuals as a function of time, grouped data with quantified scatter, and conducted analysis of variance (ANOVA).
Statistical comparison of similarity tests applied to speech production data
NASA Astrophysics Data System (ADS)
Kollia, H.; Jorgenson, Jay; Saint Fleur, Rose; Foster, Kevin
2004-05-01
Statistical analysis of data variability in speech production research has traditionally been addressed with the assumption of normally distributed error terms. The correct and valid application of statistical procedure requires a thorough investigation of the assumptions that underlie the methodology. In previous work [Kollia and Jorgenson, J. Acoust. Soc. Am. 102 (1997); 109 (2002)], it was shown that the error terms of speech production data in a linear regression can be modeled accurately using a quadratic probability distribution, rather than a normal distribution as is frequently assumed. The measurement used in the earlier Kollia-Jorgenson work involved the classical Kolmogorov-Smirnov statistical test. In the present work, the authors further explore the problem of analyzing the error terms coming from linear regression using a variety of known statistical tests, including, but not limited to, chi-square, Kolmogorov-Smirnov, Anderson-Darling, Cramer-von Mises, skewness and kurtosis, and Durbin. Our study complements a similar study by Shapiro, Wilk, and Chen [J. Am. Stat. Assoc. (1968)]. [Partial support provided by PSC-CUNY and NSF to Jay Jorgenson.]
Statistical modeling of natural backgrounds in hyperspectral LWIR data
NASA Astrophysics Data System (ADS)
Truslow, Eric; Manolakis, Dimitris; Cooley, Thomas; Meola, Joseph
2016-09-01
Hyperspectral sensors operating in the long wave infrared (LWIR) have a wealth of applications including remote material identification and rare target detection. While statistical models for modeling surface reflectance in visible and near-infrared regimes have been well studied, models for the temperature and emissivity in the LWIR have not been rigorously investigated. In this paper, we investigate modeling hyperspectral LWIR data using a statistical mixture model for the emissivity and surface temperature. Statistical models for the surface parameters can be used to simulate surface radiances and at-sensor radiance, which drives the variability of measured radiance and ultimately the performance of signal processing algorithms. Thus, having models that adequately capture data variation is extremely important for studying performance trades. The purpose of this paper is twofold. First, we study the validity of this model using real hyperspectral data, and compare the relative variability of hyperspectral data in the LWIR and visible and near-infrared (VNIR) regimes. Second, we illustrate how materials that are easily distinguished in the VNIR may be difficult to separate when imaged in the LWIR.
Bayesian Case Influence Measures for Statistical Models with Missing Data
Zhu, Hongtu; Ibrahim, Joseph G.; Cho, Hyunsoon; Tang, Niansheng
2011-01-01
We examine three Bayesian case influence measures including the φ-divergence, Cook's posterior mode distance and Cook's posterior mean distance for identifying a set of influential observations for a variety of statistical models with missing data including models for longitudinal data and latent variable models in the absence/presence of missing data. Since it can be computationally prohibitive to compute these Bayesian case influence measures in models with missing data, we derive simple first-order approximations to the three Bayesian case influence measures by using the Laplace approximation formula and examine the applications of these approximations to the identification of influential sets. All of the computations for the first-order approximations can be easily done using Markov chain Monte Carlo samples from the posterior distribution based on the full data. Simulated data and an AIDS dataset are analyzed to illustrate the methodology. PMID:23399928
A decision-theory approach to interpretable set analysis for high-dimensional data.
Boca, Simina M; Bravo, Héctor Corrada; Caffo, Brian; Leek, Jeffrey T; Parmigiani, Giovanni
2013-09-01
A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses.
Tunariu, Aneta D; Reavey, Paula
2007-12-01
This paper explores the notion of sexual boredom through combining the use of qualitative and quantitative methods. Drawing on ideas from discursive psychology, we provide an interpretative reading of both numerical and textual data obtained via a postal questionnaire. Within the mixed-methods strategy adopted here, the questionnaire is treated as a medium that can deliver interesting material about prevalent linguistic resources, their content and pattern of use, available to romantic partners in making sense of sexual boredom. A total of 144 women and 66 men from the general population completed a set of structured questions, including a Sexual Boredom Scale (SBS; Watt & Ewing, 1996), followed by an open-ended question prompting more elaborated views on the topic. Statistical analysis found gender to explain some of the variation across SBS scores. An interpretative analysis of respondent ratings of disagreement/agreement and the actual meaning content of the scale's statements also reveals ranked and gendered regularities. Written responses to the open-ended question were subjected to a thematic analysis, revealing how specific changes to quality of sex, intensity of sexual interest and degree of romantic relatedness with a current partner are used by participants to delineate key dimensions of sexual boredom. Overall, the unfolding narratives of sexual boredom are greatly indebted to a static view of relationship satisfaction founded on wishful expectations for consistent, idealized displays of sexual excitement and interest from oneself and one's partner. The interplay between these understandings and a missing discourse of sexuo-erotic calmness is also considered.
Probability and Statistics in Astronomical Machine Learning and Data Mining
NASA Astrophysics Data System (ADS)
Scargle, Jeffrey
2012-03-01
Statistical issues peculiar to astronomy have implications for machine learning and data mining. It should be obvious that statistics lies at the heart of machine learning and data mining. Further it should be no surprise that the passive observational nature of astronomy, the concomitant lack of sampling control, and the uniqueness of its realm (the whole universe!) lead to some special statistical issues and problems. As described in the Introduction to this volume, data analysis technology is largely keeping up with major advances in astrophysics and cosmology, even driving many of them. And I realize that there are many scientists with good statistical knowledge and instincts, especially in the modern era I like to call the Age of Digital Astronomy. Nevertheless, old impediments still lurk, and the aim of this chapter is to elucidate some of them. Many experiences with smart people doing not-so-smart things (cf. the anecdotes collected in the Appendix here) have convinced me that the cautions given here need to be emphasized. Consider these four points: 1. Data analysis often involves searches of many cases, for example, outcomes of a repeated experiment, for a feature of the data. 2. The feature comprising the goal of such searches may not be defined unambiguously until the search is carried out, or perhaps vaguely even then. 3. The human visual system is very good at recognizing patterns in noisy contexts. 4. People are much easier to convince of something they want to believe, or already believe, as opposed to unpleasant or surprising facts. One can argue that all four are good things during the initial, exploratory phases of most data analysis. They represent the curiosity and creativity of the scientific process, especially during the exploration of data collections from new observational programs such as all-sky surveys in wavelengths not accessed before or sets of images of a planetary surface not yet explored. On the other hand, confirmatory scientific
Virsik, R.P.; Harder, D.
1981-01-01
The hypothesis that overdispersion of the chromosome aberration number per cell results from multiple aberrations per particle traversal is investigated in mathematical terms. At a given absorbed dose, Poisson distributions are assumed both for the number of ionizing particles traversing a cell nucleus and for the number of aberrations induced by a single particle traversal. The resulting distribution of the number of aberrations per cell is the Neyman type A distribution, a special case of the generalized Poisson distribution. This function is generally overdispersed, its relative variance 1 + lambda being determined by the expectation value lambda of aberrations per particle traversal. Data from experiments with neutrons and α particles are found to agree with this theory. The developed formalism provides a method to determine the efficiency of aberration induction per particle traversal, lambda, from the frequency distribution of aberrations.
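The relative variance of 1 + lambda stated in the abstract above can be checked numerically. The following is a minimal simulation sketch, not the authors' formalism: the parameter values (a mean of 2 traversals per nucleus and lambda = 0.5), the function names, and the use of Knuth's multiplication method for Poisson sampling are all assumptions made for illustration.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate by Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_aberrations(mean_traversals, lam, rng):
    """Aberrations in one cell: a Poisson number of traversals, each inducing
    a Poisson(lam) number of aberrations (a Neyman type A variate)."""
    traversals = sample_poisson(mean_traversals, rng)
    return sum(sample_poisson(lam, rng) for _ in range(traversals))


def relative_variance(samples):
    """Sample variance divided by sample mean (equal to 1 for a pure Poisson)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return variance / mean


if __name__ == "__main__":
    rng = random.Random(0)
    lam = 0.5  # hypothetical mean number of aberrations per traversal
    cells = [sample_aberrations(2.0, lam, rng) for _ in range(100_000)]
    print(f"relative variance: {relative_variance(cells):.3f} (theory: {1 + lam})")
```

Inverting the same relation gives a simple moment estimate of lambda from an observed frequency distribution: the relative variance minus one.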
A statistical model for iTRAQ data analysis.
Hill, Elizabeth G; Schwacke, John H; Comte-Walters, Susana; Slate, Elizabeth H; Oberg, Ann L; Eckel-Passow, Jeanette E; Therneau, Terry M; Schey, Kevin L
2008-08-01
We describe biological and experimental factors that induce variability in reporter ion peak areas obtained from iTRAQ experiments. We demonstrate how these factors can be incorporated into a statistical model for use in evaluating differential protein expression and highlight the benefits of using analysis of variance to quantify fold change. We demonstrate the model's utility based on an analysis of iTRAQ data derived from a spike-in study.
Adaptive statistical pattern classifiers for remotely sensed data
NASA Technical Reports Server (NTRS)
Gonzalez, R. C.; Pace, M. O.; Raulston, H. S.
1975-01-01
A technique for the adaptive estimation of nonstationary statistics necessary for Bayesian classification is developed. The basic approach to the adaptive estimation procedure consists of two steps: (1) an optimal stochastic approximation of the parameters of interest and (2) a projection of the parameters in time or position. A divergence criterion is developed to monitor algorithm performance. Comparative results of adaptive and nonadaptive classifier tests are presented for simulated four dimensional spectral scan data.
Computational and Statistical Analysis of Protein Mass Spectrometry Data
Noble, William Stafford; MacCoss, Michael J.
2012-01-01
High-throughput proteomics experiments involving tandem mass spectrometry produce large volumes of complex data that require sophisticated computational analyses. As such, the field offers many challenges for computational biologists. In this article, we briefly introduce some of the core computational and statistical problems in the field and then describe a variety of outstanding problems that readers of PLoS Computational Biology might be able to help solve. PMID:22291580
A Geophysical Atlas for Interpretation of Satellite-derived Data
NASA Technical Reports Server (NTRS)
Lowman, P. D., Jr. (Editor); Frey, H. V. (Editor); Davis, W. M.; Greenberg, A. P.; Hutchinson, M. K.; Langel, R. A.; Lowrey, B. E.; Marsh, J. G.; Mead, G. D.; Okeefe, J. A.
1979-01-01
A compilation of maps of global geophysical and geological data plotted on a common scale and projection is presented. The maps include satellite gravity, magnetic, seismic, volcanic, tectonic activity, and mantle velocity anomaly data. Bibliographic references for all maps are included.
Helping Students Interpret Large-Scale Data Tables
ERIC Educational Resources Information Center
Prodromou, Theodosia
2016-01-01
New technologies have completely altered the ways that citizens can access data. Indeed, emerging online data sources give citizens access to an enormous amount of numerical information that provides new sorts of evidence used to influence public opinion. In this new environment, two trends have had a significant impact on our increasingly…
Noshing on Numbers: Using and Interpreting Data in Activities
NASA Astrophysics Data System (ADS)
Shupla, C. B.
2014-07-01
Students must learn how to plot and analyze data as a fundamental science and math skill. Data must also be incorporated into activities in meaningful ways that allow students to build understanding of the concepts being shared. In this workshop, attendees participated in three graphing activities, which served as the basis for discussion of these numerical literacy issues in the science classroom.
Pre-Service Teachers' Interpretation of CBM Progress Monitoring Data
ERIC Educational Resources Information Center
Wagner, Dana L.; Hammerschmidt-Snidarich, Stephanie M.; Espin, Christine A.; Seifert, Kathleen; McMaster, Kristen L.
2017-01-01
Teachers must be proficient at using data to evaluate the effects of instructional strategies and interventions, and must be able to make, describe, justify, and validate their data-based instructional decisions to parents, students, and educational colleagues. An important related skill is the ability to accurately read and interpret…
The GEOS Ozone Data Assimilation System: Specification of Error Statistics
NASA Technical Reports Server (NTRS)
Stajner, Ivanka; Riishojgaard, Lars Peter; Rood, Richard B.
2000-01-01
A global three-dimensional ozone data assimilation system has been developed at the Data Assimilation Office of the NASA/Goddard Space Flight Center. The Total Ozone Mapping Spectrometer (TOMS) total ozone and the Solar Backscatter Ultraviolet (SBUV) or (SBUV/2) partial ozone profile observations are assimilated. The assimilation, into an off-line ozone transport model, is done using the global Physical-space Statistical Analysis Scheme (PSAS). This system became operational in December 1999. A detailed description of the statistical analysis scheme, and in particular, the forecast and observation error covariance models is given. A new global anisotropic horizontal forecast error correlation model accounts for a varying distribution of observations with latitude. Correlations are largest in the zonal direction in the tropics where data is sparse. The forecast error variance model is proportional to the ozone field. The forecast error covariance parameters were determined by maximum likelihood estimation. The error covariance models are validated using χ² statistics. The analyzed ozone fields in the winter 1992 are validated against independent observations from ozone sondes and HALOE. There is better than 10% agreement between mean Halogen Occultation Experiment (HALOE) and analysis fields between 70 and 0.2 hPa. The global root-mean-square (RMS) difference between TOMS observed and forecast values is less than 4%. The global RMS difference between SBUV observed and analyzed ozone between 50 and 3 hPa is less than 15%.
The statistical analysis of multivariate serological frequency data.
Reyment, Richard A
2005-11-01
Data occurring in the form of frequencies are common in genetics, for example in serology. Examples are provided by the ABO group, the Rhesus group, and also DNA data. The statistical analysis of tables of frequencies is carried out using the available methods of multivariate analysis with usually three principal aims. One of these is to seek meaningful relationships between the components of a data set, the second is to examine relationships between populations from which the data have been obtained, the third is to bring about a reduction in dimensionality. This latter aim is usually realized by means of bivariate scatter diagrams using scores computed from a multivariate analysis. The multivariate statistical analysis of tables of frequencies cannot safely be carried out by standard multivariate procedures because they represent compositions and are therefore embedded in simplex space, a subspace of full space. Appropriate procedures for simplex space are compared and contrasted with simple standard methods of multivariate analysis ("raw" principal component analysis). The study shows that the differences between a log-ratio model and a simple logarithmic transformation of proportions may not be very great, particularly as regards graphical ordinations, but important discrepancies do occur. The divergences between logarithmically based analyses and raw data are, however, great. Published data on Rhesus alleles observed for Italian populations are used to exemplify the subject.
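The contrast drawn in the abstract above between a log-ratio model and a plain logarithmic transformation of proportions can be illustrated with a small sketch. It assumes the centered log-ratio transform, one common log-ratio choice for compositional data; the abstract does not say which log-ratio model the paper uses, and the allele counts below are hypothetical, not the published Rhesus data.

```python
import math


def to_proportions(frequencies):
    """Close a vector of strictly positive counts to proportions summing to 1."""
    if any(f <= 0 for f in frequencies):
        raise ValueError("log-based transforms need strictly positive frequencies")
    total = sum(frequencies)
    return [f / total for f in frequencies]


def log_proportions(frequencies):
    """Simple logarithmic transformation of the proportions."""
    return [math.log(p) for p in to_proportions(frequencies)]


def centered_log_ratio(frequencies):
    """Centered log-ratio: log of each part relative to the geometric mean of all parts."""
    logs = log_proportions(frequencies)
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    counts = [120, 45, 30, 5]  # hypothetical allele counts for one population
    print("log proportions:   ", [round(v, 3) for v in log_proportions(counts)])
    print("centered log-ratio:", [round(v, 3) for v in centered_log_ratio(counts)])
```

Within a single sample the two transforms differ only by a constant shift, which may be part of why ordinations based on them can look similar; the shift varies from sample to sample, so the two analyses are not equivalent.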
Summary of Quantitative Interpretation of Image Far Ultraviolet Auroral Data
NASA Technical Reports Server (NTRS)
Frey, H. U.; Immel, T. J.; Mende, S. B.; Gerard, J.-C.; Hubert, B.; Habraken, S.; Span, J.; Gladstone, G. R.; Bisikalo, D. V.; Shematovich, V. I.; Six, N. Frank (Technical Monitor)
2002-01-01
Direct imaging of the magnetosphere by instruments on the IMAGE spacecraft is supplemented by simultaneous observations of the global aurora in three far ultraviolet (FUV) wavelength bands. The purpose of the multi-wavelength imaging is to study the global auroral particle and energy input from the magnetosphere into the atmosphere. This paper describes the method for quantitative interpretation of FUV measurements. The Wide-Band Imaging Camera (WIC) provides broad band ultraviolet images of the aurora with maximum spatial and temporal resolution by imaging the nitrogen lines and bands between 140 and 180 nm wavelength. The Spectrographic Imager (SI), a dual wavelength monochromatic instrument, images both Doppler-shifted Lyman alpha emissions produced by precipitating protons in the SI-12 channel and OI 135.6 nm emissions in the SI-13 channel. From the SI-12 Doppler-shifted Lyman alpha images it is possible to obtain the precipitating proton flux provided assumptions are made regarding the mean energy of the protons. Knowledge of the proton (flux and energy) component allows the calculation of the contribution produced by protons in the WIC and SI-13 instruments. Comparison of the corrected WIC and SI-13 signals provides a measure of the electron mean energy, which can then be used to determine the electron energy flux. To accomplish this, reliable emission modeling and instrument calibrations are required. In-flight calibration using early-type stars was used to validate the pre-flight laboratory calibrations and determine long-term trends in sensitivity. In general, very reasonable agreement is found between in-situ measurements and remote quantitative determinations.
Statistical Inference for Big Data Problems in Molecular Biophysics
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia; Quinn, Shannon; Agarwal, Pratul K; Chennubhotla, Chakra
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex reaching petabyte scales presently, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
Training for Rapid Interpretation of Voluminous Multimodal Data
2008-04-01
Contract number DASW01-02-K-0001; program element number 611102A; project number B74F. Author: Dennis J. Folds (Georgia Institute of Technology). ...TSD) and in contracts sponsored by that agency (see Cannon-Bowers & Salas, 1998). This program of research has generated many useful findings...of the proposed research program are as follows: Assess the effects of data format, density, and overall volume on determination of relevance and
Interpreting Microarray Data to Build Models of Microbial Genetic Regulation Networks
Sokhansanj, B; Garnham, J B; Fitch, J P
2002-01-23
Microarrays and DNA chips are an efficient, high-throughput technology for measuring temporal changes in the expression of messenger RNA (mRNA) from thousands of genes (often the entire genome of an organism) in a single experiment. A crucial drawback of microarray experiments is that results are inherently qualitative: data are generally not quantitatively repeatable, nor can microarray spot intensities be calibrated to in vivo mRNA concentrations. Nevertheless, microarrays represent by far the cheapest and fastest way to obtain information about a cell's global genetic regulatory networks. Besides poor signal characteristics, the massive volume of data produced by microarray experiments poses challenges for visualization, interpretation and model building. Towards initial model development, we have developed a Java tool for visualizing the spatial organization of gene expression in bacteria. We are also developing an approach to inferring and testing qualitative fuzzy logic models of gene regulation using microarray data. Because we are developing and testing qualitative hypotheses that do not require quantitative precision, our statistical evaluation of experimental data is limited to checking for validity and consistency. Our goals are to maximize the impact of inexpensive microarray technology, bearing in mind that biological models and hypotheses are typically qualitative.
Spatial Statistical Procedures to Validate Input Data in Energy Models
Lawrence Livermore National Laboratory
2006-01-27
Energy modeling and analysis often relies on data collected for other purposes such as census counts, atmospheric and air quality observations, economic trends, and other primarily non-energy-related uses. Systematic collection of empirical data solely for regional, national, and global energy modeling has not been established as in the above-mentioned fields. Empirical and modeled data relevant to energy modeling is reported and available at various spatial and temporal scales that might or might not be those needed and used by the energy modeling community. The incorrect representation of spatial and temporal components of these data sets can result in energy models producing misleading conclusions, especially in cases of newly evolving technologies with spatial and temporal operating characteristics different from the dominant fossil and nuclear technologies that powered the energy economy over the last two hundred years. Increased private and government research and development and public interest in alternative technologies that have a benign effect on the climate and the environment have spurred interest in wind, solar, hydrogen, and other alternative energy sources and energy carriers. Many of these technologies require much finer spatial and temporal detail to determine optimal engineering designs, resource availability, and market potential. This paper presents exploratory and modeling techniques in spatial statistics that can improve the usefulness of empirical and modeled data sets that do not initially meet the spatial and/or temporal requirements of energy models. In particular, we focus on (1) aggregation and disaggregation of spatial data, (2) predicting missing data, and (3) merging spatial data sets. In addition, we introduce relevant statistical software models commonly used in the field for various sizes and types of data sets.
Spatial Statistical Procedures to Validate Input Data in Energy Models
Johannesson, G.; Stewart, J.; Barr, C.; Brady Sabeff, L.; George, R.; Heimiller, D.; Milbrandt, A.
2006-01-01
Energy modeling and analysis often relies on data collected for other purposes such as census counts, atmospheric and air quality observations, economic trends, and other primarily non-energy related uses. Systematic collection of empirical data solely for regional, national, and global energy modeling has not been established as in the abovementioned fields. Empirical and modeled data relevant to energy modeling is reported and available at various spatial and temporal scales that might or might not be those needed and used by the energy modeling community. The incorrect representation of spatial and temporal components of these data sets can result in energy models producing misleading conclusions, especially in cases of newly evolving technologies with spatial and temporal operating characteristics different from the dominant fossil and nuclear technologies that powered the energy economy over the last two hundred years. Increased private and government research and development and public interest in alternative technologies that have a benign effect on the climate and the environment have spurred interest in wind, solar, hydrogen, and other alternative energy sources and energy carriers. Many of these technologies require much finer spatial and temporal detail to determine optimal engineering designs, resource availability, and market potential. This paper presents exploratory and modeling techniques in spatial statistics that can improve the usefulness of empirical and modeled data sets that do not initially meet the spatial and/or temporal requirements of energy models. In particular, we focus on (1) aggregation and disaggregation of spatial data, (2) predicting missing data, and (3) merging spatial data sets. In addition, we introduce relevant statistical software models commonly used in the field for various sizes and types of data sets.
Interpretation of recent AMPTE data at the magnetopause
NASA Astrophysics Data System (ADS)
Heikkila, Walter J.
1997-02-01
Phan and Paschmann [1996] have done a superposed epoch analysis of conditions near the dayside magnetopause and have found significant structure within the magnetopause current sheet itself. Among their many important results is that the electron temperature for an outward profile shows cooling of the solar wind plasma for the inner part followed by heating for the outer. Since these two cases are associated with
Autonomous exploration system: Techniques for interpretation of multispectral data
NASA Technical Reports Server (NTRS)
Yates, Gigi; Eberlein, Susan
1989-01-01
An on-board autonomous exploration system that fuses data from multiple sensors, and makes decisions based on scientific goals is being developed using a series of artificial neural networks. Emphasis is placed on classifying minerals into broad geological categories by analyzing multispectral data from an imaging spectrometer. Artificial neural network architectures are being investigated for pattern matching and feature detection, information extraction, and decision making. As a first step, a stereogrammetry net extracts distance data from two gray scale stereo images. For each distance plane, the output is the probable mineral composition of the region, and a list of spectral features such as peaks, valleys, or plateaus, showing the characteristics of energy absorption and reflection. The classifier net is constructed using a grandmother cell architecture: an input layer of spectral data, an intermediate processor, and an output value. The feature detector is a three-layer feed-forward network that was developed to map input spectra to four geological classes, and will later be expanded to encompass more classes. Results from the classifier and feature detector nets will help to determine the relative importance of the region being examined with regard to current scientific goals of the system. This information is fed into a decision making neural net along with data from other sensors to decide on a plan of activity. A plan may be to examine the region at higher resolution, move closer, employ other sensors, or record an image and transmit it back to Earth.
Statistics of Optical Coherence Tomography Data From Human Retina
de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo
2010-01-01
Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small but significant correlation between neighboring pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, fit parameters of this model are relatively constant along retinal layers, but vary across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733
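As a rough illustration of the stretched exponential density mentioned in the abstract above, the following sketch evaluates one common normalized form on non-negative intensities and checks numerically that it integrates to one. The parameter names `scale` and `shape` and their values are assumptions chosen for illustration; they are not the fitted values from the study.

```python
import math


def stretched_exp_pdf(x, scale, shape):
    """Normalized stretched exponential density on x >= 0.

    p(x) = shape / (scale * Gamma(1 / shape)) * exp(-(x / scale) ** shape)
    """
    if x < 0:
        return 0.0
    norm = shape / (scale * math.gamma(1.0 / shape))
    return norm * math.exp(-((x / scale) ** shape))


def integrate(f, lo, hi, steps=100_000):
    """Midpoint rule; adequate for a sanity check, not for production use."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# Hypothetical parameters, not fitted to any OCT data.
total = integrate(lambda x: stretched_exp_pdf(x, scale=1.0, shape=0.7), 0.0, 200.0)
print(f"integral of the density over [0, 200]: {total:.4f}")
```

A shape value below one gives a heavier tail than a plain exponential, which is the qualitative feature that makes this family a candidate for intensity histograms.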
Statistical comparison of the AGDISP model with deposit data
NASA Astrophysics Data System (ADS)
Duan, Baozhong; Yendol, William G.; Mierzejewski, Karl
An aerial spray Agricultural Dispersal (AGDISP) model was tested against quantitative field data. The microbial pesticide Bacillus thuringiensis (Bt) was sprayed as a fine spray from a helicopter over a flat site in various meteorological conditions. Droplet deposition on evenly spaced Kromekote cards, 0.15 m above the ground, was measured with image analysis equipment. Six complete data sets out of the 12 trials were selected for data comparison. A set of statistical parameters suggested by the American Meteorological Society and other authors was applied for comparisons of the model prediction with the ground deposit data. The results indicated that AGDISP tended to overpredict the average volume deposition by a factor of two. The sensitivity test of the AGDISP model to the input wind direction showed that the model may not be sensitive to variations in wind direction within 10 degrees relative to aircraft flight path.
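The abstract does not list the statistical parameters that were used, so the sketch below only illustrates two generic model-versus-observation measures: the ratio of mean predicted to mean observed deposition, and a fractional bias. The function names, the sign convention for the bias, and the sample numbers are all assumptions, not values or definitions from the study.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def mean_ratio(predicted, observed):
    """Ratio of mean prediction to mean observation (2.0 means overprediction by a factor of two)."""
    return mean(predicted) / mean(observed)


def fractional_bias(predicted, observed):
    """Fractional bias, using the (predicted - observed) sign convention chosen here."""
    mp, mo = mean(predicted), mean(observed)
    return 2.0 * (mp - mo) / (mp + mo)


# Hypothetical deposit values in arbitrary units, for illustration only.
predicted = [2.0, 4.0, 6.0]
observed = [1.0, 2.0, 3.0]
print(mean_ratio(predicted, observed))
print(fractional_bias(predicted, observed))
```

With this convention a positive fractional bias indicates overprediction; other references define it with the opposite sign, so the convention should be stated whenever the number is reported.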
Outpatient health care statistics data warehouse--implementation.
Zilli, D
1999-01-01
Data warehouse implementation is assumed to be a very knowledge-demanding, expensive and long-lasting process. As such it requires senior management sponsorship, involvement of experts, a big budget and probably years of development time. The presented research on Outpatient Health Care Statistics Data Warehouse implementation provides ample evidence against the infallibility of the above statements. New, inexpensive, but powerful technology, which provides an outstanding platform for On-Line Analytical Processing (OLAP), has emerged recently. Presumably, it will be the basis for the estimated future growth of the data warehouse market, both in the medical and in other business fields. Methods and tools for building, maintaining and exploiting data warehouses are also briefly discussed in the paper.
The analysis, interpretation, and presentation of quality of life data.
Stephens, Richard
2004-02-01
All too often in clinical trials the assessment of quality of life is seen as a bolt-on study. Consequently insufficient consideration is often given to its design, collection, analysis and presentation, and its impact on the trial results and on clinical practice is minimal. In many trials quality of life is a key endpoint, and it is vital that quality of life expertise is involved as soon as possible in the design. Setting a priori quality of life hypotheses will focus the decisions regarding which questionnaire to use, when to administer it, the sample size required, and the primary analyses. Nevertheless quality of life data are complex, and require much skill in determining how to deal with multi-dimensional and longitudinal data, much of which is often missing. There are no agreed standard ways of analysing and presenting quality of life data, but there are guidelines, which if followed, will add transparency to the way results have been calculated. Understanding the impact of treatments on their quality of life is vital to patients, and it is up to us, as statisticians and trialists, to present the data as clearly as we can.
Statistical mechanics of complex neural systems and high dimensional data
NASA Astrophysics Data System (ADS)
Advani, Madhu; Lahiri, Subhaneil; Ganguli, Surya
2013-03-01
Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.
A Statistical Quality Model for Data-Driven Speech Animation.
Ma, Xiaohan; Deng, Zhigang
2012-11-01
In recent years, data-driven speech animation approaches have achieved significant successes in terms of animation quality. However, how to automatically evaluate the realism of novel synthesized speech animations has been an important yet unsolved research problem. In this paper, we propose a novel statistical model (called SAQP) to automatically predict the quality of on-the-fly synthesized speech animations by various data-driven techniques. Its essential idea is to construct a phoneme-based, Speech Animation Trajectory Fitting (SATF) metric to describe speech animation synthesis errors and then build a statistical regression model to learn the association between the obtained SATF metric and the objective speech animation synthesis quality. Through delicately designed user studies, we evaluate the effectiveness and robustness of the proposed SAQP model. To the best of our knowledge, this work is the first-of-its-kind, quantitative quality model for data-driven speech animation. We believe it is the important first step to remove a critical technical barrier for applying data-driven speech animation techniques to numerous online or interactive talking avatar applications.
Methods for Quantitative Interpretation of Retarding Field Analyzer Data
Calvey, J.R.; Crittenden, J.A.; Dugan, G.F.; Palmer, M.A.; Furman, M.; Harkay, K.
2011-03-28
Over the course of the CesrTA program at Cornell, over 30 Retarding Field Analyzers (RFAs) have been installed in the CESR storage ring, and a great deal of data has been taken with them. These devices measure the local electron cloud density and energy distribution, and can be used to evaluate the efficacy of different cloud mitigation techniques. Obtaining a quantitative understanding of RFA data requires use of cloud simulation programs, as well as a detailed model of the detector itself. In a drift region, the RFA can be modeled by postprocessing the output of a simulation code, and one can obtain best fit values for important simulation parameters with a chi-square minimization method.
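The chi-square minimization mentioned above can be sketched in a few lines. This is a toy example under stated assumptions: `toy_model` is a hypothetical one-parameter stand-in for the cloud simulation plus detector model, and the data points, uncertainties, and candidate grid are invented for illustration.

```python
def chi_square(observed, predicted, sigma):
    """Sum of squared residuals, each scaled by its measurement uncertainty."""
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sigma))


def toy_model(x_values, scale):
    """Hypothetical stand-in for a simulation: signal proportional to x."""
    return [scale * x for x in x_values]


def best_fit(x_values, observed, sigma, candidates):
    """Return the candidate parameter value with the smallest chi-square."""
    return min(
        candidates,
        key=lambda c: chi_square(observed, toy_model(x_values, c), sigma),
    )


# Invented data, for illustration only.
x_values = [1.0, 2.0, 3.0, 4.0]
observed = [2.1, 3.9, 6.2, 7.8]
sigma = [0.2, 0.2, 0.2, 0.2]
candidates = [i / 100 for i in range(100, 301)]  # scan 1.00 .. 3.00

best = best_fit(x_values, observed, sigma, candidates)
print(best, chi_square(observed, toy_model(x_values, best), sigma))
```

A grid scan is used here only because it is easy to read; when each model evaluation requires a full simulation run, a gradient-based or surrogate-based minimizer would normally replace it.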
Suitability of Archie's Law For Interpreting Electrical Resistivity Data
NASA Astrophysics Data System (ADS)
Singha, K.; Gorelick, S. M.
2003-12-01
Electrical resistivity tomography (ERT) is examined as a method to provide spatially continuous images of saline tracer concentrations during transport through unconsolidated fluid-saturated media. It is frequently accepted that there exists a quantitative relationship between the electrical conductivity of dilute electrolytes in pore water and bulk electrical conductivity of the subsurface measured using resistivity methods. The assumed relationship is typically Archie's Law. We tested the applicability of Archie's Law to field-scale data collected over a 10 m by 14 m area. A 20-day weak-dipole tracer test was conducted, in which 2 g/L NaCl were introduced into the upper 30 m of the saturated zone in a coarse sand and gravel aquifer. Cross-well ERT data were collected at 4 geophysical monitoring wells and inverted in 3-D. Fluid electrical conductivity was measured directly from a multilevel sampler. The change in the direct measurements of fluid electrical conductivity exceeded the change in bulk conductivity values in the tomograms by an order of magnitude. The estimated Archie formation factor from the field data was not constant with time, due largely to smoothing during the image reconstruction process. We illustrate by modeling synthetic cases over the field site that the ERT response is difficult to match to measured fluid conductivities due to the variability in the effects of regularization, which change in both space and time. Analysis of both the field data and synthetic cases suggests that Archie's Law cannot be used to directly scale ERT conductivities to fluid conductivities.
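For readers unfamiliar with the relationship being tested, the sketch below writes out Archie's Law for a fully saturated medium: bulk conductivity equals fluid conductivity divided by a formation factor F = a * porosity^(-m). The porosity, cementation exponent, and conductivity values are illustrative assumptions and are not taken from the field site.

```python
def formation_factor(porosity, cementation_exponent, tortuosity_factor=1.0):
    """Archie formation factor F = a * porosity ** (-m) for a saturated medium."""
    return tortuosity_factor * porosity ** (-cementation_exponent)


def bulk_conductivity(fluid_conductivity, porosity, cementation_exponent,
                      tortuosity_factor=1.0):
    """Bulk conductivity predicted from fluid conductivity via Archie's Law."""
    factor = formation_factor(porosity, cementation_exponent, tortuosity_factor)
    return fluid_conductivity / factor


def apparent_formation_factor(fluid_conductivity, measured_bulk_conductivity):
    """Formation factor implied by a pair of fluid and bulk measurements."""
    return fluid_conductivity / measured_bulk_conductivity


# Hypothetical values: porosity 0.25, exponent m = 2, fluid at 0.8 S/m.
F = formation_factor(0.25, 2.0)
sigma_bulk = bulk_conductivity(0.8, 0.25, 2.0)
print(F, sigma_bulk)
```

If Archie's Law held for tomographic images, `apparent_formation_factor` computed from co-located fluid samples and ERT pixels would stay constant over time; the abstract reports that it did not.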
Assessment of Dermatophytosis Treatment Studies: Interpreting the Data.
Rosen, Theodore
2015-10-01
Antifungal therapy has recently enjoyed a resurgence of interest due to the introduction of a number of new formulations of topical drugs and novel molecules. This has led to a plethora of new publications on management of cutaneous fungal disease. This paper summarizes the various clinical trial factors which may affect the published data regarding how well antifungal drugs work. Understanding these parameters allows the healthcare provider to choose more rationally between available agents based upon an assessment of the evidence.
Interpreting Multiple Environmental Tracer Data in a Perialpine Catchment
NASA Astrophysics Data System (ADS)
Onnis, G. A.; Althaus, R.; Klump, S.; Purtschert, R.; Kipfer, R.; Hendricks-Franssen, H.; Stauffer, F.; Kinzelbach, W.
2008-12-01
A case study for the environmental tracers Tritium, Helium-3 and Krypton-85 in a small sand-gravel aquifer catchment in Northern Switzerland is presented. The groundwater flow is determined by means of Stochastic Inverse Modelling, using available transient hydraulic head and transmissivity (T) data to calibrate the transmissivity field with the Sequential Self-Calibration Technique as implemented in the code INVERTO. The evaluation of the aquifer recharge and its discharge via natural springs is independently performed and confirmed through comparison of simulated and observed head after the inversion procedure. A number of equally likely transmissivity field realizations honoring both transmissivity and transient head measurements is generated, establishing the basis for environmental tracer transport modeling. The impact of the spatially variable, thick unsaturated zone (>10 m) on tracer transport is accounted for by means of a numerical solution to the vertical advection-dispersion equation. Starting from the measured tracer concentrations in the atmosphere, the input history to the saturated zone is reconstructed for different groundwater table depths. Environmental tracer transport in the saturated zone is investigated for each calibrated T-realization. The transport simulation results are in general fair for all tracers and can well reproduce the tracer data at most observation locations, with a small uncertainty bandwidth related to the T-parameter. Ad-hoc zonation of transport parameters (vadose zone gas-phase tortuosity and saturated porosity) can help in achieving a simultaneous match of the tracer data at all locations. However, the model can account for only 20% of the amplitude of the high-frequency oscillations in Krypton-85 concentrations observed at one pumping station. Short-term variations of the recharge rate and of the actual pumping rate can each account for a further 10% of the Krypton-85 concentration fluctuation amplitudes. The origin of
Searching the Heavens: Astronomy, Computation, Statistics, Data Mining and Philosophy
NASA Astrophysics Data System (ADS)
Glymour, Clark
2012-03-01
Our first and purest science, the mother of scientific methods, sustained by sheer curiosity, searching the heavens we cannot manipulate. From the beginning, astronomy has combined mathematical idealization, technological ingenuity, and indefatigable data collection with procedures to search through assembled data for the processes that govern the cosmos. Astronomers are, and ever have been, data miners, and for that reason astronomical methods (but not astronomical discoveries) have often been despised by statisticians and philosophers. Epithets laced the statistical literature: Ransacking! Data dredging! Double Counting! Statistical disdain was usually directed at social scientists and biologists, rarely if ever at astronomers, but the methodological attitudes and goals that many twentieth-century philosophers and statisticians rejected were creations of the astronomical tradition. The philosophical criticisms were earlier and more direct. In the shadow (or in Alexander Pope's phrasing, the light) cast on nature in the eighteenth century by the Newtonian triumph, David Hume revived arguments from the ancient Greeks to challenge the very possibility of coming to know what causes what. His conclusion was endorsed in the twentieth century by many philosophers who found talk of causation unnecessary or unacceptably metaphysical, and absorbed by many statisticians as a general suspicion of causal claims, except possibly when they are founded on experimental manipulation. And yet in the hands of a mathematician, Thomas Bayes, and another mathematician and philosopher, Richard Price, Hume's essays prompted the development of a new kind of statistics, the kind we now call "Bayesian." The computer and new data acquisition methods have begun to dissolve the antipathy between astronomy, philosophy, and statistics. But the resolution is practical, without much reflection on the arguments or the course of events. So, I offer a largely unoriginal history
Undergraduate non-science majors' descriptions and interpretations of scientific data visualizations
NASA Astrophysics Data System (ADS)
Swenson, Sandra Signe
Professionally developed and freely accessible through the Internet, scientific data maps have great potential for teaching and learning with data in the science classroom. Solving problems or developing ideas while using data maps of Earth phenomena in the science classroom may help students to understand the nature and process of science. Little is known about how students perceive and interpret scientific data visualizations. This study was an in-depth exploration of descriptions and interpretations of topographic and bathymetric data maps made by a population of 107 non-science majors at an urban public college. Surveys, interviews, and artifacts were analyzed within an epistemological framework for understanding data collected about the Earth, by examining representational strategies used to understand maps, and by examining student interpretations using Bloom's Taxonomy of Educational Objectives. The findings suggest that the majority of students interpret data maps by assuming iconicity that was not intended by the map's creator; that students do not appear to have a robust understanding of how data are collected about Earth phenomena; and that while most students are able to make some kinds of interpretations of the data maps, often their interpretations are not based upon the actual data the map is representing. This study provided baseline information on student understanding of data maps from which educators may design curriculum for teaching and learning about Earth phenomena.
RNA Pol II transcription model and interpretation of GRO-seq data.
Lladser, Manuel E; Azofeifa, Joseph G; Allen, Mary A; Dowell, Robin D
2017-01-01
A mixture model and statistical method is proposed to interpret the distribution of reads from a nascent transcriptional assay, such as global run-on sequencing (GRO-seq) data. The model is annotation agnostic and leverages current understanding of the behavior of RNA polymerase II. Briefly, it assumes that polymerase loads at key positions (transcription start sites) within the genome. Once loaded, polymerase either remains in the initiation form (with some probability) or transitions into an elongating form (with the remaining probability). The model can be fit genome-wide, allowing patterns of Pol II behavior to be assessed on each distinct transcript. Furthermore, it allows for the first time a principled approach to distinguishing the initiation signal from the elongation signal; in particular, it implies a data-driven method for calculating the pausing index, a commonly used metric that informs on the behavior of RNA polymerase II. We demonstrate that this approach improves on existing analyses of GRO-seq data and uncovers a novel biological understanding of the impact of knocking down the Male Specific Lethal (MSL) complex in Drosophila melanogaster.
Dredging Research Program Understanding and Interpreting Seabed Drifter (SBD) Data
1994-01-01
Dr. Rudy Hoffman, U.S. Navy, Naval Ocean Research & Development Activity, NSTL, MS, using data collected by Dr. W. W. Schroeder, Marine Science...by Resio and Vincent (1977) and was used in a study for the Navy to estimate winds that drove predictive current models for a similar area of the Gulf... [Table residue: wind and wave categories (5-day), wind speed in knots]
Automatic interpretation of ERTS data for forest management
NASA Technical Reports Server (NTRS)
Kirvida, L.; Johnson, G. R.
1973-01-01
Automatic stratification of forested land from ERTS-1 data provides a valuable tool for resource management. The results are useful for wood product yield estimates, recreation and wildlife management, forest inventory and forest condition monitoring. Automatic procedures based on both multispectral and spatial features are evaluated. With five classes, training and testing on the same samples, classification accuracy of 74% was achieved using the MSS multispectral features. When adding texture computed from 8 x 8 arrays, classification accuracy of 99% was obtained.
Challenges in analysis and interpretation of microsatellite data for population genetic studies
Putman, Alexander I; Carbone, Ignazio
2014-01-01
Advancing technologies have facilitated the ever-widening application of genetic markers such as microsatellites into new systems and research questions in biology. In light of the data and experience accumulated from several years of using microsatellites, we present here a literature review that synthesizes the limitations of microsatellites in population genetic studies. With a focus on population structure, we review the widely used fixation (FST) statistics and Bayesian clustering algorithms and find that the former can be confusing and problematic for microsatellites and that the latter may be confounded by complex population models and lack power in certain cases. Clustering, multivariate analyses, and diversity-based statistics are increasingly being applied to infer population structure, but in some instances these methods lack formalization with microsatellites. Migration-specific methods perform well only under narrow constraints. We also examine the use of microsatellites for inferring effective population size, changes in population size, and deeper demographic history, and find that these methods are untested and/or highly context-dependent. Overall, each method possesses important weaknesses for use with microsatellites, and there are significant constraints on inferences commonly made using microsatellite markers in the areas of population structure, admixture, and effective population size. To ameliorate and better understand these constraints, researchers are encouraged to analyze simulated datasets both prior to and following data collection and analysis, the latter of which is formalized within the approximate Bayesian computation framework. We also examine trends in the literature and show that microsatellites continue to be widely used, especially in non-human subject areas. This review assists with study design and molecular marker selection, facilitates sound interpretation of microsatellite data while fostering respect for their practical
Biosimilar insulins: guidance for data interpretation by clinicians and users.
Heinemann, L; Home, P D; Hompesch, M
2015-10-01
Biosimilar insulins are approved copies of insulins outside patent protection. Advantages may include greater market competition and potential cost reduction, but clinicians and users lack a clear perspective on 'biosimilarity' for insulins. The manufacturing processes for biosimilar insulins are manufacturer-specific and, although these are reviewed by regulators, there are few public data available to allow independent assessment or review of issues such as intrinsic quality or batch-to-batch variation. Preclinical measures used to assess biosimilarity, such as tissue and cellular studies of metabolic activity, physico-chemical stability and animal studies of pharmacodynamics, pharmacokinetics and immunogenicity may be insufficiently sensitive to differences, and are often not formally published. Pharmacokinetic and pharmacodynamic studies (glucose clamps) with humans, although core assessments, have problems of precision which are relevant for accurate insulin dosing. Studies that assess clinical efficacy and safety and device compatibility are limited by current outcome measures, such as glycated haemoglobin levels and hypoglycaemia, which are insensitive to differences between insulins. To address these issues, we suggest that all comparative data are put in the public domain, and that systematic clinical studies are performed to address batch-to-batch variability, delivery devices, interchangeability in practice and long-term efficacy and safety. Despite these challenges biosimilar insulins are a welcome addition to diabetes therapy and, with a transparent approach, should provide useful benefit to insulin users.
Summary Statistics for Homemade 'Play Dough' -- Data Acquired at LLNL
Kallman, J S; Morales, K E; Whipple, R E; Huber, R D; Martz, A; Brown, W D; Smith, J A; Schneberk, D J; Martz, Jr., H E; White, III, W T
2010-03-11
Using x-ray computerized tomography (CT), we have characterized the x-ray linear attenuation coefficients (LAC) of a homemade Play Dough™-like material, designated as PDA. Table 1 gives the first-order statistics for each of four CT measurements, estimated with a Gaussian kernel density estimator (KDE) analysis. The mean values of the LAC range from a high of about 2700 LMHU_D at 100kVp to a low of about 1200 LMHU_D at 300kVp. The standard deviation of each measurement is around 10% to 15% of the mean. The entropy covers the range from 6.0 to 7.4. Ordinarily, we would model the LAC of the material and compare the modeled values to the measured values. In this case, however, we did not have the detailed chemical composition of the material and therefore did not model the LAC. Using a method recently proposed by Lawrence Livermore National Laboratory (LLNL), we estimate the value of the effective atomic number, Z_eff, to be near 10. LLNL prepared about 50mL of the homemade 'Play Dough' in a polypropylene vial and firmly compressed it immediately prior to the x-ray measurements. We used the computer program IMGREC to reconstruct the CT images. The values of the key parameters used in the data capture and image reconstruction are given in this report. Additional details may be found in the experimental SOP and a separate document. To characterize the statistical distribution of LAC values in each CT image, we first isolated an 80% central-core segment of volume elements ('voxels') lying completely within the specimen, away from the walls of the polypropylene vial. All of the voxels within this central core, including those comprised of voids and inclusions, are included in the statistics. We then calculated the mean value, standard deviation and entropy for (a) the four image segments and for (b) their digital gradient images. (A digital gradient image of a given image was obtained by taking the absolute value of the difference between the initial image
Summary Statistics for Fun Dough Data Acquired at LLNL
Kallman, J S; Morales, K E; Whipple, R E; Huber, R D; Brown, W D; Smith, J A; Schneberk, D J; Martz, Jr., H E; White, III, W T
2010-03-11
Using x-ray computerized tomography (CT), we have characterized the x-ray linear attenuation coefficients (LAC) of a Play Dough™-like product, Fun Dough™, designated as PD. Table 1 gives the first-order statistics for each of four CT measurements, estimated with a Gaussian kernel density estimator (KDE) analysis. The mean values of the LAC range from a high of about 2100 LMHU_D at 100kVp to a low of about 1100 LMHU_D at 300kVp. The standard deviation of each measurement is around 1% of the mean. The entropy covers the range from 3.9 to 4.6. Ordinarily, we would model the LAC of the material and compare the modeled values to the measured values. In this case, however, we did not have the composition of the material and therefore did not model the LAC. Using a method recently proposed by Lawrence Livermore National Laboratory (LLNL), we estimate the value of the effective atomic number, Z_eff, to be near 8.5. LLNL prepared about 50mL of the Fun Dough™ in a polypropylene vial and firmly compressed it immediately prior to the x-ray measurements. Still, layers can plainly be seen in the reconstructed images, indicating that the bulk density of the material in the container is affected by voids and bubbles. We used the computer program IMGREC to reconstruct the CT images. The values of the key parameters used in the data capture and image reconstruction are given in this report. Additional details may be found in the experimental SOP and a separate document. To characterize the statistical distribution of LAC values in each CT image, we first isolated an 80% central-core segment of volume elements ('voxels') lying completely within the specimen, away from the walls of the polypropylene vial. All of the voxels within this central core, including those comprised of voids and inclusions, are included in the statistics. We then calculated the mean value, standard deviation and entropy for (a) the four image segments and for (b
Inferential statistics from Black Hispanic breast cancer survival data.
Khan, Hafiz M R; Saxena, Anshul; Ross, Elizabeth; Ramamoorthy, Venkataraghavan; Sheehan, Diana
2014-01-01
In this paper we test the statistical probability models for breast cancer survival data for race and ethnicity. Data were collected from breast cancer patients diagnosed in the United States during the years 1973-2009. We selected a stratified random sample of Black Hispanic female patients from the Surveillance Epidemiology and End Results (SEER) database to derive the statistical probability models. We used three common model building criteria, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Deviance Information Criterion (DIC), to measure goodness of fit, and found that the survival data of Black Hispanic female patients were better fit by the exponentiated exponential probability model. A novel Bayesian method was used to derive the posterior density function for the model parameters as well as to derive the predictive inference for future response. We specifically focused on Black Hispanic race. The Markov Chain Monte Carlo (MCMC) method was used for obtaining the summary results of posterior parameters. Additionally, we reported predictive intervals for future survival times. These findings would be of great significance in treatment planning and healthcare resource allocation.
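As a rough illustration of two of the model building criteria named in the abstract above, the sketch below computes AIC and BIC from a maximized log-likelihood using their standard definitions. The model names, log-likelihood values, parameter counts, and sample size are hypothetical placeholders, not values from the study; DIC is omitted because it requires posterior samples.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits: name -> (maximized log-likelihood, number of parameters).
candidates = {"model_a": (-120.0, 2), "model_b": (-118.5, 3)}
n_obs = 100  # hypothetical sample size

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs))
    for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

With these made-up numbers the two criteria disagree, because BIC penalizes the extra parameter more heavily than AIC once the sample size exceeds about 7 observations.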
Uncertainty Quantification and Statistical Convergence Guidelines for PIV Data
NASA Astrophysics Data System (ADS)
Stegmeir, Matthew; Kassen, Dan
2016-11-01
As Particle Image Velocimetry has continued to mature, it has developed into a robust and flexible technique for velocimetry used by expert and non-expert users. While historical estimates of PIV accuracy have typically relied heavily on "rules of thumb" and analysis of idealized synthetic images, recently increased emphasis has been placed on better quantifying real-world PIV measurement uncertainty. Multiple techniques have been developed to provide per-vector instantaneous uncertainty estimates for PIV measurements. Often real-world experimental conditions introduce complications in collecting "optimal" data, and the effect of these conditions is important to consider when planning an experimental campaign. The current work utilizes the results of PIV Uncertainty Quantification techniques to develop a framework for PIV users to utilize estimated PIV confidence intervals to compute reliable data convergence criteria for optimal sampling of flow statistics. Results are compared using experimental and synthetic data, and recommended guidelines and procedures leveraging estimated PIV confidence intervals for efficient sampling for converged statistics are provided.
Statistical analysis of epidemiologic data of pregnancy outcomes
Butler, W.J.; Kalasinski, L.A.
1989-02-01
In this paper, a generalized logistic regression model for correlated observations is used to analyze epidemiologic data on the frequency of spontaneous abortion among a group of women office workers. The results are compared to those obtained from the use of the standard logistic regression model that assumes statistical independence among all the pregnancies contributed by one woman. In this example, the correlation among pregnancies from the same woman is fairly small and did not have a substantial impact on the magnitude of estimates of parameters of the model. This is due at least partly to the small average number of pregnancies contributed by each woman.
JAWS data collection, analysis highlights, and microburst statistics
NASA Technical Reports Server (NTRS)
Mccarthy, J.; Roberts, R.; Schreiber, W.
1983-01-01
Organization, equipment, and the current status of the Joint Airport Weather Studies project initiated in relation to the microburst phenomenon are summarized. Some data collection techniques and preliminary statistics on microburst events recorded by Doppler radar are discussed as well. Radar studies show that microbursts occur much more often than expected, with the majority of the events being potentially dangerous to landing or departing aircraft. Seventy events were registered, with the differential velocities ranging from 10 to 48 m/s; headwind/tailwind velocity differentials over 20 m/s are considered seriously hazardous. It is noted that a correlation is yet to be established between the velocity differential and incoherent radar reflectivity.
Information gathering for the Transportation Statistics Data Bank
Shappert, L.B.; Mason, P.J.
1981-10-01
The Transportation Statistics Data Bank (TSDB) was developed in 1974 to collect information on the transport of Department of Energy (DOE) materials. This computer program may be used to provide the framework for collecting more detailed information on DOE shipments of radioactive materials. This report describes the type of information that is needed in this area and concludes that the existing system could be readily modified to collect and process it. The additional needed information, available from bills of lading and similar documents, could be gathered from DOE field offices and transferred in a standard format to the TSDB system. Costs of the system are also discussed briefly.
Predictive data modeling of human type II diabetes related statistics
NASA Astrophysics Data System (ADS)
Jaenisch, Kristina L.; Jaenisch, Holger M.; Handley, James W.; Albritton, Nathaniel G.
2009-04-01
During the course of routine Type II treatment of one of the authors, it was decided to derive predictive analytical Data Models of the daily sampled vital statistics, namely weight, blood pressure, and blood sugar, to determine if the covariance among the observed variables could yield a descriptive equation-based model, or better still, a predictive analytical model that could forecast the expected future trend of the variables and possibly reduce the number of finger sticks required to monitor blood sugar levels. The personal history and analysis with resulting models are presented.
Interpretation of Pennsylvania agricultural land use from ERTS-1 data
NASA Technical Reports Server (NTRS)
Mcmurtry, G. J.; Petersen, G. W. (Principal Investigator); Wilson, A. D.
1974-01-01
The author has identified the following significant results. To study the complex agricultural patterns in Pennsylvania, a portion of an ERTS scene was selected for detailed analysis. Various photographic products were made and were found to be only of limited value. This necessitated the digital processing of the ERTS data. Using an unsupervised classification procedure, it was possible to delineate the following categories: (1) forest land with a northern aspect, (2) forest land with a southern aspect, (3) valley trees, (4) wheat, (5) corn, (6) alfalfa, grass, pasture, (7) disturbed land, (8) built-up land, (9) strip mines, and (10) water. These land use categories were delineated at a scale of approximately 1:20,000 on the line printer output. Land use delineations were also made using the General Electric IMAGE 100 interactive analysis system.
Interpreting low spectral resolution data of transiting exoplanets
NASA Astrophysics Data System (ADS)
Griffith, C. A.; Turner, J. D.; Zellem, R.; Tinetti, G.; Teske, J.
2013-09-01
During primary transit, transmission spectra of the exoplanet's limb are recorded as the planet passes in front of the star. During secondary eclipse, measurements yield the planetary emission spectra. Photometry and spectroscopy of transiting exoplanets indicate the presence of water, methane, carbon monoxide and potentially carbon dioxide in a number of extrasolar planets [1, 2, 3, 4, 5, 6]. Observations at different points in an exoplanet's orbit also reveal variations in the planet's temperature field with longitude, which manifest the planet's dynamical redistribution of heat [7]. Yet even for the brightest systems, molecular abundances are constrained only to within 3-5 orders of magnitude and temperatures as a function of pressure to roughly 300 K. A large part of the uncertainties stems from the range of models that fit the data. Here we explore the degeneracies in the solution sets with the aim to better constrain and measure planetary characteristics.
Children's and Adults' Interpretation of Covariation Data: Does Symmetry of Variables Matter?
ERIC Educational Resources Information Center
Saffran, Andrea; Barchfeld, Petra; Sodian, Beate; Alibali, Martha W.
2016-01-01
In a series of 3 experiments, the authors investigated the influence of symmetry of variables on children's and adults' data interpretation. They hypothesized that symmetrical (i.e., present/present) variables would support correct interpretations more than asymmetrical (i.e., present/absent) variables. Participants were asked to judge covariation…
StegoWall: blind statistical detection of hidden data
NASA Astrophysics Data System (ADS)
Voloshynovskiy, Sviatoslav V.; Herrigel, Alexander; Rytsar, Yuri B.; Pun, Thierry
2002-04-01
Novel functional possibilities, provided by recent data hiding technologies, carry the danger of uncontrolled (unauthorized) and unlimited information exchange that might be used by people with unfriendly interests. The multimedia industry as well as the research community recognize the urgent necessity for network security and copyright protection, or rather the lack of adequate law for digital multimedia protection. This paper advocates the need for detecting hidden data in digital and analog media as well as in electronic transmissions, and for attempting to identify the underlying hidden data. Solving this problem calls for the development of an architecture for blind stochastic hidden data detection in order to prevent unauthorized data exchange. The proposed architecture is called StegoWall; its key aspects are the solid investigation, the deep understanding, and the prediction of possible tendencies in the development of advanced data hiding technologies. The basic idea of our complex approach is to exploit all information about hidden data statistics to perform its detection based on a stochastic framework. The StegoWall system will be used for four main applications: robust watermarking, secret communications, integrity control and tamper proofing, and internet/network security.
Nuclear fuel corrosion over millennia interpreted using geologic data
Pearcy, E.C.; Manaktala, H.K.
1994-12-31
Corrosion of nuclear fuel over the 10,000 year regulatory period in a geologic repository will be a function of physical characteristics (e.g., crystallinity, crystal sizes, crystal forms) and chemical characteristics (e.g., crystal composition, compositional variability, accessory phases). Natural uraninite (nominally UO_{2+x}) which has undergone long-term corrosion can be studied to infer the long-term behavior of nuclear fuel. Previously, uraninite from the Nopal I deposit, Pena Blanca district, Chihuahua, Mexico, has been shown to constitute an outstanding analog material for comparison with nuclear fuel. Similarities between Nopal I uraninite and nuclear fuel have been shown to include bulk composition, general crystal structure, and total trace element content. Data presented here suggest that, as a bulk material, Nopal I uraninite compares favorably with irradiated nuclear fuel. Nevertheless, some fine-scale differences are noted between Nopal I uraninite and irradiated nuclear fuel with respect to both internal structures and compositions. These observations suggest that whereas the long-term responses of the two materials to oxidative alteration in a geologic repository may be similar, the detailed mechanisms of initial oxidant penetration and the short-term oxidative alteration of Nopal I uraninite and irradiated nuclear fuel are likely to be different.
Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data
NASA Astrophysics Data System (ADS)
Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.
2014-12-01
We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis complement more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, where there are a number of different methods for defining similarity (hierarchical, density, distribution, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and K-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different than lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components. Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar
A statistical algorithm for estimating chlorophyll concentration from MODIS data
NASA Astrophysics Data System (ADS)
Wattelez, Guillaume; Dupouy, Cécile; Mangeas, Morgan; Lèfevre, Jérôme; Touraivane, T.; Frouin, Robert J.
2014-11-01
We propose a statistical algorithm to assess chlorophyll-a concentration ([chl-a]) using remote sensing reflectance (Rrs) derived from MODerate Resolution Imaging Spectroradiometer (MODIS) data. This algorithm is a combination of two models: one for low [chl-a] (oligotrophic waters) and one for high [chl-a]. A satellite pixel is classified as low or high [chl-a] according to the Rrs ratio (488 and 555 nm channels). If a pixel is considered a low [chl-a] pixel, a log-linear model is applied; otherwise, a more sophisticated model (Support Vector Machine) is applied. The log-linear model was developed using supervised learning on Rrs and [chl-a] data from SeaBASS and more than 15 campaigns carried out from 2002 to 2010 around New Caledonia. Several models to assess high [chl-a] were also tested with statistical methods. This novel approach outperforms the standard reflectance ratio approach. Compared with algorithms such as the current NASA OC3, Root Mean Square Error is 30% lower in New Caledonian waters.
Definition of Ensemble Error Statistics for Optimal Ensemble Data Assimilation
NASA Astrophysics Data System (ADS)
Frehlich, R.
2009-09-01
Next generation data assimilation methods must include the state dependent observation errors, i.e., the spatial and temporal variations produced by the atmospheric turbulent field. A rigorous analysis of optimal data assimilation algorithms and ensemble forecast systems requires a definition of model "truth" or perfect measurement which then defines the total observation error and forecast error. Truth is defined as the spatial average of the continuous atmospheric state variables centered on the model grid locations. To be consistent with the climatology of turbulence, the spatial average is chosen as the effective spatial filter of the numerical model. The observation errors then consist of two independent components: an instrument error and an observation sampling error which describes the mismatch of the spatial average of the observation and the spatial average of the perfect measurement or "truth". The observation sampling error is related to the "error of representativeness" but is defined only in terms of the local statistics of the atmosphere and the sampling pattern of the observation. Optimal data assimilation requires an estimate of the local background error correlation as well as the local observation error correlation. Both of these local correlations can be estimated from ensemble assimilation techniques where each member of the ensemble is produced by generating and assimilating random observations consistent with the estimates of the local sampling errors based on estimates of the local turbulent statistics. A rigorous evaluation of these optimal ensemble data assimilation techniques requires a definition of the ensemble members and the ensemble average that describes the error correlations. A new formulation is presented that is consistent with the climatology of atmospheric turbulence, and the implications of this formulation for ensemble forecast systems are discussed.
An improvement approach to the interpretation of magnetic data
NASA Astrophysics Data System (ADS)
Zhang, H. L.; Hu, X. Y.; Liu, T. Y.
2012-04-01
There are numerous semi-automated data processing approaches that specialize in edge detection and depth estimation for potential-field sources. The mathematical expression of the tilt-angle has recently been developed into a depth-estimation routine, known as "tilt-depth". The tilt-depth was first introduced by Salem et al (2007) based on the tilt-angle, which uses first-order derivatives to detect edges. In this paper, we propose an improvement on the tilt-depth method, which is based on the second-order derivatives of the reduced to pole (RTP) magnetic field, called edge detection and depth estimation based on vertical second-order derivatives (V2D-depth). Under certain assumptions, such as when the contacts are nearly vertical with infinite depth extent and the magnetic field is vertical or RTP, the general expression published by Nabighian (1972) for the magnetic field over contacts located at a horizontal location of $x = 0$ and at a depth of $z_0$ is $\Delta T(x,z) = 2kFc \cdot \arctan\left(\frac{x}{z_0 - z}\right)$ (1), where $k$ is the susceptibility contrast at the contact, $F$ the magnitude of the magnetic field, $c = 1 - \cos^2 i \cdot \sin^2 A$, $A$ the angle between the positive h-axis and magnetic north, and $i$ the inclination of the earth's field. The expressions for the horizontal and vertical derivatives of the magnetic field can be written as $\frac{d\Delta T}{dh} = 2kFc \cdot \frac{z_0 - z}{x^2 + (z_0 - z)^2}$ (2) and $\frac{d\Delta T}{dz} = 2kFc \cdot \frac{x}{x^2 + (z_0 - z)^2}$ (3). Based on Equations 2 and 3, we have $T_{zz} = \frac{d^2\Delta T}{dz^2} = 2kFc \cdot \frac{2x(z_0 - z)}{[x^2 + (z_0 - z)^2]^2}$ (4), $T_{zh} = \frac{d^2\Delta T}{dz\,dh} = 2kFc \cdot \frac{(z_0 - z)^2 - x^2}{[x^2 + (z_0 - z)^2]^2}$ (5), and $T_{zG} = \sqrt{T_{zh}^2 + T_{zz}^2} = 2kFc \cdot \frac{x^2 + (z_0 - z)^2}{[x^2 + (z_0 - z)^2]^2}$ (6). Using Equations 4, 5 and 6, when $z = 0$, we get $\frac{T_{zz}}{T_{zG} + T_{zh}} = \frac{x}{z_0}$ (7). The V2D-depth is defined as $\theta = \tan^{-1}\left(\frac{T_{zz}}{T_{zG} + T_{zh}}\right) = \tan^{-1}\left(\frac{x}{z_0}\right)$ (8). The V2D-depth amplitudes are restricted to values between -45° and +45°. It has the same interesting properties as the tilt-depth. Its responses vary from negative to positive
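The chain from Equations 4 to 8 in the abstract above can be checked numerically. The following is a minimal sketch, assuming the idealized vertical-contact model of Equation 1; the function name `v2d_angle` and the default unit values for k, F and c are illustrative choices, not part of the original work.

```python
import math


def v2d_angle(x: float, z0: float, z: float = 0.0,
              k: float = 1.0, F: float = 1.0, c: float = 1.0) -> float:
    """Return the V2D-depth angle (radians) over an idealized vertical contact.

    x  : horizontal distance from the contact edge
    z0 : depth to the top of the contact
    z  : observation level (0 at the surface)
    k, F, c : amplitude factors from Equation 1; they cancel in the ratio.
    """
    a = z0 - z
    denom = (x * x + a * a) ** 2
    amp = 2.0 * k * F * c
    t_zz = amp * 2.0 * x * a / denom        # Equation 4
    t_zh = amp * (a * a - x * x) / denom    # Equation 5
    t_zg = math.hypot(t_zh, t_zz)           # Equation 6
    return math.atan(t_zz / (t_zg + t_zh))  # Equation 8


# Directly above the contact edge the angle is zero; at a horizontal
# offset equal to the depth, tan(theta) = x / z0 = 1.
theta_edge = v2d_angle(x=0.0, z0=50.0)
theta_at_depth = v2d_angle(x=50.0, z0=50.0)
```

Because the amplitude factor 2kFc cancels in the ratio, the angle depends only on x and z0 in this idealized model. This is why, under these assumptions, the horizontal distance between the zero contour and the 45° contour gives an estimate of the source depth.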
Lindsey, David A.
2001-01-01
Pebble count data from Quaternary gravel deposits north of Denver, Colo., were analyzed by multivariate statistical methods to identify lithologic factors that might affect aggregate quality. The pebble count data used in this analysis were taken from the map by Colton and Fitch (1974) and are supplemented by data reported by the Front Range Infrastructure Resources Project. This report provides data tables and results of the statistical analysis. The multivariate statistical analysis used here consists of log-contrast principal components analysis (method of Reyment and Savazzi, 1999) followed by rotation of principal components and factor interpretation. Three lithologic factors that might affect aggregate quality were identified: 1) granite and gneiss versus pegmatite, 2) quartz + quartzite versus total volcanic rocks, and 3) total sedimentary rocks (mainly sandstone) versus granite. Factor 1 (grain size of igneous and metamorphic rocks) may represent destruction during weathering and transport or varying proportions of rocks in source areas. Factor 2 (resistant source rocks) represents the dispersion shadow of metaquartzite detritus, perhaps enhanced by resistance of quartz and quartzite during weathering and transport. Factor 3 (proximity to sandstone source) represents dilution of gravel by soft sedimentary rocks (mainly sandstone), which are exposed mainly in hogbacks near the mountain front. Factor 1 probably does not affect aggregate quality. Factor 2 would be expected to enhance aggregate quality as measured by the Los Angeles degradation test. Factor 3 may diminish aggregate quality.
Application of an integrated geologic/production history data base for interpretation
Pearson, M.
1989-03-01
Geologic modeling systems provide valuable tools to assist the geologist who is interpreting subsurface conditions and managing geologic information. A single interpretation system that provides access to production history as well as geologic information is much more powerful than separate, nonintegrated systems for interpretation purposes. In addition, the useful life of the data base is extended to monitor the changing characteristics of the reservoir throughout the producing period. By increasing the usefulness of the data base, they have a more valuable resource and can further justify the efforts they take to create it. As geologists are required to perform more thorough analyses in less time, the value of an integrated data system increases. Using examples from Custer County, Oklahoma, this presentation describes a unique approach provided by an integrated geologic and production data base to perform interpretations and analyses beyond preliminary reservoir delineation.
Model-independent plot of dynamic PET data facilitates data interpretation and model selection.
Munk, Ole Lajord
2012-02-21
When testing new PET radiotracers or new applications of existing tracers, the blood-tissue exchange and the metabolism need to be examined. However, conventional plots of measured time-activity curves from dynamic PET do not reveal the inherent kinetic information. A novel model-independent volume-influx plot (vi-plot) was developed and validated. The new vi-plot shows the time course of the instantaneous distribution volume and the instantaneous influx rate. The vi-plot visualises physiological information that facilitates model selection and it reveals when a quasi-steady state is reached, which is a prerequisite for the use of the graphical analyses by Logan and Gjedde-Patlak. Both axes of the vi-plot have direct physiological interpretation, and the plot shows kinetic parameters in close agreement with estimates obtained by non-linear kinetic modelling. The vi-plot is equally useful for analyses of PET data based on a plasma input function or a reference region input function. The vi-plot is a model-independent and informative plot for data exploration that facilitates the selection of an appropriate method for data analysis.
The International Coal Statistics Data Base operations guide
Not Available
1991-04-01
The International Coal Statistics Data Base (ICSD) is a microcomputer-based system which contains information related to international coal trade. This includes coal production, consumption, imports and exports information. The ICSD is a secondary data base, meaning that information contained therein is derived entirely from other primary sources. It uses dBase III+ and Lotus 1-2-3 to locate, report and display data. The system is used for analysis in preparing the Annual Prospects for World Coal Trade (DOE/EIA-0363) publication. The ICSD system is menu driven, and also permits the user who is familiar with dBase and Lotus operations to leave the menu structure to perform independent queries. Documentation for the ICSD consists of three manuals -- the User's Guide, the Operations Manual and the Program Maintenance Manual. This Operations Manual explains how to install the programs, how to obtain reports on coal trade, what system requirements apply, and how to update the major data files. It also explains file naming conventions, what each file does, and the programming procedures used to make the system work. The Operations Manual explains how to make the system respond to customized queries. It is organized around the ICSD menu structure and describes what each selection will do. Sample reports and graphs generated from individual menu selections are provided to acquaint the user with the various types of output. 17 figs.
Efficient Interpretation of Large-Scale Real Data by Static Inverse Optimization
NASA Astrophysics Data System (ADS)
Zhang, Hong; Ishikawa, Masumi
We have already proposed a methodology for static inverse optimization to interpret real data from the viewpoint of optimization. In this paper we propose a method for efficiently generating constraints by divide-and-conquer to interpret large-scale data by static inverse optimization. It radically decreases the computational cost of generating constraints by deleting non-Pareto optimal data from the given data. To evaluate the effectiveness of the proposed method, simulation experiments using 3-D artificial data are carried out. As an application to real data, criterion functions underlying the decision making of about 5,000 tenants living along the Yamanote line and the Soubu-Chuo line in Tokyo are estimated, providing an interpretation of rented housing data from the viewpoint of optimization.
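The step of deleting non-Pareto optimal data mentioned in the abstract above can be sketched as follows. This is a brute-force O(n²) filter for illustration only, not the divide-and-conquer procedure the authors propose; the function name `pareto_front`, the assumption that larger is better in every coordinate, and the sample points are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is at least as good as q everywhere and strictly better somewhere.

    Assumption: larger values are better in every coordinate.
    """
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical 3-D data, echoing the 3-D artificial data in the abstract.
data = [(1, 5, 2), (2, 4, 3), (1, 4, 2), (3, 1, 1), (2, 4, 1)]
front = pareto_front(data)
```

Here (1, 4, 2) is dominated by (1, 5, 2) and (2, 4, 1) by (2, 4, 3), so both would be discarded before any constraints are generated.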
A Statistical Method for Estimating Luminosity Functions Using Truncated Data
NASA Astrophysics Data System (ADS)
Schafer, Chad M.
2007-06-01
The observational limitations of astronomical surveys lead to significant statistical inference challenges. One such challenge is the estimation of luminosity functions given redshift (z) and absolute magnitude (M) measurements from an irregularly truncated sample of objects. This is a bivariate density estimation problem; we develop here a statistically rigorous method which (1) does not assume a strict parametric form for the bivariate density; (2) does not assume independence between redshift and absolute magnitude (and hence allows evolution of the luminosity function with redshift); (3) does not require dividing the data into arbitrary bins; and (4) naturally incorporates a varying selection function. We accomplish this by decomposing the bivariate density φ(z,M) via log φ(z,M) = f(z) + g(M) + h(z,M,θ), where f and g are estimated nonparametrically and h takes an assumed parametric form. There is a simple way of estimating the integrated mean squared error of the estimator; smoothing parameters are selected to minimize this quantity. Results are presented from the analysis of a sample of quasars.
Statistical Models for the Analysis of Zero-Inflated Pain Intensity Numeric Rating Scale Data.
Goulet, Joseph L; Buta, Eugenia; Bathulapalli, Harini; Gueorguieva, Ralitza; Brandt, Cynthia A
2017-03-01
Pain intensity is often measured in clinical and research settings using the 0 to 10 numeric rating scale (NRS). NRS scores are recorded as discrete values, and in some samples they may display a high proportion of zeroes and a right-skewed distribution. Despite this, statistical methods for normally distributed data are frequently used in the analysis of NRS data. We present results from an observational cross-sectional study examining the association of NRS scores with patient characteristics using data collected from a large cohort of 18,935 veterans in Department of Veterans Affairs care diagnosed with a potentially painful musculoskeletal disorder. The mean (variance) NRS pain was 3.0 (7.5), and 34% of patients reported no pain (NRS = 0). We compared the following statistical models for analyzing NRS scores: linear regression, generalized linear models (Poisson and negative binomial), zero-inflated and hurdle models for data with an excess of zeroes, and a cumulative logit model for ordinal data. We examined model fit, interpretability of results, and whether conclusions about the predictor effects changed across models. In this study, models that accommodate zero inflation provided a better fit than the other models. These models should be considered for the analysis of NRS data with a large proportion of zeroes.
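The summary figures quoted in the abstract above (mean 3.0, variance 7.5, 34% zeroes) already hint at why a plain Poisson model fits poorly. The sketch below compares the observed zero fraction with the zero probability of a Poisson distribution with the same mean, and defines a zero-inflated Poisson probability mass function for reference. The function name `zip_pmf` and its parameters are illustrative, and an unbounded Poisson is itself only an approximation for a 0 to 10 scale.

```python
import math

# Summary figures reported in the abstract.
mean_nrs = 3.0
variance_nrs = 7.5
observed_zero_fraction = 0.34

# A Poisson model with the same mean has variance equal to the mean
# and predicts far fewer zeroes than were observed.
poisson_zero_prob = math.exp(-mean_nrs)
overdispersion_ratio = variance_nrs / mean_nrs


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """Zero-inflated Poisson pmf.

    With probability pi the outcome is a structural zero; otherwise it is
    drawn from a Poisson distribution with rate lam.
    """
    base = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi + (1.0 - pi) * base if k == 0 else (1.0 - pi) * base
```

A Poisson distribution with mean 3.0 puts only about 5% of its mass at zero, against the 34% observed, and the variance is 2.5 times the mean. Both features are consistent with the authors' finding that models accommodating zero inflation fit better.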
Incorporating spatial context into statistical classification of multidimensional image data
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Tilton, J. C.; Swain, P. H.
1981-01-01
Compound decision theory is employed to develop a general statistical model for classifying image data using spatial context. The classification algorithm developed from this model exploits the tendency of certain ground-cover classes to occur more frequently in some spatial contexts than in others. A key input to this contextual classifier is a quantitative characterization of this tendency: the context function. Several methods for estimating the context function are explored, and two complementary methods are recommended. The contextual classifier is shown to produce substantial improvements in classification accuracy compared to the accuracy produced by a non-contextual uniform-priors maximum likelihood classifier when these methods of estimating the context function are used. An approximate algorithm, which cuts computational requirements by over one-half, is presented. The search for an optimal implementation is furthered by an exploration of the relative merits of using spectral classes or information classes for classification and/or context function estimation.
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
The International Coal Statistics Data Base program maintenance guide
Not Available
1991-06-01
The International Coal Statistics Data Base (ICSD) is a microcomputer-based system which contains information related to international coal trade. This includes coal production, consumption, imports and exports information. The ICSD is a secondary data base, meaning that information contained therein is derived entirely from other primary sources. It uses dBase III+ and Lotus 1-2-3 to locate, report and display data. The system is used for analysis in preparing the Annual Prospects for World Coal Trade (DOE/EIA-0363) publication. The ICSD system is menu driven and also permits the user who is familiar with dBase and Lotus operations to leave the menu structure to perform independent queries. Documentation for the ICSD consists of three manuals -- the User's Guide, the Operations Manual, and the Program Maintenance Manual. This Program Maintenance Manual provides the information necessary to maintain and update the ICSD system. Two major types of program maintenance documentation are presented in this manual. The first is the source code for the dBase III+ routines and related non-dBase programs used in operating the ICSD. The second is listings of the major component database field structures. A third important consideration for dBase programming, the structure of index files, is presented in the listing of source code for the index maintenance program. 1 fig.
Data Analysis & Statistical Methods for Command File Errors
NASA Technical Reports Server (NTRS)
Meshkat, Leila; Waggoner, Bruce; Bryant, Larry
2014-01-01
This paper explains current work on modeling for managing the risk of command file errors. It is focused on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors, the number of files radiated, including the number of commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We have assessed these data using different curve fitting and distribution fitting techniques, such as multiple regression analysis and maximum likelihood estimation, to see how much of the variability in the error rates can be explained with these. We have also used goodness-of-fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on what these statistics bore out as critical drivers of the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as anticipate future error rates.
Teschendorff, Andrew E; Sollich, Peter; Kuehn, Reimer
2014-06-01
A key challenge in systems biology is the elucidation of the underlying principles, or fundamental laws, which determine the cellular phenotype. Understanding how these fundamental principles are altered in diseases like cancer is important for translating basic scientific knowledge into clinical advances. While significant progress is being made, with the identification of novel drug targets and treatments by means of systems biological methods, our fundamental systems level understanding of why certain treatments succeed and others fail is still lacking. We here advocate a novel methodological framework for systems analysis and interpretation of molecular omic data, which is based on statistical mechanical principles. Specifically, we propose the notion of cellular signalling entropy (or uncertainty), as a novel means of analysing and interpreting omic data, and more fundamentally, as a means of elucidating systems-level principles underlying basic biology and disease. We describe the power of signalling entropy to discriminate cells according to differentiation potential and cancer status. We further argue the case for an empirical cellular entropy-robustness correlation theorem and demonstrate its existence in cancer cell line drug sensitivity data. Specifically, we find that high signalling entropy correlates with drug resistance and further describe how entropy could be used to identify the achilles heels of cancer cells. In summary, signalling entropy is a deep and powerful concept, based on rigorous statistical mechanical principles, which, with improved data quality and coverage, will allow a much deeper understanding of the systems biological principles underlying normal and disease physiology.
A Closer Look at Data Independence: Comment on “Lies, Damned Lies, and Statistics (in Geology)”
NASA Astrophysics Data System (ADS)
Kravtsov, Sergey; Saunders, Rolando Olivas
2011-02-01
In his Forum (Eos, 90(47), 443, doi:10.1029/2009EO470004, 2009), P. Vermeesch suggests that statistical tests are not fit to interpret long data records. He asserts that for large enough data sets any true null hypothesis will always be rejected. This is certainly not the case! Here we revisit this author's example of weekly distribution of earthquakes and show that statistical results support the commonsense expectation that seismic activity does not depend on weekday (see the online supplement to this Eos issue for details (http://www.agu.org/eos_elec/)).
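The kind of check discussed in this comment can be sketched as a chi-square goodness-of-fit test of weekday counts against a uniform distribution. This is a minimal illustration only: the counts below are made up, not the earthquake data used by the authors, and the function names are hypothetical. The cutoff 12.592 is the standard 5% critical value for 6 degrees of freedom.

```python
# Chi-square goodness-of-fit test of weekday counts against a uniform distribution.
# The counts are hypothetical illustration values, not data from the comment.

CRITICAL_5PCT_DF6 = 12.592  # standard chi-square cutoff, 6 degrees of freedom, alpha = 0.05


def chi_square_uniform(counts):
    """Return the chi-square statistic for H0: all categories equally likely."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def rejects_uniform(counts, critical=CRITICAL_5PCT_DF6):
    """True if H0 (no weekday dependence) is rejected at the 5% level."""
    return chi_square_uniform(counts) > critical


# Hypothetical Monday..Sunday event counts.
weekday_counts = [100, 98, 103, 97, 101, 99, 102]
statistic = chi_square_uniform(weekday_counts)
print(f"chi-square = {statistic:.3f}, reject uniform: {rejects_uniform(weekday_counts)}")
```

With counts this close to equal, the statistic stays far below the cutoff, so the null hypothesis of no weekday dependence is retained.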
NASA Astrophysics Data System (ADS)
Borradaile, G. J.; Werner, T.; Lagroix, F.
2003-04-01
The Kapuskasing Structural Zone (KSZ) reveals a section through Archean lower crustal, granoblastic gneisses. Our new paleomagnetic data largely agree with previous work but we show that interpretations vary according to the choices of statistical, demagnetization and field-correction techniques. First, where the orientation-distribution of characteristic remanence directions on the sphere is not circular-symmetrical, the commonly used statistical model is invalid (Fisher, 1953). Any tendency to form an elliptical distribution indicates the sample is drawn from a Bingham-type population (Bingham, 1964). Fisher and Bingham statistics produce different confidence estimates from the same data and the traditionally defined mean-vector may differ from the maximum eigenvector of an orthorhombic Bingham-distribution. It seems prudent to apply both models wherever a non-Fisher population is suspected and that may be appropriate in any tectonized rocks. Non-Fisher populations require larger sample-sizes so that focussing on individual sites may not be the most effective policy in tectonized rocks. More dispersed sampling across tectonic structures may be more productive. Second, from the same specimens, mean-vectors isolated by thermal and by AF demagnetization differ. Which treatment gives more meaningful results is difficult to decipher, especially in metamorphic rocks where the history of the magnetic minerals is not easily related to the ages of tectonic and petrological events. In this study, thermal demagnetization gave lower inclinations for paleomagnetic vectors and thus more distant paleopoles. Third, of more parochial significance, tilt-corrections may be unnecessary in the KSZ because magnetic fabrics and the thrust-ramp are constant in orientation to the depth at which they level off, at approximately 15 km depth. With Archean geothermal gradients, primary remanences were blocked after the foliation was tilted to rise on the thrust ramp. Therefore, the rocks were
Challenges and Lessons Learned in Generating and Interpreting NHANES Nutritional Biomarker Data.
Pfeiffer, Christine M; Lacher, David A; Schleicher, Rosemary L; Johnson, Clifford L; Yetley, Elizabeth A
2017-03-01
For the past 45 y, the National Center for Health Statistics at the CDC has carried out nutrition surveillance of the US population by collecting anthropometric, dietary intake, and nutritional biomarker data, the latter being the focus of this publication. The earliest biomarker testing assessed iron and vitamin A status. With time, a broad spectrum of water- and fat-soluble vitamins was added and biomarkers for other types of nutrients (e.g., fatty acids) and bioactive dietary compounds (e.g., phytoestrogens) were included in NHANES. The cross-sectional survey is flexible in design, and biomarkers may be measured for a short period of time or rotated in and out of surveys depending on scientific needs. Maintaining high-quality laboratory measurements over extended periods of time such that trends in status can be reliably assessed is a major goal of the testing laboratories. Physicians, health scientists, and policy makers rely on the NHANES reference data to compare the nutritional status of population groups, to assess the impact of various interventions, and to explore associations between nutritional status and health promotion or disease prevention. Focusing on the continuous NHANES, which started in 1999, this review uses a "lessons learned" approach to present a series of challenges that are relevant to researchers measuring biomarkers in NHANES and beyond. Some of those challenges are the use of multiple related biomarkers instead of a single biomarker for a specific nutrient (e.g., folate, vitamin B-12, iron), adhering to special needs for specimen collection and handling to ensure optimum specimen quality (e.g., vitamin C, folate, homocysteine, iodine, polyunsaturated fatty acids), the retrospective use of long-term quality-control data to correct for assay shifts (e.g., vitamin D, vitamin B-12), and the proper planning for and interpretation of crossover studies to adjust for systematic method changes (e.g., folate, vitamin D, ferritin).
Fuzzy logic and image processing techniques for the interpretation of seismic data
NASA Astrophysics Data System (ADS)
Orozco-del-Castillo, M. G.; Ortiz-Alemán, C.; Urrutia-Fucugauchi, J.; Rodríguez-Castellanos, A.
2011-06-01
Since interpretation of seismic data is usually a tedious and repetitive task, the ability to do so automatically or semi-automatically has become an important objective of recent research. We believe that the vagueness and uncertainty in the interpretation process make fuzzy logic an appropriate tool to deal with seismic data. In this work we developed a semi-automated fuzzy inference system to detect the internal architecture of a mass transport complex (MTC) in seismic images. We propose that the observed characteristics of an MTC can be expressed as fuzzy if-then rules consisting of linguistic values associated with fuzzy membership functions. The construction of the fuzzy inference system and various image processing techniques are presented. We conclude that this is a well-suited problem for fuzzy logic since the application of the proposed methodology yields a semi-automatically interpreted MTC which closely resembles the MTC from expert manual interpretation.
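A fuzzy if-then rule built from linguistic values and membership functions, as described in this abstract, might look roughly like the sketch below. The attribute names (continuity, amplitude), the membership breakpoints, and the rule itself are hypothetical and are not taken from the paper; min is used for AND, which is a common but not the only choice.

```python
# Minimal fuzzy if-then rule sketch. Attribute names, membership breakpoints,
# and the rule itself are hypothetical; they are not taken from the paper.


def triangular(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def low(x):
    """Membership in the linguistic value 'low' for an attribute scaled to [0, 1]."""
    return triangular(x, -0.5, 0.0, 0.5)


def mtc_degree(continuity, amplitude):
    """Rule: IF continuity is low AND amplitude is low THEN pixel belongs to the MTC.

    AND is modeled with min, a common choice in fuzzy inference.
    """
    return min(low(continuity), low(amplitude))


print(mtc_degree(0.1, 0.2))  # discontinuous, weak reflector: partial membership
print(mtc_degree(0.9, 0.2))  # continuous reflector: no membership
```

Each pixel receives a degree of membership rather than a hard label, which is what lets such a system express the vagueness of manual interpretation.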
Statistical characteristics of ionospheric variability using oblique sounding data
NASA Astrophysics Data System (ADS)
Kurkin, Vladimir; Polekh, Nelya; Ivanova, Vera; Dumbrava, Zinaida; Podelsky, Igor
Using oblique sounding data obtained over two paths, Magadan-Irkutsk and Khabarovsk-Irkutsk, in 2006-2011, the statistical parameters of ionospheric variability are studied during the equinoxes and the winter solstice. It was shown that, over the Magadan-Irkutsk path, the probability of registering the maximum observed frequency with average standard deviations from the median in the range 5-10% is 0.43 in winter and 0.64 in spring and autumn. In winter, the daytime standard deviation does not exceed 10%, and at night it reaches 20% or more. During the equinox the daytime standard deviation increases to 12%, and at night it does not exceed 16%. This may be due to changes in lighting conditions at the midpoint of the path (58.2° N, 124.2° E). For the Khabarovsk-Irkutsk path, standard deviations from the median are smaller than those obtained for the Magadan-Irkutsk path. The estimates are consistent with previously obtained results deduced from vertical sounding data. The study was done under RF President Grant of Public Support for RF Leading Scientific Schools (NSh-2942.2014.5) and RFBR Grant No 14-05-00259.
Sharing brain mapping statistical results with the neuroimaging data model
Maumet, Camille; Auer, Tibor; Bowring, Alexander; Chen, Gang; Das, Samir; Flandin, Guillaume; Ghosh, Satrajit; Glatard, Tristan; Gorgolewski, Krzysztof J.; Helmer, Karl G.; Jenkinson, Mark; Keator, David B.; Nichols, B. Nolan; Poline, Jean-Baptiste; Reynolds, Richard; Sochat, Vanessa; Turner, Jessica; Nichols, Thomas E.
2016-01-01
Only a tiny fraction of the data and metadata produced by an fMRI study is finally conveyed to the community. This lack of transparency not only hinders the reproducibility of neuroimaging results but also impairs future meta-analyses. In this work we introduce NIDM-Results, a format specification providing a machine-readable description of neuroimaging statistical results along with key image data summarising the experiment. NIDM-Results provides a unified representation of mass univariate analyses including a level of detail consistent with available best practices. This standardized representation allows authors to relay methods and results in a platform-independent regularized format that is not tied to a particular neuroimaging software package. Tools are available to export NIDM-Result graphs and associated files from the widely used SPM and FSL software packages, and the NeuroVault repository can import NIDM-Results archives. The specification is publicly available at: http://nidm.nidash.org/specs/nidm-results.html. PMID:27922621
SEDA: A software package for the Statistical Earthquake Data Analysis
Lombardi, A. M.
2017-01-01
In this paper, the first version of the software SEDA (SEDAv1.0), designed to help seismologists statistically analyze earthquake data, is presented. The package consists of a user-friendly Matlab-based interface, which allows the user to easily interact with the application, and a computational core of Fortran codes, to guarantee maximum speed. The primary factor driving the development of SEDA is to guarantee research reproducibility, which is a growing movement among scientists and highly recommended by the most important scientific journals. SEDAv1.0 is mainly devoted to producing accurate and fast outputs. Less care has been taken over graphic appeal, which will be improved in the future. The main part of SEDAv1.0 is devoted to ETAS modeling. SEDAv1.0 contains a set of consistent tools for ETAS, allowing the estimation of parameters, the testing of the model on data, the simulation of catalogs, the identification of sequences, and the calculation of forecasts. The peculiarities of the routines inside SEDAv1.0 are discussed in this paper. More specific details on the software are presented in the manual accompanying the program package. PMID:28290482
A statistical framework for testing modularity in multidimensional data.
Márquez, Eladio J
2008-10-01
Modular variation of multivariate traits results from modular distribution of effects of genetic and epigenetic interactions among those traits. However, statistical methods rarely detect truly modular patterns, possibly because the processes that generate intramodular associations may overlap spatially. Methodologically, this overlap may cause multiple patterns of modularity to be equally consistent with observed covariances. To deal with this indeterminacy, the present study outlines a framework for testing a priori hypotheses of modularity in which putative modules are mathematically represented as multidimensional subspaces embedded in the data. Model expectations are computed by subdividing the data into arrays of variables, and intermodular interactions are represented by overlapping arrays. Covariance structures are thus modeled as the outcome of complex and nonorthogonal intermodular interactions. This approach is demonstrated by analyzing mandibular modularity in nine rodent species. A total of 620 models are fit to each species, and the most strongly supported are heuristically modified to improve their fit. Five modules common to all species are identified, which approximately map to the developmental modules of the mandible. Within species, these modules are embedded within larger "super-modules," suggesting that these conserved modules act as building blocks from which covariation patterns are built.
Dziurkowska, Ewelina; Wesolowski, Marek
2015-01-01
Multivariate statistical analysis is widely used in medical studies as a useful tool facilitating diagnosis of some diseases, for instance, cancer, allergy, pneumonia, or Alzheimer's and psychiatric diseases. Taking this into consideration, the aim of this study was to use two multivariate techniques, hierarchical cluster analysis (HCA) and principal component analysis (PCA), to disclose the relationship between the drugs used in the therapy of major depressive disorder and the salivary cortisol level and the period of hospitalization. The cortisol content in the saliva of depressed women was quantified by HPLC with UV detection daily during the whole period of hospitalization. A data set with 16 variables (e.g., the patients' age, multiplicity and period of hospitalization, initial and final cortisol level, highest and lowest hormone level, mean contents, and medians) characterizing 97 subjects was used for HCA and PCA calculations. Multivariate statistical analysis reveals that various groups of antidepressants affect the salivary cortisol level to varying degrees. The SSRIs, SNRIs, and polypragmasy reduce the hormone secretion most effectively. Thus, both unsupervised pattern recognition methods, HCA and PCA, can be used as complementary tools for interpretation of the results obtained by laboratory diagnostic methods. PMID:26380376
ERIC Educational Resources Information Center
Neumann, David L.; Hood, Michelle; Neumann, Michelle M.
2013-01-01
Many teachers of statistics recommend using real-life data during class lessons. However, there has been little systematic study of what effect this teaching method has on student engagement and learning. The present study examined this question in a first-year university statistics course. Students (n = 38) were interviewed and their reflections…
MALDI imaging mass spectrometry: statistical data analysis and current computational challenges.
Alexandrov, Theodore
2012-01-01
Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) imaging mass spectrometry, also called MALDI-imaging, is a label-free bioanalytical technique used for spatially-resolved chemical analysis of a sample. Usually, MALDI-imaging is used for analysis of a specially prepared tissue section thaw-mounted onto a glass slide. The MALDI-imaging technique has developed tremendously during the last decade. Currently, it is one of the most promising innovative measurement techniques in biochemistry and a powerful and versatile tool for spatially-resolved chemical analysis of diverse sample types ranging from biological and plant tissues to bio and polymer thin films. In this paper, we outline computational methods for analyzing MALDI-imaging data with the emphasis on multivariate statistical methods, discuss their pros and cons, and give recommendations on their application. The methods of unsupervised data mining as well as supervised classification methods for biomarker discovery are elucidated. We also present a high-throughput computational pipeline for interpretation of MALDI-imaging data using spatial segmentation. Finally, we discuss current challenges associated with the statistical analysis of MALDI-imaging data.
STATISTICAL ESTIMATION AND VISUALIZATION OF GROUND-WATER CONTAMINATION DATA
This work presents methods of visualizing and animating statistical estimates of ground water and/or soil contamination over a region from observations of the contaminant for that region. The primary statistical methods used to produce the regional estimates are nonparametric re...
EBprot: Statistical analysis of labeling-based quantitative proteomics data.
Koh, Hiromi W L; Swa, Hannah L F; Fermin, Damian; Ler, Siok Ghee; Gunaratne, Jayantha; Choi, Hyungwon
2015-08-01
Labeling-based proteomics is a powerful method for detection of differentially expressed proteins (DEPs). The current data analysis platform typically relies on protein-level ratios, which are obtained by summarizing peptide-level ratios for each protein. In shotgun proteomics, however, some proteins are quantified with more peptides than others, and this reproducibility information is not incorporated into the differential expression (DE) analysis. Here, we propose a novel probabilistic framework EBprot that directly models the peptide-protein hierarchy and rewards the proteins with reproducible evidence of DE over multiple peptides. To evaluate its performance with known DE states, we conducted a simulation study to show that the peptide-level analysis of EBprot provides better receiver-operating characteristics and more accurate estimation of the false discovery rates than the methods based on protein-level ratios. We also demonstrate superior classification performance of peptide-level EBprot analysis in a spike-in dataset. To illustrate the wide applicability of EBprot in different experimental designs, we applied EBprot to a dataset for lung cancer subtype analysis with biological replicates and another dataset for time course phosphoproteome analysis of EGF-stimulated HeLa cells with multiplexed labeling. Through these examples, we show that the peptide-level analysis of EBprot is a robust alternative to the existing statistical methods for the DE analysis of labeling-based quantitative datasets. The software suite is freely available on the Sourceforge website http://ebprot.sourceforge.net/. All MS data have been deposited in the ProteomeXchange with identifier PXD001426 (http://proteomecentral.proteomexchange.org/dataset/PXD001426/).
77 FR 65585 - Renewal of the Bureau of Labor Statistics Data Users Advisory Committee
Federal Register 2010, 2011, 2012, 2013, 2014
2012-10-29
... of Labor Statistics Renewal of the Bureau of Labor Statistics Data Users Advisory Committee The... determined that the renewal of the Bureau of Labor Statistics Data Users Advisory Committee (the ``Committee... of Labor Statistics by 29 U.S.C. 1 and 2. This determination follows consultation with the...
75 FR 67121 - Re-Establishment of the Bureau of Labor Statistics Data Users Advisory Committee
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-01
... of Labor Statistics Re-Establishment of the Bureau of Labor Statistics Data Users Advisory Committee... of Labor has determined that the re-establishment of the Bureau of Labor Statistics Data Users... duties imposed upon the Commissioner of Labor Statistics by 29 U.S.C. 1 and 2. This determination...
Kuo, Kuan-Liang; Fuh, Chiou-Shann
2011-12-01
Health examinations can provide relatively complete health information and thus are important for personal and public health management. For clinicians, one of the most important tasks in health examinations is interpreting the results. Continuously interpreting numerous health examination results of healthcare receivers is tedious and error-prone. This paper proposes a clinical decision support system to help solve these problems. In order to customize the clinical decision support system intuitively and flexibly, this paper also proposes a rule syntax to implement computer-interpretable logic for health examinations. Our purpose in this paper is to describe the methodology of the proposed clinical decision support system. The evaluation was performed by implementing and executing decision rules on health examination results and by surveying users of the clinical decision support system. It reveals the efficiency of, and user satisfaction with, the proposed clinical decision support system. A positive impact on clinical data interpretation is also noted.
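The abstract does not give the proposed rule syntax, so the following is only a rough sketch of what computer-interpretable decision rules applied to examination results could look like. The field names, thresholds, and messages are hypothetical placeholders; they are neither the paper's syntax nor clinical guidance.

```python
# Hypothetical rule representation for interpreting health examination results.
# Field names, thresholds, and messages are illustrative placeholders, not the
# paper's rule syntax and not clinical guidance.
import operator

OPERATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}

RULES = [
    {"field": "fasting_glucose_mg_dl", "op": ">=", "value": 126,
     "message": "Fasting glucose is elevated; recommend follow-up."},
    {"field": "bmi", "op": ">=", "value": 30,
     "message": "BMI is in the obese range."},
]


def interpret(results, rules=RULES):
    """Return the messages of all rules whose condition holds for the results."""
    messages = []
    for rule in rules:
        measured = results.get(rule["field"])
        if measured is None:
            continue  # item was not part of this examination
        if OPERATORS[rule["op"]](measured, rule["value"]):
            messages.append(rule["message"])
    return messages


print(interpret({"fasting_glucose_mg_dl": 131, "bmi": 24.5}))
```

Keeping the rules as data rather than code is one way to let clinicians customize the system without touching the evaluation logic.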
Geochemical portray of the Pacific Ridge: New isotopic data and statistical techniques
NASA Astrophysics Data System (ADS)
Hamelin, Cédric; Dosso, Laure; Hanan, Barry B.; Moreira, Manuel; Kositsky, Andrew P.; Thomas, Marion Y.
2011-02-01
Samples collected during the PACANTARCTIC 2 cruise fill a sampling gap from 53° to 41° S along the Pacific Antarctic Ridge (PAR). Analysis of Sr, Nd, Pb, Hf, and He isotope compositions of these new samples is shown together with published data from 66°S to 53°S and from the EPR. Recent advances in analytical mass spectrometry techniques have produced a spectacular increase in the number of multidimensional isotopic data for oceanic basalts. Working with such multidimensional datasets calls for a new approach to data interpretation, preferably based on statistical analysis techniques. Principal Component Analysis (PCA) is a powerful mathematical tool to study this type of dataset. The purpose of PCA is to reduce the number of dimensions by keeping only those characteristics that contribute most to the variance of the data. Using this technique, it becomes possible to have a statistical picture of the geochemical variations along the entire Pacific Ridge from 70°S to 10°S. The incomplete sampling of the ridge led previously to the identification of a large-scale division of the south Pacific mantle at the latitude of Easter Island. The PCA method applied here to the completed dataset reveals a different geochemical profile. Along the Pacific Ridge, a large-scale bell-shaped variation with an extremum at about 38°S latitude is interpreted as a progressive change in the geochemical characteristics of the depleted matrix of the mantle. This Pacific Isotopic Bump (PIB) is also noticeable in the along-axis variation of the He isotopic ratio. The linear correlation observed between He and heavy radiogenic isotopes, together with the result of the PCA calculation, suggests that the large-scale variation is unrelated to the plume-ridge interactions in the area and should rather be attributed to the partial melting of a marble-cake assemblage.
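The dimension reduction that PCA performs can be illustrated with a small sketch that finds the first principal component by power iteration on the covariance matrix. The sample values are synthetic, not isotope data from the study, and the helper names are hypothetical; a numerical library would normally be used instead of this hand-written version.

```python
# First principal component by power iteration on the covariance matrix.
# The sample values are synthetic, not isotope data from the study.
import math


def center(rows):
    """Subtract the column mean from every column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov, iterations=200):
    """Return (unit eigenvector, eigenvalue) for the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue


# Three synthetic variables measured on five samples; the first two are correlated.
samples = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
    [5.0, 10.1, 0.5],
]
centered = center(samples)
cov = covariance(centered)
pc1, variance_pc1 = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance_pc1 / total_variance
# Projecting each sample onto PC1 replaces three coordinates with one score.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
print(f"PC1 explains {explained:.1%} of the variance")
```

Because two of the synthetic variables vary together, a single component captures nearly all of the variance, which is the sense in which PCA keeps only the characteristics that contribute most.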
Notes on interpretation of geophysical data over areas of mineralization in Afghanistan
Drenth, Benjamin J.
2011-01-01
Afghanistan has the potential to contain substantial metallic mineral resources. Although valuable mineral deposits have been identified, much of the country's potential remains unknown. Geophysical surveys, particularly those conducted from airborne platforms, are a well-accepted and cost-effective method for obtaining information on the geological setting of a given area. This report summarizes interpretive findings from various geophysical surveys over selected mineral targets in Afghanistan, highlighting what existing data tell us. These interpretations are mainly qualitative in nature, because of the low resolution of available geophysical data. Geophysical data and simple interpretations are included for these six areas and deposit types: (1) Aynak: Sedimentary-hosted copper; (2) Zarkashan: Porphyry copper; (3) Kundalan: Porphyry copper; (4) Dusar Shaida: Volcanic-hosted massive sulphide; (5) Khanneshin: Carbonatite-hosted rare earth element; and (6) Chagai Hills: Porphyry copper.
ERIC Educational Resources Information Center
Singamsetti, Rao
2007-01-01
In this paper an attempt is made to highlight some issues of interpretation of statistical concepts and interpretation of results as taught in undergraduate Business statistics courses. The use of modern technology in the class room is shown to have increased the efficiency and the ease of learning and teaching in statistics. The importance of…
Statistical object data analysis of taxonomic trees from human microbiome data.
La Rosa, Patricio S; Shands, Berkley; Deych, Elena; Zhou, Yanjiao; Sodergren, Erica; Weinstock, George; Shannon, William D
2012-01-01
Human microbiome research characterizes the microbial content of samples from human habitats to learn how interactions between bacteria and their host might impact human health. In this work a novel parametric statistical inference method based on object-oriented data analysis (OODA) for analyzing HMP data is proposed. OODA is an emerging area of statistical inference where the goal is to apply statistical methods to objects such as functions, images, and graphs or trees. The data objects that pertain to this work are taxonomic trees of bacteria built from analysis of 16S rRNA gene sequences (e.g. using RDP); there is one such object for each biological sample analyzed. Our goal is to model and formally compare a set of trees. The contribution of our work is threefold: first, a weighted tree structure to analyze RDP data is introduced; second, using a probability measure to model a set of taxonomic trees, we introduce an approximate MLE procedure for estimating model parameters and we derive LRT statistics for comparing the distributions of two metagenomic populations; and third the Jumpstart HMP data is analyzed using the proposed model providing novel insights and future directions of analysis.
NASA Astrophysics Data System (ADS)
Cobden, Laura; Mosca, Ilaria; Trampert, Jeannot; Ritsema, Jeroen
2012-11-01
Recent experimental studies indicate that perovskite, the dominant lower mantle mineral, undergoes a phase change to post-perovskite at high pressures. However, it has been unclear whether this transition occurs within the Earth's mantle, due to uncertainties in both the thermochemical state of the lowermost mantle and the pressure-temperature conditions of the phase boundary. In this study we compare the relative fit to global seismic data of mantle models which do and do not contain post-perovskite, following a statistical approach. Our data comprise more than 10,000 Pdiff and Sdiff travel-times, global in coverage, from which we extract the global distributions of dln VS and dln VP near the core-mantle boundary (CMB). These distributions are sensitive to the underlying lateral variations in mineralogy and temperature even after seismic uncertainties are taken into account, and are ideally suited for investigating the likelihood of the presence of post-perovskite. A post-perovskite-bearing CMB region provides a significantly closer fit to the seismic data than a post-perovskite-free CMB region on both a global and regional scale. These results complement previous local seismic reflection studies, which have shown a consistency between seismic observations and the physical properties of post-perovskite inside the deep Earth.
Interpretation of Ground Penetrating Radar data at the Hanford Site, Richland, Washington
NASA Astrophysics Data System (ADS)
Bergstrom, K. A.; Mitchell, T. H.; Kunk, J. R.
1993-07-01
Ground Penetrating Radar (GPR) is being used extensively during characterization and remediation of chemical and radioactive waste sites at the Hanford Site in Washington State. Time and money for GPR investigations are often not included during the planning and budgeting phase. Therefore GPR investigations must be inexpensive and quick to minimize impact on already established budgets and schedules. An approach to survey design, data collection, and interpretation has been developed which emphasizes speed and budget with minimal impact on the integrity of the interpretation or quality of the data. The following simple rules of thumb can be applied: (1) Assemble as much pre-survey information as possible, (2) Clearly define survey objectives prior to designing the survey and determine which combination of geophysical methods will best meet the objectives, (3) Continuously communicate with the client, before, during and after the investigation, (4) Only experienced GPR interpreters should acquire the field data, (5) Use real-time monitoring of the data to determine where and how much data to collect and assist in the interpretation, (6) Always 'error' in favor of collecting too much data, (7) Surveys should have closely spaced (preferably 5 feet, no more than 10 feet), orthogonal profiles, and (8) When possible, pull the antenna by hand.
Interim Solar Radiation Data Manual: 30-Year Statistics from the National Solar Radiation Data Base
Not Available
1992-11-01
The 30-year (1961-1990) statistics contained in this document have been derived from the National Solar Radiation Data Base (NSRDB) produced by the National Renewable Energy Laboratory (NREL). They outline solar radiation sources, as well as 30-year monthly and annual means of 5 solar radiation elements (three surface and two extraterrestrial) and 12 meteorological elements for 239 locations.
Computer Search Center Statistics on Users and Data Bases
ERIC Educational Resources Information Center
Schipma, Peter B.
1974-01-01
Statistics gathered over five years of operation by the IIT Research Institute's Computer Search Center are summarized for profile terms and lists, use of truncation modes, use of logic operators, some characteristics of CA Condensates, etc. (Author/JB)
Robust statistical approaches to assess the degree of agreement of clinical data
NASA Astrophysics Data System (ADS)
Grilo, Luís M.; Grilo, Helena L.
2016-06-01
To analyze the blood of patients who took vitamin B12 for a period of time, two different medical measurement methods were used (one is the established method, which involves more human intervention; the other is essentially automated). Given the non-normality of the differences between the two measurement methods, the limits of agreement are also estimated using a non-parametric approach to assess the degree of agreement of the clinical data. The bootstrap resampling method is applied to obtain robust confidence intervals for the mean and median of the differences. The approaches used are easy to apply in user-friendly software, and their outputs are also easy to interpret. In this case study the results obtained with parametric and non-parametric approaches lead to different statistical conclusions, but the decision whether agreement is acceptable or not is always a clinical judgment.
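As an illustration of the kind of analysis this abstract describes, the following is a minimal sketch of a percentile bootstrap for the mean and median of paired differences, together with quantile-based (non-parametric) limits of agreement. It is not the authors' implementation: the function names, the 2.5%/97.5% quantile choice, and the `differences` values are assumptions made up for the example.

```python
import random
import statistics


def bootstrap_ci(values, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]


def nonparametric_limits(values, lower=0.025, upper=0.975):
    """Limits of agreement from empirical quantiles (no normality assumed)."""
    ordered = sorted(values)
    n = len(ordered)

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (n - 1)
        i = int(pos)
        if i + 1 >= n:
            return ordered[-1]
        return ordered[i] + (pos - i) * (ordered[i + 1] - ordered[i])

    return quantile(lower), quantile(upper)


# Hypothetical paired differences (method A minus method B); invented data.
differences = [12.0, -5.0, 3.5, 40.0, -2.0, 7.5, 1.0, -8.0, 22.0, 0.5, 4.0, -1.5]

mean_ci = bootstrap_ci(differences, statistics.mean)
median_ci = bootstrap_ci(differences, statistics.median)
limits = nonparametric_limits(differences)

print("95% bootstrap CI for mean difference:  ", mean_ci)
print("95% bootstrap CI for median difference:", median_ci)
print("Non-parametric limits of agreement:    ", limits)
```

With skewed differences such as these, the intervals for the mean and the median can differ noticeably, which is one way parametric and non-parametric summaries can point to different statistical conclusions.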
Statistical methods in joint modeling of longitudinal and survival data
NASA Astrophysics Data System (ADS)
Dempsey, Walter
Survival studies often generate not only a survival time for each patient but also a sequence of health measurements at annual or semi-annual check-ups while the patient remains alive. Such a sequence of random length accompanied by a survival time is called a survival process. Ordinarily robust health is associated with longer survival, so the two parts of a survival process cannot be assumed independent. The first part of the thesis is concerned with a general technique---reverse alignment---for constructing statistical models for survival processes. A revival model is a regression model in the sense that it incorporates covariate and treatment effects into both the distribution of survival times and the joint distribution of health outcomes. The revival model also determines a conditional survival distribution given the observed history, which describes how the subsequent survival distribution is determined by the observed progression of health outcomes. The second part of the thesis explores the concept of a consistent exchangeable survival process---a joint distribution of survival times in which the risk set evolves as a continuous-time Markov process with homogeneous transition rates. A correspondence with the de Finetti approach of constructing an exchangeable survival process by generating iid survival times conditional on a completely independent hazard measure is shown. Several specific processes are detailed, showing how the number of blocks of tied failure times grows asymptotically with the number of individuals in each case. In particular, we show that the set of Markov survival processes with weakly continuous predictive distributions can be characterized by a two-dimensional family called the harmonic process. The outlined methods are then applied to data, showing how they can be easily extended to handle censoring and inhomogeneity among patients.
Cunningham, Michael R.; Baumeister, Roy F.
2016-01-01
The limited resource model states that self-control is governed by a relatively finite set of inner resources on which people draw when exerting willpower. Once self-control resources have been used up or depleted, they are less available for other self-control tasks, leading to a decrement in subsequent self-control success. The depletion effect has been studied for over 20 years, tested or extended in more than 600 studies, and supported in an independent meta-analysis (Hagger et al., 2010). Meta-analyses are supposed to reduce bias in literature reviews. Carter et al.’s (2015) meta-analysis, by contrast, included a series of questionable decisions involving sampling, methods, and data analysis. We provide quantitative analyses of key sampling issues: exclusion of many of the best depletion studies based on idiosyncratic criteria and the emphasis on mini meta-analyses with low statistical power as opposed to the overall depletion effect. We discuss two key methodological issues: failure to code for research quality, and the quantitative impact of weak studies by novice researchers. We discuss two key data analysis issues: questionable interpretation of the results of trim and fill and Funnel Plot Asymmetry test procedures, and the use and misinterpretation of the untested Precision Effect Test and Precision Effect Estimate with Standard Error (PEESE) procedures. Despite these serious problems, the Carter et al. (2015) meta-analysis results actually indicate that there is a real depletion effect – contrary to their title. PMID:27826272
ISSUES IN THE STATISTICAL ANALYSIS OF SMALL-AREA HEALTH DATA. (R825173)
The availability of geographically indexed health and population data, with advances in computing, geographical information systems and statistical methodology, has opened the way for serious exploration of small area health statistics based on routine data. Such analyses may be...
WebGIS System Provides Spatial Context for Interpreting Biophysical data
NASA Astrophysics Data System (ADS)
Graham, R. L.; Santhana Vannan, K.; Olsen, L. M.; Palanisamy, G.; Cook, R. B.; Beaty, T. W.; Holladay, S. K.; Rhyne, T.; Voorhees, L. D.
2006-05-01
Understanding the spatial context of biophysical data such as measurements of Net Primary Productivity or carbon fluxes at tower sites is useful in their interpretation. The ORNL DAAC has developed a WebGIS system to help users visualize, locate and extract landcover, biophysical, elevation, and geopolitical data archived at the DAAC and/or point the users to the primary data location, as in the case of flux tower measurements. The system currently allows the user to extract data for thirteen different map features including four vector data sets and nine raster coverages. Four OGC layers are also available to help interpretation of the site-specific data. The user can select either the Global or North American version of the system. Users can interrogate map features and extract and download map features, including map layers (shapefiles). The user can download data for their region of interest as a shapefile in the case of vector data and as a GeoTIFF in the case of raster data. A single file is created for each map feature. Twenty-eight tools are provided to let the user identify, select, query, interpret and download the data.
What defines an Expert? - Uncertainty in the interpretation of seismic data
NASA Astrophysics Data System (ADS)
Bond, C. E.
2008-12-01
Studies focusing on the elicitation of information from experts are concentrated primarily in economics and world markets, medical practice and expert witness testimonies. Expert elicitation theory has been applied in the natural sciences, most notably in the prediction of fluid flow in hydrological studies. In the geological sciences expert elicitation has been limited to theoretical analysis, with studies focusing on the elicitation element, gaining expert opinion rather than necessarily understanding the basis behind the expert view. In these cases experts are defined in a traditional sense, based for example on: standing in the field, number of years of experience, number of peer-reviewed publications, or the expert's position in a company hierarchy or academia. Here traditional indicators of expertise have been compared for significance on effective seismic interpretation. Polytomous regression analysis has been used to assess the relative significance of length and type of experience on the outcome of a seismic interpretation exercise. Following the initial analysis, the techniques used by participants to interpret the seismic image were added as additional variables to the analysis. Specific technical skills and techniques were found to be more important for the effective geological interpretation of seismic data than the traditional indicators of expertise. The results of a seismic interpretation exercise, the techniques used to interpret the seismic data and the participants' prior experience have been combined and analysed to answer the question: who is, and what defines, an expert?
Lin, Meng Kuan; Nicolini, Oliver; Waxenegger, Harald; Galloway, Graham J.; Ullmann, Jeremy F. P.; Janke, Andrew L.
2013-01-01
Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and DIP service, called M-DIP. The objective of the system is to (1) automate the direct data tiling, conversion, and pre-tiling of brain images from Medical Imaging NetCDF (MINC) and Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurements; and (3) display high-level images in three dimensions in real-world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation using web-based protocols. M-DIP implements three levels of architecture with a relational middle-layer database, a stand-alone DIP server, and a mobile application logic middle level realizing user interpretation for direct querying and communication. This imaging software has the ability to display biological imaging data at multiple zoom levels and to increase its quality to meet users’ expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real-world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed from any network environment, such as a portable mobile or tablet device. In addition, this system, in combination with mobile applications, is establishing a virtualization tool in the neuroinformatics field to speed interpretation services. PMID:23847587
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 42 Public Health 3 2011-10-01 2011-10-01 false Adequate financial records, statistical data, and... financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination of costs payable by...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 42 Public Health 3 2013-10-01 2013-10-01 false Adequate financial records, statistical data, and....568 Adequate financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 42 Public Health 3 2014-10-01 2014-10-01 false Adequate financial records, statistical data, and....568 Adequate financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 42 Public Health 3 2012-10-01 2012-10-01 false Adequate financial records, statistical data, and....568 Adequate financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 42 Public Health 3 2010-10-01 2010-10-01 false Adequate financial records, statistical data, and... financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination of costs payable by...
NASA Technical Reports Server (NTRS)
Taylor, P. T.; Kis, K. I.; Wittmann, G.
2013-01-01
The ESA SWARM mission will have three Earth-orbiting, magnetometer-bearing satellites: one in a high orbit and two side by side in lower orbits. These latter satellites will record a horizontal magnetic gradient. To determine how these gradient measurements can be used to interpret large geologic units, we used ten years of CHAMP data to compute a horizontal gradient map over a section of southeastern Europe, with the goal of interpreting these data over the Pannonian Basin of Hungary.
Ocean optical measurements—II. Statistical analysis of data from Canadian eastern Arctic waters
NASA Astrophysics Data System (ADS)
Topliss, B. J.; Miller, J. R.; Horne, E. P. W.
1989-02-01
The attenuation of light in Arctic waters was found to be controlled by chlorophyll pigment and dissolved material with a possible contribution from suspended particulate matter. The potential dependence of the attenuation coefficient on pigment concentration, depth and material type was statistically investigated to evaluate these individual, but intercorrelated, contributions. When the variation of dissolved material with depth was selected as a separation criterion for the intercorrelated in situ variables, the statistical analysis suggested a concentration dependence for the specific attenuation coefficient of chlorophyll pigments. A non-linear attenuation/pigment relationship for the Arctic data, governed by concentration and proportion of phaeophytin to chlorophyll, was found to be consistent with clear water data from the open ocean as well as from turbid waters on the Grand Banks. Although only approximately 25% of available light was absorbed by chlorophyll a pigment itself, the underwater spectrum was modified by these pigments in a manner similar to that occurring in clear open ocean waters. Scattering calculations gave large specific back-scattering values for low pigment concentrations in Arctic waters as well as for waters from an inshore glacial fjord, posing potential interpretation problems for remote sensing applications. In contrast, scattering calculations for high pigment concentrations from the Arctic implied that potentially useful information might be extracted from high latitude imagery.
NASA Astrophysics Data System (ADS)
Line, C. E. R.; Hobbs, R. W.; Hudson, J. A.; Snyder, D. B.
1998-01-01
Statistical parameters describing heterogeneity in the Proterozoic basement of the Baltic Shield were estimated from controlled-source seismic data, using a statistical inversion based on the theory of wave propagation through random media (WPRM), derived from the parabolic wave approximation. Synthetic plane-wave seismograms generated from models of random media show consistency with WPRM theory for forward propagation in the weak-scattering regime, whilst for two-way propagation a discrepancy exists that is due to contamination of the primary wave by backscattered energy. Inverse modelling of the real seismic data suggests that the upper crust to depths of ~15 km can be characterized, subject to the range of spatial resolution of the method, by a medium with an exponential spatial autocorrelation function, an rms velocity fluctuation of 1.5 ± 0.5 per cent and a correlation length of 150 ± 50 m. Further inversions show that scattering is predominantly occurring in the uppermost ~2 km of crust, where rms velocity fluctuation is 3-6 per cent. Although values of correlation distance are well constrained by these inversions, there is a trade-off between thickness of scattering layer and rms velocity perturbation estimates, with both being relatively poorly resolved. The higher near-surface heterogeneity is interpreted to arise from fractures in the basement rocks that close under lithostatic pressure for depths greater than 2-3 km.
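To make the quoted medium parameters concrete, the sketch below generates a one-dimensional random velocity-fluctuation profile whose autocorrelation is exponential, using a first-order autoregressive recursion. This is only an illustrative forward model, not the WPRM inversion used in the study: the AR(1) construction, the 10 m sample spacing, the profile length and all function names are assumptions, while the 1.5 per cent rms fluctuation and 150 m correlation length are taken from the abstract.

```python
import math
import random


def exponential_random_medium(n, dx, corr_length, rms, seed=0):
    """Fractional velocity fluctuations with autocorrelation rms^2 * exp(-r / a).

    An AR(1) recursion with coefficient rho = exp(-dx / a) has autocorrelation
    rho**k = exp(-k * dx / a) at lag k, i.e. an exponential function of distance.
    """
    rng = random.Random(seed)
    rho = math.exp(-dx / corr_length)
    innovation_sd = rms * math.sqrt(1.0 - rho * rho)  # keeps the variance stationary
    series = [rng.gauss(0.0, rms)]
    for _ in range(n - 1):
        series.append(rho * series[-1] + rng.gauss(0.0, innovation_sd))
    return series


def sample_autocorrelation(series, lag):
    """Biased sample autocorrelation of the series at the given integer lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    ) / n
    return cov / var


DX = 10.0            # sample spacing in metres (assumed for illustration)
CORR_LENGTH = 150.0  # correlation length a, in metres (value from the abstract)
RMS = 0.015          # 1.5 per cent rms velocity fluctuation (from the abstract)

fluct = exponential_random_medium(20000, DX, CORR_LENGTH, RMS)
rms_est = math.sqrt(sum(x * x for x in fluct) / len(fluct))
acf_at_a = sample_autocorrelation(fluct, int(CORR_LENGTH / DX))

print(f"estimated rms fluctuation: {100 * rms_est:.2f} per cent")
print(f"autocorrelation at one correlation length: {acf_at_a:.3f} (theory {math.exp(-1):.3f})")
```

A velocity profile would follow as `v0 * (1 + f)` for each fluctuation `f` and a background velocity `v0` of the reader's choosing; at a lag of one correlation length the autocorrelation should fall to roughly 1/e.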
Quantum Correlations from the Conditional Statistics of Incomplete Data.
Sperling, J; Bartley, T J; Donati, G; Barbieri, M; Jin, X-M; Datta, A; Vogel, W; Walmsley, I A
2016-08-19
We study, in theory and experiment, the quantum properties of correlated light fields measured with click-counting detectors providing incomplete information on the photon statistics. We establish a correlation parameter for the conditional statistics, and we derive the corresponding nonclassicality criteria for detecting conditional quantum correlations. Classical bounds for Pearson's correlation parameter are formulated that allow us, once they are violated, to determine nonclassical correlations via the joint statistics. On the one hand, we demonstrate nonclassical correlations in terms of the joint click statistics of light produced by a parametric down-conversion source. On the other hand, we verify quantum correlations of a heralded, split single-photon state via the conditional click statistics together with a generalization to higher-order moments. We discuss the performance of the presented nonclassicality criteria to successfully discern joint and conditional quantum correlations. Remarkably, our results are obtained without making any assumptions on the response function, quantum efficiency, and dark-count rate of photodetectors.
Kim, Seokyeon; Jeong, Seongmin; Woo, Insoo; Jang, Yun; Maciejewski, Ross; Ebert, David
2017-02-08
Geographic visualization research has focused on a variety of techniques to represent and explore spatiotemporal data. The goal of those techniques is to enable users to explore events and interactions over space and time in order to facilitate the discovery of patterns, anomalies and relationships within the data. However, it is difficult to extract and visualize data flow patterns over time for non-directional statistical data without trajectory information. In this work, we develop a novel flow analysis technique to extract, represent, and analyze flow maps of non-directional spatiotemporal data unaccompanied by trajectory information. We estimate a continuous distribution of these events over space and time, and extract flow fields for spatial and temporal changes utilizing a gravity model. Then, we visualize the spatiotemporal patterns in the data by employing flow visualization techniques. The user is presented with temporal trends of geo-referenced discrete events on a map. As such, overall spatiotemporal data flow patterns help users analyze geo-referenced temporal events, such as disease outbreaks, crime patterns, etc. To validate our model, we discard the trajectory information in an origin-destination dataset, apply our technique to the data, and compare the derived trajectories with the original. Finally, we present spatiotemporal trend analysis for statistical datasets including Twitter data, maritime search and rescue events, and syndromic surveillance.
PANDA: pathway and annotation explorer for visualizing and interpreting gene-centric data.
Hart, Steven N; Moore, Raymond M; Zimmermann, Michael T; Oliver, Gavin R; Egan, Jan B; Bryce, Alan H; Kocher, Jean-Pierre A
2015-01-01
Objective. Bringing together genomics, transcriptomics, proteomics, and other -omics technologies is an important step towards developing highly personalized medicine. However, instrumentation has advanced far beyond expectations and now we are able to generate data faster than it can be interpreted. Materials and Methods. We have developed PANDA (Pathway AND Annotation) Explorer, a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways like KEGG, BioCarta, and PharmGKB. PANDA represents data/annotations as icons in the graph while maintaining the other data elements (i.e., other columns for the table of annotations). Custom pathways from underrepresented diseases can be imported when existing data sources are inadequate. PANDA also allows sharing annotations among collaborators. Results. In our first use case, we show how easy it is to view supplemental data from a manuscript in the context of a user's own data. Another use case is provided describing how PANDA was leveraged to design a treatment strategy from the somatic variants found in the tumor of a patient with metastatic sarcomatoid renal cell carcinoma. Conclusion. PANDA facilitates the interpretation of gene-centric annotations by visually integrating this information with the context of biological pathways. The application can be downloaded or used directly from our website: http://bioinformaticstools.mayo.edu/research/panda-viewer/.
ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization
Antcheva, I.; Ballintijn, M.; Bellenot, B.; Biskup, M.; Brun, R.; Buncic, N.; Canal, Ph.; Casadei, D.; Couet, O.; Fine, V.; Franco, L.; /CERN /CERN
2009-01-01
ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. Central to these analysis tools are the histogram classes, which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally
Qualitative Data Analysis and Interpretation in Counseling Psychology: Strategies for Best Practices
ERIC Educational Resources Information Center
Yeh, Christine J.; Inman, Arpana G.
2007-01-01
This article presents an overview of various strategies and methods of engaging in qualitative data interpretations and analyses in counseling psychology. The authors explore the themes of self, culture, collaboration, circularity, trustworthiness, and evidence deconstruction from multiple qualitative methodologies. Commonalities and differences…
Interpreting Evidence-of-Learning: Educational Research in the Era of Big Data
ERIC Educational Resources Information Center
Cope, Bill; Kalantzis, Mary
2015-01-01
In this article, we argue that big data can offer new opportunities and roles for educational researchers. In the traditional model of evidence-gathering and interpretation in education, researchers are independent observers, who pre-emptively create instruments of measurement, and insert these into the educational process in specialized times and…
Comparison and standardization of soil enzyme assay for meaningful data interpretation.
Deng, Shiping; Dick, Richard; Freeman, Christopher; Kandeler, Ellen; Weintraub, Michael N
2017-02-01
Data interpretation and comparison in enzyme assays can be challenging because of the complex nature of the environment and variations in methods employed. This letter provides an overview of common enzyme assays, the need for methods standardization, and solutions addressing some of the concerns in microplate fluorimetric assay approaches.
ERIC Educational Resources Information Center
Walther, Joachim; Sochacka, Nicola W.; Pawley, Alice L.
2016-01-01
This article explores challenges and opportunities associated with sharing qualitative data in engineering education research. This exploration is theoretically informed by an existing framework of interpretive research quality with a focus on the concept of Communicative Validation. Drawing on practice anecdotes from the authors' work, the…
Computational Approaches and Tools for Exposure Prioritization and Biomonitoring Data Interpretation
The ability to describe the source-environment-exposure-dose-response continuum is essential for identifying exposures of greater concern to prioritize chemicals for toxicity testing or risk assessment, as well as for interpreting biomarker data for better assessment of exposure ...
Interpreting Reading Assessment Data: Moving From Parts to Whole in a Testing Era
ERIC Educational Resources Information Center
Amendum, Steven J.; Conradi, Kristin; Pendleton, Melissa J.
2016-01-01
This article is designed to help teachers interpret reading assessment data from DIBELS beyond individual subtests to better support their students' needs. While it is important to understand the individual subtest measures, it is more vital to understand how each fits into the larger picture of reading development. The underlying construct of…
Guenther, P.T.; Poenitz, W.P.; Smith, A.B.
1980-01-01
Problem areas in the interpretation of fast-neutron data are discussed. Their impact on experimental uncertainties and hence the evaluation process are reviewed in the context of user needs. Contributions of supplementary information such as nuclear models and applications tests are explored. Specific means for resolving difficulties cited are proposed and illustrated.
Statistical Analysis of CMC Constituent and Processing Data
NASA Technical Reports Server (NTRS)
Fornuff, Jonathan
2004-01-01
Ceramic Matrix Composites (CMCs) are the next "big thing" in high-temperature structural materials. In the case of jet engines, it is widely believed that the metallic superalloys currently being utilized for hot structures (combustors, shrouds, turbine vanes and blades) are nearing their potential limits of improvement. In order to allow for increased turbine temperatures to increase engine efficiency, material scientists have begun looking toward advanced CMCs and SiC/SiC composites in particular. Ceramic composites provide greater strength-to-weight ratios at higher temperatures than metallic alloys, but at the same time pose greater challenges in micro-structural optimization, which in turn increases the cost of the material as well as the risk of variability in the material's thermo-structural behavior. to model various potential CMC engine materials and examines the current variability in these properties due to variability in component processing conditions and constituent materials; then, to see how processing and constituent variations affect key strength, stiffness, and thermal properties of the finished components. Basically, this means trying to model variations in the component's behavior by knowing what went into creating it. inter-phase and manufactured by chemical vapor infiltration (CVI) and melt infiltration (MI) were considered. Examinations of: (1) the percent constituents by volume, (2) the inter-phase thickness, (3) variations in the total porosity, and (4) variations in the chemical composition of the SiC fiber are carried out and modeled using various codes used here at NASA-Glenn (PCGina, NASALife, CEMCAN, etc.). The effects of these variations and the ranking of their respective influences on the various thermo-mechanical material properties are studied and compared to available test data. The properties of the materials as well as minor changes to geometry are then made to the computer model and the detrimental effects
Teaching for Statistical Literacy: Utilising Affordances in Real-World Data
ERIC Educational Resources Information Center
Chick, Helen L.; Pierce, Robyn
2012-01-01
It is widely held that context is important in teaching mathematics and statistics. Consideration of context is central to statistical thinking, and any teaching of statistics must incorporate this aspect. Indeed, it has been advocated that real-world data sets can motivate the learning of statistical principles. It is not, however, a…
Ogunnaike, Babatunde A; Gelmi, Claudio A; Edwards, Jeremy S
2010-05-21
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results showing that, under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays.
ERIC Educational Resources Information Center
Cruce, Ty M.
2009-01-01
This methodological note illustrates how a commonly used calculation of the Delta-p statistic is inappropriate for categorical independent variables, and this note provides users of logistic regression with a revised calculation of the Delta-p statistic that is more meaningful when studying the differences in the predicted probability of an…
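The contrast described in this note can be sketched numerically. The sketch below is an illustration under stated assumptions, not the note's own formulas: it assumes the commonly used calculation shifts the logit of the sample mean of the outcome by the coefficient, and that a more meaningful alternative for a categorical predictor compares the predicted probability of the category of interest with that of the omitted reference category (other covariates held at zero). The function names and all numbers are hypothetical.

```python
import math


def logistic(x):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def delta_p_from_mean(p_mean, beta):
    """Assumed 'common' calculation: start from the sample mean of the
    outcome, add the coefficient on the logit scale, convert back."""
    return logistic(logit(p_mean) + beta) - p_mean


def delta_p_vs_reference(intercept, beta):
    """Assumed alternative for a dummy-coded predictor: difference between
    the predicted probability for the category of interest and for the
    reference category, other covariates at zero."""
    return logistic(intercept + beta) - logistic(intercept)


# Hypothetical values, chosen only to show that the two baselines differ.
p_mean = 0.60      # sample mean of the binary outcome
intercept = -0.50  # fitted intercept (reference category)
beta = 0.80        # fitted coefficient for the category of interest

print("from sample mean:   ", round(delta_p_from_mean(p_mean, beta), 4))
print("vs reference group: ", round(delta_p_vs_reference(intercept, beta), 4))
```

The two functions agree only when the sample mean happens to equal the reference group's predicted probability, which is why the choice of baseline matters for interpretation.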
Building Basic Statistical Literacy with U.S. Census Data
ERIC Educational Resources Information Center
Sheffield, Caroline C.; Karp, Karen S.; Brown, E. Todd
2010-01-01
The world is filled with information delivered through graphical representations--everything from voting trends to economic projections to health statistics. Whether comparing incomes of individuals by their level of education, tracking the rise and fall of state populations, or researching home ownership in different geographical areas, basic…
Interpretation Of Multifrequency Crosswell Electromagnetic Data With Frequency Dependent Core Data
Kirkendall, B; Roberts, J
2005-06-07
Interpretation of cross-borehole electromagnetic (EM) images acquired at enhanced oil recovery (EOR) sites has proven to be difficult due to the typically complex subsurface geology. Significant problems in image interpretation include correlation of specific electrical conductivity values with oil saturations, the time-dependent electrical variation of the subsurface during EOR, and the non-unique electrical conductivity relationship with subsurface conditions. In this study we perform laboratory electrical properties measurements of core samples from the EOR site to develop an interpretation approach that combines field images and petrophysical results. Cross-borehole EM images from the field indicate resistivity increases in EOR areas--behavior contrary to the intended waterflooding design. Laboratory measurements clearly show a decrease in resistivity with increasing effective pressure and are attributed to increased grain-to-grain contact enhancing a strong surface conductance. We also observe a resistivity increase for some samples during brine injection. These observations possibly explain the contrary behavior observed in the field images. Possible mechanisms for increasing the resistivity in the region include (1) increased oil content as injectate sweeps oil toward the plane of the observation wells; (2) lower conductance pore fluid displacing the high-conductivity brine; (3) degradation of grain-to-grain contacts of the initially conductive matrix; and (4) artifacts of the complicated resistivity/time history similar to that observed in the laboratory experiments.
3D interpretation of SHARAD radargram data using seismic processing routines
NASA Astrophysics Data System (ADS)
Kleuskens, M. H. P.; Oosthoek, J. H. P.
2009-04-01
Ground penetrating radar on board a satellite has entered the field of planetary geology. Two radars enable subsurface observations of Mars. In 2003, ESA launched the Mars Express equipped with MARSIS, a low frequency radar which was able to detect only the base of the ice caps. Since December 2006, the Shallow Radar (SHARAD) of Agenzia Spaziale Italiana (ASI) on board the NASA Mars Reconnaissance Orbiter (MRO) has been active in orbit around Mars. The SHARAD radar covers the frequency band between 15 and 25 MHz. The vertical resolution is about 15 m in free space. The horizontal resolution is 300-1000 m along track and 1500-8000 m across track. The radar penetrates the subsurface of Mars up to 2 km deep, and is capable of detecting multiple reflections in the ice caps of Mars. Considering the scarcity of planetary data relative to terrestrial data, it is essential to combine all available types of data of an area of interest. Up to now SHARAD data has only been interpreted separately as 2D radargrams. The Geological Survey of the Netherlands has decades of experience in interpreting 2D and 3D seismic data of the Dutch subsurface, especially for the 3D interpretation of reservoir characteristics of the deeper subsurface. In this abstract we present a methodology which can be used for 3D interpretation of SHARAD data combined with surface data using state-of-the-art seismic software applied in the oil and gas industry. We selected a region that would be most suitable to demonstrate 3D interpretation. The Titania Lobe of the North Polar ice cap was selected based on the abundance of radar data and the complexity of the ice lobe. SHARAD data is released to the scientific community via the Planetary Data System. It includes 'Reduced Data Records' (RDR) data, a binary format which contains the radargram. First the binary radargram data and corresponding coordinates were combined and converted to the commonly used seismic SEG-Y format. Second, we used the reservoir
Performance Data Gathering and Representation from Fixed-Size Statistical Data
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Jin, Haoqiang H.; Schmidt, Melisa A.; Kutler, Paul (Technical Monitor)
1997-01-01
The two commonly used performance data types in the super-computing community, statistics and event traces, are discussed and compared. Statistical data are much more compact but lack the probative power event traces offer. Event traces, on the other hand, are unbounded and can easily fill up the entire file system during program execution. In this paper, we propose an innovative methodology for performance data gathering and representation that offers a middle ground. Two basic ideas are employed: the use of averages to replace recording data for each instance and 'formulae' to represent sequences associated with communication and control flow. The user can incrementally trade off tracing overhead and trace data size against data quality. In other words, the user will be able to limit the amount of trace data collected and, at the same time, carry out some of the analysis event traces offer using space-time views. With the help of a few simple examples, we illustrate the use of these techniques in performance tuning and compare the quality of the traces we collected with event traces. We found that the trace files thus obtained are, indeed, small, bounded and predictable before program execution, and that the quality of the space-time views generated from these statistical data is excellent. Furthermore, experimental results showed that the formulae proposed were able to capture all the sequences associated with 11 of the 15 applications tested. The performance of the formulae can be incrementally improved by allocating more memory at runtime to learn longer sequences.
Van Ness, Peter H.; Fried, Terri R.; Gill, Thomas M.
2012-01-01
This article’s main objective is to demonstrate that data analysis, including quantitative data analysis, is a process of interpretation involving basic hermeneutic principles that philosophers have identified in the interpretive process as applied to other, mainly literary, creations. Such principles include a version of the hermeneutic circle, an insistence on interpretive presuppositions, and a resistance to reducing the discovery of truth to the application of inductive methods. The importance of interpretation becomes especially evident when qualitative and quantitative methods are combined in a single clinical research project and when the data being analyzed are longitudinal. Study objectives will be accomplished by showing that three major hermeneutic principles make practical methodological contributions to an insightful, illustrative mixed methods analysis of a qualitative study of changes in functional disability over time embedded in the Precipitating Events Project—a major longitudinal, quantitative study of functional disability among older persons. Mixed methods, especially as shaped by hermeneutic insights such as the importance of empathetic understanding, are potentially valuable resources for scientific investigations of the experience of aging: a practical aim of this article is to articulate and demonstrate this contention. PMID:22582035
NASA Technical Reports Server (NTRS)
Smith, G. L.; Green, R. N.; Young, G. R.
1974-01-01
The NIMBUS-G environmental monitoring satellite carries an instrument (a gas correlation spectrometer) for measuring the mass of a given pollutant within a gas volume. The present paper treats the problem of how this type of measurement can be used to estimate the distribution of pollutant levels in a metropolitan area. Estimation methods are used to develop this distribution. The pollution concentration caused by a point source is modeled as a Gaussian plume. The uncertainty in the measurements is used to determine the accuracy of estimating the source strength, the wind velocity, diffusion coefficients and source location.
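For readers unfamiliar with the Gaussian plume, a minimal sketch follows. It assumes the conventional steady-state form with ground reflection; the abstract does not state which variant the authors used. The dispersion widths `sigma_y` and `sigma_z` stand in for the diffusion coefficients and are passed in directly rather than derived from downwind distance, and all names and values are illustrative.

```python
import math


def gaussian_plume(q, u, sigma_y, sigma_z, y, z, h):
    """Concentration from a continuous point source (textbook form).

    q        source strength (mass per unit time)
    u        wind speed along the plume axis
    sigma_y  crosswind dispersion width at the downwind distance of interest
    sigma_z  vertical dispersion width at the same distance
    y, z     crosswind offset and height of the receptor
    h        effective release height of the source
    """
    lateral = math.exp(-y ** 2 / (2.0 * sigma_y ** 2))
    # The second exponential is the image source that models reflection
    # of the plume at the ground (z = 0).
    vertical = (math.exp(-(z - h) ** 2 / (2.0 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2.0 * sigma_z ** 2)))
    return q / (2.0 * math.pi * u * sigma_y * sigma_z) * lateral * vertical


# Hypothetical receptor at ground level, 50 m off the plume centerline.
c = gaussian_plume(q=100.0, u=5.0, sigma_y=80.0, sigma_z=40.0,
                   y=50.0, z=0.0, h=30.0)
print(c)
```

In an estimation setting such as the one described, `q`, `u`, the dispersion widths, and the source position would be the unknowns fitted to the measured pollutant mass.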
NASA Astrophysics Data System (ADS)
Bearcock, Jenny; Lark, Murray
2016-04-01
Stream water is a key medium for regional geochemical survey. Stream water geochemical data have many potential applications, including mineral exploration, environmental monitoring and protection, catchment management and modelling potential impacts of climate or land use changes. However, stream waters are transient, and measurements are susceptible to various sources of temporal variation. In a regional geochemical survey stream water data comprise "snapshots" of the state of the medium at a sample time. For this reason the British Geological Survey (BGS) has included monitoring streams in its regional geochemical baseline surveys (GBASE) at which daily stream water samples are collected to supplement the spatial data collected in once-off sampling events. In this study we present results from spatio-temporal analysis of spatial stream water surveys and the associated monitoring stream data. We show how the interpretation of the temporal variability as a source of uncertainty depends on how the spatial data are interpreted (as estimates of a summer-time mean concentration, or as point measurements), and explore the implications of this uncertainty in the interpretation of stream water data in a regulatory context.
NASA Astrophysics Data System (ADS)
Watkins, Hannah; Bond, Clare; Butler, Rob
2016-04-01
Geological mapping techniques have advanced significantly in recent years from paper fieldslips to Toughbook, smartphone and tablet mapping; but how do the methods used to create a geological map affect the thought processes that result in the final map interpretation? Geological maps have many key roles in the field of geosciences including understanding geological processes and geometries in 3D, interpreting geological histories and understanding stratigraphic relationships in 2D and 3D. Here we consider the impact of the methods used to create a map on the thought processes that result in the final geological map interpretation. As mapping technology has advanced in recent years, the way in which we produce geological maps has also changed. Traditional geological mapping is undertaken using paper fieldslips, pencils and compass clinometers. The map interpretation evolves through time as data is collected. This interpretive process that results in the final geological map is often supported by recording in a field notebook, observations, ideas and alternative geological models explored with the use of sketches and evolutionary diagrams. In combination the field map and notebook can be used to challenge the map interpretation and consider its uncertainties. These uncertainties and the balance of data to interpretation are often lost in the creation of published 'fair' copy geological maps. The advent of Toughbooks, smartphones and tablets in the production of geological maps has changed the process of map creation. Digital data collection, particularly through the use of inbuilt gyrometers in phones and tablets, has changed smartphones into geological mapping tools that can be used to collect lots of geological data quickly. With GPS functionality this data is also geospatially located, assuming good GPS connectivity, and can be linked to georeferenced infield photography. In contrast line drawing, for example for lithological boundary interpretation and sketching
Heat-Passing Framework for Robust Interpretation of Data in Networks
Fang, Yi; Sun, Mengtian; Ramani, Karthik
2015-01-01
Researchers are regularly interested in interpreting the multipartite structure of data entities according to their functional relationships. Data is often heterogeneous with intricately hidden inner structure. With limited prior knowledge, researchers are likely to confront the problem of transforming this data into knowledge. We develop a new framework, called heat-passing, which exploits intrinsic similarity relationships within noisy and incomplete raw data, and constructs a meaningful map of the data. The proposed framework is able to rank, cluster, and visualize the data all at once. The novelty of this framework is derived from an analogy between the process of data interpretation and that of heat transfer, in which all data points contribute simultaneously and globally to reveal intrinsic similarities between regions of data, meaningful coordinates for embedding the data, and exemplar data points that lie at optimal positions for heat transfer. We demonstrate the effectiveness of the heat-passing framework for robustly partitioning the complex networks, analyzing the globin family of proteins and determining conformational states of macromolecules in the presence of high levels of noise. The results indicate that the methodology is able to reveal functionally consistent relationships in a robust fashion with no reference to prior knowledge. The heat-passing framework is very general and has the potential for applications to a broad range of research fields, for example, biological networks, social networks and semantic analysis of documents. PMID:25668316
Interdisciplinary application and interpretation of EREP data within the Susquehanna River Basin
NASA Technical Reports Server (NTRS)
Mcmurtry, G. J.; Petersen, G. W. (Principal Investigator)
1975-01-01
The author has identified the following significant results. It has become apparent that lineaments seen on Skylab and ERTS images are not equally well defined, and that the clarity of definition of a particular lineament is recorded somewhat differently by different interpreters. In an effort to determine the extent of these variations, a semi-quantitative classification scheme was devised. In the field, along the crest of Bald Eagle Mountain in central Pennsylvania, statistical techniques borrowed from sedimentary petrography (point counting) were used to determine the existence and location of intensely fractured float rock. Verification of Skylab and ERTS detected lineaments on aerial photography at different scales indicated that the brecciated zones appear to occur at one margin of the 1 km zone of brecciation defined as a lineament. In the Lock Haven area, comparison of the film types from the SL4 S190A sensor revealed the black and white Pan X photography to be superior in quality for general interpretation to the black and white IR film. Also, the color positive film is better for interpretation than the color IR film.
Fitzgerald, Peter; Laughter, Mark D; Martyn, Rose; Richardson, Dave; Rowe, Nathan C; Pickett, Chris A; Younkin, James R; Shephard, Adam M
2010-01-01
Accountability scale data from the Global Nuclear Fuels (GNF) fuel fabrication facility in Wilmington, NC has been collected and analyzed as a part of the Cylinder Accountability and Tracking System (CATS) field trial in 2009. The purpose of the data collection was to demonstrate an authentication method for safeguards applications, and the use of load cell data in cylinder accountability. The scale data was acquired using a commercial off-the-shelf communication server with authentication and encryption capabilities. The authenticated weight data was then analyzed to determine facility operating activities. The data allowed for the determination of the number of full and empty cylinders weighed and the respective weights along with other operational activities. Data authentication concepts, practices and methods, the details of the GNF weight data authentication implementation and scale data interpretation results will be presented.
Statistical tests of ARIES data. [very long base interferometry geodesy
NASA Technical Reports Server (NTRS)
Musman, S.
1982-01-01
Statistical tests are performed on Project ARIES preliminary baseline measurements in the Southern California triangle formed by the Jet Propulsion Laboratory, the Owens Valley Radio Observatory, and the Goldstone tracking complex during 1976-1980. In addition to conventional one-dimensional tests, a two-dimensional test which allows for an arbitrary correlation between errors in individual components is formulated using the Hotelling statistic. On two out of three baselines the mean rate of change in baseline vector is statistically significant. Apparent motions on all three baselines are consistent with a pure shear with north-south compression and east-west expansion of 1 x 10^-7 per year. The ARIES measurements are consistent with the USGS geodolite networks in Southern California and the SAFE laser satellite ranging experiment. All three experiments are consistent with a 6 cm/year motion between the Pacific and North American Plates and a band of diffuse shear 300 km wide, except that corresponding rotation of the entire triangle is not found.
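A two-dimensional Hotelling test of this kind can be sketched as follows. This is the textbook one-sample Hotelling T-squared statistic for bivariate data, offered as an illustration; the abstract does not give the exact formulation used for the ARIES baselines, and the sample values are made up. The sample covariance matrix carries the correlation between the two components, which is what distinguishes this test from two separate one-dimensional tests.

```python
def hotelling_t2_2d(samples, mu0=(0.0, 0.0)):
    """One-sample Hotelling T^2 for 2-D observations.

    Returns (t2, f), where f = (n - 2) / (2 * (n - 1)) * t2 follows an
    F distribution with (2, n - 2) degrees of freedom under the null
    hypothesis that the true mean equals mu0 (assuming bivariate normal data).
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    # Unbiased sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = mx - mu0[0], my - mu0[1]
    # Quadratic form d^T S^-1 d using the closed-form 2x2 inverse.
    quad = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    t2 = n * quad
    f = (n - 2) / (2.0 * (n - 1)) * t2
    return t2, f


# Hypothetical (east, north) rate estimates; test against zero motion.
rates = [(1.2, -0.8), (0.9, -1.1), (1.5, -0.6), (1.1, -1.0), (0.8, -0.7)]
t2, f = hotelling_t2_2d(rates)
print(t2, f)
```

The returned `f` would be compared with a critical value of the F distribution with 2 and n - 2 degrees of freedom to judge significance.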
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2010 CFR
2010-10-01
... 49 Transportation 8 2010-10-01 2010-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 29 Labor 5 2012-07-01 2012-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2013 CFR
2013-10-01
... 49 Transportation 8 2013-10-01 2013-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2012 CFR
2012-10-01
... 49 Transportation 8 2012-10-01 2012-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 29 Labor 5 2011-07-01 2011-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 29 Labor 5 2010-07-01 2010-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 29 Labor 5 2014-07-01 2014-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 29 Labor 5 2013-07-01 2013-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2014 CFR
2014-10-01
... 49 Transportation 8 2014-10-01 2014-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
3 CFR - Enhanced Collection of Relevant Data and Statistics Relating to Women
Code of Federal Regulations, 2012 CFR
2012-01-01
... 3 The President 1 2012-01-01 2012-01-01 false Enhanced Collection of Relevant Data and Statistics... Collection of Relevant Data and Statistics Relating to Women Memorandum for the Heads of Executive... light of available statistical evidence. It will also assist the work of the nongovernmental...
Radar Derived Spatial Statistics of Summer Rain. Volume 2; Data Reduction and Analysis
NASA Technical Reports Server (NTRS)
Konrad, T. G.; Kropfli, R. A.
1975-01-01
Data reduction and analysis procedures are discussed along with the physical and statistical descriptors used. The statistical modeling techniques are outlined and examples of the derived statistical characterization of rain cells in terms of the several physical descriptors are presented. Recommendations concerning analyses which can be pursued using the data base collected during the experiment are included.
Comparison of Grammar-Based and Statistical Language Models Trained on the Same Data
NASA Technical Reports Server (NTRS)
Hockey, Beth Ann; Rfayner, Manny
2005-01-01
This paper presents a methodologically sound comparison of the performance of grammar-based (GLM) and statistical-based (SLM) recognizer architectures using data from the Clarissa procedure navigator domain. The Regulus open source packages make this possible with a method for constructing a grammar-based language model by training on a corpus. We construct grammar-based and statistical language models from the same corpus for comparison, and find that the grammar-based language models provide better performance in this domain. The best SLM version has a semantic error rate of 9.6%, while the best GLM version has an error rate of 6.0%. Part of this advantage is accounted for by the superior WER and Sentence Error Rate (SER) of the GLM (WER 7.42% versus 6.27%, and SER 12.41% versus 9.79%). The rest is most likely accounted for by the fact that the GLM architecture is able to use logical-form-based features, which permit tighter integration of recognition and semantic interpretation.
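The word error rate quoted above is conventionally computed as the word-level edit distance between the recognizer output and a reference transcript, divided by the number of reference words. The sketch below shows that standard definition; it is not taken from the paper's tooling, and the example utterances are invented.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


# Invented example: one substitution and one deletion in four words.
print(word_error_rate("go to step four", "go to stop"))
```

Sentence error rate is the simpler companion measure: the fraction of utterances whose hypothesis differs from the reference in any word.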
Effects of a Prior Virtual Experience on Students' Interpretations of Real Data
NASA Astrophysics Data System (ADS)
Chini, Jacquelyn J.; Carmichael, Adrian; Gire, Elizabeth; Rebello, N. Sanjay; Puntambekar, Sadhana
2010-10-01
Our previous work has shown that experimentation with virtual manipulatives supports students' conceptual learning about simple machines differently than experimentation with physical manipulatives [1]. This difference could be due to the "messiness" of physical data from factors such as dissipative effects and measurement uncertainty. In this study, we ask whether the prior experience of performing a virtual experiment affects how students interpret the data from a physical experiment. Students enrolled in a conceptual-based physics laboratory used a hypertext system to explore the science concepts related to simple machines and performed physical and virtual experiments to learn about pulleys and inclined planes. Approximately half of the students performed the physical experiments before the virtual experiments and the other half completed the virtual experiments first. We find that using virtual manipulatives before physical manipulatives may promote an interpretation of physical data that is more productive for conceptual learning.
Adaptive interpretation of gas well deliverability tests with generating data of the IPR curve
NASA Astrophysics Data System (ADS)
Sergeev, V. L.; Phuong, Nguyen T. H.; Krainov, A. I.
2017-01-01
The paper considers how to improve the accuracy of parameters estimated from gas well deliverability test data, shorten test time, and reduce gas emissions into the atmosphere. The aim of the research is to develop a method of adaptive interpretation of gas well deliverability tests that yields an IPR curve and uses a data-generation technique, which makes it possible to take additional a priori information into account, improve the accuracy of the estimated formation pressure and flow coefficients, and reduce test time. The research builds on previous theoretical and practical findings in gas well deliverability testing, systems analysis, system identification, function optimization and linear algebra. To test the method, the authors used field data from deliverability tests of two wells in the Urengoy gas and condensate field, Tyumen Oblast. The authors propose a method of adaptive interpretation of gas well deliverability tests that yields the IPR curve and can generate bottomhole pressure and flow rate data at different test stages. The method provides estimates of the formation pressure and flow coefficients that are optimal with respect to preassigned quality measures, and determines an adequate number of test stages during well testing. A case study of IPR curve data processing indicated that adaptive interpretation gives more accurate estimates of the formation pressure and flow coefficients and reduces the number of test stages.
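To make the estimated quantities concrete, the sketch below fits a formation pressure and two flow coefficients to staged test data. It rests on assumptions not stated in the abstract: the conventional two-term inflow law p_res^2 - p_wf^2 = a*q + b*q^2, and plain least squares in place of the authors' adaptive procedure with a priori information. All names, units and numbers are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_term_ipr(rates, bottomhole_pressures):
    """Least-squares fit of p_wf^2 = p_res^2 - a*q - b*q^2.

    The model is linear in (p_res^2, a, b), so the normal equations
    can be solved directly. Returns (p_res, a, b).
    """
    rows = [[1.0, -q, -q * q] for q in rates]
    y = [p * p for p in bottomhole_pressures]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    p_res_sq, a, b = solve_linear(xtx, xty)
    return math.sqrt(p_res_sq), a, b


# Synthetic four-stage test generated from p_res = 20, a = 10, b = 5
# (arbitrary consistent units), so the fit should recover those values.
q_stages = [1.0, 2.0, 3.0, 4.0]
p_wf = [math.sqrt(v) for v in (385.0, 360.0, 325.0, 280.0)]
print(fit_two_term_ipr(q_stages, p_wf))
```

With noisy field data and few stages, an unconstrained fit like this can be unstable, which is the kind of situation where additional a priori information becomes useful.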
Graphical arterial blood gas visualization tool supports rapid and accurate data interpretation.
Doig, Alexa K; Albert, Robert W; Syroid, Noah D; Moon, Shaun; Agutter, Jim A
2011-04-01
A visualization tool that integrates numeric information from an arterial blood gas report with novel graphics was designed for the purpose of promoting rapid and accurate interpretation of acid-base data. A study compared data interpretation performance when arterial blood gas results were presented in a traditional numerical list versus the graphical visualization tool. Critical-care nurses (n = 15) and nursing students (n = 15) were significantly more accurate identifying acid-base states and assessing trends in acid-base data when using the graphical visualization tool. Critical-care nurses and nursing students using traditional numerical data had an average accuracy of 69% and 74%, respectively. Using the visualization tool, average accuracy improved to 83% for critical-care nurses and 93% for nursing students. Analysis of response times demonstrated that the visualization tool might help nurses overcome the "speed/accuracy trade-off" during high-stress situations when rapid decisions must be rendered. Perceived mental workload was significantly reduced for nursing students when they used the graphical visualization tool. In this study, the effects of implementing the graphical visualization were greater for nursing students than for critical-care nurses, which may indicate that the experienced nurses needed more training and use of the new technology prior to testing to show similar gains. Results of the objective and subjective evaluations support the integration of this graphical visualization tool into clinical environments that require accurate and timely interpretation of arterial blood gas data.
A Novel Approach to Asynchronous MVP Data Interpretation Based on Elliptical-Vectors
NASA Astrophysics Data System (ADS)
Kruglyakov, M.; Trofimov, I.; Korotaev, S.; Shneyer, V.; Popova, I.; Orekhova, D.; Scshors, Y.; Zhdanov, M. S.
2014-12-01
We suggest a novel approach to asynchronous magnetic-variation profiling (MVP) data interpretation. The standard method in MVP is based on the interpretation of the coefficients of the linear relation between the vertical and horizontal components of the measured magnetic field. From a mathematical point of view, this pair of linear coefficients is not a vector, which leads to significant difficulties in asynchronous data interpretation. Our approach allows us to actually treat such a pair of complex numbers as a special vector called an ellipse-vector (EV). By choosing the particular definitions of complex length and direction, the basic relation of MVP can be considered as the dot product. This considerably simplifies the interpretation of asynchronous data. The EV is described by four real numbers: the values of major and minor semiaxes, the angular direction of the major semiaxis and the phase. The notation choice is motivated by historical reasons. It is important that different components of the EV have different sensitivity with respect to the field sources and the local heterogeneities. Namely, the value of the major semiaxis and the angular direction are mostly determined by the field source and the normal cross-section. On the other hand, the value of the minor semiaxis and the phase are responsive to local heterogeneities. Since the EV is the general form of a complex vector, the traditional Schmucker vectors can be explicitly expressed through its components. The proposed approach was successfully applied to the interpretation of the results of asynchronous measurements that had been obtained in the Arctic Ocean at the drift stations "North Pole" in 1962-1976.
Data mining and well logging interpretation: application to a conglomerate reservoir
NASA Astrophysics Data System (ADS)
Shi, Ning; Li, Hong-Qi; Luo, Wei-Ping
2015-06-01
Data mining is the process of extracting implicit but potentially useful information from incomplete, noisy, and fuzzy data. Data mining offers excellent nonlinear modeling and self-organized learning, and it can play a vital role in the interpretation of well logging data of complex reservoirs. We used data mining to identify the lithologies in a complex reservoir. The reservoir lithologies served as the classification task target and were identified using feature extraction, feature selection, and modeling of data streams. We used independent component analysis to extract information from well curves. We then used the branch-and-bound algorithm to look for the optimal feature subsets and eliminate redundant information. Finally, we used the C5.0 decision-tree algorithm to set up disaggregated models of the well logging curves. The modeling and actual logging data were in good agreement, showing the usefulness of data mining methods in complex reservoirs.
Presentation and interpretation of food intake data: factors affecting comparability across studies.
Faber, Mieke; Wenhold, Friede A M; Macintyre, Una E; Wentzel-Viljoen, Edelweiss; Steyn, Nelia P; Oldewage-Theron, Wilna H
2013-01-01
Non-uniform, unclear, or incomplete presentation of food intake data limits interpretation, usefulness, and comparisons across studies. In this contribution, we discuss factors affecting uniform reporting of food intake across studies. The amount of food eaten can be reported as mean portion size, number of servings or total amount of food consumed per day; the absolute intake value for the specific study depends on the denominator used because food intake data can be presented as per capita intake or for consumers only. To identify the foods mostly consumed, foods are reported and ranked according to total number of times consumed, number of consumers, total intake, or nutrient contribution by individual foods or food groups. Presentation of food intake data primarily depends on a study's aim; reported data thus often are not comparable across studies. Food intake data further depend on the dietary assessment methodology used and foods in the database consulted; and are influenced by the inherent limitations of all dietary assessments. Intake data can be presented as either single foods or as clearly defined food groups. Mixed dishes, reported as such or in terms of ingredients and items added during food preparation remain challenging. Comparable presentation of food consumption data is not always possible; presenting sufficient information will assist valid interpretation and optimal use of the presented data. A checklist was developed to strengthen the reporting of food intake data in science communication.
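The dependence of the reported intake on the denominator can be illustrated with a small sketch. The intake values and function names below are hypothetical and are not taken from the study; the point is only that the same records give different means per capita and for consumers only.

```python
def mean_intake_per_capita(intakes_g):
    """Mean intake over all participants, counting non-consumers as 0 g."""
    return sum(intakes_g) / len(intakes_g)


def mean_intake_consumers_only(intakes_g):
    """Mean intake over participants who reported eating the food at all."""
    consumers = [g for g in intakes_g if g > 0]
    return sum(consumers) / len(consumers) if consumers else 0.0


# Hypothetical daily intake (g) of one food item for five participants.
intakes = [0, 0, 100, 150, 50]

per_capita = mean_intake_per_capita(intakes)          # 300 g / 5 people
consumers_only = mean_intake_consumers_only(intakes)  # 300 g / 3 consumers
print(per_capita, consumers_only)
```

With these made-up numbers the per capita mean is 60 g and the consumers-only mean is 100 g, so two studies reporting "mean intake" of the same food are not comparable unless the denominator is stated.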
Contributions to Statistical Problems Related to Microarray Data
ERIC Educational Resources Information Center
Hong, Feng
2009-01-01
Microarray is a high throughput technology to measure gene expression. Analysis of microarray data brings many interesting and challenging problems. This thesis consists of three studies related to microarray data. First, we propose a Bayesian model for microarray data and use Bayes Factors to identify differentially expressed genes. Second, we…
AAMC Data Book. Statistical Information Related to Medical Education.
ERIC Educational Resources Information Center
Jolly, Paul, Ed.; Hudley, Dorothea M., Ed.
This 1994 version of an annual data book on United States medical education offers extensive data on 12 topics which are fundamental or most frequently requested. Data sources include the Association of American Medical Colleges, the Educational Commission for Foreign Medical Graduates, the National Institutes of Health, Health Care Financing…
Spatial Statistical Data Fusion for Remote Sensing Applications
NASA Technical Reports Server (NTRS)
Nguyen, Hai
2010-01-01
Data fusion is the process of combining information from heterogeneous sources into a single composite picture of the relevant process, such that the composite picture is generally more accurate and complete than that derived from any single source alone. Data collection is often incomplete, sparse, and yields incompatible information. Fusion techniques can make optimal use of such data. When investment in data collection is high, fusion gives the best return. Our study uses data from two satellites: (1) Multiangle Imaging SpectroRadiometer (MISR), (2) Moderate Resolution Imaging Spectroradiometer (MODIS).
NASA Astrophysics Data System (ADS)
Ivanova, A.; Lueth, S.
2015-12-01
Petrophysical investigations for CCS concern relationships between physical properties of rocks and geophysical observations for understanding behavior of injected CO2 in a geological formation. In turn, 4D seismic surveying is a proven tool for CO2 monitoring. At the Ketzin pilot site (Germany) 4D seismic data have been acquired by means of a baseline (pre-injection) survey in 2005 and monitor surveys in 2009 and 2012. At Ketzin CO2 was injected in supercritical state from 2008 to 2013 in a sandstone saline aquifer (Stuttgart Formation) at a depth of about 650 m. The 4D seismic data from Ketzin reflected a pronounced effect of this injection. Seismic forward modeling using results of petrophysical experiments on two core samples from the target reservoir confirmed that effects of the injected CO2 on the 4D seismic data are significant. The petrophysical data were used in that modeling in order to reflect changes due to the CO2 injection in acoustic parameters of the reservoir. These petrophysical data were further used for a successful quantitative interpretation of the 4D seismic data at Ketzin. Logs from a well (drilled in 2012) penetrating the reservoir are now available; they contain information about changes in the acoustic parameters of the reservoir due to the CO2 injection. These logs were used to estimate the impact of the petrophysical data on the qualitative and quantitative interpretation of the 4D seismic data at Ketzin. New synthetic seismograms were computed using the same software and the same wavelet as the old ones, with one difference: the changes in the input acoustic parameters were no longer derived from petrophysical experiments but were taken directly from the logs. As a result, the modelled changes due to the injection in the newly computed seismograms no longer include any effects of the petrophysical data. Key steps of the quantitative and qualitative interpretation of the 4D seismic
NASA Astrophysics Data System (ADS)
Yuan, Y.
2015-12-01
Boundary identification is a commonly required task in the interpretation of potential-field data, which has been widely used as a tool in exploration technologies for mineral resources. The main geological edges are fault lines and the borders of geological or rock bodies of different density, magnetic nature, and so on. Gravity gradient tensor data have been widely used in geophysical exploration because they carry a large amount of information and contain higher frequency signals than gravity data, which can be used to delineate small scale anomalies. However, combining multiple components of gradient tensor data to interpret gravity gradient tensor data is a challenge, and it requires developing new edge detectors to process the gravity gradient tensor data. In order to make use of the information in multiple components, we first define directional total horizontal derivatives and enhanced directional total horizontal derivatives and use them to define new edge detectors. In order to display the edges of anomalies of different amplitudes simultaneously, we present a normalization method. These methods have been tested on synthetic data to verify that the new methods can delineate the edges of different amplitude anomalies clearly and avoid introducing additional false edges when the data contain both positive and negative anomalies. Finally, we apply these methods to real full gravity gradient tensor data from St. Georges Bay, Canada, with good results.
Air pollutant interactions with vegetation: research needs in data acquisition and interpretation
Lindberg, S. E.; McLauglin, S. B.
1980-01-01
The objective of this discussion is to consider problems involved in the acquisition, interpretation, and application of data collected in studies of air pollutant interactions with the terrestrial environment. Emphasis will be placed on a critical evaluation of current deficiencies and future research needs by addressing the following questions: (1) which pollutants are either sufficiently toxic, pervasive, or persistent to warrant the expense of monitoring and effects research; (2) what are the interactions of multiple pollutants during deposition and how do these influence toxicity; (3) how do we collect, report, and interpret deposition and air quality data to ensure its maximum utility in assessment of potential regional environmental effects; (4) what processes do we study, and how are they measured to most efficiently describe the relationship between air quality dose and ultimate impacts on terrestrial ecosystems; and (5) how do we integrate site-specific studies into regional estimates of present and potential environmental degradation (or benefit).
Morse, V.C.; Johnson, J.H.; Crittenden, J.L.; Anderson, T.D.
1986-05-01
There are successes and failures in recording and interpreting a single seismic line across the South Owl Creek Mountain fault on the west flank of the Casper arch. Information obtained from this type of work should help explorationists who are exploring structurally complex areas. A depth cross section lacks a subthrust prospect, but is illustrated to show that the South Owl Creek Mountain fault is steeper with less apparent displacement than in areas to the north. This cross section is derived from two-dimensional seismic modeling, using data processing methods specifically for modeling. A flat horizon and balancing technique helps confirm model accuracy. High-quality data were acquired using specifically designed seismic field parameters. The authors concluded that the methodology used is valid, and an interactive modeling program in addition to cross-line control can improve seismic interpretations in structurally complex areas.
Interpretation and Research On Landuse Based On Landsat 7 ETM Plus Remote Sensing Data
NASA Astrophysics Data System (ADS)
Hong, Haoyuan; Xu, Chong; Liu, Ximing; Chen, Wei
2016-10-01
Land-use change is a very important environmental factor. For example, it is significantly related to natural disasters such as landslides. With the development of GIS and RS technology, the interpretation of land use has become easy and accurate. This paper uses Landsat 7 ETM+ data, combined with basic geographic information data for Xiushui County, Jiangxi Province, to carry out a remote sensing interpretation of land use in Xiushui County using the maximum likelihood classification method as well as the minimum distance method, the secession law, ISODATA, and so on. The overall accuracy evaluation shows that the maximum likelihood method is the method best suited to the region. The land-use results of this study may be provided to decision makers for land-use planning.
A wavelet-based statistical analysis of FMRI data: I. motivation and data distribution modeling.
Dinov, Ivo D; Boscardin, John W; Mega, Michael S; Sowell, Elizabeth L; Toga, Arthur W
2005-01-01
We propose a new method for statistical analysis of functional magnetic resonance imaging (fMRI) data. The discrete wavelet transformation is employed as a tool for efficient and robust signal representation. We use structural magnetic resonance imaging (MRI) and fMRI to empirically estimate the distribution of the wavelet coefficients of the data both across individuals and spatial locations. An anatomical subvolume probabilistic atlas is used to tessellate the structural and functional signals into smaller regions each of which is processed separately. A frequency-adaptive wavelet shrinkage scheme is employed to obtain essentially optimal estimations of the signals in the wavelet space. The empirical distributions of the signals on all the regions are computed in a compressed wavelet space. These are modeled by heavy-tail distributions because their histograms exhibit slower tail decay than the Gaussian. We discovered that the Cauchy, Bessel K Forms, and Pareto distributions provide the most accurate asymptotic models for the distribution of the wavelet coefficients of the data. Finally, we propose a new model for statistical analysis of functional MRI data using this atlas-based wavelet space representation. In the second part of our investigation, we will apply this technique to analyze a large fMRI dataset involving repeated presentation of sensory-motor response stimuli in young, elderly, and demented subjects.
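The idea of wavelet shrinkage mentioned above can be sketched with plain soft thresholding. This is a generic textbook operation shown for illustration only; it is not the authors' frequency-adaptive scheme, and the coefficient values, the threshold, and the function name are assumptions.

```python
def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`.

    Coefficients whose magnitude is at most `threshold` are set to zero;
    larger ones keep their sign and lose `threshold` in magnitude.
    """
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if c > 0 else -magnitude)
    return shrunk


# Hypothetical wavelet coefficients at one scale, with a hypothetical threshold.
coefficients = [3.0, -0.5, 0.2, -2.0]
denoised = soft_threshold(coefficients, threshold=1.0)
print(denoised)
```

A frequency-adaptive variant would presumably choose a different threshold for each scale or subband; how the thresholds are chosen is not stated in the abstract.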
Interpretation of Lidar and Satellite Data Sets Using a Global Photochemical Model
NASA Technical Reports Server (NTRS)
Zenker, Thomas; Chyba, Thomas
1999-01-01
A primary goal of the NASA Tropospheric Chemistry Program (TCP) is to "contribute substantially to scientific understanding of human impacts on the global troposphere". In order to analyze global or regional trends and factors of the troposphere chemistry, for example, its oxidation capacity or composition, a continuous global/regional data coverage as well as model simulations are needed. The Global Tropospheric Experiment (GTE), a major component of the TCP, provides data vital to these questions via aircraft measurement of key trace chemical species in various remote regions of the world. Another component in NASA's effort are satellite projects for exploration of tropospheric chemistry and dynamics. A unique data product is the Tropospheric Ozone Residual (TOR) utilizing global tropospheric ozone data. Another key research tool are simulation studies of atmospheric chemistry and dynamics for the theoretical understanding of the atmosphere, the extrapolation of observed trends, and for sensitivity studies assessing a changing anthropogenic impact to air chemistry and climate. In the context with model simulations, field data derived from satellites or (airborne) field missions are needed for two purposes: 1. To initialize and validate model simulations, and 2., to interpret field data by comparison to model simulation results in order to analyze global or regional trends and deviations from standard tropospheric chemistry and transport conditions as defined by the simulations. Currently, there is neither a sufficient global data coverage available nor are existing well established global circulation models. The NASA LARC CTM model is currently not yet in a state to accomplish a sufficient tropospheric chemistry simulation, so that the current research under this cooperative agreement focuses on utilizing field data products for direct interpretation. They will be also available for model testing and a later interpretation with a finally utilized model.
Czaplewski, Raymond L
2015-09-17
Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of study variables and auxiliary sensor variables. A National Forest Inventory (NFI) illustrates application within an official statistics program. Practical recommendations regarding remote sensing and statistical issues are offered. This algorithm has the potential to increase the value of synoptic sensor data for statistical monitoring of large geographic areas.
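For readers unfamiliar with the estimator, the core Kalman filter measurement update can be sketched in the scalar case. This is the standard textbook update, not the numerically robust multivariate algorithm developed in the paper; the variable names and the numbers are illustrative assumptions.

```python
def kalman_update(prior_mean, prior_var, obs, obs_var):
    """Combine a prior estimate with one noisy observation (scalar case).

    Returns the posterior mean and variance.
    """
    gain = prior_var / (prior_var + obs_var)          # Kalman gain in [0, 1]
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


# Hypothetical example: a prior estimate of 100 (variance 4) is combined
# with a sensor-based observation of 110 (variance 4).
mean, var = kalman_update(prior_mean=100.0, prior_var=4.0, obs=110.0, obs_var=4.0)
print(mean, var)
```

Because the prior and the observation are equally uncertain here, the gain is 0.5, the updated estimate lies halfway between them, and the variance is halved.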
Bias and Sensitivity in the Placement of Fossil Taxa Resulting from Interpretations of Missing Data
Sansom, Robert S.
2015-01-01
The utility of fossils in evolutionary contexts is dependent on their accurate placement in phylogenetic frameworks, yet intrinsic and widespread missing data make this problematic. The complex taphonomic processes occurring during fossilization can make it difficult to distinguish absence from non-preservation, especially in the case of exceptionally preserved soft-tissue fossils: is a particular morphological character (e.g., appendage, tentacle, or nerve) missing from a fossil because it was never there (phylogenetic absence), or just happened to not be preserved (taphonomic loss)? Missing data have not been tested in the context of interpretation of non-present anatomy nor in the context of directional shifts and biases in affinity. Here, complete taxa, both simulated and empirical, are subjected to data loss through the replacement of present entries (1s) with either missing (?s) or absent (0s) entries. Both cause taxa to drift down trees, from their original position, toward the root. Absolute thresholds at which downshift is significant are extremely low for introduced absences (two entries replaced, 6% of present characters). The opposite threshold in empirical fossil taxa is also found to be low; two absent entries replaced with presences causes fossil taxa to drift up trees. As such, only a few instances of non-preserved characters interpreted as absences will cause fossil organisms to be erroneously interpreted as more primitive than they were in life. This observed sensitivity to coding non-present morphology presents a problem for all evolutionary studies that attempt to use fossils to reconstruct rates of evolution or unlock sequences of morphological change. Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist. Absent characters therefore require explicit justification and taphonomic
Moessbauer Spectroscopy on the Martian Surface: Constraints on Interpretation of MER Data
NASA Technical Reports Server (NTRS)
Dyar, M. D.; Schaefer, M. W.
2003-01-01
Moessbauer spectrometers will be used on martian landers and rovers to identify and quantify relative amounts of Fe-bearing minerals, as well as to determine their Fe(3+)/Fe(2+) ratios, allowing more realistic modeling of martian mineralogy and evolution. However, derivation of mineral modes, Fe(3+)/Fe(2+) ratios, and phase identification via Moessbauer spectroscopy (MS) does have limitations. We discuss here the exciting potential of MS for remote planetary exploration, as well as constraints on interpretation of remote Moessbauer data.
Bias and sensitivity in the placement of fossil taxa resulting from interpretations of missing data.
Sansom, Robert S
2015-03-01
The utility of fossils in evolutionary contexts is dependent on their accurate placement in phylogenetic frameworks, yet intrinsic and widespread missing data make this problematic. The complex taphonomic processes occurring during fossilization can make it difficult to distinguish absence from non-preservation, especially in the case of exceptionally preserved soft-tissue fossils: is a particular morphological character (e.g., appendage, tentacle, or nerve) missing from a fossil because it was never there (phylogenetic absence), or just happened to not be preserved (taphonomic loss)? Missing data have not been tested in the context of interpretation of non-present anatomy nor in the context of directional shifts and biases in affinity. Here, complete taxa, both simulated and empirical, are subjected to data loss through the replacement of present entries (1s) with either missing (?s) or absent (0s) entries. Both cause taxa to drift down trees, from their original position, toward the root. Absolute thresholds at which downshift is significant are extremely low for introduced absences (two entries replaced, 6% of present characters). The opposite threshold in empirical fossil taxa is also found to be low; two absent entries replaced with presences causes fossil taxa to drift up trees. As such, only a few instances of non-preserved characters interpreted as absences will cause fossil organisms to be erroneously interpreted as more primitive than they were in life. This observed sensitivity to coding non-present morphology presents a problem for all evolutionary studies that attempt to use fossils to reconstruct rates of evolution or unlock sequences of morphological change. Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist. Absent characters therefore require explicit justification and taphonomic
Metabox: A Toolbox for Metabolomic Data Analysis, Interpretation and Integrative Exploration.
Wanichthanarak, Kwanjeera; Fan, Sili; Grapov, Dmitry; Barupal, Dinesh Kumar; Fiehn, Oliver
2017-01-01
Similar to genomic and proteomic platforms, metabolomic data acquisition and analysis is becoming a routine approach for investigating biological systems. However, computational approaches for metabolomic data analysis and integration are still maturing. Metabox is a bioinformatics toolbox for deep phenotyping analytics that combines data processing, statistical analysis, functional analysis and integrative exploration of metabolomic data within proteomic and transcriptomic contexts. With the number of options provided in each analysis module, it also supports data analysis of other 'omic' families. The toolbox is an R-based web application, and it is freely available at http://kwanjeeraw.github.io/metabox/ under the GPL-3 license.
Metabox: A Toolbox for Metabolomic Data Analysis, Interpretation and Integrative Exploration
Grapov, Dmitry; Barupal, Dinesh Kumar
2017-01-01
Similar to genomic and proteomic platforms, metabolomic data acquisition and analysis is becoming a routine approach for investigating biological systems. However, computational approaches for metabolomic data analysis and integration are still maturing. Metabox is a bioinformatics toolbox for deep phenotyping analytics that combines data processing, statistical analysis, functional analysis and integrative exploration of metabolomic data within proteomic and transcriptomic contexts. With the number of options provided in each analysis module, it also supports data analysis of other ‘omic’ families. The toolbox is an R-based web application, and it is freely available at http://kwanjeeraw.github.io/metabox/ under the GPL-3 license. PMID:28141874
Method of identifying clusters representing statistical dependencies in multivariate data
NASA Technical Reports Server (NTRS)
Borucki, W. J.; Card, D. H.; Lyle, G. C.
1975-01-01
Approach is first to cluster and then to compute spatial boundaries for resulting clusters. Next step is to compute, from set of Monte Carlo samples obtained from scrambled data, estimates of probabilities of obtaining at least as many points within boundaries as were actually observed in original data.
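The Monte Carlo step can be sketched as follows, under several assumptions that the abstract does not specify: the data are bivariate, a cluster boundary is an axis-aligned box, and "scrambling" means permuting one variable independently of the other so that the marginal distributions are kept while dependencies are destroyed. All names and data are hypothetical.

```python
import random


def count_in_box(xs, ys, box):
    """Count points (x, y) that fall inside the box (x_lo, x_hi, y_lo, y_hi)."""
    x_lo, x_hi, y_lo, y_hi = box
    return sum(1 for x, y in zip(xs, ys)
               if x_lo <= x <= x_hi and y_lo <= y <= y_hi)


def scramble_probability(xs, ys, box, n_samples=2000, seed=0):
    """Estimate the probability that scrambled data put at least as many
    points inside the box as were observed in the original data."""
    rng = random.Random(seed)
    observed = count_in_box(xs, ys, box)
    ys_scrambled = list(ys)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(ys_scrambled)  # break the pairing between x and y
        if count_in_box(xs, ys_scrambled, box) >= observed:
            hits += 1
    return observed, hits / n_samples


# Hypothetical, strongly dependent data: y equals x.
xs = list(range(20))
ys = list(range(20))
observed, prob = scramble_probability(xs, ys, box=(0, 4, 0, 4))
print(observed, prob)
```

A small estimated probability suggests that the cluster reflects a statistical dependency rather than a chance arrangement of the marginals.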
Pomeau, Yves; Louët, Sabine
2016-06-01
During the StatPhys Conference on 20th July 2016 in Lyon, France, Yves Pomeau and Daan Frenkel will be awarded the most important prize in the field of Statistical Mechanics: the 2016 Boltzmann Medal, named after the Austrian physicist and philosopher Ludwig Boltzmann. The award recognises Pomeau's key contributions to the Statistical Physics of non-equilibrium phenomena in general. And, in particular, for developing our modern understanding of fluid mechanics, instabilities, pattern formation and chaos. He is recognised as an outstanding theorist bridging disciplines from applied mathematics to statistical physics with a profound impact on the neighbouring fields of turbulence and mechanics. In the article Sabine Louët interviews Pomeau, who is an Editor for the European Physical Journal Special Topics. He shares his views and tells how he experienced the rise of Statistical Mechanics in the past few decades. He also touches upon the need to provide funding to people who have the rare ability to discover new things and ideas, and not just those who are good at filling in grant application forms.
Avalanche statistics from data with low time resolution
NASA Astrophysics Data System (ADS)
LeBlanc, Michael; Nawano, Aya; Wright, Wendelin J.; Gu, Xiaojun; Uhl, J. T.; Dahmen, Karin A.
2016-11-01
Extracting avalanche distributions from experimental microplasticity data can be hampered by limited time resolution. We compute the effects of low time resolution on avalanche size distributions and give quantitative criteria for diagnosing and circumventing problems associated with low time resolution. We show that traditional analysis of data obtained at low acquisition rates can lead to avalanche size distributions with incorrect power-law exponents or no power-law scaling at all. Furthermore, we demonstrate that it can lead to apparent data collapses with incorrect power-law and cutoff exponents. We propose new methods to analyze low-resolution stress-time series that can recover the size distribution of the underlying avalanches even when the resolution is so low that naive analysis methods give incorrect results. We test these methods on both downsampled simulation data from a simple model and downsampled bulk metallic glass compression data and find that the methods recover the correct critical exponents.
Statistical modeling for visualization evaluation through data fusion.
Chen, Xiaoyu; Jin, Ran
2017-01-19
There is a high demand of data visualization providing insights to users in various applications. However, a consistent, online visualization evaluation method to quantify mental workload or user preference is lacking, which leads to an inefficient visualization and user interface design process. Recently, the advancement of interactive and sensing technologies makes the electroencephalogram (EEG) signals, eye movements as well as visualization logs available in user-centered evaluation. This paper proposes a data fusion model and the application procedure for quantitative and online visualization evaluation. 15 participants joined the study based on three different visualization designs. The results provide a regularized regression model which can accurately predict the user's evaluation of task complexity, and indicate the significance of all three types of sensing data sets for visualization evaluation. This model can be widely applied to data visualization evaluation, and other user-centered designs evaluation and data analysis in human factors and ergonomics.
NASA Astrophysics Data System (ADS)
Patton, E. W.; Pinheiro, P.; McGuinness, D. L.
2014-12-01
We will describe the benefits we realized using semantic technologies to address the often challenging and resource intensive task of ontology alignment in service of data integration. Ontology alignment became relatively simple as we reused our existing semantic data integration framework, SemantEco. We work in the context of the Jefferson Project (JP), an effort to monitor and predict the health of Lake George in NY by deploying a large-scale sensor network in the lake, and analyzing the high-resolution sensor data. SemantEco is an open-source framework for building semantically-aware applications to assist users, particularly non-experts, in exploration and interpretation of integrated scientific data. SemantEco applications are composed of a set of modules that incorporate new datasets, extend the semantic capabilities of the system to integrate and reason about data, and provide facets for extending or controlling semantic queries. Whereas earlier SemantEco work focused on integration of water, air, and species data from government sources, we focus on redeploying it to provide a provenance-aware, semantic query and interpretation interface for JP's sensor data. By employing a minor alignment between SemantEco's ontology and the Human-Aware Sensor Network Ontology used to model the JP's sensor deployments, we were able to bring SemantEco's capabilities to bear on the JP sensor data and metadata. This alignment enabled SemantEco to perform the following tasks: (1) select JP datasets related to water quality; (2) understand how the JP's notion of water quality relates to water quality concepts in previous work; and (3) reuse existing SemantEco interactive data facets, e.g. maps and time series visualizations, and modules, e.g. the regulation module that interprets water quality data through the lens of various federal and state regulations. Semantic technologies, both as the engine driving SemantEco and the means of modeling the JP data, enabled us to rapidly
Statistical treatment of CO2 data records. Technical memo
Elliott, W.P.
1989-05-01
The report describes the procedures used by NOAA/GMCC to select background hourly average CO2 data from Mauna Loa Observatory. Selection involved three steps: a preliminary selection based on the within-hour variability of the CO2 concentration, determined by checking the analog output of the CO2 analyzer on a strip chart recorder; an hour-to-hour concentration difference test that rejects data which change by more than 0.25 ppm from one hour to the next; and a selection based on residuals from a spline fit. Examples are shown for the 1985 data, with emphasis on January and August.
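The second selection step described above can be illustrated with a minimal Python sketch. It is not the NOAA/GMCC code: the function name `hour_to_hour_select`, the sample values, and in particular the choice to reject both hours bounding a jump are assumptions made for illustration, since the abstract does not say which hour of the pair is discarded.

```python
def hour_to_hour_select(hourly_ppm, max_step=0.25):
    """Return one boolean per hourly mean: True where the value is retained.

    max_step is the 0.25 ppm threshold quoted in the abstract.
    """
    keep = [True] * len(hourly_ppm)
    for i in range(1, len(hourly_ppm)):
        if abs(hourly_ppm[i] - hourly_ppm[i - 1]) > max_step:
            # Assumption: both hours bounding the jump are rejected.
            keep[i - 1] = False
            keep[i] = False
    return keep


# Hypothetical hourly averages (ppm) with one 0.75 ppm jump.
values = [345.10, 345.15, 345.90, 345.95, 346.00]
flags = hour_to_hour_select(values)
print(flags)
```

In this sketch the two hours on either side of the jump are flagged, and the remaining hours would then pass on to the spline-residual step.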
Statistical analysis of general aviation VG-VGH data
NASA Technical Reports Server (NTRS)
Clay, L. E.; Dickey, R. L.; Moran, M. S.; Payauys, K. W.; Severyn, T. P.
1974-01-01
To represent the loads spectra of general aviation aircraft operating in the Continental United States, VG and VGH data collected since 1963 in eight operational categories were processed and analyzed. Adequacy of data sample and current operational categories, and parameter distributions required for valid data extrapolation were studied along with envelopes of equal probability of exceeding the normal load factor (n_z) versus airspeed for gust and maneuver loads and the probability of exceeding current design maneuver, gust, and landing impact n_z limits. The significant findings are included.
VanderHart, D L; McFadden, G B
1996-08-01
Proton spin diffusion data yield morphological information over dimensions covering approximately the 2-50 nm range. In this article, the interpretation of such data for polymers is emphasized, recognizing that the mathematical framework for much of this interpretation already exists in the literature. Practical issues are considered: for example, a useful scaling of plotted data is suggested, key attributes of the data are identified, and ambiguities in the mapping of data into morphological models are spelled out. Discussion is limited to two-phase systems, where it is assumed that, by employing multiple-pulse methods, polarization gradients can be generated whose spatial sharpness is limited solely by the morphological definition of the interfaces. Interpretation of data in terms of morphology and stoichiometry is emphasized, where stoichiometric issues pertain only to chemically heterogeneous systems. Extraction of stoichiometric information from spin diffusion data is not commonly attempted; the discussion included herein allows for the possibility that the composition of phases may be chemically mixed. Methods for generating gradients are discussed only briefly. A standardized spin diffusion plot is proposed and the initial slope of this plot is focussed on for providing information about morphology and stoichiometry. Ambiguities of interpretation considered include the dimensionality of the deduced morphology and, for systems with chemical heterogeneity, the uniqueness of the compositional characterization of each phase. In addition, finite difference methods are used to simulate entire spin diffusion curves for idealized lamellar and hexagonal rod/matrix morphologies. Comparisons of these curves show that distinguishing 1-D and 2-D morphologies on the basis of experimental data is unlikely to be successful over the range of stoichiometries where such morphologies are expected. Several examples of spin diffusion data are presented. Brief treatments of the
Tang, Qi-Yi; Zhang, Chuan-Xi
2013-04-01
A comprehensive but simple-to-use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. This program runs on standard Windows computers. Many of the functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology.
ERIC Educational Resources Information Center
Raker, Jeffrey R.; Holme, Thomas A.
2014-01-01
A cluster analysis was conducted with a set of survey data on chemistry faculty familiarity with 13 assessment terms. Cluster groupings suggest a high, middle, and low overall familiarity with the terminology and an independent high and low familiarity with terms related to fundamental statistics. The six resultant clusters were found to be…
NASA Astrophysics Data System (ADS)
Congedo, Marco; Barachant, Alexandre
2015-01-01
Currently the Riemannian geometry of symmetric positive definite (SPD) matrices is gaining momentum as a powerful tool in a wide range of engineering applications such as image, radar and biomedical data signal processing. If the data are not natively represented in the form of SPD matrices, typically we may summarize them in such form by estimating covariance matrices of the data. However, once we manipulate such covariance matrices on the Riemannian manifold we lose the representation in the original data space. For instance, we can evaluate the geometric mean of a set of covariance matrices, but not the geometric mean of the data generating the covariance matrices, the space of interest in which the geometric mean can be interpreted. As a consequence, Riemannian information geometry is often perceived by non-experts as a "black-box" tool and this perception prevents a wider adoption in the scientific community. Here we show that we can overcome this limitation by constructing a special form of SPD matrix embedding both the covariance structure of the data and the data itself. Incidentally, whenever the original data can be represented in the form of a generic data matrix (not even square), this special SPD matrix enables an exhaustive and unique description of the data up to second-order statistics. This is achieved by embedding the covariance structure of both the rows and columns of the data matrix, naturally allowing a wide range of possible applications and taking us beyond a mere interpretability issue. We demonstrate the method by manipulating satellite images (pansharpening) and event-related potentials (ERPs) of an electroencephalography brain-computer interface (BCI) study. The first example illustrates the effect of moving along geodesics in the original data space and the second provides a novel estimation of ERP average (geometric mean), showing that, in contrast to the usual arithmetic mean, this estimation is robust to outliers. In
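For readers unfamiliar with the geometric mean of SPD matrices mentioned above, the following Python sketch computes it for the two-matrix case using the standard closed form G = A^(1/2) (A^(-1/2) B A^(-1/2))^(1/2) A^(1/2). This is only the textbook definition, not the embedding proposed in the paper; the helper names `spd_power` and `geometric_mean` and the example matrices are hypothetical.

```python
import numpy as np


def spd_power(m, p):
    """Raise a symmetric positive definite matrix to a real power p."""
    w, v = np.linalg.eigh(m)      # eigenvalues w > 0, orthonormal eigenvectors v
    return (v * w**p) @ v.T       # v @ diag(w**p) @ v.T


def geometric_mean(a, b):
    """Geometric mean of two SPD matrices (midpoint of the geodesic joining them)."""
    a_half = spd_power(a, 0.5)
    a_inv_half = spd_power(a, -0.5)
    middle = spd_power(a_inv_half @ b @ a_inv_half, 0.5)
    return a_half @ middle @ a_half


# Hypothetical 2x2 covariance matrices.
A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[3.0, 0.0], [0.0, 1.0]])
G = geometric_mean(A, B)
print(G)
```

For commuting matrices this reduces to the elementwise picture one would expect (for diagonal matrices, the square root of the product of corresponding entries), and in general G satisfies G A^(-1) G = B.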
Statistical Models for Nonexperimental Data: A Comment on Freedman.
ERIC Educational Resources Information Center
Fox, John
1987-01-01
Examines D. A. Freedman's criticism of path analysis, agreeing with Freedman's criticism of its application to nonexperimental data in the social sciences. Argues that Freedman's overall conclusions, however, are too pessimistic. (RB)
An interpretation model of GPR point data in tunnel geological prediction
NASA Astrophysics Data System (ADS)
He, Yu-yao; Li, Bao-qi; Guo, Yuan-shu; Wang, Teng-na; Zhu, Ya
2017-02-01
GPR (Ground Penetrating Radar) point data play an essential role in tunnel geological prediction. However, little research has addressed GPR point data, and existing results do not meet the practical requirements of engineering projects. In this paper, a GPR point data interpretation model based on the WD (Wigner distribution) and a deep CNN (convolutional neural network) is proposed. First, the GPR point data are transformed by the WD to obtain a map of the joint time-frequency distribution. Second, the joint distribution maps are classified by the deep CNN; in parallel, the approximate location of the geological target is determined by observing the time-frequency map. Finally, the GPR point data are interpreted according to the classification results and the position information from the map. The simulation results show that the classification accuracy on the test dataset (comprising 1200 GPR point data) is 91.83% at the 200th iteration. Our model offers high accuracy and fast training, and can provide a scientific basis for developing tunnel construction and excavation plans.
Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 2.1.
Deutsch, Eric W; Overall, Christopher M; Van Eyk, Jennifer E; Baker, Mark S; Paik, Young-Ki; Weintraub, Susan T; Lane, Lydie; Martens, Lennart; Vandenbrouck, Yves; Kusebauch, Ulrike; Hancock, William S; Hermjakob, Henning; Aebersold, Ruedi; Moritz, Robert L; Omenn, Gilbert S
2016-11-04
Every data-rich community research effort requires a clear plan for ensuring the quality of the data interpretation and comparability of analyses. To address this need within the Human Proteome Project (HPP) of the Human Proteome Organization (HUPO), we have developed through broad consultation a set of mass spectrometry data interpretation guidelines that should be applied to all HPP data contributions. For submission of manuscripts reporting HPP protein identification results, the guidelines are presented as a one-page checklist containing 15 essential points followed by two pages of expanded description of each. Here we present an overview of the guidelines and provide an in-depth description of each of the 15 elements to facilitate understanding of the intentions and rationale behind the guidelines, for both authors and reviewers. Broadly, these guidelines provide specific directions regarding how HPP data are to be submitted to mass spectrometry data repositories, how error analysis should be presented, and how detection of novel proteins should be supported with additional confirmatory evidence. These guidelines, developed by the HPP community, are presented to the broader scientific community for further discussion.
Correlation imaging method based on local wavenumber for interpreting magnetic data
NASA Astrophysics Data System (ADS)
Ma, Guoqing; Liu, Cai; Xu, Jiashu; Meng, Qingfa
2017-03-01
Depth estimation is a common task in the interpretation of magnetic data, and the local wavenumber is an effective tool for it. However, computing the source depth with this method requires the structural index of the causative source, which is hard to obtain for an unexplored area. In this paper, we propose a correlation imaging method for interpreting magnetic data. It estimates the source location from the correlation coefficient between the local wavenumber of the real magnetic data and the transformed local wavenumber of synthetic magnetic data generated by an assumed source; it requires no a priori information about the source and no matrix solution. The computation proceeds as follows: first, we assume that the causative sources are distributed regularly on a rectangular grid; then, for each assumed source, we compute the correlation coefficient between the local wavenumber of the real data and the local wavenumber of the anomaly generated by that source. The correlation coefficient reaches its maximum when the location parameters of the assumed source agree with the true location of the real source. Tests on synthetic data show that the method recovers the location of the magnetic source effectively and correctly, and is insensitive to magnetization direction and noise. The method is also applied to measured magnetic data to obtain the location parameters of the source.
ERIC Educational Resources Information Center
Savalei, Victoria
2010-01-01
Incomplete nonnormal data are common occurrences in applied research. Although these 2 problems are often dealt with separately by methodologists, they often cooccur. Very little has been written about statistics appropriate for evaluating models with such data. This article extends several existing statistics for complete nonnormal data to…
ERIC Educational Resources Information Center
Carter, Jackie; Noble, Susan; Russell, Andrew; Swanson, Eric
2011-01-01
Increasing volumes of statistical data are being made available on the open web, including from the World Bank. This "data deluge" provides both opportunities and challenges. Good use of these data requires statistical literacy. This paper presents results from a project that set out to better understand how socioeconomic secondary data…
Statistical Machine Learning for Structured and High Dimensional Data
2014-09-17
which generalizes Stein’s unbiased risk estimate (SURE) to Wishart distributions. The resulting estimator is free of any tuning parameters, and enjoys...theory. We have analyzed the case of the normal means within a Sobolev ellipsoid, which is a standard setup in nonparametric regression. Our results ...data analysis problems. In particular, we have been working with data from the Kepler telescope for finding exoplanets orbiting distant stars. The
Statistical Inference: The Big Picture.
Kass, Robert E
2011-02-01
Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labelled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mis-characterize the process of statistical inference and I propose an alternative "big picture" depiction.
Student Reasoning from Data Tables: Data Interpretation in Light of Student Ability and Prior Belief
NASA Astrophysics Data System (ADS)
Bogdan, Abigail Marie
Here I present my work studying introductory physics students' proficiency with the control of variables strategy to evaluate simple data tables. In this research, a primary goal was to identify and to describe the reasoning strategies that students use preferentially when evaluating simple data tables where the control of variables strategy is the normative evaluation strategy. In addition, I aimed to identify and describe the factors that affect students' reasoning strategies when analyzing these simple data tables. In a series of experiments, I tested 1,360 introductory physics students, giving them simple tables of experimental data to analyze. Generally, each of the experiments that I conducted had two conditions. In both of these conditions, the data filling the tables were identical; however, in the first condition, the data table was presented in a physical context and students were given a short pre-test to measure their beliefs about the context. In the second condition, the table was given in a more generic context. This was repeated with multiple data tables and physical contexts. In addition to the data table task, students were given several measures of cognitive ability. By using students' answers on the pre-test about physical context, I was able to measure whether or not each student's prior beliefs were consistent with the relationships shown in the data tables. Across all the experiments conducted here, I found that those students whose prior beliefs were consistent with the data were over three times more likely to draw a valid inference from the table than students whose prior beliefs were inconsistent with the data. By further analyzing students' responses, I found evidence that this difference in performance could be accounted for by the presence of a belief bias. Students tended to cite data in suboptimal ways, frequently treating their own theories as a source of evidence to be supplemented by or illustrated with examples from the data. Because of
ROOT — A C++ framework for petabyte data storage, statistical analysis and visualization
NASA Astrophysics Data System (ADS)
Antcheva, I.; Ballintijn, M.; Bellenot, B.; Biskup, M.; Brun, R.; Buncic, N.; Canal, Ph.; Casadei, D.; Couet, O.; Fine, V.; Franco, L.; Ganis, G.; Gheata, A.; Maline, D. Gonzalez; Goto, M.; Iwaszkiewicz, J.; Kreshuk, A.; Segura, D. Marcos; Maunder, R.; Moneta, L.; Naumann, A.; Offermann, E.; Onuchin, V.; Panacek, S.; Rademakers, F.; Russo, P.; Tadel, M.
2009-12-01
ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web, or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. Central to these analysis tools are the histogram classes, which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks (e.g. data mining in HEP) by using PROOF, which will take care of optimally
Reasoning from Data: How Students Collect and Interpret Data in Science Investigations
ERIC Educational Resources Information Center
Kanari, Zoe; Millar, Robin
2004-01-01
This study explored the understandings of data and measurement that school students draw upon, and the ways that they reason from data, when carrying out a practical science inquiry task. The two practical tasks used in the study each involved investigations of the relationships between two independent variables (IVs) and a dependent variable…
NASA Technical Reports Server (NTRS)
Dunn, A. R.
1975-01-01
Computer techniques for data analysis of sunspot observations are presented. Photographic spectra were converted to digital form and analyzed. Methods of determining magnetic field strengths, i.e., the Zeeman effect, are discussed. Errors originating with telescope equipment and the magnetograph are treated. Flow charts of test programs and procedures of the data analysis are shown.
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2011 CFR
2011-10-01
... 49 Transportation 8 2011-10-01 2011-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a... Schedule G is to develop selected property, labor and operational data for use in evaluating the...