A statistical model for interpreting computerized dynamic posturography data
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
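One way to write down the mixed discrete-continuous structure described in this abstract is sketched below. The symbols are assumptions introduced here for illustration and are not taken from the paper: Y* is the latent ES with density f(y; θ), π(y; γ) is the conditional probability of a fall given Y* = y, and Y is the recorded score. The sketch also treats trials as independent, which the paper's quasi-maximum-likelihood approach may not do.

```latex
% Illustrative notation only (not from the paper):
%   Y^*            latent ES, density f(y;\theta)
%   \pi(y;\gamma)  P(fall | Y^* = y)
%   Y              recorded score: 0 on a fall, Y^* otherwise
\begin{align}
  P(Y = 0) &= \int \pi(y;\gamma)\, f(y;\theta)\, dy
     && \text{(fall: latent score unobserved)} \\
  g(y) &= \bigl\{1 - \pi(y;\gamma)\bigr\}\, f(y;\theta)
     && \text{(no fall: score } y \text{ observed)} \\
  L(\theta,\gamma) &= \prod_{i:\ \text{fall}} P(Y_i = 0)
     \prod_{i:\ \text{no fall}} g(y_i)
     && \text{(assuming independent trials)}
\end{align}
```

The point mass at zero and the continuous part g(y) together integrate to one, which is why a standard regression on the recorded scores alone would be misspecified.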
Misuse of statistics in the interpretation of data on low-level radiation
Hamilton, L.D.
1982-01-01
Four misuses of statistics in the interpretation of data of low-level radiation are reviewed: (1) post-hoc analysis and aggregation of data leading to faulty conclusions in the reanalysis of genetic effects of the atomic bomb, and premature conclusions on the Portsmouth Naval Shipyard data; (2) inappropriate adjustment for age and ignoring differences between urban and rural areas leading to potentially spurious increase in incidence of cancer at Rocky Flats; (3) hazard of summary statistics based on ill-conditioned individual rates leading to spurious association between childhood leukemia and fallout in Utah; and (4) the danger of prematurely published preliminary work with inadequate consideration of epidemiological problems - censored data - leading to inappropriate conclusions, needless alarm at the Portsmouth Naval Shipyard, and diversion of scarce research funds.
Phoenix, S.L.; Wu, E.M.
1983-03-01
This paper presents some new data on the strength and stress-rupture of Kevlar-49 fibers, fiber/epoxy strands and pressure vessels, and consolidated data obtained at LLNL over the past 10 years. These data are interpreted by using recent theoretical results from a micromechanical model of the statistical failure process, thereby gaining understanding of the roles of the epoxy matrix and ultraviolet radiation on long-term lifetime.
Tasker, Gary D.; Granato, Gregory E.
2000-01-01
Decision makers need viable methods for the interpretation of local, regional, and national-highway runoff and urban-stormwater data including flows, concentrations and loads of chemical constituents and sediment, potential effects on receiving waters, and the potential effectiveness of various best management practices (BMPs). Valid (useful for intended purposes), current, and technically defensible stormwater-runoff models are needed to interpret data collected in field studies, to support existing highway and urban-runoff planning processes, to meet National Pollutant Discharge Elimination System (NPDES) requirements, and to provide methods for computation of Total Maximum Daily Loads (TMDLs) systematically and economically. Historically, conceptual, simulation, empirical, and statistical models of varying levels of detail, complexity, and uncertainty have been used to meet various data-quality objectives in the decision-making processes necessary for the planning, design, construction, and maintenance of highways and for other land-use applications. Water-quality simulation models attempt a detailed representation of the physical processes and mechanisms at a given site. Empirical and statistical regional water-quality assessment models provide a more general picture of water quality or changes in water quality over a region. All these modeling techniques share one common aspect: their predictive ability is poor without suitable site-specific data for calibration. To properly apply the correct model, one must understand the classification of variables, the unique characteristics of water-resources data, and the concept of population structure and analysis. Classifying variables being used to analyze data may determine which statistical methods are appropriate for data analysis. An understanding of the characteristics of water-resources data is necessary to evaluate the applicability of different statistical methods, to interpret the results of these techniques
Statistics Translated: A Step-by-Step Guide to Analyzing and Interpreting Data
ERIC Educational Resources Information Center
Terrell, Steven R.
2012-01-01
Written in a humorous and encouraging style, this text shows how the most common statistical tools can be used to answer interesting real-world questions, presented as mysteries to be solved. Engaging research examples lead the reader through a series of six steps, from identifying a researchable problem to stating a hypothesis, identifying…
Statistical weld process monitoring with expert interpretation
Cook, G.E.; Barnett, R.J.; Strauss, A.M.; Thompson, F.M. Jr.
1996-12-31
A statistical weld process monitoring system is described. Using data of voltage, current, wire feed speed, gas flow rate, travel speed, and elapsed arc time collected while welding, the welding statistical process control (SPC) tool provides weld process quality control by implementing techniques of data trending analysis, tolerance analysis, and sequential analysis. For purposes of quality control, the control limits required for acceptance are specified in the weld procedure acceptance specifications. The control charts then provide quality assurance documentation for each weld. The statistical data trending analysis performed by the SPC program is not only valuable as a quality assurance monitoring and documentation system, it is also valuable in providing diagnostic assistance in troubleshooting equipment and material problems. Possible equipment/process problems are identified and matched with features of the SPC control charts. To aid in interpreting the voluminous statistical output generated by the SPC system, a large number of If-Then rules have been devised for providing computer-based expert advice for pinpointing problems based on out-of-limit variations of the control charts. The paper describes the SPC monitoring tool and the rule-based expert interpreter that has been developed for relating control chart trends to equipment/process problems.
The Malpractice of Statistical Interpretation
ERIC Educational Resources Information Center
Fraas, John W.; Newman, Isadore
1978-01-01
Problems associated with the use of gain scores, analysis of covariance, multicollinearity, part and partial correlation, and the lack of rectilinearity in regression are discussed. Particular attention is paid to the misuse of statistical techniques. (JKS)
Asfahani, Jamal
2014-02-01
A factor analysis technique is proposed in this research for interpreting the combination of nuclear well logging, including natural gamma ray, density and neutron-porosity, and the electrical well logging of long and short normal, in order to characterize the large extended basaltic areas in southern Syria. Kodana well logging data are used for testing and applying the proposed technique. The four resulting score logs make it possible to establish the lithological score cross-section of the studied well. The established cross-section clearly shows the distribution and the identification of four kinds of basalt: hard massive basalt, hard basalt, pyroclastic basalt, and the alteration products of basalt, clay. The factor analysis technique is successfully applied to the Kodana well logging data in southern Syria, and can be used efficiently when several wells and large volumes of well logging data with a high number of variables must be interpreted.
The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...
Nash, J. Thomas; Frishman, David
1983-01-01
Analytical results for 61 elements in 370 samples from the Ranger Mine area are reported. Most of the rocks come from drill core in the Ranger No. 1 and Ranger No. 3 deposits, but 20 samples are from unmineralized drill core more than 1 km from ore. Statistical tests show that the elements Mg, Fe, F, Be, Co, Li, Ni, Pb, Sc, Th, Ti, V, Cl, As, Br, Au, Ce, Dy, La, Sc, Eu, Tb, Yb, and Tb have positive association with uranium, and Si, Ca, Na, K, Sr, Ba, Ce, and Cs have negative association. For most lithologic subsets Mg, Fe, Li, Cr, Ni, Pb, V, Y, Sm, Sc, Eu, and Yb are significantly enriched in ore-bearing rocks, whereas Ca, Na, K, Sr, Ba, Mn, Ce, and Cs are significantly depleted. These results are consistent with petrographic observations on altered rocks. Lithogeochemistry can aid exploration, but for these rocks requires methods that are expensive and not amenable to routine use.
Improve MWD data interpretation
Santley, D.J.; Ardrey, W.E.
1987-01-01
This article reports that measurement-while-drilling (MWD) technology is being used today in a broad range of real-time drilling applications. In its infancy, MWD was limited to providing directional survey and steering information. Today, the addition of formation sensors (resistivity, gamma) and drilling efficiency sensors (WOB, torque) has made MWD a much more useful drilling decision tool. In the process, the desirability of combining downhole MWD data with powerful analytical software and interpretive techniques has been recognized by both operators and service companies. However, the usual form in which MWD and wellsite analytical capabilities are combined leaves much to be desired. The most common approach is to incorporate MWD with large-scale computerized mud logging (CML) systems. Essentially, MWD decoding and display equipment is added to existing full-blown CML surface units.
ERIC Educational Resources Information Center
Bopp, Richard E.; Van Der Laan, Sharon J.
1985-01-01
Presents a search strategy for locating time-series or cross-sectional statistical data in published sources which was designed for undergraduate students who require 30 units of data for five separate variables in a statistical model. Instructional context and the broader applicability of the search strategy for general statistical research is…
Tuberculosis Data and Statistics
... Organization Chart Advisory Groups Federal TB Task Force Data and Statistics Language: English Español (Spanish) Recommend on ... United States publication. PDF [6 MB] Interactive TB Data Tool Online Tuberculosis Information System (OTIS) OTIS is ...
As watershed groups in the state of Georgia form and develop, they have a need for collecting, managing, and analyzing data associated with their watershed. Possible sources of data for flow, water quality, biology, habitat, and watershed characteristics include the U.S. Geologic...
Data collection and interpretation.
Citerio, Giuseppe; Park, Soojin; Schmidt, J Michael; Moberg, Richard; Suarez, Jose I; Le Roux, Peter D
2015-06-01
Patient monitoring is routinely performed in all patients who receive neurocritical care. The combined use of monitors, including the neurologic examination, laboratory analysis, imaging studies, and physiological parameters, is common in a platform called multi-modality monitoring (MMM). However, the full potential of MMM is only beginning to be realized since for the most part, decision making historically has focused on individual aspects of physiology in a largely threshold-based manner. The use of MMM now is being facilitated by the evolution of bio-informatics in critical care including developing techniques to acquire, store, retrieve, and display integrated data and new analytic techniques for optimal clinical decision making. In this review, we will discuss the crucial initial steps toward data and information management, which in this emerging era of data-intensive science is already shifting concepts of care for acute brain injury and has the potential to both reshape how we do research and enhance cost-effective clinical care. PMID:25846711
Statistical interpretation of “femtomolar” detection
Go, Jonghyun; Alam, Muhammad A.
2009-01-01
We calculate the statistics of diffusion-limited arrival-time distribution by a Monte Carlo method to suggest a simple statistical resolution of the enduring puzzle of nanobiosensors: a persistent gap between reports of analyte detection at approximately femtomolar concentration and theory suggesting the impossibility of approximately subpicomolar detection at the corresponding incubation time. The incubation time used in the theory is actually the mean incubation time, while experimental conditions suggest that device stability limited the minimum incubation time. The difference in incubation times—both described by characteristic power laws—provides an intuitive explanation of different detection limits anticipated by theory and experiments. PMID:19690630
Rossell, David
2016-01-01
Big Data brings unprecedented power to address scientific, economic and societal issues, but also amplifies the possibility of certain pitfalls. These include using purely data-driven approaches that disregard understanding the phenomenon under study, aiming at a dynamically moving target, ignoring critical data collection issues, summarizing or preprocessing the data inadequately and mistaking noise for signal. We review some success stories and illustrate how statistical principles can help obtain more reliable information from data. We also touch upon current challenges that require active methodological research, such as strategies for efficient computation, integration of heterogeneous data, extending the underlying theory to increasingly complex questions and, perhaps most importantly, training a new generation of scientists to develop and deploy these strategies. PMID:27722040
Max Born's Statistical Interpretation of Quantum Mechanics.
Pais, A
1982-12-17
In the summer of 1926, a statistical element was introduced for the first time in the fundamental laws of physics in two papers by Born. After a brief account of Born's earlier involvements with quantum physics, including his bringing the new mechanics to the United States, the motivation for and contents of Born's two papers are discussed. The reaction of his colleagues is described.
The Statistical Interpretation of Entropy: An Activity
ERIC Educational Resources Information Center
Timberlake, Todd
2010-01-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the…
Interpreting Data: The Hybrid Mind
ERIC Educational Resources Information Center
Heisterkamp, Kimberly; Talanquer, Vicente
2015-01-01
The central goal of this study was to characterize major patterns of reasoning exhibited by college chemistry students when analyzing and interpreting chemical data. Using a case study approach, we investigated how a representative student used chemical models to explain patterns in the data based on structure-property relationships. Our results…
The Statistical Interpretation of Entropy: An Activity
NASA Astrophysics Data System (ADS)
Timberlake, Todd
2010-11-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the functioning of the second law and also provided evidence for the existence of atoms at a time when many scientists (like Ernst Mach and Wilhelm Ostwald) were skeptical.
For a statistical interpretation of Helmholtz' thermal displacement
NASA Astrophysics Data System (ADS)
Podio-Guidugli, Paolo
2016-11-01
On moving from the classic papers by Einstein and Langevin on Brownian motion, two consistent statistical interpretations are given for the thermal displacement, a scalar field formally introduced by Helmholtz, whose time derivative is by definition the absolute temperature.
Paleomicrobiology Data: Authentification and Interpretation.
Drancourt, Michel
2016-06-01
The authenticity of some of the very first works in the field of paleopathology has been questioned, and standards have been progressively established for the experiments and the interpretation of data. Whereas most problems initially arose from the contamination of ancient specimens with modern human DNA, the situation is different in the field of paleomicrobiology, in which the risk for contamination is well-known and adequately managed by any laboratory team with expertise in the routine diagnosis of modern-day infections. Indeed, the exploration of ancient microbiota and pathogens is best done by such laboratory teams, with research directed toward the discovery and implementation of new techniques and the interpretation of data. PMID:27337456
The Statistical Interpretation of Classical Thermodynamic Heating and Expansion Processes
ERIC Educational Resources Information Center
Cartier, Stephen F.
2011-01-01
A statistical model has been developed and applied to interpret thermodynamic processes typically presented from the macroscopic, classical perspective. Through this model, students learn and apply the concepts of statistical mechanics, quantum mechanics, and classical thermodynamics in the analysis of the (i) constant volume heating, (ii)…
Muscular Dystrophy: Data and Statistics
... Statistics MD STAR net Data and Statistics The following data and ... research [ Read Article ] For more information on MD STAR net see Research and Tracking . Key Findings Feature ...
Interpreting physicochemical experimental data sets.
Colclough, Nicola; Wenlock, Mark C
2015-09-01
With the wealth of experimental physicochemical data available to chemoinformaticians from the literature, commercial, and company databases an increasing challenge is the interpretation of such datasets. Subtle differences in experimental methodology used to generate these datasets can give rise to variations in physicochemical property values. Such methodology nuances will be apparent to an expert experimentalist but not necessarily to the data analyst and modeller. This paper describes the differences between common methodologies for measuring the four most important physicochemical properties namely aqueous solubility, octan-1-ol/water distribution coefficient, pK(a) and plasma protein binding highlighting key factors that can lead to systematic differences. Insight is given into how to identify datasets suitable for combining. PMID:26054297
Linda Stetzenbach; Lauren Nemnich; Davor Novosel
2009-08-31
Three independent tasks had been performed (Stetzenbach 2008, Stetzenbach 2008b, Stetzenbach 2009) to measure a variety of parameters in normative buildings across the United States. For each of these tasks, 10 buildings were selected as normative indoor environments. Task 01 focused on office buildings, Task 13 focused on public schools, and Task 0606 focused on high performance buildings. To perform this task, it was necessary to restructure the database for the Indoor Environmental Quality (IEQ) data and the Sound measurement, as several issues were identified and resolved prior to and during the transfer of these data sets into SPSS. During overview discussions with the statistician utilized in this task, it was determined that because indoor zones (1-6) were selected independently within each task, zones were not related by location across tasks. Therefore, no comparison across zones would be valid for the 30 buildings, so the by-location (zone) data were limited to three analysis sets of the buildings within each task. In addition, different collection procedures for lighting were used in Task 0606 than in Tasks 01 & 13 to improve sample collection. Therefore, these data sets could not be merged and compared, so by-day effects data were run separately for Task 0606 and only Task 01 & 13 data were merged. Results of the statistical analysis of the IEQ parameters show that statistically significant differences were found among days and zones for all tasks, although no differences were found by-day for Draft Rate data from Task 0606 (p>0.05). Thursday measurements of IEQ parameters were significantly different from Tuesday, and most Wednesday measures for all variables of Tasks 01 & 13. Data for all three days appeared to vary for Operative Temperature, whereas only Tuesday and Thursday differed for Draft Rate 1m. Although no Draft Rate measures within Task 0606 were found to significantly differ by-day, Temperature measurements for Tuesday and
Spirakis, C.S.; Pierson, C.T.; Santos, E.S.; Fishman, N.S.
1983-01-01
Statistical treatment of analytical data from 106 samples of uranium-mineralized and unmineralized or weakly mineralized rocks of the Morrison Formation from the northeastern part of the Church Rock area of the Grants uranium region indicates that along with uranium, the deposits in the northeast Church Rock area are enriched in barium, sulfur, sodium, vanadium and equivalent uranium. Selenium and molybdenum are sporadically enriched in the deposits and calcium, manganese, strontium, and yttrium are depleted. Unlike the primary deposits of the San Juan Basin, the deposits in the northeast part of the Church Rock area contain little organic carbon and several elements that are characteristically enriched in the primary deposits are not enriched or are enriched to a much lesser degree in the Church Rock deposits. The suite of elements associated with the deposits in the northeast part of the Church Rock area is also different from the suite of elements associated with the redistributed deposits in the Ambrosia Lake district. This suggests that the genesis of the Church Rock deposits is different, at least in part, from the genesis of the primary deposits of the San Juan Basin or the redistributed deposits at Ambrosia Lake.
NASA Astrophysics Data System (ADS)
Tema, E.; Zanella, E.; Pavón-Carrasco, F. J.; Kondopoulou, D.; Pavlides, S.
2015-10-01
We present the results of palaeomagnetic analysis on Late Bronze Age pottery from Santorini carried out in order to estimate the thermal effect of the Minoan eruption on the pre-Minoan habitation level. A total of 170 specimens from 108 ceramic fragments have been studied. The ceramics were collected from the surface of the pre-Minoan palaeosol at six different sites, including samples from the Akrotiri archaeological site. The deposition temperatures of the first pyroclastic products have been estimated by the maximum overlap of the re-heating temperature intervals given by the individual fragments at site level. A new statistical elaboration of the temperature data has also been proposed, calculating the re-heating temperatures at each site at 95 per cent probability. The obtained results show that the precursor tephra layer and the first pumice fall of the eruption were hot enough to re-heat the underlying ceramics to temperatures of 160-230 °C in the non-inhabited sites, while the temperatures recorded inside the Akrotiri village are slightly lower, varying from 130 to 200 °C. The decrease of the temperatures registered in the human settlements suggests that there was some interaction between the buildings and the pumice fallout deposits, while the building debris layer caused by the preceding and syn-eruption earthquakes has probably also contributed to the decrease of the recorded re-heating temperatures.
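The "maximum overlap" of per-fragment temperature intervals mentioned in this abstract can be illustrated with a small sweep over interval endpoints. This is a minimal sketch of one plausible reading of that step, not the authors' procedure; the function name `max_overlap` and the example intervals are hypothetical and are not data from the study.

```python
def max_overlap(intervals):
    """Return (count, (lo, hi)): the largest number of closed intervals that
    share a common range, and the first range where that count is reached."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 = start; sorts before an end at the same value
        events.append((hi, 1))  # 1 = end
    events.sort()

    count = best = 0
    best_lo = best_hi = None
    pending = False  # True while the current best range is still open
    for x, kind in events:
        if kind == 0:
            count += 1
            if count > best:
                best, best_lo, pending = count, x, True
        else:
            if pending:
                best_hi, pending = x, False
            count -= 1
    if best == 0:
        return 0, None
    return best, (best_lo, best_hi)


# Hypothetical re-heating intervals (degrees C) for four fragments at one site.
fragments = [(150, 250), (160, 230), (100, 200), (180, 300)]
print(max_overlap(fragments))  # (4, (180, 200))
```

Treating the intervals as closed means two fragments whose ranges only touch at an endpoint still count as overlapping; a stricter convention would sort end events before start events.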
Hahn, A.A.
1994-11-01
The complexity of instrumentation sometimes requires data analysis to be done before the result is presented to the control room. This tutorial reviews some of the theoretical assumptions underlying the more popular forms of data analysis and presents simple examples to illuminate the advantages and hazards of different techniques.
Spina Bifida Data and Statistics
... Materials About Us Information For... Media Policy Makers Data and Statistics ... non-Hispanic white and non-Hispanic black women. Data from 12 state-based birth defects tracking programs ...
Birth Defects Data and Statistics
... Websites About Us Information For... Media Policy Makers Data & Statistics ... of birth defects in the United States. For data on specific birth defects, please visit the specific ...
[Big data in official statistics].
Zwick, Markus
2015-08-01
The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany. PMID:26077871
Workplace Statistical Literacy for Teachers: Interpreting Box Plots
ERIC Educational Resources Information Center
Pierce, Robyn; Chick, Helen
2013-01-01
As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and, yet it is not clear that they always have the…
Pass-Fail Testing: Statistical Requirements and Interpretations
Gilliam, David; Leigh, Stefan; Rukhin, Andrew; Strawderman, William
2009-01-01
Performance standards for detector systems often include requirements for probability of detection and probability of false alarm at a specified level of statistical confidence. This paper reviews the accepted definitions of confidence level and of critical value. It describes the testing requirements for establishing either of these probabilities at a desired confidence level. These requirements are computable in terms of functions that are readily available in statistical software packages and general spreadsheet applications. The statistical interpretations of the critical values are discussed. A table is included for illustration, and a plot is presented showing the minimum required numbers of pass-fail tests. The results given here are applicable to one-sided testing of any system with performance characteristics conforming to a binomial distribution. PMID:27504221
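The kind of computation this abstract refers to can be sketched with the binomial distribution alone. The example below is an illustration under stated assumptions, not the paper's table or method: it assumes independent trials, a one-sided test, and an acceptance rule of "at most `max_failures` failures in n trials". The function names `binom_cdf` and `min_tests` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, q):
    """P(X <= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k + 1))


def min_tests(p_required, confidence, max_failures=0):
    """Smallest n such that observing at most `max_failures` failures in n
    independent pass-fail trials demonstrates a pass probability of at least
    `p_required` at the given one-sided confidence level."""
    q = 1.0 - p_required      # failure probability at the boundary case
    alpha = 1.0 - confidence  # allowed chance of passing a boundary system
    n = max_failures + 1
    while binom_cdf(max_failures, n, q) > alpha:
        n += 1
    return n


print(min_tests(0.90, 0.95))                  # 29
print(min_tests(0.95, 0.95))                  # 59
print(min_tests(0.90, 0.95, max_failures=1))  # 46
```

With zero failures allowed the loop reproduces the closed form n >= ln(1 - confidence) / ln(p_required); allowing failures raises the required number of trials.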
Statistical Interpretation of Natural and Technological Hazards in China
NASA Astrophysics Data System (ADS)
Borthwick, Alistair; Ni, Jinren
2010-05-01
China is prone to catastrophic natural hazards from floods, droughts, earthquakes, storms, cyclones, landslides, epidemics, extreme temperatures, forest fires, avalanches, and even tsunami. This paper will list statistics related to the six worst natural disasters in China over the past 100 or so years, ranked according to number of fatalities. The corresponding data for the six worst natural disasters in China over the past decade will also be considered. [The data are abstracted from the International Disaster Database, Centre for Research on the Epidemiology of Disasters (CRED), Université Catholique de Louvain, Brussels, Belgium, http://www.cred.be/ where a disaster is defined as occurring if one of the following criteria is fulfilled: 10 or more people reported killed; 100 or more people reported affected; a call for international assistance; or declaration of a state of emergency.] The statistics include the number of occurrences of each type of natural disaster, the number of deaths, the number of people affected, and the cost in billions of US dollars. Over the past hundred years, the largest disasters may be related to the overabundance or scarcity of water, and to earthquake damage. However, there has been a substantial relative reduction in fatalities due to water related disasters over the past decade, even though the overall numbers of people affected remain huge, as does the economic damage. This change is largely due to the efforts put in by China's water authorities to establish effective early warning systems, the construction of engineering countermeasures for flood protection, the implementation of water pricing and other measures for reducing excessive consumption during times of drought. It should be noted that the dreadful death toll due to the Sichuan Earthquake dominates recent data. Joint research has been undertaken between the Department of Environmental Engineering at Peking University and the Department of Engineering Science at Oxford
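The disaster definition quoted in this abstract (any one of four criteria) translates directly into a predicate. The sketch below only encodes the thresholds as stated in the abstract; the function and parameter names are hypothetical.

```python
def meets_disaster_criteria(killed, affected, assistance_called, emergency_declared):
    """True if at least one criterion quoted in the abstract is fulfilled:
    10 or more reported killed, 100 or more reported affected, a call for
    international assistance, or a declared state of emergency."""
    return bool(
        killed >= 10
        or affected >= 100
        or assistance_called
        or emergency_declared
    )


print(meets_disaster_criteria(3, 250, False, False))  # True (100 or more affected)
print(meets_disaster_criteria(3, 40, False, False))   # False (no criterion met)
```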
Statistics by Example, Exploring Data.
ERIC Educational Resources Information Center
Mosteller, Frederick; And Others
Part of a series of four pamphlets providing real-life problems in probability and statistics for the secondary school level, this booklet shows how to organize data in tables and graphs in order to get and to exhibit messages. Elementary probability concepts are also introduced. Fourteen different problem situations arising from biology,…
Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1990-01-01
A prototype of an expert system was developed which applies qualitative or model-based reasoning to the task of post-test analysis and diagnosis of data resulting from a rocket engine firing. A combined component-based and process theory approach is adopted as the basis for system modeling. Such an approach provides a framework for explaining both normal and deviant system behavior in terms of individual component functionality. The diagnosis function is applied to digitized sensor time-histories generated during engine firings. The generic system is applicable to any liquid rocket engine but was adapted specifically in this work to the Space Shuttle Main Engine (SSME). The system is applied to idealized data resulting from turbomachinery malfunction in the SSME.
The broad topic of biomarker research has an often-overlooked component: the documentation and interpretation of the surrounding chemical environment and other meta-data, especially from visualization, analytical, and statistical perspectives (Pleil et al. 2014; Sobus et al. 2011...
Statistically significant relational data mining
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.
2014-02-01
This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Spatial Statistical Data Fusion (SSDF)
NASA Technical Reports Server (NTRS)
Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel
2013-01-01
As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, technological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect are different. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field, and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy, spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values, over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions, and is also computationally much faster. This method is
Alternative interpretations of statistics on health effects of low-level radiation
Hamilton, L.D.
1983-11-01
Four examples of the interpretation of statistics of data on low-level radiation are reviewed: (a) genetic effects of the atomic bombs at Hiroshima and Nagasaki, (b) cancer at Rocky Flats, (c) childhood leukemia and fallout in Utah, and (d) cancer among workers at the Portsmouth Naval Shipyard. Aggregation of data, adjustment for age, and other problems related to the determination of health effects of low-level radiation are discussed. Troublesome issues related to post hoc analysis are considered.
Analysis of Visual Interpretation of Satellite Data
NASA Astrophysics Data System (ADS)
Svatonova, H.
2016-06-01
Millions of people of all ages and expertise are using satellite and aerial data as an important input for their work in many different fields. Satellite data are also gradually finding a new place in education, especially in geography and environmental studies. The article presents the results of extensive research on the visual interpretation of image data carried out in 2013-2015 in the Czech Republic. The research was aimed at comparing the success rate of the interpretation of satellite data in relation to a) the substrates (the selected colourfulness, the type of depicted landscape or special elements in the landscape) and b) selected characteristics of users (expertise, gender, age). The results of the research showed that (1) false colour images have a slightly higher percentage of successful interpretation than natural colour images, (2) colourfulness of an element expected or rehearsed by the user (regardless of the real natural colour) increases the success rate of identifying the element, (3) experts are faster in interpreting visual data than non-experts, with the same degree of accuracy of solving the task, and (4) men and women are equally successful in the interpretation of visual image data.
Data Interpretation in the Digital Age
Leonelli, Sabina
2014-01-01
The consultation of internet databases and the related use of computer software to retrieve, visualise and model data have become key components of many areas of scientific research. This paper focuses on the relation of these developments to understanding the biology of organisms, and examines the conditions under which the evidential value of data posted online is assessed and interpreted by the researchers who access them, in ways that underpin and guide the use of those data to foster discovery. I consider the types of knowledge required to interpret data as evidence for claims about organisms, and in particular the relevance of knowledge acquired through physical interaction with actual organisms to assessing the evidential value of data found online. I conclude that familiarity with research in vivo is crucial to assessing the quality and significance of data visualised in silico; and that studying how biological data are disseminated, visualised, assessed and interpreted in the digital age provides a strong rationale for viewing scientific understanding as a social and distributed, rather than individual and localised, achievement. PMID:25729262
Interpreting genomic data via entropic dissection
Azad, Rajeev K.; Li, Jing
2013-01-01
Since the emergence of high-throughput genome sequencing platforms and more recently the next-generation platforms, the genome databases are growing at an astronomical rate. Tremendous efforts have been invested in recent years in understanding intriguing complexities beneath the vast ocean of genomic data. This is apparent in the spurt of computational methods for interpreting these data in the past few years. Genomic data interpretation is notoriously difficult, partly owing to the inherent heterogeneities appearing at different scales. Methods developed to interpret these data often suffer from their inability to adequately measure the underlying heterogeneities and thus lead to confounding results. Here, we present an information entropy-based approach that unravels the distinctive patterns underlying genomic data efficiently and thus is applicable in addressing a variety of biological problems. We show the robustness and consistency of the proposed methodology in addressing three different biological problems of significance—identification of alien DNAs in bacterial genomes, detection of structural variants in cancer cell lines and alignment-free genome comparison. PMID:23036836
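As a generic illustration of an entropy-based contrast between sequence segments, the sketch below computes the Shannon entropy of nucleotide composition and a length-weighted Jensen-Shannon divergence between two segments. It is not the authors' algorithm, whose details the abstract does not give; the function names and the toy sequences are assumptions made for this example.

```python
import math
from collections import Counter

def composition(seq):
    """Return nucleotide frequencies of seq as a dict over A, C, G, T."""
    counts = Counter(seq)
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}

def shannon_entropy(p):
    """Shannon entropy in bits of a frequency dict; zero terms are skipped."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def js_divergence(seq_a, seq_b):
    """Length-weighted Jensen-Shannon divergence (bits) between two segments."""
    pa, pb = composition(seq_a), composition(seq_b)
    wa = len(seq_a) / (len(seq_a) + len(seq_b))
    wb = 1.0 - wa
    mix = {b: wa * pa[b] + wb * pb[b] for b in "ACGT"}
    return shannon_entropy(mix) - wa * shannon_entropy(pa) - wb * shannon_entropy(pb)

# Toy segments (hypothetical): one AT-rich, one GC-rich.
native = "ATATTAATATTTAAATATAT"
alien = "GCGGCCGCGCGGCCGCGGCC"
print(js_divergence(native, alien))   # large: the compositions do not overlap
print(js_divergence(native, native))  # zero: identical compositions
```

A high divergence between adjacent segments marks a compositional boundary, which is the kind of heterogeneity signal an entropy-based method can exploit.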
A Critique of Divorce Statistics and Their Interpretation.
ERIC Educational Resources Information Center
Crosby, John F.
1980-01-01
Increasingly, appeals to the divorce statistic are employed to substantiate claims that the family is in a state of breakdown and marriage is passe. This article contains a consideration of reasons why the divorce statistics are invalid and/or unreliable as indicators of the present state of marriage and family. (Author)
Using Statistics to Lie, Distort, and Abuse Data
ERIC Educational Resources Information Center
Bintz, William; Moore, Sara; Adams, Cheryll; Pierce, Rebecca
2009-01-01
Statistics is a branch of mathematics that involves organization, presentation, and interpretation of data, both quantitative and qualitative. Data do not lie, but people do. On the surface, quantitative data are basically inanimate objects, nothing more than lifeless and meaningless symbols that appear on a page, calculator, computer, or in one's…
The Lure of Statistics in Data Mining
ERIC Educational Resources Information Center
Grover, Lovleen Kumar; Mehra, Rajni
2008-01-01
The field of Data Mining like Statistics concerns itself with "learning from data" or "turning data into information". For statisticians the term "Data mining" has a pejorative meaning. Instead of finding useful patterns in large volumes of data as in the case of Statistics, data mining has the connotation of searching for data to fit preconceived…
Regional interpretation of Kansas aeromagnetic data
Yarger, H.L.
1982-01-01
The aeromagnetic mapping techniques used in a regional aeromagnetic survey of the state are documented and a qualitative regional interpretation of the magnetic basement is presented. Geothermal gradients measured and data from oil well records indicate that geothermal resources in Kansas are of a low-grade nature. However, considerable variation in the gradient is noted statewide within the upper 500 meters of the sedimentary section; this suggests the feasibility of using groundwater for space heating by means of heat pumps.
Statistical characteristics of MST radar echoes and its interpretation
NASA Technical Reports Server (NTRS)
Woodman, Ronald F.
1989-01-01
Two concepts of fundamental importance are reviewed: the autocorrelation function and the frequency power spectrum. In addition, some turbulence concepts, the relationship between radar signals and atmospheric medium statistics, partial reflection, and the characteristics of noise and clutter interference are discussed.
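The link between the two concepts named above can be checked numerically: for a finite sequence, the discrete Fourier transform of the circular autocorrelation equals the squared magnitude of the transform of the sequence (the discrete Wiener-Khinchin relation). The sketch below is a minimal pure-Python check on a synthetic signal; the signal and the function names are assumptions for illustration, not taken from the report.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def circular_autocorrelation(x):
    """r[k] = sum_t x[t + k] * conj(x[t]), indices taken modulo N."""
    n = len(x)
    return [sum(x[(t + k) % n] * complex(x[t]).conjugate() for t in range(n))
            for k in range(n)]

# Hypothetical test signal: a sinusoid plus a constant offset.
signal = [1.0 + math.cos(2 * math.pi * 3 * t / 16) for t in range(16)]

power_spectrum = [abs(c) ** 2 for c in dft(signal)]
acf_spectrum = dft(circular_autocorrelation(signal))

# The two agree up to rounding error.
max_gap = max(abs(a - p) for a, p in zip(acf_spectrum, power_spectrum))
print(max_gap)
```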
Statistical Interpretation of the Local Field Inside Dielectrics.
ERIC Educational Resources Information Center
Berrera, Ruben G.; Mello, P. A.
1982-01-01
Compares several derivations of the Clausius-Mossotti relation to analyze consistently the nature of approximations used and their range of applicability. Also presents a statistical-mechanical calculation of the local field for classical system of harmonic oscillators interacting via the Coulomb potential. (Author/SK)
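For reference, the Clausius-Mossotti relation and the Lorentz local field are shown below in their standard textbook form (SI units). The notation is the conventional one and is not taken from the article: N is the number density of polarizable units, α their polarizability, ε_r the relative permittivity, E the macroscopic field, and P the polarization.

```latex
\frac{\varepsilon_r - 1}{\varepsilon_r + 2} = \frac{N\alpha}{3\varepsilon_0},
\qquad
\mathbf{E}_{\mathrm{loc}} = \mathbf{E} + \frac{\mathbf{P}}{3\varepsilon_0}
```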
Interpretation of data by the clinician.
Goldzieher, J W
The cardinal challenges to every practicing physician are to interpret clinical data correctly and to place them in proper perspective. Clinical investigations frequently lack the rigidly controlled conditions and the careful experimental designs usually found in preclinical animal studies, and this deficiency is partially attributable to the inherent complexities of clinical medicine. Consequently, a great deal of controversy results from conflicting interpretations, extrapolations and overextension of limited data that are often equivocal. More careful appraisal of data and increased awareness of the well-known pitfalls found in retrospective and prospective studies, in which biostatistical design and clinical relevance are often incompatible, are emphasized, and personal biases and the flagrant sensationalism expounded by the media are condemned. The clinician is cautioned to sift through the data, consider the benefit/risk ratio for each patient and then to subordinate the role of critical scientist and assume the role of physician, exercising good judgment in light of the existing evidence and the immediate problems at hand. PMID:39823
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures.
Interpreting magnetic data by integral moments
NASA Astrophysics Data System (ADS)
Tontini, F. Caratori; Pedersen, L. B.
2008-09-01
The use of the integral moments for interpreting magnetic data is based on a very elegant property of potential fields, but in the past it has not been completely exploited due to problems concerning real data. We describe a new 3-D development of previous 2-D results aimed at determining the magnetization direction, extending the calculation to second-order moments to recover the centre of mass of the magnetization distribution. The method is enhanced to reduce the effects of the regional field that often alters the first-order solutions. Moreover, we introduce an iterative correction to properly assess the errors coming from finite-size surveys or interaction with neighbouring anomalies, which are the most important causes of the failing of the method for real data. We test the method on some synthetic examples, and finally, we show the results obtained by analysing the aeromagnetic anomaly of the Monte Vulture volcano in Southern Italy.
Bean, Heather D; Pleil, Joachim D; Hill, Jane E
2015-02-01
The broad topic of biomarker research has an often-overlooked component: the documentation and interpretation of the surrounding chemical environment and other meta-data, especially from visualization, analytical and statistical perspectives. A second concern is how the environment interacts with human systems biology, what the variability is in "normal" subjects, and how such biological observations might be reconstructed to infer external stressors. In this article, we report on recent research presentations from a symposium at the 248th American Chemical Society meeting held in San Francisco, 10-14 August 2014, that focused on providing some insight into these important issues. PMID:25444302
Aerosol backscatter lidar calibration and data interpretation
NASA Technical Reports Server (NTRS)
Kavaya, M. J.; Menzies, R. T.
1984-01-01
A treatment of the various factors involved in lidar data acquisition and analysis is presented. This treatment highlights sources of fundamental, systematic, modeling, and calibration errors that may affect the accurate interpretation and calibration of lidar aerosol backscatter data. The discussion primarily pertains to ground based, pulsed CO2 lidars that probe the troposphere and are calibrated using large, hard calibration targets. However, a large part of the analysis is relevant to other types of lidar systems such as lidars operating at other wavelengths; continuous wave (CW) lidars; lidars operating in other regions of the atmosphere; lidars measuring nonaerosol elastic or inelastic backscatter; airborne or Earth-orbiting lidar platforms; and lidars employing combinations of the above characteristics.
Data Torturing and the Misuse of Statistical Tools
Abate, Marcey L.
1999-08-16
Statistical concepts, methods, and tools are often used in the implementation of statistical thinking. Unfortunately, statistical tools are all too often misused by not applying them in the context of statistical thinking that focuses on processes, variation, and data. The consequences of this misuse may be ''data torturing'' or going beyond reasonable interpretation of the facts due to a misunderstanding of the processes creating the data or the misinterpretation of variability in the data. In the hope of averting future misuse and data torturing, examples are provided where the application of common statistical tools, in the absence of statistical thinking, provides deceptive results by not adequately representing the underlying process and variability. For each of the examples, a discussion is provided on how applying the concepts of statistical thinking may have prevented the data torturing. The lessons learned from these examples will provide an increased awareness of the potential for many statistical methods to mislead and a better understanding of how statistical thinking broadens and increases the effectiveness of statistical tools.
Smart Interpretation - Application of Machine Learning in Geological Interpretation of AEM Data
NASA Astrophysics Data System (ADS)
Bach, T.; Gulbrandsen, M. L.; Jacobsen, R.; Pallesen, T. M.; Jørgensen, F.; Høyer, A. S.; Hansen, T. M.
2015-12-01
When using airborne geophysical measurements in e.g. groundwater mapping, an overwhelming amount of data is collected. Increasingly large survey areas, denser data collection and limited resources combine to make it ever harder to build geological models that use all the available data in a manner that is consistent with the geologist's knowledge about the geology of the survey area. In the ERGO project, funded by The Danish National Advanced Technology Foundation, we address this problem by developing new, usable tools that enable the geologist to apply her geological knowledge directly in the interpretation of the AEM data, and thereby handle the large amount of data. In the project we have developed the mathematical basis for capturing geological expertise in a statistical model. Based on this, we have implemented new algorithms that have been operationalized and embedded in user-friendly software. In this software, the machine learning algorithm, Smart Interpretation, enables the geologist to use the system as an assistant in the geological modelling process. As the software 'learns' the geology from the geologist, the system suggests new modelling features in the data. In this presentation we demonstrate the application of the results from the ERGO project, including the proposed modelling workflow applied to a variety of data examples.
Inferring the statistical interpretation of quantum mechanics from the classical limit
Gottfried
2000-06-01
It is widely believed that the statistical interpretation of quantum mechanics cannot be inferred from the Schrödinger equation itself, and must be stated as an additional independent axiom. Here I propose that the situation is not so stark. For systems that have both continuous and discrete degrees of freedom (such as coordinates and spin respectively), the statistical interpretation for the discrete variables is implied by requiring that the system's gross motion can be classically described under circumstances specified by the Schrödinger equation. However, this is not a full-fledged derivation of the statistical interpretation because it does not apply to the continuous variables of classical mechanics.
Recent statistical methods for orientation data
NASA Technical Reports Server (NTRS)
Batschelet, E.
1972-01-01
The application of statistical methods for determining the areas of animal orientation and navigation are discussed. The method employed is limited to the two-dimensional case. Various tests for determining the validity of the statistical analysis are presented. Mathematical models are included to support the theoretical considerations and tables of data are developed to show the value of information obtained by statistical analysis.
Teaching Social Statistics with Simulated Data.
ERIC Educational Resources Information Center
Halley, Fred S.
1991-01-01
Suggests using simulated data to teach students about the nature and use of statistical tests and measures. Observes that simulated data contains built-in pure relationships with no poor response rates or coding or sampling errors. Recommends suitable software. Includes information on using data sets, demonstrating statistical principles, and…
Instruments, methods, statistics, plasmaphysical interpretation of type IIIb bursts
NASA Astrophysics Data System (ADS)
Urbarz, H. W.
Type-IIIb solar bursts in the m-dkm band and the methods used to study them are characterized in a review of recent research. The value of high-resolution spectrographs (with effective apertures of 1000-100,000 sq m, frequency resolution 20 kHz, and time resolution 100 msec) in detecting and investigating type-IIIb bursts is emphasized, and the parameters of the most important instruments are listed in a table. Burst spectra, sources, polarization, flux, occurrence, and association with other types are discussed and illustrated with sample spectra, tables, and histograms. The statistics of observations made at Weissenau Observatory (Tuebingen, FRG) from August, 1978, through December, 1979, are considered in detail. Theories proposed to explain type-III and type-IIIb bursts are summarized, including frequency splitting (FS) of the Langmuir spectrum, FS during the transverse-wave conversion process, FS during propagation-effect transverse-wave escape, and discrete source regions with different f(p) values.
A data-management system for detailed areal interpretive data
Ferrigno, C.F.
1986-01-01
A data storage and retrieval system has been developed to organize and preserve areal interpretive data. This system can be used by any study where there is a need to store areal interpretive data that generally is presented in map form. This system provides the capability to grid areal interpretive data for input to groundwater flow models at any spacing and orientation. The data storage and retrieval system is designed to be used for studies that cover small areas such as counties. The system is built around a hierarchically structured data base consisting of related latitude-longitude blocks. The information in the data base can be stored at different levels of detail, with the finest detail being a block of 6 sec of latitude by 6 sec of longitude (approximately 0.01 sq mi). This system was implemented on a mainframe computer using a hierarchical data base management system. The computer programs are written in Fortran IV and PL/1. The design and capabilities of the data storage and retrieval system, and the computer programs that are used to implement the system are described. Supplemental sections contain the data dictionary, user documentation of the data-system software, changes that would need to be made to use this system for other studies, and information on the computer software tape. (Lantz-PTT)
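The stated block size can be sanity-checked with a short calculation. The sketch below estimates the area of a 6-second by 6-second latitude-longitude block on a spherical Earth and shows one way such blocks might be indexed. The indexing scheme and function names are hypothetical and are not taken from the report, which describes its own hierarchical data base.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius (spherical approximation)
METERS_PER_MILE = 1609.344

def block_area_sq_mi(latitude_deg, block_arcsec=6.0):
    """Approximate area of a block_arcsec x block_arcsec lat-lon block."""
    angle_rad = math.radians(block_arcsec / 3600.0)
    north_south_mi = EARTH_RADIUS_M * angle_rad / METERS_PER_MILE
    east_west_mi = north_south_mi * math.cos(math.radians(latitude_deg))
    return north_south_mi * east_west_mi

def block_index(lat_deg, lon_deg, block_arcsec=6.0):
    """Hypothetical indexing: integer (row, col) of the block containing a point."""
    blocks_per_degree = 3600.0 / block_arcsec
    return (math.floor(lat_deg * blocks_per_degree),
            math.floor(lon_deg * blocks_per_degree))

print(block_area_sq_mi(40.0))       # close to 0.01 sq mi at mid-latitudes
print(block_index(38.5, -98.25))
```

The east-west extent shrinks with the cosine of latitude, so the 0.01 sq mi figure is an approximation that holds near 40 degrees and grows somewhat toward the equator.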
Laterally constrained inversion for CSAMT data interpretation
NASA Astrophysics Data System (ADS)
Wang, Ruo; Yin, Changchun; Wang, Miaoyue; Di, Qingyun
2015-10-01
Laterally constrained inversion (LCI) has been successfully applied to the inversion of dc resistivity, TEM and airborne EM data. However, it has not yet been applied to the interpretation of controlled-source audio-frequency magnetotelluric (CSAMT) data. In this paper, we apply the LCI method to CSAMT data inversion by preconditioning the Jacobian matrix. We apply a weighting matrix to the Jacobian to balance the sensitivity of model parameters, so that the resolution with respect to different model parameters becomes more uniform. Numerical experiments confirm that this can improve the convergence of the inversion. We first invert a synthetic dataset with and without noise to investigate the effect of applying LCI to CSAMT data. For the noise-free data, the results show that the LCI method recovers the true model better than the traditional single-station inversion; for the noisy data, the true model is recovered even with a noise level of 8%, indicating that LCI inversions are to some extent insensitive to noise. Then, we re-invert two CSAMT datasets, collected in a watershed and in a coal mine area in Northern China, and compare our results with those from previous inversions. The comparison with the previous inversion in the coal mine area shows that the LCI method delivers smoother layer interfaces that correlate well with seismic data, while the comparison with a global search algorithm, simulated annealing (SA), in the watershed shows that although both methods deliver similarly good results, the LCI algorithm presented in this paper runs much faster. The inversion results for the coal mine CSAMT survey show that the LCI has identified a conductive water-bearing zone that was not revealed by the previous inversions. This further demonstrates that the method presented in this paper works for CSAMT data inversion.
Polarimetric radar data decomposition and interpretation
NASA Technical Reports Server (NTRS)
Sun, Guoqing; Ranson, K. Jon
1993-01-01
Significant efforts have been made to decompose polarimetric radar data into several simple scattering components. The components, which are selected because of their physical significance, can be used to classify SAR (Synthetic Aperture Radar) image data. If particular components can be related to forest parameters, inversion procedures may be developed to estimate these parameters from the scattering components. Several methods have been used to decompose an averaged Stokes matrix or covariance matrix into three components representing odd (surface), even (double-bounce) and diffuse (volume) scattering. With these decomposition techniques, phenomena such as canopy-ground interactions, randomness of orientation, and size of scatterers can be examined from SAR data. In this study we applied the method recently reported by van Zyl (1992) to decompose averaged backscattering covariance matrices extracted from JPL SAR images over forest stands in Maine, USA. These stands are mostly mixed stands of coniferous and deciduous trees. Biomass data have been derived from field measurements of DBH and tree density using allometric equations. The interpretation of the decompositions and relationships with measured stand biomass are presented in this paper.
[Blood proteins in African trypanosomiasis: variations and statistical interpretations].
Cailliez, M; Poupin, F; Pages, J P; Savel, J
1982-01-01
The estimation of blood orosomucoid, haptoglobin, C-reactive protein and immunoglobulin levels has enabled us to demonstrate a specific protein profile in human African trypanosomiasis, as compared with that of other parasitic diseases and with a healthy African reference group. Data processing by principal components analysis provides a valuable tool for epidemiological surveys.
Statistical analysis of scintillation data
Chua, S.; Noonan, J.P.; Basu, S.
1981-09-01
The Nakagami-m distribution has traditionally been used successfully to model the probability characteristics of ionospheric scintillations at UHF. This report investigates the distribution properties of scintillation data in the L-band range. Specifically, the appropriateness of the Nakagami-m and lognormal distributions is tested. Briefly, the results confirm that the Nakagami-m is appropriate for UHF but not for L-band scintillations. The lognormal provides a better fit to the distribution of L-band scintillations and is an adequate model allowing for an error of ±0.1 or smaller in predicted probability with a sample size of 256.
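One possible reading of "error in predicted probability" is the largest gap between the empirical cumulative distribution of the samples and the fitted model's cumulative distribution. The sketch below computes that gap for a lognormal fit to 256 synthetic samples. The data are simulated stand-ins, and the fitting choice (mean and standard deviation of the logarithms) is an assumption; the report's own test procedure is not described in the abstract.

```python
import math
import random

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution with log-mean mu and log-std sigma."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def max_cdf_error(samples):
    """Largest gap between the empirical CDF and a fitted lognormal CDF."""
    logs = [math.log(s) for s in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / (len(logs) - 1))
    ordered = sorted(samples)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        model = lognormal_cdf(x, mu, sigma)
        # Compare against the ECDF just below and just above the step at x.
        gap = max(gap, abs(model - i / n), abs(model - (i + 1) / n))
    return gap

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Synthetic stand-in for 256 intensity samples; not real scintillation data.
samples = [rng.lognormvariate(0.0, 0.5) for _ in range(256)]
print(max_cdf_error(samples))
```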
Statistics for characterizing data on the periphery
Theiler, James P; Hush, Donald R
2010-01-01
We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
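For context, an ellipsoidal model scores a sample by its Mahalanobis distance from the mean under a covariance matrix. The sketch below does this in two dimensions with the plain sample covariance, which is the baseline that the abstract says the authors recommend alternatives to; it does not implement their proposed statistics. The data points and function names are hypothetical.

```python
def mean_and_covariance(points):
    """Sample mean and 2x2 sample covariance of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical background data with correlated coordinates.
background = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1), (5.0, 4.9)]
mean, cov = mean_and_covariance(background)

on_axis = (6.0, 6.0)    # far from the mean but along the data's long axis
off_axis = (2.5, 4.5)   # closer to the mean but across the short axis
print(mahalanobis_sq(on_axis, mean, cov), mahalanobis_sq(off_axis, mean, cov))
```

The off-axis point scores as far more anomalous even though it lies closer to the mean in Euclidean terms, which is why the shape of the fitted ellipsoid at the periphery matters for detection.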
Impact of Equity Models and Statistical Measures on Interpretations of Educational Reform
ERIC Educational Resources Information Center
Rodriguez, Idaykis; Brewe, Eric; Sawtelle, Vashti; Kramer, Laird H.
2012-01-01
We present three models of equity and show how these, along with the statistical measures used to evaluate results, impact interpretation of equity in education reform. Equity can be defined and interpreted in many ways. Most equity education reform research strives to achieve equity by closing achievement gaps between groups. An example is given…
Statistical Literacy: Data Tell a Story
ERIC Educational Resources Information Center
Sole, Marla A.
2016-01-01
Every day, students collect, organize, and analyze data to make decisions. In this data-driven world, people need to assess how much trust they can place in summary statistics. The results of every survey and the safety of every drug that undergoes a clinical trial depend on the correct application of appropriate statistics. Recognizing the…
Data Mining: Going beyond Traditional Statistics
ERIC Educational Resources Information Center
Zhao, Chun-Mei; Luan, Jing
2006-01-01
The authors provide an overview of data mining, giving special attention to the relationship between data mining and statistics to unravel some misunderstandings about the two techniques. (Contains 1 figure.)
Distributed data collection for a database of radiological image interpretations
NASA Astrophysics Data System (ADS)
Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.
1997-01-01
The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window based 2048 X 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.
Shafieloo, Arman
2012-05-01
By introducing Crossing functions and hyper-parameters I show that the Bayesian interpretation of the Crossing Statistics [1] can be used trivially for the purpose of model selection among cosmological models. In this approach, to falsify a cosmological model there is no need to compare it with other models or assume any particular form of parametrization for the cosmological quantities like luminosity distance, Hubble parameter or equation of state of dark energy. Instead, hyper-parameters of Crossing functions perform as discriminators between correct and wrong models. Using this approach one can falsify any assumed cosmological model without putting priors on the underlying actual model of the universe and its parameters, hence the issue of dark energy parametrization is resolved. It will also be shown that the sensitivity of the method to the intrinsic dispersion of the data is small, which is another important characteristic of the method when testing cosmological models against data with high uncertainties.
Collecting operational event data for statistical analysis
Atwood, C.L.
1994-09-01
This report gives guidance for collecting operational data to be used for statistical analysis, especially analysis of event counts. It discusses how to define the purpose of the study, the unit (system, component, etc.) to be studied, events to be counted, and demand or exposure time. Examples are given of classification systems for events in the data sources. A checklist summarizes the essential steps in data collection for statistical analysis.
Barber, Chris; Cayley, Alex; Hanser, Thierry; Harding, Alex; Heghes, Crina; Vessey, Jonathan D; Werner, Stephane; Weiner, Sandy K; Wichard, Joerg; Giddings, Amanda; Glowienke, Susanne; Parenty, Alexis; Brigo, Alessandro; Spirkl, Hans-Peter; Amberg, Alexander; Kemper, Ray; Greene, Nigel
2016-04-01
The relative wealth of bacterial mutagenicity data available in the public literature means that in silico quantitative/qualitative structure activity relationship (QSAR) systems can readily be built for this endpoint. A good means of evaluating the performance of such systems is to use private unpublished data sets, which generally represent a more distinct chemical space than publicly available test sets and, as a result, provide a greater challenge to the model. However, raw performance metrics should not be the only factor considered when judging this type of software since expert interpretation of the results obtained may allow for further improvements in predictivity. Enough information should be provided by a QSAR to allow the user to make general, scientifically-based arguments in order to assess and overrule predictions when necessary. With all this in mind, we sought to validate the performance of the statistics-based in vitro bacterial mutagenicity prediction system Sarah Nexus (version 1.1) against private test data sets supplied by nine different pharmaceutical companies. The results of these evaluations were then analysed in order to identify findings presented by the model which would be useful for the user to take into consideration when interpreting the results and making their final decision about the mutagenic potential of a given compound. PMID:26708083
Basic statistical tools in research and data analysis
Ali, Zulfiqar; Bhaskar, S Bala
2016-01-01
Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis. PMID:27729694
Confidentiality of Research and Statistical Data.
ERIC Educational Resources Information Center
Law Enforcement Assistance Administration (Dept. of Justice), Washington, DC.
This document was prepared by the Privacy and Security Staff, National Criminal Justice Information and Statistics Service, in conjunction with the Law Enforcement Assistance Administration (LEAA) Office of General Counsel, to explain and discuss the requirements of the LEAA regulations governing confidentiality of research and statistical data.…
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures
Udey, Ruth Norma
2013-01-01
Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
Topology for statistical modeling of petascale data.
Pascucci, Valerio; Mascarenhas, Ajith Arthur; Rusek, Korben; Bennett, Janine Camille; Levine, Joshua; Pebay, Philippe Pierre; Gyulassy, Attila; Thompson, David C.; Rojas, Joseph Maurice
2011-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.
HistFitter software framework for statistical data analysis
NASA Astrophysics Data System (ADS)
Baak, M.; Besjes, G. J.; Côté, D.; Koutsman, A.; Lorenz, J.; Short, D.
2015-04-01
We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fit to data and interpreted with statistical tests. Internally HistFitter uses the statistics packages RooStats and HistFactory. A key innovation of HistFitter is its design, which is rooted in analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple models at once that describe the data, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication quality style through a simple command-line interface.
NASA Astrophysics Data System (ADS)
Kuić, Domagoj
2016-05-01
In this paper an alternative approach to statistical mechanics based on the maximum information entropy principle (MaxEnt) is examined, specifically its close relation with the Gibbs method of ensembles. It is shown that the MaxEnt formalism is the logical extension of the Gibbs formalism of equilibrium statistical mechanics that is entirely independent of the frequentist interpretation of probabilities only as factual (i.e. experimentally verifiable) properties of the real world. Furthermore, we show that, consistently with the law of large numbers, the relative frequencies of the ensemble of systems prepared under identical conditions (i.e. identical constraints) actually correspond to the MaxEnt probabilities in the limit of a large number of systems in the ensemble. This result implies that the probabilities in statistical mechanics can be interpreted, independently of the frequency interpretation, on the basis of the maximum information entropy principle.
A spatial scan statistic for multinomial data
Jung, Inkyung; Kulldorff, Martin; Richard, Otukei John
2014-01-01
As a geographical cluster detection analysis tool, the spatial scan statistic has been developed for different types of data such as Bernoulli, Poisson, ordinal, exponential and normal. Another interesting data type is multinomial. For example, one may want to find clusters where the disease-type distribution is statistically significantly different from the rest of the study region when there are different types of disease. In this paper, we propose a spatial scan statistic for such data, which is useful for geographical cluster detection analysis for categorical data without any intrinsic order information. The proposed method is applied to meningitis data consisting of five different disease categories to identify areas with distinct disease-type patterns in two counties in the U.K. The performance of the method is evaluated through a simulation study. PMID:20680984
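To make the idea of comparing disease-type distributions inside and outside a candidate zone concrete, here is a minimal Python sketch of a multinomial log-likelihood ratio, which is the kind of quantity a scan statistic maximizes over zones. The function names and the toy counts are assumptions for illustration. The abstract does not spell out the exact statistic, and the Monte Carlo significance step that scan statistics normally use is omitted.

```python
import math


def _sum_c_log_share(counts):
    """Sum of c * log(c / total) over categories, with 0 * log(0) taken as 0."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)


def multinomial_llr(inside, outside):
    """Log-likelihood ratio: separate category shares inside/outside vs. common shares."""
    combined = [a + b for a, b in zip(inside, outside)]
    return (_sum_c_log_share(inside) + _sum_c_log_share(outside)
            - _sum_c_log_share(combined))


def most_likely_zone(zones):
    """zones maps a zone name to (inside_counts, outside_counts); returns the best name."""
    return max(zones, key=lambda name: multinomial_llr(*zones[name]))


# Hypothetical counts for three disease types in two candidate zones.
candidate_zones = {
    "zone_a": ([10, 20, 30], [20, 40, 60]),   # same mix inside and outside
    "zone_b": ([30, 5, 5], [10, 40, 40]),     # distinct mix inside
}
print(most_likely_zone(candidate_zones))
```

In practice the maximum over zones would be compared with its distribution under random relabelling of cases, since the maximization makes the usual chi-square reference distribution inapplicable.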
Statistical Approaches to Functional Neuroimaging Data
DuBois Bowman, F; Guo, Ying; Derado, Gordana
2007-01-01
Synopsis: The field of statistics makes valuable contributions to functional neuroimaging research by establishing procedures for the design and conduct of neuroimaging experiments and by providing tools for objectively quantifying and measuring the strength of scientific evidence provided by the data. Two common functional neuroimaging research objectives include detecting brain regions that reveal task-related alterations in measured brain activity (activations) and identifying highly correlated brain regions that exhibit similar patterns of activity over time (functional connectivity). In this article, we highlight various statistical procedures for analyzing data from activation studies and from functional connectivity studies, focusing on functional magnetic resonance imaging (fMRI) and positron emission tomography (PET) data. We also discuss emerging statistical methods for prediction using fMRI and PET data, which stand to increase the translational significance of functional neuroimaging data to clinical practice. PMID:17983962
The early statistical interpretations of quantum mechanics in the USA and USSR
NASA Astrophysics Data System (ADS)
Pechenkin, Alexander
2012-02-01
This article is devoted to the statistical (ensemble) interpretations of quantum mechanics which appeared in the USA and USSR before World War II and in the early war years. The author emphasizes a remarkable similarity between the statements which arose in different scientific, philosophical, and even political contexts. The comparative analysis extends to the scientific and philosophical traditions which lay behind the American and Soviet statistical interpretations of quantum mechanics. The author insists that the philosophy of quantum mechanics is an autonomous branch rather than an applied philosophy or philosophical physics.
Transit Spectroscopy: new data analysis techniques and interpretation
NASA Astrophysics Data System (ADS)
Tinetti, Giovanna; Waldmann, Ingo P.; Morello, Giuseppe; Tessenyi, Marcell; Varley, Ryan; Barton, Emma; Yurchenko, Sergey; Tennyson, Jonathan; Hollis, Morgan
2014-11-01
Planetary science beyond the boundaries of our Solar System is today in its infancy. Until a couple of decades ago, the detailed investigation of the planetary properties was restricted to objects orbiting inside the Kuiper Belt. Today, we cannot ignore that the number of known planets has increased by two orders of magnitude nor that these planets resemble anything but the objects present in our own Solar System. A key observable for planets is the chemical composition and state of their atmosphere. To date, two methods can be used to sound exoplanetary atmospheres: transit and eclipse spectroscopy, and direct imaging spectroscopy. Although the field of exoplanet spectroscopy has been very successful in past years, there are a few serious hurdles that need to be overcome to progress in this area: in particular instrument systematics are often difficult to disentangle from the signal, data are sparse and often not recorded simultaneously causing degeneracy of interpretation. We will present here new data analysis techniques and interpretation developed by the “ExoLights” team at UCL to address the above-mentioned issues. Said techniques include statistical tools, non-parametric, machine-learning algorithms, optimized radiative transfer models and spectroscopic line-lists. These new tools have been successfully applied to existing data recorded with space and ground instruments, shedding new light on our knowledge and understanding of these alien worlds.
Statistical data of the uranium industry
1983-01-01
This report is a compendium of information relating to US uranium reserves and potential resources and to exploration, mining, milling, and other activities of the uranium industry through 1982. The statistics are based primarily on data provided voluntarily by the uranium exploration, mining and milling companies. The compendium has been published annually since 1968 and reflects the basic programs of the Grand Junction Area Office of the US Department of Energy. Statistical data obtained from surveys conducted by the Energy Information Administration are included in Section IX. The production, reserves, and drilling data are reported in a manner which avoids disclosure of proprietary information.
Topology for Statistical Modeling of Petascale Data
Bennett, Janine Camille; Pebay, Philippe Pierre; Pascucci, Valerio; Levine, Joshua; Gyulassy, Attila; Rojas, Maurice
2014-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.
How to limit clinical errors in interpretation of data.
Wright, P; Jansen, C; Wyatt, J C
1998-11-01
We all assume that we can understand and correctly interpret what we read. However, interpretation is a collection of subtle processes that are easily influenced by poor presentation or wording of information. This article examines how evidence-based principles of information design can be applied to medical records to enhance clinical understanding and accuracy in interpretation of the detailed data that they contain.
Statistical Tools for the Interpretation of Enzootic West Nile virus Transmission Dynamics.
Caillouët, Kevin A; Robertson, Suzanne
2016-01-01
Interpretation of enzootic West Nile virus (WNV) surveillance indicators requires little advanced mathematical skill, but greatly enhances the ability of public health officials to prescribe effective WNV management tactics. Stepwise procedures for the calculation of mosquito infection rates (IR) and vector index (VI) are presented alongside statistical tools that require additional computation. A brief review of advantages and important considerations for each statistic's use is provided. PMID:27188561
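As a rough illustration of the stepwise calculations the abstract mentions, the Python sketch below computes a minimum infection rate (the simplest pooled-sample estimator, assumed here; maximum-likelihood estimators also exist) and a vector index as the sum over species of mosquito density times estimated infection proportion. The function names and the numbers are hypothetical and are not taken from the article.

```python
def minimum_infection_rate(positive_pools, mosquitoes_tested, per=1000):
    """Positive pools per `per` mosquitoes tested (assumes one infected mosquito per positive pool)."""
    if mosquitoes_tested <= 0:
        raise ValueError("mosquitoes_tested must be positive")
    return per * positive_pools / mosquitoes_tested


def vector_index(species):
    """Sum over species of (mean mosquitoes per trap night) * (infection rate per 1000) / 1000."""
    return sum(density * rate_per_1000 / 1000 for density, rate_per_1000 in species)


# Hypothetical surveillance week: two vector species.
ir_species_1 = minimum_infection_rate(positive_pools=3, mosquitoes_tested=1500)
ir_species_2 = minimum_infection_rate(positive_pools=1, mosquitoes_tested=100)
vi = vector_index([(20.0, ir_species_1), (5.0, ir_species_2)])
print(f"IR1 = {ir_species_1:.1f}/1000, IR2 = {ir_species_2:.1f}/1000, VI = {vi:.3f}")
```

The vector index can be read as the estimated number of infected mosquitoes collected per trap night, which is why it combines abundance and infection rate rather than reporting either alone.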
Data-Intensive Statistical Computations in Astronomy
NASA Astrophysics Data System (ADS)
Szalay, Alex
2010-01-01
The emerging large datasets are posing major challenges for their subsequent statistical analyses. One needs to reinvent optimal statistical algorithms, where the cost of computing is taken into account. Moving large amounts of data is becoming increasingly untenable, thus our computations must be performed close to the data. Existing computer architectures are CPU-heavy, while the first passes of most data analyses require an extreme I/O bandwidth. Novel computational algorithms, optimized for extreme datasets, and the new, data-intensive architectures must be invented. The outputs of large numerical simulations increasingly resemble the "observable" universe, with data volumes approaching if not exceeding observational data. Persistent "laboratories" of numerical experiments will soon be publicly available, and will change the way we approach the comparisons of observational data to first principle simulations.
Statistical Analysis of Big Data on Pharmacogenomics
Fan, Jianqing; Liu, Han
2013-01-01
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
Statistical data of the uranium industry
1981-01-01
Data are presented on US uranium reserves, potential resources, exploration, mining, drilling, milling, and other activities of the uranium industry through 1980. The compendium reflects the basic programs of the Grand Junction Office. Statistics are based primarily on information provided by the uranium exploration, mining, and milling companies. Data on commercial U3O8 sales and purchases are included. Data on non-US uranium production and resources are presented in the appendix. (DMC)
Statistical data of the uranium industry
1982-01-01
Statistical Data of the Uranium Industry is a compendium of information relating to US uranium reserves and potential resources and to exploration, mining, milling, and other activities of the uranium industry through 1981. The statistics are based primarily on data provided voluntarily by the uranium exploration, mining, and milling companies. The compendium has been published annually since 1968 and reflects the basic programs of the Grand Junction Area Office (GJAO) of the US Department of Energy. The production, reserves, and drilling information is reported in a manner which avoids disclosure of proprietary information.
Material Phase Causality or a Dynamics-Statistical Interpretation of Quantum Mechanics
Koprinkov, I. G.
2010-11-25
The internal phase dynamics of a quantum system interacting with an electromagnetic field is revealed in detail. Theoretical and experimental evidence of a causal relation of the phase of the wave function to the dynamics of the quantum system is presented systematically for the first time. A dynamics-statistical interpretation of quantum mechanics is introduced.
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
ERIC Educational Resources Information Center
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate unless "corrected" effect…
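The dependence of "p" on sample size can be seen with a small "what if" table: hold an observed effect fixed and recompute the p value for several hypothetical sample sizes. The Python sketch below does this for a correlation coefficient using Fisher's z transform with a normal approximation. The choice of effect measure, the approximation, and the function names are assumptions for illustration and are not the authors' proposed method.

```python
import math


def approx_p_value_for_r(r, n):
    """Approximate two-sided p value for H0: rho = 0, via Fisher's z (normal approximation)."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher z approximation")
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))


def what_if_table(r, sample_sizes):
    """Pairs (n, p) showing how the same effect size fares at different sample sizes."""
    return [(n, approx_p_value_for_r(r, n)) for n in sample_sizes]


# Hypothetical effect: r = 0.20 observed; what if n had been 30, 100 or 200?
for n, p in what_if_table(0.20, [30, 100, 200]):
    print(f"n = {n:4d}  p = {p:.4f}")
```

The same effect moves from non-significant to significant at conventional levels purely because n grows, which is the point the abstract attributes to the APA manual.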
Seasonal variations of decay rate measurement data and their interpretation.
Schrader, Heinrich
2016-08-01
Measurement data of long-lived radionuclides, for example, (85)Kr, (90)Sr, (108m)Ag, (133)Ba, (152)Eu, (154)Eu and (226)Ra, and particularly the relative residuals of fitted raw data from current measurements of ionization chambers for half-life determination show small periodic seasonal variations with amplitudes of about 0.15%. The interpretation of these fluctuations is a matter of controversy whether the observed effect is produced by some interaction with the radionuclides themselves or is an artifact of the measuring chain. At the origin of such a discussion there is the exponential decay law of radioactive substances used for data fitting, one of the fundamentals of nuclear physics. Some groups of physicists use statistical methods and analyze correlations with various parameters of the measurement data and, for example, the Earth-Sun distance, as a basis of interpretation. In this article, data measured at the Physikalisch-Technische Bundesanstalt and published earlier are the subject of a correlation analysis using the corresponding time series of data with varying measurement conditions. An overview of these measurement conditions producing instrument instabilities is given and causality relations are discussed. The resulting correlation coefficients for various series of the same radionuclide using similar measurement conditions are in the order of 0.7, which indicates a high correlation, and for series of the same radionuclide using different measurement conditions and changes of the measuring chain of the order of -0.2 or even lower, which indicates an anti-correlation. These results provide strong arguments that the observed seasonal variations are caused by the measuring chain and, in particular, by the type of measuring electronics used. PMID:27258217
On the Interpretation of Running Trends as Summary Statistics for Time Series Analysis
NASA Astrophysics Data System (ADS)
Vigo, Isabel M.; Trottini, Mario; Belda, Santiago
2016-04-01
In recent years, running trends analysis (RTA) has been widely used in climate applied research as summary statistics for time series analysis. There is no doubt that RTA might be a useful descriptive tool, but despite its general use in applied research, precisely what it reveals about the underlying time series is unclear and, as a result, its interpretation is unclear too. This work contributes to such interpretation in two ways: 1) an explicit formula is obtained for the set of time series with a given series of running trends, making it possible to show that running trends, alone, perform very poorly as summary statistics for time series analysis; and 2) an equivalence is established between RTA and the estimation of a (possibly nonlinear) trend component of the underlying time series using a weighted moving average filter. Such equivalence provides a solid ground for RTA implementation and interpretation/validation.
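One elementary fact that sits behind any link between running trends and moving-average filters is that the least-squares slope in each window is a fixed weighted sum of the values in that window. The Python sketch below checks this numerically. It is an illustration under that assumption only and does not reproduce the paper's explicit formula or its equivalence result; all names are hypothetical.

```python
import random


def ols_slope(y):
    """Least-squares slope of y against t = 0, 1, ..., len(y) - 1."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    num = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den


def running_trends(y, window):
    """Slope fitted separately in every window of the given length (window >= 2)."""
    return [ols_slope(y[i:i + window]) for i in range(len(y) - window + 1)]


def slope_filter_weights(window):
    """Weights w_k = (k - k_bar) / sum((k - k_bar)^2); they sum to zero."""
    k_bar = (window - 1) / 2
    den = sum((k - k_bar) ** 2 for k in range(window))
    return [(k - k_bar) / den for k in range(window)]


def running_trends_by_filter(y, window):
    """Same slopes obtained by sliding one fixed set of weights along the series."""
    w = slope_filter_weights(window)
    return [sum(wk * y[i + k] for k, wk in enumerate(w))
            for i in range(len(y) - window + 1)]


# Hypothetical noisy series with an underlying trend of 0.5 per step.
rng = random.Random(42)
series = [0.5 * t + rng.gauss(0.0, 1.0) for t in range(60)]
direct = running_trends(series, window=10)
filtered = running_trends_by_filter(series, window=10)
print(max(abs(a - b) for a, b in zip(direct, filtered)))
```

Because the weights sum to zero, adding any constant to the series leaves the running trends unchanged, one simple way in which many different series share the same set of running trends.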
Dotto, G L; Pinto, L A A; Hachicha, M A; Knani, S
2015-03-15
In this work, statistical physics treatment was employed to study the adsorption of food dyes onto chitosan films, in order to obtain new physicochemical interpretations at molecular level. Experimental equilibrium curves were obtained for the adsorption of four dyes (FD&C red 2, FD&C yellow 5, FD&C blue 2, Acid Red 51) at different temperatures (298, 313 and 328 K). A statistical physics formula was used to interpret these curves, and parameters such as the number of adsorbed dye molecules per site (n), anchorage number (n'), receptor sites density (NM), adsorbed quantity at saturation (N asat), steric hindrance (τ), concentration at half saturation (c1/2) and molar adsorption energy (ΔE(a)) were estimated. The relation of the above mentioned parameters with the chemical structure of the dyes and temperature was evaluated and interpreted.
Multivariate statistical analysis of environmental monitoring data
Ross, D.L.
1997-11-01
EPA requires statistical procedures to determine whether soil or ground water adjacent to or below waste units is contaminated. These statistical procedures are often based on comparisons between two sets of data: one representing background conditions, and one representing site conditions. Since statistical requirements were originally promulgated in the 1980s, EPA has made several improvements and modifications. There are, however, problems which remain. One problem is that the regulations do not require a minimum probability that contaminated sites will be correctly identified. Another problem is that the effect of testing several correlated constituents on the probable outcome of the statistical tests has not been quantified. Results from computer simulations to determine power functions for realistic monitoring situations are presented here. Power functions for two different statistical procedures, the Student's t-test and the multivariate Hotelling's T² test, are compared. The comparisons indicate that the multivariate test is often more powerful when the tests are applied with significance levels to control the probability of falsely identifying clean sites as contaminated. This program could also be used to verify that statistical procedures achieve some minimum power standard at a regulated waste unit.
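For readers unfamiliar with the multivariate test named above, the following Python sketch computes the two-sample Hotelling T² statistic and its F transform for the simplest case of two correlated constituents. It is an illustration only: the restriction to two variables, the function names and the toy measurements are assumptions, and the power simulations described in the abstract are not reproduced.

```python
from statistics import fmean


def _mean_vec(rows):
    """Column means of a list of (x, y) measurements."""
    return [fmean(col) for col in zip(*rows)]


def _scatter(rows, mean):
    """Sums of squares and cross-products about the mean: (sxx, sxy, syy)."""
    sxx = sum((x - mean[0]) ** 2 for x, _ in rows)
    sxy = sum((x - mean[0]) * (y - mean[1]) for x, y in rows)
    syy = sum((y - mean[1]) ** 2 for _, y in rows)
    return sxx, sxy, syy


def hotelling_t2(background, site):
    """Two-sample Hotelling T^2 and its F statistic for bivariate data (p = 2)."""
    n1, n2 = len(background), len(site)
    m1, m2 = _mean_vec(background), _mean_vec(site)
    dof = n1 + n2 - 2
    # Pooled covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx, sxy, syy = ((a + b) / dof
                     for a, b in zip(_scatter(background, m1), _scatter(site, m2)))
    det = sxx * syy - sxy ** 2
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    # Quadratic form d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    t2 = n1 * n2 / (n1 + n2) * quad
    p = 2
    f_stat = (n1 + n2 - p - 1) / (p * dof) * t2   # refer to F(p, n1 + n2 - p - 1)
    return t2, f_stat


# Hypothetical concentrations of two constituents at background and site wells.
background_wells = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
site_wells = [(x + 5.0, y + 5.0) for x, y in background_wells]
print(hotelling_t2(background_wells, site_wells))
```

Unlike running a separate t-test per constituent, the statistic uses the pooled covariance, so correlation between constituents enters the decision directly.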
MICROARRAY DATA ANALYSIS USING MULTIPLE STATISTICAL MODELS
Wenjun Bao1, Judith E. Schmid1, Amber K. Goetz1, Ming Ouyang2, William J. Welsh2, Andrew I. Brooks3,4, ChiYi Chu3, Mitsunori Ogihara3,4, Yinhe Cheng5, David J. Dix1. 1National Health and Environmental Effects Researc...
Interpretation of remotely sensed data and its applications in oceanography
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Tanaka, K.; Inostroza, H. M.; Verdesio, J. J.
1982-01-01
The methodology of interpretation of remote sensing data and its oceanographic applications are described. The elements of image interpretation for different types of sensors are discussed. The sensors utilized are the multispectral scanner of LANDSAT, and the thermal infrared of NOAA and geostationary satellites. Visual and automatic data interpretation in studies of pollution, the Brazil current system, and upwelling along the southeastern Brazilian coast are compared.
Analysis and Interpretation of Financial Data.
ERIC Educational Resources Information Center
Robinson, Daniel D.
1975-01-01
Understanding the financial reports of colleges and universities has long been a problem because of the lack of comparability of the data presented. Recently, there has been a move to agree on uniform standards for financial accounting and reporting for the field of higher education. In addition to comparable data, the efforts to make financial…
Engine Data Interpretation System (EDIS), phase 2
NASA Technical Reports Server (NTRS)
Cost, Thomas L.; Hofmann, Martin O.
1991-01-01
A prototype of an expert system was developed which applies qualitative constraint-based reasoning to the task of post-test analysis of data resulting from a rocket engine firing. Data anomalies are detected and corresponding faults are diagnosed. Engine behavior is reconstructed using measured data and knowledge about engine behavior. Knowledge about common faults guides but does not restrict the search for the best explanation in terms of hypothesized faults. The system contains domain knowledge about the behavior of common rocket engine components and was configured for use with the Space Shuttle Main Engine (SSME). A graphical user interface allows an expert user to intimately interact with the system during diagnosis. The system was applied to data taken during actual SSME tests where data anomalies were observed.
Critical analysis of adsorption data statistically
NASA Astrophysics Data System (ADS)
Kaushal, Achla; Singh, S. K.
2016-09-01
Experimental data can be presented, computed, and critically analysed in a different way using statistics. A variety of statistical tests are used to make decisions about the significance and validity of the experimental data. In the present study, adsorption was carried out to remove zinc ions from contaminated aqueous solution using mango leaf powder. The experimental data was analysed statistically by hypothesis testing applying t test, paired t test and Chi-square test to (a) test the optimum value of the process pH, (b) verify the success of experiment and (c) study the effect of adsorbent dose in zinc ion removal from aqueous solutions. Comparison of calculated and tabulated values of t and χ² showed the results in favour of the data collected from the experiment and this has been shown on probability charts. K value for Langmuir isotherm was 0.8582 and m value for Freundlich adsorption isotherm obtained was 0.725, both are <1, indicating favourable isotherms. Karl Pearson's correlation coefficient values for Langmuir and Freundlich adsorption isotherms were obtained as 0.99 and 0.95 respectively, which show higher degree of correlation between the variables. This validates the data obtained for adsorption of zinc ions from the contaminated aqueous solution with the help of mango leaf powder.
Redman-MacLaren, Michelle; Mills, Jane; Tommbe, Rachael
2014-01-01
Background Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. Objective To facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. Design A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or ‘chunks’ of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. Results New understandings of the data were evoked when women in interpretive focus groups analysed the data ‘chunks’. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Conclusions Interpretive focus groups offer opportunity to enhance trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action. PMID:25138532
Geophysical interpretation of Venus gravity data
NASA Technical Reports Server (NTRS)
Reasenberg, R. D.
1985-01-01
The investigation of the subsurface mass distribution of Venus through the analysis of the data from Pioneer Venus Orbiter (PVO) is presented. The Doppler tracking data was used to map the gravitational potential, which was compared to the topographic data from the PVO radar (ORAD). In order to obtain an unbiased comparison, the topography obtained from the PVO-ORAD was filtered to introduce distortions which are the same as those of our gravity models. The last major software package that was required in order to determine the spectral admittance Z (lambda) was used. This package solves the forward problem: given the topography and its density, and assuming no compensation, find the resulting spacecraft acceleration along a given nominal trajectory. The filtered topography is obtained by processing these accelerations in the same way (i.e., with the same geophysical inverter) as the Doppler-rate data that we use to estimate the gravity maps.
Telemetry Boards Interpret Rocket, Airplane Engine Data
NASA Technical Reports Server (NTRS)
2009-01-01
For all the data gathered by the space shuttle while in orbit, NASA engineers are just as concerned about the information it generates on the ground. From the moment the shuttle's wheels touch the runway to the break of its electrical umbilical cord at 0.4 seconds before its next launch, sensors feed streams of data about the status of the vehicle and its various systems to Kennedy Space Center's shuttle crews. Even while the shuttle orbiter is refitted in Kennedy's orbiter processing facility, engineers constantly monitor everything from power levels to the testing of the mechanical arm in the orbiter's payload bay. On the launch pad and up until liftoff, the Launch Control Center, attached to the large Vehicle Assembly Building, screens all of the shuttle's vital data. (Once the shuttle clears its launch tower, this responsibility shifts to Mission Control at Johnson Space Center, with Kennedy in a backup role.) Ground systems for satellite launches also generate significant amounts of data. At Cape Canaveral Air Force Station, across the Banana River from Kennedy's location on Merritt Island, Florida, NASA rockets carrying precious satellite payloads into space flood the Launch Vehicle Data Center with sensor information on temperature, speed, trajectory, and vibration. The remote measurement and transmission of systems data, called telemetry, is essential to ensuring the safe and successful launch of the Agency's space missions. When a launch is unsuccessful, as it was for this year's Orbiting Carbon Observatory satellite, telemetry data also provides valuable clues as to what went wrong and how to remedy any problems for future attempts. All of this information is streamed from sensors in the form of binary code: strings of ones and zeros. One small company has partnered with NASA to provide technology that renders raw telemetry data intelligible not only for Agency engineers, but also for those in the private sector.
Statistical Treatment of Looking-Time Data
2016-01-01
Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is predicted. We analyzed data from 2 sources: an in-house set of LTs that included data from individual participants (47 experiments, 1,584 observations), and a representative set of published articles reporting group-level LT statistics (149 experiments from 33 articles). We established that LTs are log-normally distributed across participants, and therefore, should always be log-transformed before parametric statistical analyses. We estimated the typical size of significant effects in LT studies, which allowed us to make recommendations about setting sample sizes. We show how our estimate of the distribution of effect sizes of LT studies can be used to design experiments to be analyzed by Bayesian statistics, where the experimenter is required to determine in advance the predicted effect size rather than the sample size. We demonstrate the robustness of this method in both sets of LT experiments. PMID:26845505
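The recommendation to log-transform looking times before parametric analysis can be sketched as follows. This is a minimal illustration, assuming a within-subject design with one novel and one familiar looking time per infant; the function name and the data values are hypothetical and are not taken from the study.

```python
import math
import statistics

def paired_t_on_logs(novel, familiar):
    """Paired t statistic computed on log-transformed looking times.

    Returns (t, degrees_of_freedom, geometric_mean_ratio).
    """
    if len(novel) != len(familiar):
        raise ValueError("each infant needs one value per condition")
    # Differences of logs are logs of the novel/familiar ratio per infant.
    diffs = [math.log(a) - math.log(b) for a, b in zip(novel, familiar)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se, n - 1, math.exp(mean_diff)

# Hypothetical looking times in seconds, one pair per infant.
novel = [12.4, 8.1, 30.2, 6.7, 15.9, 9.8]
familiar = [9.0, 7.5, 18.6, 6.9, 10.2, 8.8]
t, df, ratio = paired_t_on_logs(novel, familiar)
print(f"t({df}) = {t:.2f}, geometric mean ratio = {ratio:.2f}")
```

Back-transforming the mean log difference gives a geometric mean ratio, so the effect reads as "looking at the novel stimulus was this many times longer" rather than as a difference in seconds.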
Simultaneous Statistical Inference for Epigenetic Data
Schildknecht, Konstantin; Olek, Sven; Dickhaus, Thorsten
2015-01-01
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology. PMID:25965389
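As a rough illustration of a nonparametric two-group comparison based on relative effects, the sketch below runs a Monte Carlo permutation test at a single locus. It is a deliberately simplified, univariate stand-in: the multivariate multiple permutation tests and the closure principle described in the abstract are not implemented, and the function names and methylation values are hypothetical.

```python
import random

def relative_effect(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two samples."""
    wins = sum((a < b) + 0.5 * (a == b) for a in x for b in y)
    return wins / (len(x) * len(y))

def permutation_p_value(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value for H0: relative effect = 0.5."""
    rng = random.Random(seed)
    observed = abs(relative_effect(x, y) - 0.5)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        stat = abs(relative_effect(pooled[:len(x)], pooled[len(x):]) - 0.5)
        if stat >= observed - 1e-12:  # small tolerance for float round-off
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical methylation levels at one locus for two groups.
group_a = [0.21, 0.25, 0.19, 0.30, 0.22, 0.27]
group_b = [0.35, 0.41, 0.29, 0.38, 0.44, 0.33]
print(relative_effect(group_a, group_b), permutation_p_value(group_a, group_b))
```

Applying this locus by locus would still require a multiple-testing correction of the kind the abstract describes before drawing conclusions about specific loci.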
Component fragilities. Data collection, analysis and interpretation
Bandyopadhyay, K.K.; Hofmayer, C.H.
1985-01-01
As part of the component fragility research program sponsored by the US NRC, BNL is involved in establishing seismic fragility levels for various nuclear power plant equipment with emphasis on electrical equipment. To date, BNL has reviewed approximately seventy test reports to collect fragility or high level test data for switchgears, motor control centers and similar electrical cabinets, valve actuators and numerous electrical and control devices, e.g., switches, transmitters, potentiometers, indicators, relays, etc., of various manufacturers and models. BNL has also obtained test data from EPRI/ANCO. Analysis of the collected data reveals that fragility levels can best be described by a group of curves corresponding to various failure modes. The lower bound curve indicates the initiation of malfunctioning or structural damage, whereas the upper bound curve corresponds to overall failure of the equipment based on known failure modes occurring separately or interactively. For some components, the upper and lower bound fragility levels are observed to vary appreciably depending upon the manufacturers and models. For some devices, testing even at the shake table vibration limit does not exhibit any failure. Failure of a relay is observed to be a frequent cause of failure of an electrical panel or a system. An extensive amount of additional fragility or high level test data exists.
Geophysical Interpretation of Venus Gravity Data
NASA Technical Reports Server (NTRS)
Reasenberg, R. D.
1985-01-01
The subsurface distribution of Venus was investigated through the analysis of the data from Pioneer Venus Orbiter (PVO). In particular, the Doppler tracking data were used to map the gravitational potential. These were compared to the topographic data from the PVO radar (ORAD). In order to obtain an unbiased comparison, the topography data obtained from the PVO-ORAD were filtered to introduce distortions which are the same as those of the gravity models. Both the gravity and filtered topography maps are derived by two stage processes with a common second stage. In the first stage, the topography was used to calculate a corresponding spacecraft acceleration under the assumptions that the topography has a uniform given density and no compensation. In the second stage, the acceleration measures found in the first stage were passed through a linear inverter to yield maps of gravity and topography. Because these maps are the result of the same inversion process, they contain the same distortion; a comparison between them is unbiased to first order.
Szabolcsi, Zoltán; Farkas, Zsuzsa; Borbély, Andrea; Bárány, Gusztáv; Varga, Dániel; Heinrich, Attila; Völgyi, Antónia; Pamjav, Horolma
2015-11-01
When the DNA profile from a crime-scene matches that of a suspect, the weight of DNA evidence depends on the unbiased estimation of the match probability of the profiles. For this reason, it is required to establish and expand the databases that reflect the actual allele frequencies in the population applied. 21,473 complete DNA profiles from Databank samples were used to establish the allele frequency database to represent the population of Hungarian suspects. We used fifteen STR loci (PowerPlex ESI16) including five, new ESS loci. The aim was to calculate the statistical, forensic efficiency parameters for the Databank samples and compare the newly detected data to the earlier report. The population substructure caused by relatedness may influence the frequency of profiles estimated. As our Databank profiles were considered non-random samples, possible relationships between the suspects can be assumed. Therefore, population inbreeding effect was estimated using the FIS calculation. The overall inbreeding parameter was found to be 0.0106. Furthermore, we tested the impact of the two allele frequency datasets on 101 randomly chosen STR profiles, including full and partial profiles. The 95% confidence interval estimates for the profile frequencies (pM) resulted in a tighter range when we used the new dataset compared to the previously published ones. We found that the FIS had less effect on frequency values in the 21,473 samples than the application of minimum allele frequency. No genetic substructure was detected by STRUCTURE analysis. Due to the low level of inbreeding effect and the high number of samples, the new dataset provides unbiased and precise estimates of LR for statistical interpretation of forensic casework and allows us to use lower allele frequencies.
Impact of equity models and statistical measures on interpretations of educational reform
NASA Astrophysics Data System (ADS)
Rodriguez, Idaykis; Brewe, Eric; Sawtelle, Vashti; Kramer, Laird H.
2012-12-01
We present three models of equity and show how these, along with the statistical measures used to evaluate results, impact interpretation of equity in education reform. Equity can be defined and interpreted in many ways. Most equity education reform research strives to achieve equity by closing achievement gaps between groups. An example is given by the study by Lorenzo et al. that shows that interactive engagement methods lead to increased gender equity. In this paper, we reexamine the results of Lorenzo et al. through three models of equity. We find that interpretation of the results strongly depends on the model of equity chosen. Further, we argue that researchers must explicitly state their model of equity as well as use effect size measurements to promote clarity in education reform.
Thoth: Software for data visualization & statistics
NASA Astrophysics Data System (ADS)
Laher, R. R.
2016-10-01
Thoth is a standalone software application with a graphical user interface for making it easy to query, display, visualize, and analyze tabular data stored in relational databases and data files. From imported data tables, it can create pie charts, bar charts, scatter plots, and many other kinds of data graphs with simple menus and mouse clicks (no programming required), by leveraging the open-source JFreeChart library. It also computes useful table-column data statistics. A mature tool, having undergone development and testing over several years, it is written in the Java computer language, and hence can be run on any computing platform that has a Java Virtual Machine and graphical-display capability. It can be downloaded and used by anyone free of charge, and has general applicability in science, engineering, medical, business, and other fields. Special tools and features for common tasks in astronomy and astrophysical research are included in the software.
Bayesian methods for interpreting plutonium urinalysis data
Miller, G.; Inkret, W.C.
1995-09-01
The authors discuss an internal dosimetry problem, where measurements of plutonium in urine are used to calculate radiation doses. The authors have developed an algorithm using the MAXENT method. The method gives reasonable results; however, the role of the entropy prior distribution is to effectively fit the urine data using intakes occurring close in time to each measured urine result, which is unrealistic. A better approximation for the actual prior is the log-normal distribution; however, with the log-normal distribution another calculational approach must be used. Instead of calculating the most probable values, they turn to calculating expectation values directly from the posterior probability, which is feasible for a small number of intakes.
HistFitter: a flexible framework for statistical data analysis
NASA Astrophysics Data System (ADS)
Besjes, G. J.; Baak, M.; Côté, D.; Koutsman, A.; Lorenz, J. M.; Short, D.
2015-12-01
HistFitter is a software framework for statistical data analysis that has been used extensively in the ATLAS Collaboration to analyze data of proton-proton collisions produced by the Large Hadron Collider at CERN. Most notably, HistFitter has become a de-facto standard in searches for supersymmetric particles since 2012, with some usage for Exotic and Higgs boson physics. HistFitter coherently combines several statistics tools in a programmable and flexible framework that is capable of bookkeeping hundreds of data models under study using thousands of generated input histograms. HistFitter interfaces with the statistics tools HistFactory and RooStats to construct parametric models and to perform statistical tests of the data, and extends these tools in four key areas. The key innovations are to weave the concepts of control, validation and signal regions into the very fabric of HistFitter, and to treat these with rigorous methods. Multiple tools to visualize and interpret the results through a simple configuration interface are also provided.
MSL DAN Passive Data and Interpretations
NASA Astrophysics Data System (ADS)
Tate, C. G.; Moersch, J.; Jun, I.; Ming, D. W.; Mitrofanov, I. G.; Litvak, M. L.; Behar, A.; Boynton, W. V.; Drake, D.; Lisov, D.; Mischna, M. A.; Hardgrove, C. J.; Milliken, R.; Sanin, A. B.; Starr, R. D.; Martín-Torres, J.; Zorzano, M. P.; Fedosov, F.; Golovin, D.; Harshman, K.; Kozyrev, A.; Malakhov, A. V.; Mokrousov, M.; Nikiforov, S.; Varenikov, A.
2014-12-01
In its passive mode of operation, the Mars Science Laboratory Dynamic Albedo of Neutrons experiment (DAN) detects low energy neutrons that are produced by two different sources on Mars. Neutrons are produced by the rover's Multi-Mission Radioisotope Thermoelectric Generator (MMRTG) and by interactions of high energy galactic cosmic rays (GCR) within the atmosphere and regolith. As these neutrons propagate through the subsurface, their energies can be moderated by interactions with hydrogen nuclei. More hydrogen leads to greater moderation (thermalization) of the neutron population energies. The presence of high thermal neutron absorbing elements within the regolith also complicates the spectrum of the returning neutron population, as shown by Hardgrove et al. DAN measures the thermal and epithermal neutron populations leaking from the surface to infer the amount of water equivalent hydrogen (WEH) in the shallow regolith. Extensive modeling is performed using a Monte Carlo approach (MCNPX) to analyze DAN passive measurements at fixed locations and along rover traverse segments. DAN passive WEH estimates along Curiosity's traverse will be presented along with an analysis of trends in the data and a description of correlations between these results and the geologic characteristics of the surfaces traversed.
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2015-02-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: (1) P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. (2) Overemphasis on P values rather than on the actual size of the observed effect. (3) Overuse of statistical hypothesis testing, and being seduced by the word "significant". (4) Overreliance on standard errors, which are often misunderstood. PMID:25692012
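The first mistake in this list, reanalyzing with additional replicates until the result is significant, can be made concrete with a small simulation. This is a sketch under stated assumptions and not anything from the article: both groups are drawn from the same normal distribution (so every "significant" result is a false positive), the variance is treated as known so a z-test can be used, and the function names, sample sizes, and step size are hypothetical.

```python
import random
from statistics import NormalDist, mean

def z_test_p(a, b):
    """Two-sided p-value for a difference in means, assuming known unit variance."""
    z = (mean(a) - mean(b)) / ((1 / len(a) + 1 / len(b)) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(peek, n_start=10, step=5, n_max=50,
                        alpha=0.05, n_sims=2000, seed=0):
    """Share of null experiments declared significant.

    peek=False: test once at n_start per group.
    peek=True : keep adding `step` replicates per group and retesting until
                p < alpha or n_max per group is reached.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n_start)]
        b = [rng.gauss(0, 1) for _ in range(n_start)]
        significant = z_test_p(a, b) < alpha
        while peek and not significant and len(a) < n_max:
            a += [rng.gauss(0, 1) for _ in range(step)]
            b += [rng.gauss(0, 1) for _ in range(step)]
            significant = z_test_p(a, b) < alpha
        hits += significant
    return hits / n_sims

if __name__ == "__main__":
    print("single test:", false_positive_rate(peek=False))
    print("with peeking:", false_positive_rate(peek=True))
```

With a single planned test the false-positive rate should stay near the nominal 5%, while retesting after each batch of added replicates pushes it well above that level.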
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2014-11-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1. P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. 2. Overemphasis on P values rather than on the actual size of the observed effect. 3. Overuse of statistical hypothesis testing, and being seduced by the word "significant". 4. Overreliance on standard errors, which are often misunderstood. PMID:25213136
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2014-10-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, however, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1) P-hacking, which is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want; 2) overemphasis on P values rather than on the actual size of the observed effect; 3) overuse of statistical hypothesis testing, and being seduced by the word "significant"; and 4) over-reliance on standard errors, which are often misunderstood. PMID:25204545
The seismic analyzer: interpreting and illustrating 2D seismic data.
Patel, Daniel; Giertsen, Christopher; Thurmond, John; Gjelberg, John; Gröller, M Eduard
2008-01-01
We present a toolbox for quickly interpreting and illustrating 2D slices of seismic volumetric reflection data. Searching for oil and gas involves creating a structural overview of seismic reflection data to identify hydrocarbon reservoirs. We improve the search of seismic structures by precalculating the horizon structures of the seismic data prior to interpretation. We improve the annotation of seismic structures by applying novel illustrative rendering algorithms tailored to seismic data, such as deformed texturing and line and texture transfer functions. The illustrative rendering results in multi-attribute and scale invariant visualizations where features are represented clearly in both highly zoomed in and zoomed out views. Thumbnail views in combination with interactive appearance control allow for a quick overview of the data before detailed interpretation takes place. These techniques help reduce the work of seismic illustrators and interpreters.
Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization
NASA Astrophysics Data System (ADS)
Eroglu, Sertac
2014-10-01
The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed the statistical mechanical Menzerath-Altmann model. The derived model allows interpreting the model parameters in terms of physical concepts. We also propose that many organizations presenting the Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy associated states.
78 FR 10166 - Access Interpreting; Transfer of Data
Federal Register 2010, 2011, 2012, 2013, 2014
2013-02-13
... From the Federal Register Online via the Government Publishing Office ENVIRONMENTAL PROTECTION AGENCY Access Interpreting; Transfer of Data AGENCY: Environmental Protection Agency (EPA). ACTION: Notice. SUMMARY: This notice announces that pesticide related information submitted to EPA's Office...
Energy statistics data finder. [Monograph; energy-related census data
Not Available
1980-08-01
Energy-related data collected by the Bureau of the Census covers economic and demographic areas and provides data on a regular basis to produce current estimates from survey programs. Series report numbers, a summary of subject content, geographic detail, and report frequency are identified under the following major publication title categories: Agriculture, Retail Trade, Wholesale Trade, Service Industries, Construction, Transportation, Enterprise Statistics, County Business Patterns, Foreign Trade, Governments, Manufacturers, Mineral Industries, 1980 Census of Population and Housing, Annual Housing Survey and Travel-to-Work Supplement, and Statistical Compendia. The data are also available on computer tapes, microfiche, and in special tabulations. (DCK)
Statistical challenges of high-dimensional data
Johnstone, Iain M.; Titterington, D. Michael
2009-01-01
Modern applications of statistical theory and methods can involve extremely large datasets, often with huge numbers of measurements on each of a comparatively small number of experimental units. New methodology and accompanying theory have emerged in response: the goal of this Theme Issue is to illustrate a number of these recent developments. This overview article introduces the difficulties that arise with high-dimensional data in the context of the very familiar linear statistical model: we give a taste of what can nevertheless be achieved when the parameter vector of interest is sparse, that is, contains many zero elements. We describe other ways of identifying low-dimensional subspaces of the data space that contain all useful information. The topic of classification is then reviewed along with the problem of identifying, from within a very large set, the variables that help to classify observations. Brief mention is made of the visualization of high-dimensional data and ways to handle computational problems in Bayesian analysis are described. At appropriate points, reference is made to the other papers in the issue. PMID:19805443
The Statistical Literacy Needed to Interpret School Assessment Data
ERIC Educational Resources Information Center
Chick, Helen; Pierce, Robyn
2013-01-01
State-wide and national testing in areas such as literacy and numeracy produces reports containing graphs and tables illustrating school and individual performance. These are intended to inform teachers, principals, and education organisations about student and school outcomes, to guide change and improvement. Given the complexity of the…
Models to interpret bedform geometries from cross-bed data
Luthi, S.M.; Banavar, J.R.; Bayer, U.
1990-03-01
Semi-elliptical and sinusoidal bedform crestlines were modeled with curvature and sinuosity as parameters. Both bedform crestlines are propagated at various angles of migration over a finite area of deposition. Two computational approaches are used, a statistical random sampling (Monte Carlo) technique over the area of the deposit, and an analytical method based on topology and differential geometry. The resulting foreset azimuth distributions provide a catalogue for a variety of situations. The resulting thickness distributions have a simple shape and can be combined with the azimuth distributions to constrain further the cross-strata geometry. Paleocurrent directions obtained by these models can differ substantially from other methods, especially for obliquely migrating low-curvature bedforms. Interpretation of foreset azimuth data from outcrops and wells can be done either by visual comparison with the catalogued distributions, or by iterative computational fits. Studied examples include eolian cross-strata from the Permian Rotliegendes in the North Sea, fluvial dunes from the Devonian in the Catskills (New York State), the Triassic Schilfsandstein (West Germany) and the Paleozoic-Jurassic of the Western Desert (Egypt), as well as recent tidal dunes from the German coast of the North Sea and tidal cross-strata from the Devonian Koblentquartzit (West Germany). In all cases the semi-elliptical bedform model gave a good fit to the data, suggesting that it may be applicable over a wide range of bedforms. The data from the Western Desert could only be explained by data scatter due to channel sinuosity combining with the scatter attributed to the ellipticity of the bedform crestlines. These models, therefore, may also allow simulations of some hierarchically structured bedforms.
Statistical methods and computing for big data
Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing
2016-01-01
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay. PMID:27695593
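The online updating idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method or software: it assumes a simple one-predictor least-squares fit, and the class name `OnlineSimpleRegression`, the helper `batch_fit`, and the data chunks are all hypothetical. Each chunk of the stream is folded into running sums and then discarded, and the result is compared with a fit on all the data at once.

```python
from dataclasses import dataclass


@dataclass
class OnlineSimpleRegression:
    """Running sufficient statistics for y = a + b*x fitted by least squares."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, xs, ys):
        # Fold one chunk of the stream into the running sums; the chunk
        # itself does not need to be kept in memory afterwards.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        # Closed-form least-squares intercept and slope from the sums.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def batch_fit(xs, ys):
    # Reference fit that needs all the data at once (centered formulas).
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


# Hypothetical stream arriving in three chunks.
chunks = [
    ([0.0, 1.0, 2.0], [1.1, 2.9, 5.2]),
    ([3.0, 4.0], [7.1, 8.8]),
    ([5.0, 6.0, 7.0], [11.0, 13.1, 14.9]),
]

model = OnlineSimpleRegression()
for xs, ys in chunks:
    model.update(xs, ys)
streamed = model.coefficients()

all_x = [x for xs, _ in chunks for x in xs]
all_y = [y for _, ys in chunks for y in ys]
full = batch_fit(all_x, all_y)
```

The streamed and full-data coefficients agree up to rounding, which is the property that makes this kind of updating attractive when the data do not fit in memory.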
Statistical fusion of GPR and EMI data
NASA Astrophysics Data System (ADS)
Weisenseel, Robert A.; Karl, William C.; Castanon, David A.; Miller, Eric L.; Rappaport, Carey M.; DiMarzio, Charles A.
1999-08-01
In this paper, we develop a statistical detection system exploiting sensor fusion for the detection of plastic A/P mines. We design and test the system using data from Monte Carlo electromagnetic induction (EMI) and ground penetrating radar (GPR) simulations. We include the effects of both random soil surface variability and sensor noise. In spite of the presence of a rough surface, we can obtain good results fusing EMI and GPR data using a statistical approach in a simple clutter environment. More generally, we develop a framework for simulation and testing of sensor configurations and sensor fusion approaches for landmine and unexploded ordnance detection systems. Exploiting accurate electromagnetic simulation, we develop a controlled environment for testing sensor fusion concepts, from varied sensor arrangements to detection algorithms. In this environment, we can examine the effect of changing mine structure, soil parameters, and sensor geometry on the sensor fusion problem. We can then generalize these results to produce mine detectors robust to real-world variations.
Regional interpretation of water-quality monitoring data
Smith, R.A.; Schwarz, G.E.; Alexander, R.B.
1997-01-01
We describe a method for using spatially referenced regressions of contaminant transport on watershed attributes (SPARROW) in regional water-quality assessment. The method is designed to reduce the problems of data interpretation caused by sparse sampling, network bias, and basin heterogeneity. The regression equation relates measured transport rates in streams to spatially referenced descriptors of pollution sources and land-surface and stream-channel characteristics. Regression models of total phosphorus (TP) and total nitrogen (TN) transport are constructed for a region defined as the nontidal conterminous United States. Observed TN and TP transport rates are derived from water-quality records for 414 stations in the National Stream Quality Accounting Network. Nutrient sources identified in the equations include point sources, applied fertilizer, livestock waste, nonagricultural land, and atmospheric deposition (TN only). Surface characteristics found to be significant predictors of land-water delivery include soil permeability, stream density, and temperature (TN only). Estimated instream decay coefficients for the two contaminants decrease monotonically with increasing stream size. TP transport is found to be significantly reduced by reservoir retention. Spatial referencing of basin attributes in relation to the stream channel network greatly increases their statistical significance and model accuracy. The method is used to estimate the proportion of watersheds in the conterminous United States (i.e., hydrologic cataloging units) with outflow TP concentrations less than the criterion of 0.1 mg/L, and to classify cataloging units according to local TN yield (kg/km2/yr).
Weatherization Assistance Program - Background Data and Statistics
Eisenberg, Joel Fred
2010-03-01
This technical memorandum is intended to provide readers with information that may be useful in understanding the purposes, performance, and outcomes of the Department of Energy's (DOE's) Weatherization Assistance Program (Weatherization). Weatherization has been in operation for over thirty years and is the nation's largest single residential energy efficiency program. Its primary purpose, established by law, is 'to increase the energy efficiency of dwellings owned or occupied by low-income persons, reduce their total residential energy expenditures, and improve their health and safety, especially low-income persons who are particularly vulnerable such as the elderly, the handicapped, and children.' The American Recovery and Reinvestment Act PL111-5 (ARRA), passed and signed into law in February 2009, committed $5 Billion over two years to an expanded Weatherization Assistance Program. This has created substantial interest in the program, the population it serves, the energy and cost savings it produces, and its cost-effectiveness. This memorandum is intended to address the need for this kind of information. Statistically valid answers to many of the questions surrounding Weatherization and its performance require comprehensive evaluation of the program. DOE is undertaking precisely this kind of independent evaluation in order to ascertain program effectiveness and to improve its performance. Results of this evaluation effort will begin to emerge in late 2010 and 2011, but they require substantial time and effort. In the meantime, the data and statistics in this memorandum can provide reasonable and transparent estimates of key program characteristics. The memorandum is laid out in three sections. The first deals with some key characteristics describing low-income energy consumption and expenditures. The second section provides estimates of energy savings and energy bill reductions that the program can reasonably be presumed to be producing. The third section
Quantitative interpretation of airborne gravity gradiometry data for mineral exploration
NASA Astrophysics Data System (ADS)
Martinez, Cericia D.
In the past two decades, commercialization of previously classified instrumentation has provided the ability to rapidly collect quality gravity gradient measurements for resource exploration. In the near future, next-generation instrumentation is expected to further advance acquisition of higher-quality data not subject to pre-processing regulations. Conversely, the ability to process and interpret gravity gradiometry data has not kept pace with innovations occurring in data acquisition systems. The purpose of the research presented in this thesis is to contribute to the understanding, development, and application of processing and interpretation techniques available for airborne gravity gradiometry in resource exploration. In particular, this research focuses on the utility of 3D inversion of gravity gradiometry for interpretation purposes. Towards this goal, I investigate the requisite components for an integrated interpretation workflow. In addition to practical 3D inversions, components of the workflow include estimation of density for terrain correction, processing of multi-component data using equivalent source for denoising, quantification of noise level, and component conversion. The objective is to produce high quality density distributions for subsequent geological interpretation. I then investigate the use of the inverted density model in orebody imaging, lithology differentiation, and resource evaluation. The systematic and sequential approach highlighted in the thesis addresses some of the challenges facing the use of gravity gradiometry as an exploration tool, while elucidating a procedure for incorporating gravity gradient interpretations into the lifecycle of not only resource exploration, but also resource modeling.
Statistical atlas based extrapolation of CT data
NASA Astrophysics Data System (ADS)
Chintalapani, Gouthami; Murphy, Ryan; Armiger, Robert S.; Lepisto, Jyri; Otake, Yoshito; Sugano, Nobuhiko; Taylor, Russell H.; Armand, Mehran
2010-02-01
We present a framework to estimate the missing anatomical details from a partial CT scan with the help of statistical shape models. The motivating application is periacetabular osteotomy (PAO), a technique for treating developmental hip dysplasia, an abnormal condition of the hip socket that, if untreated, may lead to osteoarthritis. The common goals of PAO are to reduce pain, joint subluxation and improve contact pressure distribution by increasing the coverage of the femoral head by the hip socket. While current diagnosis and planning is based on radiological measurements, because of significant structural variations in dysplastic hips, a computer-assisted geometrical and biomechanical planning based on CT data is desirable to help the surgeon achieve optimal joint realignments. Most of the patients undergoing PAO are young females, hence it is usually desirable to minimize the radiation dose by scanning only the joint portion of the hip anatomy. These partial scans, however, do not provide enough information for biomechanical analysis due to missing iliac region. A statistical shape model of full pelvis anatomy is constructed from a database of CT scans. The partial volume is first aligned with the statistical atlas using an iterative affine registration, followed by a deformable registration step and the missing information is inferred from the atlas. The atlas inferences are further enhanced by the use of X-ray images of the patient, which are very common in an osteotomy procedure. The proposed method is validated with a leave-one-out analysis method. Osteotomy cuts are simulated and the effect of atlas predicted models on the actual procedure is evaluated.
NASA Technical Reports Server (NTRS)
Shewhart, Mark
1991-01-01
Statistical Process Control (SPC) charts are one of several tools used in quality control. Other tools include flow charts, histograms, cause and effect diagrams, check sheets, Pareto diagrams, graphs, and scatter diagrams. A control chart is simply a graph which indicates process variation over time. The purpose of drawing a control chart is to detect any changes in the process signalled by abnormal points or patterns on the graph. The Artificial Intelligence Support Center (AISC) of the Acquisition Logistics Division has developed a hybrid machine learning expert system prototype which automates the process of constructing and interpreting control charts.
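As an illustration of how a control chart flags abnormal points, here is a minimal sketch of an individuals (X) chart. It is not the AISC prototype described above: the function names and measurements are hypothetical, and sigma is assumed to be estimated from the average moving range divided by the standard constant d2 = 1.128 for ranges of two consecutive points.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Center line and 3-sigma limits for an individuals (X) chart.

    Sigma is estimated from the average moving range divided by d2 = 1.128,
    the usual constant for moving ranges of size 2.
    """
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma = mean(moving_ranges) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(values, lower, upper):
    # Indices of points falling outside the control limits.
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical in-control baseline used to set the limits.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
lower, center, upper = individuals_chart_limits(baseline)

# New measurements checked against those limits; the fourth one is a shift.
new_points = [10.1, 9.9, 10.0, 11.5, 10.0]
flagged = out_of_control(new_points, lower, upper)
```

Only the single-point rule is shown here; pattern-based rules (runs, trends) of the kind an expert system might apply are left out of the sketch.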
Monroe, Scott; Cai, Li
2015-01-01
This research is concerned with two topics in assessing model fit for categorical data analysis. The first topic involves the application of a limited-information overall test, introduced in the item response theory literature, to structural equation modeling (SEM) of categorical outcome variables. Most popular SEM test statistics assess how well the model reproduces estimated polychoric correlations. In contrast, limited-information test statistics assess how well the underlying categorical data are reproduced. Here, the recently introduced C2 statistic of Cai and Monroe (2014) is applied. The second topic concerns how the root mean square error of approximation (RMSEA) fit index can be affected by the number of categories in the outcome variable. This relationship creates challenges for interpreting RMSEA. While the two topics initially appear unrelated, they may conveniently be studied in tandem since RMSEA is based on an overall test statistic, such as C2. The results are illustrated with an empirical application to data from a large-scale educational survey.
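Because RMSEA is computed from an overall test statistic, the dependence can be made concrete with a short sketch. This assumes the commonly used point estimate sqrt(max(T - df, 0) / (df (N - 1))); some software divides by N instead of N - 1, and the abstract does not say which form the authors use. The function name and the numbers are hypothetical.

```python
import math


def rmsea(test_statistic, df, n):
    """RMSEA point estimate from an overall test statistic.

    Uses the common form sqrt(max(T - df, 0) / (df * (n - 1))).
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(test_statistic - df, 0.0) / (df * (n - 1)))


# Hypothetical values: statistic 85.0 on 40 degrees of freedom, 501 respondents.
example = rmsea(85.0, 40, 501)

# A statistic below its degrees of freedom is truncated to an RMSEA of zero.
floor_case = rmsea(30.0, 40, 501)
```

Any statistic with an approximate chi-square reference distribution, such as C2, could be passed as `test_statistic` under this assumption.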
A statistical methodology for deriving reservoir properties from seismic data
Fournier, F.; Derain, J.F.
1995-09-01
The use of seismic data to better constrain the reservoir model between wells has become an important goal for seismic interpretation. The authors propose a methodology for deriving soft geologic information from seismic data and discuss its application through a case study in offshore Congo. The methodology combines seismic facies analysis and statistical calibration techniques applied to seismic attributes characterizing the traces at the reservoir level. They built statistical relationships between seismic attributes and reservoir properties from a calibration population consisting of wells and their adjacent traces. The correlation studies are based on the canonical correlation analysis technique, while the statistical model comes from a multivariate regression between the canonical seismic variables and the reservoir properties, whenever they are predictable. In the case study, they predicted estimates and associated uncertainties on the lithofacies thicknesses cumulated over the reservoir interval from the seismic information. They carried out a seismic facies identification and compared the geological prediction results in the cases of a calibration on the whole data set and a calibration done independently on the traces (and wells) related to each seismic facies. The latter approach produces a significant improvement in the geological estimation from the seismic information, mainly because the large scale geological variations (and associated seismic ones) over the field can be accounted for.
Statistical Analysis of Cardiovascular Data from FAP
NASA Technical Reports Server (NTRS)
Sealey, Meghan
2016-01-01
pressure, etc.) to see which could best predict how long the subjects could tolerate the tilt tests. With this I plan to analyze an artificial gravity study in order to determine the effects of orthostatic intolerance during spaceflight. From these projects, I became efficient in using the statistical software Stata, which I had never used before. I learned new statistical methods, such as mixed-effects linear regression, maximum likelihood estimation on longitudinal data, and post model-fitting tests to see if certain parameters contribute significantly to the model, all of which will improve my understanding when I continue studying for my master's degree. I was also able to demonstrate my knowledge of statistics by helping other students run statistical analyses for their own projects. The experience and knowledge gained from these projects exemplify the type of work that I would like to pursue in the future. After completing my master's degree, I plan to pursue a career in biostatistics, which is exactly the position that I interned in, and I plan to use this experience to contribute to that goal.
Interpreting Survey Data to Inform Solid-Waste Education Programs
ERIC Educational Resources Information Center
McKeown, Rosalyn
2006-01-01
Few examples exist on how to use survey data to inform public environmental education programs. I suggest a process for interpreting statewide survey data with the four questions that give insights into local context and make it possible to gain insight into potential target audiences and community priorities. The four questions are: What…
Design, analysis, and interpretation of field quality-control data for water-sampling projects
Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.
2015-01-01
The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.
Soil VisNIR chemometric performance statistics should be interpreted as random variables
NASA Astrophysics Data System (ADS)
Brown, David J.; Gasch, Caley K.; Poggio, Matteo; Morgan, Cristine L. S.
2015-04-01
Chemometric models are normally evaluated using performance statistics such as the Standard Error of Prediction (SEP) or the Root Mean Squared Error of Prediction (RMSEP). These statistics are used to evaluate the quality of chemometric models relative to other published work on a specific soil property or to compare the results from different processing and modeling techniques (e.g. Partial Least Squares Regression or PLSR and random forest algorithms). Claims are commonly made about the overall success of an application or the relative performance of different modeling approaches assuming that these performance statistics are fixed population parameters. While most researchers would acknowledge that small differences in performance statistics are not important, rarely are performance statistics treated as random variables. Given that we are usually comparing modeling approaches for general application, and given that the intent of VisNIR soil spectroscopy is to apply chemometric calibrations to larger populations than are included in our soil-spectral datasets, it is more appropriate to think of performance statistics as random variables with variation introduced through the selection of samples for inclusion in a given study and through the division of samples into calibration and validation sets (including spiking approaches). Here we look at the variation in VisNIR performance statistics for the following soil-spectra datasets: (1) a diverse US Soil Survey soil-spectral library with 3768 samples from all 50 states and 36 different countries; (2) 389 surface and subsoil samples taken from US Geological Survey continental transects; (3) the Texas Soil Spectral Library (TSSL) with 3000 samples; (4) intact soil core scans of Texas soils with 700 samples; (5) approximately 400 in situ scans from the Pacific Northwest region; and (6) miscellaneous local datasets. We find the variation in performance statistics to be surprisingly large. This has important
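The point that a performance statistic varies with the calibration/validation split can be sketched in a few lines. This is not the authors' analysis: it assumes a synthetic dataset with a single predictor standing in for spectral information, a plain least-squares line in place of PLSR or random forests, and hypothetical function names. The same data are split at random many times and the RMSEP is recorded for each split.

```python
import math
import random


def fit_line(xs, ys):
    # Ordinary least squares for y = a + b*x.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmsep(intercept, slope, xs, ys):
    # Root mean squared error of prediction on held-out samples.
    squared = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / len(xs))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical dataset of 60 samples.
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1.0) for x in xs]

scores = []
for _ in range(200):
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    cal, val = idx[:40], idx[40:]  # calibration / validation split
    a, b = fit_line([xs[i] for i in cal], [ys[i] for i in cal])
    scores.append(rmsep(a, b, [xs[i] for i in val], [ys[i] for i in val]))

# Range of RMSEP values obtained from the same data and the same model.
spread = (min(scores), max(scores))
```

Reporting the distribution in `scores` (or an interval derived from it) rather than a single RMSEP is one way to treat the statistic as a random variable.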
Statistical mapping of count survey data
Royle, J. Andrew; Link, W.A.; Sauer, J.R.; Scott, J. Michael; Heglund, Patricia J.; Morrison, Michael L.; Haufler, Jonathan B.; Wall, William A.
2002-01-01
We apply a Poisson mixed model to the problem of mapping (or predicting) bird relative abundance from counts collected from the North American Breeding Bird Survey (BBS). The model expresses the logarithm of the Poisson mean as a sum of a fixed term (which may depend on habitat variables) and a random effect which accounts for remaining unexplained variation. The random effect is assumed to be spatially correlated, thus providing a more general model than the traditional Poisson regression approach. Consequently, the model is capable of improved prediction when data are autocorrelated. Moreover, formulation of the mapping problem in terms of a statistical model facilitates a wide variety of inference problems which are cumbersome or even impossible using standard methods of mapping. For example, assessment of prediction uncertainty, including the formal comparison of predictions at different locations, or through time, using the model-based prediction variance is straightforward under the Poisson model (not so with many nominally model-free methods). Also, ecologists may generally be interested in quantifying the response of a species to particular habitat covariates or other landscape attributes. Proper accounting for the uncertainty in these estimated effects is crucially dependent on specification of a meaningful statistical model. Finally, the model may be used to aid in sampling design, by modifying the existing sampling plan in a manner which minimizes some variance-based criterion. Model fitting under this model is carried out using a simulation technique known as Markov Chain Monte Carlo. Application of the model is illustrated using Mourning Dove (Zenaida macroura) counts from Pennsylvania BBS routes. We produce both a model-based map depicting relative abundance, and the corresponding map of prediction uncertainty. We briefly address the issue of spatial sampling design under this model. Finally, we close with some discussion of mapping in relation to
Accessing seismic data through geological interpretation: Challenges and solutions
NASA Astrophysics Data System (ADS)
Butler, R. W.; Clayton, S.; McCaffrey, B.
2008-12-01
Between them, the world's research programs, national institutions and corporations, especially oil and gas companies, have acquired substantial volumes of seismic reflection data. Although the vast majority are proprietary and confidential, significant data are released and available for research, including those in public data libraries. The challenge now is to maximise use of these data, by providing routes to seismic not simply on the basis of acquisition or processing attributes but via the geology they image. The Virtual Seismic Atlas (VSA: www.seismicatlas.org) meets this challenge by providing an independent, free-to-use community based internet resource that captures and shares the geological interpretation of seismic data globally. Images and associated documents are explicitly indexed by extensive metadata trees, using not only existing survey and geographical data but also the geology they portray. The solution uses a Documentum database interrogated through Endeca Guided Navigation, to search, discover and retrieve images. The VSA allows users to compare contrasting interpretations of clean data thereby exploring the ranges of uncertainty in the geometric interpretation of subsurface structure. The metadata structures can be used to link reports and published research together with other data types such as wells. And the VSA can link to existing data libraries. Searches can take different paths, revealing arrays of geological analogues, new datasets while providing entirely novel insights and genuine surprises. This can then drive new creative opportunities for research and training, and expose the contents of seismic data libraries to the world.
NASA Astrophysics Data System (ADS)
Jha, Sanjeev Kumar; Comunian, Alessandro; Mariethoz, Gregoire; Kelly, Bryce F. J.
2014-10-01
We develop a stochastic approach to construct channelized 3-D geological models constrained to borehole measurements as well as geological interpretation. The methodology is based on simple 2-D geologist-provided sketches of fluvial depositional elements, which are extruded in the 3rd dimension. Multiple-point geostatistics (MPS) is used to impart horizontal variability to the structures by introducing geometrical transformation parameters. The sketches provided by the geologist are used as elementary training images, whose statistical information is expanded through randomized transformations. We demonstrate the applicability of the approach by applying it to modeling a fluvial valley filling sequence in the Maules Creek catchment, Australia. The facies models are constrained to borehole logs, spatial information borrowed from an analogue and local orientations derived from the present-day stream networks. The connectivity in the 3-D facies models is evaluated using statistical measures and transport simulations. Comparison with a statistically equivalent variogram-based model shows that our approach is more suited for building 3-D facies models that contain structures specific to the channelized environment and which have a significant influence on the transport processes.
Integrated Exploration Platform: An Interactive Multi-Data Interpretation Tool
NASA Astrophysics Data System (ADS)
Wong, Jason C.; Kovesi, Peter; Holden, Eun-Jung; Gessner, Klaus; Murdie, Ruth
2014-05-01
With the recent increase of geoscientific data being made publicly available, it becomes increasingly difficult to efficiently analyse all the data and incorporate it into a single coherent interpretation. The Integrated Exploration Platform is a GIS module that aims to facilitate multi-data interpretation through innovative visualisation and assistive tools to provide improved geological clarity in mineral exploration datasets. We introduce an interactive blending paradigm, where different data can be simultaneously perused, to better facilitate the interpretation of complex information from multiple data sources. Blending combines datasets to form a single image in a way that effectively represents each dataset. In addition, each of these blending tools is used in an interactive manner through control of a blending cursor within each tool. Moving this cursor will change the component weighting of each dataset in the blend. The interactivity of blending the data is important in conveying information. For image blending to be useful in multi-data interpretation, it is important for associations between features and individual input images to remain identifiable and distinct within the blend. The exploratory movements made by the user in interacting with a blending tool are crucial in achieving this. Ultimately, interactivity can provide more information from within a single dataset and better reveal correlations between datasets than typical static blended images. We have developed a family of different multi-image blending tools that are integrated into the Integrated Exploration Platform. These have been designed to support different types of data critical to mineral exploration, such as geophysical data, radiometric data, and ASTER data, with the intention that other data types will be accounted for in the near future.
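The idea that moving a cursor changes the component weighting of each dataset can be sketched as a weighted per-pixel blend. This is only an assumed, simplified form of blending (a normalised linear combination of grayscale grids), not the platform's actual tools; the function name `blend` and the two small layers are hypothetical.

```python
def blend(images, weights):
    """Weighted per-pixel blend of equally sized grayscale images.

    `images` is a list of 2-D lists; `weights` are non-negative and are
    normalised to sum to 1 so the blend stays in the input value range.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    norm = [w / total for w in weights]
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(w * img[r][c] for w, img in zip(norm, images)) for c in range(cols)]
        for r in range(rows)
    ]


# Two hypothetical 2x2 layers, e.g. a magnetic grid and a radiometric grid.
layer_a = [[0.0, 1.0], [1.0, 0.0]]
layer_b = [[1.0, 1.0], [0.0, 0.0]]

# Different cursor positions correspond to different weights.
mostly_a = blend([layer_a, layer_b], [3.0, 1.0])
even = blend([layer_a, layer_b], [1.0, 1.0])
```

In an interactive tool the weights would be recomputed from the cursor position on every movement, so the user sees each layer fade in and out of the blend.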
Geologic interpretation of HCMM and aircraft thermal data
NASA Technical Reports Server (NTRS)
1982-01-01
Progress on the Heat Capacity Mapping Mission (HCMM) follow-on study is reported. Numerous image products for geologic interpretation of both HCMM and aircraft thermal data were produced. These include, among others, various combinations of the thermal data with LANDSAT and SEASAT data. The combined data sets were displayed using simple color composites, principal component color composites and black and white images, and hue, saturation, intensity color composites. Algorithms for incorporating both atmospheric and elevation data simultaneously into the digital processing for creation of quantitatively correct thermal inertia images are in the final development stage. A field trip to Death Valley was undertaken to field check the aircraft and HCMM data.
Understanding MWD data acquisition can improve log interpretation
Fagin, R.A.
1994-02-14
By understanding how measurement-while-drilling (MWD) tools acquire data and how the data are processed, engineers and geologists can better interpret MWD logs. Wire line and MWD log data sometimes do not precisely match. If a discrepancy occurs between MWD and wire line logs run across the same interval, many log interpreters will condemn the MWD data. Recognizing the differences and the correct data requires a better understanding of the MWD tool operational principles. Because MWD logs are becoming more widely accepted as quantitative replacements for equivalent wire line logs, the differences between logs should be analyzed logically. This paper discusses these differences by describing the following: time-based acquisition, filtering, depth control, environmental variables, and quality control.
Shock Classification of Ordinary Chondrites: New Data and Interpretations
NASA Astrophysics Data System (ADS)
Stoffler, D.; Keil, K.; Scott, E. R. D.
1992-07-01
Introduction. The recently proposed classification system for shocked chondrites (1) is based on a microscopic survey of 76 non-Antarctic H, L, and LL chondrites. Obviously, a larger database is highly desirable in order to confirm earlier conclusions and to allow for a statistically relevant interpretation of the data. Here, we report the shock classification of an additional 54 ordinary chondrites and summarize implications based on a total of 130 samples. New observations on shock effects. Continued studies of those shock effects in olivine and plagioclase that are indicative of the shock stages S1 - S6 as defined in (1) revealed the following: Planar deformation features in olivine, considered typical of stage S5, occur occasionally in stage S3 and are common in stage S4. In some S4 chondrites plagioclase is not partially isotropic but still birefringent coexisting with a small fraction of S3 olivines. Opaque shock veins occur not only in shock stage S3 and above (1) but have now been found in a few chondrites of shock stage S2. Thermal annealing of shock effects. Planar fractures and planar deformation features in olivine persist up to the temperatures required for recrystallization of olivine (> ca. 900 degrees C). Shock history of breccias. In a number of petrologic types 3 and 4 chondrites without recognizable (polymict) breccia texture, we found chondrules and olivine fragments with different shock histories ranging from S1 to S3. Regolith and fragmental breccias are polymict with regard to lithology and shock. The intensity of the latest shock typically varies from S1 to S4 in the breccias studied so far. Frequency distribution of shock stages. A significant difference between H and L chondrites is emerging in contrast to our previous statistics (1), whereas the conspicuous lack of shock stages S5 and S6 in type 3 and 4 chondrites is clearly confirmed (Fig. 1). Correlation between shock and noble gas content. The concentration of radiogenic argon and of
Parameter Interpretation and Reduction for a Unified Statistical Mechanical Surface Tension Model.
Boyer, Hallie; Wexler, Anthony; Dutcher, Cari S
2015-09-01
Surface properties of aqueous solutions are important for environments as diverse as atmospheric aerosols and biocellular membranes. Previously, we developed a surface tension model for both electrolyte and nonelectrolyte aqueous solutions across the entire solute concentration range (Wexler and Dutcher, J. Phys. Chem. Lett. 2013, 4, 1723-1726). The model differentiated between adsorption of solute molecules in the bulk and surface of solution using the statistical mechanics of multilayer sorption solution model of Dutcher et al. (J. Phys. Chem. A 2013, 117, 3198-3213). The parameters in the model had physicochemical interpretations, but remained largely empirical. In the current work, these parameters are related to solute molecular properties in aqueous solutions. For nonelectrolytes, sorption tendencies suggest a strong relation with molecular size and functional group spacing. For electrolytes, surface adsorption of ions follows ion surface-bulk partitioning calculations by Pegram and Record (J. Phys. Chem. B 2007, 111, 5411-5417). PMID:26275040
2014-01-01
Background A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model’s behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model’s behaviour for the specific query. Results Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. Interpretation was revealed that links closely with understood mechanisms for Ames mutagenicity. Conclusion This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development. PMID:24661325
New Statistical Approach to the Analysis of Hierarchical Data
NASA Astrophysics Data System (ADS)
Neuman, S. P.; Guadagnini, A.; Riva, M.
2014-12-01
Many variables possess a hierarchical structure reflected in how their increments vary in space and/or time. Quite commonly the increments (a) fluctuate in a highly irregular manner; (b) possess symmetric, non-Gaussian frequency distributions characterized by heavy tails that often decay with separation distance or lag; (c) exhibit nonlinear power-law scaling of sample structure functions in a midrange of lags, with breakdown in such scaling at small and large lags; (d) show extended power-law scaling (ESS) at all lags; and (e) display nonlinear scaling of power-law exponent with order of sample structure function. Some interpret this to imply that the variables are multifractal, which explains neither breakdowns in power-law scaling nor ESS. We offer an alternative interpretation consistent with all above phenomena. It views data as samples from stationary, anisotropic sub-Gaussian random fields subordinated to truncated fractional Brownian motion (tfBm) or truncated fractional Gaussian noise (tfGn). The fields are scaled Gaussian mixtures with random variances. Truncation of fBm and fGn entails filtering out components below data measurement or resolution scale and above domain scale. Our novel interpretation of the data allows us to obtain maximum likelihood estimates of all parameters characterizing the underlying truncated sub-Gaussian fields. These parameters in turn make it possible to downscale or upscale all statistical moments to situations entailing smaller or larger measurement or resolution and sampling scales, respectively. They also allow one to perform conditional or unconditional Monte Carlo simulations of random field realizations corresponding to these scales. Aspects of our approach are illustrated on field and laboratory measured porous and fractured rock permeabilities, as well as soil texture characteristics and neural network estimates of unsaturated hydraulic parameters in a deep vadose zone near Phoenix, Arizona. We also use our approach
Lessons learned from interlaboratory comparisons of bioassay data interpretation.
Doerfel, H; Andrasi, A; Bailey, M; Berkovski, V; Castellani, C M; Hurtgen, C; Jourdain, J R; LeGuen, B
2003-01-01
When a set of bioassay data is given to two different dosimetrists, it is likely that these data will be interpreted differently, that different methods and dosimetric models will be applied and therefore different numerical values will be obtained. Thus, it is important for laboratories dealing with internal dosimetry to undergo performance testing procedures such as interlaboratory comparisons of bioassay data interpretation. Several intercomparison exercises have already been organised at national and international levels. The largest one so far was the 3rd European Intercomparison Exercise on Internal Dose Assessment, which has been organised in the framework of the EULEP/EURADOS Action Group, 'Derivation of parameter values for application to the new model of the human respiratory tract for occupational exposure'. The most important lesson learned from these intercomparison exercises was the need to develop agreed guidelines for internal dose evaluation procedures to promote harmonisation of assessments between organisations and countries.
A technique for interpretation of multispectral remote sensor data
NASA Technical Reports Server (NTRS)
Williamson, A. N.
1973-01-01
The author has identified the following significant results. The U.S. Army Engineer Waterways Experiment Station is engaged in a study to detect from ERTS-1 satellite data alterations to the absorption and scattering properties caused by movement of suspended particles and solutes in selected areas of the Chesapeake Bay and to correlate the data to determine the feasibility of delineating flow patterns, flushing action of the estuary, and sediment and pollutant dispersion. As a part of this study, ADP techniques have been developed that permit automatic interpretation of data from any multispectral remote sensor with computer systems which have limited memory capacity and computing speed. The multispectral remote sensor is considered as a reflectance spectrophotometer. The data which define the spectral reflectance characteristics of a scene are scanned pixel by pixel. Each pixel whose spectral reflectance matches a reference spectrum is identified, and the results are shown in a map that identifies the locations where spectrum matches were detected and spectrum that was matched. The interpretation technique is described and an example of interpreted data from ERTS-1 is presented.
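The pixel-by-pixel matching step described in this abstract can be sketched roughly as follows. This is only an illustration: the abstract does not state the matching criterion, so the per-band tolerance test, the function names (`match_pixel`, `classify_scene`), and the reflectance values are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence

Spectrum = Sequence[float]


def match_pixel(pixel: Spectrum,
                references: Dict[str, Spectrum],
                tolerance: float) -> Optional[str]:
    """Return the name of the first reference spectrum that the pixel matches.

    Assumed criterion: every band must lie within `tolerance` of the
    reference value. Returns None when no reference matches.
    """
    for name, ref in references.items():
        if all(abs(p - r) <= tolerance for p, r in zip(pixel, ref)):
            return name
    return None


def classify_scene(scene: List[List[Spectrum]],
                   references: Dict[str, Spectrum],
                   tolerance: float) -> List[List[Optional[str]]]:
    """Scan the scene one pixel at a time and build a map of matched spectra.

    Only one pixel is examined at a time, which keeps the working memory small.
    """
    return [[match_pixel(px, references, tolerance) for px in row]
            for row in scene]


# Hypothetical four-band reference spectra and a 2 x 2 scene.
references = {
    "water": (0.05, 0.04, 0.02, 0.01),
    "sediment": (0.12, 0.15, 0.14, 0.06),
}
scene = [
    [(0.05, 0.05, 0.02, 0.01), (0.30, 0.28, 0.35, 0.40)],
    [(0.11, 0.15, 0.13, 0.07), (0.06, 0.04, 0.03, 0.01)],
]
label_map = classify_scene(scene, references, tolerance=0.02)
```

Each entry of `label_map` records which reference spectrum, if any, was matched at that location, mirroring the map output the abstract describes.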
Kernel score statistic for dependent data.
Malzahn, Dörthe; Friedrichs, Stefanie; Rosenberger, Albert; Bickeböller, Heike
2014-01-01
The kernel score statistic is a global covariance component test over a set of genetic markers. It provides a flexible modeling framework and does not collapse marker information. We generalize the kernel score statistic to allow for familial dependencies and to adjust for random confounder effects. With this extension, we adjust our analysis of real and simulated baseline systolic blood pressure for polygenic familial background. We find that the kernel score test gains appreciably in power through the use of sequencing compared to tag-single-nucleotide polymorphisms for very rare single nucleotide polymorphisms with <1% minor allele frequency.
Implementation of ILLIAC 4 algorithms for multispectral image interpretation. [earth resources data
NASA Technical Reports Server (NTRS)
Ray, R. M.; Thomas, J. D.; Donovan, W. E.; Swain, P. H.
1974-01-01
Research has focused on the design and partial implementation of a comprehensive ILLIAC software system for computer-assisted interpretation of multispectral earth resources data such as that now collected by the Earth Resources Technology Satellite. Research suggests generally that the ILLIAC 4 should be as much as two orders of magnitude more cost effective than serial processing computers for digital interpretation of ERTS imagery via multivariate statistical classification techniques. The potential of the ARPA Network as a mechanism for interfacing geographically-dispersed users to an ILLIAC 4 image processing facility is discussed.
Rosenfield, G.H.
1986-01-01
Statistical analysis is conducted to determine the unique value of real- and synthetic-aperture side-looking airborne radar (SLAR) to detect interpreted structural elements. SLAR images were compared to standard and digitally enhanced Landsat multispectral scanner (MSS) images and to aerial photographs. After interpretation of the imagery, data were cumulated by total length in miles and by frequency of counts. Maximum uniqueness is obtained first from real-aperture SLAR, 58.3% of total, and, second, from digitally enhanced Landsat MSS images, 54.1% of total. © 1986 Plenum Publishing Corporation.
Outpatient health care statistics data warehouse--logical design.
Natek, S
1999-01-01
Outpatient Health Care Statistics on a state level is a central point where all relevant statistical data are collected from various sources all over the country. The varied and complex requirements for processing and reporting data make Outpatient Health Care Statistics on a state level a perfect example for the efficient implementation of data warehouse technology. The research investigates the logical design of a data warehouse, with special attention to the different data modeling techniques used in the various phases of logical data warehouse design. The research shows that the requirements for processing statistical data determine the design decisions and thus the scope and semantic value of the final data warehouse.
Statistical Treatment of Looking-Time Data
ERIC Educational Resources Information Center
Csibra, Gergely; Hernik, Mikolaj; Mascaro, Olivier; Tatone, Denis; Lengyel, Máté
2016-01-01
Looking times (LTs) are frequently measured in empirical research on infant cognition. We analyzed the statistical distribution of LTs across participants to develop recommendations for their treatment in infancy research. Our analyses focused on a common within-subject experimental design, in which longer looking to novel or unexpected stimuli is…
Patton, Charles J.; Gilroy, Edward J.
1999-01-01
Data on which this report is based, including nutrient concentrations in synthetic reference samples determined concurrently with those in real samples, are extensive (greater than 20,000 determinations) and have been published separately. In addition to confirming the well-documented instability of nitrite in acidified samples, this study also demonstrates that when biota are removed from samples at collection sites by 0.45-micrometer membrane filtration, subsequent preservation with sulfuric acid or mercury (II) provides no statistically significant improvement in nutrient concentration stability during storage at 4 degrees Celsius for 30 days. Biocide preservation had no statistically significant effect on the 30-day stability of phosphorus concentrations in whole-water splits from any of the 15 stations, but did stabilize Kjeldahl nitrogen concentrations in whole-water splits from three data-collection stations where ammonium accounted for at least half of the measured Kjeldahl nitrogen.
Scattering and extinction: interpreting hazes in stellar occultation data
NASA Astrophysics Data System (ADS)
Bosh, Amanda S.; Levine, Stephen; Sickafoose, Amanda A.; Person, Michael J.
2016-10-01
There has been debate concerning interpretation of stellar occultation data and whether those data contain evidence for hazes within Pluto's atmosphere. Multiple layers of haze have been imaged at Pluto with the New Horizons spacecraft; color-dependent differences in minimum flux from stellar occultations also suggest haze. We look at a purely geometric approach, to evaluate whether it is valid to sidestep details of atmospheric temperature structure and, in an approximate manner, conduct an analysis of the 2015 stellar occultation data that is consistent with the New Horizons imaging results. Support for this work was provided by NASA SSO grant NNX15AJ82G to Lowell Observatory.
Next-generation sequencing data interpretation: enhancing reproducibility and accessibility.
Nekrutenko, Anton; Taylor, James
2012-09-01
Areas of life sciences research that were previously distant from each other in ideology, analysis practices and toolkits, such as microbial ecology and personalized medicine, have all embraced techniques that rely on next-generation sequencing instruments. Yet the capacity to generate the data greatly outpaces our ability to analyse it. Existing sequencing technologies are more mature and accessible than the methodologies that are available for individual researchers to move, store, analyse and present data in a fashion that is transparent and reproducible. Here we discuss currently pressing issues with analysis, interpretation, reproducibility and accessibility of these data, and we present promising solutions and venture into potential future developments.
Mobile Collection and Automated Interpretation of EEG Data
NASA Technical Reports Server (NTRS)
Mintz, Frederick; Moynihan, Philip
2007-01-01
A system that would comprise mobile and stationary electronic hardware and software subsystems has been proposed for collection and automated interpretation of electroencephalographic (EEG) data from subjects in everyday activities in a variety of environments. By enabling collection of EEG data from mobile subjects engaged in ordinary activities (in contradistinction to collection from immobilized subjects in clinical settings), the system would expand the range of options and capabilities for performing diagnoses. Each subject would be equipped with one of the mobile subsystems, which would include a helmet that would hold floating electrodes (see figure) in those positions on the patient's head that are required in classical EEG data-collection techniques. A bundle of wires would couple the EEG signals from the electrodes to a multi-channel transmitter also located in the helmet. Electronic circuitry in the helmet transmitter would digitize the EEG signals and transmit the resulting data via a multidirectional RF patch antenna to a remote location. At the remote location, the subject's EEG data would be processed and stored in a database that would be auto-administered by a newly designed relational database management system (RDBMS). In this RDBMS, in nearly real time, the newly stored data would be subjected to automated interpretation that would involve comparison with other EEG data and concomitant peer-reviewed diagnoses stored in international brain data bases administered by other similar RDBMSs.
Prophecies of doom. [Bias in interpretation of environmental data
Linden, H.R.
1993-09-15
Biased interpretations of environmental data are choking the life out of the global energy system. Prevalent biases in the interpretation of scientific, technical, resource, and economic data influence energy policy and define the new energy/environment/economy (E^3) paradigm. These biases impede the orderly evolution of the global energy system to sustainability because they make rising energy consumption appear far less socially, economically, and environmentally beneficial than it is in fact - both on the basis of historical record and of forecasts that assign proper weight to ongoing technology advances. Energy consumption growth is impeded by these biases even when governed by least-cost energy service strategies that internalize all quantifiable externalities and mandate all cost-effective efficiency improvements in the delivery of heating, lighting, cooling, refrigeration, shaft-horsepower, and passenger- and ton-miles to ultimate consumers.
Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.
2009-01-01
In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high–throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
Reiber, Hansotto
2016-06-01
The physiological and biophysical knowledge base for interpretations of cerebrospinal fluid (CSF) data and reference ranges is essential for the clinical pathologist and neurochemist. In the popular description of the CSF flow-dependent barrier function, the dynamics and concentration gradients of blood-derived, brain-derived and leptomeningeal proteins in CSF, or the specificity-independent functions of B-lymphocytes in brain, the neurologist, psychiatrist, neurosurgeon and neuropharmacologist may also find essentials for diagnosis, research or development of therapies. This review may help to replace outdated ideas such as "leakage" models of the barriers, linear immunoglobulin index interpretations or CSF electrophoresis. Calculations, interpretations and analytical pitfalls are described for albumin quotients, quantitation of immunoglobulin synthesis in Reibergrams, oligoclonal IgG, IgM analysis, the polyspecific (MRZ) antibody reaction, the statistical treatment of CSF data and general quality assessment in the CSF laboratory. The diagnostic relevance is documented in an accompanying review. PMID:27332077
Amplitude interpretation and visualization of three-dimensional reflection data
Enachescu, M.E.
1994-07-01
Digital recording and processing of modern three-dimensional surveys allow for relatively good preservation and correct spatial positioning of seismic reflection amplitude. A four-dimensional seismic reflection field matrix R (x,y,t,A), which can be computer visualized (i.e., real-time interactively rendered, edited, and animated), is now available to the interpreter. The amplitude contains encoded geological information indirectly related to lithologies and reservoir properties. The magnitude of the amplitude depends not only on the acoustic impedance contrast across a boundary, but is also strongly affected by the shape of the reflective boundary. This allows the interpreter to image subtle tectonic and structural elements not obvious on time-structure maps. The use of modern workstations allows for appropriate color coding of the total available amplitude range, routine on-screen time/amplitude extraction, and late display of horizon amplitude maps (horizon slices) or complex amplitude-structure spatial visualization. Stratigraphic, structural, tectonic, fluid distribution, and paleogeographic information are commonly obtained by displaying the amplitude variation A = A(x,y,t) associated with a particular reflective surface or seismic interval. As illustrated with several case histories, traditional structural and stratigraphic interpretation combined with a detailed amplitude study generally greatly enhances extraction of subsurface geological information from a reflection data volume. In the context of three-dimensional seismic surveys, the horizon amplitude map (horizon slice), amplitude attachment to structure and "bright clouds" displays are very powerful tools available to the interpreter.
Honda, Takayuki; Tozuka, Minoru
2015-09-01
In the reversed clinicopathological conference (R-CPC), three specialists in laboratory medicine interpreted routine laboratory data independently in order to understand the detailed state of a patient. R-CPC is an educational method to use laboratory data appropriately, and it is also important to select differential diagnoses in a process of clinical reasoning in addition to the present illness and physical examination. Routine laboratory tests can be performed repeatedly at a relatively low cost, and their time-series analysis can be performed. Interpretation of routine laboratory data is almost the same as taking physical findings. General findings are initially checked and then the state of each organ is examined. Although routine laboratory tests cost little, we can gain much more information from them about the patient than physical examinations. PMID:26731894
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
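To make the idea of regression on order statistics concrete, the following is a deliberately simplified sketch. It is not the software described in the abstract and does not reproduce its interface. It assumes a single detection limit (so all censored values occupy the lowest ranks), a lognormal model, and Blom-type plotting positions; the multiple-detection-limit case handled by the actual tools needs a more elaborate plotting-position scheme. The function name `simple_ros` and the sample concentrations are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List


def simple_ros(detected: List[float], n_censored: int) -> List[float]:
    """Impute censored values by regressing log-concentration on normal scores.

    Assumptions (not taken from the abstract): one detection limit, so the
    censored observations take ranks 1..n_censored; lognormal distribution;
    plotting positions (rank - 0.375) / (n + 0.25).
    """
    obs = sorted(detected)
    n = len(obs) + n_censored
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((rank - 0.375) / (n + 0.25))
         for rank in range(1, n + 1)]
    z_cens, z_det = z[:n_censored], z[n_censored:]

    # Ordinary least squares of log(detected value) on its normal score.
    y = [math.log(v) for v in obs]
    mz = sum(z_det) / len(z_det)
    my = sum(y) / len(y)
    szz = sum((zi - mz) ** 2 for zi in z_det)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z_det, y))
    slope = szy / szz
    intercept = my - slope * mz

    # Predict the censored observations from the fitted line; detected
    # values are kept as measured (the "robust" part of the method).
    imputed = [math.exp(intercept + slope * zi) for zi in z_cens]
    return imputed + obs


# Hypothetical sample: six detected concentrations, four non-detects.
detected = [1.2, 1.5, 2.1, 3.4, 5.0, 8.3]
completed = simple_ros(detected, n_censored=4)
estimated_mean = sum(completed) / len(completed)
```

Summary statistics such as the mean or quantiles are then computed from the combined list of imputed and measured values, rather than from the fitted distribution alone.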
Statistical data of the uranium industry
1980-01-01
This document is a compilation of historical facts and figures through 1979. These statistics are based primarily on information provided voluntarily by the uranium exploration, mining, and milling companies. The production, reserves, drilling, and production capability information has been reported in a manner which avoids disclosure of proprietary information. Only the totals for the $1.5 reserves are reported. Because of increased interest in higher cost resources for long range planning purposes, a section covering the distribution of $100 per pound reserves statistics has been newly included. A table of mill recovery ranges for the January 1, 1980 reserves has also been added to this year's edition. The section on domestic uranium production capability has been deleted this year but will be included next year. The January 1, 1980 potential resource estimates are unchanged from the January 1, 1979 estimates.
[Intelligent interpretation of home monitoring blood glucose data].
Dió, Mihály; Deutsch, Tibor; Biczók, Tímea; Mészáros, Judit
2015-07-19
Self monitoring of blood glucose is the cornerstone of diabetes management. However, the data obtained by self monitoring of blood glucose have rarely been used to the greatest advantage. Few physicians routinely download data from memory-equipped glucose meters and analyse these data systematically at the time of patient visits. There is a need for improved methods for the display and analysis of blood glucose data along with a modular approach for identification of clinical problems. The authors present a systematic methodology for the analysis and interpretation of self monitoring blood glucose data in order to assist the management of patients with diabetes. This approach considers the following: 1) overall quality of glycemic control; 2) severity and timing of hypoglycemia and hyperglycemia; 3) variability of blood glucose readings; 4) various temporal patterns extracted from recorded data; and 5) adequacy of self monitoring blood glucose data. Based on reliable measures of the quality of glycaemic control and glucose variability, a prioritized problem list is derived along with the probable causes of the detected problems. Finally, problems and their interpretation are used to guide clinicians to choose therapeutic actions and/or recommend behaviour change in order to solve the problems that have been identified.
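A few of the summary measures listed in this abstract (overall control, frequency of hypo- and hyperglycemia, variability) could be computed along the following lines. This is a minimal sketch, not the authors' methodology: the function name `summarize_glucose`, the default thresholds of 3.9 and 10.0 mmol/L, and the sample readings are assumptions chosen for illustration, and the thresholds are left as parameters so that they can be set to whatever a clinician considers appropriate.

```python
from statistics import mean, pstdev
from typing import Dict, List


def summarize_glucose(readings_mmol_l: List[float],
                      hypo_below: float = 3.9,
                      hyper_above: float = 10.0) -> Dict[str, float]:
    """Summarize self-monitored glucose readings (values in mmol/L).

    The default thresholds are illustrative assumptions, not values
    taken from the abstract.
    """
    n = len(readings_mmol_l)
    avg = mean(readings_mmol_l)
    sd = pstdev(readings_mmol_l)
    return {
        "mean": avg,
        "sd": sd,
        # Coefficient of variation as a simple variability measure.
        "cv_percent": 100.0 * sd / avg,
        "hypo_fraction": sum(r < hypo_below for r in readings_mmol_l) / n,
        "hyper_fraction": sum(r > hyper_above for r in readings_mmol_l) / n,
    }


# Hypothetical readings downloaded from a meter.
readings = [3.5, 5.0, 6.0, 7.5, 11.0, 12.0, 4.0, 5.5]
summary = summarize_glucose(readings)
```

A prioritized problem list of the kind the abstract describes would build on such measures, for example by flagging a high hypoglycemia fraction before a high mean.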
Eide, I; Zahlsen, K
1996-01-01
The paper describes experimental and statistical methods for toxicokinetic evaluation of mixtures in inhalation experiments. Synthetic mixtures of three C9 n-paraffinic, naphthenic and aromatic hydrocarbons (n-nonane, trimethylcyclohexane and trimethylbenzene, respectively) were studied in the rat after inhalation for 12h. The hydrocarbons were mixed according to principles for statistical experimental design using mixture design at four vapour levels (75, 150, 300 and 450 ppm) to support an empirical model with linear, interaction and quadratic terms (Taylor polynomial). Immediately after exposure, concentrations of hydrocarbons were measured by head space gas chromatography in blood, brain, liver, kidneys and perirenal fat. Multivariate data analysis and modelling were performed with PLS (projections to latent structures). The best models were obtained after removing all interaction terms, suggesting that there were no interactions between the hydrocarbons with respect to absorption and distribution. Uptake of paraffins and particularly aromatics is best described by quadratic models, whereas the uptake of the naphthenic hydrocarbons is nearly linear. All models are good, with high correlation (r2) and prediction properties (Q2), the latter after cross validation. The concentrations of aromatics in blood were high compared to the other hydrocarbons. At concentrations below 250 ppm, the naphthene reached higher concentrations in the brain compared to the paraffin and the aromatic. Statistical experimental design, multivariate data analysis and modelling have proved useful for the evaluation of synthetic mixtures. The principles may also be used in the design of liquid mixtures, which may be evaporated partially or completely.
NASA Astrophysics Data System (ADS)
Dralle, D.; Karst, N.; Thompson, S. E.
2015-12-01
Multiple competing theories suggest that power law behavior governs the observed first-order dynamics of streamflow recessions - the important process by which catchments dry out via the stream network, altering the availability of surface water resources and in-stream habitat. Frequently modeled as dq/dt = -a q^b, recessions typically exhibit a high degree of variability, even within a single catchment, as revealed by significant shifts in the values of "a" and "b" across recession events. One potential source of this variability lies in underlying, hard-to-observe fluctuations in how catchment water storage is partitioned amongst distinct storage elements, each having different discharge behaviors. Testing this and competing hypotheses with widely available streamflow timeseries, however, has been hindered by a power law scaling artifact that obscures meaningful covariation between the recession parameters, "a" and "b". Here we briefly outline a technique that removes this artifact, revealing intriguing new patterns in the joint distribution of recession parameters. Using long-term flow data from catchments in Northern California, we explore temporal variations, and find that the "a" parameter varies strongly with catchment wetness. Then we explore how the "b" parameter changes with "a", and find that measures of its variation are maximized at intermediate "a" values. We propose an interpretation of this pattern based on statistical mechanics, meaning "b" can be viewed as an indicator of the catchment "microstate" - i.e. the partitioning of storage - and "a" as a measure of the catchment macrostate (i.e. the total storage). In statistical mechanics, entropy (i.e. microstate variance, that is the variance of "b") is maximized for intermediate values of extensive variables (i.e. wetness, "a"), as observed in the recession data. This interpretation of "a" and "b" was supported by model runs using a multiple-reservoir catchment toy model, and lends support to the
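For readers unfamiliar with how "a" and "b" are usually obtained, the sketch below fits the power-law recession model to a single event by linear regression in log-log space, since log(-dq/dt) = log(a) + b log(q). It illustrates only this conventional per-event fit and does not implement the artifact-removal technique the abstract refers to. The function name `fit_recession`, the finite-difference estimate of dq/dt, and the synthetic flow series are assumptions made for the example.

```python
import math
from typing import List, Tuple


def fit_recession(q: List[float], dt: float) -> Tuple[float, float]:
    """Estimate (a, b) in dq/dt = -a * q**b for one recession event.

    Uses finite differences for the recession rate and the interval-mean
    flow as the matching q value, then ordinary least squares on logs.
    """
    xs, ys = [], []
    for q0, q1 in zip(q, q[1:]):
        rate = (q0 - q1) / dt            # estimate of -dq/dt
        if rate > 0 and q0 > 0 and q1 > 0:
            xs.append(math.log(0.5 * (q0 + q1)))
            ys.append(math.log(rate))
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


# Synthetic event with a = 0.1 and b = 2: the exact solution of
# dq/dt = -0.1 * q**2 with q(0) = 10 is q(t) = 10 / (1 + t).
dt = 0.1
q_series = [10.0 / (1.0 + i * dt) for i in range(101)]
a_hat, b_hat = fit_recession(q_series, dt)
```

On this noise-free series the fit recovers values close to a = 0.1 and b = 2; with real streamflow data the estimates for different events scatter, which is the variability the abstract sets out to interpret.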
Quantitative interpretation of Great Lakes remote sensing data
NASA Technical Reports Server (NTRS)
Shook, D. F.; Salzman, J.; Svehla, R. A.; Gedney, R. T.
1980-01-01
The paper discusses the quantitative interpretation of Great Lakes remote sensing water quality data. Remote sensing using color information must take into account (1) the existence of many different organic and inorganic species throughout the Great Lakes, (2) the occurrence of a mixture of species in most locations, and (3) spatial variations in types and concentration of species. The radiative transfer model provides a potential method for an orderly analysis of remote sensing data and a physical basis for developing quantitative algorithms. Predictions and field measurements of volume reflectances are presented which show the advantage of using a radiative transfer model. Spectral absorptance and backscattering coefficients for two inorganic sediments are reported.
Interdisciplinary applications and interpretations of remotely sensed data
NASA Technical Reports Server (NTRS)
Peterson, G. W.; Mcmurtry, G. J.
1972-01-01
An interdisciplinary approach to using remote sensors for the inventory of natural resources is discussed. The areas under investigation are land use, determination of pollution sources and damage, and analysis of geologic structure and terrain. The geographical area of primary interest is the Susquehanna River Basin. Descriptions of the data obtained by aerial cameras, multiband cameras, optical mechanical scanners, and radar are included. The Earth Resources Technology Satellite and Skylab program are examined. Interpretations of spacecraft data to show specific areas of interest are developed.
Laboratory study supporting the interpretation of Solar Dynamics Observatory data
Trabert, E.; Beiersdorfer, P.
2015-01-29
High-resolution extreme ultraviolet spectra of ions in an electron beam ion trap are investigated as a laboratory complement of the moderate-resolution observation bands of the AIA experiment on board the Solar Dynamics Observatory (SDO) spacecraft. Here, the latter observations depend on dominant iron lines of various charge states which in combination yield temperature information on the solar plasma. Our measurements suggest additions to the spectral models that are used in the SDO data interpretation. In the process, we also note a fair number of inconsistencies among the wavelength reference data bases.
Borehole seismic data processing and interpretation: New free software
NASA Astrophysics Data System (ADS)
Farfour, Mohammed; Yoon, Wang Jung
2015-12-01
Vertical Seismic Profile (VSP) surveying is a vital tool in subsurface imaging and reservoir characterization. The technique allows geophysicists to infer critical information that cannot be obtained otherwise. MVSP is a new MATLAB tool with a graphical user interface (GUI) for VSP shot modeling, data processing, and interpretation. The software handles VSP data from the loading and preprocessing stages to the final stage of corridor plotting and integration with well and seismic data. Several seismic and signal processing toolboxes are integrated and modified to suit and enrich the processing and display packages. The main motivation behind the development of the software is to provide new geoscientists and students in the geoscience fields with free software that brings together all VSP modules in one easy-to-use package. The software has several modules that allow the user to test, process, compare, visualize, and produce publication-quality results. The software is developed as a stand-alone MATLAB application that requires only MATLAB Compiler Runtime (MCR) to run with full functionality. We present a detailed description of MVSP and use the software to create synthetic VSP data. The data are then processed using different available tools. Next, real data are loaded and fully processed using the software. The data are then integrated with well data for more detailed analysis and interpretation. In order to evaluate the software processing flow accuracy, the same data are processed using commercial software. Comparison of the processing results shows that MVSP is able to process VSP data as efficiently as commercial software packages currently used in industry, and provides similar high-quality processed data.
Internet Data Analysis for the Undergraduate Statistics Curriculum
ERIC Educational Resources Information Center
Sanchez, Juana; He, Yan
2005-01-01
Statistics textbooks for undergraduates have not caught up with the enormous amount of analysis of Internet data that is taking place these days. Case studies that use Web server log data or Internet network traffic data are rare in undergraduate Statistics education. And yet these data provide numerous examples of skewed and bimodal…
Guidelines for Statistical Analysis of Percentage of Syllables Stuttered Data
ERIC Educational Resources Information Center
Jones, Mark; Onslow, Mark; Packman, Ann; Gebski, Val
2006-01-01
Purpose: The purpose of this study was to develop guidelines for the statistical analysis of percentage of syllables stuttered (%SS) data in stuttering research. Method: Data on %SS from various independent sources were used to develop a statistical model to describe this type of data. On the basis of this model, %SS data were simulated with…
Empirical approach to interpreting card-sorting data
NASA Astrophysics Data System (ADS)
Wolf, Steven F.; Dougherty, Daniel P.; Kortemeyer, Gerd
2012-06-01
Since it was first published 30 years ago, the seminal paper of Chi et al. on expert and novice categorization of introductory problems led to a plethora of follow-up studies within and outside of the area of physics [Cogn. Sci. 5, 121 (1981), doi:10.1207/s15516709cog0502_2]. These studies frequently encompass “card-sorting” exercises whereby the participants group problems. While this technique certainly allows insights into problem solving approaches, simple descriptive statistics more often than not fail to find significant differences between experts and novices. In moving beyond descriptive statistics, we describe a novel microscopic approach that takes into account the individual identity of the cards and uses graph theory and models to visualize, analyze, and interpret problem categorization experiments. We apply these methods to an introductory physics (mechanics) problem categorization experiment, and find that most of the variation in sorting outcome is not due to the sorter being an expert versus a novice, but rather due to an independent characteristic that we named “stacker” versus “spreader.” The fact that the expert-novice distinction only accounts for a smaller amount of the variation may explain the frequent null results when conducting these experiments.
Comparison of genomic data via statistical distribution.
Amiri, Saeid; Dinov, Ivo D
2016-10-21
Sequence comparison has become an essential tool in bioinformatics, because highly homologous sequences usually imply significant functional or structural similarity. Traditional sequence analysis techniques are based on preprocessing and alignment, which facilitate measuring and quantitative characterization of genetic differences, variability and complexity. However, recent developments of next generation and whole genome sequencing technologies give rise to new challenges that are related to measuring similarity and capturing rearrangements of large segments contained in the genome. This work is devoted to illustrating different methods recently introduced for quantifying sequence distances and variability. Most of the alignment-free methods rely on counting words, which are small contiguous fragments of the genome. Our approach considers the locations of nucleotides in the sequences and relies more on appropriate statistical distributions. The results of this technique for comparing sequences, by extracting information and comparing matching fidelity and location regularization information, are very encouraging, specifically to classify mutation sequences. PMID:27460589
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
The Systematic Interpretation of Cosmic Ray Data (The Transport Project)
NASA Technical Reports Server (NTRS)
Guzik, T. Gregory
1997-01-01
The Transport project's primary goals were to: (1) Provide measurements of critical fragmentation cross sections; (2) Study the cross section systematics; (3) Improve the galactic cosmic ray propagation methodology; and (4) Use the new cross section measurements to improve the interpretation of cosmic ray data. To accomplish these goals a collaboration was formed consisting of researchers in the US at Louisiana State University (LSU), Lawrence Berkeley Laboratory (LBL), Goddard Space Flight Center (GSFC), the University of Minnesota (UM), New Mexico State University (NMSU), in France at the Centre d'Etudes de Saclay and in Italy at the Universita di Catania. The US institutions, led by LSU, were responsible for measuring new cross sections using the LBL HISS facility, analysis of these measurements and their application to interpreting cosmic ray data. France developed a liquid hydrogen target that was used in the HISS experiment and participated in the data interpretation. Italy developed a Multifunctional Neutron Spectrometer (MUFFINS) for the HISS runs to measure the energy spectra, angular distributions and multiplicities of neutrons emitted during the high energy interactions. The Transport Project was originally proposed to NASA during Summer, 1988 and funding began January, 1989. Transport was renewed twice (1991, 1994) and finally concluded at LSU on September 30, 1997. During the more than 8 years of effort we had two major experiment runs at LBL, obtained data on the interaction of twenty different beams with a liquid hydrogen target, completed the analysis of fifteen of these datasets obtaining 590 new cross section measurements, published nine journal articles as well as eighteen conference proceedings papers, and presented more than thirty conference talks.
Statistical analyses to support forensic interpretation for a new ten-locus STR profiling system.
Foreman, L A; Evett, I W
2001-01-01
A new ten-locus STR (short tandem repeat) profiling system was recently introduced into casework by the Forensic Science Service (FSS) and statistical analyses are described here based on data collected using this new system for the three major racial groups of the UK: Caucasian, Afro-Caribbean and Asian (of Indo-Pakistani descent). Allele distributions are compared and the FSS position with regard to routine significance testing of DNA frequency databases is discussed. An investigation of match probability calculations is carried out and the consequent analyses are shown to provide support for proposed changes in how the FSS reports DNA results when very small match probabilities are involved.
Statistical Considerations of Data Processing in Giovanni Online Tool
NASA Technical Reports Server (NTRS)
Suhung, Shen; Leptoukh, G.; Acker, J.; Berrick, S.
2005-01-01
The GES DISC Interactive Online Visualization and Analysis Infrastructure (Giovanni) is a web-based interface for the rapid visualization and analysis of gridded data from a number of remote sensing instruments. The GES DISC currently employs several Giovanni instances to analyze various products, such as Ocean-Giovanni for ocean products from SeaWiFS and MODIS-Aqua; TOMS & OMI Giovanni for atmospheric chemical trace gases from TOMS and OMI, and MOVAS for aerosols from MODIS, etc. (http://giovanni.gsfc.nasa.gov) Foremost among the Giovanni statistical functions is data averaging. Two aspects of this function are addressed here. The first deals with the accuracy of averaging gridded mapped products vs. averaging from the ungridded Level 2 data. Some mapped products contain mean values only; others contain additional statistics, such as number of pixels (NP) for each grid, standard deviation, etc. Since NP varies spatially and temporally, averaging with or without weighting by NP will be different. In this paper, we address differences of various weighting algorithms for some datasets utilized in Giovanni. The second aspect is related to different averaging methods affecting data quality and interpretation for data with non-normal distribution. The present study demonstrates results of different spatial averaging methods using gridded SeaWiFS Level 3 mapped monthly chlorophyll a data. Spatial averages were calculated using three different methods: arithmetic mean (AVG), geometric mean (GEO), and maximum likelihood estimator (MLE). Biogeochemical data, such as chlorophyll a, are usually considered to have a log-normal distribution. The study determined that differences between methods tend to increase with increasing size of a selected coastal area, with no significant differences in most open oceans. The GEO method consistently produces values lower than AVG and MLE. The AVG method produces values larger than MLE in some cases, but smaller in other cases. Further
Efficient statistical mapping of avian count data
Royle, J. Andrew; Wikle, C.K.
2005-01-01
We develop a spatial modeling framework for count data that is efficient to implement in high-dimensional prediction problems. We consider spectral parameterizations for the spatially varying mean of a Poisson model. The spectral parameterization of the spatial process is very computationally efficient, enabling effective estimation and prediction in large problems using Markov chain Monte Carlo techniques. We apply this model to creating avian relative abundance maps from North American Breeding Bird Survey (BBS) data. Variation in the ability of observers to count birds is modeled as spatially independent noise, resulting in over-dispersion relative to the Poisson assumption. This approach represents an improvement over existing approaches used for spatial modeling of BBS data which are either inefficient for continental scale modeling and prediction or fail to accommodate important distributional features of count data thus leading to inaccurate accounting of prediction uncertainty.
Traumatic Brain Injury (TBI) Data and Statistics
... data.cdc.gov . Emergency Department Visits, Hospitalizations, and Deaths Rates of TBI-related Emergency Department Visits, Hospitalizations, ... related Hospitalizations by Age Group and Injury Mechanism Deaths Rates of TBI-related Deaths by Sex Rates ...
Simple Hartmann test data interpretation for ophthalmic lenses
NASA Astrophysics Data System (ADS)
Salas-Peimbert, Didia Patricia; Trujillo-Schiaffino, Gerardo; González-Silva, Jorge Alberto; Almazán-Cuellar, Saúl; Malacara-Doblado, Daniel
2006-04-01
This article describes a simple Hartmann test data interpretation that can be used to evaluate the performance of ophthalmic lenses. Considering each spot of the Hartmann pattern such as a single test ray, using simple ray tracing analysis, it is possible to calculate the power values from the lens under test at the point corresponding with each spot. The values obtained by this procedure are used to plot the power distribution map of the entire lens. We present the results obtained applying this method with single vision, bifocal, and progressive lenses.
Geological Interpretation of PSInSAR Data at Regional Scale
Meisina, Claudia; Zucca, Francesco; Notti, Davide; Colombo, Alessio; Cucchi, Anselmo; Savio, Giuliano; Giannico, Chiara; Bianchi, Marco
2008-01-01
Results of a PSInSAR™ project carried out by the Regional Agency for Environmental Protection (ARPA) in Piemonte Region (Northern Italy) are presented and discussed. A methodology is proposed for the interpretation of the PSInSAR™ data at the regional scale, easy to use by the public administrations and by civil protection authorities. Potential and limitations of the PSInSAR™ technique for ground movement detection on a regional scale and monitoring are then estimated in relationship with different geological processes and various geological environments.
Statistical analysis of life history calendar data.
Eerola, Mervi; Helske, Satu
2016-04-01
The life history calendar is a data-collection tool for obtaining reliable retrospective data about life events. To illustrate the analysis of such data, we compare the model-based probabilistic event history analysis and the model-free data mining method, sequence analysis. In event history analysis, we estimate the cumulative prediction probabilities of life events over the entire trajectory rather than transition hazards. In sequence analysis, we compare several dissimilarity metrics and contrast data-driven and user-defined substitution costs. As an example, we study young adults' transition to adulthood as a sequence of events in three life domains. The events define the multistate event history model and the parallel life domains in multidimensional sequence analysis. The relationship between life trajectories and excess depressive symptoms in middle age is further studied by their joint prediction in the multistate model and by regressing the symptom scores on individual-specific cluster indices. The two approaches complement each other in life course analysis; sequence analysis can effectively find typical and atypical life patterns while event history analysis is needed for causal inquiries.
Flexibility in data interpretation: effects of representational format
Braithwaite, David W.; Goldstone, Robert L.
2013-01-01
Graphs and tables differentially support performance on specific tasks. For tasks requiring reading off single data points, tables are as good as or better than graphs, while for tasks involving relationships among data points, graphs often yield better performance. However, the degree to which graphs and tables support flexibility across a range of tasks is not well-understood. In two experiments, participants detected main and interaction effects in line graphs and tables of bivariate data. Graphs led to more efficient performance, but also lower flexibility, as indicated by a larger discrepancy in performance across tasks. In particular, detection of main effects of variables represented in the graph legend was facilitated relative to detection of main effects of variables represented in the x-axis. Graphs may be a preferable representational format when the desired task or analytical perspective is known in advance, but may also induce greater interpretive bias than tables, necessitating greater care in their use and design. PMID:24427145
Interpretation methodology and analysis of in-flight lightning data
NASA Technical Reports Server (NTRS)
Rudolph, T.; Perala, R. A.
1982-01-01
A methodology is presented whereby electromagnetic measurements of inflight lightning stroke data can be understood and extended to other aircraft. Recent measurements made on the NASA F106B aircraft indicate that sophisticated numerical techniques and new developments in corona modeling are required to fully understand the data. Thus the problem is nontrivial and successful interpretation can lead to a significant understanding of the lightning/aircraft interaction event. This is of particular importance because of the problem of lightning induced transient upset of new technology low level microcircuitry which is being used in increasing quantities in modern and future avionics. Inflight lightning data is analyzed and lightning environments incident upon the F106B are determined.
Statistical inference for serial dilution assay data.
Lee, M L; Whitmore, G A
1999-12-01
Serial dilution assays are widely employed for estimating substance concentrations and minimum inhibitory concentrations. The Poisson-Bernoulli model for such assays is appropriate for count data but not for continuous measurements that are encountered in applications involving substance concentrations. This paper presents practical inference methods based on a log-normal model and illustrates these methods using a case application involving bacterial toxins.
Improved interpretation of satellite altimeter data using genetic algorithms
NASA Technical Reports Server (NTRS)
Messa, Kenneth; Lybanon, Matthew
1992-01-01
Genetic algorithms (GA) are optimization techniques that are based on the mechanics of evolution and natural selection. They take advantage of the power of cumulative selection, in which successive incremental improvements in a solution structure become the basis for continued development. A GA is an iterative procedure that maintains a 'population' of 'organisms' (candidate solutions). Through successive 'generations' (iterations) the population as a whole improves in simulation of Darwin's 'survival of the fittest'. GAs have been shown to be successful where noise significantly reduces the ability of other search techniques to work effectively. Satellite altimetry provides useful information about oceanographic phenomena. It provides rapid global coverage of the oceans and is not as severely hampered by cloud cover as infrared imagery. Despite these and other benefits, several factors lead to significant difficulty in interpretation. The GA approach to the improved interpretation of satellite data involves the representation of the ocean surface model as a string of parameters or coefficients from the model. The GA searches, in parallel, a population of such representations (organisms) to obtain the individual that is best suited to 'survive', that is, the fittest as measured with respect to some 'fitness' function. The fittest organism is the one that best represents the ocean surface model with respect to the altimeter data.
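The population, generation and fitness vocabulary above can be made concrete with a small Python sketch. It is not the authors' altimetry code. It assumes a toy two-coefficient model y = a*x + b in place of the ocean surface model, and the selection, crossover and mutation choices, the function names and the data are all hypothetical.

```python
import random

def fitness(coeffs, xs, ys):
    """Higher is better: negative sum of squared residuals of y = a*x + b."""
    a, b = coeffs
    return -sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

def evolve(xs, ys, pop_size=40, generations=60, seed=0):
    """Toy GA: keep the fitter half, refill with mutated crossovers."""
    rng = random.Random(seed)
    # Each organism is a string (tuple) of model coefficients.
    pop = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, xs, ys), reverse=True)
        history.append(fitness(pop[0], xs, ys))
        survivors = pop[: pop_size // 2]  # selection with elitism
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            # Uniform crossover per coefficient, then Gaussian mutation.
            child = tuple(rng.choice(pair) + rng.gauss(0, 0.1)
                          for pair in zip(p1, p2))
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda c: fitness(c, xs, ys))
    return best, history

# Hypothetical "observations": a line with a small deterministic wiggle.
xs = list(range(10))
ys = [2.0 * x + 1.0 + 0.1 * ((-1) ** x) for x in xs]
best, history = evolve(xs, ys)
```

Because the fitter half always survives unchanged, the best fitness in this sketch can never get worse from one generation to the next, which is one simple way to realize the cumulative selection described above.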
ERIC Educational Resources Information Center
Boysen, Guy A.
2015-01-01
Student evaluations of teaching are among the most accepted and important indicators of college teachers' performance. However, faculty and administrators can overinterpret small variations in mean teaching evaluations. The current research examined the effect of including statistical information on the interpretation of teaching evaluations.…
Plausible inference and the interpretation of quantitative data
Nakhleh, C.W.
1998-02-01
The analysis of quantitative data is central to scientific investigation. Probability theory, which is founded on two rules, the sum and product rules, provides the unique, logically consistent method for drawing valid inferences from quantitative data. This primer on the use of probability theory is meant to fulfill a pedagogical purpose. The discussion begins at the foundation of scientific inference by showing how the sum and product rules of probability theory follow from some very basic considerations of logical consistency. The authors then develop general methods of probability theory that are essential to the analysis and interpretation of data. They discuss how to assign probability distributions using the principle of maximum entropy, how to estimate parameters from data, how to handle nuisance parameters whose values are of little interest, and how to determine which of a set of models is most justified by a data set. All these methods are used together in most realistic data analyses. Examples are given throughout to illustrate the basic points.
Statistical approach for evaluation of contraceptive data.
Tripathi, Vriyesh
2008-04-01
This article will define how best to analyse data collected from a longitudinal follow-up on contraceptive use and discontinuation, with special consideration to the needs of developing countries. Accessibility and acceptability of contraceptives at the ground level remains low and it is an overlooked area of research. The author presents a set of propositions that are closer in spirit to practical recommendations than to formal theorems. We will comment specifically on issues of model validation through bootstrapping techniques. The paper presents a multivariate model to assess the rate of discontinuation of contraception, while accounting for the possibility that there may be factors that influence both a couple's choice of provider and their probability of discontinuation. PMID:20695150
Control Statistics Process Data Base V4
1998-05-07
The check standard database program, CSP_CB, is a menu-driven program that can acquire measurement data for check standards having a parameter dependence (such as frequency) or no parameter dependence (for example, mass measurements). The program may be run stand-alone or loaded as a subprogram to a Basic program already in memory. The software was designed to require little additional work on the part of the user. To facilitate this design goal, the program is entirely menu-driven. In addition, the user does have control of file names and parameters within a definition file which sets up the basic scheme of file names.
Rapp, J.B.
1991-01-01
Q-mode factor analysis was used to quantitate the distribution of the major aliphatic hydrocarbon (n-alkanes, pristane, phytane) systems in sediments from a variety of marine environments. The compositions of the pure end members of the systems were obtained from factor scores and the distribution of the systems within each sample was obtained from factor loadings. All the data, from the diverse environments sampled (estuarine (San Francisco Bay), fresh-water (San Francisco Peninsula), polar-marine (Antarctica) and geothermal-marine (Gorda Ridge) sediments), were reduced to three major systems: a terrestrial system (mostly high molecular weight aliphatics with odd-numbered-carbon predominance), a mature system (mostly low molecular weight aliphatics without predominance) and a system containing mostly high molecular weight aliphatics with even-numbered-carbon predominance. With this statistical approach, it is possible to assign the percentage contribution from various sources to the observed distribution of aliphatic hydrocarbons in each sediment sample. © 1991.
INDIANS IN OKLAHOMA, SOCIAL AND ECONOMIC STATISTICAL DATA.
ERIC Educational Resources Information Center
HUNTER, BILL; TUCKER, TOM
STATISTICAL DATA ARE PRESENTED ON THE INDIAN POPULATION OF OKLAHOMA, ALONG WITH A BRIEF HISTORY OF SOME OF THE 67 INDIAN TRIBES FOUND IN THE STATE AND NARRATIVE SUMMARIES OF THE STATISTICAL DATA. MAPS OF CURRENT AND PAST INDIAN LANDS ARE SHOWN IN RELATION TO CURRENT COUNTY LINES. GRAPHS PORTRAY POPULATION COMPOSITION, RURAL AND URBAN POPULATION…
Using Data from Climate Science to Teach Introductory Statistics
ERIC Educational Resources Information Center
Witt, Gary
2013-01-01
This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…
Transformations on Data Sets and Their Effects on Descriptive Statistics
ERIC Educational Resources Information Center
Fox, Thomas B.
2005-01-01
The activity asks students to examine the effects on the descriptive statistics of a data set that has undergone either a translation or a scale change. They make conjectures relative to the effects on the statistics of a transformation on a data set and then they defend their conjectures and deductively verify several of them.
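The kind of conjecture students might test in this activity can be checked numerically. The following Python sketch uses a hypothetical data set, shift and scale factor; it is an illustration, not material from the article.

```python
import statistics

def describe(data):
    """Return a few common descriptive statistics for a data set."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.pstdev(data),  # population standard deviation
        "range": max(data) - min(data),
    }

data = [2, 4, 4, 6, 9]            # hypothetical data set
shifted = [x + 10 for x in data]  # translation by +10
scaled = [3 * x for x in data]    # scale change by a factor of 3

base = describe(data)
after_shift = describe(shifted)
after_scale = describe(scaled)

for name, stats in [("original", base), ("shifted", after_shift),
                    ("scaled", after_scale)]:
    print(name, stats)
```

In this example the translation moves the mean and median by 10 and leaves the standard deviation and range unchanged, while the scale change multiplies all four statistics by 3.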
Experimental uncertainty estimation and statistics for data having interval uncertainty.
Kreinovich, Vladik (Applied Biomathematics, Setauket, New York); Oberkampf, William Louis (Applied Biomathematics, Setauket, New York); Ginzburg, Lev (Applied Biomathematics, Setauket, New York); Ferson, Scott (Applied Biomathematics, Setauket, New York); Hajagos, Janos (Applied Biomathematics, Setauket, New York)
2007-05-01
This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
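For the simplest of the statistics listed, the interval-valued result follows directly from monotonicity: the mean and the median can only increase when any data value increases, so their bounds are obtained from the lower and upper endpoints separately. The Python sketch below illustrates that one point under this assumption. It is not the report's algorithm, the function names and data are hypothetical, and it does not cover variance, whose bounds need more care.

```python
import statistics

def check_intervals(intervals):
    """Reject malformed intervals where the lower bound exceeds the upper."""
    for lo, hi in intervals:
        if lo > hi:
            raise ValueError(f"invalid interval: ({lo}, {hi})")

def interval_mean(intervals):
    """Bounds on the sample mean of interval-valued measurements."""
    check_intervals(intervals)
    lows = [lo for lo, _ in intervals]
    highs = [hi for _, hi in intervals]
    return (statistics.mean(lows), statistics.mean(highs))

def interval_median(intervals):
    """Bounds on the sample median of interval-valued measurements."""
    check_intervals(intervals)
    lows = [lo for lo, _ in intervals]
    highs = [hi for _, hi in intervals]
    return (statistics.median(lows), statistics.median(highs))

# Hypothetical measurements; (1.5, 1.5) is an ordinary point value.
measurements = [(1.0, 2.0), (1.5, 1.5), (2.0, 4.0), (3.0, 3.5)]
mean_bounds = interval_mean(measurements)
median_bounds = interval_median(measurements)
```

A data set of point values is the special case where every interval has zero width, in which case both bounds coincide with the usual statistic.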
Statistical Modeling of Large-Scale Scientific Simulation Data
Eliassi-Rad, T; Baldwin, C; Abdulla, G; Critchlow, T
2003-11-15
With the advent of massively parallel computer systems, scientists are now able to simulate complex phenomena (e.g., explosions of stars). Such scientific simulations typically generate large-scale data sets over the spatio-temporal space. Unfortunately, the sheer sizes of the generated data sets make efficient exploration of them impossible. Constructing queriable statistical models is an essential step in helping scientists glean new insight from their computer simulations. We define queriable statistical models to be descriptive statistics that (1) summarize and describe the data within a user-defined modeling error, and (2) are able to answer complex range-based queries over the spatiotemporal dimensions. In this chapter, we describe systems that build queriable statistical models for large-scale scientific simulation data sets. In particular, we present our Ad-hoc Queries for Simulation (AQSim) infrastructure, which reduces the data storage requirements and query access times by (1) creating and storing queriable statistical models of the data at multiple resolutions, and (2) evaluating queries on these models of the data instead of the entire data set. Within AQSim, we focus on three simple but effective statistical modeling techniques. AQSim's first modeling technique (called univariate mean modeler) computes the ''true'' (unbiased) mean of systematic partitions of the data. AQSim's second statistical modeling technique (called univariate goodness-of-fit modeler) uses the Anderson-Darling goodness-of-fit method on systematic partitions of the data. Finally, AQSim's third statistical modeling technique (called multivariate clusterer) utilizes the cosine similarity measure to cluster the data into similar groups. Our experimental evaluations on several scientific simulation data sets illustrate the value of using these statistical models on large-scale simulation data sets.
THINK Back: KNowledge-based Interpretation of High Throughput data.
Farfán, Fernando; Ma, Jun; Sartor, Maureen A; Michailidis, George; Jagadish, Hosagrahar V
2012-01-01
Results of high throughput experiments can be challenging to interpret. Current approaches have relied on bulk processing the set of expression levels, in conjunction with easily obtained external evidence, such as co-occurrence. While such techniques can be used to reason probabilistically, they are not designed to shed light on what any individual gene, or a network of genes acting together, may be doing. Our belief is that today we have the information extraction ability and the computational power to perform more sophisticated analyses that consider the individual situation of each gene. The use of such techniques should lead to qualitatively superior results. The specific aim of this project is to develop computational techniques to generate a small number of biologically meaningful hypotheses based on observed results from high throughput microarray experiments, gene sequences, and next-generation sequences. Through the use of relevant known biomedical knowledge, as represented in published literature and public databases, we can generate meaningful hypotheses that will aid biologists in interpreting their experimental data. We are currently developing novel approaches that exploit the rich information encapsulated in biological pathway graphs. Our methods perform a thorough and rigorous analysis of biological pathways, using complex factors such as the topology of the pathway graph and the frequency with which genes appear on different pathways, to provide more meaningful hypotheses to describe the biological phenomena captured by high throughput experiments, when compared to other existing methods that only consider partial information captured by biological pathways. PMID:22536867
Estimating aquifer channel recharge using optical data interpretation.
Walter, Gary R; Necsoiu, Marius; McGinnis, Ronald
2012-01-01
Recharge through intermittent and ephemeral stream channels is believed to be a primary aquifer recharge process in arid and semiarid environments. The intermittent nature of precipitation and flow events in these channels, and their often remote locations, make direct flow and loss measurements difficult and expensive. Airborne and satellite optical images were interpreted to evaluate aquifer recharge due to stream losses on the Frio River in south-central Texas. Losses in the Frio River are believed to be a major contributor of recharge to the Edwards Aquifer. The results of this work indicate that interpretation of readily available remote sensing optical images can offer important insights into the spatial distribution of aquifer recharge from losing streams. In cases where upstream gauging data are available, simple visual analysis of the length of the flowing reach downstream from the gauging station can be used to estimate channel losses. In the case of the Frio River, the rate of channel loss estimated from the length of the flowing reach at low flows was about half of the loss rate calculated from in-stream gain-loss measurements. Analysis based on water-surface width and channel slope indicated that losses were mainly in a reach downstream of the mapped recharge zone. The analysis based on water-surface width, however, did not indicate that this method could yield accurate estimates of actual flow in pool and riffle streams, such as the Frio River and similar rivers draining the Edwards Plateau.
Yu, Victoria; Kishan, Amar U.; Cao, Minsong; Low, Daniel; Lee, Percy; Ruan, Dan
2014-03-15
Purpose: To demonstrate a new method of evaluating dose response of treatment-induced lung radiographic injury post-SBRT (stereotactic body radiotherapy) treatment and the discovery of bimodal dose behavior within clinically identified injury volumes. Methods: Follow-up CT scans at 3, 6, and 12 months were acquired from 24 patients treated with SBRT for stage-1 primary lung cancers or oligometastatic lesions. Injury regions in these scans were propagated to the planning CT coordinates by performing deformable registration of the follow-ups to the planning CTs. A bimodal behavior was repeatedly observed from the probability distribution for dose values within the deformed injury regions. Based on a Gaussian-mixture assumption, an Expectation-Maximization (EM) algorithm was used to obtain characteristic parameters for such distribution. Geometric analysis was performed to interpret such parameters and infer the critical dose level that is potentially inductive of post-SBRT lung injury. Results: The Gaussian mixture obtained from the EM algorithm closely approximates the empirical dose histogram within the injury volume with good consistency. The average Kullback-Leibler divergence values between the empirical differential dose volume histogram and the EM-obtained Gaussian mixture distribution were calculated to be 0.069, 0.063, and 0.092 for the 3, 6, and 12 month follow-up groups, respectively. The lower Gaussian component was located at approximately 70% prescription dose (35 Gy) for all three follow-up time points. The higher Gaussian component, contributed by the dose received by planning target volume, was located at around 107% of the prescription dose. Geometrical analysis suggests the mean of the lower Gaussian component, located at 35 Gy, as a possible indicator for a critical dose that induces lung injury after SBRT. Conclusions: An innovative and improved method for analyzing the correspondence between lung radiographic injury and SBRT treatment dose has
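The fitting step described in this abstract (EM on a two-component Gaussian mixture, then a Kullback-Leibler comparison against the empirical histogram) can be sketched as below. This is an illustrative sketch only: the dose samples are synthetic and invented for the example, the 50 Gy prescription is inferred from "70% prescription dose (35 Gy)", and the initialisation, bin count, and function names are assumptions rather than the authors' procedure.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, iters=100):
    """Fit weights, means, and standard deviations of a 2-component 1D mixture."""
    s = sorted(xs)
    mu = [s[len(s) // 4], s[3 * len(s) // 4]]    # crude start: the quartiles
    sigma = [(s[-1] - s[0]) / 4.0] * 2
    weight = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate the parameters from the weighted samples.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-6)
    return weight, mu, sigma

def kl_histogram_vs_mixture(xs, weight, mu, sigma, bins=30):
    """KL(empirical histogram || mixture), mixture mass taken at bin midpoints."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    kl = 0.0
    for i, c in enumerate(counts):
        if c == 0:
            continue  # empty bins contribute nothing to the sum
        p = c / len(xs)
        centre = lo + (i + 0.5) * width
        q = width * sum(weight[k] * normal_pdf(centre, mu[k], sigma[k]) for k in range(2))
        kl += p * math.log(p / q)
    return kl

# Synthetic doses in Gy: one mode near 35 Gy, one near 53.5 Gy (107% of 50 Gy).
rng = random.Random(0)
doses = [rng.gauss(35.0, 4.0) for _ in range(600)] + [rng.gauss(53.5, 3.0) for _ in range(400)]
weight, mu, sigma = em_two_gaussians(doses)
kl = kl_histogram_vs_mixture(doses, weight, mu, sigma)
print(mu, kl)
```

A small divergence indicates that the two-component mixture reproduces the histogram well, which is the sense in which the abstract reports values below 0.1.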
Seismic data processing and interpretation on the loess plateau, Part 1: Seismic data processing
NASA Astrophysics Data System (ADS)
Jiang, Jiayu; Fu, Shouxian; Li, Jiuling
2005-12-01
Branching river channels and the coexistence of valleys, ridges, hills, and slopes as the result of long-term weathering and erosion form the unique loess topography. The Changqing Geophysical Company, working in these complex conditions, has established a suite of technologies for high-fidelity processing and fine interpretation of seismic data. This article introduces the processes involved in the data processing and interpretation and illustrates the results.
Interpretation of AMS-02 electrons and positrons data
Mauro, M. Di; Donato, F.; Fornengo, N.; Vittino, A.; Lineros, R.
2014-04-01
We perform a combined analysis of the recent AMS-02 data on electrons, positrons, electrons plus positrons and positron fraction, in a self-consistent framework where we realize a theoretical modeling of all the astrophysical components that can contribute to the observed fluxes in the whole energy range. The primary electron contribution is modeled through the sum of an average flux from distant sources and the fluxes from the local supernova remnants in the Green catalog. The secondary electron and positron fluxes originate from interactions on the interstellar medium of primary cosmic rays, for which we derive a novel determination by using AMS-02 proton and helium data. Primary positrons and electrons from pulsar wind nebulae in the ATNF catalog are included and studied in terms of their most significant (while loosely known) properties and under different assumptions (average contribution from the whole catalog, single dominant pulsar, a few dominant pulsars). We obtain a remarkable agreement between our various modeling and the AMS-02 data for all types of analysis, demonstrating that the whole AMS-02 leptonic data admit a self-consistent interpretation in terms of astrophysical contributions.
Revisiting the interpretation of casein micelle SAXS data.
Ingham, B; Smialowska, A; Erlangga, G D; Matia-Merino, L; Kirby, N M; Wang, C; Haverkamp, R G; Carr, A J
2016-08-17
An in-depth, critical review of model-dependent fitting of small-angle X-ray scattering (SAXS) data of bovine skim milk has led us to develop a new mathematical model for interpreting these data. Calcium-edge resonant soft X-ray scattering data provide unequivocal evidence as to the shape and location of the scattering due to colloidal calcium phosphate (CCP), which is manifested as a correlation peak centred at q = 0.035 Å⁻¹. In SAXS data this feature is seldom seen, although most literature studies attribute another feature centred at q = 0.08-0.1 Å⁻¹ to CCP. This work shows that the major SAXS features are due to protein arrangements: the casein micelle itself; internal regions approximately 20 nm in size, separated by water channels; and protein structures which are inhomogeneous on a 1-3 nm length scale. The assignment of these features is consistent with their behaviour under various conditions, including hydration time after reconstitution, addition of EDTA (a Ca-chelating agent), addition of urea, and reduction of pH. PMID:27491477
Importance of data management with statistical analysis set division.
Wang, Ling; Li, Chan-juan; Jiang, Zhi-wei; Xia, Jie-lai
2015-11-01
Hypothesis testing is affected by the division of statistical analysis sets, which is an important data management task before database lock. Objective division of the analysis sets under blinding guarantees scientifically sound trial conclusions. All subjects who received the trial treatment at least once after randomization should be included in the safety set. The full analysis set should be as close as possible to the intention-to-treat population. Division of the per-protocol set is the most difficult to control in blinded review because it is more subjective than the other two. The objectivity of analysis set division must be guaranteed by accurate raw data, comprehensive data checks, and scientific discussion, all of which are strict requirements of data management. Dividing the statistical analysis sets objectively and scientifically is an important way to improve data management quality. PMID:26911044
Tsitouridou, Roxani; Papazova, Petia; Simeonova, Pavlina; Simeonov, Vasil
2013-01-01
The size distribution of aerosol particles (PM0.015-PM18) in relation to their soluble inorganic species and total water soluble organic compounds (WSOC) was investigated at an urban site of Thessaloniki, Northern Greece. The sampling period was from February to July 2007. The determined compounds were compared with mass concentrations of the PM fractions for nano (N: 0.015 < Dp < 0.06), ultrafine (UFP: 0.015 < Dp < 0.125), fine (FP: 0.015 < Dp < 2.0) and coarse particles (CP: 2.0 < Dp < 8.0) in order to perform mass closure of the water soluble content for the respective fractions. Electrolytes were the dominant species in all fractions (24-27%), followed by WSOC (16-23%). The water soluble inorganic and organic content was found to account for 53% of the nanoparticle, 48% of the ultrafine particle, 45% of the fine particle and 44% of the coarse particle mass. Correlations between the analyzed species were performed and the effect of local and long-range transported emissions was examined by wind direction and backward air mass trajectories. Multivariate statistical analysis (cluster analysis and principal components analysis) of the collected data was performed in order to reveal the specific data structure. Possible sources of air pollution were identified and an attempt is made to find patterns of similarity between the different sized aerosols and the seasons of monitoring. It was proven that several major latent factors are responsible for the data structure despite the size of the aerosols - mineral (soil) dust, sea sprays, secondary emissions, combustion sources and industrial impact. The seasonal separation proved to be not very specific. PMID:24007436
Interpreting two-photon imaging data of lymphocyte motility.
Meyer-Hermann, Michael E; Maini, Philip K
2005-06-01
Recently, using two-photon imaging it has been found that the movement of B and T cells in lymph nodes can be described by a random walk with persistence of orientation in the range of 2 minutes. We interpret this new class of lymphocyte motility data within a theoretical model. The model considers cell movement to be composed of the movement of subunits of the cell membrane. In this way movement and deformation of the cell are correlated to each other. We find that, indeed, the lymphocyte movement in lymph nodes can best be described as a random walk with persistence of orientation. The assumption of motility induced cell elongation is consistent with the data. Within the framework of our model the two-photon data suggest that T and B cells are in a single velocity state with large stochastic width. The alternative of three different velocity states with frequent changes of their state and small stochastic width is less likely. Two velocity states can be excluded. PMID:16089770
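A random walk with persistence of orientation can be illustrated with a short simulation. This sketch is not the membrane-subunit model of the paper: it simply keeps the heading fixed for a persistence time (2 minutes, as quoted above) and then draws a new random heading. The speed of 10 µm/min, the time step, and the function name are hypothetical values chosen for illustration.

```python
import math
import random

def persistent_walk(total_min, persistence_min=2.0, speed=10.0, dt=0.5, rng=None):
    """Simulate a 2D walk; returns the list of (x, y) positions in micrometres.

    The heading is held for `persistence_min` minutes, then redrawn uniformly.
    `speed` (micrometres per minute) is a hypothetical single velocity state.
    """
    if rng is None:
        rng = random.Random(0)
    x = y = 0.0
    path = [(x, y)]
    angle = rng.uniform(0.0, 2.0 * math.pi)
    steps_per_turn = max(1, round(persistence_min / dt))
    for step in range(int(total_min / dt)):
        if step > 0 and step % steps_per_turn == 0:
            angle = rng.uniform(0.0, 2.0 * math.pi)  # reorient
        x += speed * dt * math.cos(angle)
        y += speed * dt * math.sin(angle)
        path.append((x, y))
    return path

path = persistent_walk(total_min=10.0)
# Displacement after one persistence time is ballistic: speed * 2 min.
print(math.hypot(*path[4]))
```

On time scales shorter than the persistence time the displacement grows linearly with time; on longer time scales the reorientations make it grow diffusively, which is the signature such tracking data are tested against.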
Enhancements to the Engine Data Interpretation System (EDIS)
NASA Technical Reports Server (NTRS)
Hofmann, Martin O.
1993-01-01
The Engine Data Interpretation System (EDIS) expert system project assists the data review personnel at NASA/MSFC in performing post-test data analysis and engine diagnosis of the Space Shuttle Main Engine (SSME). EDIS uses knowledge of the engine, its components, and simple thermodynamic principles instead of, and in addition to, heuristic rules gathered from the engine experts. EDIS reasons in cooperation with human experts, following roughly the pattern of logic exhibited by human experts. EDIS concentrates on steady-state static faults, such as small leaks, and component degradations, such as pump efficiencies. The objective of this contract was to complete the set of engine component models, integrate heuristic rules into EDIS, integrate the Power Balance Model into EDIS, and investigate modification of the qualitative reasoning mechanisms to allow 'fuzzy' value classification. The result of this contract is an operational version of EDIS. EDIS will become a module of the Post-Test Diagnostic System (PTDS) and will, in this context, provide system-level diagnostic capabilities which integrate component-specific findings provided by other modules.
Phase 1 report on sensor technology, data fusion and data interpretation for site characterization
Beckerman, M.
1991-10-01
In this report we discuss sensor technology, data fusion and data interpretation approaches of possible maximal usefulness for subsurface imaging and characterization of land-fill waste sites. Two sensor technologies, terrain conductivity using electromagnetic induction and ground penetrating radar, are described and the literature on the subject is reviewed. We identify the maximum entropy stochastic method as one providing a rigorously justifiable framework for fusing the sensor data, briefly summarize work done by us in this area, and examine some of the outstanding issues with regard to data fusion and interpretation. 25 refs., 17 figs.
[Relationship of statistics and data management in clinical trials].
Chen, Feng; Sun, Hua-long; Shen, Tong; Yu, Hao
2015-11-01
A perfect clinical trial must have a solid study design, rigorous conduct, complete quality control, non-interference of statistical results, and an acceptable risk-benefit ratio. To reach this target, quality control (QC) should be performed from study design to conduct, and from analysis to conclusion. We discuss the relationship between data management and biostatistics from the statistical point of view, and emphasize the importance of statistical concepts and methods in improving data quality in clinical data management.
Using Data Mining to Teach Applied Statistics and Correlation
ERIC Educational Resources Information Center
Hartnett, Jessica L.
2016-01-01
This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…
Bayesian Analysis of Order-Statistics Models for Ranking Data.
ERIC Educational Resources Information Center
Yu, Philip L. H.
2000-01-01
Studied the order-statistics models, extending the usual normal order-statistics model into one in which the underlying random variables followed a multivariate normal distribution. Used a Bayesian approach and the Gibbs sampling technique. Applied the proposed method to analyze presidential election data from the American Psychological…
Library Statistics of Colleges and Universities, Fall 1975. Institutional Data.
ERIC Educational Resources Information Center
Smith, Stanley V.
This report is part of the National Center for Education Statistics' Tenth Annual Higher Education General Information Survey (HEGIS), and also part of the first Library General Information Survey (LIBGIS). Extensive statistical data are presented in seven lengthy tables as follows: (1) Number of Units Held at End of Year in Library Collection,…
NASA Astrophysics Data System (ADS)
Bouzid, Mohamed; Sellaoui, Lotfi; Khalfaoui, Mohamed; Belmabrouk, Hafedh; Lamine, Abdelmottaleb Ben
2016-02-01
In this work, we studied the adsorption of ethanol on three types of activated carbon, namely parent Maxsorb III and two chemically modified activated carbons (H2-Maxsorb III and KOH-H2-Maxsorb III). This investigation has been conducted on the basis of the grand canonical formalism in statistical physics and on simplified assumptions. This led to three-parameter equations describing the adsorption of ethanol onto the three types of activated carbon. There was a good correlation between experimental data and results obtained by the new proposed equation. The parameters characterizing the adsorption isotherm were the number of adsorbed molecule(s) per site, n, the density of the receptor sites per unit mass of the adsorbent, Nm, and the energetic parameter p1/2. They were estimated for the studied systems by nonlinear least-squares regression. The results show that the ethanol molecules were adsorbed in a perpendicular (or non-parallel) position to the adsorbent surface. The magnitude of the calculated adsorption energies reveals that ethanol is physisorbed onto activated carbon. Both van der Waals and hydrogen interactions were involved in the adsorption process. The calculated values of the specific surface AS showed that the three types of activated carbon have a highly microporous surface.
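The abstract does not state the fitted equation. As a hedged illustration of how the three parameters n, Nm, and p1/2 can enter an isotherm, the sketch below assumes a Hill-type form that is common in grand-canonical adsorption treatments, Q(p) = n·Nm / (1 + (p1/2 / p)^n). Both the functional form and the numerical values are assumptions for this example, not results from the paper.

```python
import math

def adsorbed_quantity(p, n, nm, p_half):
    """Assumed Hill-type isotherm: Q(p) = n * Nm / (1 + (p_half / p) ** n).

    p      : pressure (same unit as p_half), must be > 0
    n      : number of molecules adsorbed per site
    nm     : density of receptor sites per unit mass of adsorbent
    p_half : pressure at which half of the saturation uptake is reached
    """
    return n * nm / (1.0 + (p_half / p) ** n)

# Hypothetical parameter values, chosen only to show the curve's behaviour.
n, nm, p_half = 1.5, 4.0, 2.0
for p in (0.5, 2.0, 8.0, 200.0):
    print(p, adsorbed_quantity(p, n, nm, p_half))
```

Under this assumed form the uptake saturates at n·Nm and reaches exactly half of that value at p = p1/2, which is why p1/2 acts as the energetic parameter of the fit.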
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
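A "what if" analysis of this kind holds the observed effect fixed and asks how the p-value would change with sample size. The sketch below uses Python rather than Excel or R, and a one-sample two-sided z-test as a simplifying assumption; the effect size of 0.2 and the sample sizes are illustrative values, not figures from the paper.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(effect_size, n):
    """Two-sided p-value of a one-sample z-test for a standardized effect.

    effect_size is the mean difference divided by the standard deviation,
    so the test statistic is z = effect_size * sqrt(n).
    """
    z = effect_size * sqrt(n)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Same effect, different hypothetical sample sizes.
for n in (10, 50, 100, 400):
    print(n, round(two_sided_p(0.2, n), 4))
```

The same small effect is non-significant at the 0.05 level with 50 observations and significant with 100, which is the point such exercises are meant to make about what significance tests do and do not measure.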
Identification and interpretation of patterns in rocket engine data
NASA Technical Reports Server (NTRS)
Lo, C. F.; Wu, K.; Whitehead, B. A.
1993-01-01
A prototype software system was constructed to detect anomalous Space Shuttle Main Engine (SSME) behavior in the early stages of fault development, significantly earlier than the indication provided by either the redline detection mechanism or human expert analysis. The major task of the research project is to analyze ground test data, to identify patterns associated with the anomalous engine behavior, and to develop a pattern identification and detection system on the basis of this analysis. A prototype expert system, developed on both a PC and a Symbolics 3670 Lisp machine for detecting anomalies in turbopump vibration data, was checked with data from ground tests 902-473, 902-501, 902-519, and 904-097 of the Space Shuttle Main Engine. The neural networks method was also applied to supplement the statistical method utilized in the prototype system, to investigate its feasibility for detecting anomalies in SSME turbopump vibration. In most cases the anomalies detected by the expert system agree with those reported by NASA. The neural networks approach achieved a successful detection rate higher than 95 percent in identifying normal or abnormal running conditions, based on experimental data as well as numerical simulation.
3D seismic data interpretation of Boonsville Field, Texas
NASA Astrophysics Data System (ADS)
Alhakeem, Aamer Ali
The Boonsville field is one of the largest gas fields in the US, located in the Fort Worth Basin, north central Texas. The highest potential reservoirs reside in the Bend Conglomerate, deposited during the Pennsylvanian. The Boonsville data set was prepared by the Bureau of Economic Geology at the University of Texas, Austin, as part of the secondary gas recovery program. The Boonsville field seismic data set covers an area of 5.5 mi² and includes data from 38 wells. The Bend Conglomerate was deposited in a fluvio-deltaic transition. It is subdivided into many genetic sequences, which include depositions of sandy conglomerate representing the potential reserves in the Boonsville field. The geologic structure of the Boonsville field subsurface is visualized by constructing structure maps of Caddo, Davis, Runaway, Beans Cr, Vineyard, and Wade. The mapping includes time structure, depth structure, horizon slice, velocity maps, and isopach maps. Many anticlines and folds are illustrated. Karst collapse features are indicated, especially in the lower Atoka. The dip direction of the Bend Conglomerate horizons changes from northward at the top to eastward at the bottom. Stratigraphic interpretation of the Runaway Formation and the Vineyard Formation using well logs and seismic data integration showed the presence of fluvial-dominated channels, point bars, and a mouth bar. RMS amplitude maps are generated and used as direct hydrocarbon indicators for the targeted formations. As a result, bright spots are indicated and used to identify potential reservoirs. Petrophysical analysis is conducted to obtain gross, net pay, NGR, water saturation, shale volume, porosity, and gas formation factor. Volumetric calculations estimated 989.44 MMSCF as the recoverable original gas in-place for a prospect in the Runaway and 3.32 BSCF for a prospect in the Vineyard Formation.
Chromosome microarrays in diagnostic testing: interpreting the genomic data.
Peters, Greg B; Pertile, Mark D
2014-01-01
DNA-based Chromosome MicroArrays (CMAs) are now well established as diagnostic tools in clinical genetics laboratories. Over the last decade, the primary application of CMAs has been the genome-wide detection of a particular class of mutation known as copy number variants (CNVs). Since 2010, CMA testing has been recommended as a first-tier test for detection of CNVs associated with intellectual disability, autism spectrum disorders, and/or multiple congenital anomalies…in the post-natal setting. CNVs are now regarded as pathogenic in 14-18% of patients referred for these (and related) disorders. Through consideration of clinical examples, and several microarray platforms, we attempt to provide an appreciation of microarray diagnostics, from the initial inspection of the microarray data, to the composing of the patient report. In CMA data interpretation, a major challenge comes from the high frequency of clinically irrelevant CNVs observed within "patient" and "normal" populations. As might be predicted, the more common and clinically insignificant CNVs tend to be the smaller ones <100 kb in length, involving few or no known genes. However, this relationship is not at all straightforward: CNV length and gene content are only very imperfect indicators of CNV pathogenicity. Presently, there are no reliable means of separating, a priori, the benign from the pathological CNV classes. This chapter also considers sources of technical "noise" within CMA data sets. Some level of noise is inevitable in diagnostic genomics, given the very large number of data points generated in any one test. Noise further limits CMA resolution, and some miscalling of CNVs is unavoidable. In this, there is no ideal solution, but various strategies for handling noise are available. Even without solutions, consideration of these diagnostic problems per se is informative, as they afford critical insights into the biological and technical underpinnings of CNV discovery. These are indispensable
NASA Technical Reports Server (NTRS)
Bhatia, A. K.; Underhill, A. B.
1986-01-01
The interpretation of the intensities of the hydrogen and helium emission lines in O and Wolf-Rayet spectra in terms of the abundance of hydrogen relative to helium requires information regarding the distribution of hydrogen and helium atoms and ions over their several energy states. In addition, some estimate is needed regarding the transmission of the radiation through the stellar mantle. The present paper provides new information concerning the population of the energy levels of hydrogen and helium when statistical equilibrium occurs in the presence of a radiation field. The results are applied to an interpretation of the spectra of four Wolf-Rayet stars, taking into account the implications for interpreting the spectra of O stars, OB supergiants, and Be stars.
Boyle temperature as a point of ideal gas in Gentile statistics and its economic interpretation
NASA Astrophysics Data System (ADS)
Maslov, V. P.; Maslova, T. V.
2014-07-01
Boyle temperature is interpreted as the temperature at which the formation of dimers becomes impossible. To Irving Fisher's correspondence principle we assign two more quantities: the number of degrees of freedom, and credit. We determine the danger level of the mass of money M when the mutual trust between economic agents begins to fall.
Antweiler, R.C.; Taylor, H.E.
2008-01-01
The main classes of statistical treatment of below-detection limit (left-censored) environmental data for the determination of basic statistics that have been used in the literature are substitution methods, maximum likelihood, regression on order statistics (ROS), and nonparametric techniques. These treatments, along with using all instrument-generated data (even those below detection), were evaluated by examining data sets in which the true values of the censored data were known. It was found that for data sets with less than 70% censored data, the best technique overall for determination of summary statistics was the nonparametric Kaplan-Meier technique. ROS and the two substitution methods of assigning one-half the detection limit value to censored data or assigning a random number between zero and the detection limit to censored data were adequate alternatives. The use of these two substitution methods, however, requires a thorough understanding of how the laboratory censored the data. The technique of employing all instrument-generated data - including numbers below the detection limit - was found to be less adequate than the above techniques. At high degrees of censoring (greater than 70% censored data), no technique provided good estimates of summary statistics. Maximum likelihood techniques were found to be far inferior to all other treatments except substituting zero or the detection limit value to censored data.
Accuracy and Efficiency of Data Interpretation: A Comparison of Data Display Methods
ERIC Educational Resources Information Center
Lefebre, Elizabeth; Fabrizio, Michael; Merbitz, Charles
2008-01-01
Although behavior analysis relies primarily on visual inspection for interpreting data, previous research shows that the method of display can influence the judgments. In the current study, 26 Board-Certified Behavior Analysts reviewed two data sets displayed on each of three methods--equal-interval graphs, tables, and Standard Celeration…
Statistical methods of combining information: Applications to sensor data fusion
Burr, T.
1996-12-31
This paper reviews some statistical approaches to combining information from multiple sources. Promising new approaches will be described, and potential applications to combining not-so-different data sources such as sensor data will be discussed. Experiences with one real data set are described.
Cryovolcanism on Titan: Interpretations from Cassini RADAR data
NASA Astrophysics Data System (ADS)
Lopes, R. M.; Wall, S. D.; Stofan, E. R.; Wood, C. A.; Nelson, R. M.; Mitchell, K. L.; Radebaugh, J.; Stiles, B. W.; Kamp, L. W.; Lorenz, R. D.; Lunine, J. I.; Janssen, M. A.; Farr, T. G.; Mitri, G.; Kirk, R.; Paganelli, F.
2008-12-01
Several surface features interpreted as cryovolcanic in origin have been observed on the surface of Titan by both Cassini's RADAR (in SAR mode) and VIMS instruments throughout the Cassini prime mission. These include large flows, an eroded volcanic dome or shield, and calderas associated with flows. The Titan flyby T41 of 22 February 2008 includes a SAR image of part of Hotei Arcus, a semi-circular albedo feature, some 650 km in length along the arc, centered at 26S 79W. A second SAR image of Hotei was acquired May 12, 2008 on flyby T43. These images show that the arcuate southern boundary of Hotei, also seen in ISS data, appears somewhat mountainous in the SAR imagery, and 5 distinct narrow channels, presumably fluvial, flow radially inwards. In the center of the arc, the images reveal lobate, flowlike features that embay surrounding terrains and cover the channels. Analysis of these features suggests that they are of cryovolcanic origin and younger than surrounding terrain. Their appearance is superficially similar to a region in western Xanadu at 10S 140W imaged by RADAR on flyby T13, on Apr 30, 2006. These two regions are morphologically unlike most of the other cryovolcanic regions so far seen on Titan. Both regions correspond to those identified by the Cassini VIMS as having anomalous and variable infrared brightness, probably due to recent cryovolcanic activity. The RADAR images provide morphological evidence that is consistent with cryovolcanism.
Statistical summaries of selected Iowa streamflow data through September 2013
Eash, David A.; O'Shea, Padraic S.; Weber, Jared R.; Nguyen, Kevin T.; Montgomery, Nicholas L.; Simonson, Adrian J.
2016-01-04
Statistical summaries of streamflow data collected at 184 streamgages in Iowa are presented in this report. All streamgages included for analysis have at least 10 years of continuous record collected before or through September 2013. This report is an update to two previously published reports that presented statistical summaries of selected Iowa streamflow data through September 1988 and September 1996. The statistical summaries include (1) monthly and annual flow durations, (2) annual exceedance probabilities of instantaneous peak discharges (flood frequencies), (3) annual exceedance probabilities of high discharges, and (4) annual nonexceedance probabilities of low discharges and seasonal low discharges. Also presented for each streamgage are graphs of the annual mean discharges, mean annual mean discharges, 50-percent annual flow-duration discharges (median flows), harmonic mean flows, mean daily mean discharges, and flow-duration curves. Two sets of statistical summaries are presented for each streamgage, which include (1) long-term statistics for the entire period of streamflow record and (2) recent-term statistics for or during the 30-year period of record from 1984 to 2013. The recent-term statistics are only calculated for streamgages with streamflow records pre-dating the 1984 water year and with at least 10 years of record during 1984–2013. The streamflow statistics in this report are not adjusted for the effects of water use; although some of this water is used consumptively, most of it is returned to the streams.
Pleil, Joachim D; Sobus, Jon R; Stiegel, Matthew A; Hu, Di; Oliver, Karen D; Olenick, Cassandra; Strynar, Mark; Clark, Mary; Madden, Michael C; Funk, William E
2014-01-01
The progression of science is driven by the accumulation of knowledge and builds upon published work of others. Another important feature is to place current results into the context of previous observations. The published literature, however, often does not provide sufficient direct information for the reader to interpret the results beyond the scope of that particular article. Authors tend to provide only summary statistics in various forms, such as means and standard deviations, median and range, quartiles, 95% confidence intervals, and so on, rather than providing measurement data. Second, essentially all environmental and biomonitoring measurements have an underlying lognormal distribution, so certain published statistical characterizations may be inappropriate for comparisons. The aim of this study was to review and develop direct conversions of different descriptions of data into a standard format comprised of the geometric mean (GM) and the geometric standard deviation (GSD) and then demonstrate how, under the assumption of lognormal distribution, these parameters are used to answer questions of confidence intervals, exceedance levels, and statistical differences among distributions. A wide variety of real-world measurement data sets was reviewed, and it was demonstrated that these data sets are indeed of lognormal character, thus making them amenable to these methods. Potential errors incurred from making retrospective estimates from disparate summary statistics are described. In addition to providing tools to interpret "other people's data," this review should also be seen as a cautionary tale for publishing one's own data to make it as useful as possible for other researchers.
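The conversion idea in the abstract above can be illustrated with a minimal sketch. It assumes the data are exactly lognormal and that only an arithmetic mean and standard deviation were published; the function names are hypothetical and are not taken from the paper, which covers more kinds of summary statistics than this.

```python
import math


def gm_gsd_from_mean_sd(mean, sd):
    """Convert an arithmetic mean and SD to (GM, GSD), assuming lognormal data."""
    if mean <= 0 or sd < 0:
        raise ValueError("mean must be positive and sd non-negative")
    cv_squared = (sd / mean) ** 2
    sigma_squared = math.log(1.0 + cv_squared)   # variance of ln(X)
    mu = math.log(mean) - sigma_squared / 2.0    # mean of ln(X)
    return math.exp(mu), math.exp(math.sqrt(sigma_squared))


def central_interval(gm, gsd, z=1.96):
    """Range holding the central ~95% of values (for z = 1.96) of a lognormal."""
    return gm / gsd ** z, gm * gsd ** z


# Illustrative input: a lognormal with ln-mean 0 and ln-SD 1.
example_mean = math.exp(0.5)
example_sd = math.sqrt((math.e - 1.0) * math.e)
gm, gsd = gm_gsd_from_mean_sd(example_mean, example_sd)
low, high = central_interval(gm, gsd)
```

Working on the log scale keeps the interval multiplicative (GM divided and multiplied by a power of the GSD), which is why GM and GSD are a convenient common format for comparing published summaries.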
The Galactic Center: possible interpretations of observational data.
NASA Astrophysics Data System (ADS)
Zakharov, Alexander
2015-08-01
There are not many astrophysical cases where one really has an opportunity to check predictions of general relativity in the strong gravitational field limit. For this purpose the black hole at the Galactic Center is one of the most interesting cases, since it is the closest supermassive black hole. Gravitational lensing is a natural phenomenon based on the effect of light deflection in a gravitational field (isotropic geodesics are not straight lines in a gravitational field; in a weak gravitational field one has small corrections for light deflection, while the perturbative approach is not suitable for a strong gravitational field). There are now two basic observational techniques for investigating the gravitational potential at the Galactic Center: a) monitoring the orbits of bright stars near the Galactic Center to reconstruct the gravitational potential; b) measuring the size and shape of the shadow around the black hole, which gives an alternative way to evaluate black hole parameters in the mm band with the VLBI technique. At the moment one can use a small relativistic correction approach for stellar orbit analysis (however, in the future this approximation will not be precise enough, owing to the enormous progress of observational facilities), while for analysis of the smallest structures in VLBI observations one really needs a strong gravitational field approximation. We discuss results of observations, their conventional interpretations, tensions between observations and models, and possible hints of new physics from the observational data and from tensions between observations and interpretations. References: 1. A.F. Zakharov, F. De Paolis, G. Ingrosso, and A. A. Nucita, New Astronomy Reviews, 56, 64 (2012). 2. D. Borka, P. Jovanovic, V. Borka Jovanovic and A.F. Zakharov, Physical Review D, 85, 124004 (2012). 3. D. Borka, P. Jovanovic, V. Borka Jovanovic and A.F. Zakharov, Journal of Cosmology and Astroparticle Physics, 11, 050 (2013). 4. A.F. Zakharov, Physical Review D 90
[Some basic aspects in statistical analysis of visual acuity data].
Ren, Ze-Qin
2007-06-01
All visual acuity charts currently in use have their own shortcomings, which makes it difficult for ophthalmologists to evaluate visual acuity data. Many problems arise in the use of statistical methods for handling visual acuity data in clinical research. The quantitative relationship between visual acuity and visual angle varies among visual acuity charts. The types of visual acuity and visual angle data also differ from each other. Therefore, different statistical methods should be used for different data sources. A correct understanding and analysis of visual acuity data can be obtained only after these aspects have been elucidated.
Social inequality: from data to statistical physics modeling
NASA Astrophysics Data System (ADS)
Chatterjee, Arnab; Ghosh, Asim; Inoue, Jun-ichi; Chakrabarti, Bikas K.
2015-09-01
Social inequality has been a topic of interest for ages, and has attracted researchers across disciplines to ponder its origin, manifestation, characteristics, consequences, and finally, the question of how to cope with it. It is manifested across different strata of human existence, and is quantified in several ways. In this review we discuss the origins of social inequality and the historical and commonly used non-entropic measures such as the Lorenz curve, the Gini index and the recently introduced k index. We also discuss some analytical tools that aid in understanding and characterizing them. Finally, we argue how statistical physics modeling helps in reproducing the results and interpreting them.
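As an illustration of two of the measures named in the abstract above, the following minimal sketch computes Lorenz curve points and the Gini index for a small sample. The function names and the sample values are hypothetical, the formula is the standard sample form based on sorted values, and the k index is not covered here.

```python
def lorenz_points(values):
    """Return (population share, cumulative wealth share) pairs, poorest first."""
    xs = sorted(values)
    total = sum(xs)
    if not xs or total <= 0:
        raise ValueError("need at least one value and a positive total")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points


def gini(values):
    """Sample Gini index: 0 for perfect equality, (n - 1) / n when one holds all."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


equal_society = gini([1, 1, 1, 1])
unequal_society = gini([0, 0, 0, 1])
curve = lorenz_points([1, 1, 2])
```

The Gini index equals twice the area between the Lorenz curve and the line of perfect equality, so the two functions describe the same inequality from different angles.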
Statistical Modeling of Large-Scale Simulation Data
Eliassi-Rad, T; Critchlow, T; Abdulla, G
2002-02-22
With the advent of fast computer systems, scientists are now able to generate terabytes of simulation data. Unfortunately, the sheer size of these data sets has made efficient exploration of them impossible. To aid scientists in gathering knowledge from their simulation data, we have developed an ad-hoc query infrastructure. Our system, called AQSim (short for Ad-hoc Queries for Simulation), reduces the data storage requirements and access times in two stages. First, it creates and stores mathematical and statistical models of the data. Second, it evaluates queries on the models of the data instead of on the entire data set. In this paper, we present two simple but highly effective statistical modeling techniques for simulation data. Our first modeling technique computes the true mean of systematic partitions of the data. It makes no assumptions about the distribution of the data and uses a variant of the root mean square error to evaluate a model. In our second statistical modeling technique, we use the Anderson-Darling goodness-of-fit method on systematic partitions of the data. This second method evaluates a model by how well it passes the normality test on the data. Both of our statistical models summarize the data so as to answer range queries in the most effective way. We calculate precision on an answer to a query by scaling the one-sided Chebyshev Inequalities with the original mesh's topology. Our experimental evaluations on two scientific simulation data sets illustrate the value of using these statistical modeling techniques on large simulation data sets.
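The first modeling technique described above (means of systematic partitions, scored by a root-mean-square-type error) might be sketched as follows. This is only an illustration under simplifying assumptions: fixed-size partitions of a 1-D list, the plain RMSE rather than the paper's variant, and the textbook one-sided Chebyshev (Cantelli) bound without the mesh-topology scaling. All names are hypothetical and are not AQSim's API.

```python
import math


def partition_means(data, size):
    """Summarize the data by the mean of each consecutive block of `size` values."""
    chunks = (data[i:i + size] for i in range(0, len(data), size))
    return [sum(chunk) / len(chunk) for chunk in chunks]


def model_rmse(data, size):
    """RMSE incurred when every value is replaced by its partition mean."""
    means = partition_means(data, size)
    squared = sum((x - means[i // size]) ** 2 for i, x in enumerate(data))
    return math.sqrt(squared / len(data))


def one_sided_chebyshev(k):
    """Cantelli bound: P(X - mean >= k * sd) <= 1 / (1 + k^2), for k > 0."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / (1.0 + k * k)


values = [1.0, 3.0, 5.0, 7.0]
means = partition_means(values, 2)
error = model_rmse(values, 2)
tail_bound = one_sided_chebyshev(1.0)
```

Storing only the partition means shrinks the data, and the error score indicates how much detail a range query answered from the model may lose.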
Textual Analysis and Data Mining: An Interpreting Research on Nursing.
De Caro, W; Mitello, L; Marucci, A R; Lancia, L; Sansoni, J
2016-01-01
Every day there is a data explosion on the web. In 2013, 5 exabytes of content were created each day. Every hour, internet networks carry a quantity of text equivalent to twenty billion books. This is a huge mass of information on the linguistic behavior of people and society that was unthinkable until a few years ago. It is an opportunity for valuable analysis for understanding social phenomena, including in the nursing and health care sector. This poster shows the steps of an ideal strategy for textual statistical analysis and the process of extracting useful information about health care, especially nursing care, from journal and web information. We show the potential of web-based text mining tools (DTM, Wordle, Voyant Tools, Taltac 2.10, Treecloud and other web 2.0 apps) for analyzing text data and extracting information about the sentiment, perception, scientific activities and visibility of nursing. This specific analysis was conducted on "Repubblica", the first newspaper in Italy (years of analysis: 2012-14), and one Italian scientific nursing journal (years: 2012-14). PMID:27332424
Korjus, Kristjan; Hebart, Martin N.; Vicente, Raul
2016-01-01
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier’s generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term “Cross-validation and cross-testing” improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do. PMID:27564393
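For orientation, the standard workflow that the abstract above takes as its baseline (cross-validation to choose parameters, then a single evaluation on a separate test set) can be sketched as below. This is not the proposed "cross-validation and cross-testing" procedure. The 1-D k-nearest-neighbour classifier, the simulated data and all names are assumptions made for illustration.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k nearest (feature, label) pairs; labels are 0/1."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points whose predicted label matches the true label."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def select_k_by_cv(data, candidates, folds=5):
    """Pick the candidate k with the best mean validation accuracy over folds."""
    best_k, best_score = None, -1.0
    for k in candidates:
        scores = []
        for fold in range(folds):
            held_out = set(range(fold, len(data), folds))
            validation = [data[i] for i in sorted(held_out)]
            training = [d for i, d in enumerate(data) if i not in held_out]
            scores.append(accuracy(training, validation, k))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_k, best_score = k, mean_score
    return best_k, best_score


# Simulated two-class data: class 0 centred at 0.0, class 1 centred at 2.0.
rng = random.Random(0)
data = [(rng.gauss(2.0 * label, 1.0), label) for label in (0, 1) for _ in range(60)]
rng.shuffle(data)
development, test_set = data[:90], data[90:]

best_k, cv_accuracy = select_k_by_cv(development, [1, 3, 5, 7])
test_accuracy = accuracy(development, test_set, best_k)  # test data used once
```

The test set is touched only in the final line, which is what keeps the generalization estimate unbiased and also what limits statistical power when data are scarce.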
Estimation of global network statistics from incomplete data.
Bliss, Catherine A; Danforth, Christopher M; Dodds, Peter Sheridan
2014-01-01
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week.
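The general idea of rescaling a subsampled statistic, as described in the abstract above, can be illustrated with the simplest case: if every link is observed independently with a known probability q, the expected number of observed links is q times the true number, so dividing by q gives an unbiased estimate. The sketch below is a generic illustration under that assumption, with hypothetical names and a simulated network; it is not a reproduction of the authors' estimators.

```python
import random


def subsample_links(edges, q, rng):
    """Keep each link independently with probability q."""
    return [edge for edge in edges if rng.random() < q]


def estimate_edge_count(observed_edges, q):
    """Scale the observed link count up by the known sampling probability."""
    return len(observed_edges) / q


def estimate_mean_degree(observed_edges, n_nodes, q):
    """Mean degree is 2M / N, so the same 1/q scaling applies (all nodes known)."""
    return 2.0 * len(observed_edges) / (n_nodes * q)


# Simulated network: 200 nodes and 2000 links chosen uniformly at random.
rng = random.Random(42)
n_nodes = 200
all_pairs = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
true_edges = rng.sample(all_pairs, 2000)

observed = subsample_links(true_edges, 0.5, rng)
edge_estimate = estimate_edge_count(observed, 0.5)
degree_estimate = estimate_mean_degree(observed, n_nodes, 0.5)
```

Other sampling schemes need different factors: for example, sampling a fraction q of nodes and keeping only links between sampled nodes retains roughly q squared of the links.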
Professional judgment and the interpretation of viable mold air sampling data.
Johnson, David; Thompson, David; Clinkenbeard, Rodney; Redus, Jason
2008-10-01
Although mold air sampling is technically straightforward, interpreting the results to decide if there is an indoor source is not. Applying formal statistical tests to mold sampling data is an error-prone practice due to the extreme data variability. With neither established exposure limits nor useful statistical techniques, indoor air quality investigators often must rely on their professional judgment, but the lack of a consensus "decision strategy" incorporating explicit decision criteria requires professionals to establish their own personal set of criteria when interpreting air sampling data. This study examined the level of agreement among indoor air quality practitioners in their evaluation of airborne mold sampling data and explored differences in inter-evaluator assessments. Eighteen investigators independently judged 30 sets of viable mold air sampling results to indicate: "definite indoor mold source," "likely indoor mold source," "not enough information to decide," "likely no indoor mold source," or "definitely no indoor mold source." Kappa coefficient analysis indicated weak inter-observer reliability, and comparison of evaluator mean scores showed clear inter-evaluator differences in their overall scoring patterns. The responses were modeled on indicator "traits" of the data sets using a generalized linear mixed model approach and showed several traits to be associated with respondents' ratings, but they also demonstrated distinct and divergent inter-evaluator response patterns. Conclusions were that there was only weak overall agreement in evaluation of the mold sampling data, that particular traits of the data were associated with the conclusions reached, and that there were substantial inter-evaluator differences that were likely due to differences in the personal decision criteria employed by the individual evaluators. The overall conclusion was that there is a need for additional work to rigorously explore the constellation of decision criteria
ERIC Educational Resources Information Center
Lipsey, Mark W.; Puzio, Kelly; Yun, Cathy; Hebert, Michael A.; Steinka-Fry, Kasia; Cole, Mikel W.; Roberts, Megan; Anthony, Karen S.; Busick, Matthew D.
2012-01-01
This paper is directed to researchers who conduct and report education intervention studies. Its purpose is to stimulate and guide them to go a step beyond reporting the statistics that emerge from their analysis of the differences between experimental groups on the respective outcome variables. With what is often very minimal additional effort,…
Mars Geological Province Designations for the Interpretation of GRS Data
NASA Technical Reports Server (NTRS)
Dohm, J. M.; Kerry, K.; Baker, V. R.; Boynton, W.; Maruyama, Shige; Anderson, R. C.
2005-01-01
elemental information, we have defined geologic provinces that represent significant windows into the geological evolution of Mars, unfolding the GEOMARS Theory and forming the basis for interpreting GRS data.
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥ 50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥ 0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data.
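For readers unfamiliar with the statistic itself, the point estimate of Cohen's kappa for two raters can be sketched as below. This is the ordinary kappa for independent pairs. The clustered variance estimator proposed in the abstract above is not reproduced, and the function name and example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    # Chance agreement: product of the two raters' marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


partial = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
perfect = cohen_kappa([1, 0, 1, 0], [1, 0, 1, 0])
```

In the first example the raters agree on 3 of 4 pairs while chance alone would give 0.5, so kappa is 0.5; identical ratings give kappa of 1.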
Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?
Zhu, Yeyi; Hernandez, Ladia M.; Mueller, Peter; Dong, Yongquan; Forman, Michele R.
2013-01-01
The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study. PMID:24511148
Using Carbon Emissions Data to "Heat Up" Descriptive Statistics
ERIC Educational Resources Information Center
Brooks, Robert
2012-01-01
This article illustrates using carbon emissions data in an introductory statistics assignment. The carbon emissions data has desirable characteristics including: choice of measure; skewness; and outliers. These complexities allow research and public policy debate to be introduced. (Contains 4 figures and 2 tables.)
ERIC Educational Resources Information Center
Maltese, Adam V.; Svetina, Dubravka; Harsh, Joseph A.
2015-01-01
In the STEM fields, adequate proficiency in reading and interpreting graphs is widely held as a central element for scientific literacy given the importance of data visualizations to succinctly present complex information. Although prior research espouses methods to improve graphing proficiencies, there is little understanding about when and how…
Mining gene expression data by interpreting principal components
Roden, Joseph C; King, Brandon W; Trout, Diane; Mortazavi, Ali; Wold, Barbara J; Hart, Christopher E
2006-01-01
Background: There are many methods for analyzing microarray data that group together genes having similar patterns of expression over all conditions tested. However, in many instances the biologically important goal is to identify relatively small sets of genes that share coherent expression across only some conditions, rather than all or most conditions as required in traditional clustering; e.g. genes that are highly up-regulated and/or down-regulated similarly across only a subset of conditions. Equally important is the need to learn which conditions are the decisive ones in forming such gene sets of interest, and how they relate to diverse conditional covariates, such as disease diagnosis or prognosis. Results: We present a method for automatically identifying such candidate sets of biologically relevant genes using a combination of principal components analysis and information theoretic metrics. To enable easy use of our methods, we have developed a data analysis package that facilitates visualization and subsequent data mining of the independent sources of significant variation present in gene microarray expression datasets (or in any other similarly structured high-dimensional dataset). We applied these tools to two public datasets, and highlight sets of genes most affected by specific subsets of conditions (e.g. tissues, treatments, samples, etc.). Statistically significant associations for highlighted gene sets were shown via global analysis for Gene Ontology term enrichment. Together with covariate associations, the tool provides a basis for building testable hypotheses about the biological or experimental causes of observed variation. Conclusion: We provide an unsupervised data mining technique for diverse microarray expression datasets that is distinct from major methods now in routine use. In test uses, this method, based on publicly available gene annotations, appears to identify numerous sets of biologically relevant genes. It has proven especially
ERIC Educational Resources Information Center
Olsen, Robert J.
2008-01-01
I describe how data pooling and data visualization can be employed in the first-semester general chemistry laboratory to introduce core statistical concepts such as central tendency and dispersion of a data set. The pooled data are plotted as a 1-D scatterplot, a purpose-designed number line through which statistical features of the data are…
Smolders, R.; Den Hond, E.; Koppen, G.; Govarts, E.; Willems, H.; Casteleyn, L.; Kolossa-Gehring, M.; Fiddicke, U.; Castaño, A.; Koch, H.M.; Angerer, J.; Esteban, M.; Sepai, O.; Exley, K.; Bloemen, L.; Horvat, M.; Knudsen, L.E.; Joas, A.; Joas, R.; Biot, P.; and others
2015-08-15
In 2011 and 2012, the COPHES/DEMOCOPHES twin projects performed the first ever harmonized human biomonitoring survey in 17 European countries. In more than 1800 mother–child pairs, individual lifestyle data were collected and cadmium, cotinine and certain phthalate metabolites were measured in urine. Total mercury was determined in hair samples. While the main goal of the COPHES/DEMOCOPHES twin projects was to develop and test harmonized protocols and procedures, the goal of the current paper is to investigate whether the observed differences in biomarker values among the countries implementing DEMOCOPHES can be interpreted using information from external databases on environmental quality and lifestyle. In general, 13 countries having implemented DEMOCOPHES provided high-quality data from external sources that were relevant for interpretation purposes. However, some data were not available for reporting or were not in line with predefined specifications. Therefore, only part of the external information could be included in the statistical analyses. Nonetheless, there was a highly significant correlation between national levels of fish consumption and mercury in hair, the strength of antismoking legislation was significantly related to urinary cotinine levels, and we were able to show indications that also urinary cadmium levels were associated with environmental quality and food quality. These results again show the potential of biomonitoring data to provide added value for (the evaluation of) evidence-informed policy making. - Highlights: • External data was collected to interpret HBM data from DEMOCOPHES. • Hg in hair could be related to fish consumption across different countries. • Urinary cotinine was related to strictness of anti-smoking legislation. • Urinary Cd was borderline significantly related to air and food quality. • Lack of comparable data among countries hampered the analysis.
Cho, Yunju; Ahmed, Arif; Islam, Annana; Kim, Sunghwan
2015-01-01
Because of the increasing importance of heavy and unconventional crude oil as an energy source, there is a growing need for petroleomics: the pursuit of more complete and detailed knowledge of the chemical compositions of crude oil. Crude oil has an extremely complex nature; hence, techniques with ultra-high resolving capabilities, such as Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), are necessary. FT-ICR MS has been successfully applied to the study of heavy and unconventional crude oils such as bitumen and shale oil. However, the analysis of crude oil with FT-ICR MS is not trivial, and it has pushed analysis to the limits of instrumental and methodological capabilities. For example, high-resolution mass spectra of crude oils may contain over 100,000 peaks that require interpretation. To visualize large data sets more effectively, data processing methods such as Kendrick mass defect analysis and statistical analyses have been developed. The successful application of FT-ICR MS to the study of crude oil has been critically dependent on key developments in FT-ICR MS instrumentation and data processing methods. This review offers an introduction to the basic principles, FT-ICR MS instrumentation development, ionization techniques, and data interpretation methods for petroleomics and is intended for readers having no prior experience in this field of study.
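The Kendrick mass defect analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common CH2-based convention (Kendrick mass = IUPAC mass × 14.00000/14.01565, defect = nominal Kendrick mass minus Kendrick mass); the function names and the example mass of 200.1 Da are hypothetical and are not taken from the review.

```python
# Minimal sketch of CH2-based Kendrick mass defect (KMD) analysis.
# Assumption: KMD is defined as nominal Kendrick mass minus Kendrick mass.
CH2_EXACT = 14.01565   # IUPAC mass of CH2 (12C + 2 x 1H), in Da
CH2_NOMINAL = 14.0     # CH2 mass on the Kendrick scale


def kendrick_mass(iupac_mass):
    """Rescale an IUPAC mass so that CH2 weighs exactly 14."""
    return iupac_mass * CH2_NOMINAL / CH2_EXACT


def kendrick_mass_defect(iupac_mass):
    """Nominal (rounded) Kendrick mass minus the Kendrick mass."""
    km = kendrick_mass(iupac_mass)
    return round(km) - km


# Hypothetical homologous series: the same core plus 0, 1, 2 CH2 units.
series = [200.1 + k * CH2_EXACT for k in range(3)]
defects = [kendrick_mass_defect(m) for m in series]
print(defects)  # members of one series share (nearly) the same defect
```

Peaks that differ only in the number of CH2 units fall on one horizontal line in a plot of defect against nominal Kendrick mass, which is what makes the transformation useful for sorting very crowded spectra.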
A Flexible Approach for the Statistical Visualization of Ensemble Data
Potter, K.; Wilson, A.; Bremer, P.; Williams, Dean N.; Pascucci, V.; Johnson, C.
2009-09-29
Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.
Data analysis using the Gnu R system for statistical computation
Simone, James; /Fermilab
2011-07-01
R is a language system for statistical computation. It is widely used in statistics, bioinformatics, machine learning, data mining, quantitative finance, and the analysis of clinical drug trials. Among the advantages of R are: it has become the standard language for developing statistical techniques, it is being actively developed by a large and growing global user community, it is open source software, it is highly portable (Linux, OS-X and Windows), it has a built-in documentation system, it produces high quality graphics and it is easily extensible with over four thousand extension library packages available covering statistics and applications. This report gives a very brief introduction to R with some examples using lattice QCD simulation results. It then discusses the development of R packages designed for chi-square minimization fits for lattice n-pt correlation functions.
Choosing from Plausible Alternatives in Interpreting Qualitative Data.
ERIC Educational Resources Information Center
Donmoyer, Robert
This paper addresses a variation of the traditional validity question asked of qualitative researchers. Here the question is not "How do we know the qualitative researcher's question is valid?" but rather, "How does the qualitative researcher choose from among a multitude of apparently valid or at least plausible interpretations?" As early as…
Interpreting School Satisfaction Data from a Marketing Perspective.
ERIC Educational Resources Information Center
Pandiani, John A.; James, Brad C.; Banks, Steven M.
This paper presents results of a customer satisfaction survey of Vermont elementary and secondary public schools concerning satisfaction with mental health services during the 1996-97 school year. Analyses of completed questionnaires (N=233) are interpreted from a marketing perspective. Findings are reported for: (1) treated prevalence of…
Incorporating Ecotourist Needs Data into the Interpretive Planning Process.
ERIC Educational Resources Information Center
Masberg, Barbara A.; Savige, Margaret
1996-01-01
Discusses a model that enables systematic input from external sources during the interpretive planning process. Results indicate that the Ecotourist Needs Assessment (ETNA) is a feasible way to inform the planning process and better meet the needs of ecotourists. Contains 24 references. (DDR)
Statistics for correlated data: phylogenies, space, and time.
Ives, Anthony R; Zhu, Jun
2006-02-01
Here we give an introduction to the growing number of statistical techniques for analyzing data that are not independent realizations of the same sampling process--in other words, correlated data. We focus on regression problems, in which the value of a given variable depends linearly on the value of another variable. To illustrate different types of processes leading to correlated data, we analyze four simulated examples representing diverse problems arising in ecological studies. The first example is a comparison among species to determine the relationship between home-range area and body size; because species are phylogenetically related, they do not represent independent samples. The second example addresses spatial variation in net primary production and how this might be affected by soil nitrogen; because nearby locations are likely to have similar net primary productivity for reasons other than soil nitrogen, spatial correlation is likely. In the third example, we consider a time-series model to ask whether the decrease in density of a butterfly species is the result of decreases in its host-plant density; because the population density of a species in one generation is likely to affect the density in the following generation, time-series data are often correlated. The fourth example combines both spatial and temporal correlation in an experiment in which prey densities are manipulated to determine the response of predators to their food supply. For each of these examples, we use a different statistical approach for analyzing models of correlated data. Our goal is to give an overview of conceptual issues surrounding correlated data, rather than a detailed tutorial in how to apply different statistical techniques. By dispelling some of the mystery behind correlated data, we hope to encourage ecologists to learn about statistics that could be useful in their own work. Although at first encounter these techniques might seem complicated, they have the power to
Incorrectness of traditional statistical treatment of data obtained from CELSS
NASA Astrophysics Data System (ADS)
Bartsev, S. I.
The traditional approach to the statistical treatment of data presumes that the data are independent. Closure of matter turnover inevitably makes data obtained from CELSS experiments dependent. Taking this into account, a statistical coefficient displaying the degree of interaction of system elements is proposed. In addition, this coefficient shows the degree of applicability of traditional statistical treatment. On the basis of the maximum likelihood approach, a formula for correct assessment of the statistical expectation of CELSS parameters is proposed: \[ \hat{\mu}_i = \bar{x}_i + \sigma_{\bar{x}_i}^2 \, \frac{A_0 - \sum_j \bar{x}_j}{\sum_j \sigma_{\bar{x}_j}^2} , \] where \hat{\mu}_i is the estimate of the statistical expectation; \bar{x}_i is the arithmetic average; \sigma_{\bar{x}_i}^2 is the dispersion of the arithmetic average; and A_0 is the total amount of a given chemical element, or the total mass, in the CELSS. This formula is applicable to a completely closed CELSS or when the mass in the CELSS is precisely controlled. The properties of the introduced estimates are discussed.
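The estimator in this abstract shifts each arithmetic average by a share of the mass-balance residual (the known total minus the sum of the averages), with the share proportional to the dispersion of that average. A minimal sketch follows; the function name and the numbers are hypothetical and serve only to show that the corrected estimates sum to the known total.

```python
# Minimal sketch of the closed-system estimate described in the abstract:
#   mu_hat_i = xbar_i + var_i * (A0 - sum_j xbar_j) / sum_j var_j
# Function name and example values are illustrative assumptions.

def closed_system_estimates(means, var_of_means, total):
    """Adjust arithmetic averages so that they respect a fixed total A0."""
    residual = total - sum(means)   # A0 - sum_j xbar_j
    var_sum = sum(var_of_means)     # sum_j var_j
    return [m + v * residual / var_sum
            for m, v in zip(means, var_of_means)]


# Hypothetical measurements of one element in three compartments.
means = [10.2, 5.1, 4.9]           # arithmetic averages
var_of_means = [0.04, 0.01, 0.01]  # dispersions of those averages
estimates = closed_system_estimates(means, var_of_means, total=20.0)
print(estimates, sum(estimates))
```

The least precisely measured compartment absorbs most of the correction, which is the behaviour one would expect from a maximum likelihood estimate under an exact mass constraint.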
A note on the kappa statistic for clustered dichotomous data.
Zhou, Ming; Yang, Zhao
2014-06-30
The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data and two simulated clustered physician-patients dichotomous data are analyzed.
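For orientation, the kappa statistic and a generic cluster bootstrap for its variance can be sketched as follows. This is not the authors' semi-parametric estimator or their data generation algorithm; it is a plain resample-whole-clusters bootstrap, and the function names and toy ratings are hypothetical.

```python
# Minimal sketch: Cohen's kappa for two raters with dichotomous ratings,
# plus a generic cluster bootstrap of its variance (whole clusters are
# resampled so that within-cluster dependence is preserved).
import random
from statistics import variance


def cohen_kappa(pairs):
    """pairs: list of (rater1, rater2) ratings coded 0/1."""
    n = len(pairs)
    p_obs = sum(a == b for a, b in pairs) / n
    p1 = sum(a for a, _ in pairs) / n
    p2 = sum(b for _, b in pairs) / n
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)  # chance agreement
    if p_exp == 1.0:
        return None                        # kappa undefined
    return (p_obs - p_exp) / (1 - p_exp)


def cluster_bootstrap_variance(clusters, n_boot=500, seed=0):
    """clusters: list of clusters, each a list of (rater1, rater2) pairs."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        sample = [rng.choice(clusters) for _ in clusters]
        pooled = [pair for cluster in sample for pair in cluster]
        kappa = cohen_kappa(pooled)
        if kappa is not None:              # skip degenerate resamples
            replicates.append(kappa)
    return variance(replicates)


# Hypothetical data: four physicians (clusters), three patients each.
clusters = [
    [(1, 1), (1, 0), (0, 0)],
    [(0, 0), (0, 0), (1, 1)],
    [(1, 1), (0, 1), (0, 0)],
    [(1, 1), (1, 1), (0, 0)],
]
pooled = [pair for cluster in clusters for pair in cluster]
print(cohen_kappa(pooled), cluster_bootstrap_variance(clusters))
```

With only four clusters the bootstrap variance is very unstable; the abstract reports reasonable performance for such methods only with a moderately large number of clusters.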
Statistical interpretation of transient current power-law decay in colloidal quantum dot arrays
NASA Astrophysics Data System (ADS)
Sibatov, R. T.
2011-08-01
A new statistical model of the charge transport in colloidal quantum dot arrays is proposed. It takes into account Coulomb blockade forbidding multiple occupancy of nanocrystals and the influence of energetic disorder of interdot space. The model explains power-law current transients and the presence of the memory effect. The fractional differential analogue of the Ohm law is found phenomenologically for nanocrystal arrays. The model combines ideas that were considered as conflicting by other authors: the Scher-Montroll idea about the power-law distribution of waiting times in localized states for disordered semiconductors is applied taking into account Coulomb blockade; Novikov's condition about the asymptotic power-law distribution of time intervals between successful current pulses in conduction channels is fulfilled; and the carrier injection blocking predicted by Ginger and Greenham (2000 J. Appl. Phys. 87 1361) takes place.
NASA Astrophysics Data System (ADS)
Chung, Jung R.; DeLaughter, Aimee H.; Baba, Justin S.; Spiegelman, Clifford H.; Amoss, M. S.; Cote, Gerard L.
2003-07-01
The Mueller matrix describes all the polarizing properties of a sample, and therefore the optical differences between cancerous and non-cancerous tissue should be present within the matrix elements. We present in this paper the Mueller matrices of three types of tissue; normal, benign mole, and malignant melanoma on a Sinclair swine model. Feature extraction is done on the Mueller matrix elements resulting in the retardance images, diattenuation images, and depolarization images. These images are analyzed in an attempt to determine the important factors for the identification of cancerous lesions from their benign counterparts. In addition, the extracted features are analyzed using statistical processing to develop an accurate classification scheme and to identify the importance of each parameter in the determination of cancerous versus non-cancerous tissue.
Statistical analysis and interpolation of compositional data in materials science.
Pesenson, Misha Z; Suram, Santosh K; Gregoire, John M
2015-02-01
Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
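The abstract does not say which specialized tools the authors use. As one illustration, a standard device in compositional data analysis is the centered log-ratio (clr) transform, which maps a composition from the simplex into ordinary Euclidean coordinates. The sketch below assumes strictly positive parts; the function names and example values are hypothetical.

```python
# Minimal sketch of closure and the centered log-ratio (clr) transform.
# Assumption: every part is strictly positive (zeros need special handling).
import math


def closure(parts, total=1.0):
    """Rescale non-negative parts so that they sum to a constant."""
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(composition):
    """Log of each part relative to the geometric mean of all parts."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical three-component film composition in two different units.
atomic_fraction = closure([2.0, 3.0, 5.0])       # sums to 1
atomic_percent = closure([2.0, 3.0, 5.0], 100)   # sums to 100
print(clr(atomic_fraction))
print(clr(atomic_percent))  # same coordinates: clr ignores the constant
```

Because the clr coordinates do not depend on the chosen constant sum, means and distances computed on them are not distorted by the closure constraint in the way raw concentrations are.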
A statistical test to determine the quality of accelerometer data.
Slaven, J E; Andrew, M E; Violanti, J M; Burchfiel, C M; Vila, B J
2006-04-01
Accelerometer data quality can be inadequate due to data corruption or to non-compliance of the subject with regard to study protocols. We propose a simple statistical test to determine if accelerometer data are of good quality and can be used for analysis or if the data are of poor quality and should be discarded. We tested several data evaluation methods using a group of 105 subjects who wore Motionlogger actigraphs (Ambulatory Monitoring, Inc.) over a 15 day period to assess sleep quality in a study of health outcomes associated with stress among police officers. Using leave-one-out cross-validation and calibration-testing methods of discrimination statistics, error rates for the methods ranged from 0.0167 to 0.4046. We found that the best method was to use the overall average distance between consecutive time points and the overall average mean amplitude of consecutive time points. These values gave us a classification error rate of 0.0167. The average distance between points is a measure of smoothness in the data, and the average mean amplitude between points gave an average reading. Both of these values were then normed to determine a final statistic, K, which was then compared to a cut-off value, K(C), to determine data quality.
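The two quantities named in this abstract can be sketched as below. The abstract does not give exact definitions, so the sketch assumes that "distance" is the absolute difference between consecutive readings and that "mean amplitude" is the average of each consecutive pair; the norming step that produces K and the cut-off K(C) are not specified in the abstract and are therefore omitted. The function name and sample counts are hypothetical.

```python
# Minimal sketch of the two summary features described in the abstract.
# Assumptions: distance = |x[t+1] - x[t]|, amplitude = (x[t] + x[t+1]) / 2.

def consecutive_point_features(counts):
    """Return (average distance, average mean amplitude) of a series."""
    pairs = list(zip(counts, counts[1:]))
    avg_distance = sum(abs(b - a) for a, b in pairs) / len(pairs)
    avg_amplitude = sum((a + b) / 2 for a, b in pairs) / len(pairs)
    return avg_distance, avg_amplitude


# Hypothetical activity counts from four consecutive epochs.
avg_distance, avg_amplitude = consecutive_point_features([0, 4, 2, 6])
print(avg_distance, avg_amplitude)
```

A flat, near-zero recording (for example, a device left on a table) gives small values for both features, which is the kind of record such a test is meant to flag.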
Quick Access: Find Statistical Data on the Internet.
ERIC Educational Resources Information Center
Su, Di
1999-01-01
Provides an annotated list of Internet sources (World Wide Web, ftp, and gopher sites) for current and historical statistical business data, including selected interest rates, the Consumer Price Index, the Producer Price Index, foreign currency exchange rates, noon buying rates, per diem rates, the special drawing right, stock quotes, and mutual…
Uses of Statistical Data to Support Financial Reports.
ERIC Educational Resources Information Center
Walters, Donald L.
1989-01-01
The formats of the school budget and annual financial report are usually prescribed by state regulations. Creative use of data already collected for other purposes will give meaning to a district's financial story. Lists recommended tables for the statistical section of a comprehensive annual financial report. (MLF)
Statistical Physics in the Era of Big Data
ERIC Educational Resources Information Center
Wang, Dashun
2013-01-01
With the wealth of data provided by a wide range of high-throughput measurement tools and technologies, statistical physics of complex systems is entering a new phase, impacting in a meaningful fashion a wide range of fields, from cell biology to computer science to economics. In this dissertation, by applying tools and techniques developed in…
Statistical Modeling for Radiation Hardness Assurance: Toward Bigger Data
NASA Technical Reports Server (NTRS)
Ladbury, R.; Campola, M. J.
2015-01-01
New approaches to statistical modeling in radiation hardness assurance are discussed. These approaches yield quantitative bounds on flight-part radiation performance even in the absence of conventional data sources. This allows the analyst to bound radiation risk at all stages and for all decisions in the RHA process. It also allows optimization of RHA procedures for the project's risk tolerance.
ERIC Educational Resources Information Center
Knirk, Frederick G.
Designed to assist educational researchers in utilizing microcomputers, this paper presents information on four types of computer software: writing tools for educators, statistical software designed to perform analyses of small and moderately large data sets, project management tools, and general education/research oriented information services…
Australian Vocational Education and Training Statistics 1998: Financial Data.
ERIC Educational Resources Information Center
National Centre for Vocational Education Research, Leabrook (Australia).
This publication presents some of the highlights of financial information summarized from the national collections of vocational education and training (VET) data in Australia for 1998. The report includes detailed statistics for Australia, its eight states and territories, and the Australian National Training Authority. The financial information…
Mississippi Public Junior Colleges Statistical Data, 1981-82.
ERIC Educational Resources Information Center
Moody, George V.; And Others
Designed to enable college administrators to reflect on norms and trends within Mississippi's 16 public junior colleges and to provide information to the general public, this report presents statistical data for 1981-82 on enrollments, finance, personnel, and services. After introductory material providing college addresses and the schedule of…
Exploring Foundation Concepts in Introductory Statistics Using Dynamic Data Points
ERIC Educational Resources Information Center
Ekol, George
2015-01-01
This paper analyses introductory statistics students' verbal and gestural expressions as they interacted with a dynamic sketch (DS) designed using "Sketchpad" software. The DS involved numeric data points built on the number line whose values changed as the points were dragged along the number line. The study is framed on aggregate…
Introduction to Statistics and Data Analysis With Computer Applications I.
ERIC Educational Resources Information Center
Morris, Carl; Rolph, John
This document consists of unrevised lecture notes for the first half of a 20-week in-house graduate course at Rand Corporation. The chapter headings are: (1) Histograms and descriptive statistics; (2) Measures of dispersion, distance and goodness of fit; (3) Using JOSS for data analysis; (4) Binomial distribution and normal approximation; (5)…
Harnessing Multivariate Statistics for Ellipsoidal Data in Structural Geology
NASA Astrophysics Data System (ADS)
Roberts, N.; Davis, J. R.; Titus, S.; Tikoff, B.
2015-12-01
Most structural geology articles do not state significance levels, report confidence intervals, or perform regressions to find trends. This is, in part, because structural data tend to include directions, orientations, ellipsoids, and tensors, which are not treatable by elementary statistics. We describe a full procedural methodology for the statistical treatment of ellipsoidal data. We use a reconstructed dataset of deformed ooids in Maryland from Cloos (1947) to illustrate the process. Normalized ellipsoids have five degrees of freedom and can be represented by a second order tensor. This tensor can be permuted into a five dimensional vector that belongs to a vector space and can be treated with standard multivariate statistics. Cloos made several claims about the distribution of deformation in the South Mountain fold, Maryland, and we reexamine two particular claims using hypothesis testing: 1) octahedral shear strain increases towards the axial plane of the fold; 2) finite strain orientation varies systematically along the trend of the axial trace as it bends with the Appalachian orogen. We then test the null hypothesis that the southern segment of South Mountain is the same as the northern segment. This test illustrates the application of ellipsoidal statistics, which combine both orientation and shape. We report confidence intervals for each test, and graphically display our results with novel plots. This poster illustrates the importance of statistics in structural geology, especially when working with noisy or small datasets.
Systematic misregistration and the statistical analysis of surface data.
Gee, Andrew H; Treece, Graham M
2014-02-01
Spatial normalisation is a key element of statistical parametric mapping and related techniques for analysing cohort statistics on voxel arrays and surfaces. The normalisation process involves aligning each individual specimen to a template using some sort of registration algorithm. Any misregistration will result in data being mapped onto the template at the wrong location. At best, this will introduce spatial imprecision into the subsequent statistical analysis. At worst, when the misregistration varies systematically with a covariate of interest, it may lead to false statistical inference. Since misregistration generally depends on the specimen's shape, we investigate here the effect of allowing for shape as a confound in the statistical analysis, with shape represented by the dominant modes of variation observed in the cohort. In a series of experiments on synthetic surface data, we demonstrate how allowing for shape can reveal true effects that were previously masked by systematic misregistration, and also guard against misinterpreting systematic misregistration as a true effect. We introduce some heuristics for disentangling misregistration effects from true effects, and demonstrate the approach's practical utility in a case study of the cortical bone distribution in 268 human femurs.
Statistical summaries of fatigue data for design purposes
NASA Technical Reports Server (NTRS)
Wirsching, P. H.
1983-01-01
Two methods are discussed for constructing a design curve on the safe side of fatigue data. Both the tolerance interval and equivalent prediction interval (EPI) concepts provide such a curve while accounting for both the distribution of the estimators in small samples and the data scatter. The EPI is also useful as a mechanism for providing necessary statistics on S-N data for a full reliability analysis which includes uncertainty in all fatigue design factors. Examples of statistical analyses of the general strain life relationship are presented. The tolerance limit and EPI techniques for defining a design curve are demonstrated. Examples using WASPALOY B and RQC-100 data demonstrate that a reliability model could be constructed by considering the fatigue strength and fatigue ductility coefficients as two independent random variables. A technique given for establishing the fatigue strength for high cycle lives relies on extrapolation and also accounts for "runners." A reliability model or design value can be specified.
Uncertainty resulting from multiple data usage in statistical downscaling
NASA Astrophysics Data System (ADS)
Kannan, S.; Ghosh, Subimal; Mishra, Vimal; Salvi, Kaustubh
2014-06-01
Statistical downscaling (SD), used for regional climate projections with coarse resolution general circulation model (GCM) outputs, is characterized by uncertainties resulting from multiple models. Here we observe another source of uncertainty resulting from the use of multiple observed and reanalysis data products in model calibration. In the training of SD, for Indian Summer Monsoon Rainfall (ISMR), we use two reanalysis data as predictors and three gridded data products for ISMR from different sources. We observe that the uncertainty resulting from six possible training options is comparable to that resulting from multiple GCMs. Though the original GCM simulations project spatially uniform increasing change of ISMR, at the end of 21st century, the same is not obtained with SD, which projects spatially heterogeneous and mixed changes of ISMR. This is due to the differences in statistical relationship between rainfall and predictors in GCM simulations and observed/reanalysis data, and SD considers the latter.
Method of interpretation of remotely sensed data and applications to land use
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Dossantos, A. P.; Foresti, C.; Demoraesnovo, E. M. L.; Niero, M.; Lombardo, M. A.
1981-01-01
Instructional material describing a methodology of remote sensing data interpretation and examples of applications to land use survey are presented. The image interpretation elements are discussed for different types of sensor systems: aerial photographs, radar, and MSS/LANDSAT. Visual and automatic LANDSAT image interpretation is emphasized.
A Comprehensive Statistically-Based Method to Interpret Real-Time Flowing Measurements
Keita Yoshioka; Pinan Dawkrajai; Analis A. Romero; Ding Zhu; A. D. Hill; Larry W. Lake
2007-01-15
With the recent development of temperature measurement systems, continuous temperature profiles can be obtained with high precision. Small temperature changes can be detected by modern temperature measuring instruments such as fiber optic distributed temperature sensors (DTS) in intelligent completions and will potentially aid the diagnosis of downhole flow conditions. In vertical wells, since elevational geothermal changes make the wellbore temperature sensitive to the amount and the type of fluids produced, temperature logs can be used successfully to diagnose the downhole flow conditions. However, because geothermal temperature changes along the wellbore are small for horizontal wells, interpreting a temperature log becomes difficult. The primary temperature differences for each phase (oil, water, and gas) are caused by frictional effects. Therefore, in developing a thermal model for a horizontal wellbore, subtle temperature changes must be accounted for. In this project, we have rigorously derived governing equations for a producing horizontal wellbore and developed a prediction model of the temperature and pressure by coupling the wellbore and reservoir equations. Also, we applied Ramey's model (1962) to the build section and used an energy balance to infer the temperature profile at the junction. The multilateral wellbore temperature model was applied to a wide range of cases at varying fluid thermal properties, absolute values of temperature and pressure, geothermal gradients, flow rates from each lateral, and the trajectories of each build section. With the prediction models developed, we present inversion studies of synthetic and field examples. These results are essential to identify water or gas entry, to guide flow control devices in intelligent completions, and to decide if reservoir stimulation is needed in particular horizontal sections. This study will complete and validate these inversion studies.
A COMPREHENSIVE STATISTICALLY-BASED METHOD TO INTERPRET REAL-TIME FLOWING MEASUREMENTS
Pinan Dawkrajai; Analis A. Romero; Keita Yoshioka; Ding Zhu; A.D. Hill; Larry W. Lake
2004-10-01
In this project, we are developing new methods for interpreting measurements in complex wells (horizontal, multilateral and multi-branching wells) to determine the profiles of oil, gas, and water entry. These methods are needed to take full advantage of "smart" well instrumentation, a technology that is rapidly evolving to provide the ability to continuously and permanently monitor downhole temperature, pressure, volumetric flow rate, and perhaps other fluid flow properties at many locations along a wellbore; and hence, to control and optimize well performance. In this first year, we have made considerable progress in the development of the forward model of temperature and pressure behavior in complex wells. In this period, we have progressed on three major parts of the forward problem of predicting the temperature and pressure behavior in complex wells. These three parts are the temperature and pressure behaviors in the reservoir near the wellbore, in the wellbore or laterals in the producing intervals, and in the build sections connecting the laterals, respectively. Many models exist to predict pressure behavior in reservoirs and wells, but these are almost always isothermal models. To predict temperature behavior we derived general mass, momentum, and energy balance equations for these parts of the complex well system. Analytical solutions for the reservoir and wellbore parts for certain special conditions show the magnitude of thermal effects that could occur. Our preliminary sensitivity analyses show that thermal effects caused by near-wellbore reservoir flow can cause temperature changes that are measurable with smart well technology. This is encouraging for the further development of the inverse model.
Occupational exposure decisions: can limited data interpretation training help improve accuracy?
Logan, Perry; Ramachandran, Gurumurthy; Mulhausen, John; Hewett, Paul
2009-06-01
Accurate exposure assessments are critical for ensuring that potentially hazardous exposures are properly identified and controlled. The availability and accuracy of exposure assessments can determine whether resources are appropriately allocated to engineering and administrative controls, medical surveillance, personal protective equipment and other programs designed to protect workers. A desktop study was performed using videos, task information and sampling data to evaluate the accuracy and potential bias of participants' exposure judgments. Desktop exposure judgments were obtained from occupational hygienists for material handling jobs with small air sampling data sets (0-8 samples) and without the aid of computers. In addition, data interpretation tests (DITs) were administered to participants where they were asked to estimate the 95th percentile of an underlying log-normal exposure distribution from small data sets. Participants were presented with an exposure data interpretation or rule of thumb training which included a simple set of rules for estimating 95th percentiles for small data sets from a log-normal population. DIT was given to each participant before and after the rule of thumb training. Results of each DIT and qualitative and quantitative exposure judgments were compared with a reference judgment obtained through a Bayesian probabilistic analysis of the sampling data to investigate overall judgment accuracy and bias. There were a total of 4386 participant-task-chemical judgments for all data collections: 552 qualitative judgments made without sampling data and 3834 quantitative judgments with sampling data. The DITs and quantitative judgments were significantly better than random chance and much improved by the rule of thumb training. In addition, the rule of thumb training reduced the amount of bias in the DITs and quantitative judgments. The mean DIT % correct scores increased from 47 to 64% after the rule of thumb training (P < 0.001). The
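The quantity the participants were asked to estimate, the 95th percentile of a log-normal exposure distribution, has a standard parametric point estimate: exp(mean of the logs + z(0.95) × SD of the logs). The sketch below illustrates that textbook estimator only; it is not the rule of thumb taught in the study, and the sample values and the function name `lognormal_p95` are invented for illustration.

```python
import math
import statistics

def lognormal_p95(samples):
    """Parametric point estimate of the 95th percentile, assuming log-normal data."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate the spread")
    logs = [math.log(x) for x in samples]
    mu = statistics.mean(logs)                    # mean of the log-exposures
    sigma = statistics.stdev(logs)                # sample SD of the log-exposures
    z95 = statistics.NormalDist().inv_cdf(0.95)   # standard normal 95th percentile
    return math.exp(mu + z95 * sigma)

# Hypothetical air-sampling results (arbitrary units), invented for illustration.
samples = [0.12, 0.30, 0.08, 0.55, 0.21]
geometric_mean = math.exp(statistics.mean([math.log(x) for x in samples]))
print(f"geometric mean = {geometric_mean:.3f}")
print(f"estimated 95th percentile = {lognormal_p95(samples):.3f}")
```

With so few samples the estimate is very uncertain, which is consistent with the abstract's focus on judgment accuracy for small data sets.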
Feature-Based Statistical Analysis of Combustion Simulation Data
Bennett, J; Krishnamoorthy, V; Liu, S; Grout, R; Hawkes, E; Chen, J; Pascucci, V; Bremer, P T
2011-11-18
We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence, fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Density Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion
Statistical Software for spatial analysis of stratigraphic data sets
2003-04-08
Stratistics is a tool for statistical analysis of spatially explicit data sets and model output for description and for model-data comparisons. It is intended for the analysis of data sets commonly used in geology, such as gamma ray logs and lithologic sequences, as well as 2-D data such as maps. Stratistics incorporates a far wider range of spatial analysis methods, drawn from multiple disciplines, than are currently available in other packages. These include techniques from spatial and landscape ecology, fractal analysis, and mathematical geology. Its use should substantially reduce the risk associated with the use of predictive models.
Normalization and extraction of interpretable metrics from raw accelerometry data
Bai, Jiawei; He, Bing; Shou, Haochang; Zipunnikov, Vadim; Glass, Thomas A.; Crainiceanu, Ciprian M.
2014-01-01
We introduce an explicit set of metrics for human activity based on high-density acceleration recordings from a hip-worn tri-axial accelerometer. These metrics are based on two concepts: (i) Time Active, a measure of the length of time when activity is distinguishable from rest and (ii) AI, a measure of the amplitude of activity relative to rest. All measurements are normalized (have the same interpretation across subjects and days), easy to explain and implement, and reproducible across platforms and software implementations. Metrics were validated by visual inspection of results and quantitative in-lab replication studies, and by an association study with health outcomes. PMID:23999141
Methodology of remote sensing data interpretation and geological applications. [Brazil]
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Veneziani, P.; Dosanjos, C. E.
1982-01-01
Elements of photointerpretation discussed include the analysis of photographic texture and structure as well as film tonality. The method used is based on conventional techniques developed for interpreting aerial black and white photographs. By defining the properties which characterize the form and individuality of dual images, homologous zones can be identified. Guy's logic method (1966) was adapted and used on functions of resolution, scale, and spectral characteristics of remotely sensed products. Applications of LANDSAT imagery are discussed for regional geological mapping, mineral exploration, hydrogeology, and geotechnical engineering in Brazil.
Statistically invalid classification of high throughput gene expression data.
Barbash, Shahar; Soreq, Hermona
2013-01-01
Classification analysis based on high throughput data is a common feature in neuroscience and other fields of science, with a rapidly increasing impact on both basic biology and disease-related studies. The outcome of such classifications often serves to delineate novel biochemical mechanisms in health and disease states, identify new targets for therapeutic interference, and develop innovative diagnostic approaches. Given the importance of this type of study, we screened 111 recently-published high-impact manuscripts involving classification analysis of gene expression, and found that 58 of them (53%) based their conclusions on a statistically invalid method which can lead to bias in a statistical sense (lower true classification accuracy than the reported classification accuracy). In this report we characterize the potential methodological error and its scope, investigate how it is influenced by different experimental parameters, and describe statistically valid methods for avoiding such classification mistakes.
Statistical Treatment of Earth Observing System Pyroshock Separation Test Data
NASA Technical Reports Server (NTRS)
McNelis, Anne M.; Hughes, William O.
1998-01-01
The Earth Observing System (EOS) AM-1 spacecraft for NASA's Mission to Planet Earth is scheduled to be launched on an Atlas IIAS vehicle in June of 1998. One concern is that the instruments on the EOS spacecraft are sensitive to the shock-induced vibration produced when the spacecraft separates from the launch vehicle. By employing unique statistical analysis to the available ground test shock data, the NASA Lewis Research Center found that shock-induced vibrations would not be as great as the previously specified levels of Lockheed Martin. The EOS pyroshock separation testing, which was completed in 1997, produced a large quantity of accelerometer data to characterize the shock response levels at the launch vehicle/spacecraft interface. Thirteen pyroshock separation firings of the EOS and payload adapter configuration yielded 78 total measurements at the interface. The multiple firings were necessary to qualify the newly developed Lockheed Martin six-hardpoint separation system. Because of the unusually large amount of data acquired, Lewis developed a statistical methodology to predict the maximum expected shock levels at the interface between the EOS spacecraft and the launch vehicle. Then, this methodology, which is based on six shear plate accelerometer measurements per test firing at the spacecraft/launch vehicle interface, was used to determine the shock endurance specification for EOS. Each pyroshock separation test of the EOS spacecraft simulator produced its own set of interface accelerometer data. Probability distributions, histograms, the median, and higher order moments (skew and kurtosis) were analyzed. The data were found to be lognormally distributed, which is consistent with NASA pyroshock standards. Each set of lognormally transformed test data produced was analyzed to determine if the data should be combined statistically. Statistical testing of the data's standard deviations and means (F and t testing, respectively) determined if data sets were
Mathematical and statistical approaches for interpreting biomarker compounds in exhaled human breath
The various instrumental techniques, human studies, and diagnostic tests that produce data from samples of exhaled breath have one thing in common: they all need to be put into a context wherein a posed question can actually be answered. Exhaled breath contains numerous compoun...
Interpreting the Results of Weighted Least-Squares Regression: Caveats for the Statistical Consumer.
ERIC Educational Resources Information Center
Willett, John B.; Singer, Judith D.
In research, data sets often occur in which the variance of the distribution of the dependent variable at given levels of the predictors is a function of the values of the predictors. In this situation, the use of weighted least-squares (WLS) or techniques is required. Weights suitable for use in a WLS regression analysis must be estimated. A…
Kissling, Grace E; Haseman, Joseph K; Zeiger, Errol
2015-09-01
A recent article by Gaus (2014) demonstrates a serious misunderstanding of the NTP's statistical analysis and interpretation of rodent carcinogenicity data as reported in Technical Report 578 (Ginkgo biloba) (NTP, 2013), as well as a failure to acknowledge the abundant literature on false positive rates in rodent carcinogenicity studies. The NTP reported Ginkgo biloba extract to be carcinogenic in mice and rats. Gaus claims that, in this study, 4800 statistical comparisons were possible, and that 209 of them were statistically significant (p<0.05) compared with 240 (4800×0.05) expected by chance alone; thus, the carcinogenicity of Ginkgo biloba extract cannot be definitively established. However, his assumptions and calculations are flawed since he incorrectly assumes that the NTP uses no correction for multiple comparisons, and that significance tests for discrete data operate at exactly the nominal level. He also misrepresents the NTP's decision making process, overstates the number of statistical comparisons made, and ignores the fact that the mouse liver tumor effects were so striking (e.g., p<0.0000000000001) that it is virtually impossible that they could be false positive outcomes. Gaus' conclusion that such obvious responses merely "generate a hypothesis" rather than demonstrate a real carcinogenic effect has no scientific credibility. Moreover, his claims regarding the high frequency of false positive outcomes in carcinogenicity studies are misleading because of his methodological misconceptions and errors. PMID:25261588
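The arithmetic behind the disputed claim, and the abstract's point that significance tests for discrete data do not operate at exactly the nominal level, can be illustrated with a short calculation. The sketch below uses a one-sided exact binomial test as a stand-in; it is not the NTP's test procedure, and the group size (50) and background tumor rate (2%) are hypothetical values chosen only for illustration.

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def actual_size(n, p0, alpha=0.05):
    """Return the smallest critical count c with P(X >= c | p0) <= alpha,
    together with that tail probability (the test's true type I error rate)."""
    for c in range(n + 2):
        tail = binom_tail(n, p0, c)
        if tail <= alpha:
            return c, tail

n_comparisons = 4800            # figure quoted in the abstract
nominal_alpha = 0.05
expected_nominal = n_comparisons * nominal_alpha   # the "240 expected by chance"

# Hypothetical discrete test: 50 animals, 2% background tumor rate.
critical_count, size = actual_size(n=50, p0=0.02, alpha=nominal_alpha)
expected_actual = n_comparisons * size

print(f"expected false positives at the nominal level: {expected_nominal:.0f}")
print(f"critical count = {critical_count}, actual test size = {size:.4f}")
print(f"expected false positives at the actual size: {expected_actual:.0f}")
```

Because a count can only take integer values, the rejection threshold cannot be tuned to hit 0.05 exactly, so the true false positive rate of each test falls below the nominal level and the expected number of chance findings is smaller than the nominal calculation suggests.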
Interpreting and Reporting Radiological Water-Quality Data
McCurdy, David E.; Garbarino, John R.; Mullin, Ann H.
2008-01-01
This document provides information to U.S. Geological Survey (USGS) Water Science Centers on interpreting and reporting radiological results for samples of environmental matrices, most notably water. The information provided is intended to be broadly useful throughout the United States, but scientists who work at sites containing radioactive hazardous wastes should consult additional sources for more detailed information. The document is largely based on recognized national standards and guidance documents for radioanalytical sample processing, most notably the Multi-Agency Radiological Laboratory Analytical Protocols Manual (MARLAP), and on documents published by the U.S. Environmental Protection Agency and the American National Standards Institute. It does not include discussion of standard USGS practices including field quality-control sample analysis, interpretive report policies, and related issues, all of which shall always be included in any effort by the Water Science Centers. The use of 'shall' in this report signifies a policy requirement of the USGS Office of Water Quality.
Kim, Kyoung-Ho; Yun, Seong-Taek; Choi, Byoung-Young; Chae, Gi-Tak; Joo, Yongsung; Kim, Kangjoo; Kim, Hyoung-Soo
2009-07-21
Hydrochemical and multivariate statistical interpretations of 16 physicochemical parameters of 45 groundwater samples from a riverside alluvial aquifer underneath an agricultural area in Osong, central Korea, were performed in this study to understand the spatial controls of nitrate concentrations in terms of biogeochemical processes occurring near oxbow lakes within a fluvial plain. Nitrate concentrations in groundwater showed a large variability from 0.1 to 190.6 mg/L (mean=35.0 mg/L) with significantly lower values near oxbow lakes. The evaluation of hydrochemical data indicated that the groundwater chemistry (especially, degree of nitrate contamination) is mainly controlled by two competing processes: 1) agricultural contamination and 2) redox processes. In addition, results of factorial kriging, consisting of two steps (i.e., co-regionalization and factor analysis), reliably showed a spatial control of the concentrations of nitrate and other redox-sensitive species; in particular, significant denitrification was observed restrictedly near oxbow lakes. The results of this study indicate that sub-oxic conditions in an alluvial groundwater system are developed geologically and geochemically in and near oxbow lakes, which can effectively enhance the natural attenuation of nitrate before the groundwater discharges to nearby streams. This study also demonstrates the usefulness of multivariate statistical analysis in groundwater study as a supplementary tool for interpretation of complex hydrochemical data sets. PMID:19524319
NASA Astrophysics Data System (ADS)
Kim, Kyoung-Ho; Yun, Seong-Taek; Choi, Byoung-Young; Chae, Gi-Tak; Joo, Yongsung; Kim, Kangjoo; Kim, Hyoung-Soo
2009-07-01
Hydrochemical and multivariate statistical interpretations of 16 physicochemical parameters of 45 groundwater samples from a riverside alluvial aquifer underneath an agricultural area in Osong, central Korea, were performed in this study to understand the spatial controls of nitrate concentrations in terms of biogeochemical processes occurring near oxbow lakes within a fluvial plain. Nitrate concentrations in groundwater showed a large variability from 0.1 to 190.6 mg/L (mean = 35.0 mg/L) with significantly lower values near oxbow lakes. The evaluation of hydrochemical data indicated that the groundwater chemistry (especially, degree of nitrate contamination) is mainly controlled by two competing processes: 1) agricultural contamination and 2) redox processes. In addition, results of factorial kriging, consisting of two steps (i.e., co-regionalization and factor analysis), reliably showed a spatial control of the concentrations of nitrate and other redox-sensitive species; in particular, significant denitrification was observed restrictedly near oxbow lakes. The results of this study indicate that sub-oxic conditions in an alluvial groundwater system are developed geologically and geochemically in and near oxbow lakes, which can effectively enhance the natural attenuation of nitrate before the groundwater discharges to nearby streams. This study also demonstrates the usefulness of multivariate statistical analysis in groundwater study as a supplementary tool for interpretation of complex hydrochemical data sets.
Hu, Yang; Zhang, Ying; Ren, Jun; Wang, Yadong; Wang, Zhenzhen; Zhang, Jun
2016-01-01
The overall goal is to establish a reliable human protein-protein interaction network and develop computational tools to characterize a protein-protein interaction (PPI) network and the role of individual proteins in the context of the network topology and their expression status. A novel and unique feature of our approach is that we assigned confidence measure to each derived interacting pair and account for the confidence in our network analysis. We integrated experimental data to infer human PPI network. Our model treated the true interacting status (yes versus no) for any given pair of human proteins as a latent variable whose value was not observed. The experimental data were the manifestation of interacting status, which provided evidence as to the likelihood of the interaction. The confidence of interactions would depend on the strength and consistency of the evidence. PMID:27648447
Hocquette, Jean François; Brandstetter, Anna M.
2002-06-01
In studies on enzyme activity or gene expression at the protein level, data are usually analyzed by using a standard curve after subtracting blank values. In most cases and for most techniques (spectrophotometric assays, ELISA), this approach satisfies the basic principles of linearity and specificity. In our experience, this may also be the case for Western-blot analysis. By contrast, mRNA data are usually presented as arbitrary units of the ratio of a target RNA over levels of a control RNA species. We here demonstrate by simple experiments and various examples that this data-normalization procedure may result in misleading conclusions. Common molecular biology techniques have never been carefully tested according to the basic principles of validation of quantitative techniques. We thus prefer a regression-based approach for quantifying mRNA levels relative to a control RNA species by Northern-blot, semi-quantitative RT-PCR or similar techniques. These techniques are also characterized by a lower reproducibility for repeated assays when compared to biochemical analyses. Therefore, we also recommend designing experiments that allow the detection of a similar range of variance by biochemical and molecular biology techniques. Otherwise, spurious conclusions may be drawn regarding the control level of gene expression.
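One way a target/control ratio can mislead is when the target signal is linear in the control signal but the line does not pass through the origin (for example, because of background signal): the ratio then changes with the control level even though the underlying relationship is identical in every sample. The sketch below is an illustration under that assumption, with invented signal values and a hypothetical helper `ols`; it is not the authors' data or their exact procedure.

```python
def ols(x, y):
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented signals: every sample follows target = 2.0 + 0.5 * control,
# i.e. the same relationship, but with a non-zero intercept.
control = [1.0, 2.0, 4.0, 8.0]
target = [2.0 + 0.5 * c for c in control]

# Ratio normalization: differs from sample to sample, suggesting differences that do not exist.
ratios = [t / c for t, c in zip(target, control)]

# Regression-based view: one line describes all samples, so residuals are essentially zero.
slope, intercept = ols(control, target)
residuals = [t - (intercept + slope * c) for t, c in zip(target, control)]

print("ratios:   ", [round(r, 3) for r in ratios])
print("fit:       slope =", round(slope, 3), "intercept =", round(intercept, 3))
print("residuals:", [round(r, 6) for r in residuals])
```

The ratio is a valid summary only when the relationship is proportional (zero intercept); the regression makes that assumption testable instead of implicit.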
Statistical Analysis of Strength Data for an Aerospace Aluminum Alloy
NASA Technical Reports Server (NTRS)
Neergaard, L.; Malone, T.
2001-01-01
Aerospace vehicles are produced in limited quantities that do not always allow development of MIL-HDBK-5 A-basis design allowables. One method of examining production and composition variations is to perform 100% lot acceptance testing for aerospace Aluminum (Al) alloys. This paper discusses statistical trends seen in strength data for one Al alloy. A four-step approach reduced the data to residuals, visualized residuals as a function of time, grouped data with quantified scatter, and conducted analysis of variance (ANOVA).
Statistical design and analysis of RNA sequencing data.
Auer, Paul L; Doerge, R W
2010-06-01
Next-generation sequencing technologies are quickly becoming the preferred approach for characterizing and quantifying entire genomes. Even though data produced from these technologies are proving to be the most informative of any thus far, very little attention has been paid to fundamental design aspects of data collection and analysis, namely sampling, randomization, replication, and blocking. We discuss these concepts in an RNA sequencing framework. Using simulations we demonstrate the benefits of collecting replicated RNA sequencing data according to well known statistical designs that partition the sources of biological and technical variation. Examples of these designs and their corresponding models are presented with the goal of testing differential expression.
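Randomization and blocking can be sketched as a randomized complete block layout, in which every treatment appears exactly once in each block and the order within a block is randomized. The block and treatment names below are hypothetical, and the sketch shows the general design principle rather than any specific design from the paper.

```python
import random

def randomized_block_design(treatments, blocks, seed=0):
    """Assign every treatment once per block, in an independently shuffled order."""
    rng = random.Random(seed)   # fixed seed so the layout can be reproduced
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)      # randomize positions within this block only
        layout[block] = order
    return layout

# Hypothetical example: three conditions, each sequenced once in each of four blocks.
treatments = ["control", "treated_low", "treated_high"]
blocks = ["block_1", "block_2", "block_3", "block_4"]
layout = randomized_block_design(treatments, blocks)

for block, order in layout.items():
    print(block, "->", order)
```

Because each block contains all treatments, block-to-block (technical) variation can be separated from treatment effects in the analysis, and the number of blocks provides the replication.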
Statistical Analysis of Strength Data for an Aerospace Aluminum Alloy
NASA Technical Reports Server (NTRS)
Neergaard, Lynn; Malone, Tina; Gentz, Steven J. (Technical Monitor)
2000-01-01
Aerospace vehicles are produced in limited quantities that do not always allow development of MIL-HDBK-5 A-basis design allowables. One method of examining production and composition variations is to perform 100% lot acceptance testing for aerospace Aluminum (Al) alloys. This paper discusses statistical trends seen in strength data for one Al alloy. A four-step approach reduced the data to residuals, visualized residuals as a function of time, grouped data with quantified scatter, and conducted analysis of variance (ANOVA).
Spatial Statistical Estimation for Massive Sea Surface Temperature Data
NASA Astrophysics Data System (ADS)
Marchetti, Y.; Vazquez, J.; Nguyen, H.; Braverman, A. J.
2015-12-01
We combine several large remotely sensed sea surface temperature (SST) datasets to create a single high-resolution SST dataset that has no missing data and provides an uncertainty associated with each value. This high resolution dataset will optimize estimates of SST in critical parts of the world's oceans, such as coastal upwelling regions. We use Spatial Statistical Data Fusion (SSDF), a statistical methodology for predicting global spatial fields by exploiting spatial correlations in the data. The main advantages of SSDF over spatial smoothing methodologies include the provision of probabilistic uncertainties, the ability to incorporate multiple datasets with varying footprints, measurement errors and biases, and estimation at any desired resolution. In order to accommodate massive input and output datasets, we introduce two modifications of the existing SSDF algorithm. First, we compute statistical model parameters based on coarse resolution aggregated data. Second, we use an adaptive spatial grid that allows us to perform estimation in a specified region of interest, but incorporate spatial dependence between locations in that region and all locations globally. Finally, we demonstrate with a case study involving estimations on the full globe at coarse resolution grid (30 km) and a high resolution (1 km) inset for the Gulf Stream region.
Eichler, Gabriel S; Reimers, Mark; Kane, David; Weinstein, John N
2007-01-01
Interpretation of microarray data remains a challenge, and most methods fail to consider the complex, nonlinear regulation of gene expression. To address that limitation, we introduce Learner of Functional Enrichment (LeFE), a statistical/machine learning algorithm based on Random Forest, and demonstrate it on several diverse datasets: smoker/never smoker, breast cancer classification, and cancer drug sensitivity. We also compare it with previously published algorithms, including Gene Set Enrichment Analysis. LeFE regularly identifies statistically significant functional themes consistent with known biology. PMID:17845722
NASA Astrophysics Data System (ADS)
Abraham, J. D.; Ball, L. B.; Bedrosian, P. A.; Cannia, J. C.; Deszcz-Pan, M.; Minsley, B. J.; Peterson, S. M.; Smith, B. D.
2009-12-01
contacts between hydrostratigraphic units. This provides a 3D image of the hydrostratigraphic units interpreted from the electrical resistivity derived from the HEM tied to statistical confidences on the picked contacts. The interpreted 2D and 3D data provides the groundwater modeler with a high-resolution hydrogeologic framework and a solid understanding of the uncertainty in the information it provides. This interpretation facilitates more informed modeling decisions, more accurate groundwater models, and development of more effective water-resources management strategies.
Statistics in experimental design, preprocessing, and analysis of proteomics data.
Jung, Klaus
2011-01-01
High-throughput experiments in proteomics, such as 2-dimensional gel electrophoresis (2-DE) and mass spectrometry (MS), usually yield high-dimensional data sets of expression values for hundreds or thousands of proteins which are, however, observed on only a relatively small number of biological samples. Statistical methods for the planning and analysis of experiments are important to avoid false conclusions and to obtain tenable results. In this chapter, the most frequent experimental designs for proteomics experiments are illustrated. In particular, the focus is on studies for the detection of differentially regulated proteins. Furthermore, issues of sample size planning, statistical analysis of expression levels as well as methods for data preprocessing are covered.
Probability and Statistics in Astronomical Machine Learning and Data Mining
NASA Astrophysics Data System (ADS)
Scargle, Jeffrey
2012-03-01
Statistical issues peculiar to astronomy have implications for machine learning and data mining. It should be obvious that statistics lies at the heart of machine learning and data mining. Further it should be no surprise that the passive observational nature of astronomy, the concomitant lack of sampling control, and the uniqueness of its realm (the whole universe!) lead to some special statistical issues and problems. As described in the Introduction to this volume, data analysis technology is largely keeping up with major advances in astrophysics and cosmology, even driving many of them. And I realize that there are many scientists with good statistical knowledge and instincts, especially in the modern era I like to call the Age of Digital Astronomy. Nevertheless, old impediments still lurk, and the aim of this chapter is to elucidate some of them. Many experiences with smart people doing not-so-smart things (cf. the anecdotes collected in the Appendix here) have convinced me that the cautions given here need to be emphasized. Consider these four points: 1. Data analysis often involves searches of many cases, for example, outcomes of a repeated experiment, for a feature of the data. 2. The feature comprising the goal of such searches may not be defined unambiguously until the search is carried out, or perhaps vaguely even then. 3. The human visual system is very good at recognizing patterns in noisy contexts. 4. People are much easier to convince of something they want to believe, or already believe, as opposed to unpleasant or surprising facts. One can argue that all four are good things during the initial, exploratory phases of most data analysis. They represent the curiosity and creativity of the scientific process, especially during the exploration of data collections from new observational programs such as all-sky surveys in wavelengths not accessed before or sets of images of a planetary surface not yet explored. On the other hand, confirmatory scientific
Computational and Statistical Analysis of Protein Mass Spectrometry Data
Noble, William Stafford; MacCoss, Michael J.
2012-01-01
High-throughput proteomics experiments involving tandem mass spectrometry produce large volumes of complex data that require sophisticated computational analyses. As such, the field offers many challenges for computational biologists. In this article, we briefly introduce some of the core computational and statistical problems in the field and then describe a variety of outstanding problems that readers of PLoS Computational Biology might be able to help solve. PMID:22291580
The GEOS Ozone Data Assimilation System: Specification of Error Statistics
NASA Technical Reports Server (NTRS)
Stajner, Ivanka; Riishojgaard, Lars Peter; Rood, Richard B.
2000-01-01
A global three-dimensional ozone data assimilation system has been developed at the Data Assimilation Office of the NASA/Goddard Space Flight Center. The Total Ozone Mapping Spectrometer (TOMS) total ozone and the Solar Backscatter Ultraviolet (SBUV) or (SBUV/2) partial ozone profile observations are assimilated. The assimilation, into an off-line ozone transport model, is done using the global Physical-space Statistical Analysis Scheme (PSAS). This system became operational in December 1999. A detailed description of the statistical analysis scheme, and in particular, the forecast and observation error covariance models is given. A new global anisotropic horizontal forecast error correlation model accounts for a varying distribution of observations with latitude. Correlations are largest in the zonal direction in the tropics where data are sparse. The forecast error variance model is proportional to the ozone field. The forecast error covariance parameters were determined by maximum likelihood estimation. The error covariance models are validated using chi-squared statistics. The analyzed ozone fields in the winter 1992 are validated against independent observations from ozone sondes and HALOE. There is better than 10% agreement between mean Halogen Occultation Experiment (HALOE) and analysis fields between 70 and 0.2 hPa. The global root-mean-square (RMS) difference between TOMS observed and forecast values is less than 4%. The global RMS difference between SBUV observed and analyzed ozone between 50 and 3 hPa is less than 15%.
The statistical analysis of multivariate serological frequency data.
Reyment, Richard A
2005-11-01
Data occurring in the form of frequencies are common in genetics-for example, in serology. Examples are provided by the AB0 group, the Rhesus group, and also DNA data. The statistical analysis of tables of frequencies is carried out using the available methods of multivariate analysis with usually three principal aims. One of these is to seek meaningful relationships between the components of a data set, the second is to examine relationships between populations from which the data have been obtained, the third is to bring about a reduction in dimensionality. This latter aim is usually realized by means of bivariate scatter diagrams using scores computed from a multivariate analysis. The multivariate statistical analysis of tables of frequencies cannot safely be carried out by standard multivariate procedures because they represent compositions and are therefore embedded in simplex space, a subspace of full space. Appropriate procedures for simplex space are compared and contrasted with simple standard methods of multivariate analysis ("raw" principal component analysis). The study shows that the differences between a log-ratio model and a simple logarithmic transformation of proportions may not be very great, particularly as regards graphical ordinations, but important discrepancies do occur. The divergencies between logarithmically based analyses and raw data are, however, great. Published data on Rhesus alleles observed for Italian populations are used to exemplify the subject. PMID:16024067
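As a rough illustration of the contrast drawn above between a log-ratio model and a simple logarithmic transformation of proportions, the following Python sketch applies both to a made-up vector of allele frequencies. The function names and the sample frequencies are assumptions for illustration only, and the centered log-ratio shown here is just one common log-ratio transform; the abstract does not say which variant the study uses.

```python
import math


def log_transform(props):
    """Simple logarithmic transformation of each proportion."""
    return [math.log(p) for p in props]


def clr_transform(props):
    """Centered log-ratio: log of each part relative to the geometric mean of all parts."""
    logs = [math.log(p) for p in props]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical allele frequencies (they sum to 1); not data from the paper.
freqs = [0.40, 0.35, 0.15, 0.10]
plain = log_transform(freqs)
clr = clr_transform(freqs)

# The same composition expressed as raw counts instead of proportions.
clr_from_counts = clr_transform([40, 35, 15, 10])
```

The centered log-ratio scores sum to zero and do not change when the composition is rescaled, which is why they suit data confined to a simplex; the plain logarithms keep the same pairwise differences but shift with the overall scale.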
A decision-theory approach to interpretable set analysis for high-dimensional data.
Boca, Simina M; Bravo, Héctor Céorrada; Caffo, Brian; Leek, Jeffrey T; Parmigiani, Giovanni
2013-09-01
A key problem in high-dimensional significance analysis is to find pre-defined sets that show enrichment for a statistical signal of interest; the classic example is the enrichment of gene sets for differentially expressed genes. Here, we propose a new decision-theory approach to the analysis of gene sets which focuses on estimating the fraction of non-null variables in a set. We introduce the idea of "atoms," non-overlapping sets based on the original pre-defined set annotations. Our approach focuses on finding the union of atoms that minimizes a weighted average of the number of false discoveries and missed discoveries. We introduce a new false discovery rate for sets, called the atomic false discovery rate (afdr), and prove that the optimal estimator in our decision-theory framework is to threshold the afdr. These results provide a coherent and interpretable framework for the analysis of sets that addresses the key issues of overlapping annotations and difficulty in interpreting p values in both competitive and self-contained tests. We illustrate our method and compare it to a popular existing method using simulated examples, as well as gene-set and brain ROI data analyses. PMID:23909925
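One plausible reading of the "atoms" idea can be sketched in Python as follows. The abstract only says that atoms are non-overlapping sets derived from the original annotations, so the construction used here (grouping elements that share exactly the same set memberships) is an assumption, as are the function name `build_atoms` and the toy gene sets.

```python
from collections import defaultdict


def build_atoms(set_annotations):
    """Partition annotated elements into groups that share the same set memberships."""
    # Record, for every element, the names of all sets that contain it.
    membership = defaultdict(set)
    for set_name, members in set_annotations.items():
        for element in members:
            membership[element].add(set_name)

    # Elements with an identical membership signature form one atom.
    atoms = defaultdict(set)
    for element, names in membership.items():
        atoms[frozenset(names)].add(element)
    return dict(atoms)


# Hypothetical overlapping gene sets; not data from the paper.
gene_sets = {"A": {"g1", "g2", "g3"}, "B": {"g3", "g4"}}
atoms = build_atoms(gene_sets)
```

Under this assumed construction the overlapping sets A and B split into three disjoint atoms, and each original set can be rebuilt as a union of atoms.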
Analyzing and interpreting genome data at the network level with ConsensusPathDB.
Herwig, Ralf; Hardt, Christopher; Lienhard, Matthias; Kamburov, Atanas
2016-10-01
ConsensusPathDB consists of a comprehensive collection of human (as well as mouse and yeast) molecular interaction data integrated from 32 different public repositories and a web interface featuring a set of computational methods and visualization tools to explore these data. This protocol describes the use of ConsensusPathDB (http://consensuspathdb.org) with respect to the functional and network-based characterization of biomolecules (genes, proteins and metabolites) that are submitted to the system either as a priority list or together with associated experimental data such as RNA-seq. The tool reports interaction network modules, biochemical pathways and functional information that are significantly enriched by the user's input, applying computational methods for statistical over-representation, enrichment and graph analysis. The results of this protocol can be observed within a few minutes, even with genome-wide data. The resulting network associations can be used to interpret high-throughput data mechanistically, to characterize and prioritize biomarkers, to integrate different omics levels, to design follow-up functional assay experiments and to generate topology for kinetic models at different scales. PMID:27606777
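The statistical over-representation step mentioned above is often done with a one-sided hypergeometric test. The Python sketch below shows that standard test under the assumption that it is representative; the abstract does not describe ConsensusPathDB's actual implementation, and the function name and parameters are hypothetical.

```python
from math import comb


def overrepresentation_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated items in a random sample.

    population: number of background items (e.g. all measured genes)
    annotated:  number of background items belonging to the pathway
    sample:     size of the user's input list
    hits:       number of input items that belong to the pathway
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probabilities for k = hits .. upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 2 of 4 background genes are in the pathway, both input genes hit it.
p_value = overrepresentation_p(population=4, annotated=2, sample=2, hits=2)
```

In practice the resulting p-values would be corrected for the number of pathways tested; that step is omitted here.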
Statistical Inference for Big Data Problems in Molecular Biophysics
Ramanathan, Arvind; Savol, Andrej; Burger, Virginia; Quinn, Shannon; Agarwal, Pratul K; Chennubhotla, Chakra
2012-01-01
We highlight the role of statistical inference techniques in providing biological insights from analyzing long time-scale molecular simulation data. Technological and algorithmic improvements in computation have brought molecular simulations to the forefront of techniques applied to investigating the basis of living systems. While these longer simulations, increasingly complex and presently reaching petabyte scales, promise a detailed view into microscopic behavior, teasing out the important information has now become a true challenge on its own. Mining this data for important patterns is critical to automating therapeutic intervention discovery, improving protein design, and fundamentally understanding the mechanistic basis of cellular homeostasis.
Optimization of Statistical Methods Impact on Quantitative Proteomics Data.
Pursiheimo, Anna; Vehmas, Anni P; Afzal, Saira; Suomi, Tomi; Chand, Thaman; Strauss, Leena; Poutanen, Matti; Rokka, Anne; Corthals, Garry L; Elo, Laura L
2015-10-01
As tools for quantitative label-free mass spectrometry (MS) rapidly develop, a consensus about the best practices is not apparent. In the work described here we compared popular statistical methods for detecting differential protein expression from quantitative MS data using both controlled experiments with known quantitative differences for specific proteins used as standards as well as "real" experiments where differences in protein abundance are not known a priori. Our results suggest that data-driven reproducibility-optimization can consistently produce reliable differential expression rankings for label-free proteome tools and are straightforward in their application. PMID:26321463
Accidents in Malaysian construction industry: statistical data and court cases.
Chong, Heap Yih; Low, Thuan Siang
2014-01-01
Safety and health issues remain critical to the construction industry due to its working environment and the complexity of working practises. This research attempts to adopt 2 research approaches using statistical data and court cases to address and identify the causes and behavior underlying construction safety and health issues in Malaysia. Factual data on the period of 2000-2009 were retrieved to identify the causes and agents that contributed to health issues. Moreover, court cases were tabulated and analyzed to identify legal patterns of parties involved in construction site accidents. Approaches of this research produced consistent results and highlighted a significant reduction in the rate of accidents per construction project in Malaysia.
A Geophysical Atlas for Interpretation of Satellite-derived Data
NASA Technical Reports Server (NTRS)
Lowman, P. D., Jr. (Editor); Frey, H. V. (Editor); Davis, W. M.; Greenberg, A. P.; Hutchinson, M. K.; Langel, R. A.; Lowrey, B. E.; Marsh, J. G.; Mead, G. D.; Okeefe, J. A.
1979-01-01
A compilation of maps of global geophysical and geological data plotted on a common scale and projection is presented. The maps include satellite gravity, magnetic, seismic, volcanic, tectonic activity, and mantle velocity anomaly data. Bibliographic references for all maps are included.
Helping Students Interpret Large-Scale Data Tables
ERIC Educational Resources Information Center
Prodromou, Theodosia
2016-01-01
New technologies have completely altered the ways that citizens can access data. Indeed, emerging online data sources give citizens access to an enormous amount of numerical information that provides new sorts of evidence used to influence public opinion. In this new environment, two trends have had a significant impact on our increasingly…
Spatial Statistical Procedures to Validate Input Data in Energy Models
Lawrence Livermore National Laboratory
2006-01-27
Energy modeling and analysis often relies on data collected for other purposes such as census counts, atmospheric and air quality observations, economic trends, and other primarily non-energy-related uses. Systematic collection of empirical data solely for regional, national, and global energy modeling has not been established as in the above-mentioned fields. Empirical and modeled data relevant to energy modeling is reported and available at various spatial and temporal scales that might or might not be those needed and used by the energy modeling community. The incorrect representation of spatial and temporal components of these data sets can result in energy models producing misleading conclusions, especially in cases of newly evolving technologies with spatial and temporal operating characteristics different from the dominant fossil and nuclear technologies that powered the energy economy over the last two hundred years. Increased private and government research and development and public interest in alternative technologies that have a benign effect on the climate and the environment have spurred interest in wind, solar, hydrogen, and other alternative energy sources and energy carriers. Many of these technologies require much finer spatial and temporal detail to determine optimal engineering designs, resource availability, and market potential. This paper presents exploratory and modeling techniques in spatial statistics that can improve the usefulness of empirical and modeled data sets that do not initially meet the spatial and/or temporal requirements of energy models. In particular, we focus on (1) aggregation and disaggregation of spatial data, (2) predicting missing data, and (3) merging spatial data sets. In addition, we introduce relevant statistical software models commonly used in the field for various sizes and types of data sets.
Spatial Statistical Procedures to Validate Input Data in Energy Models
Johannesson, G.; Stewart, J.; Barr, C.; Brady Sabeff, L.; George, R.; Heimiller, D.; Milbrandt, A.
2006-01-01
Energy modeling and analysis often relies on data collected for other purposes such as census counts, atmospheric and air quality observations, economic trends, and other primarily non-energy related uses. Systematic collection of empirical data solely for regional, national, and global energy modeling has not been established as in the abovementioned fields. Empirical and modeled data relevant to energy modeling is reported and available at various spatial and temporal scales that might or might not be those needed and used by the energy modeling community. The incorrect representation of spatial and temporal components of these data sets can result in energy models producing misleading conclusions, especially in cases of newly evolving technologies with spatial and temporal operating characteristics different from the dominant fossil and nuclear technologies that powered the energy economy over the last two hundred years. Increased private and government research and development and public interest in alternative technologies that have a benign effect on the climate and the environment have spurred interest in wind, solar, hydrogen, and other alternative energy sources and energy carriers. Many of these technologies require much finer spatial and temporal detail to determine optimal engineering designs, resource availability, and market potential. This paper presents exploratory and modeling techniques in spatial statistics that can improve the usefulness of empirical and modeled data sets that do not initially meet the spatial and/or temporal requirements of energy models. In particular, we focus on (1) aggregation and disaggregation of spatial data, (2) predicting missing data, and (3) merging spatial data sets. In addition, we introduce relevant statistical software models commonly used in the field for various sizes and types of data sets.
Summary of Quantitative Interpretation of Image Far Ultraviolet Auroral Data
NASA Technical Reports Server (NTRS)
Frey, H. U.; Immel, T. J.; Mende, S. B.; Gerard, J.-C.; Hubert, B.; Habraken, S.; Span, J.; Gladstone, G. R.; Bisikalo, D. V.; Shematovich, V. I.; Six, N. Frank (Technical Monitor)
2002-01-01
Direct imaging of the magnetosphere by instruments on the IMAGE spacecraft is supplemented by simultaneous observations of the global aurora in three far ultraviolet (FUV) wavelength bands. The purpose of the multi-wavelength imaging is to study the global auroral particle and energy input from the magnetosphere into the atmosphere. This paper describes the method for quantitative interpretation of FUV measurements. The Wide-Band Imaging Camera (WIC) provides broad band ultraviolet images of the aurora with maximum spatial and temporal resolution by imaging the nitrogen lines and bands between 140 and 180 nm wavelength. The Spectrographic Imager (SI), a dual wavelength monochromatic instrument, images both Doppler-shifted Lyman alpha emissions produced by precipitating protons, in the SI-12 channel, and OI 135.6 nm emissions in the SI-13 channel. From the SI-12 Doppler-shifted Lyman alpha images it is possible to obtain the precipitating proton flux provided assumptions are made regarding the mean energy of the protons. Knowledge of the proton (flux and energy) component allows the calculation of the contribution produced by protons in the WIC and SI-13 instruments. Comparison of the corrected WIC and SI-13 signals provides a measure of the electron mean energy, which can then be used to determine the electron energy flux. To accomplish this, reliable emission modeling and instrument calibrations are required. In-flight calibration using early-type stars was used to validate the pre-flight laboratory calibrations and determine long-term trends in sensitivity. In general, very reasonable agreement is found between in-situ measurements and remote quantitative determinations.
Statistical comparison of the AGDISP model with deposit data
NASA Astrophysics Data System (ADS)
Duan, Baozhong; Yendol, William G.; Mierzejewski, Karl
An aerial spray Agricultural Dispersal (AGDISP) model was tested against quantitative field data. The microbial pesticide Bacillus thuringiensis (Bt) was sprayed as fine spray from a helicopter over a flat site in various meteorological conditions. Droplet deposition on evenly spaced Kromekote cards, 0.15 m above the ground, was measured with image analysis equipment. Six complete data sets out of the 12 trials were selected for data comparison. A set of statistical parameters suggested by the American Meteorological Society and other authors was applied for comparisons of the model prediction with the ground deposit data. The results indicated that AGDISP tended to overpredict the average volume deposition by a factor of two. The sensitivity test of the AGDISP model to the input wind direction showed that the model may not be sensitive to variations in wind direction within 10 degrees relative to aircraft flight path.
Outpatient health care statistics data warehouse--implementation.
Zilli, D
1999-01-01
Data warehouse implementation is assumed to be a very knowledge-demanding, expensive and long-lasting process. As such it requires senior management sponsorship, involvement of experts, a big budget and probably years of development time. Presented Outpatient Health Care Statistics Data Warehouse implementation research provides ample evidence against the infallibility of the above statements. New, inexpensive, but powerful technology, which provides outstanding platform for On-Line Analytical Processing (OLAP), has emerged recently. Presumably, it will be the basis for the estimated future growth of data warehouse market, both in the medical and in other business fields. Methods and tools for building, maintaining and exploiting data warehouses are also briefly discussed in the paper.
Common misconceptions about data analysis and statistics
Motulsky, Harvey J
2015-01-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: (1) P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. (2) Overemphasis on P values rather than on the actual size of the observed effect. (3) Overuse of statistical hypothesis testing, and being seduced by the word “significant”. (4) Overreliance on standard errors, which are often misunderstood. PMID:25692012
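The first mistake listed above, reanalyzing with additional replicates until the desired result appears, can be illustrated with a small Python simulation. This is a sketch under stated assumptions rather than anything from the article: both groups are drawn from the same normal distribution (so any "significant" result is a false positive), the variance is treated as known so a simple z-test suffices, and the function names, batch size, and trial count are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p(a, b):
    """Two-sample z-test p-value, assuming equal group sizes and known unit variance."""
    n = len(a)
    z = (fmean(a) - fmean(b)) / (2.0 / n) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(peeks, batch=10, trials=2000, alpha=0.05, seed=0):
    """Fraction of null experiments declared significant when testing after every batch."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        a, b = [], []
        for _ in range(peeks):
            # Add replicates to both groups; there is no true difference between them.
            a.extend(rng.gauss(0, 1) for _ in range(batch))
            b.extend(rng.gauss(0, 1) for _ in range(batch))
            # Stop as soon as the result looks "significant".
            if two_sided_p(a, b) < alpha:
                false_positives += 1
                break
    return false_positives / trials


single_look = false_positive_rate(peeks=1)
many_looks = false_positive_rate(peeks=10)
```

With a single planned analysis the false positive rate stays near the nominal 5%, while checking after every added batch and stopping at the first significant result pushes it well above that level.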
Interpreting Microarray Data to Build Models of Microbial Genetic Regulation Networks
Sokhansanj, B; Garnham, J B; Fitch, J P
2002-01-23
Microarrays and DNA chips are an efficient, high-throughput technology for measuring temporal changes in the expression of messenger RNA (mRNA) from thousands of genes (often the entire genome of an organism) in a single experiment. A crucial drawback of microarray experiments is that results are inherently qualitative: data are generally neither quantitatively repeatable, nor may microarray spot intensities be calibrated to in vivo mRNA concentrations. Nevertheless, microarrays represent by far the cheapest and fastest way to obtain information about a cell's global genetic regulatory networks. Besides poor signal characteristics, the massive number of data produced by microarray experiments poses challenges for visualization, interpretation and model building. Towards initial model development, we have developed a Java tool for visualizing the spatial organization of gene expression in bacteria. We are also developing an approach to inferring and testing qualitative fuzzy logic models of gene regulation using microarray data. Because we are developing and testing qualitative hypotheses that do not require quantitative precision, our statistical evaluation of experimental data is limited to checking for validity and consistency. Our goals are to maximize the impact of inexpensive microarray technology, bearing in mind that biological models and hypotheses are typically qualitative.
A Statistical Quality Model for Data-Driven Speech Animation.
Ma, Xiaohan; Deng, Zhigang
2012-11-01
In recent years, data-driven speech animation approaches have achieved significant successes in terms of animation quality. However, how to automatically evaluate the realism of novel synthesized speech animations has been an important yet unsolved research problem. In this paper, we propose a novel statistical model (called SAQP) to automatically predict the quality of on-the-fly synthesized speech animations by various data-driven techniques. Its essential idea is to construct a phoneme-based, Speech Animation Trajectory Fitting (SATF) metric to describe speech animation synthesis errors and then build a statistical regression model to learn the association between the obtained SATF metric and the objective speech animation synthesis quality. Through delicately designed user studies, we evaluate the effectiveness and robustness of the proposed SAQP model. To the best of our knowledge, this work is the first-of-its-kind, quantitative quality model for data-driven speech animation. We believe it is the important first step to remove a critical technical barrier for applying data-driven speech animation techniques to numerous online or interactive talking avatar applications.
Speakman, John R; Fletcher, Quinn; Vaanholt, Lobke
2013-03-01
The epidemics of obesity and diabetes have aroused great interest in the analysis of energy balance, with the use of organisms ranging from nematode worms to humans. Although generating energy-intake or -expenditure data is relatively straightforward, the most appropriate way to analyse the data has been an issue of contention for many decades. In the last few years, a consensus has been reached regarding the best methods for analysing such data. To facilitate using these best-practice methods, we present here an algorithm that provides a step-by-step guide for analysing energy-intake or -expenditure data. The algorithm can be used to analyse data from either humans or experimental animals, such as small mammals or invertebrates. It can be used in combination with any commercial statistics package; however, to assist with analysis, we have included detailed instructions for performing each step for three popular statistics packages (SPSS, MINITAB and R). We also provide interpretations of the results obtained at each step. We hope that this algorithm will assist in the statistically appropriate analysis of such data, a field in which there has been much confusion and some controversy.
NASA Astrophysics Data System (ADS)
Klose, C. D.; Giese, R.; Löw, S.; Borm, G.
Especially for deep underground excavations, the prediction of the locations of small-scale hazardous geotechnical structures is nearly impossible when exploration is restricted to surface based methods. Hence, for the AlpTransit base tunnels, exploration ahead has become an essential component of the excavation plan. The project described in this talk aims at improving the technology for the geological interpretation of reflection seismic data. The discovered geological-seismic relations will be used to develop an interpretation system based on artificial intelligence to predict hazardous geotechnical structures of the advancing tunnel face. This talk gives, at first, an overview about the data mining of geological and seismic properties of metamorphic rocks within the Penninic gneiss zone in Southern Switzerland. The data results from measurements of a specific geophysical prediction system developed by the GFZ Potsdam, Germany, along the 2600 m long and 1400 m deep Faido access tunnel. The goal is to find those seismic features (i.e. compression and shear wave velocities, velocity ratios and velocity gradients) which show a significant relation to geological properties (i.e. fracturing and fabric features). The seismic properties were acquired from different tomograms, whereas the geological features derive from tunnel face maps. The features are statistically compared with the seismic rock properties taking into account the different methods used for the tunnel excavation (TBM and Drill/Blast). Fracturing and the mica content stay in a positive relation to the velocity values. Both, P- and S-wave velocities near the tunnel surface describe the petrology better, whereas in the interior of the rock mass they correlate to natural micro- and macroscopic fractures surrounding tectonites, i.e. cataclasites. The latter lie outside of the excavation damage zone and the tunnel loosening zone. The shear wave velocities are better indicators for rock
Interpreting Disasters From Limited Data Availability: A Guatemalan Study Case
NASA Astrophysics Data System (ADS)
Soto Gomez, A.
2012-12-01
Guatemala is located in a geographical area exposed to multiple natural hazards. Although Guatemalan populations live in hazardous conditions, little scientific research focuses on this particular geographical area. Thorough studies are needed to understand the disasters occurring in the country and consequently enable decision makers and professionals to plan future actions, yet available data are limited. The data in the available sources are limited by their timespan or by the size of the events included, and are therefore insufficient to provide the whole picture of the disasters in the country. This study proposes a methodology to use the available data within one of the most important catchments in the country, the Samala River basin, to look for answers to three questions: What kinds of disasters occur? Where do such events happen? And why do they happen? Three datasets from different source agencies (one global, one regional, and one local) have been analyzed numerically and spatially using spreadsheets, numerical computing software, and geographic information systems. The results of the analyses have been combined in order to search for possible answers to the established questions. A relation was found between the compositions of the data in two of the three datasets analyzed. The third showed a very different composition, probably because the inclusion criteria of the dataset exclude smaller but more frequent disasters from its records. In all the datasets, the most frequent disasters are those caused by hydrometeorological hazards, i.e. floods and landslides. A relation was also found between the occurrence of disasters and the records of precipitation in the area, but this relation is not strong enough to affirm that the disasters are the direct result of rain in the area, and further studies must be carried out to explore other potential causes. Analyzing the existing data helps to identify what kind of data is needed and this would be useful to
Searching the Heavens: Astronomy, Computation, Statistics, Data Mining and Philosophy
NASA Astrophysics Data System (ADS)
Glymour, Clark
2012-03-01
Our first and purest science, the mother of scientific methods, sustained by sheer curiosity, searching the heavens we cannot manipulate. From the beginning, astronomy has combined mathematical idealization, technological ingenuity, and indefatigable data collection with procedures to search through assembled data for the processes that govern the cosmos. Astronomers are, and ever have been, data miners, and for that reason astronomical methods (but not astronomical discoveries) have often been despised by statisticians and philosophers. Epithets laced the statistical literature: Ransacking! Data dredging! Double Counting! Statistical disdain was usually directed at social scientists and biologists, rarely if ever at astronomers, but the methodological attitudes and goals that many twentieth-century philosophers and statisticians rejected were creations of the astronomical tradition. The philosophical criticisms were earlier and more direct. In the shadow (or in Alexander Pope's phrasing, the light) cast on nature in the eighteenth century by the Newtonian triumph, David Hume revived arguments from the ancient Greeks to challenge the very possibility of coming to know what causes what. His conclusion was endorsed in the twentieth century by many philosophers who found talk of causation unnecessary or unacceptably metaphysical, and absorbed by many statisticians as a general suspicion of causal claims, except possibly when they are founded on experimental manipulation. And yet in the hands of a mathematician, Thomas Bayes, and another mathematician and philosopher, Richard Price, Hume's essays prompted the development of a new kind of statistics, the kind we now call "Bayesian." The computer and new data acquisition methods have begun to dissolve the antipathy between astronomy, philosophy, and statistics. But the resolution is practical, without much reflection on the arguments or the course of events. So, I offer a largely unoriginal history
Interpreting Temperature Strain Data from Meso-Scale Clathrate Experiments
Leeman, John R; Rawn, Claudia J; Ulrich, Shannon M; Elwood Madden, Megan; Phelps, Tommy Joe
2012-01-01
Gas hydrates are important in global climate change, carbon sequestration, and seafloor stability. Currently, formation and dissociation pathways are poorly defined. We present a new approach for processing large amounts of data from meso-scale experiments, such as the LUNA distributed sensing system (DSS) in the seafloor process simulator (SPS) at Oak Ridge National Laboratory. The DSS provides a proxy for temperature measurement with a high spatial resolution allowing the heat of reaction during gas hydrate formation/dissociation to aid in locating clathrates in the vessel. The DSS fibers are placed in the sediment following an Archimedean spiral design and then the position of each sensor is solved by iterating over the arc length formula with Newton's method. The data is then gridded with a natural neighbor interpolation algorithm to allow contouring of the data. The solution of the sensor locations is verified with hot and cold stimulus in known locations. An experiment was performed with a vertically split column of sand and silt. The DSS system clearly showed hydrate forming in the sand first, then slowly creeping into the silt. Similar systems and data processing techniques could be used for monitoring of hydrates in natural environments or in any situation where a hybrid temperature/strain index is useful. Further advances in fiber technology allow the fiber to be applied in any configuration and the position of each sensor to be precisely determined making practical applications easier.
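The sensor-location step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the spiral has the form r = b*theta starting at the origin and that sensors are evenly spaced along the fiber; the function names and the values of `b` and `spacing` are hypothetical. The closed-form arc length s(theta) = (b/2) * [theta*sqrt(1 + theta^2) + asinh(theta)] is inverted with Newton's method, using ds/dtheta = b*sqrt(1 + theta^2).

```python
import math

def arc_length(theta, b):
    """Arc length of the Archimedean spiral r = b*theta, measured from theta = 0."""
    return 0.5 * b * (theta * math.sqrt(1.0 + theta * theta) + math.asinh(theta))

def theta_at_arc_length(s_target, b, tol=1e-12, max_iter=50):
    """Invert arc_length with Newton's method (assumed spiral form r = b*theta)."""
    # Starting guess from the large-angle approximation s ~ b*theta^2/2.
    # It never undershoots the root, so the iteration approaches it from above.
    theta = math.sqrt(2.0 * s_target / b)
    for _ in range(max_iter):
        residual = arc_length(theta, b) - s_target
        step = residual / (b * math.sqrt(1.0 + theta * theta))  # f / f'
        theta -= step
        if abs(step) < tol:
            break
    return theta

def sensor_positions(n_sensors, spacing, b):
    """(x, y) of sensors placed every `spacing` units of fiber length (hypothetical layout)."""
    positions = []
    for k in range(n_sensors):
        theta = theta_at_arc_length(k * spacing, b)
        r = b * theta
        positions.append((r * math.cos(theta), r * math.sin(theta)))
    return positions

if __name__ == "__main__":
    for x, y in sensor_positions(5, spacing=0.01, b=0.002):
        print(f"{x:+.5f} {y:+.5f}")
```

The resulting (x, y) coordinates are what a gridding or interpolation step would then consume; the interpolation itself is not shown here.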
Summary Statistics for Homemade "Play Dough" -- Data Acquired at LLNL
Kallman, J S; Morales, K E; Whipple, R E; Huber, R D; Martz, A; Brown, W D; Smith, J A; Schneberk, D J; Martz, Jr., H E; White, III, W T
2010-03-11
Using x-ray computerized tomography (CT), we have characterized the x-ray linear attenuation coefficients (LAC) of a homemade Play Dough™-like material, designated as PDA. Table 1 gives the first-order statistics for each of four CT measurements, estimated with a Gaussian kernel density estimator (KDE) analysis. The mean values of the LAC range from a high of about 2700 LMHU_D at 100 kVp to a low of about 1200 LMHU_D at 300 kVp. The standard deviation of each measurement is around 10% to 15% of the mean. The entropy covers the range from 6.0 to 7.4. Ordinarily, we would model the LAC of the material and compare the modeled values to the measured values. In this case, however, we did not have the detailed chemical composition of the material and therefore did not model the LAC. Using a method recently proposed by Lawrence Livermore National Laboratory (LLNL), we estimate the value of the effective atomic number, Z_eff, to be near 10. LLNL prepared about 50 mL of the homemade 'Play Dough' in a polypropylene vial and firmly compressed it immediately prior to the x-ray measurements. We used the computer program IMGREC to reconstruct the CT images. The values of the key parameters used in the data capture and image reconstruction are given in this report. Additional details may be found in the experimental SOP and a separate document. To characterize the statistical distribution of LAC values in each CT image, we first isolated an 80% central-core segment of volume elements ('voxels') lying completely within the specimen, away from the walls of the polypropylene vial. All of the voxels within this central core, including those comprised of voids and inclusions, are included in the statistics. We then calculated the mean value, standard deviation and entropy for (a) the four image segments and for (b) their digital gradient images. (A digital gradient image of a given image was obtained by taking the absolute value of the difference between the initial image
Methods for Quantitative Interpretation of Retarding Field Analyzer Data
Calvey, J.R.; Crittenden, J.A.; Dugan, G.F.; Palmer, M.A.; Furman, M.; Harkay, K.
2011-03-28
Over the course of the CesrTA program at Cornell, over 30 Retarding Field Analyzers (RFAs) have been installed in the CESR storage ring, and a great deal of data has been taken with them. These devices measure the local electron cloud density and energy distribution, and can be used to evaluate the efficacy of different cloud mitigation techniques. Obtaining a quantitative understanding of RFA data requires use of cloud simulation programs, as well as a detailed model of the detector itself. In a drift region, the RFA can be modeled by postprocessing the output of a simulation code, and one can obtain best fit values for important simulation parameters with a chi-square minimization method.
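The chi-square minimization mentioned above can be illustrated with a toy parameter scan. This is only a sketch under stated assumptions: `toy_simulation` is a hypothetical stand-in for a real cloud simulation code, the parameter name `peak_yield` and the signal shape are invented for illustration, and a simple scan over candidate values replaces whatever minimizer the authors actually used.

```python
def chi_square(data, model, sigma):
    """Sum of squared, uncertainty-weighted residuals between data and model."""
    return sum(((d - m) / s) ** 2 for d, m, s in zip(data, model, sigma))

def toy_simulation(voltages, peak_yield):
    """Hypothetical stand-in for a simulation: RFA signal vs. retarding voltage."""
    return [peak_yield * 100.0 / (1.0 + v / 50.0) for v in voltages]

def best_fit(data, sigma, voltages, candidates):
    """Scan candidate parameter values; return (minimum chi-square, best parameter)."""
    scored = [
        (chi_square(data, toy_simulation(voltages, p), sigma), p)
        for p in candidates
    ]
    return min(scored)

if __name__ == "__main__":
    voltages = [0.0, 25.0, 50.0, 100.0, 200.0]
    measured = toy_simulation(voltages, 1.8)   # synthetic "data", not measurements
    sigma = [1.0] * len(voltages)              # assumed uniform uncertainties
    candidates = [1.0 + 0.1 * i for i in range(16)]
    chi2, p = best_fit(measured, sigma, voltages, candidates)
    print(p, chi2)
```

With an expensive simulation, each candidate evaluation is a full simulation run, so in practice the scan would be kept coarse or replaced by a smarter search.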
Market Available Virgin Nickel Analysis Data Summary Interpretation Report
Hampson, Steve; Volpe, John
2004-10-01
Collection, analysis, and assessment of market available nickel samples for their radionuclide content is being conducted to support efforts of the Purchase Area Community Reuse Organization (PACRO) to identify and implement a decontamination method that will allow for the sale and recycling of contaminated Paducah Gaseous Diffusion Plant (PGDP) nickel-metal stockpiles. The objectives of the Nickel Project address the lack of radionuclide data in market available nickel metal. The lack of radionuclide data for commercial-recycled nickel metal or commercial-virgin nickel metal has been detrimental to assessments of the potential impacts of the free-release of recycled PGDP nickel on public health. The nickel project, to date, has only evaluated "virgin" nickel metal, which is derived from non-recycled sources.
Statistical analysis of epidemiologic data of pregnancy outcomes
Butler, W.J.; Kalasinski, L.A.
1989-02-01
In this paper, a generalized logistic regression model for correlated observations is used to analyze epidemiologic data on the frequency of spontaneous abortion among a group of women office workers. The results are compared to those obtained from the use of the standard logistic regression model that assumes statistical independence among all the pregnancies contributed by one woman. In this example, the correlation among pregnancies from the same woman is fairly small and did not have a substantial impact on the magnitude of estimates of parameters of the model. This is due at least partly to the small average number of pregnancies contributed by each woman.
Information gathering for the Transportation Statistics Data Bank
Shappert, L.B.; Mason, P.J.
1981-10-01
The Transportation Statistics Data Bank (TSDB) was developed in 1974 to collect information on the transport of Department of Energy (DOE) materials. This computer program may be used to provide the framework for collecting more detailed information on DOE shipments of radioactive materials. This report describes the type of information that is needed in this area and concludes that the existing system could be readily modified to collect and process it. The additional needed information, available from bills of lading and similar documents, could be gathered from DOE field offices and transferred in a standard format to the TSDB system. Costs of the system are also discussed briefly.
Using fuzzy sets for data interpretation in natural analogue studies
De Lemos, F.L.; Sullivan, T.; Hellmuth, K.H.
2008-07-01
Natural analogue studies can play a key role in deep geological radioactive disposal systems safety assessment. These studies can help develop a better understanding of complex natural processes and, therefore, provide valuable means of confidence building in the safety assessment. In evaluation of natural analogues, there are, however, several sources of uncertainties that stem from factors such as complexity; lack of data; and ignorance. Often, analysts have to simplify the mathematical models in order to cope with the various sources of complexity, and this adds uncertainty to the model results. The uncertainties reflected in model predictions must be addressed to understand their impact on safety assessment and, therefore, the utility of natural analogues. Fuzzy sets can be used to represent the information regarding the natural processes and their mutual connections. With this methodology we are able to quantify and propagate the epistemic uncertainties in both processes and, thereby, assign degrees of truth to the similarities between them. An example calculation with literature data is provided. In conclusion: Fuzzy sets are an effective way of quantifying semi-quantitative information such as natural analogues data. Epistemic uncertainty that stems from complexity and lack of knowledge regarding natural processes is represented by the degrees of membership. It also facilitates the propagation of this uncertainty throughout the performance assessment by the extension principle. This principle allows calculation with fuzzy numbers, where fuzzy input results in fuzzy output. This may be one of the main applications of fuzzy sets theory to radioactive waste disposal facility performance assessment. Through the translation of natural data into fuzzy numbers, the effect of parameters in important processes in one site can be quantified and compared to processes in other sites with different conditions. The approach presented in this paper can be extended to
Linear and nonlinear interpretation of CV-580 lightning data
NASA Technical Reports Server (NTRS)
Ng, Poh H.; Rudolph, Terence H.; Perala, Rodney A.
1988-01-01
Numerical models developed for the study of lightning strike data acquired by in-flight aircraft are applied to the data measured on the CV-580. The basic technique used is the three dimensional time domain finite difference solution of Maxwell's equations. Both linear and nonlinear models are used in the analysis. In the linear model, the lightning channel and the aircraft are assumed to form a linear time invariant system. A transfer function technique can then be used to study the response of the aircraft to a given lightning strike current. Conversely, the lightning current can be inferred from the measured response. In the nonlinear model, the conductivity of air in the vicinity of the aircraft is calculated and incorporated into the solution of the Maxwell's equations. The nonlinear model thus simulates corona formation and air breakdown. Results obtained from the models are in reasonable agreement with the measured data. This study provides another validation of the models and increases confidence that the models may be used to predict aircraft response to any general lightning strike.
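The transfer function idea in the linear model can be sketched in a few lines. This toy is not the finite-difference Maxwell solver described above: it assumes a short, made-up impulse response, treats the convolution as circular, and uses a plain O(n^2) discrete Fourier transform so that it needs only the standard library. The function names are hypothetical. The response is Y(f) = H(f) X(f), and the current is inferred as X(f) = Y(f) / H(f), which only works where H(f) is not (near) zero.

```python
import cmath

def dft(x, inverse=False):
    """Plain discrete Fourier transform (O(n^2)); adequate for short toy records."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def respond(current, impulse_response):
    """Response of a linear time-invariant system: multiply spectra, transform back."""
    X, H = dft(current), dft(impulse_response)
    return [v.real for v in dft([h * x for h, x in zip(H, X)], inverse=True)]

def infer_current(response, impulse_response):
    """Invert the transfer function; assumes H(f) is nonzero at every frequency."""
    Y, H = dft(response), dft(impulse_response)
    return [v.real for v in dft([y / h for y, h in zip(Y, H)], inverse=True)]

if __name__ == "__main__":
    h = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]       # made-up impulse response
    i_true = [0.0, 1.0, 3.0, 2.0, 1.0, 0.5, 0.0, 0.0]   # made-up strike current
    v = respond(i_true, h)
    print([round(a, 6) for a in infer_current(v, h)])
```

With measured data the division amplifies noise wherever |H(f)| is small, so a practical inversion would normally add some form of regularization.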
Statistical assessment of model fit for synthetic aperture radar data
NASA Astrophysics Data System (ADS)
DeVore, Michael D.; O'Sullivan, Joseph A.
2001-08-01
Parametric approaches to problems of inference from observed data often rely on assumed probabilistic models for the data which may be based on knowledge of the physics of the data acquisition. Given a rich enough collection of sample data, the validity of those assumed models can be assessed in a statistical hypothesis testing framework using any of a number of goodness-of-fit tests developed over the last hundred years for this purpose. Such assessments can be used both to compare alternate models for observed data and to help determine the conditions under which a given model breaks down. We apply three such methods, the χ² test of Karl Pearson, Kolmogorov's goodness-of-fit test, and the D'Agostino-Pearson test for normality, to quantify how well the data fit various models for synthetic aperture radar (SAR) images. The results of these tests are used to compare a conditionally Gaussian model for complex-valued SAR pixel values, a conditionally log-normal model for SAR pixel magnitudes, and a conditionally normal model for SAR pixel quarter-power values. Sample data for these tests are drawn from the publicly released MSTAR dataset.
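As an illustration of one of the tests named above, the sketch below computes Kolmogorov's goodness-of-fit statistic against a fully specified normal model, together with the standard asymptotic p-value series. It is a minimal, standard-library example, not the authors' procedure: the function names are hypothetical, and the p-value is only valid when the model parameters are fixed in advance rather than estimated from the same sample.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and the model `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic Kolmogorov tail probability, Q = 2 * sum (-1)^(k-1) exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # Q is 1 to within about 1e-12 here; the series converges slowly
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return max(0.0, min(1.0, 2.0 * total))

if __name__ == "__main__":
    shifted = [5.0 + i / 100.0 for i in range(100)]  # clearly not standard normal
    d = ks_statistic(shifted, normal_cdf)
    print(d, ks_pvalue_asymptotic(d, len(shifted)))
```

A small p-value indicates that the assumed model is a poor description of the sample; comparing the statistic across candidate models gives a rough ranking of how well each one fits.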
Challenges in analysis and interpretation of microsatellite data for population genetic studies
Putman, Alexander I; Carbone, Ignazio
2014-01-01
Advancing technologies have facilitated the ever-widening application of genetic markers such as microsatellites into new systems and research questions in biology. In light of the data and experience accumulated from several years of using microsatellites, we present here a literature review that synthesizes the limitations of microsatellites in population genetic studies. With a focus on population structure, we review the widely used fixation (FST) statistics and Bayesian clustering algorithms and find that the former can be confusing and problematic for microsatellites and that the latter may be confounded by complex population models and lack power in certain cases. Clustering, multivariate analyses, and diversity-based statistics are increasingly being applied to infer population structure, but in some instances these methods lack formalization with microsatellites. Migration-specific methods perform well only under narrow constraints. We also examine the use of microsatellites for inferring effective population size, changes in population size, and deeper demographic history, and find that these methods are untested and/or highly context-dependent. Overall, each method possesses important weaknesses for use with microsatellites, and there are significant constraints on inferences commonly made using microsatellite markers in the areas of population structure, admixture, and effective population size. To ameliorate and better understand these constraints, researchers are encouraged to analyze simulated datasets both prior to and following data collection and analysis, the latter of which is formalized within the approximate Bayesian computation framework. We also examine trends in the literature and show that microsatellites continue to be widely used, especially in non-human subject areas. This review assists with study design and molecular marker selection, facilitates sound interpretation of microsatellite data while fostering respect for their practical
Undergraduate non-science majors' descriptions and interpretations of scientific data visualizations
NASA Astrophysics Data System (ADS)
Swenson, Sandra Signe
Professionally developed and freely accessible through the Internet, scientific data maps have great potential for teaching and learning with data in the science classroom. Solving problems or developing ideas while using data maps of Earth phenomena in the science classroom may help students to understand the nature and process of science. Little is known about how students perceive and interpret scientific data visualizations. This study was an in-depth exploration of descriptions and interpretations of topographic and bathymetric data maps made by a population of 107 non-science majors at an urban public college. Survey, interviews, and artifacts were analyzed within an epistemological framework for understanding data collected about the Earth, by examining representational strategies used to understand maps, and by examining student interpretations using Bloom's Taxonomy of Educational Objectives. The findings suggest that the majority of students interpret data maps by assuming iconicity that was not intended by the map's creator; that students do not appear to have a robust understanding of how data is collected about Earth phenomena; and that while most students are able to make some kinds of interpretations of the data maps, often their interpretations are not based upon the actual data the map is representing. This study provided baseline information of student understanding of data maps from which educators may design curriculum for teaching and learning about Earth phenomena.
Automatic interpretation of ERTS data for forest management
NASA Technical Reports Server (NTRS)
Kirvida, L.; Johnson, G. R.
1973-01-01
Automatic stratification of forested land from ERTS-1 data provides a valuable tool for resource management. The results are useful for wood product yield estimates, recreation and wildlife management, forest inventory and forest condition monitoring. Automatic procedures based on both multi-spectral and spatial features are evaluated. With five classes, training and testing on the same samples, classification accuracy of 74% was achieved using the MSS multispectral features. When adding texture computed from 8 x 8 arrays, classification accuracy of 99% was obtained.
Interpret with caution: multicollinearity in multiple regression of cognitive data.
Morrison, Catriona M
2003-08-01
Shibihara and Kondo in 2002 reported a reanalysis of the 1997 Kanji picture-naming data of Yamazaki, Ellis, Morrison, and Lambon-Ralph in which independent variables were highly correlated. Their addition of the variable visual familiarity altered the previously reported pattern of results, indicating that visual familiarity, but not age of acquisition, was important in predicting Kanji naming speed. The present paper argues that caution should be taken when drawing conclusions from multiple regression analyses in which the independent variables are so highly correlated, as such multicollinearity can lead to unreliable output.
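One common way to quantify the multicollinearity discussed above is the variance inflation factor (VIF). For a regression with exactly two predictors it reduces to 1 / (1 - r^2), where r is their Pearson correlation. The sketch below is illustrative only: the variable names echo the abstract, but the numbers are made-up toy values, not the Kanji naming data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

if __name__ == "__main__":
    # Toy ratings, invented for illustration only.
    age_of_acquisition = [1.0, 2.0, 3.0, 4.0, 5.0]
    visual_familiarity = [2.0, 1.0, 4.0, 3.0, 5.0]
    print(pearson_r(age_of_acquisition, visual_familiarity))
    print(vif_two_predictors(age_of_acquisition, visual_familiarity))
```

The VIF is the factor by which the sampling variance of a coefficient is inflated relative to uncorrelated predictors, so large values signal that individual coefficient estimates are unstable.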
Analysis and Interpretation of Synthetic Time Strings of Oscillation Data
NASA Technical Reports Server (NTRS)
Mihalas, B. W.; Christensen-Dalsgaard, J.; Brown, T. M.
1984-01-01
Artificial strings of solar oscillation data with gaps and noise, corresponding to the output of different spatial filter functions, were analyzed. Peaks in the power spectrum are identified for values of the degree l from 0 to 18, and rotational splitting is estimated. The filters prove effective in facilitating identification of essentially all the real peaks in the power spectrum. Estimates of peak frequencies and amplitudes and rotational splitting frequencies are in reasonably good agreement with the input values. Spurious peaks in autocorrelation spectra correspond to the frequency spacing between power peaks with the same order n, differing by one or two in the degree l.
Interpretation of Pennsylvania agricultural land use from ERTS-1 data
NASA Technical Reports Server (NTRS)
Mcmurtry, G. J.; Petersen, G. W. (Principal Investigator); Wilson, A. D.
1974-01-01
The author has identified the following significant results. To study the complex agricultural patterns in Pennsylvania, a portion of an ERTS scene was selected for detailed analysis. Various photographic products were made and were found to be only of limited value. This necessitated the digital processing of the ERTS data. Using an unsupervised classification procedure, it was possible to delineate the following categories: (1) forest land with a northern aspect, (2) forest land with a southern aspect, (3) valley trees, (4) wheat, (5) corn, (6) alfalfa, grass, pasture, (7) disturbed land, (8) builtup land, (9) strip mines, and (10) water. These land use categories were delineated at a scale of approximately 1:20,000 on the line printer output. Land use delineations were also made using the General Electric IMAGE 100 interactive analysis system.
Children's and Adults' Interpretation of Covariation Data: Does Symmetry of Variables Matter?
ERIC Educational Resources Information Center
Saffran, Andrea; Barchfeld, Petra; Sodian, Beate; Alibali, Martha W.
2016-01-01
In a series of 3 experiments, the authors investigated the influence of symmetry of variables on children's and adults' data interpretation. They hypothesized that symmetrical (i.e., present/present) variables would support correct interpretations more than asymmetrical (i.e., present/absent) variables. Participants were asked to judge covariation…
Right-sizing statistical models for longitudinal data.
Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M
2015-12-01
Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. PMID:26237507
Bayesian Sensitivity Analysis of Statistical Models with Missing Data
ZHU, HONGTU; IBRAHIM, JOSEPH G.; TANG, NIANSHENG
2013-01-01
Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable missing at random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures. PMID:24753718
Securing co-operation from persons supplying statistical data
Aubenque, M. J.; Blaikley, R. M.; Harris, F. Fraser; Lal, R. B.; Neurdenburg, M. G.; Hernández, R. de Shelly
1954-01-01
Securing the co-operation of persons supplying information required for medical statistics is essentially a problem in human relations, and an understanding of the motivations, attitudes, and behaviour of the respondents is necessary. Before any new statistical survey is undertaken, it is suggested by Aubenque and Harris that a preliminary review be made so that the maximum use is made of existing information. Care should also be taken not to burden respondents with an overloaded questionnaire. Aubenque and Harris recommend simplified reporting. Complete population coverage is not necessary. Neurdenburg suggests that the co-operation and support of such organizations as medical associations and social security boards are important and that propaganda should be directed specifically to the groups whose co-operation is sought. Informal personal contacts are valuable and desirable, according to Blaikley, but may have adverse effects if the right kind of approach is not made. Financial payments as an incentive in securing co-operation are opposed by Neurdenburg, who proposes that only postage-free envelopes or similar small favours be granted. Blaikley and Harris, on the other hand, express the view that financial incentives may do much to gain the support of those required to furnish data; there are, however, other incentives, and full use should be made of the natural inclinations of respondents. Compulsion may be necessary in certain instances, but administrative rather than statutory measures should be adopted. Penalties, according to Aubenque, should be inflicted only when justified by imperative health requirements. The results of surveys should be made available as soon as possible to those who co-operated, and Aubenque and Harris point out that they should also be of practical value to the suppliers of the information. Greater co-operation can be secured from medical persons who have an understanding of the statistical principles involved; Aubenque and
Statistical analysis of test data for APM rod issue
Edwards, T.B.; Harris, S.P.; Reeve, C.P.
1992-05-01
The uncertainty associated with the use of the K-Reactor axial power monitors (APMs) to measure roof-top-ratios is investigated in this report. Internal heating test data acquired under both DC-flow conditions and AC-flow conditions have been analyzed. These tests were conducted to simulate gamma heating at the lower power levels planned for reactor operation. The objective of this statistical analysis is to investigate the relationship between the observed and true roof-top-ratio (RTR) values and associated uncertainties at power levels within this lower operational range. Conditional on a given, known power level, a prediction interval for the true RTR value corresponding to a new, observed RTR is given. This is done for a range of power levels. Estimates of total system uncertainty are also determined by combining the analog-to-digital converter uncertainty with the results from the test data.
Estimation of descriptive statistics for multiply censored water quality data
Helsel, D.R.; Cohn, T.A.
1988-01-01
This paper extends the work of Gilliom and Helsel on procedures for estimating descriptive statistics of water quality data that contain "less than" observations. Previously, procedures were evaluated when only one detection limit was present. Here the performance of estimators for data that have multiple detection limits is investigated. Probability plotting and maximum likelihood methods perform substantially better than simple substitution procedures now commonly in use. Therefore, simple substitution procedures (e.g., substitution of the detection limit) should be avoided. Probability plotting methods are more robust than maximum likelihood methods to misspecification of the parent distribution, and their use should be encouraged in the typical situation where the parent distribution is unknown. When utilized correctly, "less than" values frequently contain nearly as much information for estimating population moments and quantiles as would the same observations had the detection limit been below them. -Authors
Multivariate statistical analysis of atom probe tomography data.
Parish, Chad M; Miller, Michael K
2010-10-01
The application of spectrum imaging multivariate statistical analysis methods, specifically principal component analysis (PCA), to atom probe tomography (APT) data has been investigated. The mathematical method of analysis is described and the results for two example datasets are analyzed and presented. The first dataset is from the analysis of a PM 2000 Fe-Cr-Al-Ti steel containing two different ultrafine precipitate populations. PCA properly describes the matrix and precipitate phases in a simple and intuitive manner. A second APT example is from the analysis of an irradiated reactor pressure vessel steel. Fine, nm-scale Cu-enriched precipitates having a core-shell structure were identified and qualitatively described by PCA. Advantages, disadvantages, and future prospects for implementing these data analysis methodologies for APT datasets, particularly with regard to quantitative analysis, are also discussed. PMID:20650566
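The core PCA computation can be sketched without any external libraries. This is a minimal illustration, not the authors' pipeline: it assumes each row is one observation (for example, one voxel's composition) and each column one variable, and it finds only the leading component by power iteration, a simple stand-in for a full eigendecomposition. The function names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of observations (rows) by variables (columns)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def leading_component(cov, iterations=500):
    """Largest eigenvalue and unit eigenvector of `cov` by power iteration."""
    p = len(cov)
    # Caveat: a start vector exactly orthogonal to the leading eigenvector,
    # or an all-zero covariance matrix, would make this simple scheme fail.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, v

if __name__ == "__main__":
    toy = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # made-up data
    value, vector = leading_component(covariance_matrix(toy))
    print(value, vector)
```

The eigenvalue is the variance captured by the component, and the eigenvector gives the variable loadings; projecting each centered observation onto it yields the component scores that would be mapped back onto the sample volume.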
Multivariate Statistical Analysis of MSL APXS Bulk Geochemical Data
NASA Astrophysics Data System (ADS)
Hamilton, V. E.; Edwards, C. S.; Thompson, L. M.; Schmidt, M. E.
2014-12-01
We apply cluster and factor analyses to bulk chemical data of 130 soil and rock samples measured by the Alpha Particle X-ray Spectrometer (APXS) on the Mars Science Laboratory (MSL) rover Curiosity through sol 650. Multivariate approaches such as principal components analysis (PCA), cluster analysis, and factor analysis complement more traditional approaches (e.g., Harker diagrams), with the advantage of simultaneously examining the relationships between multiple variables for large numbers of samples. Principal components analysis has been applied with success to APXS, Pancam, and Mössbauer data from the Mars Exploration Rovers. Factor analysis and cluster analysis have been applied with success to thermal infrared (TIR) spectral data of Mars. Cluster analyses group the input data by similarity, where there are a number of different methods for defining similarity (hierarchical, density, distribution, etc.). For example, without any assumptions about the chemical contributions of surface dust, preliminary hierarchical and K-means cluster analyses clearly distinguish the physically adjacent rock targets Windjana and Stephen as being distinctly different than lithologies observed prior to Curiosity's arrival at The Kimberley. In addition, they are separated from each other, consistent with chemical trends observed in variation diagrams but without requiring assumptions about chemical relationships. We will discuss the variation in cluster analysis results as a function of clustering method and pre-processing (e.g., log transformation, correction for dust cover) and implications for interpreting chemical data. Factor analysis shares some similarities with PCA, and examines the variability among observed components of a dataset so as to reveal variations attributable to unobserved components. Factor analysis has been used to extract the TIR spectra of components that are typically observed in mixtures and only rarely in isolation; there is the potential for similar
Lindsey, David A.
2001-01-01
Pebble count data from Quaternary gravel deposits north of Denver, Colo., were analyzed by multivariate statistical methods to identify lithologic factors that might affect aggregate quality. The pebble count data used in this analysis were taken from the map by Colton and Fitch (1974) and are supplemented by data reported by the Front Range Infrastructure Resources Project. This report provides data tables and results of the statistical analysis. The multivariate statistical analysis used here consists of log-contrast principal components analysis (method of Reyment and Savazzi, 1999) followed by rotation of principal components and factor interpretation. Three lithologic factors that might affect aggregate quality were identified: 1) granite and gneiss versus pegmatite, 2) quartz + quartzite versus total volcanic rocks, and 3) total sedimentary rocks (mainly sandstone) versus granite. Factor 1 (grain size of igneous and metamorphic rocks) may represent destruction during weathering and transport or varying proportions of rocks in source areas. Factor 2 (resistant source rocks) represents the dispersion shadow of metaquartzite detritus, perhaps enhanced by resistance of quartz and quartzite during weathering and transport. Factor 3 (proximity to sandstone source) represents dilution of gravel by soft sedimentary rocks (mainly sandstone), which are exposed mainly in hogbacks near the mountain front. Factor 1 probably does not affect aggregate quality. Factor 2 would be expected to enhance aggregate quality as measured by the Los Angeles degradation test. Factor 3 may diminish aggregate quality.
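The first stage of a log-contrast principal components analysis can be sketched as follows. This is a minimal illustration, assuming the log-contrast step corresponds to the centered log-ratio transform of each composition; the abstract does not spell out the exact formulation of Reyment and Savazzi, and the rotation step is omitted. The pebble counts are hypothetical values, not data from the report, and the leading component is found by plain power iteration.

```python
import math


def clr(counts):
    """Centered log-ratio transform of one composition (all parts > 0)."""
    logs = [math.log(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def covariance(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def first_component(cov, iterations=500):
    """Leading eigenvalue and unit eigenvector of `cov` by power iteration."""
    p = len(cov)
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue, v


# Hypothetical pebble counts (granite, quartzite, sandstone) per sample.
counts = [(60, 30, 10), (50, 30, 20), (40, 30, 30), (30, 30, 40), (20, 30, 50)]
cov = covariance([clr(c) for c in counts])
eigenvalue, loadings = first_component(cov)
print(eigenvalue, loadings)
```

Loadings of opposite sign on two lithologies indicate a contrast of the kind the report describes as "A versus B"; the overall sign of an eigenvector is arbitrary.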
NASA Astrophysics Data System (ADS)
Parsons, Sharon; Nadeau, Léopold; Keating, Pierre; Chung, Chang-Jo
2006-06-01
Predictive geological mapping relies largely on the empirical and statistical analysis of aeromagnetic data. However, in most applications the analysis remains essentially visual and unconstrained. The lithological and structural diversity of rock units underlying the Mingan Region makes it an ideal test area to apply more rigorous approaches to magnetic data processing and interpretation, and to assess their usefulness and limitations. In the application discussed here, various derivatives and transformations of the total field magnetic data are evaluated empirically by photo-interpretation using a Geographic Information System. We show that rock types are best represented using the total field and vertical derivative of the magnetic data, whereas contacts between rock types are best delineated using the horizontal derivative of the total field and the analytic signal. In addition, the maxima of the analytic signal are used to estimate the direction of dip of large-scale geological units. Statistical analyses show that the correlation between geology and magnetic data is not directly proportional. Finally, the sources of discrepancies between mapped geological units and magnetic response are evaluated through theoretical data modeling of representative geological bodies.
Generation of dense statistical connectomes from sparse morphological data
Egger, Robert; Dercksen, Vincent J.; Udvary, Daniel; Hege, Hans-Christian; Oberlaender, Marcel
2014-01-01
Sensory-evoked signal flow, at cellular and network levels, is primarily determined by the synaptic wiring of the underlying neuronal circuitry. Measurements of synaptic innervation, connection probabilities and subcellular organization of synaptic inputs are thus among the most active fields of research in contemporary neuroscience. Methods to measure these quantities range from electrophysiological recordings over reconstructions of dendrite-axon overlap at light-microscopic levels to dense circuit reconstructions of small volumes at electron-microscopic resolution. However, quantitative and complete measurements at subcellular resolution and mesoscopic scales to obtain all local and long-range synaptic in/outputs for any neuron within an entire brain region are beyond present methodological limits. Here, we present a novel concept, implemented within an interactive software environment called NeuroNet, which allows (i) integration of sparsely sampled (sub)cellular morphological data into an accurate anatomical reference frame of the brain region(s) of interest, (ii) up-scaling to generate an average dense model of the neuronal circuitry within the respective brain region(s) and (iii) statistical measurements of synaptic innervation between all neurons within the model. We illustrate our approach by generating a dense average model of the entire rat vibrissal cortex, providing the required anatomical data, and illustrate how to measure synaptic innervation statistically. Comparing our results with data from paired recordings in vitro and in vivo, as well as with reconstructions of synaptic contact sites at light- and electron-microscopic levels, we find that our in silico measurements are in line with previous results. PMID:25426033
The International Coal Statistics Data Base operations guide
Not Available
1991-04-01
The International Coal Statistics Data Base (ICSD) is a microcomputer-based system which contains information related to international coal trade. This includes coal production, consumption, imports and exports information. The ICSD is a secondary data base, meaning that information contained therein is derived entirely from other primary sources. It uses dBase III+ and Lotus 1-2-3 to locate, report and display data. The system is used for analysis in preparing the Annual Prospects for World Coal Trade (DOE/EIA-0363) publication. The ICSD system is menu driven, and also permits the user who is familiar with dBase and Lotus operations to leave the menu structure to perform independent queries. Documentation for the ICSD consists of three manuals -- the User's Guide, the Operations Manual and the Program Maintenance Manual. This Operations Manual explains how to install the programs, how to obtain reports on coal trade, what system requirements apply, and how to update the major data files. It also explains file naming conventions, what each file does, and the programming procedures used to make the system work. The Operations Manual explains how to make the system respond to customized queries. It is organized around the ICSD menu structure and describes what each selection will do. Sample reports and graphs generated from individual menu selections are provided to acquaint the user with the various types of output. 17 figs.
Interpreting Low Spatial Resolution Thermal Data from Active Volcanoes on Io and the Earth
NASA Technical Reports Server (NTRS)
Keszthelyi, L.; Harris, A. J. L.; Flynn, L.; Davies, A. G.; McEwen, A.
2001-01-01
The style of volcanism was successfully determined at a number of active volcanoes on Io and the Earth using the same techniques to interpret thermal remote sensing data. Additional information is contained in the original extended abstract.
NASA Technical Reports Server (NTRS)
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
Model-independent plot of dynamic PET data facilitates data interpretation and model selection.
Munk, Ole Lajord
2012-02-21
When testing new PET radiotracers or new applications of existing tracers, the blood-tissue exchange and the metabolism need to be examined. However, conventional plots of measured time-activity curves from dynamic PET do not reveal the inherent kinetic information. A novel model-independent volume-influx plot (vi-plot) was developed and validated. The new vi-plot shows the time course of the instantaneous distribution volume and the instantaneous influx rate. The vi-plot visualises physiological information that facilitates model selection and it reveals when a quasi-steady state is reached, which is a prerequisite for the use of the graphical analyses by Logan and Gjedde-Patlak. Both axes of the vi-plot have direct physiological interpretation, and the plot shows kinetic parameters in close agreement with estimates obtained by non-linear kinetic modelling. The vi-plot is equally useful for analyses of PET data based on a plasma input function or a reference region input function. The vi-plot is a model-independent and informative plot for data exploration that facilitates the selection of an appropriate method for data analysis.
Statistical analysis of heartbeat data with wavelet techniques
NASA Astrophysics Data System (ADS)
Pazsit, Imre
2004-05-01
The purpose of this paper is to demonstrate the use of some methods of signal analysis, performed on ECG and in some cases blood pressure signals, for the classification of the health status of the heart of mice and rats. Spectral and wavelet analyses were performed on the raw signals. FFT-based coherence and phase were also calculated between blood pressure and raw ECG signals. Finally, RR-intervals were deduced from the ECG signals and an analysis of the fractal dimensions was performed. The analysis was made on data from mice and rats. A correlation was found between the health status of the mice and the rats and some of the statistical descriptors, most notably the phase of the cross-spectra between ECG and blood pressure, and the fractal properties and dimensions of the interbeat series (RR-interval fluctuations).
Multispectral data acquisition and classification - Statistical models for system design
NASA Technical Reports Server (NTRS)
Huck, F. O.; Park, S. K.
1978-01-01
In this paper we relate the statistical processes that are involved in multispectral data acquisition and classification to a simple radiometric model of the earth surface and atmosphere. If generalized, these formulations could provide an analytical link between the steadily improving models of our environment and the performance characteristics of rapidly advancing device technology. This link is needed to bring system analysis tools to the task of optimizing remote sensing and (real-time) signal processing systems as a function of target and atmospheric properties, remote sensor spectral bands and system topology (e.g., image-plane processing), radiometric sensitivity and calibration accuracy, compensation for imaging conditions (e.g., atmospheric effects), and classification rates and errors.
Incorporating spatial context into statistical classification of multidimensional image data
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Tilton, J. C.; Swain, P. H.
1981-01-01
Compound decision theory is employed to develop a general statistical model for classifying image data using spatial context. The classification algorithm developed from this model exploits the tendency of certain ground-cover classes to occur more frequently in some spatial contexts than in others. A key input to this contextual classifier is a quantitative characterization of this tendency: the context function. Several methods for estimating the context function are explored, and two complementary methods are recommended. The contextual classifier is shown to produce substantial improvements in classification accuracy compared to the accuracy produced by a non-contextual uniform-priors maximum likelihood classifier when these methods of estimating the context function are used. An approximate algorithm, which cuts computational requirements by over one-half, is presented. The search for an optimal implementation is furthered by an exploration of the relative merits of using spectral classes or information classes for classification and/or context function estimation.
Statistical Analysis of Data with Non-Detectable Values
Frome, E.L.
2004-08-26
Environmental exposure measurements are, in general, positive and may be subject to left censoring, i.e. the measured value is less than a "limit of detection". In occupational monitoring, strategies for assessing workplace exposures typically focus on the mean exposure level or the probability that any measurement exceeds a limit. A basic problem of interest in environmental risk assessment is to determine if the mean concentration of an analyte is less than a prescribed action level. Parametric methods, used to determine acceptable levels of exposure, are often based on a two parameter lognormal distribution. The mean exposure level and/or an upper percentile (e.g. the 95th percentile) are used to characterize exposure levels, and upper confidence limits are needed to describe the uncertainty in these estimates. In certain situations it is of interest to estimate the probability of observing a future (or "missed") value of a lognormal variable. Statistical methods for random samples (without non-detects) from the lognormal distribution are well known for each of these situations. In this report, methods for estimating these quantities based on the maximum likelihood method for randomly left censored lognormal data are described and graphical methods are used to evaluate the lognormal assumption. If the lognormal model is in doubt and an alternative distribution for the exposure profile of a similar exposure group is not available, then nonparametric methods for left censored data are used. The mean exposure level, along with the upper confidence limit, is obtained using the product limit estimate, and the upper confidence limit on the 95th percentile (i.e. the upper tolerance limit) is obtained using a nonparametric approach. All of these methods are well known but computational complexity has limited their use in routine data analysis with left censored data. The recent development of the R environment for statistical data analysis and graphics has greatly
Statistical Analysis of Surface Water Quality Data of Eastern Massachusetts
NASA Astrophysics Data System (ADS)
Andronache, C.; Hon, R.; Tedder, N.; Xian, Q.; Schaudt, B.
2008-05-01
We present a characterization of the current state of surface water, changes in time and dependence on land use, precipitation regime, and possible other natural and human influences based on data from the USGS National Water Quality Assessment (NAWQA) Program for New England streams. Time series analysis is used to detect changes and relationship with discharge and precipitation regime. Statistical techniques are employed to analyze relationships among the multiple chemical variables monitored. Analysis of ion concentrations reveals information about possible natural sources and processes, and anthropogenic influences. A notable example is the increase in salt concentration in ground and surface waters, with impact on drinking water quality. Salt concentration increase in water can be linked to road salt usage during winters with heavy snowfall and other factors. Road salt enters water supplies by percolation through soil into groundwater or runoff and drainage into reservoirs. After entering fast-flowing streams, rivers and lakes, salt runoff concentrations are rapidly diluted. Road salt infiltration is more common for groundwater-based supplies, such as wells, springs, and reservoirs that are recharged mainly by groundwater. We use principal component analysis and other statistical procedures to obtain a description of the dominant independent variables that influence the observed chemical compositional range. In most cases, over 85 percent of the total variation can be explained by 3 to 4 components. The overwhelming variation is attributed to a large compositional range of Na and Cl seen even if all data are combined into a single dataset. Na versus Cl correlation coefficients are commonly greater than 0.9. Second components are typically associated with dilutions by overland flows (non winter months) and/or increased concentrations due to evaporation (summer season) or overland flows (winter season) if a snow storm is followed by the application of deicers on road
The International Coal Statistics Data Base program maintenance guide
Not Available
1991-06-01
The International Coal Statistics Data Base (ICSD) is a microcomputer-based system which contains information related to international coal trade. This includes coal production, consumption, imports and exports information. The ICSD is a secondary data base, meaning that information contained therein is derived entirely from other primary sources. It uses dBase III+ and Lotus 1-2-3 to locate, report and display data. The system is used for analysis in preparing the Annual Prospects for World Coal Trade (DOE/EIA-0363) publication. The ICSD system is menu driven and also permits the user who is familiar with dBase and Lotus operations to leave the menu structure to perform independent queries. Documentation for the ICSD consists of three manuals -- the User's Guide, the Operations Manual, and the Program Maintenance Manual. This Program Maintenance Manual provides the information necessary to maintain and update the ICSD system. Two major types of program maintenance documentation are presented in this manual. The first is the source code for the dBase III+ routines and related non-dBase programs used in operating the ICSD. The second is listings of the major component database field structures. A third important consideration for dBase programming, the structure of index files, is presented in the listing of source code for the index maintenance program. 1 fig.
Data Analysis & Statistical Methods for Command File Errors
NASA Technical Reports Server (NTRS)
Meshkat, Leila; Waggoner, Bruce; Bryant, Larry
2014-01-01
This paper explains current work on modeling for managing the risk of command file errors. It is focused on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors, the number of files radiated, including the number of commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We have assessed these data using different curve fitting and distribution fitting techniques, such as multiple regression analysis and maximum likelihood estimation, to see how much of the variability in the error rates can be explained with these. We have also used goodness of fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on what these statistics bore out as critical drivers of the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as anticipate future error rates.
Slow and fast solar wind - data selection and statistical analysis
NASA Astrophysics Data System (ADS)
Wawrzaszek, Anna; Macek, Wiesław M.; Bruno, Roberto; Echim, Marius
2014-05-01
In this work we consider the important problem of selection of slow and fast solar wind data measured in-situ by the Ulysses spacecraft during two solar minima (1995-1997, 2007-2008) and solar maximum (1999-2001). To recognise different types of solar wind we use the following set of parameters: radial velocity, proton density, proton temperature, the distribution of charge states of oxygen ions, and compressibility of the magnetic field. We present how this idea of the data selection works on Ulysses data. In the next step we consider the chosen intervals for fast and slow solar wind and perform statistical analysis of the fluctuating magnetic field components. In particular, we check the possibility of identification of the inertial range by considering the scale dependence of the third- and fourth-order scaling exponents of the structure function. We try to verify the size of the inertial range depending on the heliographic latitude, heliocentric distance and phase of the solar cycle. Research supported by the European Community's Seventh Framework Programme (FP7/2007 - 2013) under grant agreement no 313038/STORM.
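The scaling exponents of a structure function can be estimated along the following lines. This is a generic sketch, not the authors' processing chain: it assumes the structure function of order q is the mean of |x(t + lag) - x(t)|^q over an evenly sampled series and takes the exponent as the least-squares slope of log S_q against log lag. The function names are introduced here for illustration.

```python
import math


def structure_function(series, lag, order):
    """Mean of |x(t + lag) - x(t)|**order over an evenly sampled series."""
    increments = [abs(series[i + lag] - series[i])
                  for i in range(len(series) - lag)]
    return sum(d ** order for d in increments) / len(increments)


def scaling_exponent(series, lags, order):
    """Least-squares slope of log S_q(lag) against log(lag)."""
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(structure_function(series, lag, order)) for lag in lags]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator
```

In practice the fit would be restricted to the range of lags over which the log-log relation is close to linear; that range is the candidate inertial range.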
Models to interpret bed-form geometries from cross-bed data
Luthi, S.M.; Banavar, J.R.; Bayer, U.
1990-05-01
To improve the understanding of the relation of cross-bed azimuth distributions to bed-forms, geometric models were developed for migrating bed forms using a minimum number of parameters. Semielliptical and sinusoidal bed-form crestlines were modeled with curvature and sinuosity as parameters. Both bedform crestlines are propagated at various angles of migration over a finite area of deposition. Two computational approaches are used, a statistical random sampling (Monte Carlo) technique over the area of the deposit, and an analytical method based on topology and differential geometry. The resulting foreset azimuth distributions provide a catalog for a variety of simulations. The resulting thickness distributions have a simple shape and can be combined with the azimuth distributions to further constrain the cross-strata geometry. Paleocurrent directions obtained by these models can differ substantially from other methods, especially for obliquely migrating low-curvature bed forms. Interpretation of foreset azimuth data from outcrops and wells can be done either by visual comparison with the cataloged distributions, or by iterative computational fits. Studied examples include eolian cross-strata from the Permian Rotliegendes in the North Sea, fluvial dunes from the Devonian in the Catskills (New York state), the Triassic Schilfsandstein (Federal Republic of Germany), and the Paleozoic-Jurassic of the Western Desert (Egypt), as well as recent tidal dunes from the German coast of the North Sea and tidal cross-strata from the Devonian Koblentzquartzit (Federal Republic of Germany). In all cases the semi-elliptical bed-form model gave a good fit to the data, suggesting that it may be applicable over a wide range of bed forms. The data from the Western Desert could be explained only by data scatter due to channel sinuosity combined with the scatter attributed to the ellipticity of the bed-form crestlines.
The International Coal Statistics Data Base user's guide
Not Available
1991-06-01
The ICSD is a microcomputer-based system which presents four types of data: (1) the quantity of coal traded between importers and exporters, (2) the price of particular ranks of coal and the cost of shipping it in world trade, (3) a detailed look at coal shipments entering and leaving the United States, and (4) the context for world coal trade in the form of data on how coal and other primary energy sources are used now and are projected to be used in the future, especially by major industrial economies. The ICSD consists of more than 140 files organized into a rapid query system for coal data. It can operate on any IBM-compatible microcomputer with 640 kilobytes memory and a hard disk drive with at least 8 megabytes of available space. The ICSD is: 1. A menu-driven, interactive data base using dBase III+ and Lotus 1-2-3. 2. Inputs include official and commercial statistics on international coal trade volumes and consumption. 3. Outputs include dozens of reports and color graphic displays. Output report types include Lotus worksheets, dBase data bases, ASCII text files, screen displays, and printed reports. 4. Flexible design permits the user to follow the structured query system or design his own queries using either Lotus or dBase procedures. 5. Includes maintenance programs to configure the system, correct indexing errors, back-up work, restore corrupted files, annotate user-created files and update system programs, use DOS shells, and much more. Forecasts and other information derived from the ICSD are published in EIA's Annual Prospects for World Coal Trade (DOE/EIA-0363).
Rigby, Sean P; Edler, Karen J
2002-06-01
The use of a semi-empirical alternative to the standard Washburn equation for the interpretation of raw mercury porosimetry data has been advocated. The alternative expression takes account of variations in both mercury contact angle and surface tension with pore size, for both advancing and retreating mercury menisci. The semi-empirical equation presented was ultimately derived from electron microscopy data, obtained for controlled pore glasses by previous workers. It has been found that this equation is also suitable for the interpretation of raw data for sol-gel silica spheres. Interpretation of mercury porosimetry data using the alternative to the standard Washburn equation was found to give rise to pore sizes similar to those obtained from corresponding SAXS data. The interpretation of porosimetry data, for both whole and finely powdered silica spheres, using the alternative expression has demonstrated that the hysteresis and mercury entrapment observed for whole samples do not occur for fragmented samples. Therefore, for these materials, the structural hysteresis and overall level of mercury entrapment are caused by the macroscopic (> approximately 30 μm), and not the microscopic (< approximately 30 μm), properties of the porous medium. This finding suggested that mercury porosimetry may be used to obtain a statistical characterization of sample macroscopic structure similar to that obtained using MRI. In addition, from a comparison of the pore size distribution from porosimetry with that obtained using complementary nitrogen sorption data, it was found that, even in the absence of hysteresis and mercury entrapment, pore shielding effects were still present. This observation suggested that the mercury extrusion process does not occur by a piston-type retraction mechanism and, therefore, the usual method for the application of percolation concepts to mercury retraction is flawed. PMID:16290649
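For reference, the standard Washburn equation that the semi-empirical expression replaces can be written as r = -2 γ cos θ / P. The sketch below evaluates only this standard form; the semi-empirical alternative is not given in the abstract and is not reproduced here. The default surface tension (0.485 N/m) and contact angle (140 degrees) are commonly quoted values for mercury and are assumptions of this example, not values taken from the paper.

```python
import math


def washburn_radius(pressure_pa, surface_tension=0.485, contact_angle_deg=140.0):
    """Pore radius in metres from the standard Washburn equation.

    r = -2 * gamma * cos(theta) / P, with gamma in N/m and P in Pa.
    The defaults are assumed textbook values for mercury.
    """
    theta = math.radians(contact_angle_deg)
    return -2.0 * surface_tension * math.cos(theta) / pressure_pa


radius_m = washburn_radius(1.0e6)  # intrusion pressure of 1 MPa
print(f"{radius_m * 1e6:.3f} micrometres")
```

Because the standard form holds the contact angle and surface tension fixed, any dependence of either quantity on pore size, which the abstract's alternative accounts for, shifts the inferred radius.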
Teschendorff, Andrew E; Sollich, Peter; Kuehn, Reimer
2014-06-01
A key challenge in systems biology is the elucidation of the underlying principles, or fundamental laws, which determine the cellular phenotype. Understanding how these fundamental principles are altered in diseases like cancer is important for translating basic scientific knowledge into clinical advances. While significant progress is being made, with the identification of novel drug targets and treatments by means of systems biological methods, our fundamental systems level understanding of why certain treatments succeed and others fail is still lacking. We here advocate a novel methodological framework for systems analysis and interpretation of molecular omic data, which is based on statistical mechanical principles. Specifically, we propose the notion of cellular signalling entropy (or uncertainty), as a novel means of analysing and interpreting omic data, and more fundamentally, as a means of elucidating systems-level principles underlying basic biology and disease. We describe the power of signalling entropy to discriminate cells according to differentiation potential and cancer status. We further argue the case for an empirical cellular entropy-robustness correlation theorem and demonstrate its existence in cancer cell line drug sensitivity data. Specifically, we find that high signalling entropy correlates with drug resistance and further describe how entropy could be used to identify the Achilles heels of cancer cells. In summary, signalling entropy is a deep and powerful concept, based on rigorous statistical mechanical principles, which, with improved data quality and coverage, will allow a much deeper understanding of the systems biological principles underlying normal and disease physiology. PMID:24675401
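The idea of an entropy over signalling interactions can be illustrated with a small sketch. The abstract does not define signalling entropy precisely, so the code below shows only one plausible ingredient, stated as an assumption: the Shannon entropy of a node's outgoing interaction weights after normalizing them to probabilities. The function names and the normalization by log(k) are introduced here and are not the authors' definitions.

```python
import math


def local_entropy(weights):
    """Shannon entropy (in nats) of the normalized outgoing weights."""
    total = sum(weights)
    probabilities = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probabilities)


def normalized_local_entropy(weights):
    """Entropy divided by its maximum, log(k), for a node with k neighbours."""
    k = sum(1 for w in weights if w > 0)
    return local_entropy(weights) / math.log(k) if k > 1 else 0.0
```

Under this reading, a node that signals almost exclusively through one neighbour has entropy near zero, while a node that spreads its signalling evenly has the maximal value.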
Statistically significant data base of rock properties for geothermal use
NASA Astrophysics Data System (ADS)
Koch, A.; Jorand, R.; Clauser, C.
2009-04-01
The high risk of failure due to the unknown properties of the target rocks at depth is a major obstacle for the exploration of geothermal energy. In general, the ranges of thermal and hydraulic properties given in compilations of rock properties are too large to be useful to constrain properties at a specific site. To overcome this problem, we study the thermal and hydraulic rock properties of the main rock types in Germany in a statistical approach. An important aspect is the use of data from exploration wells that are largely untapped for the purpose of geothermal exploration. In the current project stage, we have been analyzing mostly Devonian and Carboniferous drill cores from 20 deep boreholes in the region of the Lower Rhine Embayment and the Ruhr area (western North Rhine Westphalia). In total, we selected 230 core samples with a length of up to 30 cm from the core archive of the State Geological Survey. The use of core scanning technology allowed the rapid measurement of thermal conductivity, sonic velocity, and gamma density under dry and water saturated conditions with high resolution for a large number of samples. In addition, we measured porosity, bulk density, and matrix density based on Archimedes' principle and pycnometer analysis. As first results we present arithmetic means, medians and standard deviations characterizing the petrophysical properties and their variability for specific lithostratigraphic units. Bi- and multimodal frequency distributions correspond to the occurrence of different lithologies such as shale, limestone, dolomite, sandstone, siltstone, marlstone, and quartz-schist. In a next step, the data set will be combined with logging data and complementary mineralogical analyses to derive the variation of thermal conductivity with depth. As a final result, this may be used to infer thermal conductivity for boreholes without appropriate core data which were drilled in similar geological settings.
Dziurkowska, Ewelina; Wesolowski, Marek
2015-01-01
Multivariate statistical analysis is widely used in medical studies as a useful tool facilitating diagnosis of some diseases, for instance, cancer, allergy, pneumonia, or Alzheimer's and psychiatric diseases. Taking this into consideration, the aim of this study was to use two multivariate techniques, hierarchical cluster analysis (HCA) and principal component analysis (PCA), to disclose the relationship between the drugs used in the therapy of major depressive disorder and the salivary cortisol level and the period of hospitalization. The cortisol contents in saliva of depressed women were quantified by HPLC with UV detection day-to-day during the whole period of hospitalization. A data set with 16 variables (e.g., the patients' age, multiplicity and period of hospitalization, initial and final cortisol level, highest and lowest hormone level, mean contents, and medians) characterizing 97 subjects was used for HCA and PCA calculations. Multivariate statistical analysis reveals that various groups of antidepressants affect the salivary cortisol level to varying degrees. The SSRIs, SNRIs, and polypragmasy reduce the hormone secretion most effectively. Thus, both unsupervised pattern recognition methods, HCA and PCA, can be used as complementary tools for interpretation of the results obtained by laboratory diagnostic methods.
Statistical characteristics of ionospheric variability using oblique sounding data
NASA Astrophysics Data System (ADS)
Kurkin, Vladimir; Polekh, Nelya; Ivanova, Vera; Dumbrava, Zinaida; Podelsky, Igor
Using oblique sounding data obtained over two paths, Magadan-Irkutsk and Khabarovsk-Irkutsk, in 2006-2011, the statistical parameters of ionospheric variability are studied during the equinox and the winter solstice. Over the Magadan-Irkutsk path, the probability of registering the maximum observed frequency with average standard deviations from the median in the range 5-10% is 0.43 in winter and 0.64 in spring and autumn. In winter the daytime standard deviation does not exceed 10%, whereas at night it reaches 20% or more. During the equinox the daytime standard deviation increases to 12%, and at night it does not exceed 16%. This may be due to changes in lighting conditions at the midpoint of the path (58.2° N, 124.2° E). For the Khabarovsk-Irkutsk path, the standard deviations from the median are smaller than those obtained for the Magadan-Irkutsk path. The estimates are consistent with previously obtained results deduced from vertical sounding data. The study was done under RF President Grant of Public Support for RF Leading Scientific Schools (NSh-2942.2014.5) and RFBR Grant No 14-05-00259.
ERIC Educational Resources Information Center
Neumann, David L.; Hood, Michelle; Neumann, Michelle M.
2013-01-01
Many teachers of statistics recommend using real-life data during class lessons. However, there has been little systematic study of what effect this teaching method has on student engagement and learning. The present study examined this question in a first-year university statistics course. Students (n = 38) were interviewed and their reflections…
Analysis of statistical model properties from discrete nuclear structure data
NASA Astrophysics Data System (ADS)
Firestone, Richard B.
2012-02-01
Experimental M1, E1, and E2 photon strengths have been compiled from experimental data in the Evaluated Nuclear Structure Data File (ENSDF) and the Evaluated Gamma-ray Activation File (EGAF). Over 20,000 Weisskopf reduced transition probabilities were recovered from the ENSDF and EGAF databases. These transition strengths have been analyzed for their dependence on transition energies, initial and final level energies, spin/parity dependence, and nuclear deformation. ENSDF BE1W values were found to increase exponentially with energy, possibly consistent with the Axel-Brink hypothesis, although considerable excess strength was observed for transitions between 4-8 MeV. No similar energy dependence was observed in EGAF or ARC data. BM1W average values were nearly constant at all energies above 1 MeV with substantial excess strength below 1 MeV and between 4-8 MeV. BE2W values decreased exponentially by a factor of 1000 from 0 to 16 MeV. The distribution of ENSDF transition probabilities for all multipolarities could be described by a lognormal statistical distribution. BE1W, BM1W, and BE2W strengths all increased substantially for initial transition level energies between 4-8 MeV possibly due to dominance of spin-flip and Pygmy resonance transitions at those excitations. Analysis of the average resonance capture data indicated no transition probability dependence on final level spins or energies between 0-3 MeV. The comparison of favored to unfavored transition probabilities for odd-A or odd-Z targets indicated only partial support for the expected branching intensity ratios with many unfavored transitions having nearly the same strength as favored ones. Average resonance capture BE2W transition strengths generally increased with greater deformation. Analysis of ARC data suggests that there is a large E2 admixture in M1 transitions with the mixing ratio δ ≈ 1.0. The ENSDF reduced transition strengths were considerably stronger than those derived from capture gamma ray
Method of representation of remote sensing data that facilitates visual interpretation
NASA Astrophysics Data System (ADS)
Sheremetyeva, T. A.
2004-06-01
We present a method of visualization of remote sensing data that allows a quick synthesis of heterogeneous data for interpretation by a human operator. The method is suitable for processing pictures of one optical band as well as polyzonal and hyperspectral aero pictures. It allows using a priori knowledge for visualization. Different methods of preliminary image processing can also be easily included in the model. As a result, a number of alternative visualizations of the same dataset can be obtained depending on the interpretation objectives. The method is particularly efficient for the interpretation of barely visible objects. It makes it possible to reduce the influence of particular conditions of remote probing, such as lighting conditions and the optical receiver, on the results of visual interpretation.
Fuzzy logic and image processing techniques for the interpretation of seismic data
NASA Astrophysics Data System (ADS)
Orozco-del-Castillo, M. G.; Ortiz-Alemán, C.; Urrutia-Fucugauchi, J.; Rodríguez-Castellanos, A.
2011-06-01
Since interpretation of seismic data is usually a tedious and repetitive task, the ability to do so automatically or semi-automatically has become an important objective of recent research. We believe that the vagueness and uncertainty in the interpretation process makes fuzzy logic an appropriate tool to deal with seismic data. In this work we developed a semi-automated fuzzy inference system to detect the internal architecture of a mass transport complex (MTC) in seismic images. We propose that the observed characteristics of a MTC can be expressed as fuzzy if-then rules consisting of linguistic values associated with fuzzy membership functions. The constructions of the fuzzy inference system and various image processing techniques are presented. We conclude that this is a well-suited problem for fuzzy logic since the application of the proposed methodology yields a semi-automatically interpreted MTC which closely resembles the MTC from expert manual interpretation.
MALDI imaging mass spectrometry: statistical data analysis and current computational challenges
2012-01-01
Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) imaging mass spectrometry, also called MALDI-imaging, is a label-free bioanalytical technique used for spatially-resolved chemical analysis of a sample. Usually, MALDI-imaging is exploited for analysis of a specially prepared tissue section thaw-mounted onto a glass slide. A tremendous development of the MALDI-imaging technique has been observed during the last decade. Currently, it is one of the most promising innovative measurement techniques in biochemistry and a powerful and versatile tool for spatially-resolved chemical analysis of diverse sample types ranging from biological and plant tissues to bio and polymer thin films. In this paper, we outline computational methods for analyzing MALDI-imaging data with the emphasis on multivariate statistical methods, discuss their pros and cons, and give recommendations on their application. The methods of unsupervised data mining as well as supervised classification methods for biomarker discovery are elucidated. We also present a high-throughput computational pipeline for interpretation of MALDI-imaging data using spatial segmentation. Finally, we discuss current challenges associated with the statistical analysis of MALDI-imaging data. PMID:23176142
ERIC Educational Resources Information Center
Barner, David; Snedeker, Jesse
2008-01-01
Four experiments investigated 4-year-olds' understanding of adjective-noun compositionality and their sensitivity to statistics when interpreting scalar adjectives. In Experiments 1 and 2, children selected "tall" and "short" items from 9 novel objects called "pimwits" (1-9 in. in height) or from this array plus 4 taller or shorter distractor…
77 FR 65585 - Renewal of the Bureau of Labor Statistics Data Users Advisory Committee
Federal Register 2010, 2011, 2012, 2013, 2014
2012-10-29
... of Labor Statistics Renewal of the Bureau of Labor Statistics Data Users Advisory Committee The... determined that the renewal of the Bureau of Labor Statistics Data Users Advisory Committee (the ``Committee... of Labor Statistics by 29 U.S.C. 1 and 2. This determination follows consultation with the...
STATISTICAL ESTIMATION AND VISUALIZATION OF GROUND-WATER CONTAMINATION DATA
This work presents methods of visualizing and animating statistical estimates of ground water and/or soil contamination over a region from observations of the contaminant for that region. The primary statistical methods used to produce the regional estimates are nonparametric re...
Chen, Chih-Hao; Hsu, Chueh-Lin; Huang, Shih-Hao; Chen, Shih-Yuan; Hung, Yi-Lin; Chen, Hsiao-Rong; Wu, Yu-Chung
2015-01-01
Although genome-wide expression analysis has become a routine tool for gaining insight into molecular mechanisms, extraction of information remains a major challenge. It has been unclear why standard statistical methods, such as the t-test and ANOVA, often lead to low levels of reproducibility, how likely applying fold-change cutoffs to enhance reproducibility is to miss key signals, and how adversely using such methods has affected data interpretations. We broadly examined expression data to investigate the reproducibility problem and discovered that molecular heterogeneity, a biological property of genetically different samples, has been improperly handled by the statistical methods. Here we give a mathematical description of the discovery and report the development of a statistical method, named HTA, for better handling molecular heterogeneity. We broadly demonstrate the improved sensitivity and specificity of HTA over the conventional methods and show that using fold-change cutoffs has lost much information. We illustrate the especial usefulness of HTA for heterogeneous diseases, by applying it to existing data sets of schizophrenia, bipolar disorder and Parkinson’s disease, and show it can abundantly and reproducibly uncover disease signatures not previously detectable. Based on 156 biological data sets, we estimate that the methodological issue has affected over 96% of expression studies and that HTA can profoundly correct 86% of the affected data interpretations. The methodological advancement can better facilitate systems understandings of biological processes, render biological inferences that are more reliable than they have hitherto been and engender translational medical applications, such as identifying diagnostic biomarkers and drug prediction, which are more robust. PMID:25793610
NASA Astrophysics Data System (ADS)
Lacroix, Dominic
Ground-penetrating radar (GPR) is one of the major geophysical prospecting techniques used in archaeology. Complex GPR profile data contains detailed reflections produced by subsurface features, but they are difficult to interpret. To help the interpretation of GPR profile data in an archaeological context, the use of computer models is investigated. Synthetic models can be used to produce reflection analogues that can be compared to real field data to help identify reflections produced by specific archaeological features. Modelling results can also be used to test hypotheses to determine which best explains the reflections observed in GPR profile data. Two test cases are presented, clearly demonstrating the benefits of using GPR models to help interpret reflection patterns produced by buried archaeological features.
Statistical methods for detecting periodic fragments in DNA sequence data
2011-01-01
Background: Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed. Results: We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~ 21% and ~ 19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). Conclusions: For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the
Diagnostic Interpretation of Array Data Using Public Databases and Internet Sources
de Leeuw, Nicole; Dijkhuizen, Trijnie; Hehir-Kwa, Jayne Y.; Carter, Nigel P.; Feuk, Lars; Firth, Helen V.; Kuhn, Robert M.; Ledbetter, David H.; Martin, Christa Lese; van Ravenswaaij-Arts, Conny M. A.; Scherer, Steven W.; Shams, Soheil; Van Vooren, Steven; Sijmons, Rolf; Swertz, Morris; Hastings, Ros
2016-01-01
The range of commercially available array platforms and analysis software packages is expanding and their utility is improving, making reliable detection of copy-number variants (CNVs) relatively straightforward. Reliable interpretation of CNV data, however, is often difficult and requires expertise. With our knowledge of the human genome growing rapidly, applications for array testing continuously broadening, and the resolution of CNV detection increasing, this leads to great complexity in interpreting what can be daunting data. Correct CNV interpretation and optimal use of the genotype information provided by single-nucleotide polymorphism probes on an array depends largely on knowledge present in various resources. In addition to the availability of host laboratories’ own datasets and national registries, there are several public databases and Internet resources with genotype and phenotype information that can be used for array data interpretation. With so many resources now available, it is important to know which are fit-for-purpose in a diagnostic setting. We summarize the characteristics of the most commonly used Internet databases and resources, and propose a general data interpretation strategy that can be used for comparative hybridization, comparative intensity, and genotype-based array data. PMID:26285306
Geochemical portray of the Pacific Ridge: New isotopic data and statistical techniques
NASA Astrophysics Data System (ADS)
Hamelin, Cédric; Dosso, Laure; Hanan, Barry B.; Moreira, Manuel; Kositsky, Andrew P.; Thomas, Marion Y.
2011-02-01
Samples collected during the PACANTARCTIC 2 cruise fill a sampling gap from 53° to 41° S along the Pacific Antarctic Ridge (PAR). Analysis of Sr, Nd, Pb, Hf, and He isotope compositions of these new samples is shown together with published data from 66°S to 53°S and from the EPR. The recent advance in analytical mass spectrometry techniques generates a spectacular increase in the number of multidimensional isotopic data for oceanic basalts. Working with such multidimensional datasets generates a new approach for the data interpretation, preferably based on statistical analysis techniques. Principal Component Analysis (PCA) is a powerful mathematical tool to study this type of datasets. The purpose of PCA is to reduce the number of dimensions by keeping only those characteristics that contribute most to its variance. Using this technique, it becomes possible to have a statistical picture of the geochemical variations along the entire Pacific Ridge from 70°S to 10°S. The incomplete sampling of the ridge led previously to the identification of a large-scale division of the south Pacific mantle at the latitude of Easter Island. The PCA method applied here to the completed dataset reveals a different geochemical profile. Along the Pacific Ridge, a large-scale bell-shaped variation with an extremum at about 38°S of latitude is interpreted as a progressive change in the geochemical characteristics of the depleted matrix of the mantle. This Pacific Isotopic Bump (PIB) is also noticeable in the He isotopic ratio along-axis variation. The linear correlation observed between He and heavy radiogenic isotopes, together with the result of the PCA calculation, suggests that the large-scale variation is unrelated to the plume-ridge interactions in the area and should rather be attributed to the partial melting of a marble-cake assemblage.
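The dimension-reduction step described in the abstract above can be illustrated with a minimal sketch of PCA via the singular value decomposition. This is not the authors' code: the function name `pca`, the choice to standardize each variable, and the synthetic four-variable data set are all assumptions made for illustration only.

```python
import numpy as np

def pca(data, n_components=2):
    """Project rows of `data` onto the leading principal components.

    Returns (scores, explained_variance_ratio). Hypothetical helper,
    not taken from the paper.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    # Assumption: standardize so variables on different scales are comparable.
    standardized = centered / centered.std(axis=0, ddof=1)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(standardized, full_matrices=False)
    variance = s**2 / (data.shape[0] - 1)
    ratio = variance / variance.sum()
    scores = standardized @ vt[:n_components].T
    return scores, ratio[:n_components]

# Synthetic stand-in for a multi-isotope data set: four correlated variables
# driven by one latent factor plus small noise (invented numbers).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
loadings = np.array([[1.0, 0.8, -0.6, 0.5]])
samples = latent @ loadings + 0.1 * rng.normal(size=(200, 4))

scores, ratio = pca(samples)
print("explained variance ratio of first two components:", ratio)
```

Because the synthetic variables share a single latent factor, the first component should carry most of the variance, which is the sense in which PCA keeps "only those characteristics that contribute most".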
Interpretation Of Assembly Task Constraints From Position And Force Sensory Data
NASA Astrophysics Data System (ADS)
Hou, E. S. H.; Lee, C. S. G.
1990-03-01
One of the major deficiencies in current robot control schemes is the lack of high-level knowledge in the feedback loop. Typically, the sensory data acquired are fed back to the robot controller with minimal amount of processing. However, by accumulating useful sensory data and processing them intelligently, one can obtain invaluable information about the state of the task being performed by the robot. This paper presents a method based on the screw theory for interpreting the position and force sensory data into high-level assembly task constraints. The position data are obtained from the joint angle encoders of the manipulator and the force data are obtained from a wrist force sensor attached to the mounting plate of the manipulator end-effector. The interpretation of the sensory data is divided into two subproblems: representation problem and interpretation problem. Spatial and physical constraints based on the screw axis and force axis of the manipulator are used to represent the high-level task constraints. Algorithms which yield least-squared error results are developed to obtain the spatial and physical constraints from the position and force data. The spatial and physical constraints obtained from the sensory data are then compared with the desired spatial and physical constraints to interpret the state of the assembly task. Computer simulation and experimental results for verifying the validity of the algorithms are also presented and discussed.
Gunter, Bert; Brideau, Christine; Pikounis, Bill; Liaw, Andy
2003-12-01
High-throughput screening (HTS) is used in modern drug discovery to screen hundreds of thousands to millions of compounds on selected protein targets. It is an industrial-scale process relying on sophisticated automation and state-of-the-art detection technologies. Quality control (QC) is an integral part of the process and is used to ensure good quality data and minimize assay variability while maintaining assay sensitivity. The authors describe new QC methods and show numerous real examples from their biologist-friendly Stat Server HTS application, a custom-developed software tool built from the commercially available S-PLUS and Stat Server statistical analysis and server software. This system remotely processes HTS data using powerful and sophisticated statistical methodology but insulates users from the technical details by outputting results in a variety of readily interpretable graphs and tables. It allows users to visualize HTS data and examine assay performance during the HTS campaign to quickly react to or avoid quality problems.
Characterization of spatial statistics of distributed targets in SAR data. [applied to sea-ice data
NASA Technical Reports Server (NTRS)
Rignot, E.; Kwok, R.
1993-01-01
A statistical approach to the analysis of spatial statistics in polarimetric multifrequency SAR data, which is aimed at extracting the intrinsic variability of the target by removing variability from other sources, is presented. An image model, which takes into account three sources of spatial variability, namely, image speckle, system noise, and the intrinsic spatial variability of the target or texture, is described. It is shown that the presence of texture increases the image variance-to-mean square ratio and introduces deviations of the image autocovariance function from the expected SAR system response. The approach is exemplified by sea-ice SAR imagery acquired by the Jet Propulsion Laboratory three-frequency polarimetric airborne SAR. Data obtained indicate that, for different sea-ice types, the spatial statistics seem to vary more across frequency than across polarization and the observed differences increase in magnitude with decreasing frequency.
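The statement that texture raises the variance-to-mean-square ratio can be checked numerically under a simple multiplicative model. The sketch below assumes single-look exponential speckle and gamma-distributed texture with unit mean, and it ignores system noise. These modeling choices and all names are illustrative assumptions, not the authors' image model.

```python
import numpy as np

def variance_to_mean_square(intensity):
    """Ratio var(I) / mean(I)^2 of an intensity sample."""
    intensity = np.asarray(intensity, dtype=float)
    return intensity.var() / intensity.mean() ** 2

rng = np.random.default_rng(0)
n = 200_000

# Assumption: single-look speckle, exponentially distributed with unit mean,
# for which the ratio is 1.
speckle = rng.exponential(1.0, size=n)

# Assumption: texture is gamma distributed with mean 1 and variance 0.25.
texture = rng.gamma(shape=4.0, scale=0.25, size=n)

# Multiplicative model: observed intensity = texture * speckle.
ratio_speckle = variance_to_mean_square(speckle)
ratio_textured = variance_to_mean_square(texture * speckle)

print("speckle only:", ratio_speckle)
print("with texture:", ratio_textured)
```

For independent factors the ratio of the product is (1 + v_T)(1 + v_S) - 1, where v_T and v_S are the individual ratios, so these settings should give roughly 1.5 with texture against 1.0 without.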
Interpreters in cross-cultural interviews: a three-way coconstruction of data.
Björk Brämberg, Elisabeth; Dahlberg, Karin
2013-02-01
Our focus in this article is research interviews that involve two languages. We present an epistemological and methodological analysis of the meaning of qualitative interviewing with an interpreter. The results of the analysis show that such interviewing is not simply exchanging words between two languages, but means understanding, grasping the essential meanings of the spoken words, which requires an interpreter to bridge the different horizons of understanding. Consequently, a research interview including an interpreter means a three-way coconstruction of data. We suggest that interpreters be thoroughly introduced into the research process and research interview technique, that they take part in the preparations for the interview event, and evaluate the translation process with the researcher and informant after the interview. PMID:23258420
ISSUES IN THE STATISTICAL ANALYSIS OF SMALL-AREA HEALTH DATA. (R825173)
The availability of geographically indexed health and population data, with advances in computing, geographical information systems and statistical methodology, have opened the way for serious exploration of small area health statistics based on routine data. Such analyses may be...
Computer Search Center Statistics on Users and Data Bases
ERIC Educational Resources Information Center
Schipma, Peter B.
1974-01-01
Statistics gathered over five years of operation by the IIT Research Institute's Computer Search Center are summarized for profile terms and lists, use of truncation modes, use of logic operators, some characteristics of CA Condensates, etc. (Author/JB)
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2010 CFR
2010-10-01
... analyses, and experiments, and those parts of other studies involving statistical methodology shall be.... When alternative models and variables have been employed, a record shall be kept of these...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2011 CFR
2011-10-01
... analyses, and experiments, and those parts of other studies involving statistical methodology shall be.... When alternative models and variables have been employed, a record shall be kept of these...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2014 CFR
2014-10-01
... analyses, and experiments, and those parts of other studies involving statistical methodology shall be.... When alternative models and variables have been employed, a record shall be kept of these...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2012 CFR
2012-10-01
... analyses, and experiments, and those parts of other studies involving statistical methodology shall be.... When alternative models and variables have been employed, a record shall be kept of these...
47 CFR 1.363 - Introduction of statistical data.
Code of Federal Regulations, 2013 CFR
2013-10-01
... analyses, and experiments, and those parts of other studies involving statistical methodology shall be.... When alternative models and variables have been employed, a record shall be kept of these...
Notes on interpretation of geophysical data over areas of mineralization in Afghanistan
Drenth, Benjamin J.
2011-01-01
Afghanistan has the potential to contain substantial metallic mineral resources. Although valuable mineral deposits have been identified, much of the country's potential remains unknown. Geophysical surveys, particularly those conducted from airborne platforms, are a well-accepted and cost-effective method for obtaining information on the geological setting of a given area. This report summarizes interpretive findings from various geophysical surveys over selected mineral targets in Afghanistan, highlighting what existing data tell us. These interpretations are mainly qualitative in nature, because of the low resolution of available geophysical data. Geophysical data and simple interpretations are included for these six areas and deposit types: (1) Aynak: Sedimentary-hosted copper; (2) Zarkashan: Porphyry copper; (3) Kundalan: Porphyry copper; (4) Dusar Shaida: Volcanic-hosted massive sulphide; (5) Khanneshin: Carbonatite-hosted rare earth element; and (6) Chagai Hills: Porphyry copper.
Using Neural Networks for Descriptive Statistical Analysis of Educational Data.
ERIC Educational Resources Information Center
Tirri, Henry; And Others
Methodological issues of using a class of neural networks called Mixture Density Networks (MDN) for discriminant analysis are discussed. MDN models have the advantage of having a rigorous probabilistic interpretation, and they have proven to be a viable alternative as a classification procedure in discrete domains. Both classification and…
Robust statistical approaches to assess the degree of agreement of clinical data
NASA Astrophysics Data System (ADS)
Grilo, Luís M.; Grilo, Helena L.
2016-06-01
To analyze the blood of patients who took vitamin B12 for a period of time, two different measurement methods were used (one is the established method, with more human intervention, and the other relies essentially on machines). Given the non-normality of the differences between the two measurement methods, the limits of agreement are also estimated using a non-parametric approach to assess the degree of agreement of the clinical data. The bootstrap resampling method is applied in order to obtain robust confidence intervals for the mean and median of the differences. The approaches used are easy to apply with user-friendly software, and their outputs are also easy to interpret. In this case study the results obtained with parametric and non-parametric approaches lead us to different statistical conclusions, but the decision whether agreement is acceptable or not is always a clinical judgment.
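A minimal sketch of the two ideas in the abstract above, assuming a percentile bootstrap for the confidence intervals and empirical 2.5th and 97.5th percentiles as the non-parametric limits of agreement. The abstract does not say which variants the authors used, so both choices, the function names, and the skewed synthetic differences are assumptions.

```python
import numpy as np

def bootstrap_ci(differences, statistic, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic` of the differences."""
    rng = np.random.default_rng(seed)
    differences = np.asarray(differences, dtype=float)
    n = differences.size
    # Resample the paired differences with replacement, one row per resample.
    idx = rng.integers(0, n, size=(n_resamples, n))
    stats = statistic(differences[idx], axis=1)
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(lo), float(hi)

def nonparametric_limits(differences, level=0.95):
    """Empirical-percentile limits of agreement (one common non-parametric choice)."""
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(np.asarray(differences, dtype=float), [alpha, 1.0 - alpha])
    return float(lo), float(hi)

# Hypothetical skewed differences between two measurement methods.
rng = np.random.default_rng(1)
differences = rng.exponential(scale=2.0, size=60) - 1.0

mean_ci = bootstrap_ci(differences, np.mean)
median_ci = bootstrap_ci(differences, np.median)
limits = nonparametric_limits(differences)

print("95% CI for mean difference:  ", mean_ci)
print("95% CI for median difference:", median_ci)
print("non-parametric limits of agreement:", limits)
```

With skewed differences the intervals for the mean and the median need not coincide, which is one way parametric and non-parametric summaries can point to different conclusions.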
Quantum of area $\Delta A = 8\pi l_P^2$ and a statistical interpretation of black hole entropy
Ropotenko, Kostiantyn
2010-08-15
In contrast to alternative values, the quantum of area $\Delta A = 8\pi l_P^2$ does not follow from the usual statistical interpretation of black hole entropy; on the contrary, a statistical interpretation follows from it. This interpretation is based on the two concepts: nonadditivity of black hole entropy and Landau quantization. Using nonadditivity a microcanonical distribution for a black hole is found and it is shown that the statistical weight of a black hole should be proportional to its area. By analogy with conventional Landau quantization, it is shown that quantization of a black hole is nothing but the Landau quantization. The Landau levels of a black hole and their degeneracy are found. The degree of degeneracy is equal to the number of ways to distribute a patch of area $8\pi l_P^2$ over the horizon. Taking into account these results, it is argued that the black hole entropy should be of the form $S_{bh} = 2\pi \cdot \Delta\Gamma$, where the number of microstates is $\Delta\Gamma = A/8\pi l_P^2$. The nature of the degrees of freedom responsible for black hole entropy is elucidated. The applications of the new interpretation are presented. The effect of noncommuting coordinates is discussed.
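As a consistency check on the two expressions quoted in the abstract above, substituting the number of microstates into the entropy formula recovers the familiar area law. This is a one-line worked substitution, in units where Boltzmann's constant is 1 (an assumption about the abstract's conventions), and not a derivation taken from the paper.

```latex
S_{bh} = 2\pi \cdot \Delta\Gamma
       = 2\pi \cdot \frac{A}{8\pi l_P^2}
       = \frac{A}{4 l_P^2}
```

The factor of $2\pi$ is what turns a microstate count linear in the area into the standard coefficient of one quarter.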
Statistical Analysis of Probability of Detection Hit/Miss Data for Small Data Sets
NASA Astrophysics Data System (ADS)
Harding, C. A.; Hugo, G. R.
2003-03-01
This paper examines the validity of statistical methods for determining nondestructive inspection probability of detection (POD) curves from relatively small hit/miss POD data sets. One method published in the literature is shown to be invalid for analysis of POD hit/miss data. Another standard method is shown to be valid only for data sets containing more than 200 observations. An improved method is proposed which allows robust lower 95% confidence limit POD curves to be determined from data sets containing as few as 50 hit/miss observations.
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 42 Public Health 3 2014-10-01 2014-10-01 false Adequate financial records, statistical data, and....568 Adequate financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 42 Public Health 3 2013-10-01 2013-10-01 false Adequate financial records, statistical data, and....568 Adequate financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 42 Public Health 3 2012-10-01 2012-10-01 false Adequate financial records, statistical data, and....568 Adequate financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination...
42 CFR 417.568 - Adequate financial records, statistical data, and cost finding.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 42 Public Health 3 2011-10-01 2011-10-01 false Adequate financial records, statistical data, and... financial records, statistical data, and cost finding. (a) Maintenance of records. (1) An HMO or CMP must maintain sufficient financial records and statistical data for proper determination of costs payable by...
NASA Astrophysics Data System (ADS)
McLean, M. A.; Wilson, C. J. L.; Boger, S. D.; Betts, P. G.; Rawling, T. J.; Damaske, D.
2009-06-01
Geological exposures in the Lambert Rift region of East Antarctica comprise scattered coastal outcrops and inland nunataks sporadically protruding through the Antarctic ice sheet from Prydz Bay to the southernmost end of the Prince Charles Mountains. This study utilized airborne magnetic, gravity, and ice radar data to interpret the distribution and architecture of tectonic terranes that are largely buried beneath the thick ice sheet. Free-air and Bouguer gravity data are highly influenced by the subice and mantle topography, respectively. Gravity stripping facilitated the removal of the effect of ice and Moho, and the residual gravity data set thus obtained for the intermediate crustal level allowed a direct comparison with magnetic data. Interpretation of geophysical data also provided insight into the distribution and geometry of four tectonic blocks: namely, the Vestfold, Beaver, Mawson, and Gamburtsev domains. These tectonic domains are supported by surface observations such as rock descriptions, isotopic data sets, and structural mapping.
WebGIS System Provides Spatial Context for Interpreting Biophysical data
NASA Astrophysics Data System (ADS)
Graham, R. L.; Santhana Vannan, K.; Olsen, L. M.; Palanisamy, G.; Cook, R. B.; Beaty, T. W.; Holladay, S. K.; Rhyne, T.; Voorhees, L. D.
2006-05-01
Understanding the spatial context of biophysical data such as measurements of Net Primary Productivity or carbon fluxes at tower sites is useful in their interpretation. The ORNL DAAC has developed a WebGIS system to help users visualize, locate and extract landcover, biophysical, elevation, and geopolitical data archived at the DAAC and/or point the users to the primary data location, as in the case of flux tower measurements. The system currently allows the user to extract data for thirteen different map features including four vector data sets and nine raster coverages. Four OGC layers are also available to help interpretation of the site specific data. The user can select either the Global or North American version of the system. Users can interrogate map features, and extract and download map features including map layers (shape files). The user can download data for their region of interest as a shapefile in the case of vector data and as a GeoTIFF in the case of raster data. A single file is created for each map feature. Twenty-eight tools are provided to let the user identify, select, query, interpret and download the data.
What defines an Expert? - Uncertainty in the interpretation of seismic data
NASA Astrophysics Data System (ADS)
Bond, C. E.
2008-12-01
Studies focusing on the elicitation of information from experts are concentrated primarily in economics and world markets, medical practice and expert witness testimonies. Expert elicitation theory has been applied in the natural sciences, most notably in the prediction of fluid flow in hydrological studies. In the geological sciences expert elicitation has been limited to theoretical analysis, with studies focusing on the elicitation element, gaining expert opinion rather than necessarily understanding the basis behind the expert view. In these cases experts are defined in a traditional sense, based for example on: standing in the field, number of years of experience, number of peer reviewed publications, the expert's position in a company hierarchy or academia. Here traditional indicators of expertise have been compared for significance on effective seismic interpretation. Polytomous regression analysis has been used to assess the relative significance of length and type of experience on the outcome of a seismic interpretation exercise. Following the initial analysis, the techniques used by participants to interpret the seismic image were added as additional variables to the analysis. Specific technical skills and techniques were found to be more important for the effective geological interpretation of seismic data than the traditional indicators of expertise. The results of a seismic interpretation exercise, the techniques used to interpret the seismic and the participant's prior experience have been combined and analysed to answer the question - who is and what defines an expert?
Quantum Correlations from the Conditional Statistics of Incomplete Data.
Sperling, J; Bartley, T J; Donati, G; Barbieri, M; Jin, X-M; Datta, A; Vogel, W; Walmsley, I A
2016-08-19
We study, in theory and experiment, the quantum properties of correlated light fields measured with click-counting detectors providing incomplete information on the photon statistics. We establish a correlation parameter for the conditional statistics, and we derive the corresponding nonclassicality criteria for detecting conditional quantum correlations. Classical bounds for Pearson's correlation parameter are formulated that allow us, once they are violated, to determine nonclassical correlations via the joint statistics. On the one hand, we demonstrate nonclassical correlations in terms of the joint click statistics of light produced by a parametric down-conversion source. On the other hand, we verify quantum correlations of a heralded, split single-photon state via the conditional click statistics together with a generalization to higher-order moments. We discuss the performance of the presented nonclassicality criteria to successfully discern joint and conditional quantum correlations. Remarkably, our results are obtained without making any assumptions on the response function, quantum efficiency, and dark-count rate of photodetectors. PMID:27588857
Quantum Correlations from the Conditional Statistics of Incomplete Data
NASA Astrophysics Data System (ADS)
Sperling, J.; Bartley, T. J.; Donati, G.; Barbieri, M.; Jin, X.-M.; Datta, A.; Vogel, W.; Walmsley, I. A.
2016-08-01
We study, in theory and experiment, the quantum properties of correlated light fields measured with click-counting detectors providing incomplete information on the photon statistics. We establish a correlation parameter for the conditional statistics, and we derive the corresponding nonclassicality criteria for detecting conditional quantum correlations. Classical bounds for Pearson's correlation parameter are formulated that allow us, once they are violated, to determine nonclassical correlations via the joint statistics. On the one hand, we demonstrate nonclassical correlations in terms of the joint click statistics of light produced by a parametric down-conversion source. On the other hand, we verify quantum correlations of a heralded, split single-photon state via the conditional click statistics together with a generalization to higher-order moments. We discuss the performance of the presented nonclassicality criteria to successfully discern joint and conditional quantum correlations. Remarkably, our results are obtained without making any assumptions on the response function, quantum efficiency, and dark-count rate of photodetectors.
Teaching for Statistical Literacy: Utilising Affordances in Real-World Data
ERIC Educational Resources Information Center
Chick, Helen L.; Pierce, Robyn
2012-01-01
It is widely held that context is important in teaching mathematics and statistics. Consideration of context is central to statistical thinking, and any teaching of statistics must incorporate this aspect. Indeed, it has been advocated that real-world data sets can motivate the learning of statistical principles. It is not, however, a…
ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization
Antcheva, I.; Ballintijn, M.; Bellenot, B.; Biskup, M.; Brun, R.; Buncic, N.; Canal, Ph.; Casadei, D.; Couet, O.; Fine, V.; Franco, L.; /CERN /CERN
2009-01-01
ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. A central piece of these analysis tools is the set of histogram classes, which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like PostScript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally
Statistical Analysis of CMC Constituent and Processing Data
NASA Technical Reports Server (NTRS)
Fornuff, Jonathan
2004-01-01
Ceramic Matrix Composites (CMCs) are the next "big thing" in high-temperature structural materials. In the case of jet engines, it is widely believed that the metallic superalloys currently being utilized for hot structures (combustors, shrouds, turbine vanes and blades) are nearing their potential limits of improvement. In order to allow for increased turbine temperatures to increase engine efficiency, material scientists have begun looking toward advanced CMCs and SiC/SiC composites in particular. Ceramic composites provide greater strength-to-weight ratios at higher temperatures than metallic alloys, but at the same time pose greater challenges in micro-structural optimization, which in turn increases the cost of the material as well as the risk of variability in the material's thermo-structural behavior. to model various potential CMC engine materials and examines the current variability in these properties due to variability in component processing conditions and constituent materials; then, to see how processing and constituent variations affect key strength, stiffness, and thermal properties of the finished components. Basically, this means trying to model variations in the component's behavior by knowing what went into creating it. inter-phase and manufactured by chemical vapor infiltration (CVI) and melt infiltration (MI) were considered. Examinations of: (1) the percent constituents by volume, (2) the inter-phase thickness, (3) variations in the total porosity, and (4) variations in the chemical composition of the SiC fiber are carried out and modeled using various codes used here at NASA-Glenn (PCGina, NASALife, CEMCAN, etc.). The effects of these variations and the ranking of their respective influences on the various thermo-mechanical material properties are studied and compared to available test data. The properties of the materials as well as minor changes to geometry are then made to the computer model and the detrimental effects
Lin, Meng Kuan; Nicolini, Oliver; Waxenegger, Harald; Galloway, Graham J.; Ullmann, Jeremy F. P.; Janke, Andrew L.
2013-01-01
Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and DIP service, called M-DIP. The objective of the system is to (1) automate the direct data tiling, conversion, pre-tiling of brain images from Medical Imaging NetCDF (MINC), Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurement; and (3) display high-level of images with three dimensions in real world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation using web-based protocols. M-DIP implements three levels of architecture with a relational middle-layer database, a stand-alone DIP server, and a mobile application logic middle level realizing user interpretation for direct querying and communication. This imaging software has the ability to display biological imaging data at multiple zoom levels and to increase its quality to meet users’ expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed by any network environment, such as a portable mobile or tablet device. In addition, this system and combination with mobile applications are establishing a virtualization tool in the neuroinformatics field to speed interpretation services. PMID:23847587
ERIC Educational Resources Information Center
Cruce, Ty M.
2009-01-01
This methodological note illustrates how a commonly used calculation of the Delta-p statistic is inappropriate for categorical independent variables, and this note provides users of logistic regression with a revised calculation of the Delta-p statistic that is more meaningful when studying the differences in the predicted probability of an…
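As a generic illustration of why the baseline matters for a Delta-p style quantity, the sketch below computes the change in predicted probability when a dummy variable switches from 0 to 1 in a logistic model. This is not the revised calculation from the note, which the truncated abstract does not give; the function names, the coefficient `beta` and the two baseline probabilities are hypothetical.

```python
import math


def logistic(x):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def delta_p(baseline_prob, beta):
    """Change in predicted probability when a 0/1 predictor with
    coefficient beta switches on, starting from baseline_prob."""
    return logistic(logit(baseline_prob) + beta) - baseline_prob


beta = 0.5  # hypothetical logistic regression coefficient for a dummy variable

# The same coefficient implies different probability changes depending on
# which baseline is assumed (both baselines here are hypothetical).
print(delta_p(0.5, beta))  # baseline at an overall sample proportion of 0.5
print(delta_p(0.2, beta))  # baseline at a reference-group probability of 0.2
```

The point of the comparison is only that the choice of baseline probability changes the reported difference, so the baseline should correspond to a meaningful comparison group.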
NASA Technical Reports Server (NTRS)
Taylor, P. T.; Kis, K. I.; Wittmann, G.
2013-01-01
The ESA SWARM mission will have three Earth-orbiting, magnetometer-bearing satellites: one in a high orbit and two side-by-side in lower orbits. These latter satellites will record a horizontal magnetic gradient. To determine how these gradient measurements can be used to interpret large geologic units, we used ten years of CHAMP data to compute a horizontal gradient map over a section of southeastern Europe, with the goal of interpreting these data over the Pannonian Basin of Hungary.
Building Basic Statistical Literacy with U.S. Census Data
ERIC Educational Resources Information Center
Sheffield, Caroline C.; Karp, Karen S.; Brown, E. Todd
2010-01-01
The world is filled with information delivered through graphical representations--everything from voting trends to economic projections to health statistics. Whether comparing incomes of individuals by their level of education, tracking the rise and fall of state populations, or researching home ownership in different geographical areas, basic…
Simple Data Sets for Distinct Basic Summary Statistics
ERIC Educational Resources Information Center
Lesser, Lawrence M.
2011-01-01
It is important to avoid ambiguity with numbers because unfortunate choices of numbers can inadvertently make it possible for students to form misconceptions or make it difficult for teachers to tell if students obtained the right answer for the right reason. Therefore, it is important to make sure when introducing basic summary statistics that…
Computational Approaches and Tools for Exposure Prioritization and Biomonitoring Data Interpretation
The ability to describe the source-environment-exposure-dose-response continuum is essential for identifying exposures of greater concern to prioritize chemicals for toxicity testing or risk assessment, as well as for interpreting biomarker data for better assessment of exposure ...
Qualitative Data Analysis and Interpretation in Counseling Psychology: Strategies for Best Practices
ERIC Educational Resources Information Center
Yeh, Christine J.; Inman, Arpana G.
2007-01-01
This article presents an overview of various strategies and methods of engaging in qualitative data interpretations and analyses in counseling psychology. The authors explore the themes of self, culture, collaboration, circularity, trustworthiness, and evidence deconstruction from multiple qualitative methodologies. Commonalities and differences…
Interpreting Reading Assessment Data: Moving From Parts to Whole in a Testing Era
ERIC Educational Resources Information Center
Amendum, Steven J.; Conradi, Kristin; Pendleton, Melissa J.
2016-01-01
This article is designed to help teachers interpret reading assessment data from DIBELS beyond individual subtests to better support their students' needs. While it is important to understand the individual subtest measures, it is more vital to understand how each fits into the larger picture of reading development. The underlying construct of…
ERIC Educational Resources Information Center
Walther, Joachim; Sochacka, Nicola W.; Pawley, Alice L.
2016-01-01
This article explores challenges and opportunities associated with sharing qualitative data in engineering education research. This exploration is theoretically informed by an existing framework of interpretive research quality with a focus on the concept of Communicative Validation. Drawing on practice anecdotes from the authors' work, the…
Maltais Lapointe, Genevieve; Lynnerup, Niels; Hoppa, Robert D
2016-01-01
The most common method to predict nasal projection for forensic facial approximation is Gerasimov's two-tangent method. Ullrich H, Stephan CN (J Forensic Sci, 2011; 56: 470) argued that the method has not been properly implemented, and a revised interpretation was proposed. The aim of this study was to compare the accuracy of both versions using a sample of 66 postmortem cranial CT data sets. The true nasal tip was defined using pronasale and nasal spine line, as it was not originally specified by Gerasimov. The original guidelines were found to be highly inaccurate, with the position of the nasal tip being overestimated by c. 2 cm. Despite the revised interpretation consistently resulting in a smaller distance from the true nasal tip, the method was not statistically accurate (p > 0.05) in positioning the tip of the nose (absolute distance >5 mm). These results support the view that Gerasimov's method was not properly performed, and that the Ullrich H, Stephan CN (J Forensic Sci, 2011; 56: 470) interpretation should be used instead.
Performance Data Gathering and Representation from Fixed-Size Statistical Data
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Jin, Haoqiang H.; Schmidt, Melisa A.; Kutler, Paul (Technical Monitor)
1997-01-01
The two commonly used performance data types in the supercomputing community, statistics and event traces, are discussed and compared. Statistical data are much more compact but lack the probative power that event traces offer. Event traces, on the other hand, are unbounded and can easily fill up the entire file system during program execution. In this paper, we propose an innovative methodology for performance data gathering and representation that offers a middle ground. Two basic ideas are employed: the use of averages to replace recording data for each instance and 'formulae' to represent sequences associated with communication and control flow. The user can incrementally trade off tracing overhead and trace data size against data quality. In other words, the user will be able to limit the amount of trace data collected and, at the same time, carry out some of the analysis event traces offer using space-time views. With the help of a few simple examples, we illustrate the use of these techniques in performance tuning and compare the quality of the traces we collected with event traces. We found that the trace files thus obtained are, indeed, small, bounded and predictable before program execution, and that the quality of the space-time views generated from these statistical data is excellent. Furthermore, experimental results showed that the formulae proposed were able to capture all the sequences associated with 11 of the 15 applications tested. The performance of the formulae can be incrementally improved by allocating more memory at runtime to learn longer sequences.
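The first idea in this abstract, replacing per-instance records with averages, can be sketched with a fixed-size accumulator. This is a generic illustration using Welford's online algorithm, not the authors' implementation; the class name `RunningStats` and the sample durations are hypothetical.

```python
class RunningStats:
    """Fixed-size summary of a stream of measurements (Welford's online algorithm).

    Memory use is constant no matter how many events are recorded, unlike an
    event trace, which grows with every instance.
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, value):
        """Fold one new measurement into the summary."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance (0.0 until at least two values have been seen)."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical durations (ms) of repeated calls to one communication routine.
stats = RunningStats()
for duration_ms in [1.0, 2.0, 3.0, 4.0]:
    stats.add(duration_ms)

print(stats.count, stats.mean, stats.variance)
```

The trade-off is the one the abstract describes: the summary stays bounded and predictable in size, but the ordering of individual events is lost unless it is captured separately.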
Nair, G. Jaya
2013-01-01
Medical coding and dictionaries for clinical trials have seen a wave of change over the past decade, in which emphasis on more standardized tools for coding and reporting clinical data has taken precedence. Coding is the backbone of clinical reporting, as safety data reports depend primarily on the coded data. Hence, maintaining optimum coding quality is essential to the accurate analysis and interpretation of critical clinical data. The perception that medical coding is merely a process of assigning numeric/alphanumeric codes to clinical data needs to be revisited. The significance of quality coding and its impact on clinical reporting is highlighted in this article. PMID:24010060
NASA Technical Reports Server (NTRS)
Smith, G. L.; Green, R. N.; Young, G. R.
1974-01-01
The NIMBUS-G environmental monitoring satellite has an instrument (a gas correlation spectrometer) onboard for measuring the mass of a given pollutant within a gas volume. The present paper treats the problem: how can this type of measurement be used to estimate the distribution of pollutant levels in a metropolitan area? Estimation methods are used to develop this distribution. The pollution concentration caused by a point source is modeled as a Gaussian plume. The uncertainty in the measurements is used to determine the accuracy of estimating the source strength, the wind velocity, diffusion coefficients and source location.
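For orientation, the textbook steady-state Gaussian plume for a continuous point source can be written as a short function. This is a sketch of the standard form with ground reflection, not necessarily the parametrization used in the paper, which the abstract does not give; the function name, argument names and the numeric values in the usage lines are hypothetical, and the dispersion parameters are passed in directly rather than derived from downwind distance.

```python
import math


def gaussian_plume(q, u, y, z, h, sigma_y, sigma_z):
    """Concentration from a continuous point source (standard Gaussian plume).

    q        source strength (mass per unit time)
    u        wind speed along the plume axis
    y, z     crosswind offset and height of the receptor
    h        effective source height
    sigma_y, sigma_z  dispersion parameters at the receptor's downwind distance
    """
    lateral = math.exp(-(y ** 2) / (2.0 * sigma_y ** 2))
    # The second exponential is the image source that models reflection at the ground.
    vertical = (
        math.exp(-((z - h) ** 2) / (2.0 * sigma_z ** 2))
        + math.exp(-((z + h) ** 2) / (2.0 * sigma_z ** 2))
    )
    return q / (2.0 * math.pi * u * sigma_y * sigma_z) * lateral * vertical


# Hypothetical values: ground-level receptor on and off the plume centerline.
on_axis = gaussian_plume(q=100.0, u=5.0, y=0.0, z=0.0, h=50.0, sigma_y=80.0, sigma_z=40.0)
off_axis = gaussian_plume(q=100.0, u=5.0, y=100.0, z=0.0, h=50.0, sigma_y=80.0, sigma_z=40.0)
print(on_axis, off_axis)
```

In an estimation setting such as the one the abstract describes, quantities like `q`, `u` and the dispersion parameters would be the unknowns fitted to the measurements rather than inputs.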
3D interpretation of SHARAD radargram data using seismic processing routines
NASA Astrophysics Data System (ADS)
Kleuskens, M. H. P.; Oosthoek, J. H. P.
2009-04-01
Ground penetrating radar on board a satellite has entered the field of planetary geology. Two radars enable subsurface observations of Mars. In 2003, ESA launched the Mars Express equipped with MARSIS, a low frequency radar which was able to detect only the base of the ice caps. Since December 2006, the Shallow Radar (SHARAD) of Agenzia Spaziale Italiana (ASI) on board the NASA Mars Reconnaissance Orbiter (MRO) has been active in orbit around Mars. The SHARAD radar covers the frequency band between 15 and 25 MHz. The vertical resolution is about 15 m in free space. The horizontal resolution is 300-1000 m along track and 1500-8000 m across track. The radar penetrates the subsurface of Mars up to 2 km deep, and is capable of detecting multiple reflections in the ice caps of Mars. Considering the scarcity of planetary data relative to terrestrial data, it is essential to combine all available types of data of an area of interest. Up to now SHARAD data has only been interpreted separately as 2D radargrams. The Geological Survey of the Netherlands has decades of experience in interpreting 2D and 3D seismic data of the Dutch subsurface, especially for the 3D interpretation of reservoir characteristics of the deeper subsurface. In this abstract we present a methodology which can be used for 3D interpretation of SHARAD data combined with surface data using state-of-the-art seismic software applied in the oil and gas industry. We selected a region that would be most suitable to demonstrate 3D interpretation. The Titania Lobe of the North Polar ice cap was selected based on the abundance of radar data and the complexity of the ice lobe. SHARAD data is released to the scientific community via the Planetary Data System. It includes 'Reduced Data Records' (RDR) data, a binary format which contains the radargram. First the binary radargram data and corresponding coordinates were combined and converted to the commonly used seismic SEG-Y format. Second, we used the reservoir
Interpretation Of Multifrequency Crosswell Electromagnetic Data With Frequency Dependent Core Data
Kirkendall, B; Roberts, J
2005-06-07
Interpretation of cross-borehole electromagnetic (EM) images acquired at enhanced oil recovery (EOR) sites has proven to be difficult due to the typically complex subsurface geology. Significant problems in image interpretation include correlation of specific electrical conductivity values with oil saturations, the time-dependent electrical variation of the subsurface during EOR, and the non-unique electrical conductivity relationship with subsurface conditions. In this study we perform laboratory electrical properties measurements of core samples from the EOR site to develop an interpretation approach that combines field images and petrophysical results. Cross-borehole EM images from the field indicate resistivity increases in EOR areas--behavior contrary to the intended waterflooding design. Laboratory measurements clearly show a decrease in resistivity with increasing effective pressure, which is attributed to increased grain-to-grain contact enhancing a strong surface conductance. We also observe a resistivity increase for some samples during brine injection. These observations possibly explain the contrary behavior observed in the field images. Possible mechanisms for increasing the resistivity in the region include (1) increased oil content as injectate sweeps oil toward the plane of the observation wells; (2) lower conductance pore fluid displacing the high-conductivity brine; (3) degradation of grain-to-grain contacts of the initially conductive matrix; and (4) artifacts of the complicated resistivity/time history similar to that observed in the laboratory experiments.
Radar Derived Spatial Statistics of Summer Rain. Volume 2; Data Reduction and Analysis
NASA Technical Reports Server (NTRS)
Konrad, T. G.; Kropfli, R. A.
1975-01-01
Data reduction and analysis procedures are discussed along with the physical and statistical descriptors used. The statistical modeling techniques are outlined and examples of the derived statistical characterization of rain cells in terms of the several physical descriptors are presented. Recommendations concerning analyses which can be pursued using the data base collected during the experiment are included.
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 29 Labor 5 2010-07-01 2010-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2014 CFR
2014-07-01
... 29 Labor 5 2014-07-01 2014-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 29 Labor 5 2011-07-01 2011-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2013 CFR
2013-10-01
... 49 Transportation 8 2013-10-01 2013-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2012 CFR
2012-10-01
... 49 Transportation 8 2012-10-01 2012-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
49 CFR Schedule G to Subpart B of... - Selected Statistical Data
Code of Federal Regulations, 2014 CFR
2014-10-01
... 49 Transportation 8 2014-10-01 2014-10-01 false Selected Statistical Data G Schedule G to Subpart... Statistical Data () Greyhound Lines, Inc. () Trailways combined () All study carriers Line No. and Item (a.... (b) Other Statistics: 25Number of regulator route intercity passenger miles Sch. 9002, L. 12, col....
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2012 CFR
2012-07-01
... 29 Labor 5 2012-07-01 2012-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
29 CFR 1904.42 - Requests from the Bureau of Labor Statistics for data.
Code of Federal Regulations, 2013 CFR
2013-07-01
... 29 Labor 5 2013-07-01 2013-07-01 false Requests from the Bureau of Labor Statistics for data. 1904... Statistics for data. (a) Basic requirement. If you receive a Survey of Occupational Injuries and Illnesses Form from the Bureau of Labor Statistics (BLS), or a BLS designee, you must promptly complete the...
Van Ness, Peter H.; Fried, Terri R.; Gill, Thomas M.
2012-01-01
This article’s main objective is to demonstrate that data analysis, including quantitative data analysis, is a process of interpretation involving basic hermeneutic principles that philosophers have identified in the interpretive process as applied to other, mainly literary, creations. Such principles include a version of the hermeneutic circle, an insistence on interpretive presuppositions, and a resistance to reducing the discovery of truth to the application of inductive methods. The importance of interpretation becomes especially evident when qualitative and quantitative methods are combined in a single clinical research project and when the data being analyzed are longitudinal. Study objectives will be accomplished by showing that three major hermeneutic principles make practical methodological contributions to an insightful, illustrative mixed methods analysis of a qualitative study of changes in functional disability over time embedded in the Precipitating Events Project—a major longitudinal, quantitative study of functional disability among older persons. Mixed methods, especially as shaped by hermeneutic insights such as the importance of empathetic understanding, are potentially valuable resources for scientific investigations of the experience of aging: a practical aim of this article is to articulate and demonstrate this contention. PMID:22582035
Chagovets, Vitaliy; Kononikhin, Aleksey; Starodubtseva, Nataliia; Kostyukevich, Yury; Popov, Igor; Frankevich, Vladimir; Nikolaev, Eugene
2016-01-01
The importance of high-resolution mass spectrometry for the correct data interpretation of a direct tissue analysis is demonstrated with an example of its clinical application for an endometriosis study. Multivariate analysis of the data discovers lipid species differentially expressed in different tissues under investigation. High-resolution mass spectrometry allows unambiguous separation of peaks with close masses that correspond to proton and sodium adducts of phosphatidylcholines and to phosphatidylcholines differing in double bond number. PMID:27553733
Comparison of Grammar-Based and Statistical Language Models Trained on the Same Data
NASA Technical Reports Server (NTRS)
Hockey, Beth Ann; Rayner, Manny
2005-01-01
This paper presents a methodologically sound comparison of the performance of grammar-based (GLM) and statistical-based (SLM) recognizer architectures using data from the Clarissa procedure navigator domain. The Regulus open source packages make this possible with a method for constructing a grammar-based language model by training on a corpus. We construct grammar-based and statistical language models from the same corpus for comparison, and find that the grammar-based language models provide better performance in this domain. The best SLM version has a semantic error rate of 9.6%, while the best GLM version has an error rate of 6.0%. Part of this advantage is accounted for by the superior WER and Sentence Error Rate (SER) of the GLM (WER 7.42% versus 6.27%, and SER 12.41% versus 9.79%). The rest is most likely accounted for by the fact that the GLM architecture is able to use logical-form-based features, which permit tighter integration of recognition and semantic interpretation.
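The abstract above compares recognizers by word error rate (WER) without defining it. As a rough illustration only, the sketch below uses the common definition of WER as the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The function name `word_error_rate` and the example sentences are assumptions for illustration and are not taken from the paper or from the Regulus packages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the paper's scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of four reference words.
wer = word_error_rate("open the hatch now", "open hatch now")
```

Under this definition WER can exceed 1.0 when the hypothesis contains many inserted words, which is one reason sentence error rate is often reported alongside it.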
NASA Astrophysics Data System (ADS)
Bearcock, Jenny; Lark, Murray
2016-04-01
Stream water is a key medium for regional geochemical survey. Stream water geochemical data have many potential applications, including mineral exploration, environmental monitoring and protection, catchment management and modelling potential impacts of climate or land use changes. However, stream waters are transient, and measurements are susceptible to various sources of temporal variation. In a regional geochemical survey stream water data comprise "snapshots" of the state of the medium at a sample time. For this reason the British Geological Survey (BGS) has included monitoring streams in its regional geochemical baseline surveys (GBASE) at which daily stream water samples are collected to supplement the spatial data collected in once-off sampling events. In this study we present results from spatio-temporal analysis of spatial stream water surveys and the associated monitoring stream data. We show how the interpretation of the temporal variability as a source of uncertainty depends on how the spatial data are interpreted (as estimates of a summer-time mean concentration, or as point measurements), and explore the implications of this uncertainty in the interpretation of stream water data in a regulatory context.
Interdisciplinary application and interpretation of EREP data within the Susquehanna River Basin
NASA Technical Reports Server (NTRS)
Mcmurtry, G. J.; Petersen, G. W. (Principal Investigator)
1975-01-01
The author has identified the following significant results. It has become apparent that lineaments seen on Skylab and ERTS images are not equally well defined, and that the clarity of definition of a particular lineament is recorded somewhat differently by different interpreters. In an effort to determine the extent of these variations, a semi-quantitative classification scheme was devised. In the field, along the crest of Bald Eagle Mountain in central Pennsylvania, statistical techniques borrowed from sedimentary petrography (point counting) were used to determine the existence and location of intensely fractured float rock. Verification of Skylab and ERTS detected lineaments on aerial photography at different scales indicated that the brecciated zones appear to occur at one margin of the 1 km zone of brecciation defined as a lineament. In the Lock Haven area, comparison of the film types from the SL4 S190A sensor revealed the black and white Pan X photography to be superior in quality for general interpretation to the black and white IR film. Also, the color positive film is better for interpretation than the color IR film.
NASA Astrophysics Data System (ADS)
Watkins, Hannah; Bond, Clare; Butler, Rob
2016-04-01
Geological mapping techniques have advanced significantly in recent years from paper fieldslips to Toughbook, smartphone and tablet mapping; but how do the methods used to create a geological map affect the thought processes that result in the final map interpretation? Geological maps have many key roles in the field of geosciences including understanding geological processes and geometries in 3D, interpreting geological histories and understanding stratigraphic relationships in 2D and 3D. Here we consider the impact of the methods used to create a map on the thought processes that result in the final geological map interpretation. As mapping technology has advanced in recent years, the way in which we produce geological maps has also changed. Traditional geological mapping is undertaken using paper fieldslips, pencils and compass clinometers. The map interpretation evolves through time as data is collected. This interpretive process that results in the final geological map is often supported by recording in a field notebook, observations, ideas and alternative geological models explored with the use of sketches and evolutionary diagrams. In combination the field map and notebook can be used to challenge the map interpretation and consider its uncertainties. These uncertainties and the balance of data to interpretation are often lost in the creation of published 'fair' copy geological maps. The advent of Toughbooks, smartphones and tablets in the production of geological maps has changed the process of map creation. Digital data collection, particularly through the use of inbuilt gyrometers in phones and tablets, has changed smartphones into geological mapping tools that can be used to collect lots of geological data quickly. With GPS functionality this data is also geospatially located, assuming good GPS connectivity, and can be linked to georeferenced infield photography. In contrast line drawing, for example for lithological boundary interpretation and sketching
Heat-Passing Framework for Robust Interpretation of Data in Networks
Fang, Yi; Sun, Mengtian; Ramani, Karthik
2015-01-01
Researchers are regularly interested in interpreting the multipartite structure of data entities according to their functional relationships. Data is often heterogeneous with intricately hidden inner structure. With limited prior knowledge, researchers are likely to confront the problem of transforming this data into knowledge. We develop a new framework, called heat-passing, which exploits intrinsic similarity relationships within noisy and incomplete raw data, and constructs a meaningful map of the data. The proposed framework is able to rank, cluster, and visualize the data all at once. The novelty of this framework is derived from an analogy between the process of data interpretation and that of heat transfer, in which all data points contribute simultaneously and globally to reveal intrinsic similarities between regions of data, meaningful coordinates for embedding the data, and exemplar data points that lie at optimal positions for heat transfer. We demonstrate the effectiveness of the heat-passing framework for robustly partitioning the complex networks, analyzing the globin family of proteins and determining conformational states of macromolecules in the presence of high levels of noise. The results indicate that the methodology is able to reveal functionally consistent relationships in a robust fashion with no reference to prior knowledge. The heat-passing framework is very general and has the potential for applications to a broad range of research fields, for example, biological networks, social networks and semantic analysis of documents. PMID:25668316
Fitzgerald, Peter; Laughter, Mark D; Martyn, Rose; Richardson, Dave; Rowe, Nathan C; Pickett, Chris A; Younkin, James R; Shephard, Adam M
2010-01-01
Accountability scale data from the Global Nuclear Fuels (GNF) fuel fabrication facility in Wilmington, NC has been collected and analyzed as a part of the Cylinder Accountability and Tracking System (CATS) field trial in 2009. The purpose of the data collection was to demonstrate an authentication method for safeguards applications, and the use of load cell data in cylinder accountability. The scale data was acquired using a commercial off-the-shelf communication server with authentication and encryption capabilities. The authenticated weight data was then analyzed to determine facility operating activities. The data allowed for the determination of the number of full and empty cylinders weighed and the respective weights along with other operational activities. Data authentication concepts, practices and methods, the details of the GNF weight data authentication implementation and scale data interpretation results will be presented.
Effects of a Prior Virtual Experience on Students' Interpretations of Real Data
NASA Astrophysics Data System (ADS)
Chini, Jacquelyn J.; Carmichael, Adrian; Gire, Elizabeth; Rebello, N. Sanjay; Puntambekar, Sadhana
2010-10-01
Our previous work has shown that experimentation with virtual manipulatives supports students' conceptual learning about simple machines differently than experimentation with physical manipulatives [1]. This difference could be due to the "messiness" of physical data from factors such as dissipative effects and measurement uncertainty. In this study, we ask whether the prior experience of performing a virtual experiment affects how students interpret the data from a physical experiment. Students enrolled in a conceptual-based physics laboratory used a hypertext system to explore the science concepts related to simple machines and performed physical and virtual experiments to learn about pulleys and inclined planes. Approximately half of the students performed the physical experiments before the virtual experiments and the other half completed the virtual experiments first. We find that using virtual manipulatives before physical manipulatives may promote an interpretation of physical data that is more productive for conceptual learning.
ERIC Educational Resources Information Center
Carter, Jackie; Noble, Susan; Russell, Andrew; Swanson, Eric
2011-01-01
Increasing volumes of statistical data are being made available on the open web, including from the World Bank. This "data deluge" provides both opportunities and challenges. Good use of these data requires statistical literacy. This paper presents results from a project that set out to better understand how socioeconomic secondary data are being…
Graphical arterial blood gas visualization tool supports rapid and accurate data interpretation.
Doig, Alexa K; Albert, Robert W; Syroid, Noah D; Moon, Shaun; Agutter, Jim A
2011-04-01
A visualization tool that integrates numeric information from an arterial blood gas report with novel graphics was designed for the purpose of promoting rapid and accurate interpretation of acid-base data. A study compared data interpretation performance when arterial blood gas results were presented in a traditional numerical list versus the graphical visualization tool. Critical-care nurses (n = 15) and nursing students (n = 15) were significantly more accurate identifying acid-base states and assessing trends in acid-base data when using the graphical visualization tool. Critical-care nurses and nursing students using traditional numerical data had an average accuracy of 69% and 74%, respectively. Using the visualization tool, average accuracy improved to 83% for critical-care nurses and 93% for nursing students. Analysis of response times demonstrated that the visualization tool might help nurses overcome the "speed/accuracy trade-off" during high-stress situations when rapid decisions must be rendered. Perceived mental workload was significantly reduced for nursing students when they used the graphical visualization tool. In this study, the effects of implementing the graphical visualization were greater for nursing students than for critical-care nurses, which may indicate that the experienced nurses needed more training and use of the new technology prior to testing to show similar gains. Results of the objective and subjective evaluations support the integration of this graphical visualization tool into clinical environments that require accurate and timely interpretation of arterial blood gas data.
Contributions to Statistical Problems Related to Microarray Data
ERIC Educational Resources Information Center
Hong, Feng
2009-01-01
Microarray is a high-throughput technology for measuring gene expression. Analysis of microarray data brings many interesting and challenging problems. This thesis consists of three studies related to microarray data. First, we propose a Bayesian model for microarray data and use Bayes Factors to identify differentially expressed genes. Second, we…
A Novel Approach to Asynchronous MVP Data Interpretation Based on Elliptical-Vectors
NASA Astrophysics Data System (ADS)
Kruglyakov, M.; Trofimov, I.; Korotaev, S.; Shneyer, V.; Popova, I.; Orekhova, D.; Scshors, Y.; Zhdanov, M. S.
2014-12-01
We suggest a novel approach to asynchronous magnetic-variation profiling (MVP) data interpretation. The standard method in MVP is based on the interpretation of the coefficients of the linear relation between the vertical and horizontal components of the measured magnetic field. From a mathematical point of view this pair of linear coefficients is not a vector, which leads to significant difficulties in asynchronous data interpretation. Our approach allows us to actually treat such a pair of complex numbers as a special vector called an ellipse-vector (EV). By choosing particular definitions of complex length and direction, the basic relation of MVP can be considered as a dot product. This considerably simplifies the interpretation of asynchronous data. The EV is described by four real numbers: the values of the major and minor semiaxes, the angular direction of the major semiaxis, and the phase. The notation choice is motivated by historical reasons. It is important that the different EV components have different sensitivity to the field sources and the local heterogeneities. Namely, the value of the major semiaxis and the angular direction are mostly determined by the field source and the normal cross-section. On the other hand, the value of the minor semiaxis and the phase are responsive to local heterogeneities. Since the EV is the general form of a complex vector, the traditional Schmucker vectors can be explicitly expressed through its components. The proposed approach was successfully applied to the interpretation of the results of asynchronous measurements that had been obtained in the Arctic Ocean at the drift stations "North Pole" in 1962-1976.
Simulations of Statistical Model Fits to RHIC Data
NASA Astrophysics Data System (ADS)
Llope, W. J.
2013-04-01
The application of statistical model fits to experimentally measured particle multiplicity ratios allows inferences of the average values of temperatures, T, baryochemical potentials, μB, and other quantities at chemical freeze-out. The location of the boundary between the hadronic and partonic regions in the (μB,T) phase diagram, and the possible existence of a critical point, remain largely speculative. The search for a critical point using the moments of the particle multiplicity distributions in tightly centrality-constrained event samples makes the tacit assumption that the variances in the (μB,T) values in these samples are sufficiently small to tightly localize the events in the phase diagram. This and other aspects were explored in simulations by coupling the UrQMD transport model to the statistical model code Thermus. The phase diagram trajectories of individual events versus the time in fm/c were calculated as a function of the centrality and beam energy. The variances of the (μB,T) values at freeze-out, even in narrow centrality bins, are seen to be relatively large. This suggests that a new way to constrain the events on the phase diagram may lead to more sensitive searches for the possible critical point.
Spatial Statistical Data Fusion for Remote Sensing Applications
NASA Technical Reports Server (NTRS)
Nguyen, Hai
2010-01-01
Data fusion is the process of combining information from heterogeneous sources into a single composite picture of the relevant process, such that the composite picture is generally more accurate and complete than that derived from any single source alone. Data collection is often incomplete, sparse, and yields incompatible information. Fusion techniques can make optimal use of such data. When investment in data collection is high, fusion gives the best return. Our study uses data from two satellites: (1) Multiangle Imaging SpectroRadiometer (MISR), (2) Moderate Resolution Imaging Spectroradiometer (MODIS).
Keenan, Michael R; Smentkowski, Vincent S; Ulfig, Robert M; Oltman, Edward; Larson, David J; Kelly, Thomas F
2011-06-01
We demonstrate for the first time that multivariate statistical analysis techniques can be applied to atom probe tomography data to estimate the chemical composition of a sample at the full spatial resolution of the atom probe in three dimensions. Whereas the raw atom probe data provide the specific identity of an atom at a precise location, the multivariate results can be interpreted in terms of the probabilities that an atom representing a particular chemical phase is situated there. When aggregated to the size scale of a single atom (∼0.2 nm), atom probe spectral-image datasets are huge and extremely sparse. In fact, the average spectrum will have somewhat less than one total count per spectrum due to imperfect detection efficiency. These conditions, under which the variance in the data is completely dominated by counting noise, test the limits of multivariate analysis, and an extensive discussion of how to extract the chemical information is presented. Efficient numerical approaches to performing principal component analysis (PCA) on these datasets, which may number hundreds of millions of individual spectra, are put forward, and it is shown that PCA can be computed in a few seconds on a typical laptop computer.
Waldman, Irwin D; Lilienfeld, Scott O
2016-03-01
We comment on Sijtsma's (2014) thought-provoking essay on how to minimize questionable research practices (QRPs) in psychology. We agree with Sijtsma that proactive measures to decrease the risk of QRPs will ultimately be more productive than efforts to target individual researchers and their work. In particular, we concur that encouraging researchers to make their data and research materials public is the best institutional antidote against QRPs, although we are concerned that Sijtsma's proposal to delegate more responsibility to statistical and methodological consultants could inadvertently reinforce the dichotomy between the substantive and statistical aspects of research. We also discuss sources of false-positive findings and replication failures in psychological research, and outline potential remedies for these problems. We conclude that replicability is the best metric of the minimization of QRPs and their adverse effects on psychological research.
NASA Astrophysics Data System (ADS)
Helbert, J.; D'Amore, M.; Maturilli, A.; Izenberg, N. R.; Klima, R. L.; Holsclaw, G. M.; McClintock, W. E.; Sprague, A. L.; Vilas, F.; Domingue, D. L.; D'Incecco, P.; Head, J. W.; Gillis-Davis, J. J.; Solomon, S. C.
2011-12-01
We assess compositional heterogeneity on the surface of Mercury with data from MESSENGER's Mercury Atmospheric and Surface Composition Spectrometer (MASCS). The data were obtained during the spacecraft's early orbital phase and cover nearly a pole-to-pole portion of the planet from about 0° to -45°E longitude. Under the hypothesis that surface compositional information can be efficiently derived from spectral reflectance measurements with the use of statistical techniques, we have employed principal component and clustering analyses to identify and characterize spectral units from observations by MASCS. This method proved successful with the interpretation of MASCS data obtained during MESSENGER's Mercury flybys despite the absence of a photometric correction. The statistical technique allows the extraction of underlying relationships among compositional units. We were able to cluster surface observations into distinct classes that correspond well to geomorphological units identified from MESSENGER images, such as plains and heavily cratered terrain. We also identified areas where the geometry of spectral observations matches that of the biconical reflectance attachment used at the DLR Planetary Emissivity Laboratory (PEL). For comparison with spectra from these areas we obtained spectra for a wide range of candidate minerals. The minerals were thermally processed in the Mercury simulation chamber at PEL, by heating them to Mercury peak temperatures under vacuum conditions. This procedure is a first step in the development at PEL of a capability to measure the near-infrared spectra directly at Mercury temperatures. Although such thermal processing cannot capture all spectral changes induced by the high temperatures, it allows an assessment of some of the effects of exposure to Mercury's harsh environment, providing a more realistic comparison than with standard unheated terrestrial minerals. We are therefore able to make inferences on possible mineralogical
Czaplewski, Raymond L.
2015-01-01
Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of study variables and auxiliary sensor variables. A National Forest Inventory (NFI) illustrates application within an official statistics program. Practical recommendations regarding remote sensing and statistical issues are offered. This algorithm has the potential to increase the value of synoptic sensor data for statistical monitoring of large geographic areas. PMID:26393588
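The abstract above names the Kalman filter (KF) as the estimator but does not show its form. The sketch below is a textbook scalar measurement update, included only to illustrate how a KF weights a prior estimate against a new observation by their variances. It is not the numerically robust multivariate algorithm the author develops, and the function name `kalman_update` and the example numbers are assumptions for illustration.

```python
def kalman_update(x_prior: float, p_prior: float, z: float, r: float):
    """Textbook scalar Kalman measurement update (illustrative only).

    x_prior, p_prior: prior estimate and its variance
    z, r:             new observation and its error variance
    Returns the posterior estimate and its variance.
    """
    gain = p_prior / (p_prior + r)          # weight given to the observation
    x_post = x_prior + gain * (z - x_prior)  # move the estimate toward z
    p_post = (1.0 - gain) * p_prior          # variance shrinks after the update
    return x_post, p_post


# Hypothetical numbers: a field-plot estimate of 10.0 (variance 4.0)
# combined with a sensor-based estimate of 14.0 (variance 4.0).
estimate, variance = kalman_update(10.0, 4.0, 14.0, 4.0)
```

With equal variances the gain is 0.5, so the posterior sits halfway between the two inputs and its variance is halved. A less precise observation (larger `r`) would pull the estimate less.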
Czaplewski, Raymond L
2015-09-17
Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of study variables and auxiliary sensor variables. A National Forest Inventory (NFI) illustrates application within an official statistics program. Practical recommendations regarding remote sensing and statistical issues are offered. This algorithm has the potential to increase the value of synoptic sensor data for statistical monitoring of large geographic areas.
Presentation and interpretation of food intake data: factors affecting comparability across studies.
Faber, Mieke; Wenhold, Friede A M; Macintyre, Una E; Wentzel-Viljoen, Edelweiss; Steyn, Nelia P; Oldewage-Theron, Wilna H
2013-01-01
Non-uniform, unclear, or incomplete presentation of food intake data limits interpretation, usefulness, and comparisons across studies. In this contribution, we discuss factors affecting uniform reporting of food intake across studies. The amount of food eaten can be reported as mean portion size, number of servings or total amount of food consumed per day; the absolute intake value for the specific study depends on the denominator used because food intake data can be presented as per capita intake or for consumers only. To identify the foods mostly consumed, foods are reported and ranked according to total number of times consumed, number of consumers, total intake, or nutrient contribution by individual foods or food groups. Presentation of food intake data primarily depends on a study's aim; reported data thus often are not comparable across studies. Food intake data further depend on the dietary assessment methodology used and foods in the database consulted; and are influenced by the inherent limitations of all dietary assessments. Intake data can be presented as either single foods or as clearly defined food groups. Mixed dishes, reported as such or in terms of ingredients and items added during food preparation remain challenging. Comparable presentation of food consumption data is not always possible; presenting sufficient information will assist valid interpretation and optimal use of the presented data. A checklist was developed to strengthen the reporting of food intake data in science communication.
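The point above about denominators can be made concrete with a small calculation. The sketch below computes the mean intake of one food both per capita (all participants) and for consumers only. The function name `mean_intakes` and the intake values are hypothetical and are not drawn from the paper or its checklist.

```python
def mean_intakes(intakes_g):
    """Return (per_capita_mean, consumers_only_mean) for one food.

    intakes_g: daily intake in grams for every participant, with 0 for
    participants who did not eat the food. Hypothetical helper.
    """
    if not intakes_g:
        raise ValueError("need at least one participant")
    total = sum(intakes_g)
    consumers = [g for g in intakes_g if g > 0]
    per_capita = total / len(intakes_g)
    # With no consumers the consumers-only mean is undefined.
    consumers_only = total / len(consumers) if consumers else None
    return per_capita, consumers_only


# Four participants, two of whom ate the food.
per_capita, consumers_only = mean_intakes([0, 0, 100, 200])
```

Here the same data give 75 g per capita but 150 g for consumers only, so two studies reporting "mean intake" with different denominators would differ by a factor of two without any real difference in diet.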
Multivariate statistical analysis as a tool for the segmentation of 3D spectral data.
Lucas, G; Burdet, P; Cantoni, M; Hébert, C
2013-01-01
Acquisition of three-dimensional (3D) spectral data is nowadays common using many different microanalytical techniques. In order to proceed to the 3D reconstruction, data processing is necessary not only to deal with noisy acquisitions but also to segment the data in terms of chemical composition. In this article, we demonstrate the value of multivariate statistical analysis (MSA) methods for this purpose, allowing fast and reliable results. Using scanning electron microscopy (SEM) and energy-dispersive X-ray spectroscopy (EDX) coupled with a focused ion beam (FIB), a stack of spectrum images has been acquired on a sample produced by laser welding of a nickel-titanium wire and a stainless steel wire presenting a complex microstructure. These data have been analyzed using principal component analysis (PCA) and factor rotations. PCA significantly improves the overall quality of the data, but produces abstract components. Here it is shown that rotated components can be used without prior knowledge of the sample to help the interpretation of the data, quickly yielding qualitative mappings representative of elements or compounds found in the material. Such abundance maps can then be used to plot scatter diagrams and interactively identify the different domains present by defining clusters of voxels having similar compositions. Identified voxels are advantageously overlaid on secondary electron (SE) images with higher resolution in order to refine the segmentation. The 3D reconstruction can then be performed using available commercial software on the basis of the provided segmentation. To assess the quality of the segmentation, the results have been compared to an EDX quantification performed on the same data. PMID:24035679
Pomeau, Yves; Louët, Sabine
2016-06-01
During the StatPhys Conference on 20th July 2016 in Lyon, France, Yves Pomeau and Daan Frenkel will be awarded the most important prize in the field of Statistical Mechanics: the 2016 Boltzmann Medal, named after the Austrian physicist and philosopher Ludwig Boltzmann. The award recognises Pomeau's key contributions to the Statistical Physics of non-equilibrium phenomena in general and, in particular, his role in developing our modern understanding of fluid mechanics, instabilities, pattern formation and chaos. He is recognised as an outstanding theorist bridging disciplines from applied mathematics to statistical physics with a profound impact on the neighbouring fields of turbulence and mechanics. In the article Sabine Louët interviews Pomeau, who is an Editor for the European Physical Journal Special Topics. He shares his views and tells how he experienced the rise of Statistical Mechanics in the past few decades. He also touches upon the need to provide funding to people who have the rare ability to discover new things and ideas, and not just those who are good at filling in grant application forms. PMID:27349556
NASA Astrophysics Data System (ADS)
Ivanova, A.; Lueth, S.
2015-12-01
Petrophysical investigations for CCS concern relationships between the physical properties of rocks and geophysical observations, with the aim of understanding the behavior of injected CO2 in a geological formation. In turn, 4D seismic surveying is a proven tool for CO2 monitoring. At the Ketzin pilot site (Germany), 4D seismic data have been acquired by means of a baseline (pre-injection) survey in 2005 and monitor surveys in 2009 and 2012. At Ketzin, CO2 was injected in a supercritical state from 2008 to 2013 into a sandstone saline aquifer (Stuttgart Formation) at a depth of about 650 m. The 4D seismic data from Ketzin reflected a pronounced effect of this injection. Seismic forward modeling using results of petrophysical experiments on two core samples from the target reservoir confirmed that effects of the injected CO2 on the 4D seismic data are significant. The petrophysical data were used in that modeling to represent the changes in the acoustic parameters of the reservoir caused by the CO2 injection. These petrophysical data were further used for a successful quantitative interpretation of the 4D seismic data at Ketzin. Logs are now available from a well (drilled in 2012) that penetrates the reservoir; they contain information about changes in the acoustic parameters of the reservoir due to the CO2 injection. These logs were used to estimate the impact of the petrophysical data on the qualitative and quantitative interpretation of the 4D seismic data at Ketzin. New synthetic seismograms were computed using the same software and the same wavelet as the old ones. The only difference is that the changes in the input acoustic parameters no longer depend on any petrophysical experiments; instead, these changes were taken directly from the logs. As a result, the modelled changes due to the injection in the newly computed seismograms no longer include any effects of the petrophysical data. Key steps of the quantitative and qualitative interpretation of the 4D seismic
NASA Astrophysics Data System (ADS)
Yuan, Y.
2015-12-01
Boundary identification is a common task in the interpretation of potential-field data, which are widely used as a tool in exploration for mineral resources. The main geological edges are fault lines and the borders of geological or rock bodies that differ in density, magnetic properties, and so on. Gravity gradient tensor data are widely used in geophysical exploration because they carry a large amount of information and contain higher-frequency signals than gravity data, which can be used to delineate small-scale anomalies. Combining multiple components of the gradient tensor in interpretation is, however, a challenge, and it requires new edge detectors designed for gravity gradient tensor data. To make use of the information in multiple components, we first define directional total horizontal derivatives and enhanced directional total horizontal derivatives and use them to define new edge detectors. To display the edges of anomalies of different amplitudes simultaneously, we present a normalization method. These methods have been tested on synthetic data to verify that they delineate the edges of anomalies of different amplitudes clearly and avoid introducing additional false edges when the data contain both positive and negative anomalies. Finally, we apply these methods to real full gravity gradient tensor data from St. Georges Bay, Canada, where they give good results.
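As a rough illustration of the kind of edge detector this abstract builds on, the sketch below computes the conventional total horizontal derivative, sqrt((df/dx)^2 + (df/dy)^2), of a single gridded field. It is not the directional or enhanced detector the authors define, and it omits their normalization step; the function name, the grid spacing arguments, and the synthetic step anomaly are assumptions made for illustration only.

```python
import numpy as np

def total_horizontal_derivative(field, dx=1.0, dy=1.0):
    """Conventional THD on a regular grid (rows = y, columns = x).

    Hypothetical helper for illustration; not the detector from the abstract.
    """
    # np.gradient returns one derivative array per axis: axis 0 (y), then axis 1 (x).
    d_dy, d_dx = np.gradient(field, dy, dx)
    return np.hypot(d_dx, d_dy)

# Synthetic field: a vertical contact between two bodies at column 25.
field = np.zeros((50, 50))
field[:, 25:] = 1.0

thd = total_horizontal_derivative(field)

# Columns where the detector is non-zero along one profile (row 10).
edge_columns = np.nonzero(thd[10])[0]
print(edge_columns)  # prints [24 25]
```

The detector peaks on the two grid columns that straddle the contact and is zero elsewhere, which is the behavior an edge detector is expected to show on a clean step.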
Morse, V.C.; Johnson, J.H.; Crittenden, J.L.; Anderson, T.D.
1986-05-01
There are successes and failures in recording and interpreting a single seismic line across the South Owl Creek Mountain fault on the west flank of the Casper arch. Information obtained from this type of work should help explorationists who are exploring structurally complex areas. A depth cross section lacks a subthrust prospect, but is illustrated to show that the South Owl Creek Mountain fault is steeper with less apparent displacement than in areas to the north. This cross section is derived from two-dimensional seismic modeling, using data processing methods specifically for modeling. A flat horizon and balancing technique helps confirm model accuracy. High-quality data were acquired using specifically designed seismic field parameters. The authors concluded that the methodology used is valid, and an interactive modeling program in addition to cross-line control can improve seismic interpretations in structurally complex areas.
Analysis and interpretation of microplate-based oxygen consumption and pH data.
Divakaruni, Ajit S; Paradyse, Alexander; Ferrick, David A; Murphy, Anne N; Jastroch, Martin
2014-01-01
Breakthrough technologies to measure cellular oxygen consumption and proton efflux are reigniting the study of cellular energetics by increasing the scope and pace with which discoveries are made. As we learn the variation in metabolism between cell types is large, it is helpful to continually provide additional perspectives and update our roadmap for data interpretation. In that spirit, this chapter provides the following for those conducting microplate-based oxygen consumption experiments: (i) a description of the standard parameters for measuring respiration in intact cells, (ii) a framework for data analysis and normalization, and (iii) examples of measuring respiration in permeabilized cells to follow up results observed with intact cells. Additionally, rate-based measurements of extracellular pH are increasingly used as a qualitative indicator of glycolytic flux. As a resource to help interpret these measurements, this chapter also provides a detailed accounting of proton production during glucose oxidation in the context of plate-based assays. PMID:25416364
Air pollutant interactions with vegetation: research needs in data acquisition and interpretation
Lindberg, S. E.; McLaughlin, S. B.
1980-01-01
The objective of this discussion is to consider problems involved in the acquisition, interpretation, and application of data collected in studies of air pollutant interactions with the terrestrial environment. Emphasis will be placed on a critical evaluation of current deficiencies and future research needs by addressing the following questions: (1) which pollutants are sufficiently toxic, pervasive, or persistent to warrant the expense of monitoring and effects research; (2) what are the interactions of multiple pollutants during deposition and how do these influence toxicity; (3) how do we collect, report, and interpret deposition and air quality data to ensure their maximum utility in assessment of potential regional environmental effects; (4) what processes do we study, and how are they measured to most efficiently describe the relationship between air quality dose and ultimate impacts on terrestrial ecosystems; and (5) how do we integrate site-specific studies into regional estimates of present and potential environmental degradation (or benefit).
Interpretation of Lidar and Satellite Data Sets Using a Global Photochemical Model
NASA Technical Reports Server (NTRS)
Zenker, Thomas; Chyba, Thomas
1999-01-01
A primary goal of the NASA Tropospheric Chemistry Program (TCP) is to "contribute substantially to scientific understanding of human impacts on the global troposphere". In order to analyze global or regional trends and factors of tropospheric chemistry, for example its oxidation capacity or composition, continuous global or regional data coverage as well as model simulations are needed. The Global Tropospheric Experiment (GTE), a major component of the TCP, provides data vital to these questions via aircraft measurements of key trace chemical species in various remote regions of the world. Another component of NASA's effort is satellite projects for the exploration of tropospheric chemistry and dynamics. A unique data product is the Tropospheric Ozone Residual (TOR), which utilizes global tropospheric ozone data. Another key research tool is simulation studies of atmospheric chemistry and dynamics, used for the theoretical understanding of the atmosphere, the extrapolation of observed trends, and sensitivity studies assessing a changing anthropogenic impact on air chemistry and climate. In the context of model simulations, field data derived from satellites or (airborne) field missions are needed for two purposes: (1) to initialize and validate model simulations, and (2) to interpret field data by comparison with model simulation results in order to analyze global or regional trends and deviations from standard tropospheric chemistry and transport conditions as defined by the simulations. Currently, neither sufficient global data coverage nor well-established global circulation models are available. The NASA LARC CTM model is not yet in a state to accomplish a sufficient tropospheric chemistry simulation, so the current research under this cooperative agreement focuses on utilizing field data products for direct interpretation. They will also be available for model testing and for later interpretation with the model that is finally adopted.
Bias and Sensitivity in the Placement of Fossil Taxa Resulting from Interpretations of Missing Data
Sansom, Robert S.
2015-01-01
The utility of fossils in evolutionary contexts is dependent on their accurate placement in phylogenetic frameworks, yet intrinsic and widespread missing data make this problematic. The complex taphonomic processes occurring during fossilization can make it difficult to distinguish absence from non-preservation, especially in the case of exceptionally preserved soft-tissue fossils: is a particular morphological character (e.g., appendage, tentacle, or nerve) missing from a fossil because it was never there (phylogenetic absence), or just happened to not be preserved (taphonomic loss)? Missing data have not been tested in the context of interpretation of non-present anatomy nor in the context of directional shifts and biases in affinity. Here, complete taxa, both simulated and empirical, are subjected to data loss through the replacement of present entries (1s) with either missing (?s) or absent (0s) entries. Both cause taxa to drift down trees, from their original position, toward the root. Absolute thresholds at which downshift is significant are extremely low for introduced absences (two entries replaced, 6% of present characters). The opposite threshold in empirical fossil taxa is also found to be low; two absent entries replaced with presences causes fossil taxa to drift up trees. As such, only a few instances of non-preserved characters interpreted as absences will cause fossil organisms to be erroneously interpreted as more primitive than they were in life. This observed sensitivity to coding non-present morphology presents a problem for all evolutionary studies that attempt to use fossils to reconstruct rates of evolution or unlock sequences of morphological change. Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist. Absent characters therefore require explicit justification and taphonomic
Interpretation and display of the NURE data base using computer graphics
Koller, G R
1980-01-01
Computer graphics not only is an integral part of data reduction and interpretation, it is also a fundamental aid in the planning and forecasting of the National Uranium Resource Evaluation program at Savannah River Laboratory. Computer graphics not only allows more rapid execution of tasks which could be performed manually, but also presents scientists with new capabilities which would be exceedingly impractical to apply were it not for the application of computer graphics to a problem.
Bias and sensitivity in the placement of fossil taxa resulting from interpretations of missing data.
Sansom, Robert S
2015-03-01
The utility of fossils in evolutionary contexts is dependent on their accurate placement in phylogenetic frameworks, yet intrinsic and widespread missing data make this problematic. The complex taphonomic processes occurring during fossilization can make it difficult to distinguish absence from non-preservation, especially in the case of exceptionally preserved soft-tissue fossils: is a particular morphological character (e.g., appendage, tentacle, or nerve) missing from a fossil because it was never there (phylogenetic absence), or just happened to not be preserved (taphonomic loss)? Missing data have not been tested in the context of interpretation of non-present anatomy nor in the context of directional shifts and biases in affinity. Here, complete taxa, both simulated and empirical, are subjected to data loss through the replacement of present entries (1s) with either missing (?s) or absent (0s) entries. Both cause taxa to drift down trees, from their original position, toward the root. Absolute thresholds at which downshift is significant are extremely low for introduced absences (two entries replaced, 6% of present characters). The opposite threshold in empirical fossil taxa is also found to be low; two absent entries replaced with presences causes fossil taxa to drift up trees. As such, only a few instances of non-preserved characters interpreted as absences will cause fossil organisms to be erroneously interpreted as more primitive than they were in life. This observed sensitivity to coding non-present morphology presents a problem for all evolutionary studies that attempt to use fossils to reconstruct rates of evolution or unlock sequences of morphological change. Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist. Absent characters therefore require explicit justification and taphonomic
Tang, Qi-Yi; Zhang, Chuan-Xi
2013-04-01
A comprehensive but simple-to-use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. This program runs on standard Windows computers. Many of the functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology.
Linked Micromaps: Statistical Summaries of Aquatic Monitoring Data in a Spatial Context
Communicating summaries of spatial data to decision makers and the public is challenging. Linked micromaps provide a way to simultaneously present geographic context and statistical summaries of data. Monitoring data collected over areal units, such as watersheds or ecoregions,...
The Health of Children--1970: Selected Data From the National Center for Health Statistics.
ERIC Educational Resources Information Center
National Center for Health Statistics (DHEW/PHS), Hyattsville, MD.
In this booklet, charts and graphs present data from four divisions of the National Center for Health Statistics. The divisions represented are those concerned with vital statistics (births, deaths, fetal deaths, marriages and divorces); health interview statistics (information on health and demographic factors related to illness); health…
Statistical examination of climatological data relevant to global temperature variation
Gray, H.L.; Gunst, R.F.; Woodward, W.A.
1992-01-01
The research group at Southern Methodist University has been involved in the examination of climatological data as specified in the proposal. Our efforts have resulted in three papers which have been submitted to scholarly journals, as well as several other projects which should be completed either during the next six months or next year. In the following, we discuss our results to date along with projected progress within the next six months. Major topics discussed in this progress report include: (1) testing for trend in the global temperature data; (2) defining and estimating mean global temperature change; and (3) the effect of initial conditions on autoregressive models for global temperature data.
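To make topic (3) concrete, the following minimal sketch shows how the starting value of a first-order autoregressive series can by itself produce an apparent linear trend. It is not the authors' analysis: the AR(1) form, the coefficient 0.95, the series length, the starting values, and the helper names are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_ar1(phi, noise, x0=0.0):
    """Return x[t] = phi * x[t-1] + noise[t], started from x0 (x0 is not returned)."""
    x = np.empty(len(noise))
    prev = x0
    for t, e in enumerate(noise):
        prev = phi * prev + e
        x[t] = prev
    return x

def ols_slope(y):
    """Least-squares slope of y against the time index 0..n-1."""
    t = np.arange(len(y))
    return np.polyfit(t, y, 1)[0]

# Same noise, same model, no deterministic trend; only the initial condition differs.
rng = np.random.default_rng(0)
noise = rng.normal(size=150)
started_at_mean = simulate_ar1(0.95, noise, x0=0.0)
started_below_mean = simulate_ar1(0.95, noise, x0=-5.0)

print("slope, start at mean:   ", ols_slope(started_at_mean))
print("slope, start below mean:", ols_slope(started_below_mean))
```

Because the two series differ only by the decaying term x0 * phi^(t+1), the series started below its long-run mean always has the larger fitted slope, even though neither contains a real trend. A trend test that ignores this effect can therefore mistake a slow return to the mean for warming.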