Sample records for statistical analysis approach

  1. How to interpret the results of medical time series data analysis: Classical statistical approaches versus dynamic Bayesian network modeling.

    PubMed

    Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall

    2016-01-01

    Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to the analysis of medical time series data: (1) the classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis also discusses several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach (1) is much more flexible in terms of modeling effort and (2) offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
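
    As a concrete point of reference for the classical side of this comparison, the product-limit (Kaplan-Meier) estimate can be computed in a few lines of NumPy. The follow-up data below are invented purely for illustration and are unrelated to the screening cohort described above.

      import numpy as np

      def kaplan_meier(times, events):
          """Kaplan-Meier survival estimate.

          times  : follow-up times
          events : 1 if the event occurred at that time, 0 if censored
          """
          order = np.argsort(times)
          times, events = np.asarray(times)[order], np.asarray(events)[order]
          unique_times = np.unique(times[events == 1])
          at_risk = np.array([(times >= t).sum() for t in unique_times])
          died = np.array([((times == t) & (events == 1)).sum() for t in unique_times])
          surv = np.cumprod(1.0 - died / at_risk)   # product-limit estimate
          return unique_times, surv

      # Hypothetical follow-up times (months) with censoring indicators
      t = [6, 7, 10, 15, 19, 25, 30, 34, 40, 46]
      e = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
      for ti, si in zip(*kaplan_meier(t, e)):
          print(f"t = {ti:>3}  S(t) = {si:.3f}")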

  2. On Conceptual Analysis as the Primary Qualitative Approach to Statistics Education Research in Psychology

    ERIC Educational Resources Information Center

    Petocz, Agnes; Newbery, Glenn

    2010-01-01

    Statistics education in psychology often falls disappointingly short of its goals. The increasing use of qualitative approaches in statistics education research has extended and enriched our understanding of statistical cognition processes, and thus facilitated improvements in statistical education and practices. Yet conceptual analysis, a…

  3. Propensity Score Analysis: An Alternative Statistical Approach for HRD Researchers

    ERIC Educational Resources Information Center

    Keiffer, Greggory L.; Lane, Forrest C.

    2016-01-01

    Purpose: This paper aims to introduce matching in propensity score analysis (PSA) as an alternative statistical approach for researchers looking to make causal inferences using intact groups. Design/methodology/approach: An illustrative example demonstrated the varying results of analysis of variance, analysis of covariance and PSA on a heuristic…
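
    For readers new to PSA, the matching variant can be sketched in two steps: fit a logistic model for treatment assignment, then pair each treated unit with the control whose propensity score is closest. The Python sketch below uses simulated covariates and greedy 1:1 matching; it illustrates the general technique, not the authors' procedure.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      n = 200
      x = rng.normal(size=(n, 2))                           # observed covariates
      treat = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))   # selection depends on x

      # Step 1: propensity scores from a logistic model
      ps = LogisticRegression().fit(x, treat).predict_proba(x)[:, 1]

      # Step 2: greedy 1:1 nearest-neighbour matching on the propensity score
      treated = np.where(treat == 1)[0]
      controls = np.where(treat == 0)[0]
      pairs, available = [], set(controls)
      for i in treated:
          if not available:
              break
          j = min(available, key=lambda c: abs(ps[c] - ps[i]))
          pairs.append((i, j))
          available.remove(j)

      print(f"matched {len(pairs)} treated/control pairs")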

  4. Meta-Analysis for Primary and Secondary Data Analysis: The Super-Experiment Metaphor.

    ERIC Educational Resources Information Center

    Jackson, Sally

    1991-01-01

    Considers the relation between meta-analysis statistics and analysis of variance statistics. Discusses advantages and disadvantages as a primary data analysis tool. Argues that the two approaches are partial paraphrases of one another. Advocates an integrative approach that introduces the best of meta-analytic thinking into primary analysis…

  5. Active Structural Acoustic Control as an Approach to Acoustic Optimization of Lightweight Structures

    DTIC Science & Technology

    2001-06-01

    …an appropriate approach based on Statistical Energy Analysis (SEA) would facilitate investigations of the structural behavior at a high modal density… For higher-frequency investigations, an approach based on Statistical Energy Analysis (SEA) is recommended to describe the structural dynamic behavior…

  6. Compositional Solution Space Quantification for Probabilistic Software Analysis

    NASA Technical Reports Server (NTRS)

    Borges, Mateus; Pasareanu, Corina S.; Filieri, Antonio; d'Amorim, Marcelo; Visser, Willem

    2014-01-01

    Probabilistic software analysis aims at quantifying how likely a target event is to occur during program execution. Current approaches rely on symbolic execution to identify the conditions to reach the target event and try to quantify the fraction of the input domain satisfying these conditions. Precise quantification is usually limited to linear constraints, while only approximate solutions can be provided in general through statistical approaches. However, statistical approaches may fail to converge to an acceptable accuracy within a reasonable time. We present a compositional statistical approach for the efficient quantification of solution spaces for arbitrarily complex constraints over bounded floating-point domains. The approach leverages interval constraint propagation to improve the accuracy of the estimation by focusing the sampling on the regions of the input domain containing the sought solutions. Preliminary experiments show significant improvement on previous approaches both in results accuracy and analysis time.

  7. CADDIS Volume 4. Data Analysis: Selecting an Analysis Approach

    EPA Pesticide Factsheets

    An approach for selecting statistical analyses to inform causal analysis. Describes methods for determining whether test site conditions differ from reference expectations. Describes an approach for estimating stressor-response relationships.

  8. Implementation of Statistics Textbook Support with ICT and Portfolio Assessment Approach to Improve Students Teacher Mathematical Connection Skills

    NASA Astrophysics Data System (ADS)

    Hendikawati, P.; Dewi, N. R.

    2017-04-01

    Statistics is needed in the data analysis process and has comprehensive applications in daily life, so students must master the statistical material well. The use of a Statistics textbook supported with ICT and a portfolio assessment approach was expected to help students improve their mathematical connection skills. The subjects of this research were 30 student teachers taking Statistics courses. The results of this research show that the use of a Statistics textbook supported with ICT and a portfolio assessment approach can improve student teachers' mathematical connection skills.

  9. Determination of apparent coupling factors for adhesive bonded acrylic plates using SEAL approach

    NASA Astrophysics Data System (ADS)

    Pankaj, Achuthan. C.; Shivaprasad, M. V.; Murigendrappa, S. M.

    2018-04-01

    Apparent coupling loss factors (CLF) and velocity responses have been computed for two lap-joined, adhesive-bonded plates using finite element and experimental statistical energy analysis-like approaches. A finite element model of the plates has been created using ANSYS software. The statistical energy parameters have been computed using the velocity responses obtained from a harmonic forced excitation analysis. Experiments have been carried out for two different cases of adhesive-bonded joints, and the results have been compared with the apparent coupling factors and velocity responses obtained from finite element analysis. The results of the studies signify the importance of modeling adhesive-bonded joints in the computation of apparent coupling factors and their further use in the computation of energies and velocity responses using a statistical energy analysis-like approach.

  10. Energy-density field approach for low- and medium-frequency vibroacoustic analysis of complex structures using a statistical computational model

    NASA Astrophysics Data System (ADS)

    Kassem, M.; Soize, C.; Gagliardini, L.

    2009-06-01

    In this paper, an energy-density field approach applied to the vibroacoustic analysis of complex industrial structures in the low- and medium-frequency ranges is presented. This approach uses a statistical computational model. The analyzed system consists of an automotive vehicle structure coupled with its internal acoustic cavity. The objective of this paper is to make use of the statistical properties of the frequency response functions of the vibroacoustic system observed from previous experimental and numerical work. The frequency response functions are expressed in terms of a dimensionless matrix which is estimated using the proposed energy approach. Using this dimensionless matrix, a simplified vibroacoustic model is proposed.

  11. Statistical Approaches to Adjusting Weights for Dependent Arms in Network Meta-analysis.

    PubMed

    Su, Yu-Xuan; Tu, Yu-Kang

    2018-05-22

    Network meta-analysis compares multiple treatments in terms of their efficacy and harm by including evidence from randomized controlled trials. Most clinical trials use a parallel design, where patients are randomly allocated to different treatments and receive only one treatment. However, some trials use within-person designs such as split-body, split-mouth and cross-over designs, where each patient may receive more than one treatment. Data from treatment arms within these trials are no longer independent, so the correlations between dependent arms need to be accounted for within the statistical analyses. Ignoring these correlations may result in incorrect conclusions. The main objective of this study is to develop statistical approaches to adjusting weights for dependent arms within special design trials. In this study, we demonstrate the following three approaches: the data augmentation approach, the adjusting variance approach, and the reducing weight approach. These three methods can be readily applied in current statistical tools such as R and STATA. An example of periodontal regeneration was used to demonstrate how these approaches could be undertaken and implemented within statistical software packages, and to compare results from different approaches. The adjusting variance approach can be implemented within the network package in STATA, while the reducing weight approach requires computer software programming to set up the within-study variance-covariance matrix. This article is protected by copyright. All rights reserved.

  12. Scaling up to address data science challenges

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wendelberger, Joanne R.

    Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. Here, this article will explore various challenges in Data Science and will highlight statistical approaches that can facilitate analysis of large-scale data including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.

  13. Scaling up to address data science challenges

    DOE PAGES

    Wendelberger, Joanne R.

    2017-04-27

    Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. Here, this article will explore various challenges in Data Science and will highlight statistical approaches that can facilitate analysis of large-scale data including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithms and procedures for efficient processing.

  14. Experimental Quiet Sprocket Design and Noise Reduction in Tracked Vehicles

    DTIC Science & Technology

    1981-04-01

    Keywords: track and suspension noise reduction, statistical energy analysis, mechanical impedance measurement, finite element modal analysis, noise sources. …shape and idler attachment are different. These differences were investigated using the concepts of statistical energy analysis for hull-generated noise… calculated from Statistical Energy Analysis. Such an approach will be valid within reasonable limits for frequencies of about 200 Hz and…

  15. Categorical data processing for real estate objects valuation using statistical analysis

    NASA Astrophysics Data System (ADS)

    Parygin, D. S.; Malikov, V. P.; Golubev, A. V.; Sadovnikova, N. P.; Petrova, T. M.; Finogeev, A. G.

    2018-05-01

    Theoretical and practical approaches to the use of statistical methods for studying various properties of infrastructure objects are analyzed in the paper. Methods of forecasting the value of objects are considered. A method for coding categorical variables describing properties of real estate objects is proposed. The analysis of the results of modeling the price of real estate objects using regression analysis and an algorithm based on a comparative approach is carried out.

  16. Meta‐analysis using individual participant data: one‐stage and two‐stage approaches, and why they may differ

    PubMed Central

    Ensor, Joie; Riley, Richard D.

    2016-01-01

    Meta‐analysis using individual participant data (IPD) obtains and synthesises the raw, participant‐level data from a set of relevant studies. The IPD approach is becoming an increasingly popular tool as an alternative to traditional aggregate data meta‐analysis, especially as it avoids reliance on published results and provides an opportunity to investigate individual‐level interactions, such as treatment‐effect modifiers. There are two statistical approaches for conducting an IPD meta‐analysis: one‐stage and two‐stage. The one‐stage approach analyses the IPD from all studies simultaneously, for example, in a hierarchical regression model with random effects. The two‐stage approach derives aggregate data (such as effect estimates) in each study separately and then combines these in a traditional meta‐analysis model. There have been numerous comparisons of the one‐stage and two‐stage approaches via theoretical consideration, simulation and empirical examples, yet there remains confusion regarding when each approach should be adopted, and indeed why they may differ. In this tutorial paper, we outline the key statistical methods for one‐stage and two‐stage IPD meta‐analyses, and provide 10 key reasons why they may produce different summary results. We explain that most differences arise because of different modelling assumptions, rather than the choice of one‐stage or two‐stage itself. We illustrate the concepts with recently published IPD meta‐analyses, summarise key statistical software and provide recommendations for future IPD meta‐analyses. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:27747915
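
    To make the one-stage/two-stage distinction concrete, the sketch below carries out only the two-stage route: a treatment effect is estimated in each study separately (stage 1) and the estimates are then pooled with inverse-variance weights (stage 2, fixed effect for brevity). The data are simulated, and the model is deliberately simpler than the hierarchical models discussed in the tutorial.

      import numpy as np

      rng = np.random.default_rng(1)

      def study_effect(n, true_delta):
          """Stage 1: per-study mean difference and its variance."""
          control = rng.normal(0.0, 1.0, n)
          treated = rng.normal(true_delta, 1.0, n)
          diff = treated.mean() - control.mean()
          var = treated.var(ddof=1) / n + control.var(ddof=1) / n
          return diff, var

      effects, variances = zip(*[study_effect(n, 0.4) for n in (40, 60, 120, 35)])
      effects, variances = np.array(effects), np.array(variances)

      # Stage 2: fixed-effect inverse-variance pooling
      w = 1.0 / variances
      pooled = np.sum(w * effects) / np.sum(w)
      se = np.sqrt(1.0 / np.sum(w))
      print(f"pooled effect = {pooled:.3f} "
            f"(95% CI {pooled - 1.96*se:.3f} to {pooled + 1.96*se:.3f})")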

  17. Multivariate Meta-Analysis of Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness

    PubMed Central

    Liu, Dungang; Liu, Regina; Xie, Minge

    2014-01-01

    Meta-analysis has been widely used to synthesize evidence from multiple studies for common hypotheses or parameters of interest. However, it has not yet been fully developed for incorporating heterogeneous studies, which often arise in applications due to different study designs, populations or outcomes. For heterogeneous studies, the parameter of interest may not be estimable for certain studies, and in such a case, these studies are typically excluded from conventional meta-analysis. The exclusion of part of the studies can lead to a non-negligible loss of information. This paper introduces a meta-analysis for heterogeneous studies by combining the confidence density functions derived from the summary statistics of individual studies, hence referred to as the CD approach. It includes all the studies in the analysis and makes use of all information, direct as well as indirect. Under a general likelihood inference framework, this new approach is shown to have several desirable properties, including: i) it is asymptotically as efficient as the maximum likelihood approach using individual participant data (IPD) from all studies; ii) unlike the IPD analysis, it suffices to use summary statistics to carry out the CD approach. Individual-level data are not required; and iii) it is robust against misspecification of the working covariance structure of the parameter estimates. Besides its own theoretical significance, the last property also substantially broadens the applicability of the CD approach. All the properties of the CD approach are further confirmed by data simulated from a randomized clinical trials setting as well as by real data on aircraft landing performance. Overall, one obtains a unifying approach for combining summary statistics, subsuming many of the existing meta-analysis methods as special cases. PMID:26190875

  18. The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective.

    PubMed

    Kruschke, John K; Liddell, Torrin M

    2018-02-01

    In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.
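
    A minimal illustration of "estimation with quantified uncertainty" in the Bayesian sense discussed here is the Beta-Binomial model, whose posterior gives a credible interval directly; the counts below are invented.

      from scipy import stats

      # Hypothetical data: 14 successes in 20 trials, with a flat Beta(1, 1) prior
      successes, trials = 14, 20
      posterior = stats.beta(1 + successes, 1 + trials - successes)

      mean = posterior.mean()
      lo, hi = posterior.ppf([0.025, 0.975])   # central 95% credible interval
      print(f"posterior mean = {mean:.3f}, "
            f"95% credible interval = ({lo:.3f}, {hi:.3f})")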

  19. Statistical significance of trace evidence matches using independent physicochemical measurements

    NASA Astrophysics Data System (ADS)

    Almirall, Jose R.; Cole, Michael; Furton, Kenneth G.; Gettinby, George

    1997-02-01

    A statistical approach to the significance of glass evidence is proposed using independent physicochemical measurements and chemometrics. Traditional interpretation of the significance of trace evidence matches or exclusions relies on qualitative descriptors such as 'indistinguishable from,' 'consistent with,' 'similar to' etc. By performing physical and chemical measurements which are independent of one another, the significance of object exclusions or matches can be evaluated statistically. One of the problems with this approach is that the human brain is excellent at recognizing and classifying patterns and shapes but performs less well when that object is represented by a numerical list of attributes. Chemometrics can be employed to group similar objects using clustering algorithms and provide statistical significance in a quantitative manner. This approach is enhanced when population databases exist or can be created and the data in question can be evaluated given these databases. Since the selection of the variables used and their pre-processing can greatly influence the outcome, several different methods could be employed in order to obtain a more complete picture of the information contained in the data. Presently, we report on the analysis of glass samples using refractive index measurements and the quantitative analysis of the concentrations of the metals: Mg, Al, Ca, Fe, Mn, Ba, Sr, Ti and Zr. The extension of this general approach to fiber and paint comparisons is also discussed. This statistical approach should not replace the current interpretative approaches to trace evidence matches or exclusions but rather yields an additional quantitative measure. The lack of sufficient general population databases containing the needed physicochemical measurements and the potential for confusion arising from statistical analysis currently hamper this approach, and ways of overcoming these obstacles are presented.

  20. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

    PubMed

    Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John

    2018-03-07

    DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
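
    The two information-theoretic quantities named here, Shannon entropy and the Jensen-Shannon distance, are straightforward to compute for discretized methylation-level distributions. The snippet below uses SciPy on made-up histograms purely to illustrate the quantities; it is not the authors' Ising-model pipeline.

      import numpy as np
      from scipy.stats import entropy
      from scipy.spatial.distance import jensenshannon

      # Hypothetical distributions of methylation level over 5 bins (0%..100%)
      test = np.array([0.05, 0.10, 0.15, 0.30, 0.40])
      reference = np.array([0.40, 0.30, 0.15, 0.10, 0.05])

      print("Shannon entropy (test):", entropy(test, base=2))
      print("Jensen-Shannon distance:", jensenshannon(test, reference, base=2))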

  1. Microscopic saw mark analysis: an empirical approach.

    PubMed

    Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles

    2015-01-01

    Microscopic saw mark analysis is a well-published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The presented study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigating the potential for variability and outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighed the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.

  2. STATISTICAL SAMPLING AND DATA ANALYSIS

    EPA Science Inventory

    Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...

  3. A Frequency Domain Approach to Pretest Analysis Model Correlation and Model Updating for the Mid-Frequency Range

    DTIC Science & Technology

    2009-02-01

    …range of modal analysis and the high-frequency region of statistical energy analysis is referred to as the mid-frequency range… predictions. The averaging process is consistent with the averaging done in statistical energy analysis for stochastic systems. The FEM will always…

  4. Analysis of Coastal Dunes: A Remote Sensing and Statistical Approach.

    ERIC Educational Resources Information Center

    Jones, J. Richard

    1985-01-01

    Remote sensing analysis and statistical methods were used to analyze the coastal dunes of Plum Island, Massachusetts. The research methodology used provides an example of a student project for remote sensing, geomorphology, or spatial analysis courses at the university level. (RM)

  5. Measuring Efficiency of Tunisian Schools in the Presence of Quasi-Fixed Inputs: A Bootstrap Data Envelopment Analysis Approach

    ERIC Educational Resources Information Center

    Essid, Hedi; Ouellette, Pierre; Vigeant, Stephane

    2010-01-01

    The objective of this paper is to measure the efficiency of high schools in Tunisia. We use a statistical data envelopment analysis (DEA)-bootstrap approach with quasi-fixed inputs to estimate the precision of our measure. To do so, we developed a statistical model serving as the foundation of the data generation process (DGP). The DGP is…

  6. After p Values: The New Statistics for Undergraduate Neuroscience Education.

    PubMed

    Calin-Jageman, Robert J

    2017-01-01

    Statistical inference is a methodological cornerstone for neuroscience education. For many years this has meant inculcating neuroscience majors into null hypothesis significance testing with p values. There is increasing concern, however, about the pervasive misuse of p values. It is time to start planning statistics curricula for neuroscience majors that replace or de-emphasize p values. One promising alternative approach is what Cumming has dubbed the "New Statistics", an approach that emphasizes effect sizes, confidence intervals, meta-analysis, and open science. I give an example of the New Statistics in action and describe some of the key benefits of adopting this approach in neuroscience education.

  7. The Practicality of Statistical Physics Handout Based on KKNI and the Constructivist Approach

    NASA Astrophysics Data System (ADS)

    Sari, S. Y.; Afrizon, R.

    2018-04-01

    Evaluation of the statistical physics lectures shows that: 1) the performance of lecturers, the social climate, and the students' competence and soft skills needed at work are in the adequate category; 2) students find it difficult to follow the statistical physics lectures because the material is abstract; 3) 40.72% of students need more support in the form of repetition, practice questions, and structured tasks; and 4) the depth of the statistical physics material needs to be improved gradually and in a structured way. This indicates that learning materials in accordance with The Indonesian National Qualification Framework, or Kerangka Kualifikasi Nasional Indonesia (KKNI), together with an appropriate learning approach, are needed to help lecturers and students in lectures. The author has designed statistical physics handouts that meet very valid criteria (90.89%) according to expert judgment. In addition, the practicality of the designed handouts also needs to be considered so that they are easy to use, interesting, and efficient in lectures. The purpose of this research is to determine the practicality of a statistical physics handout based on KKNI and a constructivist approach. This research is part of a research and development study using the 4-D model developed by Thiagarajan, and it has reached the development testing part of the Develop stage. Data were collected using a questionnaire distributed to lecturers and students and analyzed with descriptive techniques in the form of percentages. The analysis of the questionnaire shows that the statistical physics handout meets very practical criteria. The conclusion of this study is that statistical physics handouts based on KKNI and a constructivist approach are practical for use in lectures.

  8. Analysis and Report on SD2000: A Workshop to Determine Structural Dynamics Research for the Millenium

    DTIC Science & Technology

    2000-04-10

    …of interest. These include Statistical Energy Analysis (SEA), fuzzy structure theory, and approaches combining modal analysis and SEA. …Non-determinism arising with increasing frequency… has led to Statistical Energy Analysis, in which a system is modelled as a collection of coupled subsystems… IUTAM Symposium on Statistical Energy Analysis, 1999, Ed. F.J. Fahy and W.G. Price, Kluwer Academic Publishing… R.S. Langley and P…

  9. Do-it-yourself statistics: A computer-assisted likelihood approach to analysis of data from genetic crosses.

    PubMed Central

    Robbins, L G

    2000-01-01

    Graduate school programs in genetics have become so full that courses in statistics have often been eliminated. In addition, typical introductory statistics courses for the "statistics user" rather than the nascent statistician are laden with methods for analysis of measured variables while genetic data are most often discrete numbers. These courses are often seen by students and genetics professors alike as largely irrelevant cookbook courses. The powerful methods of likelihood analysis, although commonly employed in human genetics, are much less often used in other areas of genetics, even though current computational tools make this approach readily accessible. This article introduces the MLIKELY.PAS computer program and the logic of do-it-yourself maximum-likelihood statistics. The program itself, course materials, and expanded discussions of some examples that are only summarized here are available at http://www.unisi.it/ricerca/dip/bio_evol/sitomlikely/mlikely.html. PMID:10628965

  10. A κ-generalized statistical mechanics approach to income analysis

    NASA Astrophysics Data System (ADS)

    Clementi, F.; Gallegati, M.; Kaniadakis, G.

    2009-02-01

    This paper proposes a statistical mechanics approach to the analysis of income distribution and inequality. A new distribution function, having its roots in the framework of κ-generalized statistics, is derived that is particularly suitable for describing the whole spectrum of incomes, from the low-middle income region up to the high income Pareto power-law regime. Analytical expressions for the shape, moments and some other basic statistical properties are given. Furthermore, several well-known econometric tools for measuring inequality, which all exist in a closed form, are considered. A method for parameter estimation is also discussed. The model is shown to fit remarkably well the data on personal income for the United States, and the analysis of inequality performed in terms of its parameters is revealed as very powerful.
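
    For orientation, the Kaniadakis κ-exponential is exp_κ(x) = (sqrt(1 + κ²x²) + κx)^(1/κ), and the income model described above uses a survival function of the form S(x) = exp_κ(−βx^α); this form is stated here from the κ-generalized literature, so consult the paper for the exact parameterization. The sketch below simply evaluates that survival function for arbitrary, illustrative parameter values, not fitted estimates.

      import numpy as np

      def kappa_exp(x, kappa):
          """Kaniadakis kappa-exponential; reduces to exp(x) as kappa -> 0."""
          if kappa == 0:
              return np.exp(x)
          return (np.sqrt(1.0 + kappa**2 * x**2) + kappa * x) ** (1.0 / kappa)

      def survival(x, alpha, beta, kappa):
          """S(x) = exp_kappa(-beta * x**alpha), the model's complementary CDF."""
          return kappa_exp(-beta * np.asarray(x, dtype=float) ** alpha, kappa)

      incomes = np.array([0.5, 1.0, 2.0, 5.0, 10.0])   # in units of mean income
      print(survival(incomes, alpha=2.0, beta=1.0, kappa=0.7))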

  11. Statistical analysis of weigh-in-motion data for bridge design in Vermont.

    DOT National Transportation Integrated Search

    2014-10-01

    This study investigates the suitability of the HL-93 live load model recommended by AASHTO LRFD Specifications for its use in the analysis and design of bridges in Vermont. The method of approach consists in performing a statistical analysis of w...

  12. Sociological Paradoxes and Graduate Statistics Classes. A Response to "The Sociology of Teaching Graduate Statistics"

    ERIC Educational Resources Information Center

    Hardy, Melissa

    2005-01-01

    This article presents a response to Timothy Patrick Moran's article "The Sociology of Teaching Graduate Statistics." In his essay, Moran argues that exciting developments in techniques of quantitative analysis are currently coupled with a much less exciting formulaic approach to teaching sociology graduate students about quantitative analysis. The…

  13. Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)

    NASA Astrophysics Data System (ADS)

    Kasibhatla, P.

    2004-12-01

    In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities and inverse source estimates are derived for fixed values of pdf parameters. While the advantage of this approach is that closed-form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more general statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocol. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
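
    As a toy illustration of the MCMC machinery described above, the sketch below runs a random-walk Metropolis sampler on a deliberately small linear source-estimation problem with Gaussian errors and priors; it is a pedagogical stand-in, not the TransCom3 configuration.

      import numpy as np

      rng = np.random.default_rng(42)

      # Toy forward model: observations = H @ sources + noise
      H = np.array([[1.0, 0.3],
                    [0.4, 1.0],
                    [0.8, 0.6]])
      true_s = np.array([2.0, -1.0])
      obs = H @ true_s + rng.normal(0.0, 0.1, size=3)

      def log_post(s, sigma_obs=0.1, sigma_prior=5.0):
          """Gaussian likelihood plus Gaussian prior, up to a constant."""
          resid = obs - H @ s
          return (-0.5 * np.sum(resid**2) / sigma_obs**2
                  - 0.5 * np.sum(s**2) / sigma_prior**2)

      # Random-walk Metropolis
      samples, s = [], np.zeros(2)
      lp = log_post(s)
      for _ in range(20000):
          prop = s + rng.normal(0.0, 0.05, size=2)
          lp_prop = log_post(prop)
          if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
              s, lp = prop, lp_prop
          samples.append(s)

      samples = np.array(samples[5000:])             # drop burn-in
      print("posterior mean:", samples.mean(axis=0))
      print("posterior std :", samples.std(axis=0))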

  14. On the analysis of very small samples of Gaussian repeated measurements: an alternative approach.

    PubMed

    Westgate, Philip M; Burchett, Woodrow W

    2017-03-15

    The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd.

  15. Method for Identifying Probable Archaeological Sites from Remotely Sensed Data

    NASA Technical Reports Server (NTRS)

    Tilton, James C.; Comer, Douglas C.; Priebe, Carey E.; Sussman, Daniel

    2011-01-01

    Archaeological sites are being compromised or destroyed at a catastrophic rate in most regions of the world. The best solution to this problem is for archaeologists to find and study these sites before they are compromised or destroyed. One way to facilitate the rapid, wide-area surveys needed to find these archaeological sites is through the generation of maps of probable archaeological sites from remotely sensed data. We describe an approach for identifying probable locations of archaeological sites over a wide area based on detecting subtle anomalies in vegetative cover through a statistically based analysis of remotely sensed data from multiple sources. We further developed this approach under a recent NASA ROSES Space Archaeology Program project. Under this project we refined and elaborated this statistical analysis to compensate for potential slight misregistrations between the remote sensing data sources and the archaeological site location data. We also explored data quantization approaches (required by the statistical analysis approach), and we identified a superior data quantization approach based on a unique image segmentation approach. In our presentation we will summarize our refined approach and demonstrate the effectiveness of the overall approach with test data from Santa Catalina Island off the southern California coast. Finally, we discuss our future plans for further improving our approach.

  16. A new statistical methodology predicting chip failure probability considering electromigration

    NASA Astrophysics Data System (ADS)

    Sun, Ted

    In this research thesis, we present a new approach to analyzing chip reliability subject to electromigration (EM); the fundamental causes of EM and the EM phenomena that occur in different materials are presented in this thesis. This new approach utilizes the statistical nature of EM failure in order to assess overall EM risk. It includes within-die temperature variations from the chip's temperature map, extracted by an Electronic Design Automation (EDA) tool, to estimate the failure probability of a design. Both the power estimation and the thermal analysis are performed in the EDA flow. We first used the traditional EM approach to analyze the design, which involves 6 metal and 5 via layers, with a single temperature across the entire chip. Next, we used the same traditional approach but with a realistic temperature map. The traditional EM analysis approach, the same approach coupled with a temperature map, and a comparison between the results with and without a temperature map are presented in this research. This comparison confirms that using a temperature map yields a less pessimistic estimation of the chip's EM risk. Finally, we employed the statistical methodology we developed, considering a temperature map and different use-condition voltages and frequencies, to estimate the overall failure probability of the chip. The statistical model considers the scaling work with the usage of the traditional Black equation and four major conditions. The statistical result comparisons are within our expectations. The results of this statistical analysis confirm that the chip-level failure probability is higher i) at higher use-condition frequencies for all use-condition voltages, and ii) when a single temperature instead of a temperature map across the chip is considered. In this thesis, I start with an overall review of current design types, common flows, and the necessary verification and reliability checking steps used in the IC design industry. Furthermore, the important concepts of "scripting automation", which is used throughout the integration of the diversified EDA tools in this research work, are also described in detail with several examples, and my completed code is included in the appendix for reference. Hopefully, this structure of the thesis will give readers a thorough understanding of my research work, from the automation of EDA tools to the statistical data generation, from the nature of EM to the construction of the statistical model, and the comparisons between the traditional EM analysis and the statistical EM analysis approaches.

  17. Statistical Surrogate Modeling of Atmospheric Dispersion Events Using Bayesian Adaptive Splines

    NASA Astrophysics Data System (ADS)

    Francom, D.; Sansó, B.; Bulaevskaya, V.; Lucas, D. D.

    2016-12-01

    Uncertainty in the inputs of complex computer models, including atmospheric dispersion and transport codes, is often assessed via statistical surrogate models. Surrogate models are computationally efficient statistical approximations of expensive computer models that enable uncertainty analysis. We introduce Bayesian adaptive spline methods for producing surrogate models that capture the major spatiotemporal patterns of the parent model, while satisfying all the necessities of flexibility, accuracy and computational feasibility. We present novel methodological and computational approaches motivated by a controlled atmospheric tracer release experiment conducted at the Diablo Canyon nuclear power plant in California. Traditional methods for building statistical surrogate models often do not scale well to experiments with large amounts of data. Our approach is well suited to experiments involving large numbers of model inputs, large numbers of simulations, and functional output for each simulation. Our approach allows us to perform global sensitivity analysis with ease. We also present an approach to calibration of simulators using field data.

  18. Exploratory study on a statistical method to analyse time resolved data obtained during nanomaterial exposure measurements

    NASA Astrophysics Data System (ADS)

    Clerc, F.; Njiki-Menga, G.-H.; Witschger, O.

    2013-04-01

    Most of the measurement strategies that are suggested at the international level to assess workplace exposure to nanomaterials rely on devices measuring, in real time, airborne particle concentrations (according to different metrics). Since none of the instruments used to measure aerosols can distinguish a particle of interest from the background aerosol, the statistical analysis of time-resolved data requires special attention. So far, very few approaches have been used for statistical analysis in the literature. These range from simple qualitative analysis of graphs to the implementation of more complex statistical models. To date, there is still no consensus on a particular approach, and the field is still looking for an appropriate and robust method. In this context, this exploratory study investigates a statistical method to analyse time-resolved data based on a Bayesian probabilistic approach. To investigate and illustrate the use of this statistical method, particle number concentration data were used from a workplace study that investigated the potential for exposure via inhalation during cleanout operations by sandpapering of a reactor producing nanocomposite thin films. In this workplace study, the background issue was addressed through near-field and far-field approaches, and several size-integrated and time-resolved devices were used. The analysis presented here focuses only on data obtained with two handheld condensation particle counters. While one was measuring at the source of the released particles, the other was measuring in parallel in the far field. The Bayesian probabilistic approach allows probabilistic modelling of data series, and the observed task is modelled in the form of probability distributions. The probability distributions issuing from time-resolved data obtained at the source can be compared with those issuing from the time-resolved data obtained in the far field, leading to a quantitative estimation of the airborne particles released at the source when the task is performed. Beyond the results obtained, this exploratory study indicates that the analysis of such data requires specific experience in statistics.
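
    A much-simplified version of the underlying idea, comparing near-field and far-field count rates under a Bayesian model, can be sketched with conjugate Gamma-Poisson updating; the counts below are simulated, and the model is only a schematic of the probabilistic comparison, not the authors' method.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)

      # Hypothetical 1-second particle counts from two condensation particle counters
      near_field = rng.poisson(120, size=300)   # at the source
      far_field = rng.poisson(90, size=300)     # background

      def posterior(counts, a0=1.0, b0=1e-3):
          """Gamma(a0, b0) prior on the Poisson rate -> Gamma posterior."""
          return stats.gamma(a0 + counts.sum(), scale=1.0 / (b0 + counts.size))

      near, far = posterior(near_field), posterior(far_field)

      # Monte Carlo estimate of the excess rate attributable to the task itself
      excess = near.rvs(10000, random_state=1) - far.rvs(10000, random_state=2)
      print("posterior mean excess rate:", excess.mean())
      print("95% credible interval:", np.percentile(excess, [2.5, 97.5]))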

  19. Estimation of elastic moduli of graphene monolayer in lattice statics approach at nonzero temperature

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zubko, I. Yu., E-mail: zoubko@list.ru; Kochurov, V. I.

    2015-10-27

    With the aim of controlling crystal temperature, a computational-statistical approach to studying the thermo-mechanical properties of finite-sized crystals is presented. The approach is based on the combination of high-performance computational techniques and statistical analysis of the crystal response to external thermo-mechanical actions for specimens with a statistically small number of atoms (for instance, nanoparticles). The heat motion of atoms is imitated in the statics approach by including independent degrees of freedom for atoms associated with their oscillations. We found that under heating, the response of the graphene material is nonsymmetric.

  20. Radiomic analysis in prediction of Human Papilloma Virus status.

    PubMed

    Yu, Kaixian; Zhang, Youyi; Yu, Yang; Huang, Chao; Liu, Rongjie; Li, Tengfei; Yang, Liuqing; Morris, Jeffrey S; Baladandayuthapani, Veerabhadran; Zhu, Hongtu

    2017-12-01

    Human Papilloma Virus (HPV) has been associated with oropharyngeal cancer prognosis. Traditionally, HPV status is tested through an invasive lab test. Recently, the rapid development of statistical image analysis techniques has enabled precise quantitative analysis of medical images. The quantitative analysis of Computed Tomography (CT) provides a non-invasive way to assess HPV status for oropharynx cancer patients. We designed a statistical radiomics approach analyzing CT images to predict HPV status. Various radiomics features were extracted from CT scans and analyzed using statistical feature selection and prediction methods. Our approach ranked the highest in the 2016 Medical Image Computing and Computer Assisted Intervention (MICCAI) grand challenge: Oropharynx Cancer (OPC) Radiomics Challenge, Human Papilloma Virus (HPV) Status Prediction. Further analysis of the most relevant radiomic features distinguishing HPV positive and negative subjects suggested that HPV positive patients usually have smaller and simpler tumors.

  1. A Survey of Probabilistic Methods for Dynamical Systems with Uncertain Parameters.

    DTIC Science & Technology

    1986-05-01

    J., "An Approach to the Theoretical Background of Statistical Energy Analysis Applied to Structural Vibration," Journ. Acoust. Soc. Amer., Vol. 69...1973, Sect. 8.3. 80. Lyon, R.H., " Statistical Energy Analysis of Dynamical Systems," M.I.T. Press, 1975. e) Late References added in Proofreading !! 81...Dowell, E.H., and Kubota, Y., "Asymptotic Modal Analysis and ’~ y C-" -165- Statistical Energy Analysis of Dynamical Systems," Journ. Appi. - Mech

  2. Bayesian Statistics and Uncertainty Quantification for Safety Boundary Analysis in Complex Systems

    NASA Technical Reports Server (NTRS)

    He, Yuning; Davies, Misty Dawn

    2014-01-01

    The analysis of a safety-critical system often requires detailed knowledge of safe regions and their high-dimensional non-linear boundaries. We present a statistical approach to iteratively detect and characterize the boundaries, which are provided as parameterized shape candidates. Using methods from uncertainty quantification and active learning, we incrementally construct a statistical model from only a few simulation runs and obtain statistically sound estimates of the shape parameters for safety boundaries.

  3. NAUSEA and the Principle of Supplementarity of Damping and Isolation in Noise Control.

    DTIC Science & Technology

    1980-02-01

    New approaches and uses of the statistical energy analysis (NAUSEA) have been considered and developed in recent months. The advances were made possible in that the requirement, in the older statistical energy analysis, that the dynamic systems be highly reverberant and the couplings between them… analytical consideration in terms of the statistical energy analysis (SEA). A brief discussion and simple examples that relate to these recent advances…

  4. Evaluating and Reporting Statistical Power in Counseling Research

    ERIC Educational Resources Information Center

    Balkin, Richard S.; Sheperis, Carl J.

    2011-01-01

    Despite recommendations from the "Publication Manual of the American Psychological Association" (6th ed.) to include information on statistical power when publishing quantitative results, authors seldom include analysis or discussion of statistical power. The rationale for discussing statistical power is addressed, approaches to using "G*Power" to…

  5. Fisher statistics for analysis of diffusion tensor directional information.

    PubMed

    Hutchinson, Elizabeth B; Rutecki, Paul A; Alexander, Andrew L; Sutula, Thomas P

    2012-04-30

    A statistical approach is presented for the quantitative analysis of diffusion tensor imaging (DTI) directional information using Fisher statistics, which were originally developed for the analysis of vectors in the field of paleomagnetism. In this framework, descriptive and inferential statistics have been formulated based on the Fisher probability density function, a spherical analogue of the normal distribution. The Fisher approach was evaluated for investigation of rat brain DTI maps to characterize tissue orientation in the corpus callosum, fornix, and hilus of the dorsal hippocampal dentate gyrus, and to compare directional properties in these regions following status epilepticus (SE) or traumatic brain injury (TBI) with values in healthy brains. Direction vectors were determined for each region of interest (ROI) for each brain sample and Fisher statistics were applied to calculate the mean direction vector and variance parameters in the corpus callosum, fornix, and dentate gyrus of normal rats and rats that experienced TBI or SE. Hypothesis testing was performed by calculation of Watson's F-statistic and associated p-value giving the likelihood that grouped observations were from the same directional distribution. In the fornix and midline corpus callosum, no directional differences were detected between groups, however in the hilus, significant (p<0.0005) differences were found that robustly confirmed observations that were suggested by visual inspection of directionally encoded color DTI maps. The Fisher approach is a potentially useful analysis tool that may extend the current capabilities of DTI investigation by providing a means of statistical comparison of tissue structural orientation. Copyright © 2012 Elsevier B.V. All rights reserved.
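
    The core Fisher quantities used in this framework (resultant length, mean direction, and concentration) can be computed directly from unit direction vectors. The sketch below uses randomly generated vectors and the common large-sample approximation κ ≈ (N−1)/(N−R); the approximation and the data are assumptions of this illustration, not necessarily the authors' exact estimator.

      import numpy as np

      rng = np.random.default_rng(3)

      # Hypothetical unit vectors (e.g., principal diffusion directions in one ROI)
      v = rng.normal(size=(50, 3)) + np.array([0.0, 0.0, 4.0])   # roughly aligned with z
      v /= np.linalg.norm(v, axis=1, keepdims=True)

      resultant = v.sum(axis=0)
      R = np.linalg.norm(resultant)        # resultant length
      mean_dir = resultant / R             # Fisher mean direction
      N = len(v)
      kappa = (N - 1) / (N - R)            # concentration (large-kappa approximation)

      print("mean direction:", np.round(mean_dir, 3))
      print("R =", round(R, 2), " kappa =", round(kappa, 1))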

  6. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets

    PubMed Central

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-01-01

    Purpose: With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. Results: The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
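
    The filtering pipeline described here (an ROC-derived threshold followed by contingency-table and distributional tests) can be emulated with standard scikit-learn and SciPy calls. The snippet below runs on simulated dose/outcome data and is a schematic of the idea, not the authors' C#.Net/R application.

      import numpy as np
      from sklearn.metrics import roc_curve
      from scipy.stats import fisher_exact, ks_2samp

      rng = np.random.default_rng(5)

      # Hypothetical data: a dose metric and a binary toxicity outcome
      dose = np.concatenate([rng.normal(20, 5, 80), rng.normal(30, 5, 40)])
      outcome = np.concatenate([np.zeros(80, dtype=int), np.ones(40, dtype=int)])

      # 1. ROC curve -> threshold maximizing Youden's J
      fpr, tpr, thresholds = roc_curve(outcome, dose)
      threshold = thresholds[np.argmax(tpr - fpr)]

      # 2. Contingency table at that threshold + Fisher exact test
      high = dose >= threshold
      table = [[np.sum(high & (outcome == 1)), np.sum(high & (outcome == 0))],
               [np.sum(~high & (outcome == 1)), np.sum(~high & (outcome == 0))]]
      _, p_fisher = fisher_exact(table)

      # 3. Distributional comparison of dose between outcome groups
      _, p_ks = ks_2samp(dose[outcome == 1], dose[outcome == 0])

      print(f"threshold = {threshold:.1f}, Fisher p = {p_fisher:.2e}, KS p = {p_ks:.2e}")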

  7. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets.

    PubMed

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-11-01

    With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code were evaluated using benchmark data sets. The approach provides data needed to evaluate combinations of statistical measurements for ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. The work demonstrates the viability of the design approach and the software tool for analysis of large data sets.

  8. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension.

    PubMed

    Zhu, Xiaofeng; Feng, Tao; Tayo, Bamidele O; Liang, Jingjing; Young, J Hunter; Franceschini, Nora; Smith, Jennifer A; Yanek, Lisa R; Sun, Yan V; Edwards, Todd L; Chen, Wei; Nalls, Mike; Fox, Ervin; Sale, Michele; Bottinger, Erwin; Rotimi, Charles; Liu, Yongmei; McKnight, Barbara; Liu, Kiang; Arnett, Donna K; Chakravati, Aravinda; Cooper, Richard S; Redline, Susan

    2015-01-08

    Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple, even distinct, traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systemically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10^(-8)) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10^(-7)) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross-phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
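
    One common way to combine summary statistics across correlated traits, in the spirit of the approach described above, is the quadratic statistic z' Σ^(-1) z built from the per-trait Z-scores and an estimate of their null correlation. The sketch below uses made-up numbers and is a generic illustration, not the authors' exact test.

      import numpy as np
      from scipy.stats import chi2

      # Hypothetical per-trait Z-scores for one SNP (e.g., three blood pressure traits)
      z = np.array([3.1, 2.4, 2.9])

      # Correlation of the Z-scores under the null, e.g. estimated from
      # genome-wide summary statistics of the same traits
      sigma = np.array([[1.0, 0.6, 0.8],
                        [0.6, 1.0, 0.7],
                        [0.8, 0.7, 1.0]])

      stat = z @ np.linalg.solve(sigma, z)   # z' Sigma^{-1} z
      p = chi2.sf(stat, df=len(z))
      print(f"chi-square = {stat:.2f}, df = {len(z)}, p = {p:.2e}")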

  9. Heads Up! a Calculation- & Jargon-Free Approach to Statistics

    ERIC Educational Resources Information Center

    Giese, Alan R.

    2012-01-01

    Evaluating the strength of evidence in noisy data is a critical step in scientific thinking that typically relies on statistics. Students without statistical training will benefit from heuristic models that highlight the logic of statistical analysis. The likelihood associated with various coin-tossing outcomes gives students such a model. There…
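
    A minimal sketch of the coin-tossing heuristic the abstract alludes to, assuming a fair-coin null hypothesis and an invented outcome of 15 heads in 20 tosses:

```python
# How surprising is an observed count of heads if the coin (null hypothesis) is fair?
from scipy.stats import binom

n, k = 20, 15                      # 20 tosses, 15 heads observed (hypothetical)
likelihood = binom.pmf(k, n, 0.5)  # probability of exactly this outcome under a fair coin
p_two_sided = 2 * binom.sf(k - 1, n, 0.5)  # chance of a result at least this extreme, doubled
print(likelihood, min(p_two_sided, 1.0))
```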

  10. Comparison of student's learning achievement through realistic mathematics education (RME) approach and problem solving approach on grade VII

    NASA Astrophysics Data System (ADS)

    Ilyas, Muhammad; Salwah

    2017-02-01

    This research was experimental. The purpose of this study was to determine the difference in, and the quality of, students' learning achievement between students taught through the Realistic Mathematics Education (RME) approach and students taught through the problem solving approach. This study was a quasi-experimental research with a non-equivalent experiment group design. The population of this study was all students of grade VII in one of the junior high schools in Palopo, in the second semester of the academic year 2015/2016. Two classes were selected purposively as the research sample: class VII-5, with 28 students, was selected as experiment group I, and class VII-6, with 23 students, as experiment group II. The treatment used in experiment group I was learning through the RME approach, whereas experiment group II learned through the problem solving approach. Data were collected by administering a pretest and posttest to the students. The analysis used in this research was descriptive statistics and inferential statistics using a t-test. Based on the descriptive statistics, it can be concluded that the average score of students' mathematics learning after being taught using the problem solving approach was similar to the average score after being taught using the realistic mathematics education (RME) approach, with both at the high category. In addition, it can also be concluded that (1) there was no difference in the mathematics learning results of students taught using the realistic mathematics education (RME) approach and students taught using the problem solving approach, and (2) the quality of learning achievement of students who received the RME approach and the problem solving approach was the same, at the high category.

  11. A note on generalized Genome Scan Meta-Analysis statistics

    PubMed Central

    Koziol, James A; Feng, Anne C

    2005-01-01

    Background Wise et al. introduced a rank-based statistical technique for meta-analysis of genome scans, the Genome Scan Meta-Analysis (GSMA) method. Levinson et al. recently described two generalizations of the GSMA statistic: (i) a weighted version of the GSMA statistic, so that different studies could be ascribed different weights for analysis; and (ii) an order statistic approach, reflecting the fact that a GSMA statistic can be computed for each chromosomal region or bin width across the various genome scan studies. Results We provide an Edgeworth approximation to the null distribution of the weighted GSMA statistic, we examine the limiting distribution of the GSMA statistics under the order statistic formulation, and we quantify the influence of the pairwise correlations of the GSMA statistics across different bins on this limiting distribution. We also remark on aggregate criteria and multiple testing for determining significance of GSMA results. Conclusion Theoretical considerations detailed herein can lead to clarification and simplification of testing criteria for generalizations of the GSMA statistic. PMID:15717930

  12. Statistical basis and outputs of stable isotope mixing models: Comment on Fry (2013)

    EPA Science Inventory

    A recent article by Fry (2013; Mar Ecol Prog Ser 472:1−13) reviewed approaches to solving underdetermined stable isotope mixing systems, and presented a new graphical approach and set of summary statistics for the analysis of such systems. In his review, Fry (2013) mis-characteri...

  13. Phenotypic mapping of metabolic profiles using self-organizing maps of high-dimensional mass spectrometry data.

    PubMed

    Goodwin, Cody R; Sherrod, Stacy D; Marasco, Christina C; Bachmann, Brian O; Schramm-Sapyta, Nicole; Wikswo, John P; McLean, John A

    2014-07-01

    A metabolic system is composed of inherently interconnected metabolic precursors, intermediates, and products. The analysis of untargeted metabolomics data has conventionally been performed through the use of comparative statistics or multivariate statistical analysis-based approaches; however, each falls short in representing the related nature of metabolic perturbations. Herein, we describe a complementary method for the analysis of large metabolite inventories using a data-driven approach based upon a self-organizing map algorithm. This workflow allows for the unsupervised clustering, and subsequent prioritization of, correlated features through Gestalt comparisons of metabolic heat maps. We describe this methodology in detail, including a comparison to conventional metabolomics approaches, and demonstrate the application of this method to the analysis of the metabolic repercussions of prolonged cocaine exposure in rat sera profiles.

  14. Meta-analysis of diagnostic test data: a bivariate Bayesian modeling approach.

    PubMed

    Verde, Pablo E

    2010-12-30

    In the last decades, the amount of published results on clinical diagnostic tests has expanded very rapidly. The counterpart to this development has been the formal evaluation and synthesis of diagnostic results. However, published results present substantial heterogeneity, and they can be regarded as so far removed from the classical domain of meta-analysis that they provide a rather severe test of classical statistical methods. Recently, bivariate random effects meta-analytic methods, which model the pairs of sensitivities and specificities, have been presented from the classical point of view. In this work a bivariate Bayesian modeling approach is presented. This approach substantially extends the scope of classical bivariate methods by allowing the structural distribution of the random effects to depend on multiple sources of variability. Meta-analysis is summarized by the predictive posterior distributions for sensitivity and specificity. This new approach also allows substantial model checking, model diagnostics, and model selection. Statistical computations are implemented in public domain statistical software (WinBUGS and R) and illustrated with real data examples. Copyright © 2010 John Wiley & Sons, Ltd.

  15. Comparison of future and base precipitation anomalies by SimCLIM statistical projection through ensemble approach in Pakistan

    NASA Astrophysics Data System (ADS)

    Amin, Asad; Nasim, Wajid; Mubeen, Muhammad; Kazmi, Dildar Hussain; Lin, Zhaohui; Wahid, Abdul; Sultana, Syeda Refat; Gibbs, Jim; Fahad, Shah

    2017-09-01

    Unpredictable precipitation trends, largely influenced by climate change, have prolonged droughts and floods in South Asia. A statistical analysis of monthly, seasonal, and annual precipitation trends was carried out at different temporal (1996-2015 and 2041-2060) and spatial scales (39 meteorological stations) in Pakistan. The statistical downscaling model SimCLIM was used for future precipitation projection (2041-2060), and the output was analyzed by a statistical approach. An ensemble approach combined with representative concentration pathways (RCPs) at the medium level was used for future projections. The magnitude and slope of trends were derived by applying the Mann-Kendall and Sen's slope statistical approaches. A geo-statistical application was used to generate precipitation trend maps. Base and projected precipitation were compared statistically and represented by maps and graphical visualizations, which facilitate trend detection. The results of this study show that the precipitation trend was increasing at more than 70% of weather stations in February, March, April, August, and September of the base years. The precipitation trend decreased from February to April but increased from July to October in the projected years. The strongest decreasing trend was reported in January of the base years, and it also decreased in the projected years. The greatest variation in precipitation trends between projected and base years was reported from February to April. Variations in the projected precipitation trend for Punjab and Baluchistan were most pronounced in March and April. Seasonal analysis shows large variation in winter, with an increasing trend at more than 30% of weather stations, rising to 40% for projected precipitation. High risk was reported in the base-year pre-monsoon season, where 90% of weather stations showed an increasing trend, but in the projected years this share decreases to 33%. Finally, the annual precipitation trend increased at more than 90% of meteorological stations in the base period (1996-2015), whereas a decreasing trend is projected at up to 76% of stations for 2041-2060. These results reveal that the overall precipitation trend is decreasing in future years, which may prolong drought at 14% of the weather stations under study.
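
    The two trend statistics named in the abstract are straightforward to compute. The sketch below is a minimal implementation (ignoring tie corrections) applied to an invented annual precipitation series; it is not the SimCLIM workflow itself.

```python
# Mann-Kendall S statistic (normal approximation, no tie correction) and
# Sen's slope estimator for a single station's annual precipitation series.
import numpy as np
from scipy import stats

def mann_kendall(x):
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0          # variance of S without ties
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    p = 2 * stats.norm.sf(abs(z))                     # two-sided p-value
    return s, z, p

def sens_slope(x):
    n = len(x)
    slopes = [(x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
    return np.median(slopes)                          # median of all pairwise slopes

precip = np.array([310, 295, 340, 360, 330, 390, 410, 370, 420, 450], float)  # mm/yr, invented
print(mann_kendall(precip), sens_slope(precip))
```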

  16. Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM]

    ERIC Educational Resources Information Center

    Warner, Rebecca M.

    2007-01-01

    This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…

  17. A Bifactor Approach to Model Multifaceted Constructs in Statistical Mediation Analysis

    ERIC Educational Resources Information Center

    Gonzalez, Oscar; MacKinnon, David P.

    2018-01-01

    Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Identifying specific mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to…

  18. On the Utility of Content Analysis in Author Attribution: "The Federalist."

    ERIC Educational Resources Information Center

    Martindale, Colin; McKenzie, Dean

    1995-01-01

    Compares the success of lexical statistics, content analysis, and function words in determining the true author of "The Federalist." The function word approach proved most successful in attributing the papers to James Madison. Lexical statistics contributed nothing, while content analytic measures resulted in some success. (MJP)

  19. Theoretical Aspects of the Patterns Recognition Statistical Theory Used for Developing the Diagnosis Algorithms for Complicated Technical Systems

    NASA Astrophysics Data System (ADS)

    Obozov, A. A.; Serpik, I. N.; Mihalchenko, G. S.; Fedyaeva, G. A.

    2017-01-01

    In the article, the problem of applying pattern recognition (a relatively young area of engineering cybernetics) to the analysis of complicated technical systems is examined. It is shown that a statistical approach could be the most effective for situations that are hard to distinguish. The different recognition algorithms are based on the Bayes approach, which estimates the posterior probability of a certain event and an assumed error. Application of the statistical approach to pattern recognition makes it possible to solve the problem of technical diagnosis of complicated systems, particularly high-powered marine diesel engines.

  20. The use and misuse of statistical methodologies in pharmacology research.

    PubMed

    Marino, Michael J

    2014-01-01

    Descriptive, exploratory, and inferential statistics are necessary components of hypothesis-driven biomedical research. Despite the ubiquitous need for these tools, the emphasis on statistical methods in pharmacology has become dominated by inferential methods, often chosen more by the availability of user-friendly software than by any understanding of the data set or the critical assumptions of the statistical tests. Such frank misuse of statistical methodology and the quest to reach the mystical α<0.05 criterion have hampered research via the publication of incorrect analyses driven by rudimentary statistical training. Perhaps more critically, a poor understanding of statistical tools limits the conclusions that may be drawn from a study by divorcing the investigator from their own data. The net result is a decrease in quality and confidence in research findings, fueling recent controversies over the reproducibility of high-profile findings and effects that appear to diminish over time. The recent development of "omics" approaches leading to the production of massive higher-dimensional data sets has amplified these issues, making it clear that new approaches are needed to appropriately and effectively mine this type of data. Unfortunately, statistical education in the field has not kept pace. This commentary provides a foundation for an intuitive understanding of statistics that fosters an exploratory approach and an appreciation for the assumptions of various statistical tests that hopefully will increase the correct use of statistics, the application of exploratory data analysis, and the use of statistical study design, with the goal of increasing reproducibility and confidence in the literature. Copyright © 2013. Published by Elsevier Inc.

  1. Time Series Expression Analyses Using RNA-seq: A Statistical Approach

    PubMed Central

    Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.

    2013-01-01

    RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021

  2. Time series expression analyses using RNA-seq: a statistical approach.

    PubMed

    Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P

    2013-01-01

    RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.

  3. Statistical Approaches Used to Assess the Equity of Access to Food Outlets: A Systematic Review

    PubMed Central

    Lamb, Karen E.; Thornton, Lukar E.; Cerin, Ester; Ball, Kylie

    2015-01-01

    Background Inequalities in eating behaviours are often linked to the types of food retailers accessible in neighbourhood environments. Numerous studies have aimed to identify if access to healthy and unhealthy food retailers is socioeconomically patterned across neighbourhoods, and thus a potential risk factor for dietary inequalities. Existing reviews have examined differences between methodologies, particularly focussing on neighbourhood and food outlet access measure definitions. However, no review has informatively discussed the suitability of the statistical methodologies employed; a key issue determining the validity of study findings. Our aim was to examine the suitability of statistical approaches adopted in these analyses. Methods Searches were conducted for articles published from 2000–2014. Eligible studies included objective measures of the neighbourhood food environment and neighbourhood-level socio-economic status, with a statistical analysis of the association between food outlet access and socio-economic status. Results Fifty-four papers were included. Outlet accessibility was typically defined as the distance to the nearest outlet from the neighbourhood centroid, or as the number of food outlets within a neighbourhood (or buffer). To assess if these measures were linked to neighbourhood disadvantage, common statistical methods included ANOVA, correlation, and Poisson or negative binomial regression. Although all studies involved spatial data, few considered spatial analysis techniques or spatial autocorrelation. Conclusions With advances in GIS software, sophisticated measures of neighbourhood outlet accessibility can be considered. However, approaches to statistical analysis often appear less sophisticated. Care should be taken to consider assumptions underlying the analysis and the possibility of spatially correlated residuals which could affect the results. PMID:29546115
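
    One of the modelling choices discussed in the review, count regression for outlet accessibility, can be sketched as follows. The data, variable names, and dispersion parameter are invented for illustration, and the model deliberately omits the spatial terms the authors recommend considering.

```python
# Hedged sketch: negative binomial regression of food-outlet counts per
# neighbourhood on a deprivation score (all data simulated).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
deprivation = rng.normal(0, 1, n)                       # neighbourhood SES score (invented)
mu = np.exp(1.2 + 0.3 * deprivation)                    # assume more outlets in deprived areas
outlets = rng.poisson(mu * rng.gamma(2.0, 0.5, n))      # overdispersed counts

X = sm.add_constant(deprivation)
model = sm.GLM(outlets, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()
print(model.summary())
# Note: the review's caution still applies -- residuals may be spatially
# autocorrelated, which a non-spatial GLM such as this one ignores.
```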

  4. Mathematical and Statistical Techniques for Systems Medicine: The Wnt Signaling Pathway as a Case Study.

    PubMed

    MacLean, Adam L; Harrington, Heather A; Stumpf, Michael P H; Byrne, Helen M

    2016-01-01

    The last decade has seen an explosion in models that describe phenomena in systems medicine. Such models are especially useful for studying signaling pathways, such as the Wnt pathway. In this chapter we use the Wnt pathway to showcase current mathematical and statistical techniques that enable modelers to gain insight into (models of) gene regulation and generate testable predictions. We introduce a range of modeling frameworks, but focus on ordinary differential equation (ODE) models since they remain the most widely used approach in systems biology and medicine and continue to offer great potential. We present methods for the analysis of a single model, comprising applications of standard dynamical systems approaches such as nondimensionalization, steady state, asymptotic and sensitivity analysis, and more recent statistical and algebraic approaches to compare models with data. We present parameter estimation and model comparison techniques, focusing on Bayesian analysis and coplanarity via algebraic geometry. Our intention is that this (non-exhaustive) review may serve as a useful starting point for the analysis of models in systems medicine.
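
    As a small illustration of the ODE-plus-sensitivity workflow described above, the sketch below simulates a generic two-compartment signalling module (not the authors' Wnt model) and computes normalised local sensitivities by finite differences; all parameter names and values are invented.

```python
# Generic signalling ODE sketch with finite-difference local sensitivity analysis.
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y, k_prod, k_deg, k_act):
    x, a = y                                   # x: inactive pool, a: active (e.g. nuclear) pool
    return [k_prod - k_deg * x - k_act * x,    # production, degradation, activation out
            k_act * x - k_deg * a]             # activation in, degradation out

def steady_active(params, t_end=200.0):
    sol = solve_ivp(rhs, (0.0, t_end), [1.0, 0.0], args=params, rtol=1e-8)
    return sol.y[1, -1]                        # active species at the final time

base = (1.0, 0.1, 0.5)                         # k_prod, k_deg, k_act (illustrative values)
out0 = steady_active(base)
for i, name in enumerate(["k_prod", "k_deg", "k_act"]):
    bumped = list(base)
    bumped[i] *= 1.01                          # 1% parameter perturbation
    sens = (steady_active(tuple(bumped)) - out0) / (0.01 * out0)
    print(name, round(sens, 3))                # normalised local sensitivity coefficient
```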

  5. Multivariate analysis: A statistical approach for computations

    NASA Astrophysics Data System (ADS)

    Michu, Sachin; Kaushik, Vandana

    2014-10-01

    Multivariate analysis is a statistical approach commonly used in automotive diagnosis, education, evaluating clusters in finance, etc., and more recently in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion of factor analysis (FA) in an image retrieval method and of correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database, based on their content. The problem is made more difficult by the high dimension of the variable space in which the images are represented. Multivariate correlation analysis proposes an anomaly detection and analysis method based on the correlation coefficient matrix. Anomalous behaviors in the network include the various attacks on the network, such as DDoS attacks and network scanning.

  6. Circularly-symmetric complex normal ratio distribution for scalar transmissibility functions. Part III: Application to statistical modal analysis

    NASA Astrophysics Data System (ADS)

    Yan, Wang-Ji; Ren, Wei-Xin

    2018-01-01

    This study applies the theoretical findings on the circularly-symmetric complex normal ratio distribution of Yan and Ren (2016) [1,2] to transmissibility-based modal analysis from a statistical viewpoint. A probabilistic model of the transmissibility function in the vicinity of the resonant frequency is formulated in the modal domain, and some insightful comments are offered. It theoretically reveals that the statistics of the transmissibility function around the resonant frequency depend solely on the 'noise-to-signal' ratio and the mode shapes. As a sequel to the development of the probabilistic model of the transmissibility function in the modal domain, this study poses the process of modal identification within a Bayesian framework by borrowing a novel paradigm. Implementation issues unique to the proposed approach are resolved by a Lagrange multiplier approach. This study also explores the possibility of applying Bayesian analysis to distinguishing harmonic components from structural ones. The approaches are verified through simulated data and experimental test data. The uncertainty behavior due to variation of different factors is also discussed in detail.

  7. Organization and Carrying out the Educational Experiment and Statistical Analysis of Its Results in IHL

    ERIC Educational Resources Information Center

    Sidorov, Oleg V.; Kozub, Lyubov' V.; Goferberg, Alexander V.; Osintseva, Natalya V.

    2018-01-01

    The article discusses the methodological approach to the technology of the educational experiment performance, the ways of the research data processing by means of research methods and methods of mathematical statistics. The article shows the integrated use of some effective approaches to the training of the students majoring in…

  8. A shift from significance test to hypothesis test through power analysis in medical research.

    PubMed

    Singh, G

    2006-01-01

    Until recently, the medical research literature exhibited a substantial dominance of Fisher's significance test approach to statistical inference, which concentrates on the probability of type I error, over the Neyman-Pearson hypothesis test, which considers the probabilities of both type I and type II errors. Fisher's approach dichotomises results into significant or non-significant on the basis of a P value. The Neyman-Pearson approach speaks of acceptance or rejection of the null hypothesis. Based on the same theory, these two approaches address the same objective and reach conclusions in their own ways. The advancement in computing techniques and the availability of statistical software have resulted in the increasing application of power calculations in medical research, and thereby in reporting the results of significance tests in the light of the power of the test as well. The significance test approach, when it incorporates power analysis, contains the essence of the hypothesis test approach. It may be safely argued that the rising application of power analysis in medical research may have initiated a shift from Fisher's significance test to the Neyman-Pearson hypothesis test procedure.
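
    Power calculations of the kind credited here with shifting practice are routine in modern software. A brief sketch, assuming a two-sample t-test with a medium standardized effect size (all numbers illustrative):

```python
# Solve for the sample size giving 80% power, then report achieved power for a fixed n.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
achieved = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05, ratio=1.0)
print(round(n_per_group), round(achieved, 3))   # about 64 per group; power with 30 per group
```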

  9. An issue of literacy on pediatric arterial hypertension

    NASA Astrophysics Data System (ADS)

    Teodoro, M. Filomena; Romana, Andreia; Simão, Carla

    2017-11-01

    Arterial hypertension in the pediatric age group is a public health problem whose prevalence has increased significantly over time. Pediatric arterial hypertension (PAH), a highly prevalent disease, is under-diagnosed in most cases and appears without notice, with multiple consequences for the health of children and of the adults they will become. Children's caregivers and close family must know of the existence of PAH, the negative consequences associated with it, and the risk factors, and, finally, must practise prevention. A statistical analysis of data from a simpler questionnaire, introduced in [4] as a preliminary study of caregivers' acquaintance with PAH, can be found in [12, 13]. A continuation of that analysis is detailed in [14]. An extension of that questionnaire was built, applied to a distinct population, and completed online. The statistical approach is partially reproduced in the present work. Several statistical models were estimated using different approaches, namely multivariate analysis (factor analysis) and other methods adequate for the kind of data under study.

  10. Improved Statistics for Genome-Wide Interaction Analysis

    PubMed Central

    Ueki, Masao; Cordell, Heather J.

    2012-01-01

    Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result. PMID:22496670

  11. A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

    PubMed Central

    Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

    2016-01-01

    Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330
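
    The core idea can be sketched compactly. The code below is not the authors' R program: it pools invented per-trial estimates of paths a and b with simple inverse-variance (fixed-effect) weights for brevity, where the paper uses a random-effects model, and then forms a Monte Carlo confidence interval for the combined mediated effect a*b.

```python
# Pool per-trial path estimates, then Monte Carlo CI for the combined mediated effect.
import numpy as np

# (a_hat, se_a, b_hat, se_b) from three hypothetical trials
trials = np.array([[0.40, 0.10, 0.30, 0.08],
                   [0.35, 0.12, 0.25, 0.09],
                   [0.50, 0.15, 0.35, 0.10]])

def pool(est, se):
    # Inverse-variance pooling (fixed-effect for brevity; a random-effects version
    # would add an estimated between-trial variance to se**2 in the weights).
    w = 1.0 / se**2
    return np.sum(w * est) / np.sum(w), np.sqrt(1.0 / np.sum(w))

a_bar, se_a = pool(trials[:, 0], trials[:, 1])
b_bar, se_b = pool(trials[:, 2], trials[:, 3])

rng = np.random.default_rng(2)
ab = rng.normal(a_bar, se_a, 100_000) * rng.normal(b_bar, se_b, 100_000)
lo, hi = np.percentile(ab, [2.5, 97.5])          # Monte Carlo 95% confidence interval
print(a_bar * b_bar, (lo, hi))
```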

  12. Variation in reaction norms: Statistical considerations and biological interpretation.

    PubMed

    Morrissey, Michael B; Liefting, Maartje

    2016-09-01

    Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.

  13. A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants

    PubMed Central

    Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.

    2016-01-01

    Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286

  14. Comparison of a non-stationary voxelation-corrected cluster-size test with TFCE for group-level MRI inference.

    PubMed

    Li, Huanjie; Nickerson, Lisa D; Nichols, Thomas E; Gao, Jia-Hong

    2017-03-01

    Two powerful methods for statistical inference on MRI brain images have been proposed recently, a non-stationary voxelation-corrected cluster-size test (CST) based on random field theory and threshold-free cluster enhancement (TFCE) based on calculating the level of local support for a cluster, then using permutation testing for inference. Unlike other statistical approaches, these two methods do not rest on the assumptions of a uniform and high degree of spatial smoothness of the statistic image. Thus, they are strongly recommended for group-level fMRI analysis compared to other statistical methods. In this work, the non-stationary voxelation-corrected CST and TFCE methods for group-level analysis were evaluated for both stationary and non-stationary images under varying smoothness levels, degrees of freedom and signal to noise ratios. Our results suggest that both methods provide adequate control for the number of voxel-wise statistical tests being performed during inference on fMRI data and that they are both superior to current CSTs implemented in popular MRI data analysis software packages. However, TFCE is more sensitive and stable for group-level analysis of VBM data. Thus, the voxelation-corrected CST approach may confer some advantages by being computationally less demanding for fMRI data analysis than TFCE with permutation testing and by also being applicable for single-subject fMRI analyses, while the TFCE approach is advantageous for VBM data. Hum Brain Mapp 38:1269-1280, 2017. © 2016 Wiley Periodicals, Inc.
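
    For readers unfamiliar with TFCE, a didactic one-dimensional version is sketched below; real implementations operate on 3-D statistic images and pair the enhanced map with permutation testing for inference. The height and extent exponents follow the commonly cited defaults, and the input t-values are invented.

```python
# Didactic 1-D TFCE: integrate cluster extent^E * height^H over thresholds.
import numpy as np
from scipy.ndimage import label

def tfce_1d(stat, dh=0.1, E=0.5, H=2.0):
    out = np.zeros_like(stat, dtype=float)
    for h in np.arange(dh, stat.max() + dh, dh):
        mask = stat >= h                       # supra-threshold points at height h
        labels, n = label(mask)                # connected clusters at this threshold
        for c in range(1, n + 1):
            idx = labels == c
            extent = idx.sum()                 # cluster extent e(h)
            out[idx] += (extent ** E) * (h ** H) * dh
    return out

t_values = np.array([0.2, 1.5, 2.8, 3.1, 2.9, 0.4, 0.1, 2.2, 2.4, 0.3])
print(np.round(tfce_1d(t_values), 2))
```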

  15. Accounting for standard errors of vision-specific latent trait in regression models.

    PubMed

    Wong, Wan Ling; Li, Xiang; Li, Jialiang; Wong, Tien Yin; Cheng, Ching-Yu; Lamoureux, Ecosse L

    2014-07-11

    To demonstrate the effectiveness of a Hierarchical Bayesian (HB) approach in a modeling framework for association effects that accounts for SEs of vision-specific latent traits assessed using Rasch analysis. A systematic literature review was conducted in four major ophthalmic journals to evaluate Rasch analyses performed on vision-specific instruments. The HB approach was used to synthesize the Rasch model and the multiple linear regression model for the assessment of association effects related to vision-specific latent traits. This novel HB one-stage "joint-analysis" approach allows all model parameters to be estimated simultaneously; its effectiveness was compared in our simulation study with the frequently used two-stage "separate-analysis" approach (Rasch analysis followed by traditional statistical analyses without adjustment for the SE of the latent trait). Sixty-six reviewed articles performed evaluation and validation of vision-specific instruments using Rasch analysis, and 86.4% (n = 57) performed further statistical analyses on the Rasch-scaled data using traditional statistical methods; none took into consideration the SEs of the estimated Rasch-scaled scores. The two models differed on real data in effect-size estimation and in the identification of "independent risk factors." Simulation results showed that our proposed HB one-stage "joint-analysis" approach produces greater accuracy (on average a 5-fold decrease in bias) with comparable power and precision in the estimation of associations when compared with the frequently used two-stage "separate-analysis" procedure, despite accounting for greater uncertainty due to the latent trait. Patient-reported data analysed using Rasch analysis techniques do not take into account the SE of the latent trait in association analyses. The HB one-stage "joint-analysis" is a better approach, producing accurate effect-size estimations and information about the independent association of exposure variables with vision-specific latent traits. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  16. Assessing compositional variability through graphical analysis and Bayesian statistical approaches: case studies on transgenic crops.

    PubMed

    Harrigan, George G; Harrison, Jay M

    2012-01-01

    New transgenic (GM) crops are subjected to extensive safety assessments that include compositional comparisons with conventional counterparts as a cornerstone of the process. The influence of germplasm, location, environment, and agronomic treatments on compositional variability is, however, often obscured in these pair-wise comparisons. Furthermore, classical statistical significance testing can often provide an incomplete and over-simplified summary of highly responsive variables such as crop composition. In order to more clearly describe the influence of the numerous sources of compositional variation we present an introduction to two alternative but complementary approaches to data analysis and interpretation. These include i) exploratory data analysis (EDA) with its emphasis on visualization and graphics-based approaches and ii) Bayesian statistical methodology that provides easily interpretable and meaningful evaluations of data in terms of probability distributions. The EDA case-studies include analyses of herbicide-tolerant GM soybean and insect-protected GM maize and soybean. Bayesian approaches are presented in an analysis of herbicide-tolerant GM soybean. Advantages of these approaches over classical frequentist significance testing include the more direct interpretation of results in terms of probabilities pertaining to quantities of interest and no confusion over the application of corrections for multiple comparisons. It is concluded that a standardized framework for these methodologies could provide specific advantages through enhanced clarity of presentation and interpretation in comparative assessments of crop composition.

  17. A statistical approach to EMI - Theory and experiment

    NASA Astrophysics Data System (ADS)

    Weiner, Donald; Capraro, Gerard

    A probabilistic approach to electromagnetic interference (EMI) is presented. The approach is illustrated by analyzing an experimental circuit in which EMI occurs. Both random and weakly nonlinear effects are accounted for in the analysis.

  18. AN EMPIRICAL BAYES APPROACH TO COMBINING ESTIMATES OF THE VALUE OF A STATISTICAL LIFE FOR ENVIRONMENTAL POLICY ANALYSIS

    EPA Science Inventory

    This analysis updates EPA's standard VSL estimate by using a more comprehensive collection of VSL studies that include studies published between 1992 and 2000, as well as applying a more appropriate statistical method. We provide a pooled effect VSL estimate by applying the empi...

  19. Meta-analysis and The Cochrane Collaboration: 20 years of the Cochrane Statistical Methods Group

    PubMed Central

    2013-01-01

    The Statistical Methods Group has played a pivotal role in The Cochrane Collaboration over the past 20 years. The Statistical Methods Group has determined the direction of statistical methods used within Cochrane reviews, developed guidance for these methods, provided training, and continued to discuss and consider new and controversial issues in meta-analysis. The contribution of Statistical Methods Group members to the meta-analysis literature has been extensive and has helped to shape the wider meta-analysis landscape. In this paper, marking the 20th anniversary of The Cochrane Collaboration, we reflect on the history of the Statistical Methods Group, beginning in 1993 with the identification of aspects of statistical synthesis for which consensus was lacking about the best approach. We highlight some landmark methodological developments that Statistical Methods Group members have contributed to in the field of meta-analysis. We discuss how the Group implements and disseminates statistical methods within The Cochrane Collaboration. Finally, we consider the importance of robust statistical methodology for Cochrane systematic reviews, note research gaps, and reflect on the challenges that the Statistical Methods Group faces in its future direction. PMID:24280020

  20. Morphological texture assessment of oral bone as a screening tool for osteoporosis

    NASA Astrophysics Data System (ADS)

    Analoui, Mostafa; Eggertsson, Hafsteinn; Eckert, George

    2001-07-01

    Three classes of texture analysis approaches were employed to assess the textural characteristics of oral bone. A set of linear structuring elements was used to compute granulometric features of trabecular bone. Multifractal analysis was also used to compute the fractal dimension of the corresponding tissues. In addition, some statistical features and histomorphometric parameters were computed. To assess the proposed approach, we acquired digital intraoral radiographs of 47 subjects (14 males and 33 females). All radiographs were captured at 12 bits/pixel. Images were converted to binary form through a sliding, locally adaptive thresholding approach. Each subject was scanned by DEXA for bone densitometry. Subjects were classified into one of the following three categories according to the World Health Organization (WHO) standard: (1) healthy, (2) with osteopenia, and (3) with osteoporosis. In this study, fractal dimension showed very low correlation with bone mineral density (BMD) measurements, which did not reach the level of statistical significance (p<0.5). However, the entropy of the pattern spectrum (EPS), along with statistical features and histomorphometric parameters, showed correlation coefficients ranging from low to high, with statistical significance for both males and females. The results of this study indicate the utility of this approach for bone texture analysis. It is conjectured that designing a 2-D structuring element, specially tuned to trabecular bone texture, will increase the efficacy of the proposed method.

  1. Comparative Performance Analysis of a Hyper-Temporal Ndvi Analysis Approach and a Landscape-Ecological Mapping Approach

    NASA Astrophysics Data System (ADS)

    Ali, A.; de Bie, C. A. J. M.; Scarrott, R. G.; Ha, N. T. T.; Skidmore, A. K.

    2012-07-01

    Both agricultural area expansion and intensification are necessary to cope with the growing demand for food, and the growing threat of food insecurity, which is rapidly engulfing poor and under-privileged sections of the global population. Therefore, it is of paramount importance to have the ability to accurately estimate crop area and spatial distribution. Remote sensing has become a valuable tool for estimating and mapping cropland areas, useful in food security monitoring. This work contributes to addressing this broad issue, focusing on the comparative performance analysis of two mapping approaches: (i) a hyper-temporal Normalized Difference Vegetation Index (NDVI) analysis approach and (ii) a Landscape-ecological approach. The hyper-temporal NDVI analysis approach utilized SPOT 10-day NDVI imagery from April 1998-December 2008, whilst the Landscape-ecological approach used multitemporal Landsat-7 ETM+ imagery acquired intermittently between 1992 and 2002. Pixels in the time-series NDVI dataset were clustered using an ISODATA clustering algorithm adapted to determine the optimal number of pixel clusters to successfully generalize hyper-temporal datasets. Clusters were then characterized with crop cycle and flooding information to produce an NDVI unit map of rice classes with flood regime and NDVI profile information. A Landscape-ecological map was generated using a combination of digitized homogenous map units in the Landsat-7 ETM+ imagery, a 2005 land use map of the Mekong delta, and supplementary datasets on the region's terrain, geomorphology, and flooding depths. The output maps were validated using reported crop statistics, and regression analyses were used to ascertain the relationship between the land use areas estimated from the maps and those reported in district crop statistics. The regression analysis showed that the hyper-temporal NDVI analysis approach explained 74% and 76% of the variability in reported crop statistics in two-rice-crop and three-rice-crop land use systems, respectively. In contrast, 64% and 63% of the variability was explained, respectively, by the Landscape-ecological map. Overall, the results indicate the hyper-temporal NDVI analysis approach is more accurate and more useful in exploring when, why and how agricultural land use manifests itself in space and time. Furthermore, the NDVI analysis approach was found to be easier to implement, was more cost-effective, and involved less subjective user intervention than the landscape-ecological approach.

  2. SOCR: Statistics Online Computational Resource

    ERIC Educational Resources Information Center

    Dinov, Ivo D.

    2006-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an…

  3. Improved Test Planning and Analysis Through the Use of Advanced Statistical Methods

    NASA Technical Reports Server (NTRS)

    Green, Lawrence L.; Maxwell, Katherine A.; Glass, David E.; Vaughn, Wallace L.; Barger, Weston; Cook, Mylan

    2016-01-01

    The goal of this work is, through computational simulations, to provide statistically-based evidence to convince the testing community that a distributed testing approach is superior to a clustered testing approach for most situations. For clustered testing, numerous, repeated test points are acquired at a limited number of test conditions. For distributed testing, only one or a few test points are requested at many different conditions. The statistical techniques of Analysis of Variance (ANOVA), Design of Experiments (DOE) and Response Surface Methods (RSM) are applied to enable distributed test planning, data analysis and test augmentation. The D-Optimal class of DOE is used to plan an optimally efficient single- and multi-factor test. The resulting simulated test data are analyzed via ANOVA and a parametric model is constructed using RSM. Finally, ANOVA can be used to plan a second round of testing to augment the existing data set with new data points. The use of these techniques is demonstrated through several illustrative examples. To date, many thousands of comparisons have been performed and the results strongly support the conclusion that the distributed testing approach outperforms the clustered testing approach.
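
    The RSM step can be illustrated with a short sketch: fitting a full second-order response surface to distributed test points by ordinary least squares. The factor names, response, and data are invented stand-ins, not the study's test quantities.

```python
# Hedged sketch: quadratic response surface fit to simulated distributed test points.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 40
df = pd.DataFrame({"alpha": rng.uniform(-5, 5, n),     # hypothetical factor 1
                   "mach": rng.uniform(0.3, 0.9, n)})  # hypothetical factor 2
df["cl"] = (0.1 + 0.08 * df.alpha - 0.002 * df.alpha**2
            + 0.05 * df.mach + rng.normal(0, 0.01, n))  # simulated response

# Full second-order model: main effects, interaction, and pure quadratic terms.
model = smf.ols("cl ~ alpha + mach + alpha:mach + I(alpha**2) + I(mach**2)", df).fit()
print(model.summary())                                  # per-term t/F statistics, as in ANOVA
```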

  4. Combined statistical analyses for long-term stability data with multiple storage conditions: a simulation study.

    PubMed

    Almalik, Osama; Nijhuis, Michiel B; van den Heuvel, Edwin R

    2014-01-01

    Shelf-life estimation usually requires that at least three registration batches are tested for stability at multiple storage conditions. The shelf-life estimates are often obtained by linear regression analysis per storage condition, an approach implicitly suggested by ICH guideline Q1E. A linear regression analysis combining all data from multiple storage conditions was recently proposed in the literature when variances are homogeneous across storage conditions. The combined analysis is expected to perform better than the separate analysis per storage condition, since pooling data would lead to an improved estimate of the variation and higher numbers of degrees of freedom, but this is not evident for shelf-life estimation. Indeed, the two approaches treat the observed initial batch results, the intercepts in the model, and poolability of batches differently, which may eliminate or reduce the expected advantage of the combined approach with respect to the separate approach. Therefore, a simulation study was performed to compare the distribution of simulated shelf-life estimates on several characteristics between the two approaches and to quantify the difference in shelf-life estimates. In general, the combined statistical analysis does estimate the true shelf life more consistently and precisely than the analysis per storage condition, but it did not outperform the separate analysis in all circumstances.

  5. Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice

    PubMed Central

    Stewart, Gavin B.; Altman, Douglas G.; Askie, Lisa M.; Duley, Lelia; Simmonds, Mark C.; Stewart, Lesley A.

    2012-01-01

    Background Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way. Methods and Findings We included data from 24 randomised controlled trials, evaluating antiplatelet agents, for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using anti-platelets (Relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model. Conclusions For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials. PMID:23056232
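
    The "two-stage" route described above is easy to sketch: compute an effect estimate per trial, then pool. The code below uses invented 2x2 counts, a log relative risk per trial, and DerSimonian-Laird random-effects pooling; it illustrates the general approach, not a re-analysis of the pre-eclampsia data.

```python
# Two-stage meta-analysis: per-trial log relative risk, then random-effects pooling.
import numpy as np
from scipy import stats

# events_treat, n_treat, events_ctrl, n_ctrl for four hypothetical trials
trials = np.array([[30, 400, 40, 400],
                   [12, 150, 18, 150],
                   [55, 700, 70, 710],
                   [ 8, 120, 11, 118]], dtype=float)
e1, n1, e0, n0 = trials.T

# Stage 1: per-trial effect estimate and its variance.
log_rr = np.log((e1 / n1) / (e0 / n0))
var = 1/e1 - 1/n1 + 1/e0 - 1/n0

# Stage 2: DerSimonian-Laird random-effects pooling.
w_f = 1 / var
mu_f = np.sum(w_f * log_rr) / np.sum(w_f)
q = np.sum(w_f * (log_rr - mu_f) ** 2)
tau2 = max(0.0, (q - (len(trials) - 1)) / (np.sum(w_f) - np.sum(w_f**2) / np.sum(w_f)))
w = 1 / (var + tau2)
pooled, se = np.sum(w * log_rr) / np.sum(w), np.sqrt(1 / np.sum(w))
ci = np.exp(pooled + np.array([-1.96, 1.96]) * se)
print(np.exp(pooled), ci, 2 * stats.norm.sf(abs(pooled / se)))  # pooled RR, 95% CI, p-value
```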

  6. Assessment of statistical methods used in library-based approaches to microbial source tracking.

    PubMed

    Ritter, Kerry J; Carruthers, Ethan; Carson, C Andrew; Ellender, R D; Harwood, Valerie J; Kingsley, Kyle; Nakatsu, Cindy; Sadowsky, Michael; Shear, Brian; West, Brian; Whitlock, John E; Wiggins, Bruce A; Wilbur, Jayson D

    2003-12-01

    Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
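
    One of the compared pattern-matching methods, discriminant analysis with its rate of correct classification, can be sketched as follows. The feature matrix and source labels are simulated stand-ins for rep-PCR or antibiotic resistance profiles, not data from the six libraries.

```python
# Cross-validated correct-classification rate for discriminant analysis of a
# simulated known-source library.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
sources = np.repeat(["human", "gull", "cow", "dog"], 50)      # known-source isolates
X = rng.normal(0, 1, (200, 12))                               # fingerprint features (simulated)
X[sources == "human", :3] += 1.0                              # give some sources a weak signature
X[sources == "cow", 3:6] += 1.0

lda = LinearDiscriminantAnalysis()
rates = cross_val_score(lda, X, sources, cv=5)                # per-fold correct-classification rate
print(rates.mean())
```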

  7. Investigation of Structure/Fluid Interaction in Piping Systems Using an Intensity Measurement Approach.

    DTIC Science & Technology

    1987-07-01

    …of vibrational power flow had been considered by experiments in the area of statistical energy analysis (SEA) [8, 9] using other measurement approaches…

  8. Discovering Randomness, Recovering Expertise: The Different Approaches to the Quality in Measurement of Coulomb and Gauss and of Today's Students

    ERIC Educational Resources Information Center

    Heinicke, Susanne; Heering, Peter

    2013-01-01

    The aim of this paper is to discuss different approaches to the quality (or uncertainty) of measurement data considering both historical examples and today's students' views. Today's teaching of data analysis is very much focussed on the application of statistical routines (often called the "Gaussian approach" to error analysis). Studies on…

  9. Statistical Power Analysis with Microsoft Excel: Normal Tests for One or Two Means as a Prelude to Using Non-Central Distributions to Calculate Power

    ERIC Educational Resources Information Center

    Texeira, Antonio; Rosa, Alvaro; Calapez, Teresa

    2009-01-01

    This article presents statistical power analysis (SPA) based on the normal distribution using Excel, adopting textbook and SPA approaches. The objective is to present the latter in a comparative way within a framework that is familiar to textbook level readers, as a first step to understand SPA with other distributions. The analysis focuses on the…
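
    The normal-distribution power calculation that the article builds in a spreadsheet reduces to a few lines of code. A minimal sketch for a two-sided, two-sample z-test with known common sigma (all numbers illustrative):

```python
# Power of a two-sided, two-sample z-test for a difference in means, from the normal CDF.
from scipy.stats import norm

def power_two_means(delta, sigma, n_per_group, alpha=0.05):
    se = sigma * (2.0 / n_per_group) ** 0.5          # SE of the difference in means
    z_crit = norm.ppf(1 - alpha / 2)
    shift = delta / se                                # standardized shift under the alternative
    return norm.cdf(-z_crit + shift) + norm.cdf(-z_crit - shift)

print(round(power_two_means(delta=5.0, sigma=10.0, n_per_group=63), 3))  # about 0.80
```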

  10. From fields to objects: A review of geographic boundary analysis

    NASA Astrophysics Data System (ADS)

    Jacquez, G. M.; Maruca, S.; Fortin, M.-J.

    Geographic boundary analysis is a relatively new approach unfamiliar to many spatial analysts. It is best viewed as a technique for defining objects - geographic boundaries - on spatial fields, and for evaluating the statistical significance of characteristics of those boundary objects. This is accomplished using null spatial models representative of the spatial processes expected in the absence of boundary-generating phenomena. Close ties to the object-field dialectic eminently suit boundary analysis to GIS data. The majority of existing spatial methods are field-based in that they describe, estimate, or predict how attributes (variables defining the field) vary through geographic space. Such methods are appropriate for field representations but not object representations. As the object-field paradigm gains currency in geographic information science, appropriate techniques for the statistical analysis of objects are required. The methods reviewed in this paper are a promising foundation. Geographic boundary analysis is clearly a valuable addition to the spatial statistical toolbox. This paper presents the philosophy of, and motivations for geographic boundary analysis. It defines commonly used statistics for quantifying boundaries and their characteristics, as well as simulation procedures for evaluating their significance. We review applications of these techniques, with the objective of making this promising approach accessible to the GIS-spatial analysis community. We also describe the implementation of these methods within geographic boundary analysis software: GEM.

  11. Exploration of the Maximum Entropy/Optimal Projection Approach to Control Design Synthesis for Large Space Structures.

    DTIC Science & Technology

    1985-02-01

    Energy Analysis, a branch of dynamic modal analysis developed for analyzing acoustic vibration problems, its present stage of development embodies a...Maximum Entropy Stochastic Modelling and Reduced-Order Design Synthesis is a rigorous new approach to this class of problems. Inspired by Statistical

  12. Modeling and replicating statistical topology and evidence for CMB nonhomogeneity

    PubMed Central

    Agami, Sarit

    2017-01-01

    Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional, data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “TDA,” or “topological data analysis,” one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis—the typical case for big data applications—the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity. PMID:29078301

  13. Approaching Career Criminals With An Intelligence Cycle

    DTIC Science & Technology

    2015-12-01

    including arrest statistics, and “arrest statistics have been used as the main barometer of juvenile delinquent activity, (but) many juvenile... Statistical Briefing Book”... guided by theories about the causes of delinquent behavior, but there was no determination if those efforts achieved the...children.” However, the most evidence-based comparison of juvenile delinquency reduction programs is the statistical meta-analysis (a systematic

  14. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  15. Multivariate meta-analysis: a robust approach based on the theory of U-statistic.

    PubMed

    Ma, Yan; Mazumdar, Madhu

    2011-10-30

    Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously taking into account the correlation between the outcomes. Likelihood-based approaches, in particular restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analysis with small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform equally well to REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistic and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect on estimates from REML because of non-normal data distribution is marginal and that the estimates from MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods are illustrated by their application to data from two published meta-analysis from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistic for testing significance of between-study heterogeneity and for extending the work to meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.

  16. Meta-analysis of gene-level associations for rare variants based on single-variant statistics.

    PubMed

    Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu

    2013-08-08

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
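    A minimal sketch of the central insight, assuming a burden-type (weighted-sum) gene-level statistic rather than the full set of tests produced by the authors' software: single-variant z-scores plus the correlation matrix of the single-variant statistics are enough to form a gene-level test.

```python
# Minimal sketch (not the authors' software): a weighted burden-type
# gene-level test recovered from single-variant z-scores and their
# correlation matrix.
import numpy as np
from scipy import stats

def burden_from_single_variant(z, R, w=None):
    """z: single-variant z-scores; R: correlation matrix of those test
    statistics (e.g. estimated from one study or a reference panel);
    w: per-variant weights (defaults to equal weights)."""
    z = np.asarray(z, float)
    R = np.asarray(R, float)
    w = np.ones_like(z) if w is None else np.asarray(w, float)
    t = w @ z                       # combined statistic
    var = w @ R @ w                 # its variance under the null
    z_gene = t / np.sqrt(var)
    p = 2 * stats.norm.sf(abs(z_gene))
    return z_gene, p

# Toy example: three rare variants with mildly correlated test statistics.
z = [1.8, 2.1, 0.4]
R = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(burden_from_single_variant(z, R))
```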

  17. Measuring, Understanding, and Responding to Covert Social Networks: Passive and Active Tomography

    DTIC Science & Technology

    2017-11-29

    Methods for generating a random sample of networks with desired properties are important tools for the analysis of social, biological, and information...on Theoretical Foundations for Statistical Network Analysis at the Isaac Newton Institute for Mathematical Sciences at Cambridge U. (organized by...Approach: social sciences, statistics, EECS; problems span three disciplines and scientific focus is needed at the interfaces

  18. Data management in large-scale collaborative toxicity studies: how to file experimental data for automated statistical analysis.

    PubMed

    Stanzel, Sven; Weimer, Marc; Kopp-Schneider, Annette

    2013-06-01

    High-throughput screening approaches are carried out for the toxicity assessment of a large number of chemical compounds. In such large-scale in vitro toxicity studies several hundred or thousand concentration-response experiments are conducted. The automated evaluation of concentration-response data using statistical analysis scripts saves time and yields more consistent results in comparison to data analysis performed by the use of menu-driven statistical software. Automated statistical analysis requires that concentration-response data are available in a standardised data format across all compounds. To obtain consistent data formats, a standardised data management workflow must be established, including guidelines for data storage, data handling and data extraction. In this paper two procedures for data management within large-scale toxicological projects are proposed. Both procedures are based on Microsoft Excel files as the researcher's primary data format and use a computer programme to automate the handling of data files. The first procedure assumes that data collection has not yet started whereas the second procedure can be used when data files already exist. Successful implementation of the two approaches into the European project ACuteTox is illustrated. Copyright © 2012 Elsevier Ltd. All rights reserved.
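    A minimal sketch of the automation idea, not the ACuteTox implementation; the file layout ("data/*.xlsx" with "concentration" and "response" columns) and the four-parameter logistic model are assumptions for illustration.

```python
# Minimal sketch: loop over standardised Excel files and automatically fit a
# four-parameter logistic concentration-response curve to each compound.
import glob
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def four_pl(conc, top, bottom, ec50, hill):
    """Four-parameter logistic concentration-response model."""
    return bottom + (top - bottom) / (1.0 + (conc / ec50) ** hill)

results = []
# Assumed convention: one file per compound, columns "concentration" and "response".
for path in glob.glob("data/*.xlsx"):
    df = pd.read_excel(path)
    x, y = df["concentration"].to_numpy(), df["response"].to_numpy()
    p0 = [y.max(), y.min(), np.median(x), 1.0]          # rough starting values
    params, _ = curve_fit(four_pl, x, y, p0=p0, maxfev=10000)
    results.append({"file": path, "ec50": params[2]})

print(pd.DataFrame(results))
```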

  19. Teaching statistics in biology: using inquiry-based learning to strengthen understanding of statistical analysis in biology laboratory courses.

    PubMed

    Metz, Anneke M

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9%, improvement p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study.

  20. Challenges in Biomarker Discovery: Combining Expert Insights with Statistical Analysis of Complex Omics Data

    PubMed Central

    McDermott, Jason E.; Wang, Jing; Mitchell, Hugh; Webb-Robertson, Bobbie-Jo; Hafen, Ryan; Ramey, John; Rodland, Karin D.

    2012-01-01

    Introduction The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful molecular signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities for more sophisticated approaches to integrating purely statistical and expert knowledge-based approaches. Areas covered In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered in deriving valid and useful signatures of disease. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to identify predictive signatures of disease are key to future success in the biomarker field. We will describe our recommendations for possible approaches to this problem including metrics for the evaluation of biomarkers. PMID:23335946

  1. Ankle plantarflexion strength in rearfoot and forefoot runners: a novel cluster-analytic approach.

    PubMed

    Liebl, Dominik; Willwacher, Steffen; Hamill, Joseph; Brüggemann, Gert-Peter

    2014-06-01

    The purpose of the present study was to test for differences in ankle plantarflexion strengths of habitually rearfoot and forefoot runners. In order to approach this issue, we revisit the problem of classifying different footfall patterns in human runners. A dataset of 119 subjects running shod and barefoot (speed 3.5 m/s) was analyzed. The footfall patterns were clustered by a novel statistical approach, which is motivated by advances in the statistical literature on functional data analysis. We explain the novel statistical approach in detail and compare it to the classically used strike index of Cavanagh and Lafortune (1980). The two groups found by the new cluster approach are well interpretable as forefoot and rearfoot footfall groups. The subsequent comparison study of the clustered subjects reveals that runners with a forefoot footfall pattern are capable of producing significantly higher joint moments in a maximum voluntary contraction (MVC) of their ankle plantarflexor muscle-tendon units; difference in means: 0.28 Nm/kg. This effect remains significant after controlling for an additional gender effect and for differences in training levels. Our analysis confirms the hypothesis that forefoot runners have a higher mean MVC plantarflexion strength than rearfoot runners. Furthermore, we demonstrate that our proposed stochastic cluster analysis provides a robust and useful framework for clustering foot strikes. Copyright © 2014 Elsevier B.V. All rights reserved.

  2. Daniel Goodman’s empirical approach to Bayesian statistics

    USGS Publications Warehouse

    Gerrodette, Tim; Ward, Eric; Taylor, Rebecca L.; Schwarz, Lisa K.; Eguchi, Tomoharu; Wade, Paul; Himes Boor, Gina

    2016-01-01

    Bayesian statistics, in contrast to classical statistics, uses probability to represent uncertainty about the state of knowledge. Bayesian statistics has often been associated with the idea that knowledge is subjective and that a probability distribution represents a personal degree of belief. Dr. Daniel Goodman considered this viewpoint problematic for issues of public policy. He sought to ground his Bayesian approach in data, and advocated the construction of a prior as an empirical histogram of “similar” cases. In this way, the posterior distribution that results from a Bayesian analysis combined comparable previous data with case-specific current data, using Bayes’ formula. Goodman championed such a data-based approach, but he acknowledged that it was difficult in practice. If based on a true representation of our knowledge and uncertainty, Goodman argued that risk assessment and decision-making could be an exact science, despite the uncertainties. In his view, Bayesian statistics is a critical component of this science because a Bayesian analysis produces the probabilities of future outcomes. Indeed, Goodman maintained that the Bayesian machinery, following the rules of conditional probability, offered the best legitimate inference from available data. We give an example of an informative prior in a recent study of Steller sea lion spatial use patterns in Alaska.
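    A minimal sketch of the empirical-prior idea with invented numbers: the prior is built as a histogram of estimates from "similar" past cases, and Bayes' formula combines it with case-specific binomial data on a grid.

```python
# Minimal sketch of an empirical-histogram prior updated with case-specific
# data via Bayes' formula (illustrative numbers only).
import numpy as np

# Survival rates estimated in comparable previous studies (hypothetical).
past_estimates = np.array([0.78, 0.82, 0.85, 0.80, 0.74, 0.88, 0.81])

grid = np.linspace(0.01, 0.99, 99)
hist, edges = np.histogram(past_estimates, bins=10, range=(0, 1), density=True)
prior = hist[np.clip(np.digitize(grid, edges) - 1, 0, len(hist) - 1)]
prior = prior / prior.sum()

# Case-specific current data: 37 survivors out of 45 animals (hypothetical).
k, n = 37, 45
likelihood = grid**k * (1 - grid)**(n - k)      # binomial kernel

posterior = prior * likelihood
posterior /= posterior.sum()
print("posterior mean survival rate:", np.sum(grid * posterior))
```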

  3. A scaling procedure for the response of an isolated system with high modal overlap factor

    NASA Astrophysics Data System (ADS)

    De Rosa, S.; Franco, F.

    2008-10-01

    The paper deals with a numerical approach that reduces some physical sizes of the solution domain to compute the dynamic response of an isolated system: it has been named Asymptotical Scaled Modal Analysis (ASMA). The proposed numerical procedure alters the input data needed to obtain the classic modal responses to increase the frequency band of validity of the discrete or continuous coordinates model through the definition of a proper scaling coefficient. It is demonstrated that the computational cost remains acceptable while the frequency range of analysis increases. Moreover, with reference to the flexural vibrations of a rectangular plate, the paper compares ASMA with statistical energy analysis and the energy distribution approach. Some insights are also given about the limits of the scaling coefficient. Finally, it is shown that the linear dynamic response, predicted with the scaling procedure, has the same quality and characteristics as statistical energy analysis, but it can be useful when the system cannot be solved appropriately by the standard Statistical Energy Analysis (SEA).

  4. Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis

    ERIC Educational Resources Information Center

    Young, Cristobal; Holsteen, Katherine

    2017-01-01

    Model uncertainty is pervasive in social science. A key question is how robust empirical results are to sensible changes in model specification. We present a new approach and applied statistical software for computational multimodel analysis. Our approach proceeds in two steps: First, we estimate the modeling distribution of estimates across all…

  5. Dealing with missing standard deviation and mean values in meta-analysis of continuous outcomes: a systematic review.

    PubMed

    Weir, Christopher J; Butcher, Isabella; Assi, Valentina; Lewis, Stephanie C; Murray, Gordon D; Langhorne, Peter; Brady, Marian C

    2018-03-07

    Rigorous, informative meta-analyses rely on availability of appropriate summary statistics or individual participant data. For continuous outcomes, especially those with naturally skewed distributions, summary information on the mean or variability often goes unreported. While full reporting of original trial data is the ideal, we sought to identify methods for handling unreported mean or variability summary statistics in meta-analysis. We undertook two systematic literature reviews to identify methodological approaches used to deal with missing mean or variability summary statistics. Five electronic databases were searched, in addition to the Cochrane Colloquium abstract books and the Cochrane Statistics Methods Group mailing list archive. We also conducted cited reference searching and emailed topic experts to identify recent methodological developments. Details recorded included the description of the method, the information required to implement the method, any underlying assumptions and whether the method could be readily applied in standard statistical software. We provided a summary description of the methods identified, illustrating selected methods in example meta-analysis scenarios. For missing standard deviations (SDs), following screening of 503 articles, fifteen methods were identified in addition to those reported in a previous review. These included Bayesian hierarchical modelling at the meta-analysis level; summary statistic level imputation based on observed SD values from other trials in the meta-analysis; a practical approximation based on the range; and algebraic estimation of the SD based on other summary statistics. Following screening of 1124 articles for methods estimating the mean, one approximate Bayesian computation approach and three papers based on alternative summary statistics were identified. Illustrative meta-analyses showed that when replacing a missing SD the approximation using the range minimised loss of precision and generally performed better than omitting trials. When estimating missing means, a formula using the median, lower quartile and upper quartile performed best in preserving the precision of the meta-analysis findings, although in some scenarios, omitting trials gave superior results. Methods based on summary statistics (minimum, maximum, lower quartile, upper quartile, median) reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or variability summary statistics within meta-analyses.
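    Two commonly cited approximations of the kind reviewed above are sketched below (the exact formulas evaluated in the review may differ): a missing SD estimated from the range, and a missing mean estimated from the median and quartiles.

```python
# Minimal sketch of two simple approximations for missing summary statistics.
def sd_from_range(minimum, maximum):
    """Rough SD approximation: range divided by four."""
    return (maximum - minimum) / 4.0

def mean_from_quartiles(q1, median, q3):
    """Rough mean approximation for skewed outcomes."""
    return (q1 + median + q3) / 3.0

# Hypothetical trial reporting only range, median and quartiles of length of stay.
print(sd_from_range(2, 34))            # -> 8.0
print(mean_from_quartiles(5, 9, 16))   # -> 10.0
```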

  6. Public and patient involvement in quantitative health research: A statistical perspective.

    PubMed

    Hannigan, Ailish

    2018-06-19

    The majority of studies included in recent reviews of impact for public and patient involvement (PPI) in health research had a qualitative design. PPI in solely quantitative designs is underexplored, particularly its impact on statistical analysis. Statisticians in practice have a long history of working in both consultative (indirect) and collaborative (direct) roles in health research, yet their perspective on PPI in quantitative health research has never been explicitly examined. To explore the potential and challenges of PPI from a statistical perspective at distinct stages of quantitative research, that is sampling, measurement and statistical analysis, distinguishing between indirect and direct PPI. Statistical analysis is underpinned by having a representative sample, and a collaborative or direct approach to PPI may help achieve that by supporting access to and increasing participation of under-represented groups in the population. Acknowledging and valuing the role of lay knowledge of the context in statistical analysis and in deciding what variables to measure may support collective learning and advance scientific understanding, as evidenced by the use of participatory modelling in other disciplines. A recurring issue for quantitative researchers, which reflects quantitative sampling methods, is the selection and required number of PPI contributors, and this requires further methodological development. Direct approaches to PPI in quantitative health research may potentially increase its impact, but the facilitation and partnership skills required may require further training for all stakeholders, including statisticians. © 2018 The Authors Health Expectations published by John Wiley & Sons Ltd.

  7. Statistical performance and information content of time lag analysis and redundancy analysis in time series modeling.

    PubMed

    Angeler, David G; Viedma, Olga; Moreno, José M

    2009-11-01

    Time lag analysis (TLA) is a distance-based approach used to study temporal dynamics of ecological communities by measuring community dissimilarity over increasing time lags. Despite its increased use in recent years, its performance in comparison with other more direct methods (i.e., canonical ordination) has not been evaluated. This study fills this gap using extensive simulations and real data sets from experimental temporary ponds (true zooplankton communities) and landscape studies (landscape categories as pseudo-communities) that differ in community structure and anthropogenic stress history. Modeling time with a principal coordinate of neighborhood matrices (PCNM) approach, the canonical ordination technique (redundancy analysis; RDA) consistently outperformed the other statistical tests (i.e., TLAs, Mantel test, and RDA based on linear time trends) using all real data. In addition, the RDA-PCNM revealed different patterns of temporal change, and the strength of each individual time pattern, in terms of adjusted variance explained, could be evaluated. It also identified species contributions to these patterns of temporal change. This additional information is not provided by distance-based methods. The simulation study revealed better Type I error properties of the canonical ordination techniques compared with the distance-based approaches when no deterministic component of change was imposed on the communities. The simulation also revealed that strong emphasis on uniform deterministic change and low variability at other temporal scales is needed to result in decreased statistical power of the RDA-PCNM approach relative to the other methods. Based on the statistical performance of and information content provided by RDA-PCNM models, this technique serves ecologists as a powerful tool for modeling temporal change of ecological (pseudo-) communities.
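    A minimal sketch of the TLA idea with simulated data (the Bray-Curtis dissimilarity and regression on the square root of the lag are common choices, not necessarily those of the study):

```python
# Minimal sketch of time lag analysis: community dissimilarity is computed for
# all pairs of sampling dates and regressed on the square root of the time lag.
import numpy as np
from scipy.spatial.distance import braycurtis
from scipy.stats import linregress

rng = np.random.default_rng(1)
# Hypothetical community matrix: 12 sampling dates x 8 species abundances.
community = rng.poisson(lam=5, size=(12, 8)).astype(float)

lags, dissim = [], []
for i in range(len(community)):
    for j in range(i + 1, len(community)):
        lags.append(j - i)
        dissim.append(braycurtis(community[i], community[j]))

slope, intercept, r, p, se = linregress(np.sqrt(lags), dissim)
print(f"TLA slope = {slope:.3f}, p = {p:.3f}")  # a positive slope suggests directional change
```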

  8. SOCR: Statistics Online Computational Resource

    PubMed Central

    Dinov, Ivo D.

    2011-01-01

    The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students’ intuition and enhance their learning. PMID:21451741

  9. Statistical Significance for Hierarchical Clustering

    PubMed Central

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
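    The sketch below illustrates the Monte Carlo idea in a deliberately simplified form (a single two-cluster test against a multivariate Gaussian null, not the authors' sequential, family-wise-error-controlling procedure):

```python
# Simplified Monte Carlo significance test for clustering: compare the observed
# two-cluster index with its distribution under a single Gaussian null.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def two_cluster_index(X):
    """Within-cluster sum of squares of a two-way cut of Ward's tree, divided
    by the total sum of squares (smaller values = stronger clustering)."""
    labels = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(labels))
    return within / total

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(2.5, 1, (30, 5))])

obs = two_cluster_index(X)
mean, cov = X.mean(axis=0), np.cov(X, rowvar=False)
null = [two_cluster_index(rng.multivariate_normal(mean, cov, size=len(X)))
        for _ in range(200)]
p = (np.sum(np.array(null) <= obs) + 1) / (len(null) + 1)
print(f"cluster index = {obs:.3f}, Monte Carlo p = {p:.3f}")
```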

  10. An improved approach for flight readiness certification: Methodology for failure risk assessment and application examples. Volume 2: Software documentation

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.

  11. An improved approach for flight readiness certification: Methodology for failure risk assessment and application examples, volume 1

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.

  12. Statistical primer: how to deal with missing data in scientific research?

    PubMed

    Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M

    2018-05-10

    Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice, however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, surpassing the disadvantages of the simpler approaches, but should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
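    A minimal sketch of the multiple-imputation idea using scikit-learn's IterativeImputer and a simplified Rubin-style pooling step; a linear model stands in for the Cox regression used in the paper's example.

```python
# Minimal sketch: multiple imputation with simplified pooling of estimates.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])
X[rng.random(n) < 0.3, 1] = np.nan           # ~30% of x2 missing at random

m = 5                                        # number of imputed data sets
estimates = []
for seed in range(m):
    imputer = IterativeImputer(random_state=seed, sample_posterior=True)
    X_imp = imputer.fit_transform(X)
    estimates.append(LinearRegression().fit(X_imp, y).coef_)

pooled = np.mean(estimates, axis=0)          # pooled point estimates
between = np.var(estimates, axis=0, ddof=1)  # between-imputation variability
print("pooled coefficients:", pooled, "between-imputation variance:", between)
```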

  13. Statistical Analysis of Protein Ensembles

    NASA Astrophysics Data System (ADS)

    Máté, Gabriell; Heermann, Dieter

    2014-04-01

    As 3D protein-configuration data is piling up, there is an ever-increasing need for well-defined, mathematically rigorous analysis approaches, especially as the vast majority of the currently available methods rely heavily on heuristics. We propose an analysis framework which stems from topology, the field of mathematics which studies properties preserved under continuous deformations. First, we calculate a barcode representation of the molecules employing computational topology algorithms. Bars in this barcode represent different topological features. Molecules are compared through their barcodes by statistically determining the difference in the set of their topological features. As a proof-of-principle application, we analyze a dataset compiled from ensembles of different proteins, obtained from the Ensemble Protein Database. We demonstrate that our approach correctly detects the different protein groupings.

  14. ON THE GEOSTATISTICAL APPROACH TO THE INVERSE PROBLEM. (R825689C037)

    EPA Science Inventory

    Abstract

    The geostatistical approach to the inverse problem is discussed with emphasis on the importance of structural analysis. Although the geostatistical approach is occasionally misconstrued as mere cokriging, in fact it consists of two steps: estimation of statist...

  15. Analysis and Long-Term Follow-Up of the Surgical Treatment of Children With Craniopharyngioma.

    PubMed

    Cheng, Jing; Shao, Qiang; Pan, Zhiyong; You, Jin

    2016-11-01

    To investigate the relationship between the operative approach, clinical pathological factors, and curative effect of the surgical treatment in patients with craniopharyngioma, and to provide a theoretical basis for determining the prognosis and reducing the recurrence rate during long-term postoperative follow-up in children. This was a retrospective analysis of the clinical data of 92 children who underwent surgical treatment in our department from January 2005 to May 2011. Long-term follow-up was performed from 12 months to 8 years. The pterional approach was used in 49 patients, the interhemispheric approach in 20 patients, the corpus callosum approach in 16 patients, and the butterfly approach in 7 patients. Pathological classification was performed by hematoxylin and eosin staining of the pathological tissues and evaluated according to the different surgical approaches, MRI calcification status, calcification type, pathological type, whether radiotherapy was performed, postoperative recurrence, and death. For the pterional approach, there was near total resection in 46 patients (93.9%) with the lowest recurrence rate. The operative approach and postoperative recurrence rates were compared; the difference was statistically significant (P <0.05). For comparison of the operative approach and postoperative mortality, the difference was not statistically significant (P >0.05). There was not a significant difference between the MRI classification and postoperative recurrence rate (P >0.05). Comparing the degree of tumor calcification with the recurrence rate after operation and the mortality rate, the difference was statistically significant (P <0.05). The recurrence rates and mortality rates of adamantinomatous craniopharyngioma and squamous papillary craniopharyngioma in the 2 groups following operation were compared, and the differences were statistically significant (P <0.05). Postoperative adjuvant radiotherapy was compared with the postoperative recurrence rate and mortality; the differences were statistically significant (P <0.05). The main effects on tumor recurrence include the choice of surgical approach and the degree of calcification. The relapse rate of adamantinomatous craniopharyngioma is higher, which could be because invasive growth occurs only with the adamantinomatous type. Postoperative radiotherapy can significantly prolong the recurrence time and reduce the mortality rate of patients with craniopharyngioma.

  16. Meta-analysis: An Approach to the Synthesis of Research Results.

    ERIC Educational Resources Information Center

    Glass, Gene V.

    1982-01-01

    Discusses three general characteristics and four criticisms of meta-analysis (statistical analysis of the summary findings of many research studies). Illustrates application of meta-analysis on research studies relating to school class size and achievement and inquiry teaching of biology. (JN)

  17. Research methodology in dentistry: Part II — The relevance of statistics in research

    PubMed Central

    Krithikadatta, Jogikalmat; Valarmathi, Srinivasan

    2012-01-01

    The lifeline of original research depends on adept statistical analysis. However, there have been reports of statistical misconduct in studies that could arise from an inadequate understanding of the fundamentals of statistics. There have been several reports on this across the medical and dental literature. This article aims at encouraging the reader to approach statistics from its logic rather than its theoretical perspective. The article also provides information on statistical misuse in the Journal of Conservative Dentistry between the years 2008 and 2011. PMID:22876003

  18. The same analysis approach: Practical protection against the pitfalls of novel neuroimaging analysis methods.

    PubMed

    Görgen, Kai; Hebart, Martin N; Allefeld, Carsten; Haynes, John-Dylan

    2017-12-27

    Standard neuroimaging data analysis based on traditional principles of experimental design, modelling, and statistical inference is increasingly complemented by novel analysis methods, driven, for example, by machine learning methods. While these novel approaches provide new insights into neuroimaging data, they often have unexpected properties, generating a growing literature on possible pitfalls. We propose to meet this challenge by adopting a habit of systematic testing of experimental design, analysis procedures, and statistical inference. Specifically, we suggest applying the analysis method used for experimental data also to aspects of the experimental design, simulated confounds, simulated null data, and control data. We stress the importance of keeping the analysis method the same in main and test analyses, because only in this way can possible confounds and unexpected properties be reliably detected and avoided. We describe and discuss this Same Analysis Approach in detail, and demonstrate it in two worked examples using multivariate decoding. With these examples, we reveal two sources of error: a mismatch between counterbalancing (crossover designs) and cross-validation, which leads to systematic below-chance accuracies, and linear decoding of a nonlinear effect, a difference in variance. Copyright © 2017 Elsevier Inc. All rights reserved.
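    A minimal sketch of the habit described above, using a generic scikit-learn decoder (not the authors' pipeline): the identical analysis is run on simulated null data to confirm that it returns chance-level accuracy.

```python
# Minimal sketch: run the exact decoding pipeline used for the real experiment
# on simulated null data (pure noise with the same labels) as a sanity check.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

def decode(X, y):
    """The analysis applied to real and null data must be identical."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv).mean()

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)                    # same labels as the real design
X_null = rng.normal(size=(100, 200))         # simulated null data: pure noise

print("null-data accuracy:", decode(X_null, y))   # should be close to 0.5
```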

  19. Spatio-temporal analysis of aftershock sequences in terms of Non Extensive Statistical Physics.

    NASA Astrophysics Data System (ADS)

    Chochlaki, Kalliopi; Vallianatos, Filippos

    2017-04-01

    Earth's seismicity is considered an extremely complicated process where long-range interactions and fracturing exist (Vallianatos et al., 2016). For this reason, in order to analyze it, we use an innovative methodological approach, introduced by Tsallis (Tsallis, 1988; 2009), named Non Extensive Statistical Physics. This approach introduces a generalization of Boltzmann-Gibbs statistical mechanics and is based on the definition of the Tsallis entropy Sq, which, when maximized, leads to the so-called q-exponential function, the probability distribution that maximizes Sq. In the present work, we utilize the concept of Non Extensive Statistical Physics in order to analyze the spatiotemporal properties of several aftershock series. Marekova (Marekova, 2014) suggested that the probability densities of the inter-event distances between successive aftershocks follow a beta distribution. Using the same data set, we analyze the inter-event distance distribution of several aftershock sequences in different geographic regions by calculating non-extensive parameters that determine the behavior of the system and by fitting the q-exponential function, which expresses the degree of non-extensivity of the investigated system. Furthermore, the inter-event time distribution of the aftershocks as well as the frequency-magnitude distribution has been analyzed. The results support the applicability of Non Extensive Statistical Physics ideas in aftershock sequences where a strong correlation exists along with memory effects. References: C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J. Stat. Phys. 52 (1988) 479-487, doi:10.1007/BF01016429; C. Tsallis, Introduction to nonextensive statistical mechanics: Approaching a complex world, 2009, doi:10.1007/978-0-387-85359-8; E. Marekova, Analysis of the spatial distribution between successive earthquakes in aftershocks series, Annals of Geophysics, 57, 5, doi:10.4401/ag-6556, 2014; F. Vallianatos, G. Papadakis, G. Michas, Generalized statistical mechanics approaches to earthquakes and tectonics, Proc. R. Soc. A, 472, 20160497, 2016.
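    A minimal sketch of fitting the q-exponential function to an empirical inter-event distance distribution (the histogram values and starting parameters are invented for illustration):

```python
# Minimal sketch: fit a q-exponential to a normalised inter-event distance histogram.
import numpy as np
from scipy.optimize import curve_fit

def q_exponential(x, a, q, beta):
    """q-exponential: a * [1 - (1 - q) * beta * x] ** (1 / (1 - q))."""
    base = np.maximum(1.0 - (1.0 - q) * beta * x, 1e-12)   # guard against negative base
    return a * np.power(base, 1.0 / (1.0 - q))

# Hypothetical normalised histogram of inter-event distances (km).
x = np.array([2, 6, 10, 14, 18, 22, 26, 30], dtype=float)
p = np.array([0.30, 0.21, 0.15, 0.11, 0.08, 0.06, 0.05, 0.04])

popt, _ = curve_fit(q_exponential, x, p, p0=[0.35, 1.3, 0.05], maxfev=10000)
print("fitted q =", popt[1])
```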

  20. Characterization and classification of oral tissues using excitation and emission matrix: a statistical modeling approach

    NASA Astrophysics Data System (ADS)

    Kanniyappan, Udayakumar; Gnanatheepaminstein, Einstein; Prakasarao, Aruna; Dornadula, Koteeswaran; Singaravelu, Ganesan

    2017-02-01

    Cancer is one of the most common threats to human health around the world, and diagnosis based on optical spectroscopy, especially the fluorescence technique, has been established as the standard approach among scientists to explore biochemical and morphological changes in tissues. In this regard, the present work aims to extract spectral signatures of the various fluorophores present in oral tissues using parallel factor analysis (PARAFAC). Subsequently, statistical analysis is also performed to show its diagnostic potential in distinguishing malignant and premalignant from normal oral tissues. Hence, the present study may lead to a possible alternative tool for oral cancer diagnosis.

  1. Global Profiling and Novel Structure Discovery Using Multiple Neutral Loss/Precursor Ion Scanning Combined with Substructure Recognition and Statistical Analysis (MNPSS): Characterization of Terpene-Conjugated Curcuminoids in Curcuma longa as a Case Study.

    PubMed

    Qiao, Xue; Lin, Xiong-hao; Ji, Shuai; Zhang, Zheng-xiang; Bo, Tao; Guo, De-an; Ye, Min

    2016-01-05

    To fully understand the chemical diversity of an herbal medicine is challenging. In this work, we describe a new approach to globally profile and discover novel compounds from an herbal extract using multiple neutral loss/precursor ion scanning combined with substructure recognition and statistical analysis. Turmeric (the rhizomes of Curcuma longa L.) was used as an example. This approach consists of three steps: (i) multiple neutral loss/precursor ion scanning to obtain substructure information; (ii) targeted identification of new compounds by extracted ion current and substructure recognition; and (iii) untargeted identification using total ion current and multivariate statistical analysis to discover novel structures. Using this approach, 846 terpecurcumins (terpene-conjugated curcuminoids) were discovered from turmeric, including a number of potentially novel compounds. Furthermore, two unprecedented compounds (terpecurcumins X and Y) were purified, and their structures were identified by NMR spectroscopy. This study extended the application of mass spectrometry to global profiling of natural products in herbal medicines and could help chemists to rapidly discover novel compounds from a complex matrix.

  2. The effect of project-based learning on students' statistical literacy levels for data representation

    NASA Astrophysics Data System (ADS)

    Koparan, Timur; Güven, Bülent

    2015-07-01

    The aim of this study is to determine the effect of the project-based learning approach on 8th-grade secondary-school students' statistical literacy levels for data representation. To achieve this goal, a test consisting of 12 open-ended questions was developed in accordance with the views of experts. Seventy 8th-grade secondary-school students, 35 in the experimental group and 35 in the control group, took this test twice, once before and once after the application. All raw scores were converted into linear measures using the Winsteps 3.72 Rasch modelling program, and t-tests and an ANCOVA were carried out on the linear measures. Based on the findings, it was concluded that the project-based learning approach increases students' level of statistical literacy for data representation. Students' levels of statistical literacy before and after the application were shown through the obtained person-item maps.

  3. Teaching Statistics in Biology: Using Inquiry-based Learning to Strengthen Understanding of Statistical Analysis in Biology Laboratory Courses

    PubMed Central

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9%, improvement p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study. PMID:18765754

  4. Statistical approaches in published ophthalmic clinical science papers: a comparison to statistical practice two decades ago.

    PubMed

    Zhang, Harrison G; Ying, Gui-Shuang

    2018-02-09

    The aim of this study is to evaluate the current practice of statistical analysis of eye data in clinical science papers published in the British Journal of Ophthalmology (BJO) and to determine whether the practice of statistical analysis has improved in the past two decades. All clinical science papers (n=125) published in BJO in January-June 2017 were reviewed for their statistical analysis approaches for analysing the primary ocular measure. We compared our findings to the results from a previous paper that reviewed BJO papers in 1995. Of 112 papers eligible for analysis, half of the studies analysed the data at an individual level because of the nature of observation, 16 (14%) studies analysed data from one eye only, 36 (32%) studies analysed data from both eyes at the ocular level, one study (1%) analysed the overall summary of ocular findings per individual and three (3%) studies used the paired comparison. Among studies with data available from both eyes, 50 (89%) of 56 papers in 2017 did not analyse data from both eyes or ignored the intereye correlation, as compared with 60 (90%) of 67 papers in 1995 (P=0.96). Among studies that analysed data from both eyes at an ocular level, 33 (92%) of 36 studies completely ignored the intereye correlation in 2017, as compared with 16 (89%) of 18 studies in 1995 (P=0.40). A majority of studies did not analyse the data properly when data from both eyes were available. The practice of statistical analysis did not improve in the past two decades. Collaborative efforts should be made in the vision research community to improve the practice of statistical analysis for ocular data. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
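    One standard way to use both eyes while respecting the intereye correlation is a generalized estimating equation with an exchangeable working correlation; the sketch below uses simulated data and statsmodels, and is an illustration rather than a reanalysis of the reviewed papers.

```python
# Minimal sketch: GEE accounting for the correlation between two eyes of the
# same patient via an exchangeable working correlation structure.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_patients = 80
patient = np.repeat(np.arange(n_patients), 2)          # two eyes per patient
treated = rng.integers(0, 2, n_patients)[patient]      # patient-level exposure
patient_effect = rng.normal(0, 1, n_patients)[patient] # shared within-patient effect
iop = 16 + 2.0 * treated + patient_effect + rng.normal(0, 1, len(patient))

df = pd.DataFrame({"iop": iop, "treated": treated, "patient": patient})
model = smf.gee("iop ~ treated", groups="patient", data=df,
                cov_struct=sm.cov_struct.Exchangeable())
print(model.fit().summary())
```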

  5. Detection and Evaluation of Spatio-Temporal Spike Patterns in Massively Parallel Spike Train Data with SPADE.

    PubMed

    Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M; Grün, Sonja

    2017-01-01

    Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis.

  6. Detection and Evaluation of Spatio-Temporal Spike Patterns in Massively Parallel Spike Train Data with SPADE

    PubMed Central

    Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M.; Grün, Sonja

    2017-01-01

    Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis. PMID:28596729

  7. Bioassessment Tools for Stony Corals: Monitoring Approaches and Proposed Sampling Plan for the U.S. Virgin Islands

    EPA Science Inventory

    This document describes three general approaches to the design of a sampling plan for biological monitoring of coral reefs. Status assessment, trend detection and targeted monitoring each require a different approach to site selection and statistical analysis. For status assessm...

  8. The Design and Analysis of Transposon-Insertion Sequencing Experiments

    PubMed Central

    Chao, Michael C.; Abel, Sören; Davis, Brigid M.; Waldor, Matthew K.

    2016-01-01

    Preface Transposon-insertion sequencing (TIS) is a powerful approach that can be widely applied to genome-wide definition of loci that are required for growth in diverse conditions. However, experimental design choices and stochastic biological processes can heavily influence the results of TIS experiments and affect downstream statistical analysis. Here, we discuss TIS experimental parameters and how these factors relate to the benefits and limitations of the various statistical frameworks that can be applied to computational analysis of TIS data. PMID:26775926

  9. Analyzing Dyadic Sequence Data—Research Questions and Implied Statistical Models

    PubMed Central

    Fuchs, Peter; Nussbeck, Fridtjof W.; Meuwly, Nathalie; Bodenmann, Guy

    2017-01-01

    The analysis of observational data is often seen as a key approach to understanding dynamics in romantic relationships but also in dyadic systems in general. Statistical models for the analysis of dyadic observational data are not commonly known or applied. In this contribution, selected approaches to dyadic sequence data will be presented with a focus on models that can be applied when sample sizes are of medium size (N = 100 couples or less). Each of the statistical models is motivated by an underlying potential research question; the most important model results are presented and linked to the research question. The following research questions and models are compared with respect to their applicability using a hands-on approach: (I) Is there an association between a particular behavior by one and the reaction by the other partner? (Pearson Correlation); (II) Does the behavior of one member trigger an immediate reaction by the other? (aggregated logit models; multi-level approach; basic Markov model); (III) Is there an underlying dyadic process, which might account for the observed behavior? (hidden Markov model); and (IV) Are there latent groups of dyads, which might account for observing different reaction patterns? (mixture Markov; optimal matching). Finally, recommendations to help researchers choose among the different models, notes on data handling, and advice on properly applying the statistical models in empirical research are given (e.g., in a new R package, “DySeq”). PMID:28443037
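    A minimal sketch of the basic Markov model behind question (II), with an invented coded interaction: the transition matrix estimates how often a behaviour by one partner is immediately followed by each possible reaction of the other.

```python
# Minimal sketch: estimate a transition matrix from a coded dyadic sequence.
import numpy as np

# Hypothetical coded interaction: 0 = negative, 1 = neutral, 2 = positive.
partner_a = [2, 2, 1, 0, 0, 2, 1, 2, 0, 1, 2, 2]
partner_b = [2, 1, 1, 0, 1, 2, 2, 2, 0, 0, 2, 1]

n_states = 3
counts = np.zeros((n_states, n_states))
# Count how often behaviour a(t) by partner A is followed by b(t+1) by partner B.
for a_t, b_next in zip(partner_a[:-1], partner_b[1:]):
    counts[a_t, b_next] += 1

transition = counts / counts.sum(axis=1, keepdims=True)   # row-normalised probabilities
print(np.round(transition, 2))
```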

  10. Integration of modern statistical tools for the analysis of climate extremes into the web-GIS “CLIMATE”

    NASA Astrophysics Data System (ADS)

    Ryazanova, A. A.; Okladnikov, I. G.; Gordov, E. P.

    2017-11-01

    The frequency of occurrence and magnitude of precipitation and temperature extreme events show positive trends in several geographical regions. These events must be analyzed and studied in order to better understand their impact on the environment, predict their occurrences, and mitigate their effects. For this purpose, we augmented the web-GIS “CLIMATE” to include a dedicated statistical package developed in the R language. The web-GIS “CLIMATE” is a software platform for cloud storage, processing, and visualization of distributed archives of spatial datasets. It is based on a combined use of web and GIS technologies with reliable procedures for searching, extracting, processing, and visualizing the spatial data archives. The system provides a set of thematic online tools for the complex analysis of current and future climate changes and their effects on the environment. The package includes new powerful methods of time-dependent extreme value statistics, quantile regression, and the copula approach for the detailed analysis of various climate extreme events. Specifically, the very promising copula approach makes it possible to obtain the structural connections between the extremes and the various environmental characteristics. The new statistical methods integrated into the web-GIS “CLIMATE” can significantly facilitate and accelerate the complex analysis of climate extremes using only a desktop PC connected to the Internet.
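
    The package itself is written in R; purely as an illustration of one of the named tools, the following Python sketch uses quantile regression to estimate a trend in the 95th percentile of daily maximum temperature rather than in the mean. Data and variable names are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(14)
years = np.repeat(np.arange(1980, 2020), 90)                          # 90 summer days per year
tmax = 28 + 0.02 * (years - 1980) + rng.gumbel(0.0, 2.0, years.size)  # heavier upper tail
df = pd.DataFrame({"year": years, "tmax": tmax})

ols_trend = smf.ols("tmax ~ year", data=df).fit().params["year"]
q95_trend = smf.quantreg("tmax ~ year", data=df).fit(q=0.95).params["year"]
print(f"mean trend: {ols_trend:.3f} degC/yr   95th-percentile trend: {q95_trend:.3f} degC/yr")
```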

  11. Targeted versus statistical approaches to selecting parameters for modelling sediment provenance

    NASA Astrophysics Data System (ADS)

    Laceby, J. Patrick

    2017-04-01

    One effective field-based approach to modelling sediment provenance is the source fingerprinting technique. Arguably, one of the most important steps for this approach is selecting the appropriate suite of parameters or fingerprints used to model source contributions. Accordingly, approaches to selecting parameters for sediment source fingerprinting will be reviewed. Thereafter, opportunities and limitations of these approaches and some future research directions will be presented. For properties to be effective tracers of sediment, they must discriminate between sources whilst behaving conservatively. Conservative behavior is characterized by constancy in sediment properties, where the properties of sediment sources remain constant, or at the very least, any variation in these properties should occur in a predictable and measurable way. Therefore, properties selected for sediment source fingerprinting should remain constant through sediment detachment, transportation and deposition processes, or vary in a predictable and measurable way. One approach to selecting conservative properties for sediment source fingerprinting is to identify targeted tracers, such as caesium-137, that provide specific source information (e.g. surface versus subsurface origins). A second approach is to use statistical tests to select an optimal suite of conservative properties capable of modelling sediment provenance. In general, statistical approaches use a combination of discrimination statistics (e.g. the Kruskal-Wallis H-test, Mann-Whitney U-test) and parameter selection statistics (e.g. Discriminant Function Analysis or Principal Component Analysis). The challenge is that modelling sediment provenance is often not straightforward and there is increasing debate in the literature surrounding the most appropriate approach to selecting elements for modelling. Moving forward, it would be beneficial if researchers tested their results with multiple modelling approaches, artificial mixtures, and multiple lines of evidence to provide secondary support to their initial modelling results. Indeed, element selection can greatly impact modelling results and having multiple lines of evidence will help provide confidence when modelling sediment provenance.
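
    A minimal sketch of the statistical selection route described above, assuming hypothetical source groups and tracer properties: a Kruskal-Wallis H-test screens each property for discrimination between sources, and a discriminant function analysis is then fitted to the retained suite.

```python
import numpy as np
import pandas as pd
from scipy.stats import kruskal
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
groups, n = ["surface", "subsurface", "channel_bank"], 30
df = pd.DataFrame({
    "source": np.repeat(groups, n),
    "Cs137":  np.concatenate([rng.normal(m, 1.0, n) for m in (5.0, 1.0, 2.0)]),  # discriminating
    "TOC":    np.concatenate([rng.normal(m, 0.5, n) for m in (3.0, 1.5, 1.8)]),  # discriminating
    "K":      np.concatenate([rng.normal(2.0, 0.7, n) for _ in range(3)]),       # non-discriminating
})

# Step 1: Kruskal-Wallis screen, keeping properties that discriminate between sources.
retained = []
for prop in ["Cs137", "TOC", "K"]:
    h_stat, p = kruskal(*[df.loc[df.source == g, prop] for g in groups])
    if p < 0.05:
        retained.append(prop)

# Step 2: discriminant function analysis on the retained suite.
lda = LinearDiscriminantAnalysis().fit(df[retained], df["source"])
print("retained:", retained,
      " reclassification accuracy:", round(lda.score(df[retained], df["source"]), 2))
```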

  12. Probabilistic models for reactive behaviour in heterogeneous condensed phase media

    NASA Astrophysics Data System (ADS)

    Baer, M. R.; Gartling, D. K.; DesJardin, P. E.

    2012-02-01

    This work presents statistically-based models to describe reactive behaviour in heterogeneous energetic materials. Mesoscale effects are incorporated in continuum-level reactive flow descriptions using probability density functions (pdfs) that are associated with thermodynamic and mechanical states. A generalised approach is presented that includes multimaterial behaviour by treating the volume fraction as a random kinematic variable. Model simplifications are then sought to reduce the complexity of the description without compromising the statistical approach. Reactive behaviour is first considered for non-deformable media having a random temperature field as an initial state. A pdf transport relationship is derived and an approximate moment approach is incorporated in finite element analysis to model an example application whereby a heated fragment impacts a reactive heterogeneous material which leads to a delayed cook-off event. Modelling is then extended to include deformation effects associated with shock loading of a heterogeneous medium whereby random variables of strain, strain-rate and temperature are considered. A demonstrative mesoscale simulation of a non-ideal explosive is discussed that illustrates the joint statistical nature of the strain and temperature fields during shock loading to motivate the probabilistic approach. This modelling is derived in a Lagrangian framework that can be incorporated in continuum-level shock physics analysis. Future work will consider particle-based methods for a numerical implementation of this modelling approach.

  13. Bayesian approach for counting experiment statistics applied to a neutrino point source analysis

    NASA Astrophysics Data System (ADS)

    Bose, D.; Brayeur, L.; Casier, M.; de Vries, K. D.; Golup, G.; van Eijndhoven, N.

    2013-12-01

    In this paper we present a model-independent analysis method following Bayesian statistics to analyse data from a generic counting experiment and apply it to the search for neutrinos from point sources. We discuss a test statistic defined following a Bayesian framework that will be used in the search for a signal. If no signal is found, we derive an upper limit without the introduction of approximations. The Bayesian approach allows us to obtain the full probability density function for both the background and the signal rate. As such, we have direct access to any signal upper limit. The upper limit derivation compares directly with a frequentist approach and is robust for low-count observations. Furthermore, it also allows one to account for previous upper limits obtained by other analyses via the concept of prior information, without the need for ad hoc application of trial factors. To investigate the validity of the presented Bayesian approach, we have applied this method to the public IceCube 40-string configuration data for 10 nearby blazars and we have obtained a flux upper limit, which is in agreement with the upper limits determined via a frequentist approach. Furthermore, the upper limit obtained compares well with the previously published result of IceCube, using the same data set.
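
    The core counting-experiment logic can be sketched numerically as follows: with n observed events and a known expected background b, the posterior of the signal rate s under a flat prior is proportional to Poisson(n | s + b), and the credible upper limit is read off the posterior directly. The numbers below are illustrative, not IceCube values.

```python
import numpy as np
from scipy.stats import poisson

n_obs, background = 4, 2.7                      # observed events, expected background
s_grid = np.linspace(0.0, 30.0, 3001)           # grid of candidate signal rates

posterior = poisson.pmf(n_obs, s_grid + background)   # flat prior on s >= 0
posterior /= posterior.sum()                          # discrete normalization on the grid

cdf = np.cumsum(posterior)
upper_90 = s_grid[np.searchsorted(cdf, 0.90)]
print(f"90% credible upper limit on the signal rate: {upper_90:.2f} events")
```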

  14. A Vehicle for Bivariate Data Analysis

    ERIC Educational Resources Information Center

    Roscoe, Matt B.

    2016-01-01

    Instead of reserving the study of probability and statistics for special fourth-year high school courses, the Common Core State Standards for Mathematics (CCSSM) takes a "statistics for all" approach. The standards recommend that students in grades 6-8 learn to summarize and describe data distributions, understand probability, draw…

  15. MODEL ANALYSIS OF RIPARIAN BUFFER EFFECTIVENESS FOR REDUCING NUTRIENT INPUTS TO STREAMS IN AGRICULTURAL LANDSCAPES

    EPA Science Inventory

    Federal and state agencies responsible for protecting water quality rely mainly on statistically-based methods to assess and manage risks to the nation's streams, lakes and estuaries. Although statistical approaches provide valuable information on current trends in water quality...

  16. Using Multilevel Modeling in Language Assessment Research: A Conceptual Introduction

    ERIC Educational Resources Information Center

    Barkaoui, Khaled

    2013-01-01

    This article critiques traditional single-level statistical approaches (e.g., multiple regression analysis) to examining relationships between language test scores and variables in the assessment setting. It highlights the conceptual, methodological, and statistical problems associated with these techniques in dealing with multilevel or nested…

  17. Challenges in Biomarker Discovery: Combining Expert Insights with Statistical Analysis of Complex Omics Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McDermott, Jason E.; Wang, Jing; Mitchell, Hugh D.

    2013-01-01

    The advent of high throughput technologies capable of comprehensive analysis of genes, transcripts, proteins and other significant biological molecules has provided an unprecedented opportunity for the identification of molecular markers of disease processes. However, it has simultaneously complicated the problem of extracting meaningful signatures of biological processes from these complex datasets. The process of biomarker discovery and characterization provides opportunities both for purely statistical and expert knowledge-based approaches and would benefit from improved integration of the two. Areas covered: In this review we will present examples of current practices for biomarker discovery from complex omic datasets and the challenges that have been encountered. We will then present a high-level review of data-driven (statistical) and knowledge-based methods applied to biomarker discovery, highlighting some current efforts to combine the two distinct approaches. Expert opinion: Effective, reproducible and objective tools for combining data-driven and knowledge-based approaches to biomarker discovery and characterization are key to future success in the biomarker field. We will describe our recommendations of possible approaches to this problem including metrics for the evaluation of biomarkers.

  18. Estimating individual benefits of medical or behavioral treatments in severely ill patients.

    PubMed

    Diaz, Francisco J

    2017-01-01

    There is a need for statistical methods appropriate for the analysis of clinical trials from a personalized-medicine viewpoint as opposed to the common statistical practice that simply examines average treatment effects. This article proposes an approach to quantifying, reporting and analyzing individual benefits of medical or behavioral treatments to severely ill patients with chronic conditions, using data from clinical trials. The approach is a new development of a published framework for measuring the severity of a chronic disease and the benefits treatments provide to individuals, which utilizes regression models with random coefficients. Here, a patient is considered to be severely ill if the patient's basal severity is close to one. This allows the derivation of a very flexible family of probability distributions of individual benefits that depend on treatment duration and the covariates included in the regression model. Our approach may enrich the statistical analysis of clinical trials of severely ill patients because it allows investigating the probability distribution of individual benefits in the patient population and the variables that influence it, and we can also measure the benefits achieved in specific patients including new patients. We illustrate our approach using data from a clinical trial of the anti-depressant imipramine.

  19. Noise exposure-response relationships established from repeated binary observations: Modeling approaches and applications.

    PubMed

    Schäffer, Beat; Pieren, Reto; Mendolia, Franco; Basner, Mathias; Brink, Mark

    2017-05-01

    Noise exposure-response relationships are used to estimate the effects of noise on individuals or a population. Such relationships may be derived from independent or repeated binary observations, and modeled by different statistical methods. Depending on the method by which they were established, their application in population risk assessment or estimation of individual responses may yield different results, i.e., predict "weaker" or "stronger" effects. In the existing literature on noise effects, however, the statistical methodology underlying exposure-response relationships has not always received sufficient attention. This paper gives an overview on two statistical approaches (subject-specific and population-averaged logistic regression analysis) to establish noise exposure-response relationships from repeated binary observations, and their appropriate applications. The considerations are illustrated with data from three noise effect studies, estimating also the magnitude of differences in results when applying exposure-response relationships derived from the two statistical approaches. Depending on the underlying data set and the probability range of the binary variable it covers, the two approaches yield similar to very different results. The adequate choice of a specific statistical approach and its application in subsequent studies, both depending on the research question, are therefore crucial.
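
    The difference between the two approaches can be illustrated with a short simulation: averaging subject-specific logistic curves (random intercepts) over the population produces a flatter population-averaged curve than the curve of the "average" individual. All parameter values below are invented.

```python
import numpy as np

beta0, beta1, sigma_u = -8.0, 0.12, 2.0      # subject-specific intercept, slope, between-subject SD
noise_levels = np.arange(40, 95, 10)         # e.g. nighttime noise exposure in dB

rng = np.random.default_rng(2)
u = rng.normal(0.0, sigma_u, 100_000)        # random intercepts of simulated individuals

for L in noise_levels:
    p_subject = 1 / (1 + np.exp(-(beta0 + beta1 * L)))                  # the "average" individual (u = 0)
    p_marginal = np.mean(1 / (1 + np.exp(-(beta0 + u + beta1 * L))))    # population-averaged response
    print(f"{L:3d} dB   subject-specific: {p_subject:.2f}   population-averaged: {p_marginal:.2f}")
```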

  20. Characterization of Inclusion Populations in Mn-Si Deoxidized Steel

    NASA Astrophysics Data System (ADS)

    García-Carbajal, Alfonso; Herrera-Trejo, Martín; Castro-Cedeño, Edgar-Ivan; Castro-Román, Manuel; Martinez-Enriquez, Arturo-Isaias

    2017-12-01

    Four plant heats of Mn-Si deoxidized steel were conducted to follow the evolution of the inclusion population through ladle furnace (LF) treatment and subsequent vacuum treatment (VT). The liquid steel was sampled, and the chemical composition and size distribution of the inclusion populations were characterized. The Gumbel, generalized extreme-value (GEV), and generalized Pareto (GP) distributions were used for the statistical analysis of the inclusion size distributions. The inclusions found at the beginning of the LF treatment were mostly fully liquid SiO2-Al2O3-MnO inclusions, which then evolved into fully liquid SiO2-Al2O3-CaO-MgO and partly liquid SiO2-CaO-MgO-(Al2O3-MgO) inclusions detected at the end of the VT. The final fully liquid inclusions had a desirable chemical composition for plastic behavior in subsequent metallurgical operations. The GP distribution was found to be unsuitable for the statistical analysis. The GEV distribution approach led to shape parameter values different from the zero value hypothesized under the Gumbel distribution. According to the GEV approach, some of the final inclusion size distributions had statistically significant differences, whereas the Gumbel approach predicted no statistically significant differences. The heats were organized according to indicators of inclusion cleanliness and a statistical comparison of the size distributions.
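
    As a sketch of the distributional comparison described above (with simulated inclusion sizes, not the study's data), the following fits both a GEV and a Gumbel distribution and forms a likelihood-ratio statistic for the zero-shape hypothesis; note that scipy parametrizes the GEV shape as c = -xi.

```python
import numpy as np
from scipy.stats import genextreme, gumbel_r

rng = np.random.default_rng(3)
max_sizes = genextreme.rvs(c=-0.2, loc=12.0, scale=3.0, size=200, random_state=rng)  # simulated maxima, um

c_hat, loc_gev, scale_gev = genextreme.fit(max_sizes)
loc_gum, scale_gum = gumbel_r.fit(max_sizes)

# Likelihood-ratio comparison of the GEV against the nested Gumbel (zero-shape) model.
ll_gev = genextreme.logpdf(max_sizes, c_hat, loc_gev, scale_gev).sum()
ll_gum = gumbel_r.logpdf(max_sizes, loc_gum, scale_gum).sum()
lr_stat = 2 * (ll_gev - ll_gum)        # approximately chi-square(1) under the Gumbel hypothesis
print(f"estimated GEV shape (scipy c): {c_hat:.3f}   LR statistic vs Gumbel: {lr_stat:.2f}")
```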

  1. Statistical model specification and power: recommendations on the use of test-qualified pooling in analysis of experimental data

    PubMed Central

    Colegrave, Nick

    2017-01-01

    A common approach to the analysis of experimental data across much of the biological sciences is test-qualified pooling. Here, non-significant terms are dropped from a statistical model, effectively pooling the variation associated with each removed term with the error term used to test hypotheses (or estimate effect sizes). This pooling is only carried out if statistical testing of a previous, more complicated model fitted to the same data provides motivation for the simplification; hence the pooling is test-qualified. In pooling, the researcher increases the degrees of freedom of the error term with the aim of increasing statistical power to test their hypotheses of interest. Despite this approach being widely adopted and explicitly recommended by some of the most widely cited statistical textbooks aimed at biologists, here we argue that (except in highly specialized circumstances that we can identify) the hoped-for improvement in statistical power will be small or non-existent, and there is likely to be much reduced reliability of the statistical procedures through deviation of type I error rates from nominal levels. We thus call for greatly reduced use of test-qualified pooling across experimental biology, more careful justification of any use that continues, and a different philosophy for initial selection of statistical models in the light of this change in procedure. PMID:28330912

  2. Combined acute ecotoxicity of malathion and deltamethrin to Daphnia magna (Crustacea, Cladocera): comparison of different data analysis approaches.

    PubMed

    Toumi, Héla; Boumaiza, Moncef; Millet, Maurice; Radetski, Claudemir Marcos; Camara, Baba Issa; Felten, Vincent; Masfaraud, Jean-François; Férard, Jean-François

    2018-04-19

    We studied the combined acute effect (i.e., after 48 h) of deltamethrin (a pyrethroid insecticide) and malathion (an organophosphate insecticide) on Daphnia magna. Two approaches were used to examine the potential interaction effects of eight mixtures of deltamethrin and malathion: (i) calculation of the mixture toxicity index (MTI) and safety factor index (SFI) and (ii) response surface methodology coupled with an isobole-based statistical model (using a generalized linear model). According to the calculation of MTI and SFI, one tested mixture was found to be additive while the two other tested mixtures were found to be non-additive (MTI) or antagonistic (SFI); these differences between index responses are only due to differences in terminology related to the two indexes. Through the surface response approach and isobologram analysis, we concluded that there was a significant antagonistic effect of the binary mixtures of deltamethrin and malathion on D. magna immobilization after 48 h of exposure. The index approaches and the surface response approach with isobologram analysis are complementary. Calculation of the mixture toxicity index and safety factor index identifies the type of interaction for each tested mixture individually, while the surface response approach with isobologram analysis integrates all the data, providing a global outcome about the type of interactive effect. Only the surface response approach and isobologram analysis allowed the statistical assessment of the ecotoxicological interaction. Nevertheless, we recommend the use of both approaches (i) to identify the combined effects of contaminants and (ii) to improve risk assessment and environmental management.
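
    The isobole-style statistical assessment can be sketched with a binomial GLM containing a dose-by-dose interaction term, where a significant negative interaction coefficient indicates antagonism. The data and effect sizes below are simulated, not the study's results.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 400
df = pd.DataFrame({
    "delta": rng.uniform(0, 1, n),    # deltamethrin dose (scaled)
    "mala":  rng.uniform(0, 1, n),    # malathion dose (scaled)
})
# Simulated antagonistic truth: a negative dose-by-dose interaction on the logit scale.
logit_p = -2.0 + 3.0 * df.delta + 2.5 * df.mala - 2.0 * df.delta * df.mala
df["immobilized"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

model = smf.glm("immobilized ~ delta * mala", data=df,
                family=sm.families.Binomial()).fit()
print(model.params.round(2))
print("interaction p-value:", round(float(model.pvalues["delta:mala"]), 4))
```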

  3. FT. Sam 91 Whiskey Combat Medic Medical Simulation Training Quantitative Integration Enhancement Program

    DTIC Science & Technology

    2011-07-01

    joined the project team in the statistical and research coordination role. Dr. Collin is an employee at the University of Pittsburgh. A successful...3. Submit to Ft. Detrick Completed Milestone: Statistical analysis planning 1. Review planned data metrics and data gathering tools...approach to performance assessment for continuous quality improvement.  Analyzing data with modern statistical techniques to determine the

  4. From micro to mainframe. A practical approach to perinatal data processing.

    PubMed

    Yeh, S Y; Lincoln, T

    1985-04-01

    A new, practical approach to perinatal data processing for a large obstetric population is described. This was done with a microcomputer for data entry and a mainframe computer for data reduction. The Screen Oriented Data Access (SODA) program was used to generate the data entry form and to input data into the Apple II Plus computer. Data were stored on diskettes and transmitted through a modem and telephone line to the IBM 370/168 computer. The Statistical Analysis System (SAS) program was used for statistical analyses and report generation. This approach was found to be most practical, flexible, and economical.

  5. The added value of ordinal analysis in clinical trials: an example in traumatic brain injury.

    PubMed

    Roozenbeek, Bob; Lingsma, Hester F; Perel, Pablo; Edwards, Phil; Roberts, Ian; Murray, Gordon D; Maas, Andrew Ir; Steyerberg, Ewout W

    2011-01-01

    In clinical trials, ordinal outcome measures are often dichotomized into two categories. In traumatic brain injury (TBI) the 5-point Glasgow outcome scale (GOS) is collapsed into unfavourable versus favourable outcome. Simulation studies have shown that exploiting the ordinal nature of the GOS increases chances of detecting treatment effects. The objective of this study is to quantify the benefits of ordinal analysis in the real-life situation of a large TBI trial. We used data from the CRASH trial that investigated the efficacy of corticosteroids in TBI patients (n = 9,554). We applied two techniques for ordinal analysis: proportional odds analysis and the sliding dichotomy approach, where the GOS is dichotomized at different cut-offs according to baseline prognostic risk. These approaches were compared to dichotomous analysis. The information density in each analysis was indicated by a Wald statistic. All analyses were adjusted for baseline characteristics. Dichotomous analysis of the six-month GOS showed a non-significant treatment effect (OR = 1.09, 95% CI 0.98 to 1.21, P = 0.096). Ordinal analysis with proportional odds regression or sliding dichotomy showed highly statistically significant treatment effects (OR 1.15, 95% CI 1.06 to 1.25, P = 0.0007 and 1.19, 95% CI 1.08 to 1.30, P = 0.0002), with 2.05-fold and 2.56-fold higher information density compared to the dichotomous approach respectively. Analysis of the CRASH trial data confirmed that ordinal analysis of outcome substantially increases statistical power. We expect these results to hold for other fields of critical care medicine that use ordinal outcome measures and recommend that future trials adopt ordinal analyses. This will permit detection of smaller treatment effects.
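
    A schematic of the two analyses compared above, on simulated 5-point outcomes rather than the CRASH data: a logistic model of the dichotomized outcome versus a proportional odds model of the full ordinal scale (OrderedModel requires statsmodels 0.12 or later).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(5)
n = 2000
treat = rng.binomial(1, 0.5, n)
latent = 0.2 * treat + rng.logistic(size=n)                 # small true treatment benefit
gos = np.digitize(latent, [-1.5, -0.5, 0.5, 1.5]) + 1       # 5 ordered outcome categories

X = pd.DataFrame({"treat": treat})

# Dichotomized analysis (favourable = categories 4-5).
favourable = (gos >= 4).astype(int)
logit = sm.Logit(favourable, sm.add_constant(X)).fit(disp=False)

# Proportional odds analysis of the full ordinal scale.
y = pd.Series(pd.Categorical(gos, ordered=True))
po = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)

print("dichotomized z-statistic:      ", round(float(logit.tvalues["treat"]), 2))
print("proportional-odds z-statistic: ", round(float(po.tvalues["treat"]), 2))
```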

  6. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A general problem with using conventional multivariate statistical approaches for classification of multisource data is that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability, but most statistical classification methods do not have a mechanism for this. This research focuses first on statistical methods that can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Second, this research focuses on neural network models. The neural networks are distribution-free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.

  7. Recent advances in statistical energy analysis

    NASA Technical Reports Server (NTRS)

    Heron, K. H.

    1992-01-01

    Statistical Energy Analysis (SEA) has traditionally been developed using a modal summation and averaging approach, which has led to the need for many restrictive SEA assumptions. The assumption of 'weak coupling' is particularly unacceptable when attempts are made to apply SEA to structural coupling. It is now believed that this assumption is more a function of the modal formulation than a necessary part of SEA. The present analysis ignores this restriction and describes a wave approach to the calculation of plate-plate coupling loss factors. Predictions based on this method are compared with results obtained from experiments using point excitation on one side of an irregular six-sided box structure. Conclusions show that the use and calculation of infinite transmission coefficients is the way forward for the development of a purely predictive SEA code.

  8. Some Statistics for Assessing Person-Fit Based on Continuous-Response Models

    ERIC Educational Resources Information Center

    Ferrando, Pere Joan

    2010-01-01

    This article proposes several statistics for assessing individual fit based on two unidimensional models for continuous responses: linear factor analysis and Samejima's continuous response model. Both models are approached using a common framework based on underlying response variables and are formulated at the individual level as fixed regression…

  9. Analyzing a Mature Software Inspection Process Using Statistical Process Control (SPC)

    NASA Technical Reports Server (NTRS)

    Barnard, Julie; Carleton, Anita; Stamper, Darrell E. (Technical Monitor)

    1999-01-01

    This paper presents a cooperative effort in which the Software Engineering Institute and the Space Shuttle Onboard Software Project could experiment with applying Statistical Process Control (SPC) analysis to inspection activities. The topics include: 1) SPC Collaboration Overview; 2) SPC Collaboration Approach and Results; and 3) Lessons Learned.

  10. Three Strategies for the Critical Use of Statistical Methods in Psychological Research

    ERIC Educational Resources Information Center

    Campitelli, Guillermo; Macbeth, Guillermo; Ospina, Raydonal; Marmolejo-Ramos, Fernando

    2017-01-01

    We present three strategies to replace the null hypothesis statistical significance testing approach in psychological research: (1) visual representation of cognitive processes and predictions, (2) visual representation of data distributions and choice of the appropriate distribution for analysis, and (3) model comparison. The three strategies…

  11. Some Conceptual Deficiencies in "Developmental" Behavior Genetics.

    ERIC Educational Resources Information Center

    Gottlieb, Gilbert

    1995-01-01

    Criticizes the application of the statistical procedures of the population-genetic approach within evolutionary biology to the study of psychological development. Argues that the application of the statistical methods of population genetics--primarily the analysis of variance--to the causes of psychological development is bound to result in a…

  12. Application of Transformations in Parametric Inference

    ERIC Educational Resources Information Center

    Brownstein, Naomi; Pensky, Marianna

    2008-01-01

    The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate performance of this powerful tool on examples of constructions of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for the stress-strength systems.…

  13. On the Application of Syntactic Methodologies in Automatic Text Analysis.

    ERIC Educational Resources Information Center

    Salton, Gerard; And Others

    1990-01-01

    Summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Topics discussed include syntactic analysis; use of machine-readable dictionary information; knowledge base construction; the PLNLP English Grammar (PEG) system; phrase normalization; and statistical and syntactic phrase evaluation used…

  14. Prediction of the Electromagnetic Field Distribution in a Typical Aircraft Using the Statistical Energy Analysis

    NASA Astrophysics Data System (ADS)

    Kovalevsky, Louis; Langley, Robin S.; Caro, Stephane

    2016-05-01

    Due to the high cost of experimental EMI measurements, significant attention has been focused on numerical simulation. Classical methods such as the Method of Moments or the Finite-Difference Time-Domain method are not well suited to this type of problem, as they require a fine discretisation of space and fail to take uncertainties into account. In this paper, the authors show that the Statistical Energy Analysis is well suited for this type of application. The SEA is a statistical approach employed to solve high-frequency problems of electromagnetically reverberant cavities at a reduced computational cost. The key aspects of this approach are (i) to consider an ensemble of systems that share the same gross parameters, and (ii) to avoid solving Maxwell's equations inside the cavity, using the power balance principle. The output is an estimate of the field magnitude distribution in each cavity. The method is applied on a typical aircraft structure.

  15. High-Density Signal Interface Electromagnetic Radiation Prediction for Electromagnetic Compatibility Evaluation.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Halligan, Matthew

    Radiated power calculation approaches for practical scenarios of incomplete high-density interface characterization information and incomplete incident power information are presented. The suggested approaches build upon a method that characterizes power losses through the definition of power loss constant matrices. Potential radiated power estimates include using total power loss information, partial radiated power loss information, worst case analysis, and statistical bounding analysis. A method is also proposed to calculate radiated power when incident power information is not fully known for non-periodic signals at the interface. Incident data signals are modeled from a two-state Markov chain where bit state probabilities are derived. The total spectrum for windowed signals is postulated as the superposition of spectra from individual pulses in a data sequence. Statistical bounding methods are proposed as a basis for the radiated power calculation due to the statistical calculation complexity to find a radiated power probability density function.

  16. A note about high blood pressure in childhood

    NASA Astrophysics Data System (ADS)

    Teodoro, M. Filomena; Simão, Carla

    2017-06-01

    In the medical, behavioral, and social sciences it is common for outcomes to be binary. In the present work, information is collected in which some of the outcomes are binary variables (1 = 'yes'/0 = 'no'). In [14], a preliminary study of caregivers' perception of pediatric hypertension was introduced. An experimental questionnaire was designed to be answered by the caregivers of routine pediatric consultation attendees at Santa Maria Hospital (HSM). The collected data were statistically analyzed: a descriptive analysis and a predictive model were performed, and significant relations between some socio-demographic variables and the assessed knowledge were obtained. A statistical analysis using part of the questionnaire information can be found in [14]. The present article completes that statistical approach by estimating models for the relevant remaining questionnaire items using Generalized Linear Models (GLM). Exploring the binary outcome issue, we intend to extend this approach using Generalized Linear Mixed Models (GLMM), but that work is still ongoing.
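
    A minimal sketch of the GLM step described above, with hypothetical covariate names standing in for the questionnaire items: a binomial (logistic) model relating a binary answer to socio-demographic variables. The GLMM extension would add random effects on top of this.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 300
df = pd.DataFrame({
    "age": rng.normal(38, 8, n),
    "education_years": rng.integers(6, 20, n),
})
# Simulated binary answer (e.g. correct knowledge about pediatric hypertension).
logit_p = -4.0 + 0.03 * df.age + 0.20 * df.education_years
df["knows_hypertension"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

fit = smf.glm("knows_hypertension ~ age + education_years", data=df,
              family=sm.families.Binomial()).fit()
print(fit.params.round(3))
```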

  17. Transfusion Indication Threshold Reduction (TITRe2) randomized controlled trial in cardiac surgery: statistical analysis plan.

    PubMed

    Pike, Katie; Nash, Rachel L; Murphy, Gavin J; Reeves, Barnaby C; Rogers, Chris A

    2015-02-22

    The Transfusion Indication Threshold Reduction (TITRe2) trial is the largest randomized controlled trial to date to compare red blood cell transfusion strategies following cardiac surgery. This update presents the statistical analysis plan, detailing how the study will be analyzed and presented. The statistical analysis plan has been written following recommendations from the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, prior to database lock and the final analysis of trial data. Outlined analyses are in line with the Consolidated Standards of Reporting Trials (CONSORT). The study aims to randomize 2000 patients from 17 UK centres. Patients are randomized to either a restrictive (transfuse if haemoglobin concentration <7.5 g/dl) or liberal (transfuse if haemoglobin concentration <9 g/dl) transfusion strategy. The primary outcome is a binary composite outcome of any serious infectious or ischaemic event in the first 3 months following randomization. The statistical analysis plan details how non-adherence with the intervention, withdrawals from the study, and the study population will be derived and dealt with in the analysis. The planned analyses of the trial primary and secondary outcome measures are described in detail, including approaches taken to deal with multiple testing, model assumptions not being met and missing data. Details of planned subgroup and sensitivity analyses and pre-specified ancillary analyses are given, along with potential issues that have been identified with such analyses and possible approaches to overcome such issues. ISRCTN70923932 .

  18. Examination of two methods for statistical analysis of data with magnitude and direction emphasizing vestibular research applications

    NASA Technical Reports Server (NTRS)

    Calkins, D. S.

    1998-01-01

    When the dependent (or response) variable in an experiment has direction and magnitude, one approach that has been used for statistical analysis involves splitting magnitude and direction and applying univariate statistical techniques to the components. However, such treatment of quantities with direction and magnitude is not justifiable mathematically and can lead to incorrect conclusions about relationships among variables and, as a result, to flawed interpretations. This note discusses a problem with that practice and recommends mathematically correct procedures to be used with dependent variables that have direction and magnitude for 1) computation of mean values, 2) statistical contrasts of and confidence intervals for means, and 3) correlation methods.
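
    The recommended treatment of such data can be shown in a few lines: average the Cartesian components of the response vectors rather than averaging magnitudes and angles separately. The example values are invented.

```python
import numpy as np

magnitude = np.array([4.0, 5.0, 4.5, 6.0])
angle_deg = np.array([10.0, 350.0, 5.0, 355.0])   # directions clustered around 0 degrees

# Incorrect componentwise averaging: the mean angle of ~180 degrees is meaningless here.
naive_angle = angle_deg.mean()

# Correct approach: convert to Cartesian components, average, convert back.
x = magnitude * np.cos(np.deg2rad(angle_deg))
y = magnitude * np.sin(np.deg2rad(angle_deg))
mean_x, mean_y = x.mean(), y.mean()
mean_magnitude = np.hypot(mean_x, mean_y)
mean_angle = np.rad2deg(np.arctan2(mean_y, mean_x)) % 360

print(f"naive mean angle: {naive_angle:.0f} deg (misleading)")
print(f"vector mean: {mean_magnitude:.2f} at {mean_angle:.1f} deg")
```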

  19. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

    PubMed Central

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.

    2014-01-01

    Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607
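
    A schematic of the Gaussian imputation idea (not the released software): the z-scores at untyped SNPs are set to the conditional mean of a multivariate normal given the typed z-scores and a reference LD matrix, with a ridge term standing in for the correction for the finite reference panel. All numbers are toy values.

```python
import numpy as np

# Reference LD (correlation) matrix for 5 SNPs; z-scores observed at SNPs 0, 2, 4.
R = np.array([[1.0, 0.8, 0.5, 0.2, 0.1],
              [0.8, 1.0, 0.6, 0.3, 0.1],
              [0.5, 0.6, 1.0, 0.7, 0.4],
              [0.2, 0.3, 0.7, 1.0, 0.8],
              [0.1, 0.1, 0.4, 0.8, 1.0]])
typed, untyped = [0, 2, 4], [1, 3]
z_typed = np.array([3.2, 2.1, 0.4])

lam = 0.1                                      # ridge term for the limited reference panel
R_tt = R[np.ix_(typed, typed)] + lam * np.eye(len(typed))
R_ut = R[np.ix_(untyped, typed)]

w = R_ut @ np.linalg.inv(R_tt)                 # linear imputation weights
z_imputed = w @ z_typed
info = np.diag(w @ R[np.ix_(typed, untyped)])  # approximate imputation quality per untyped SNP
print("imputed z-scores:", z_imputed.round(2), "  info:", info.round(2))
```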

  20. Assessing the Kansas water-level monitoring program: An example of the application of classical statistics to a geological problem

    USGS Publications Warehouse

    Davis, J.C.

    2000-01-01

    Geologists may feel that geological data are not amenable to statistical analysis, or at best require specialized approaches such as nonparametric statistics and geostatistics. However, there are many circumstances, particularly in systematic studies conducted for environmental or regulatory purposes, where traditional parametric statistical procedures can be beneficial. An example is the application of analysis of variance to data collected in an annual program of measuring groundwater levels in Kansas. Influences such as well conditions, operator effects, and use of the water can be assessed and wells that yield less reliable measurements can be identified. Such statistical studies have resulted in yearly improvements in the quality and reliability of the collected hydrologic data. Similar benefits may be achieved in other geological studies by the appropriate use of classical statistical tools.
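
    A minimal sketch of the kind of analysis of variance described, with invented measurements: testing whether an operator factor contributes systematic variation to annual water-level changes.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
operators = np.repeat(["A", "B", "C"], 40)
offsets = {"A": 0.0, "B": 0.3, "C": -0.2}            # hypothetical systematic operator offsets (m)
depth_change = np.array([offsets[o] for o in operators]) + rng.normal(0.0, 0.5, operators.size)
df = pd.DataFrame({"operator": operators, "depth_change": depth_change})

# One-way ANOVA: does the operator factor explain significant variation?
anova = sm.stats.anova_lm(smf.ols("depth_change ~ C(operator)", data=df).fit())
print(anova)
```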

  1. Choice-Based Conjoint Analysis: Classification vs. Discrete Choice Models

    NASA Astrophysics Data System (ADS)

    Giesen, Joachim; Mueller, Klaus; Taneva, Bilyana; Zolliker, Peter

    Conjoint analysis is a family of techniques that originated in psychology and later became popular in market research. The main objective of conjoint analysis is to measure an individual's or a population's preferences on a class of options that can be described by parameters and their levels. We consider preference data obtained in choice-based conjoint analysis studies, where one observes test persons' choices on small subsets of the options. There are many ways to analyze choice-based conjoint analysis data. Here we discuss the intuition behind a classification-based approach, and compare this approach to one based on statistical assumptions (discrete choice models) and to a regression approach. Our comparison on real and synthetic data indicates that the classification approach outperforms the discrete choice models.

  2. A discriminant function approach to ecological site classification in northern New England

    Treesearch

    James M. Fincher; Marie-Louise Smith

    1994-01-01

    Describes one approach to ecologically based classification of upland forest community types of the White and Green Mountain physiographic regions. The classification approach is based on an intensive statistical analysis of the relationship between the communities and soil-site factors. Discriminant functions useful in distinguishing between types based on soil-site...

  3. Threshold-free high-power methods for the ontological analysis of genome-wide gene-expression studies

    PubMed Central

    Nilsson, Björn; Håkansson, Petra; Johansson, Mikael; Nelander, Sven; Fioretos, Thoas

    2007-01-01

    Ontological analysis facilitates the interpretation of microarray data. Here we describe new ontological analysis methods which, unlike existing approaches, are threshold-free and statistically powerful. We perform extensive evaluations and introduce a new concept, detection spectra, to characterize methods. We show that different ontological analysis methods exhibit distinct detection spectra, and that it is critical to account for this diversity. Our results argue strongly against the continued use of existing methods, and provide directions towards an enhanced approach. PMID:17488501

  4. Introduction to Bayesian statistical approaches to compositional analyses of transgenic crops 1. Model validation and setting the stage.

    PubMed

    Harrison, Jay M; Breeze, Matthew L; Harrigan, George G

    2011-08-01

    Statistical comparisons of compositional data generated on genetically modified (GM) crops and their near-isogenic conventional (non-GM) counterparts typically rely on classical significance testing. This manuscript presents an introduction to Bayesian methods for compositional analysis along with recommendations for model validation. The approach is illustrated using protein and fat data from two herbicide tolerant GM soybeans (MON87708 and MON87708×MON89788) and a conventional comparator grown in the US in 2008 and 2009. Guidelines recommended by the US Food and Drug Administration (FDA) in conducting Bayesian analyses of clinical studies on medical devices were followed. This study is the first Bayesian approach to GM and non-GM compositional comparisons. The evaluation presented here supports a conclusion that a Bayesian approach to analyzing compositional data can provide meaningful and interpretable results. We further describe the importance of method validation and approaches to model checking if Bayesian approaches to compositional data analysis are to be considered viable by scientists involved in GM research and regulation. Copyright © 2011 Elsevier Inc. All rights reserved.

  5. Analysis of the dependence of extreme rainfalls

    NASA Astrophysics Data System (ADS)

    Padoan, Simone; Ancey, Christophe; Parlange, Marc

    2010-05-01

    The aim of spatial analysis is to quantitatively describe the behavior of environmental phenomena such as precipitation levels, wind speed or daily temperatures. A number of generic approaches to spatial modeling have been developed [1], but these are not necessarily ideal for handling extremal aspects given their focus on mean process levels. The areal modelling of the extremes of a natural process observed at points in space is important in environmental statistics; for example, understanding extremal spatial rainfall is crucial in flood protection. In light of recent concerns over climate change, the use of robust mathematical and statistical methods for such analyses has grown in importance. Multivariate extreme value models and the class of max-stable processes [2] have a similar asymptotic motivation to the univariate Generalized Extreme Value (GEV) distribution, but provide a general approach to modeling extreme processes that incorporates temporal or spatial dependence. Statistical methods for max-stable processes and data analyses of practical problems are discussed by [3] and [4]. This work illustrates methods for the statistical modelling of spatial extremes and gives examples of their use by means of an analysis of real extreme precipitation data from Switzerland. [1] Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York. [2] de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, USA. [3] Padoan, S. A., Ribatet, M. and Sisson, S. A. (2009). Likelihood-Based Inference for Max-Stable Processes. Journal of the American Statistical Association, Theory & Methods. In press. [4] Davison, A. C. and Gholamrezaee, M. (2009). Geostatistics of extremes. Journal of the Royal Statistical Society, Series B. To appear.

  6. Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli.

    PubMed

    Westfall, Jacob; Kenny, David A; Judd, Charles M

    2014-10-01

    Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
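
    The plateau described above can be reproduced with a small Monte Carlo sketch. Here the analysis is a simple by-stimulus t-test on stimulus means (a stand-in for the article's mixed-model treatment of stimuli as random), with stimuli nested in condition; all variance components are invented.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(9)
effect, sd_participant, sd_stimulus, sd_error = 0.6, 1.0, 0.8, 3.0
n_stim, n_sims = 12, 1000                      # fixed sample of 12 stimuli per condition

def power(n_participants):
    hits = 0
    for _ in range(n_sims):
        stim_a = rng.normal(effect, sd_stimulus, n_stim)   # condition A stimulus effects
        stim_b = rng.normal(0.0, sd_stimulus, n_stim)      # condition B stimulus effects
        subj = rng.normal(0.0, sd_participant, (n_participants, 1))
        ya = subj + stim_a + rng.normal(0.0, sd_error, (n_participants, n_stim))
        yb = subj + stim_b + rng.normal(0.0, sd_error, (n_participants, n_stim))
        # By-stimulus analysis: average over participants, compare the stimulus means.
        _, p = ttest_ind(ya.mean(axis=0), yb.mean(axis=0))
        hits += p < 0.05
    return hits / n_sims

# Power stops improving once the stimulus sample, not the participant sample, is the bottleneck.
for n in (10, 50, 200, 1000):
    print(f"participants = {n:4d}   estimated power = {power(n):.2f}")
```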

  7. An improved approach for flight readiness certification: Methodology for failure risk assessment and application examples. Volume 3: Structure and listing of programs

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.

  8. Analysis of regional deformation and strain accumulation data adjacent to the San Andreas fault

    NASA Technical Reports Server (NTRS)

    Turcotte, Donald L.

    1991-01-01

    A new approach to the understanding of crustal deformation was developed under this grant. This approach combined aspects of fractals, chaos, and self-organized criticality to provide a comprehensive theory for deformation on distributed faults. It is hypothesized that crustal deformation is an example of comminution: Deformation takes place on a fractal distribution of faults resulting in a fractal distribution of seismicity. Our primary effort under this grant was devoted to developing an understanding of distributed deformation in the continental crust. An initial effort was carried out on the fractal clustering of earthquakes in time. It was shown that earthquakes do not obey random Poisson statistics, but can be approximated in many cases by coupled, scale-invariant fractal statistics. We applied our approach to the statistics of earthquakes in the New Hebrides region of the southwest Pacific because of the very high level of seismicity there. This work was written up and published in the Bulletin of the Seismological Society of America. This approach was also applied to the statistics of the seismicity on the San Andreas fault system.

  9. A hierarchical fuzzy rule-based approach to aphasia diagnosis.

    PubMed

    Akbarzadeh-T, Mohammad-R; Moshtagh-Khorasani, Majid

    2007-10-01

    Aphasia diagnosis is a particularly challenging medical diagnostic task due to the linguistic uncertainty and vagueness, inconsistencies in the definition of aphasic syndromes, large number of measurements with imprecision, natural diversity and subjectivity in test objects as well as in opinions of experts who diagnose the disease. To efficiently address this diagnostic process, a hierarchical fuzzy rule-based structure is proposed here that considers the effect of different features of aphasia by statistical analysis in its construction. This approach can be efficient for diagnosis of aphasia and possibly other medical diagnostic applications due to its fuzzy and hierarchical reasoning construction. Initially, the symptoms of the disease, each of which consists of different features, are analyzed statistically. The measured statistical parameters from the training set are then used to define membership functions and the fuzzy rules. The resulting two-layered fuzzy rule-based system is then compared with a back-propagation feed-forward neural network for diagnosis of four Aphasia types: Anomic, Broca, Global and Wernicke. In order to reduce the number of required inputs, the technique is applied and compared on both comprehensive and spontaneous speech tests. Statistical t-test analysis confirms that the proposed approach uses fewer Aphasia features while also presenting a significant improvement in terms of accuracy.

  10. Analysis of data collected from right and left limbs: Accounting for dependence and improving statistical efficiency in musculoskeletal research.

    PubMed

    Stewart, Sarah; Pearson, Janet; Rome, Keith; Dalbeth, Nicola; Vandal, Alain C

    2018-01-01

    Statistical techniques currently used in musculoskeletal research often inefficiently account for paired-limb measurements or the relationship between measurements taken from multiple regions within limbs. This study compared three commonly used analysis methods with a mixed-models approach that appropriately accounted for the association between limbs, regions, and trials and that utilised all information available from repeated trials. Four analyses were applied to an existing data set containing plantar pressure data, which was collected for seven masked regions on right and left feet, over three trials, across three participant groups. Methods 1-3 averaged data over trials and analysed right foot data (Method 1), data from a randomly selected foot (Method 2), and averaged right and left foot data (Method 3). Method 4 used all available data in a mixed-effects regression that accounted for repeated measures taken for each foot, foot region and trial. Confidence interval widths for the mean differences between groups for each foot region were used as a criterion for comparison of statistical efficiency. Mean differences in pressure between groups were similar across methods for each foot region, while the confidence interval widths were consistently smaller for Method 4. Method 4 also revealed significant between-group differences that were not detected by Methods 1-3. A mixed-effects linear model approach generates improved efficiency and power by producing more precise estimates compared to alternative approaches that discard information in the process of accounting for paired-limb measurements. This approach is recommended for generating more clinically sound and statistically efficient research outputs. Copyright © 2017 Elsevier B.V. All rights reserved.
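
    A simplified sketch of the Method 4 idea on simulated data: keep both feet and all trials and model the participant as a random effect instead of averaging or discarding observations. The study's full model also distinguishes foot regions; here only a random intercept per participant is used.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
rows = []
for pid in range(60):
    group = "patient" if pid < 30 else "control"
    subj_effect = rng.normal(0.0, 15.0)                  # between-participant variation
    for foot in ("left", "right"):
        for trial in range(3):
            pressure = 250 + (20 if group == "patient" else 0) + subj_effect + rng.normal(0.0, 25.0)
            rows.append((pid, group, foot, trial, pressure))
df = pd.DataFrame(rows, columns=["participant", "group", "foot", "trial", "pressure"])

# Linear mixed model: fixed group effect, random intercept per participant.
fit = smf.mixedlm("pressure ~ group", data=df, groups="participant").fit()
print(fit.summary())
```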

  11. Statistical power in parallel group point exposure studies with time-to-event outcomes: an empirical comparison of the performance of randomized controlled trials and the inverse probability of treatment weighting (IPTW) approach.

    PubMed

    Austin, Peter C; Schuster, Tibor; Platt, Robert W

    2015-10-15

    Estimating statistical power is an important component of the design of both randomized controlled trials (RCTs) and observational studies. Methods for estimating statistical power in RCTs have been well described and can be implemented simply. In observational studies, statistical methods must be used to remove the effects of confounding that can occur due to non-random treatment assignment. Inverse probability of treatment weighting (IPTW) using the propensity score is an attractive method for estimating the effects of treatment using observational data. However, sample size and power calculations have not been adequately described for these methods. We used an extensive series of Monte Carlo simulations to compare the statistical power of an IPTW analysis of an observational study with time-to-event outcomes with that of an analysis of a similarly-structured RCT. We examined the impact of four factors on the statistical power function: number of observed events, prevalence of treatment, the marginal hazard ratio, and the strength of the treatment-selection process. We found that, on average, an IPTW analysis had lower statistical power compared to an analysis of a similarly-structured RCT. The difference in statistical power increased as the magnitude of the treatment-selection model increased. The statistical power of an IPTW analysis tended to be lower than the statistical power of a similarly-structured RCT.
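
    The weighting step at the heart of the IPTW analysis can be sketched as follows: estimate the propensity score by logistic regression, form stabilized weights, and check covariate balance; a weighted time-to-event model would then be fitted with these weights. Data are simulated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 5000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
p_treat = 1 / (1 + np.exp(-(-0.5 + 0.8 * x1 + 0.4 * x2)))   # confounded treatment assignment
treat = rng.binomial(1, p_treat)

ps_model = sm.Logit(treat, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=False)
ps = ps_model.predict()

# Stabilized inverse probability of treatment weights.
p_marginal = treat.mean()
w = np.where(treat == 1, p_marginal / ps, (1 - p_marginal) / (1 - ps))

def weighted_mean(x, w):
    return np.sum(w * x) / np.sum(w)

# Balance check: weighting should shrink the covariate differences between groups.
for name, x in (("x1", x1), ("x2", x2)):
    raw_diff = x[treat == 1].mean() - x[treat == 0].mean()
    w_diff = weighted_mean(x[treat == 1], w[treat == 1]) - weighted_mean(x[treat == 0], w[treat == 0])
    print(f"{name}: raw mean difference {raw_diff:+.3f}   weighted {w_diff:+.3f}")
```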

  12. Big-Data RHEED analysis for understanding epitaxial film growth processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vasudevan, Rama K; Tselev, Alexander; Baddorf, Arthur P

    Reflection high energy electron diffraction (RHEED) has by now become a standard tool for in-situ monitoring of film growth by pulsed laser deposition and molecular beam epitaxy. Yet despite the widespread adoption and wealth of information in RHEED images, most applications are limited to observing intensity oscillations of the specular spot, and much additional information on growth is discarded. With ease of data acquisition and increased computation speeds, statistical methods to rapidly mine the dataset are now feasible. Here, we develop such an approach to the analysis of the fundamental growth processes through multivariate statistical analysis of RHEED image sequences. This approach is illustrated for growth of LaxCa1-xMnO3 films grown on etched (001) SrTiO3 substrates, but is universal. The multivariate methods, including principal component analysis and k-means clustering, provide insight into the relevant behaviors and the timing and nature of a disordered-to-ordered growth change, and highlight statistically significant patterns. Fourier analysis yields the harmonic components of the signal and allows separation of the relevant components and baselines, isolating the asymmetric nature of the step density function and the transmission spots from the imperfect layer-by-layer (LBL) growth. These studies show the promise of big data approaches to obtaining more insight into film properties during and after epitaxial film growth. Furthermore, these studies open the pathway to use forward prediction methods to potentially allow significantly more control over the growth process and hence final film quality.
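
    A schematic of the multivariate pipeline on a simulated frame stack (not RHEED data): flatten each frame, extract principal components of the sequence, and cluster the component scores to separate growth regimes.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(12)
n_frames, h, w = 600, 32, 48
t = np.arange(n_frames)

# Toy image sequence: an oscillating "specular spot" plus a slow drift and noise.
yy, xx = np.mgrid[0:h, 0:w]
spot = np.exp(-((yy - 16) ** 2 + (xx - 24) ** 2) / 20.0)
frames = (spot[None] * (1 + 0.3 * np.sin(2 * np.pi * t / 50))[:, None, None]
          + 0.001 * t[:, None, None]                       # slow background drift
          + 0.05 * rng.normal(size=(n_frames, h, w)))

X = frames.reshape(n_frames, -1)                           # one row per frame
pca = PCA(n_components=5).fit(X)
scores = pca.transform(X)                                  # time-dependent component scores
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)

print("explained variance ratios:", pca.explained_variance_ratio_.round(3))
print("frames assigned to each regime:", np.bincount(labels))
```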

  13. Complexities and potential pitfalls of clinical study design and data analysis in assisted reproduction.

    PubMed

    Patounakis, George; Hill, Micah J

    2018-06-01

    The purpose of the current review is to describe the common pitfalls in design and statistical analysis of reproductive medicine studies. It serves to guide both authors and reviewers toward reducing the incidence of spurious statistical results and erroneous conclusions. The large amount of data gathered in IVF cycles leads to problems with multiplicity, multicollinearity, and overfitting of regression models. Furthermore, the use of the word 'trend' to describe nonsignificant results has increased in recent years. Finally, methods to accurately account for female age in infertility research models are becoming more common and necessary. The pitfalls of study design and analysis reviewed provide a framework for authors and reviewers to approach clinical research in the field of reproductive medicine. By providing a more rigorous approach to study design and analysis, the literature in reproductive medicine will have more reliable conclusions that can stand the test of time.

  14. Modelling short time series in metabolomics: a functional data analysis approach.

    PubMed

    Montana, Giovanni; Berk, Maurice; Ebbels, Tim

    2011-01-01

    Metabolomics is the study of the complement of small molecule metabolites in cells, biofluids and tissues. Many metabolomic experiments are designed to compare changes observed over time under two or more experimental conditions (e.g. a control and drug-treated group), thus producing time course data. Models from traditional time series analysis are often unsuitable because, by design, only very few time points are available and there is a high number of missing values. We propose a functional data analysis approach for modelling short time series arising in metabolomic studies which overcomes these obstacles. Our model assumes that each observed time series is a smooth random curve, and we propose a statistical approach for inferring this curve from repeated measurements taken on the experimental units. A test statistic for detecting differences between temporal profiles associated with two experimental conditions is then presented. The methodology has been applied to NMR spectroscopy data collected in a pre-clinical toxicology study.
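
    The smooth-random-curve idea can be sketched in Python for a single subject and metabolite, assuming SciPy's UnivariateSpline as a stand-in for the paper's functional-data model and using made-up time points with a missing value:

        import numpy as np
        from scipy.interpolate import UnivariateSpline

        # Hypothetical short metabolomic time course with a missing observation.
        t = np.array([0.0, 1.0, 2.0, 4.0, 8.0, 24.0])
        y = np.array([1.02, 1.10, np.nan, 1.45, 1.38, 1.20])

        mask = ~np.isnan(y)                              # missing values simply drop out
        curve = UnivariateSpline(t[mask], y[mask], k=3, s=0.01)

        grid = np.linspace(t.min(), t.max(), 100)
        fitted = curve(grid)                             # smoothed estimate of the underlying curve
        # Group differences could then be assessed by comparing fitted curves across conditions.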

  15. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: Collect statistics from biological data. Build a computational model. Solve a computational modeling problem. Test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data is usually represented as matrices and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  16. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

    PubMed

    Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

    2016-12-20

    Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
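
    For orientation, the frequentist DerSimonian and Laird comparator mentioned above can be written out directly in Python (this is not the Bayesian data-augmentation method itself; effect sizes and variances are invented):

        import numpy as np

        def dersimonian_laird(effects, variances):
            """Random-effects meta-analysis with the DerSimonian-Laird tau^2 estimator."""
            y = np.asarray(effects, dtype=float)
            v = np.asarray(variances, dtype=float)
            w = 1.0 / v                                   # fixed-effect weights
            mu_fe = np.sum(w * y) / np.sum(w)
            Q = np.sum(w * (y - mu_fe) ** 2)              # Cochran's Q
            c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
            tau2 = max(0.0, (Q - (len(y) - 1)) / c)       # between-study variance
            w_re = 1.0 / (v + tau2)
            mu_re = np.sum(w_re * y) / np.sum(w_re)
            return mu_re, np.sqrt(1.0 / np.sum(w_re)), tau2

        # e.g. log hazard ratios and their variances from five small trials
        print(dersimonian_laird([0.1, -0.2, 0.3, 0.05, 0.4], [0.04, 0.06, 0.05, 0.03, 0.08]))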

  17. Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

    NASA Technical Reports Server (NTRS)

    Wallace, G. R.; Weathers, G. D.; Graf, E. R.

    1973-01-01

    The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
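
    The construction of a hybrid-sum sequence can be sketched in Python: two short linear-feedback shift registers (with taps chosen so each recurrence is maximal-length under this shift convention; register lengths and taps are illustrative, not those of the paper) are combined modulo two, and a crude moving average stands in for the analog filter:

        import numpy as np

        def lfsr_sequence(taps, seed, length):
            """Output bits of a Fibonacci LFSR; `taps` are 1-indexed stage positions."""
            state = list(seed)
            out = []
            for _ in range(length):
                out.append(state[-1])                    # oldest stage is the output
                fb = 0
                for t in taps:
                    fb ^= state[t - 1]
                state = [fb] + state[:-1]                # shift and insert feedback bit
            return np.array(out, dtype=int)

        s1 = lfsr_sequence(taps=[3, 4], seed=[1, 0, 0, 1], length=31)      # 4-stage register
        s2 = lfsr_sequence(taps=[2, 5], seed=[1, 1, 0, 1, 0], length=31)   # 5-stage register
        hybrid = s1 ^ s2                                 # modulo-two (XOR) sum

        # Crude stand-in for the filtering step: a moving-average filter on the +/-1 waveform.
        filtered = np.convolve(2 * hybrid - 1, np.ones(4) / 4, mode="valid")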

  18. Rotation of EOFs by the Independent Component Analysis: Towards A Solution of the Mixing Problem in the Decomposition of Geophysical Time Series

    NASA Technical Reports Server (NTRS)

    Aires, Filipe; Rossow, William B.; Chedin, Alain; Hansen, James E. (Technical Monitor)

    2001-01-01

    The Independent Component Analysis is a recently developed technique for component extraction. This new method requires the statistical independence of the extracted components, a stronger constraint that uses higher-order statistics, instead of the classical decorrelation, a weaker constraint that uses only second-order statistics. This technique has been used recently for the analysis of geophysical time series with the goal of investigating the causes of variability in observed data (i.e. exploratory approach). We demonstrate with a data simulation experiment that, if initialized with a Principal Component Analysis, the Independent Component Analysis performs a rotation of the classical PCA (or EOF) solution. This rotation uses no localization criterion like other Rotation Techniques (RT); only the global generalization of decorrelation by statistical independence is used. This rotation of the PCA solution seems to be able to solve the tendency of PCA to mix several physical phenomena, even when the signal is just their linear sum.
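
    The PCA-then-ICA rotation can be sketched in Python on a simulated linear mixture, assuming scikit-learn's PCA and FastICA; the sources and mixing matrix are invented for illustration:

        import numpy as np
        from sklearn.decomposition import PCA, FastICA

        rng = np.random.default_rng(2)
        t = np.linspace(0, 10, 2000)
        # Two invented "physical" sources and a 3-channel linear mixture with noise.
        sources = np.c_[np.sin(2 * np.pi * 1.3 * t), np.sign(np.sin(2 * np.pi * 0.4 * t))]
        mixing = np.array([[1.0, 0.6], [0.4, 1.0], [0.7, 0.2]])
        X = sources @ mixing.T + 0.05 * rng.normal(size=(t.size, 3))

        eof_scores = PCA(n_components=2).fit_transform(X)       # decorrelated, EOF-like components
        ica_scores = FastICA(n_components=2, random_state=0).fit_transform(X)
        # The ICA output can be read as a rotation of the PCA solution toward
        # statistically independent components, which here recover the two sources.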

  19. Stationary statistical theory of two-surface multipactor regarding all impacts for efficient threshold analysis

    NASA Astrophysics Data System (ADS)

    Lin, Shu; Wang, Rui; Xia, Ning; Li, Yongdong; Liu, Chunliang

    2018-01-01

    Statistical multipactor theories are critical prediction approaches for multipactor breakdown determination. However, these approaches still require a trade-off between calculation efficiency and accuracy. This paper presents an improved stationary statistical theory for efficient threshold analysis of two-surface multipactor. A general integral equation over the distribution function of the electron emission phase with both the single-sided and double-sided impacts considered is formulated. The modeling results indicate that the improved stationary statistical theory can not only obtain equally good accuracy of multipactor threshold calculation as the nonstationary statistical theory, but also achieve high calculation efficiency concurrently. By using this improved stationary statistical theory, the total time consumption in calculating full multipactor susceptibility zones of parallel plates can be decreased by as much as a factor of four relative to the nonstationary statistical theory. It also shows that the effect of single-sided impacts is indispensable for accurate multipactor prediction of coaxial lines and is also more significant for high-order multipactor. Finally, the influence of secondary emission yield (SEY) properties on the multipactor threshold is further investigated. It is observed that the first cross energy and the energy range between the first cross and the SEY maximum both play a significant role in determining the multipactor threshold, which agrees with the numerical simulation results in the literature.

  20. Hedonic approaches based on spatial econometrics and spatial statistics: application to evaluation of project benefits

    NASA Astrophysics Data System (ADS)

    Tsutsumi, Morito; Seya, Hajime

    2009-12-01

    This study discusses the theoretical foundation of the application of spatial hedonic approaches—the hedonic approach employing spatial econometrics or/and spatial statistics—to benefits evaluation. The study highlights the limitations of the spatial econometrics approach since it uses a spatial weight matrix that is not employed by the spatial statistics approach. Further, the study presents empirical analyses by applying the Spatial Autoregressive Error Model (SAEM), which is based on the spatial econometrics approach, and the Spatial Process Model (SPM), which is based on the spatial statistics approach. SPMs are conducted based on both isotropy and anisotropy and applied to different mesh sizes. The empirical analysis reveals that the estimated benefits are quite different, especially between isotropic and anisotropic SPM and between isotropic SPM and SAEM; the estimated benefits are similar for SAEM and anisotropic SPM. The study demonstrates that the mesh size does not affect the estimated amount of benefits. Finally, the study provides a confidence interval for the estimated benefits and raises an issue with regard to benefit evaluation.

  1. Econometrics of joint production: another approach. [Petroleum refining and petrochemicals production]

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Griffin, J.M.

    1977-11-01

    The pseudo data approach to the joint production of petroleum refining and chemicals is described as an alternative that avoids the multicollinearity of time series data and allows a complex technology to be characterized in a statistical price possibility frontier. Intended primarily for long-range analysis, the pseudo data method can be used as a source of elasticity estimates for policy analysis. 19 references.

  2. Statistical Field Estimation for Complex Coastal Regions and Archipelagos (PREPRINT)

    DTIC Science & Technology

    2011-04-09

    and study the computational properties of these schemes. Specifically, we extend a multiscale Objective Analysis (OA) approach to complex coastal regions and... multiscale free-surface code builds on the primitive-equation model of the Harvard Ocean Prediction System (HOPS, Haley et al. (2009)). Additionally

  3. Using "Short" Interrupted Time-Series Analysis To Measure the Impacts of Whole-School Reforms: With Applications to a Study of Accelerated Schools.

    ERIC Educational Resources Information Center

    Bloom, Howard S.

    2002-01-01

    Introduces a new approach for measuring the impact of whole-school reforms. The approach, based on "short" interrupted time-series analysis, is explained, its statistical procedures are outlined, and its use in the evaluation of a major whole-school reform, Accelerated Schools, is described (H. Bloom and others, 2001). (SLD)
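
    The segmented regression behind a short interrupted time-series analysis can be sketched in Python, assuming statsmodels' OLS and invented annual scores with a hypothetical reform year:

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        years = np.arange(1994, 2002)
        reform_year = 1998                                   # hypothetical interruption point
        scores = np.array([50.1, 50.8, 51.2, 51.0, 53.4, 54.1, 54.9, 55.3])

        df = pd.DataFrame({
            "time": years - years[0],                        # baseline trend
            "post": (years >= reform_year).astype(int),      # level shift at the interruption
            "time_since": np.clip(years - reform_year, 0, None),  # change in slope afterwards
            "score": scores,
        })
        fit = smf.ols("score ~ time + post + time_since", data=df).fit()
        print(fit.params)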

  4. Estimating the Regional Economic Significance of Airports

    DTIC Science & Technology

    1992-09-01

    following three options for estimating induced impacts: the economic base model, an econometric model, and a regional input-output model. One approach to... limitations, however, the economic base model has been widely used for regional economic analysis. A second approach is to develop an econometric model of... analysis is the principal statistical tool used to estimate the economic relationships. Regional econometric models are capable of estimating a single

  5. Mild cognitive impairment and fMRI studies of brain functional connectivity: the state of the art

    PubMed Central

    Farràs-Permanyer, Laia; Guàrdia-Olmos, Joan; Peró-Cebollero, Maribel

    2015-01-01

    In the last 15 years, many articles have studied brain connectivity in Mild Cognitive Impairment patients with fMRI techniques, seemingly using different connectivity statistical models in each investigation to identify complex connectivity structures so as to recognize typical behavior in this type of patient. This diversity in statistical approaches may cause problems in results comparison. This paper seeks to describe how researchers approached the study of brain connectivity in MCI patients using fMRI techniques from 2002 to 2014. The focus is on the statistical analysis proposed by each research group in reference to the limitations and possibilities of those techniques to identify some recommendations to improve the study of functional connectivity. The included articles came from a search of Web of Science and PsycINFO using the following keywords: fMRI, MCI, and functional connectivity. Eighty-one papers were found, but two of them were discarded because of the lack of statistical analysis. Accordingly, 79 articles were included in this review. We summarized some parts of the articles, including the goal of every investigation, the cognitive paradigm and methods used, brain regions involved, use of ROI analysis and statistical analysis, emphasizing the connectivity estimation model used in each investigation. The present analysis allowed us to confirm the remarkable variability of the statistical analysis methods found. Additionally, the study of brain connectivity in this type of population is not providing, at the moment, any significant information or results related to clinical aspects relevant for prediction and treatment. We propose following guidelines for publishing fMRI data, which would be a good solution to the problem of study replication. The latter aspect could be important for future publications because a higher homogeneity would benefit the comparison between publications and the generalization of results. PMID:26300802

  6. Advanced microwave soil moisture studies. [Big Sioux River Basin, Iowa

    NASA Technical Reports Server (NTRS)

    Dalsted, K. J.; Harlan, J. C.

    1983-01-01

    Low-level L-band brightness temperature (TB) and thermal infrared (TIR) data, together with the following data sets: a soil map and land cover data, direct soil moisture measurements, and a computer-generated contour map, were statistically evaluated using regression analysis and linear discriminant analysis. Regression analysis of footprint data shows that statistical groupings of ground variables (soil features and land cover) hold promise for qualitative assessment of soil moisture and for reducing variance within the sampling space. Dry conditions appear to be more conducive to producing meaningful statistics than wet conditions. Regression analysis using field-averaged TB and TIR data did not approach the higher R-squared values obtained using within-field variations. The linear discriminant analysis indicates some capacity to distinguish categories, with the results being somewhat better on a field basis than a footprint basis.

  7. Systematic review of statistical approaches to quantify, or correct for, measurement error in a continuous exposure in nutritional epidemiology.

    PubMed

    Bennett, Derrick A; Landry, Denise; Little, Julian; Minelli, Cosetta

    2017-09-19

    Several statistical approaches have been proposed to assess and correct for exposure measurement error. We aimed to provide a critical overview of the most common approaches used in nutritional epidemiology. MEDLINE, EMBASE, BIOSIS and CINAHL were searched for reports published in English up to May 2016 in order to ascertain studies that described methods aimed to quantify and/or correct for measurement error for a continuous exposure in nutritional epidemiology using a calibration study. We identified 126 studies, 43 of which described statistical methods and 83 that applied any of these methods to a real dataset. The statistical approaches in the eligible studies were grouped into: a) approaches to quantify the relationship between different dietary assessment instruments and "true intake", which were mostly based on correlation analysis and the method of triads; b) approaches to adjust point and interval estimates of diet-disease associations for measurement error, mostly based on regression calibration analysis and its extensions. Two approaches (multiple imputation and moment reconstruction) were identified that can deal with differential measurement error. For regression calibration, the most common approach to correct for measurement error used in nutritional epidemiology, it is crucial to ensure that its assumptions and requirements are fully met. Analyses that investigate the impact of departures from the classical measurement error model on regression calibration estimates can be helpful to researchers in interpreting their findings. With regard to the possible use of alternative methods when regression calibration is not appropriate, the choice of method should depend on the measurement error model assumed, the availability of suitable calibration study data and the potential for bias due to violation of the classical measurement error model assumptions. On the basis of this review, we provide some practical advice for the use of methods to assess and adjust for measurement error in nutritional epidemiology.
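
    Regression calibration, the most common correction identified above, can be sketched in Python with a simulated calibration substudy (variable names and the classical error model are assumptions made for illustration):

        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(8)

        # Main study: outcome depends on true intake, but only a noisy FFQ measure is observed.
        n = 2000
        true_intake = rng.normal(50, 10, n)
        ffq = true_intake + rng.normal(0, 8, n)              # classical measurement error
        outcome = 0.05 * true_intake + rng.normal(0, 1, n)

        # Calibration substudy: FFQ and a nearly unbiased reference instrument on the same people.
        n_cal = 300
        true_cal = rng.normal(50, 10, n_cal)
        ffq_cal = true_cal + rng.normal(0, 8, n_cal)
        reference = true_cal + rng.normal(0, 2, n_cal)

        # Calibration slope rescales the naive exposure-outcome coefficient.
        lam = LinearRegression().fit(ffq_cal.reshape(-1, 1), reference).coef_[0]
        naive = LinearRegression().fit(ffq.reshape(-1, 1), outcome).coef_[0]
        print("naive:", naive, "corrected:", naive / lam, "target:", 0.05)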

  8. Experimental design matters for statistical analysis: how to handle blocking.

    PubMed

    Jensen, Signe M; Schaarschmidt, Frank; Onofri, Andrea; Ritz, Christian

    2018-03-01

    Nowadays, evaluation of the effects of pesticides often relies on experimental designs that involve multiple concentrations of the pesticide of interest or multiple pesticides at specific comparable concentrations and, possibly, secondary factors of interest. Unfortunately, the experimental design is often more or less neglected when analysing data. Two data examples were analysed using different modelling strategies. First, in a randomized complete block design, mean heights of maize treated with a herbicide and one of several adjuvants were compared. Second, translocation of an insecticide applied to maize as a seed treatment was evaluated using incomplete data from an unbalanced design with several layers of hierarchical sampling. Extensive simulations were carried out to further substantiate the effects of different modelling strategies. It was shown that results from suboptimal approaches (two-sample t-tests and ordinary ANOVA assuming independent observations) may be both quantitatively and qualitatively different from the results obtained using an appropriate linear mixed model. The simulations demonstrated that the different approaches may lead to differences in coverage percentages of confidence intervals and type 1 error rates, confirming that misleading conclusions can easily happen when an inappropriate statistical approach is chosen. To ensure that experimental data are summarized appropriately, avoiding misleading conclusions, the experimental design should duly be reflected in the choice of statistical approaches and models. We recommend that author guidelines should explicitly point out that authors need to indicate how the statistical analysis reflects the experimental design. © 2017 Society of Chemical Industry.
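
    Reflecting the blocking structure in the model can be sketched in Python with statsmodels' mixed linear model and simulated randomized-complete-block data (treatments, blocks, and effect sizes are invented):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(3)
        rows = []
        for b in range(1, 7):                                 # six blocks
            block_effect = rng.normal(0, 2)
            for t_idx, t in enumerate(["A", "B", "C", "D"]):  # four adjuvant treatments
                rows.append({"block": f"block{b}", "treatment": t,
                             "height": 100 + 3 * t_idx + block_effect + rng.normal(0, 1.5)})
        df = pd.DataFrame(rows)

        # Treatment as fixed effect, block as random effect, instead of an ordinary
        # ANOVA or t-tests that ignore the design.
        model = smf.mixedlm("height ~ treatment", data=df, groups=df["block"]).fit()
        print(model.summary())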

  9. Older and Younger Workers: The Equalling Effects of Health

    ERIC Educational Resources Information Center

    Beck, Vanessa; Quinn, Martin

    2012-01-01

    Purpose: The purpose of this paper is to consider the statistical evidence on the effects that ill health has on labour market participation and opportunities for younger and older workers in the East Midlands (UK). Design/methodology/approach: A statistical analysis of Labour Force Survey data was undertaken to demonstrate that health issues…

  10. Measuring the Gas Constant "R": Propagation of Uncertainty and Statistics

    ERIC Educational Resources Information Center

    Olsen, Robert J.; Sattar, Simeen

    2013-01-01

    Determining the gas constant "R" by measuring the properties of hydrogen gas collected in a gas buret is well suited for comparing two approaches to uncertainty analysis using a single data set. The brevity of the experiment permits multiple determinations, allowing for statistical evaluation of the standard uncertainty u[subscript…

  11. Bayesian Statistics in Educational Research: A Look at the Current State of Affairs

    ERIC Educational Resources Information Center

    König, Christoph; van de Schoot, Rens

    2018-01-01

    The ability of a scientific discipline to build cumulative knowledge depends on its predominant method of data analysis. A steady accumulation of knowledge requires approaches which allow researchers to consider results from comparable prior research. Bayesian statistics is especially relevant for establishing a cumulative scientific discipline,…

  12. Spatial Statistical Model and Optimal Survey Design for Rapid Geophysical Characterization of UXO Sites

    DTIC Science & Technology

    2003-07-01

    4, Gnanadesikan, 1977). An entity whose measured features fall into one of the regions is classified accordingly. For the approaches we discuss here... Gnanadesikan, R. 1977. Methods for Statistical Data Analysis of Multivariate Observations. John Wiley & Sons, New York. Hassig, N. L., O’Brien, R. F

  13. Statistical Analysis and Time Series Modeling of Air Traffic Operations Data From Flight Service Stations and Terminal Radar Approach Control Facilities: Two Case Studies

    DOT National Transportation Integrated Search

    1981-10-01

    Two statistical procedures have been developed to estimate hourly or daily aircraft counts. These counts can then be transformed into estimates of instantaneous air counts. The first procedure estimates the stable (deterministic) mean level of hourly...

  14. Doppler and speckle methods for diagnostics in dentistry

    NASA Astrophysics Data System (ADS)

    Ulyanov, Sergey S.; Lepilin, Alexander V.; Lebedeva, Nina G.; Sedykh, Alexey V.; Kharish, Natalia A.; Osipova, Yulia; Karpovich, Alexander

    2002-02-01

    The results of statistical analysis of Doppler spectra of scattered intensity, obtained from tissues of the oral cavity membrane of healthy volunteers, are presented. The dependence of the spectral moments of the Doppler signal on cutoff frequency is investigated. Some results of statistical analysis of Doppler spectra obtained from the tooth pulp of patients are presented. A new approach for monitoring blood microcirculation in orthodontics is suggested. The influence of the measuring system's own noise on the formation of the speckle-interferometric signal is studied.

  15. A New Methodology for Systematic Exploitation of Technology Databases.

    ERIC Educational Resources Information Center

    Bedecarrax, Chantal; Huot, Charles

    1994-01-01

    Presents the theoretical aspects of a data analysis methodology that can help transform sequential raw data from a database into useful information, using the statistical analysis of patents as an example. Topics discussed include relational analysis and a technology watch approach. (Contains 17 references.) (LRW)

  16. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    PubMed

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g. a few hundreds or thousands). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  17. Association between Stereotactic Radiotherapy and Death from Brain Metastases of Epithelial Ovarian Cancer: a Gliwice Data Re-Analysis with Penalization

    PubMed

    Tukiendorf, Andrzej; Mansournia, Mohammad Ali; Wydmański, Jerzy; Wolny-Rokicka, Edyta

    2017-04-01

    Background: Clinical datasets for epithelial ovarian cancer brain metastatic patients are usually small in size. When adequate case numbers are lacking, resulting estimates of regression coefficients may demonstrate bias. One of the direct approaches to reduce such sparse-data bias is based on penalized estimation. Methods: A re-analysis of formerly reported hazard ratios in diagnosed patients was performed using penalized Cox regression with a popular SAS package, providing additional software code for the statistical computational procedure. Results: It was found that the penalized approach can readily diminish sparse-data artefacts and radically reduce the magnitude of estimated regression coefficients. Conclusions: It was confirmed that classical statistical approaches may exaggerate regression estimates or distort study interpretations and conclusions. The results support the thesis that penalization via weak informative priors and data augmentation are the safest approaches to shrink sparse-data artefacts frequently occurring in epidemiological research.
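
    A ridge-type penalized Cox fit in the same spirit can be sketched in Python, assuming the lifelines package (CoxPHFitter with a penalizer) rather than the SAS procedure used in the paper, and simulated sparse data:

        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter

        rng = np.random.default_rng(10)
        n = 80                                               # deliberately small sample
        df = pd.DataFrame({"age": rng.normal(60, 8, n),
                           "treated": rng.integers(0, 2, n)})
        hazard = np.exp(0.03 * (df["age"] - 60) + 0.5 * df["treated"])
        df["time"] = rng.exponential(1.0 / hazard)
        df["event"] = rng.integers(0, 2, n)                  # heavy censoring

        # The penalty shrinks coefficients toward zero, taming sparse-data artefacts.
        cph = CoxPHFitter(penalizer=0.5)
        cph.fit(df, duration_col="time", event_col="event")
        print(cph.params_)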

  18. A Monte Carlo–Based Bayesian Approach for Measuring Agreement in a Qualitative Scale

    PubMed Central

    Pérez Sánchez, Carlos Javier

    2014-01-01

    Agreement analysis has been an active research area whose techniques have been widely applied in psychology and other fields. However, statistical agreement among raters has been mainly considered from a classical statistics point of view. Bayesian methodology is a viable alternative that allows the inclusion of subjective initial information coming from expert opinions, personal judgments, or historical data. A Bayesian approach is proposed by providing a unified Monte Carlo–based framework to estimate all types of measures of agreement in a qualitative scale of response. The approach is conceptually simple and has a low computational cost. Both informative and non-informative scenarios are considered. In case no initial information is available, the results are in line with the classical methodology, but provide more information on the measures of agreement. For the informative case, some guidelines are presented to elicit the prior distribution. The approach has been applied to two applications related to schizophrenia diagnosis and sensory analysis. PMID:29881002
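
    The Monte Carlo idea can be sketched in Python for one agreement measure (Cohen's kappa), assuming a flat Dirichlet prior over the cell probabilities of an invented 3x3 rating table; the paper's unified framework covers more measures than this:

        import numpy as np

        counts = np.array([[30, 4, 1],
                           [5, 25, 3],
                           [2, 3, 27]], dtype=float)               # two raters, three categories

        rng = np.random.default_rng(4)
        draws = rng.dirichlet(counts.ravel() + 1.0, size=20000).reshape(-1, 3, 3)

        po = draws.diagonal(axis1=1, axis2=2).sum(axis=1)          # observed agreement
        pe = (draws.sum(axis=2) * draws.sum(axis=1)).sum(axis=1)   # chance agreement
        kappa = (po - pe) / (1.0 - pe)                             # Cohen's kappa per draw

        print("posterior mean kappa:", kappa.mean())
        print("95% credible interval:", np.quantile(kappa, [0.025, 0.975]))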

  19. Phase Space Dissimilarity Measures for Structural Health Monitoring

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bubacz, Jacob A; Chmielewski, Hana T; Pape, Alexander E

    A novel method for structural health monitoring (SHM), known as the Phase Space Dissimilarity Measures (PSDM) approach, is proposed and developed. The patented PSDM approach has already been developed and demonstrated for a variety of equipment and biomedical applications. Here, we investigate SHM of bridges via analysis of time serial accelerometer measurements. This work has four aspects. The first is algorithm scalability, which was found to scale linearly from one processing core to four cores. Second, the same data are analyzed to determine how the use of the PSDM approach affects sensor placement. We found that a relatively low-density placement sufficiently captures the dynamics of the structure. Third, the same data are analyzed by unique combinations of accelerometer axes (vertical, longitudinal, and lateral with respect to the bridge) to determine how the choice of axes affects the analysis. The vertical axis is found to provide satisfactory SHM data. Fourth, statistical methods were investigated to validate the PSDM approach for this application, yielding statistically significant results.

  20. Association analysis of multiple traits by an approach of combining P values.

    PubMed

    Chen, Lili; Wang, Yong; Zhou, Yajing

    2018-03-01

    Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase the statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of these methods are usually suitable for detecting common variants associated with multiple traits. However, because of the low minor allele frequencies of rare variants, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for a single trait to test association between multiple traits and rare variants in a given region. For a given region, we use a reverse regression model to test each rare variant associated with multiple traits and obtain the P value of the single-variant test. Further, we take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and the different directions of effects of causal variants.

  1. Effect Size as the Essential Statistic in Developing Methods for mTBI Diagnosis.

    PubMed

    Gibson, Douglas Brandt

    2015-01-01

    The descriptive statistic known as "effect size" measures the distinguishability of two sets of data. Distinguishability is at the core of diagnosis. This article is intended to point out the importance of effect size in the development of effective diagnostics for mild traumatic brain injury and the applicability of the effect size statistic in comparing diagnostic efficiency across the main proposed TBI diagnostic methods: psychological, physiological, biochemical, and radiologic. Comparing diagnostic approaches is difficult because different researchers in different fields have different approaches to measuring efficacy. Converting diverse measures to effect sizes, as is done in meta-analysis, is a relatively easy way to make studies comparable.
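
    For concreteness, the effect size (Cohen's d) for two simulated diagnostic groups can be computed in a few lines of Python; the biomarker values are invented:

        import numpy as np

        def cohens_d(a, b):
            """Standardized mean difference using the pooled standard deviation."""
            a, b = np.asarray(a, float), np.asarray(b, float)
            pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) \
                         / (len(a) + len(b) - 2)
            return (a.mean() - b.mean()) / np.sqrt(pooled_var)

        # e.g. a candidate biomarker measured in mTBI patients vs. controls (simulated)
        rng = np.random.default_rng(7)
        print(cohens_d(rng.normal(1.2, 1.0, 40), rng.normal(0.7, 1.0, 40)))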

  2. A stochastic approach to uncertainty quantification in residual moveout analysis

    NASA Astrophysics Data System (ADS)

    Johng-Ay, T.; Landa, E.; Dossou-Gbété, S.; Bordes, L.

    2015-06-01

    Oil and gas exploration and production usually relies on the interpretation of a single seismic image, which is obtained from observed data. However, the statistical nature of seismic data and the various approximations and assumptions are sources of uncertainties which may corrupt the evaluation of parameters. The quantification of these uncertainties is a major issue which is meant to support decisions that have important social and commercial implications. The residual moveout analysis, which is an important step in seismic data processing, is usually performed by a deterministic approach. In this paper we discuss a Bayesian approach to the uncertainty analysis.

  3. Metaplot: a novel stata graph for assessing heterogeneity at a glance.

    PubMed

    Poorolajal, J; Mahmoodi, M; Majdzadeh, R; Fotouhi, A

    2010-01-01

    Heterogeneity is usually a major concern in meta-analysis. Although there are some statistical approaches for assessing variability across studies, here we present a new approach to heterogeneity using "MetaPlot" that investigates the influence of a single study on the overall heterogeneity. MetaPlot is a two-way (x, y) graph, which can be considered as a complementary graphical approach for testing heterogeneity. This method shows graphically as well as numerically the results of an influence analysis, in which Higgins' I² statistic with 95% confidence intervals (CI) are computed omitting one study in each turn and then plotted against the reciprocal of the standard error (1/SE), or "precision". In this graph, "1/SE" lies on the x-axis and the "I² results" lie on the y-axis. Having a first glance at MetaPlot, one can predict to what extent omission of a single study may influence the overall heterogeneity. The precision on the x-axis enables us to distinguish the size of each trial. The graph describes the I² statistic with 95% CI graphically as well as numerically in one view for prompt comparison. It is possible to implement MetaPlot for meta-analysis of different types of outcome data and summary measures. This method presents a simple graphical approach to identify an outlier and its effect on overall heterogeneity at a glance. We wish to suggest MetaPlot to Stata experts to prepare its module for the software.

  4. Metaplot: A Novel Stata Graph for Assessing Heterogeneity at a Glance

    PubMed Central

    Poorolajal, J; Mahmoodi, M; Majdzadeh, R; Fotouhi, A

    2010-01-01

    Background: Heterogeneity is usually a major concern in meta-analysis. Although there are some statistical approaches for assessing variability across studies, here we present a new approach to heterogeneity using “MetaPlot” that investigates the influence of a single study on the overall heterogeneity. Methods: MetaPlot is a two-way (x, y) graph, which can be considered as a complementary graphical approach for testing heterogeneity. This method shows graphically as well as numerically the results of an influence analysis, in which Higgins’ I² statistic with 95% confidence intervals (CI) are computed omitting one study in each turn and then plotted against the reciprocal of the standard error (1/SE), or “precision”. In this graph, “1/SE” lies on the x-axis and the “I² results” lie on the y-axis. Results: Having a first glance at MetaPlot, one can predict to what extent omission of a single study may influence the overall heterogeneity. The precision on the x-axis enables us to distinguish the size of each trial. The graph describes the I² statistic with 95% CI graphically as well as numerically in one view for prompt comparison. It is possible to implement MetaPlot for meta-analysis of different types of outcome data and summary measures. Conclusion: This method presents a simple graphical approach to identify an outlier and its effect on overall heterogeneity at a glance. We wish to suggest MetaPlot to Stata experts to prepare its module for the software. PMID:23113013
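
    The display described above can be roughly reproduced in Python: leave-one-out I² values are plotted against the precision of the omitted study, using invented study effects and a plain matplotlib scatter rather than the Stata module the authors propose:

        import numpy as np
        import matplotlib.pyplot as plt

        def i_squared(effects, variances):
            """Higgins' I^2 heterogeneity statistic, in percent."""
            y, v = np.asarray(effects, float), np.asarray(variances, float)
            w = 1.0 / v
            q = np.sum(w * (y - np.sum(w * y) / np.sum(w)) ** 2)
            return max(0.0, 100.0 * (q - (len(y) - 1)) / q) if q > 0 else 0.0

        effects = np.array([0.30, 0.10, 0.55, 0.20, 1.10, 0.25])   # invented study effects
        se = np.array([0.12, 0.15, 0.20, 0.10, 0.25, 0.18])

        loo_i2 = [i_squared(np.delete(effects, i), np.delete(se, i) ** 2)
                  for i in range(len(effects))]                    # I^2 omitting each study in turn
        plt.scatter(1.0 / se, loo_i2)
        plt.xlabel("precision of omitted study (1/SE)")
        plt.ylabel("I-squared after omission (%)")
        plt.show()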

  5. A critique of the usefulness of inferential statistics in applied behavior analysis

    PubMed Central

    Hopkins, B. L.; Cole, Brian L.; Mason, Tina L.

    1998-01-01

    Researchers continue to recommend that applied behavior analysts use inferential statistics in making decisions about effects of independent variables on dependent variables. In many other approaches to behavioral science, inferential statistics are the primary means for deciding the importance of effects. Several possible uses of inferential statistics are considered. Rather than being an objective means for making decisions about effects, as is often claimed, inferential statistics are shown to be subjective. It is argued that the use of inferential statistics adds nothing to the complex and admittedly subjective nonstatistical methods that are often employed in applied behavior analysis. Attacks on inferential statistics that are being made, perhaps with increasing frequency, by those who are not behavior analysts, are discussed. These attackers are calling for banning the use of inferential statistics in research publications and commonly recommend that behavioral scientists should switch to using statistics aimed at interval estimation or the method of confidence intervals. Interval estimation is shown to be contrary to the fundamental assumption of behavior analysis that only individuals behave. It is recommended that authors who wish to publish the results of inferential statistics be asked to justify them as a means for helping us to identify any ways in which they may be useful. PMID:22478304

  6. Simplified Approach to Predicting Rough Surface Transition

    NASA Technical Reports Server (NTRS)

    Boyle, Robert J.; Stripf, Matthias

    2009-01-01

    Turbine vane heat transfer predictions are given for smooth and rough vanes where the experimental data show transition moving forward on the vane as the surface roughness physical height increases. Consistent with smooth vane heat transfer, the transition moves forward for a fixed roughness height as the Reynolds number increases. Comparisons are presented with published experimental data. Some of the data are for a regular roughness geometry with a range of roughness heights, Reynolds numbers, and inlet turbulence intensities. The approach taken in this analysis is to treat the roughness in a statistical sense, consistent with what would be obtained from blades measured after exposure to actual engine environments. An approach is given to determine the equivalent sand grain roughness from the statistics of the regular geometry. This approach is guided by the experimental data. A roughness transition criterion is developed, and comparisons are made with experimental data over the entire range of experimental test conditions. Additional comparisons are made with experimental heat transfer data, where the roughness geometries are both regular as well as statistical. Using the developed analysis, heat transfer calculations are presented for the second stage vane of a high pressure turbine at hypothetical engine conditions.

  7. Developing a Campaign Plan to Target Centers of Gravity Within Economic Systems

    DTIC Science & Technology

    1995-05-01

    This project furthers the original statistical effort and adds to this a campaign planning approach (including both systems and operational level

  8. A combined pre-clinical meta-analysis and randomized confirmatory trial approach to improve data validity for therapeutic target validation.

    PubMed

    Kleikers, Pamela W M; Hooijmans, Carlijn; Göb, Eva; Langhauser, Friederike; Rewell, Sarah S J; Radermacher, Kim; Ritskes-Hoitinga, Merel; Howells, David W; Kleinschnitz, Christoph; Schmidt, Harald H H W

    2015-08-27

    Biomedical research suffers from dramatically poor translational success. For example, in ischemic stroke, a condition with a high medical need, over a thousand experimental drug targets were unsuccessful. Here, we adopt methods from clinical research for a late-stage pre-clinical meta-analysis (MA) and randomized confirmatory trial (pRCT) approach. A profound body of literature suggests NOX2 to be a major therapeutic target in stroke. Systematic review and MA of all available NOX2(-/y) studies revealed a positive publication bias and lack of statistical power to detect a relevant reduction in infarct size. A fully powered multi-center pRCT rejects NOX2 as a target to improve neurofunctional outcomes or achieve a translationally relevant infarct size reduction. Thus stringent statistical thresholds, reporting negative data and a MA-pRCT approach can ensure biomedical data validity and overcome risks of bias.

  9. Statistical mapping of zones of focused groundwater/surface-water exchange using fiber-optic distributed temperature sensing

    USGS Publications Warehouse

    Mwakanyamale, Kisa; Day-Lewis, Frederick D.; Slater, Lee D.

    2013-01-01

    Fiber-optic distributed temperature sensing (FO-DTS) increasingly is used to map zones of focused groundwater/surface-water exchange (GWSWE). Previous studies of GWSWE using FO-DTS involved identification of zones of focused GWSWE based on arbitrary cutoffs of FO-DTS time-series statistics (e.g., variance, cross-correlation between temperature and stage, or spectral power). New approaches are needed to extract more quantitative information from large, complex FO-DTS data sets while concurrently providing an assessment of uncertainty associated with mapping zones of focused GWSWE. Toward this end, we present a strategy combining discriminant analysis (DA) and spectral analysis (SA). We demonstrate the approach using field experimental data from a reach of the Columbia River adjacent to the Hanford 300 Area site. Results of the combined SA/DA approach are shown to be superior to previous results from qualitative interpretation of FO-DTS spectra alone.

  10. A comparative study of two statistical approaches for the analysis of real seismicity sequences and synthetic seismicity generated by a stick-slip experimental model

    NASA Astrophysics Data System (ADS)

    Flores-Marquez, Leticia Elsa; Ramirez Rojaz, Alejandro; Telesca, Luciano

    2015-04-01

    Two statistical approaches are analyzed for two different types of data sets: one is the seismicity generated by the subduction processes that occurred at the south Pacific coast of Mexico between 2005 and 2012, and the other corresponds to synthetic seismic data generated by a stick-slip experimental model. The statistical methods used for the present study are the visibility graph, to investigate the time dynamics of the series, and the scaled probability density function in the natural time domain, to investigate the critical order of the system. The purpose of this comparison is to show the similarities between the dynamical behaviors of both types of data sets from the point of view of critical systems. The observed behaviors allow us to conclude that the experimental setup globally reproduces the behavior observed in the statistical approaches used to analyze the seismicity of the subduction zone. The present study was supported by the Bilateral Project Italy-Mexico "Experimental stick-slip models of tectonic faults: innovative statistical approaches applied to synthetic seismic sequences", jointly funded by MAECI (Italy) and AMEXCID (Mexico) in the framework of the Bilateral Agreement for Scientific and Technological Cooperation PE 2014-2016.
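
    The natural visibility graph used in the first of these approaches can be sketched in Python on an invented magnitude-like series (the real catalogs and the natural-time analysis are not reproduced here):

        import numpy as np

        def visibility_edges(y):
            """Natural visibility graph: nodes a and b are linked when every intermediate
            sample lies strictly below the straight line joining (a, y[a]) and (b, y[b])."""
            n = len(y)
            edges = []
            for a in range(n):
                for b in range(a + 1, n):
                    if all(y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                           for c in range(a + 1, b)):
                        edges.append((a, b))
            return edges

        rng = np.random.default_rng(5)
        magnitudes = rng.exponential(scale=1.0, size=50) + 3.0     # toy "seismicity-like" series
        edges = visibility_edges(magnitudes)
        degree = np.bincount(np.array(edges).ravel(), minlength=len(magnitudes))
        print("mean degree of the visibility graph:", degree.mean())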

  11. Two approaches to incorporate clinical data uncertainty into multiple criteria decision analysis for benefit-risk assessment of medicinal products.

    PubMed

    Wen, Shihua; Zhang, Lanju; Yang, Bo

    2014-07-01

    The Problem formulation, Objectives, Alternatives, Consequences, Trade-offs, Uncertainties, Risk attitude, and Linked decisions (PrOACT-URL) framework and multiple criteria decision analysis (MCDA) have been recommended by the European Medicines Agency for structured benefit-risk assessment of medicinal products undergoing regulatory review. The objective of this article was to provide solutions to incorporate the uncertainty from clinical data into the MCDA model when evaluating the overall benefit-risk profiles among different treatment options. Two statistical approaches, the δ-method approach and the Monte-Carlo approach, were proposed to construct the confidence interval of the overall benefit-risk score from the MCDA model as well as other probabilistic measures for comparing the benefit-risk profiles between treatment options. Both approaches can incorporate the correlation structure between clinical parameters (criteria) in the MCDA model and are straightforward to implement. The two proposed approaches were applied to a case study to evaluate the benefit-risk profile of an add-on therapy for rheumatoid arthritis (drug X) relative to placebo. It demonstrated a straightforward way to quantify the impact of the uncertainty from clinical data on the benefit-risk assessment and enabled statistical inference on evaluating the overall benefit-risk profiles among different treatment options. The δ-method approach provides a closed form to quantify the variability of the overall benefit-risk score in the MCDA model, whereas the Monte-Carlo approach is more computationally intensive but can yield its true sampling distribution for statistical inference. The obtained confidence intervals and other probabilistic measures from the two approaches enhance the benefit-risk decision making of medicinal products. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
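
    The Monte-Carlo approach can be sketched in Python for a toy MCDA model with invented criterion weights, clinical estimates, and standard errors (criteria are treated as independent here, unlike the correlated case the paper handles):

        import numpy as np

        weights = np.array([0.5, 0.3, 0.2])                  # preference weights (invented)
        drug_est, drug_se = np.array([0.70, 0.55, 0.40]), np.array([0.05, 0.06, 0.04])
        plac_est, plac_se = np.array([0.50, 0.50, 0.60]), np.array([0.05, 0.05, 0.04])

        rng = np.random.default_rng(6)
        n_draws = 100_000
        drug_scores = rng.normal(drug_est, drug_se, size=(n_draws, 3)) @ weights
        plac_scores = rng.normal(plac_est, plac_se, size=(n_draws, 3)) @ weights

        diff = drug_scores - plac_scores                     # overall benefit-risk score difference
        print("mean difference:", diff.mean())
        print("95% interval:", np.quantile(diff, [0.025, 0.975]))
        print("P(drug scores higher):", (diff > 0).mean())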

  12. Efficiency Analysis of Public Universities in Thailand

    ERIC Educational Resources Information Center

    Kantabutra, Saranya; Tang, John C. S.

    2010-01-01

    This paper examines the performance of Thai public universities in terms of efficiency, using a non-parametric approach called data envelopment analysis. Two efficiency models, the teaching efficiency model and the research efficiency model, are developed and the analysis is conducted at the faculty level. Further statistical analyses are also…

  13. Markov Logic Networks in the Analysis of Genetic Data

    PubMed Central

    Sakhanenko, Nikita A.

    2010-01-01

    Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249

  14. Multi-resolutional shape features via non-Euclidean wavelets: Applications to statistical analysis of cortical thickness

    PubMed Central

    Kim, Won Hwa; Singh, Vikas; Chung, Moo K.; Hinrichs, Chris; Pachauri, Deepti; Okonkwo, Ozioma C.; Johnson, Sterling C.

    2014-01-01

    Statistical analysis on arbitrary surface meshes such as the cortical surface is an important approach to understanding brain diseases such as Alzheimer’s disease (AD). Surface analysis may be able to identify specific cortical patterns that relate to certain disease characteristics or exhibit differences between groups. Our goal in this paper is to make group analysis of signals on surfaces more sensitive. To do this, we derive multi-scale shape descriptors that characterize the signal around each mesh vertex, i.e., its local context, at varying levels of resolution. In order to define such a shape descriptor, we make use of recent results from harmonic analysis that extend traditional continuous wavelet theory from the Euclidean to a non-Euclidean setting (i.e., a graph, mesh or network). Using this descriptor, we conduct experiments on two different datasets, the Alzheimer’s Disease NeuroImaging Initiative (ADNI) data and images acquired at the Wisconsin Alzheimer’s Disease Research Center (W-ADRC), focusing on individuals labeled as having Alzheimer’s disease (AD), mild cognitive impairment (MCI) and healthy controls. In particular, we contrast traditional univariate methods with our multi-resolution approach, which shows increased sensitivity and improved statistical power to detect group-level effects. We also provide an open source implementation. PMID:24614060

  15. Effect of the absolute statistic on gene-sampling gene-set analysis methods.

    PubMed

    Nam, Dougu

    2017-06-01

    Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of the absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated in terms of power, false-positive rate, and receiver operating characteristic curves for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.

  16. The role of empirical Bayes methodology as a leading principle in modern medical statistics.

    PubMed

    van Houwelingen, Hans C

    2014-11-01

    This paper reviews and discusses the role of Empirical Bayes methodology in medical statistics in the last 50 years. It gives some background on the origin of the empirical Bayes approach and its link with the famous Stein estimator. The paper describes the application in four important areas in medical statistics: disease mapping, health care monitoring, meta-analysis, and multiple testing. It ends with a warning that the application of the outcome of an empirical Bayes analysis to the individual "subjects" is a delicate matter that should be handled with prudence and care. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
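
    The shrinkage idea at the heart of empirical Bayes can be sketched in Python with a simple normal-normal, method-of-moments estimator applied to invented small-area rates; this is illustrative rather than any specific procedure reviewed in the paper:

        import numpy as np

        def empirical_bayes_shrinkage(estimates, variances):
            """Shrink unit-level estimates toward the overall mean, with the prior
            variance estimated crudely from the data (method of moments)."""
            y = np.asarray(estimates, float)
            v = np.asarray(variances, float)
            grand_mean = np.average(y, weights=1.0 / v)
            tau2 = max(0.0, np.var(y, ddof=1) - v.mean())     # rough prior-variance estimate
            shrink = tau2 / (tau2 + v)                        # weight on each unit's own data
            return grand_mean + shrink * (y - grand_mean)

        # e.g. observed standardized mortality ratios for small areas (disease mapping)
        smr = np.array([0.6, 1.8, 1.1, 0.9, 2.5, 0.7])
        var = np.array([0.20, 0.30, 0.10, 0.15, 0.40, 0.25])
        print(empirical_bayes_shrinkage(smr, var))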

  17. Treatment of Selective Mutism: A Best-Evidence Synthesis.

    ERIC Educational Resources Information Center

    Stone, Beth Pionek; Kratochwill, Thomas R.; Sladezcek, Ingrid; Serlin, Ronald C.

    2002-01-01

    Presents systematic analysis of the major treatment approaches used for selective mutism. Based on nonparametric statistical tests of effect sizes, major findings include the following: treatment of selective mutism is more effective than no treatment; behaviorally oriented treatment approaches are more effective than no treatment; and no…

  18. Semistochastic approach to many electron systems

    NASA Astrophysics Data System (ADS)

    Grossjean, M. K.; Grossjean, M. F.; Schulten, K.; Tavan, P.

    1992-08-01

    A Pariser-Parr-Pople (PPP) Hamiltonian of an 8π electron system of the molecule octatetraene, represented in a configuration-interaction basis (CI basis), is analyzed with respect to the statistical properties of its matrix elements. Based on this analysis we develop an effective Hamiltonian, which represents virtual excitations by a Gaussian orthogonal ensemble (GOE). We also examine numerical approaches which replace the original Hamiltonian by a semistochastically generated CI matrix. In that CI matrix, the matrix elements of high energy excitations are chosen randomly according to distributions reflecting the statistics of the original CI matrix.
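
    Drawing a GOE block of the kind used to model the high-energy excitations can be sketched in Python with NumPy; the dimension and overall scale are illustrative:

        import numpy as np

        rng = np.random.default_rng(9)
        n = 200
        a = rng.normal(size=(n, n))
        goe = (a + a.T) / np.sqrt(2.0)      # real symmetric; diagonal variance 2, off-diagonal 1 (GOE)

        eigenvalues = np.linalg.eigvalsh(goe)
        # After scaling by sqrt(n), the spectrum approaches Wigner's semicircle law for large n.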

  19. An efficient approach to identify different chemical markers between fibrous root and rhizome of Anemarrhena asphodeloides by ultra high-performance liquid chromatography quadrupole time-of-flight tandem mass spectrometry with multivariate statistical analysis.

    PubMed

    Wang, Fang-Xu; Yuan, Jian-Chao; Kang, Li-Ping; Pang, Xu; Yan, Ren-Yi; Zhao, Yang; Zhang, Jie; Sun, Xin-Guang; Ma, Bai-Ping

    2016-09-10

    An ultra high-performance liquid chromatography quadrupole time-of-flight tandem mass spectrometry approach coupled with multivariate statistical analysis was established and applied to rapidly distinguish the chemical differences between the fibrous root and rhizome of Anemarrhena asphodeloides. The datasets of tR-m/z pairs, ion intensity and sample code were processed by principal component analysis and orthogonal partial least squares discriminant analysis. Chemical markers could be identified based on their exact mass data, fragmentation characteristics, and retention times. New compounds among the chemical markers could then be isolated rapidly, guided by the ultra high-performance liquid chromatography quadrupole time-of-flight tandem mass spectrometry data, and their definitive structures further elucidated by NMR spectra. Using this approach, twenty-four markers were identified online, including nine new saponins; five of these new steroidal saponins were obtained in pure form. The study validated the proposed approach as a suitable method for identifying the chemical differences between various medicinal parts, in order to expand the usable medicinal parts and increase the utilization rate of resources. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Nonequilibrium Statistical Operator Method and Generalized Kinetic Equations

    NASA Astrophysics Data System (ADS)

    Kuzemsky, A. L.

    2018-01-01

    We consider some principal problems of nonequilibrium statistical thermodynamics in the framework of the Zubarev nonequilibrium statistical operator approach. We present a brief comparative analysis of some approaches to describing irreversible processes based on the concept of nonequilibrium Gibbs ensembles and their applicability to describing nonequilibrium processes. We discuss the derivation of generalized kinetic equations for a system in a heat bath. We obtain and analyze a damped Schrödinger-type equation for a dynamical system in a heat bath. We study the dynamical behavior of a particle in a medium taking the dissipation effects into account. We consider the scattering problem for neutrons in a nonequilibrium medium and derive a generalized Van Hove formula. We show that the nonequilibrium statistical operator method is an effective, convenient tool for describing irreversible processes in condensed matter.

  1. Statistical parsimony networks and species assemblages in Cephalotrichid nemerteans (nemertea).

    PubMed

    Chen, Haixia; Strand, Malin; Norenburg, Jon L; Sun, Shichun; Kajihara, Hiroshi; Chernyshev, Alexey V; Maslakova, Svetlana A; Sundberg, Per

    2010-09-21

    It has been suggested that statistical parsimony network analysis could be used to get an indication of the species represented in a set of nucleotide data, and the approach has been used to discuss species boundaries in some taxa. Based on 635 base pairs of the mitochondrial protein-coding gene cytochrome c oxidase I (COI), we analyzed 152 nemertean specimens using statistical parsimony network analysis with the connection probability set to 95%. The analysis revealed 15 distinct networks together with seven singletons. Statistical parsimony yielded three networks supporting the species status of Cephalothrix rufifrons, C. major and C. spiralis as they are currently delineated by morphological characters and geographical location. Many other networks contained haplotypes from nearby geographical locations. Cladistic structure from maximum likelihood analysis overall supported the network analysis, but indicated a false positive result where subnetworks should have been connected into one network/species. This is probably caused by undersampling of the intraspecific haplotype diversity. Statistical parsimony network analysis provides a rapid and useful tool for detecting possible undescribed/cryptic species among cephalotrichid nemerteans based on the COI gene. It should be combined with phylogenetic analysis to get indications of false positive results, i.e., subnetworks that would have been connected with more extensive haplotype sampling.

  2. The use of imputed sibling genotypes in sibship-based association analysis: on modeling alternatives, power and model misspecification.

    PubMed

    Minică, Camelia C; Dolan, Conor V; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M; Boomsma, Dorret I

    2013-05-01

    When phenotypic, but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable to model imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes, and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, the size of the phenotypic correlations among siblings, imputation accuracy and the minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets with continuous phenotype data (height) and with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion of imputed sibling genotypes in association analysis does not always result in a larger test statistic. The actual test statistic may drop in value due to small effect sizes: if the power benefit is small, that is, if the change in the distribution of the test statistic under the alternative is relatively small, the probability of obtaining a smaller test statistic is greater. As genetic effects are typically hypothesized to be small, in practice the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by consideration of the background correlation.

  3. A hybrid correlation analysis with application to imaging genetics

    NASA Astrophysics Data System (ADS)

    Hu, Wenxing; Fang, Jian; Calhoun, Vince D.; Wang, Yu-Ping

    2018-03-01

    Investigating the association between brain regions and genes continues to be a challenging topic in imaging genetics. Current brain region of interest (ROI)-gene association studies normally reduce data dimension by averaging the value of voxels in each ROI. This averaging may lead to a loss of information due to the existence of functional sub-regions. Pearson correlation is widely used for association analysis. However, it only detects linear correlation whereas nonlinear correlation may exist among ROIs. In this work, we introduced distance correlation to ROI-gene association analysis, which can detect both linear and nonlinear correlations and overcome the limitation of averaging operations by taking advantage of the information at each voxel. Nevertheless, distance correlation usually has a much lower value than Pearson correlation. To address this problem, we proposed a hybrid correlation analysis approach, by applying canonical correlation analysis (CCA) to the distance covariance matrix instead of directly computing distance correlation. Incorporating CCA into the distance correlation approach may be more suitable for complex disease studies because it can detect highly associated pairs of ROI and gene groups, and may improve the distance correlation level and statistical power. In addition, we developed a novel nonlinear CCA, called distance kernel CCA, which seeks the optimal combination of features with the most significant dependence. This approach was applied to imaging genetic data from the Philadelphia Neurodevelopmental Cohort (PNC). Experiments showed that our hybrid approach produced more consistent results than conventional CCA across resampling, and both the correlation and the statistical significance were increased compared to distance correlation analysis. Further gene enrichment analysis and region of interest (ROI) analysis confirmed the associations of the identified genes with brain ROIs. Therefore, our approach provides a powerful tool for finding the correlation between brain imaging and genomic data.
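
    A minimal sketch (with simulated data, not the PNC data) of sample distance correlation, the quantity contrasted with Pearson correlation in this record: it detects nonlinear dependence that Pearson correlation misses.

    ```python
    # Sketch: sample distance correlation via doubly centered distance matrices.
    import numpy as np

    def distance_correlation(x, y):
        x = x.reshape(-1, 1) if x.ndim == 1 else x
        y = y.reshape(-1, 1) if y.ndim == 1 else y
        def doubly_centered(z):
            d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))   # pairwise distances
            return d - d.mean(0) - d.mean(1)[:, None] + d.mean()          # double centering
        a, b = doubly_centered(x), doubly_centered(y)
        dcov2 = (a * b).mean()
        return np.sqrt(dcov2 / np.sqrt((a * a).mean() * (b * b).mean()))

    rng = np.random.default_rng(2)
    x = rng.normal(size=500)
    y = x ** 2 + 0.1 * rng.normal(size=500)                    # purely nonlinear relationship
    print("Pearson r:     ", round(np.corrcoef(x, y)[0, 1], 3))        # near zero
    print("distance corr.:", round(distance_correlation(x, y), 3))     # clearly positive
    ```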

  4. Single-case research design in pediatric psychology: considerations regarding data analysis.

    PubMed

    Cohen, Lindsey L; Feinstein, Amanda; Masuda, Akihiko; Vowles, Kevin E

    2014-03-01

    Single-case research allows for an examination of behavior and can demonstrate the functional relation between intervention and outcome in pediatric psychology. This review highlights key assumptions, methodological and design considerations, and options for data analysis. Single-case methodology and guidelines are reviewed with an in-depth focus on visual and statistical analyses. Guidelines allow for the careful evaluation of design quality and visual analysis. A number of statistical techniques have been introduced to supplement visual analysis, but to date, there is no consensus on their recommended use in single-case research design. Single-case methodology is invaluable for advancing pediatric psychology science and practice, and guidelines have been introduced to enhance the consistency, validity, and reliability of these studies. Experts generally agree that visual inspection is the optimal method of analysis in single-case design; however, statistical approaches are becoming increasingly evaluated and used to augment data interpretation.

  5. Spatiotemporal Analysis of the Ebola Hemorrhagic Fever in West Africa in 2014

    NASA Astrophysics Data System (ADS)

    Xu, M.; Cao, C. X.; Guo, H. F.

    2017-09-01

    Ebola hemorrhagic fever (EHF) is an acute hemorrhagic disease caused by the Ebola virus, which is highly contagious. This paper aimed to explore the possible gathering areas of EHF cases in West Africa in 2014, and to identify endemic areas and their tendency by means of time-space analysis. We mapped the distribution of EHF incidences and explored statistically significant spatial, temporal and space-time disease clusters. We utilized hotspot analysis to find the spatial clustering pattern on the basis of the actual outbreak cases. Spatial-temporal cluster analysis was used to analyze the spatial and temporal clustering of the disease and to examine whether its distribution is statistically significant. Local clusters were investigated using Kulldorff's scan statistic approach. The results reveal that the epidemic mainly gathered in the western part of Africa near the North Atlantic, with an obvious regional distribution. For the current epidemic, we found areas with a high incidence of EVD by means of spatial cluster analysis.

  6. A critique of Rasch residual fit statistics.

    PubMed

    Karabatsos, G

    2000-01-01

    In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that there is far more emphasis placed on the objective measurement of persons and items than on the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
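
    A minimal sketch of the residual-based Rasch fit statistics the paper critiques, computed on simulated responses; for brevity the true simulated person and item parameters are used in place of the estimates a real analysis would use.

    ```python
    # Sketch: item outfit (unweighted mean square) and infit (weighted mean square).
    import numpy as np

    rng = np.random.default_rng(3)
    n_persons, n_items = 500, 20
    theta = rng.normal(size=n_persons)                        # person abilities
    b = rng.normal(size=n_items)                              # item difficulties

    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))  # Rasch probabilities
    x = rng.binomial(1, p)                                    # simulated 0/1 responses

    resid = x - p
    info = p * (1 - p)                                        # binomial information
    z2 = resid ** 2 / info                                    # squared standardized residuals

    outfit = z2.mean(axis=0)                                  # per-item unweighted mean square
    infit = (resid ** 2).sum(axis=0) / info.sum(axis=0)       # per-item weighted mean square
    print("item outfit:", np.round(outfit[:5], 2))
    print("item infit: ", np.round(infit[:5], 2))
    ```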

  7. Detection of Fatty Acids from Intact Microorganisms by Molecular Beam Static Secondary Ion Mass Spectrometry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ingram, Jani Cheri; Lehman, Richard Michael; Bauer, William Francis

    We report the use of a surface analysis approach, static secondary ion mass spectrometry (SIMS) equipped with a molecular (ReO4-) ion primary beam, to analyze the surface of intact microbial cells. SIMS spectra of 28 microorganisms were compared to fatty acid profiles determined by gas chromatographic analysis of transesterified fatty acids extracted from the same organisms. The results indicate that surface bombardment using the molecular primary beam cleaved the ester linkage characteristic of bacteria at the glycerophosphate backbone of the phospholipid components of the cell membrane. This cleavage enables direct detection of the fatty acid conjugate base of intact microorganisms by static SIMS. The limit of detection for this approach is approximately 10^7 bacterial cells/cm^2. Multivariate statistical methods were applied in a graded approach to the SIMS microbial data. The results showed that the full data set could initially be statistically grouped based upon major differences in biochemical composition of the cell wall. The gram-positive bacteria were further statistically analyzed, followed by final analysis of a specific bacterial genus that was successfully grouped by species. Additionally, the use of SIMS to detect microbes on mineral surfaces is demonstrated by an analysis of Shewanella oneidensis on crushed hematite. The results of this study provide evidence for the potential of static SIMS to rapidly detect bacterial species based on ion fragments originating from cell membrane lipids directly from sample surfaces.

  8. A novel bi-level meta-analysis approach: applied to biological pathway analysis.

    PubMed

    Nguyen, Tin; Tagett, Rebecca; Donato, Michele; Mitrea, Cristina; Draghici, Sorin

    2016-02-01

    The accumulation of high-throughput data in public repositories creates a pressing need for integrative analysis of multiple datasets from independent experiments. However, study heterogeneity, study bias, outliers and the lack of power of available methods present real challenges in integrating genomic data. One practical drawback of many P-value-based meta-analysis methods, including Fisher's, Stouffer's, minP and maxP, is that they are sensitive to outliers. Another drawback is that, because they perform just one statistical test for each individual experiment, they may not fully exploit the potentially large number of samples within each study. We propose a novel bi-level meta-analysis approach that employs the additive method and the Central Limit Theorem within each individual experiment and also across multiple experiments. We prove that the bi-level framework is robust against bias, less sensitive to outliers than other methods, and more sensitive to small changes in signal. For comparative analysis, we demonstrate that the intra-experiment analysis has more power than the equivalent statistical test performed on a single large experiment. For pathway analysis, we compare the proposed framework versus classical meta-analysis approaches (Fisher's, Stouffer's and the additive method) as well as against a dedicated pathway meta-analysis package (MetaPath), using 1252 samples from 21 datasets related to three human diseases, acute myeloid leukemia (9 datasets), type II diabetes (5 datasets) and Alzheimer's disease (7 datasets). Our framework outperforms its competitors in correctly identifying pathways relevant to the phenotypes. The framework is sufficiently general to be applied to any type of statistical meta-analysis. The R scripts are available on demand from the authors. sorin@wayne.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
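
    A minimal sketch of an additive p-value combination of the kind referred to in this record: under the null hypothesis, the sum of k independent uniform p-values is approximately normal with mean k/2 and variance k/12 by the Central Limit Theorem. The p-values below are arbitrary illustrative numbers.

    ```python
    # Sketch: additive (CLT-based) p-value combination, compared with Fisher's method.
    import numpy as np
    from scipy import stats

    def additive_combined_p(p_values):
        p = np.asarray(p_values, dtype=float)
        k = p.size
        z = (p.sum() - k / 2.0) / np.sqrt(k / 12.0)
        return stats.norm.cdf(z)          # small when the individual p-values are small

    pvals = [0.04, 0.10, 0.03, 0.20, 0.07]
    print("additive:", round(additive_combined_p(pvals), 4))
    # Fisher's method, for comparison, is more sensitive to a single extreme p-value.
    print("Fisher:  ", round(stats.combine_pvalues(pvals, method="fisher")[1], 4))
    ```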

  9. Event coincidence analysis for quantifying statistical interrelationships between event time series. On the role of flood events as triggers of epidemic outbreaks

    NASA Astrophysics Data System (ADS)

    Donges, J. F.; Schleussner, C.-F.; Siegmund, J. F.; Donner, R. V.

    2016-05-01

    Studying event time series is a powerful approach for analyzing the dynamics of complex dynamical systems in many fields of science. In this paper, we describe the method of event coincidence analysis to provide a framework for quantifying the strength, directionality and time lag of statistical interrelationships between event series. Event coincidence analysis allows one to formulate and test null hypotheses on the origin of the observed interrelationships, including tests based on Poisson processes or, more generally, stochastic point processes with a prescribed inter-event time distribution and other higher-order properties. Applying the framework to country-level observational data yields evidence that flood events have acted as triggers of epidemic outbreaks globally since the 1950s. Facing projected future changes in the statistics of climatic extreme events, statistical techniques such as event coincidence analysis will be relevant for investigating the impacts of anthropogenic climate change on human societies and ecosystems worldwide.
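
    A minimal sketch of a precursor coincidence analysis on synthetic event series (not the country-level data used in the paper): count A-events preceded within a window delta_t by a B-event and compare the count with a homogeneous-Poisson null model for the B series.

    ```python
    # Sketch: event coincidence rate and an analytical Poisson-based p-value.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    T, delta_t = 1000, 5                                         # study period, window (days)
    b_events = np.sort(rng.choice(T, size=30, replace=False))    # e.g. floods
    a_events = np.sort(np.unique(np.concatenate([
        b_events[:10] + rng.integers(1, 5, size=10),             # some triggered outbreaks
        rng.choice(T, size=20, replace=False),                   # plus background outbreaks
    ])))

    coincident = sum(
        np.any((a - b_events >= 0) & (a - b_events <= delta_t)) for a in a_events
    )
    rate = coincident / len(a_events)

    # Null: probability that a window of length delta_t contains at least one B-event.
    p_window = 1 - np.exp(-len(b_events) / T * delta_t)
    p_value = stats.binom.sf(coincident - 1, len(a_events), p_window)
    print(f"precursor coincidence rate = {rate:.2f}, analytical p = {p_value:.3g}")
    ```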

  10. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, real-time PCR based transgene copy number estimation tends to be ambiguous and subjective, stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite recent progress in the statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially diluted templates. The Ct numbers from a control transgenic event and a putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of the transgene was compared with that of the internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with the reference gene without a standard curve, but rather is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied to other real-time PCR-based quantification assays, including transfection efficiency analysis and pathogen quantification.
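
    A minimal sketch of the first design described here: an external calibration curve (Ct versus log10 template) fitted by simple linear regression and used to compare a putative transgenic event against a single-copy control. All Ct values below are made up for illustration.

    ```python
    # Sketch: standard-curve regression for real-time PCR copy number estimation.
    import numpy as np
    from scipy import stats

    log10_template = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # serial dilutions
    ct_values = np.array([33.1, 29.8, 26.4, 23.1, 19.7])     # measured Ct (illustrative)

    fit = stats.linregress(log10_template, ct_values)
    efficiency = 10 ** (-1.0 / fit.slope) - 1                 # amplification efficiency
    print(f"slope={fit.slope:.2f}, R^2={fit.rvalue ** 2:.3f}, efficiency={efficiency:.1%}")

    def ct_to_log_quantity(ct):
        return (ct - fit.intercept) / fit.slope

    ct_control, ct_putative = 25.0, 24.0                      # hypothetical Ct readings
    ratio = 10 ** (ct_to_log_quantity(ct_putative) - ct_to_log_quantity(ct_control))
    print(f"estimated copy ratio vs. single-copy control: {ratio:.2f}")
    ```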

  11. Big-data reflection high energy electron diffraction analysis for understanding epitaxial film growth processes.

    PubMed

    Vasudevan, Rama K; Tselev, Alexander; Baddorf, Arthur P; Kalinin, Sergei V

    2014-10-28

    Reflection high energy electron diffraction (RHEED) has by now become a standard tool for in situ monitoring of film growth by pulsed laser deposition and molecular beam epitaxy. Yet despite the widespread adoption and wealth of information in RHEED images, most applications are limited to observing intensity oscillations of the specular spot, and much additional information on growth is discarded. With ease of data acquisition and increased computation speeds, statistical methods to rapidly mine the data set are now feasible. Here, we develop such an approach to the analysis of the fundamental growth processes through multivariate statistical analysis of a RHEED image sequence. This approach is illustrated for growth of La(x)Ca(1-x)MnO(3) films grown on etched (001) SrTiO(3) substrates, but is universal. The multivariate methods including principal component analysis and k-means clustering provide insight into the relevant behaviors, the timing and nature of a disordered to ordered growth change, and highlight statistically significant patterns. Fourier analysis yields the harmonic components of the signal and allows separation of the relevant components and baselines, isolating the asymmetric nature of the step density function and the transmission spots from the imperfect layer-by-layer (LBL) growth. These studies show the promise of big data approaches to obtaining more insight into film properties during and after epitaxial film growth. Furthermore, these studies open the pathway to use forward prediction methods to potentially allow significantly more control over growth process and hence final film quality.
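
    A minimal sketch of the multivariate workflow named in this record, principal component analysis followed by k-means clustering on a (frames x pixels) matrix; a synthetic image sequence with a growth-mode change halfway through stands in for real RHEED data.

    ```python
    # Sketch: PCA + k-means on an image sequence to locate a growth-mode change.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(5)
    n_frames, n_pixels = 300, 32 * 32
    pattern_a = rng.normal(size=n_pixels)                 # "disordered growth" pattern
    pattern_b = rng.normal(size=n_pixels)                 # "ordered growth" pattern
    switch = (np.arange(n_frames) > 150).astype(float)    # change of growth mode
    frames = (np.outer(1 - switch, pattern_a) + np.outer(switch, pattern_b)
              + 0.3 * rng.normal(size=(n_frames, n_pixels)))

    pca = PCA(n_components=5)
    scores = pca.fit_transform(frames)                    # time traces of the components
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)

    print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))
    print("estimated change point near frame:", int(np.argmax(np.diff(labels) != 0)) + 1)
    ```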

  12. Confounding in statistical mediation analysis: What it is and how to address it.

    PubMed

    Valente, Matthew J; Pelham, William E; Smyth, Heather; MacKinnon, David P

    2017-11-01

    Psychology researchers are often interested in mechanisms underlying how randomized interventions affect outcomes such as substance use and mental health. Mediation analysis is a common statistical method for investigating psychological mechanisms that has benefited from exciting new methodological improvements over the last 2 decades. One of the most important new developments is methodology for estimating causal mediated effects using the potential outcomes framework for causal inference. Potential outcomes-based methods developed in epidemiology and statistics have important implications for understanding psychological mechanisms. We aim to provide a concise introduction to and illustration of these new methods and emphasize the importance of confounder adjustment. First, we review the traditional regression approach for estimating mediated effects. Second, we describe the potential outcomes framework. Third, we define what a confounder is and how the presence of a confounder can provide misleading evidence regarding mechanisms of interventions. Fourth, we describe experimental designs that can help rule out confounder bias. Fifth, we describe new statistical approaches to adjust for measured confounders of the mediator-outcome relation and sensitivity analyses to probe effects of unmeasured confounders on the mediated effect. All approaches are illustrated with application to a real counseling intervention dataset. Counseling psychologists interested in understanding the causal mechanisms of their interventions can benefit from incorporating the most up-to-date techniques into their mediation analyses. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
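
    A minimal sketch of the traditional product-of-coefficients approach to mediation reviewed here, with a measured mediator-outcome confounder included as a covariate; the data are simulated, not the counseling intervention dataset.

    ```python
    # Sketch: mediated effect a*b from two regressions, adjusting for a confounder.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 1000
    x = rng.binomial(1, 0.5, n)                              # randomized intervention
    c = rng.normal(size=n)                                   # measured confounder
    m = 0.5 * x + 0.8 * c + rng.normal(size=n)               # mediator
    y = 0.4 * m + 0.3 * x + 0.6 * c + rng.normal(size=n)     # outcome

    a_model = sm.OLS(m, sm.add_constant(np.column_stack([x, c]))).fit()      # a-path
    b_model = sm.OLS(y, sm.add_constant(np.column_stack([m, x, c]))).fit()   # b-path
    a, b = a_model.params[1], b_model.params[1]
    print(f"mediated effect a*b = {a * b:.3f} (generating value 0.5 * 0.4 = 0.20)")
    ```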

  13. Econophysical visualization of Adam Smith’s invisible hand

    NASA Astrophysics Data System (ADS)

    Cohen, Morrel H.; Eliazar, Iddo I.

    2013-02-01

    Consider a complex system whose macrostate is statistically observable, but whose operating mechanism is an unknown black box. In this paper we address the problem of inferring, from the system’s macrostate statistics, the system’s intrinsic force yielding the observed statistics. The inference is established via two diametrically opposite approaches which result in the very same intrinsic force: a top-down approach based on the notion of entropy, and a bottom-up approach based on the notion of Langevin dynamics. The general results established are applied to the problem of visualizing the intrinsic socioeconomic force, Adam Smith’s invisible hand, that shapes the distribution of wealth in human societies. Our analysis yields quantitative econophysical representations of figurative socioeconomic forces, quantitative definitions of “poor” and “rich”, and a quantitative characterization of the “poor-get-poorer” and the “rich-get-richer” phenomena.

  14. Hybrid Evidence Theory-based Finite Element/Statistical Energy Analysis method for mid-frequency analysis of built-up systems with epistemic uncertainties

    NASA Astrophysics Data System (ADS)

    Yin, Shengwen; Yu, Dejie; Yin, Hui; Lü, Hui; Xia, Baizhan

    2017-09-01

    Considering the epistemic uncertainties within the hybrid Finite Element/Statistical Energy Analysis (FE/SEA) model when it is used for the response analysis of built-up systems in the mid-frequency range, the hybrid Evidence Theory-based Finite Element/Statistical Energy Analysis (ETFE/SEA) model is established by introducing the evidence theory. Based on the hybrid ETFE/SEA model and the sub-interval perturbation technique, the hybrid Sub-interval Perturbation and Evidence Theory-based Finite Element/Statistical Energy Analysis (SIP-ETFE/SEA) approach is proposed. In the hybrid ETFE/SEA model, the uncertainty in the SEA subsystem is modeled by a non-parametric ensemble, while the uncertainty in the FE subsystem is described by the focal element and basic probability assignment (BPA) and dealt with using evidence theory. Within the hybrid SIP-ETFE/SEA approach, the mid-frequency response of interest, such as the ensemble average of the energy response and the cross-spectrum response, is calculated analytically by using the conventional hybrid FE/SEA method. Inspired by probability theory, the intervals of the mean value, variance and cumulative distribution are used to describe the distribution characteristics of mid-frequency responses of built-up systems with epistemic uncertainties. In order to alleviate the computational burden of the extreme value analysis, the sub-interval perturbation technique based on the first-order Taylor series expansion is used in the ETFE/SEA model to acquire the lower and upper bounds of the mid-frequency responses over each focal element. Three numerical examples are given to illustrate the feasibility and effectiveness of the proposed method.

  15. Asymptotic modal analysis and statistical energy analysis

    NASA Technical Reports Server (NTRS)

    Dowell, Earl H.

    1992-01-01

    Asymptotic Modal Analysis (AMA) is a method which is used to model linear dynamical systems with many participating modes. The AMA method was originally developed to show the relationship between statistical energy analysis (SEA) and classical modal analysis (CMA). In the limit of a large number of modes of a vibrating system, the classical modal analysis result can be shown to be equivalent to the statistical energy analysis result. As the CMA result evolves into the SEA result, a number of systematic assumptions are made. Most of these assumptions are based upon the supposition that the number of modes approaches infinity. It is for this reason that the term 'asymptotic' is used. AMA is the asymptotic result of taking the limit of CMA as the number of modes approaches infinity. AMA refers to any of the intermediate results between CMA and SEA, as well as the SEA result which is derived from CMA. The main advantage of the AMA method is that individual modal characteristics are not required in the model or computations. By contrast, CMA requires that each modal parameter be evaluated at each frequency. In the latter, contributions from each mode are computed and the final answer is obtained by summing over all the modes in the particular band of interest. AMA evaluates modal parameters only at their center frequency and does not sum the individual contributions from each mode in order to obtain a final result. The method is similar to SEA in this respect. However, SEA is only capable of obtaining spatial averages or means, as it is a statistical method. Since AMA is systematically derived from CMA, it can obtain local spatial information as well.

  16. Congruence analysis of geodetic networks - hypothesis tests versus model selection by information criteria

    NASA Astrophysics Data System (ADS)

    Lehmann, Rüdiger; Lösler, Michael

    2017-12-01

    Geodetic deformation analysis can be interpreted as a model selection problem. The null model indicates that no deformation has occurred. It is opposed to a number of alternative models, which stipulate different deformation patterns. A common way to select the right model is the usage of a statistical hypothesis test. However, since we have to test a series of deformation patterns, this must be a multiple test. As an alternative solution for the test problem, we propose the p-value approach. Another approach arises from information theory. Here, the Akaike information criterion (AIC) or some alternative is used to select an appropriate model for a given set of observations. Both approaches are discussed and applied to two test scenarios: a synthetic levelling network and the Delft test data set. It is demonstrated that they work but behave differently, sometimes even producing different results. Hypothesis tests are well-established in geodesy, but may suffer from an unfavourable choice of the decision error rates. The multiple test also suffers from statistical dependencies between the test statistics, which are neglected. Both problems are overcome by applying information criteria such as the AIC.
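
    An illustrative sketch (not the paper's networks) of the information-criterion alternative: a "no deformation" model is compared by AIC with models in which exactly one point is displaced between two observation epochs.

    ```python
    # Sketch: AIC-based selection among simple deformation hypotheses.
    import numpy as np

    rng = np.random.default_rng(7)
    n_points, sigma = 10, 1.0
    heights = rng.normal(100.0, 5.0, n_points)
    epoch1 = heights + rng.normal(0, sigma, n_points)
    epoch2 = heights + rng.normal(0, sigma, n_points)
    epoch2[3] += 4.0                                      # true displacement at point 3

    def aic(rss, n_obs, k_params):
        return n_obs * np.log(rss / n_obs) + 2 * k_params

    diff = epoch2 - epoch1
    n = diff.size
    candidates = {"no deformation": aic(np.sum((diff - diff.mean()) ** 2), n, 1)}
    for j in range(n_points):
        others = np.delete(diff, j)                       # point j is fitted exactly,
        rss_j = np.sum((others - others.mean()) ** 2)     # so its residual is zero
        candidates[f"point {j} displaced"] = aic(rss_j, n, 2)

    print("selected model:", min(candidates, key=candidates.get))
    ```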

  17. Pyrotechnic Shock Analysis Using Statistical Energy Analysis

    DTIC Science & Technology

    2015-10-23

    SEA subsystems. A couple of validation examples are provided to demonstrate the new approach. KEY WORDS : Peak Ratio, phase perturbation...Ballistic Shock Prediction Models and Techniques for Use in the Crusader Combat Vehicle Program,” 11th Annual US Army Ground Vehicle Survivability

  18. An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Thayer, Dorothy T.; Lewis, Charles

    1999-01-01

    Developed an empirical Bayes enhancement to Mantel-Haenszel (MH) analysis of differential item functioning (DIF) in which it is assumed that the MH statistics are normally distributed and that the prior distribution of underlying DIF parameters is also normal. (Author/SLD)

  19. Monte Carlo Approach for Reliability Estimations in Generalizability Studies.

    ERIC Educational Resources Information Center

    Dimitrov, Dimiter M.

    A Monte Carlo approach is proposed, using the Statistical Analysis System (SAS) programming language, for estimating reliability coefficients in generalizability theory studies. Test scores are generated by a probabilistic model that considers the probability for a person with a given ability score to answer an item with a given difficulty…

  20. A MOOC on Approaches to Machine Translation

    ERIC Educational Resources Information Center

    Costa-jussà, Mart R.; Formiga, Lluís; Torrillas, Oriol; Petit, Jordi; Fonollosa, José A. R.

    2015-01-01

    This paper describes the design, development, and analysis of a MOOC entitled "Approaches to Machine Translation: Rule-based, statistical and hybrid", and provides lessons learned and conclusions to be taken into account in the future. The course was developed within the Canvas platform, used by recognized European universities. It…

  1. A flexible, interpretable framework for assessing sensitivity to unmeasured confounding.

    PubMed

    Dorie, Vincent; Harada, Masataka; Carnegie, Nicole Bohme; Hill, Jennifer

    2016-09-10

    When estimating causal effects, unmeasured confounding and model misspecification are both potential sources of bias. We propose a method to simultaneously address both issues in the form of a semi-parametric sensitivity analysis. In particular, our approach incorporates Bayesian Additive Regression Trees into a two-parameter sensitivity analysis strategy that assesses sensitivity of posterior distributions of treatment effects to choices of sensitivity parameters. This results in an easily interpretable framework for testing for the impact of an unmeasured confounder that also limits the number of modeling assumptions. We evaluate our approach in a large-scale simulation setting and with high blood pressure data taken from the Third National Health and Nutrition Examination Survey. The model is implemented as open-source software, integrated into the treatSens package for the R statistical programming language. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  2. A Flexible Approach for the Statistical Visualization of Ensemble Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Potter, K.; Wilson, A.; Bremer, P.

    2009-09-29

    Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.

  3. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.

    PubMed

    Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L

    2014-10-15

    Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary materials are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
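
    A minimal sketch of conditional-Gaussian imputation of association z-scores from summary statistics: unobserved z-scores are predicted from observed ones through the reference-panel LD matrix, with a small ridge term accounting for the finite panel. The LD matrix, SNP indices, and z-scores below are toy values, not 1000 Genomes estimates.

    ```python
    # Sketch: impute z-scores at untyped SNPs from typed SNPs using reference LD.
    import numpy as np

    ld = np.array([[1.0, 0.8, 0.3, 0.1],          # reference-panel LD (correlations)
                   [0.8, 1.0, 0.4, 0.2],
                   [0.3, 0.4, 1.0, 0.7],
                   [0.1, 0.2, 0.7, 1.0]])
    observed_idx, target_idx = [0, 2], [1, 3]     # typed SNPs vs. SNPs to impute
    z_obs = np.array([4.5, 2.1])

    lam = 0.1                                     # ridge adjustment for finite panel size
    sigma_oo = ld[np.ix_(observed_idx, observed_idx)] + lam * np.eye(len(observed_idx))
    sigma_to = ld[np.ix_(target_idx, observed_idx)]

    weights = sigma_to @ np.linalg.inv(sigma_oo)
    z_imputed = weights @ z_obs
    info = np.diag(weights @ ld[np.ix_(observed_idx, target_idx)])   # imputation quality
    print("imputed z-scores:", np.round(z_imputed, 2), " info:", np.round(info, 2))
    ```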

  4. Statistical analysis of excitation energies in actinide and rare-earth nuclei

    NASA Astrophysics Data System (ADS)

    Levon, A. I.; Magner, A. G.; Radionov, S. V.

    2018-04-01

    Statistical analysis of distributions of the collective states in actinide and rare-earth nuclei is performed in terms of the nearest-neighbor spacing distribution (NNSD). Several approximations, such as the linear approach to the level repulsion density and the one suggested by Brody, were applied to the NNSDs for the analysis. We found an intermediate character of the experimental spectra between the order and the chaos for a number of rare-earth and actinide nuclei. The spectra are closer to the Wigner distribution for energies up to 3 MeV, and to the Poisson distribution for data including higher excitation energies and higher spins. The latter result is in agreement with the theoretical calculations. These features are confirmed by the cumulative distributions, where the Wigner contribution dominates at smaller spacings while the Poisson one is more important at larger spacings, and our linear approach improves the comparison with experimental data at all desired spacings.
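
    A minimal sketch of a nearest-neighbor spacing analysis on simulated spectra (not the experimental actinide or rare-earth levels): a GOE-like spectrum showing level repulsion is contrasted with an uncorrelated, Poisson-like spectrum after a crude unfolding to unit mean spacing.

    ```python
    # Sketch: NNSD of GOE-like vs. Poisson-like spectra, with limiting references.
    import numpy as np

    rng = np.random.default_rng(8)
    a = rng.normal(size=(400, 400))
    goe_levels = np.sort(np.linalg.eigvalsh((a + a.T) / 2.0))
    poisson_levels = np.sort(rng.uniform(0, 1, 400))

    def unfolded_spacings(levels, keep=slice(100, 300)):
        s = np.diff(levels[keep])
        return s / s.mean()                       # crude unfolding to unit mean spacing

    for name, levels in [("GOE-like", goe_levels), ("Poisson-like", poisson_levels)]:
        s = unfolded_spacings(levels)
        print(f"{name:13s} P(s < 0.25) = {np.mean(s < 0.25):.3f}")

    # Reference probabilities of a spacing below 0.25 under the two limiting forms:
    print("Wigner surmise:", round(1 - np.exp(-np.pi * 0.25 ** 2 / 4), 3))   # ~0.05
    print("Poisson:       ", round(1 - np.exp(-0.25), 3))                    # ~0.22
    ```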

  5. Contextualizing Obesity and Diabetes Policy: Exploring a Nested Statistical and Constructivist Approach at the Cross-National and Subnational Government Level in the United States and Brazil

    PubMed Central

    Gómez, Eduardo J.

    2017-01-01

    Background: This article conducts a comparative national and subnational government analysis of the political, economic, and ideational constructivist contextual factors facilitating the adoption of obesity and diabetes policy. Methods: We adopt a nested analytical approach to policy analysis, which combines cross-national statistical analysis with subnational case study comparisons to examine theoretical propositions and discover alternative contextual factors; this was combined with an ideational constructivist approach to policy-making. Results: Contrary to the existing literature, we found that with the exception of cross-national statistical differences in access to healthcare infrastructural resources, the growing burden of obesity and diabetes, rising healthcare costs and increased citizens’ knowledge had no predictive effect on the adoption of obesity and diabetes policy. We then turned to a subnational comparative analysis of the states of Mississippi in the United States and Rio Grande do Norte in Brazil to further assess the importance of infrastructural resources, at two units of analysis: the state governments versus rural municipal governments. Qualitative evidence suggests that differences in subnational healthcare infrastructural resources were insufficient for explaining policy reform processes, highlighting instead other potentially important factors, such as state-civil societal relationships and policy diffusion in Mississippi, federal policy intervention in Rio Grande do Norte, and politicians’ social construction of obesity and the resulting differences in policy roles assigned to the central government. Conclusion: We conclude by underscoring the complexity of subnational policy responses to obesity and diabetes, the importance of combining resource and constructivist analysis for better understanding the context of policy reform, and the potential lessons that the United States can learn from Brazil. PMID:29179290

  6. Parametric and experimental analysis using a power flow approach

    NASA Technical Reports Server (NTRS)

    Cuschieri, J. M.

    1990-01-01

    A structural power flow approach for the analysis of structure-borne transmission of vibrations is used to analyze the influence of structural parameters on transmitted power. The parametric analysis is also performed using the Statistical Energy Analysis approach and the results are compared with those obtained using the power flow approach. The advantages of structural power flow analysis are demonstrated by comparing the type of results that are obtained by the two analytical methods. Also, to demonstrate that the power flow results represent a direct physical parameter that can be measured on a typical structure, an experimental study of structural power flow is presented. This experimental study presents results for an L-shaped beam for which a solution was already available. Various methods to measure vibrational power flow are compared to study their advantages and disadvantages.

  7. "The Two Brothers": Reconciling Perceptual-Cognitive and Statistical Models of Musical Evolution.

    PubMed

    Jan, Steven

    2018-01-01

    While the "units, events and dynamics" of memetic evolution have been abstractly theorized (Lynch, 1998), they have not been applied systematically to real corpora in music. Some researchers, convinced of the validity of cultural evolution in more than the metaphorical sense adopted by much musicology, but perhaps skeptical of some or all of the claims of memetics, have attempted statistically based corpus-analysis techniques of music drawn from molecular biology, and these have offered strong evidence in favor of system-level change over time (Savage, 2017). This article argues that such statistical approaches, while illuminating, ignore the psychological realities of music-information grouping, the transmission of such groups with varying degrees of fidelity, their selection according to relative perceptual-cognitive salience, and the power of this Darwinian process to drive the systemic changes (such as the development over time of systems of tonal organization in music) that statistical methodologies measure. It asserts that a synthesis between such statistical approaches to the study of music-cultural change and the theory of memetics as applied to music (Jan, 2007), in particular the latter's perceptual-cognitive elements, would harness the strengths of each approach and deepen understanding of cultural evolution in music.

  8. “The Two Brothers”: Reconciling Perceptual-Cognitive and Statistical Models of Musical Evolution

    PubMed Central

    Jan, Steven

    2018-01-01

    While the “units, events and dynamics” of memetic evolution have been abstractly theorized (Lynch, 1998), they have not been applied systematically to real corpora in music. Some researchers, convinced of the validity of cultural evolution in more than the metaphorical sense adopted by much musicology, but perhaps skeptical of some or all of the claims of memetics, have attempted statistically based corpus-analysis techniques of music drawn from molecular biology, and these have offered strong evidence in favor of system-level change over time (Savage, 2017). This article argues that such statistical approaches, while illuminating, ignore the psychological realities of music-information grouping, the transmission of such groups with varying degrees of fidelity, their selection according to relative perceptual-cognitive salience, and the power of this Darwinian process to drive the systemic changes (such as the development over time of systems of tonal organization in music) that statistical methodologies measure. It asserts that a synthesis between such statistical approaches to the study of music-cultural change and the theory of memetics as applied to music (Jan, 2007), in particular the latter's perceptual-cognitive elements, would harness the strengths of each approach and deepen understanding of cultural evolution in music. PMID:29670551

  9. Possibilities of fractal analysis of the competitive dynamics: Approaches and procedures

    NASA Astrophysics Data System (ADS)

    Zagornaya, T. O.; Medvedeva, M. A.; Panova, V. L.; Isaichik, K. F.; Medvedev, A. N.

    2017-11-01

    The possibilities of the fractal approach are used to study the non-linear nature of the competitive dynamics of the market of trading intermediaries. Based on a statistical study of retail indicators in the region, an approach to analyzing the characteristics of the competitive behavior of market participants is developed. The authors postulate principles for studying the dynamics of competition as a result of changes in the vector and characteristics of the competitive behavior of market agents.

  10. A polynomial-chaos-expansion-based building block approach for stochastic analysis of photonic circuits

    NASA Astrophysics Data System (ADS)

    Waqas, Abi; Melati, Daniele; Manfredi, Paolo; Grassi, Flavia; Melloni, Andrea

    2018-02-01

    The Building Block (BB) approach has recently emerged in photonics as a suitable strategy for the analysis and design of complex circuits. Each BB can be foundry related and contains a mathematical macro-model of its functionality. As is well known, statistical variations in fabrication processes can have a strong effect on their functionality and ultimately affect the yield. In order to predict the statistical behavior of the circuit, proper analysis of the effects of these uncertainties is crucial. This paper presents a method to build a novel class of Stochastic Process Design Kits for the analysis of photonic circuits. The proposed design kits directly store the information on the stochastic behavior of each building block in the form of a generalized-polynomial-chaos-based augmented macro-model obtained by properly exploiting stochastic collocation and Galerkin methods. Using this approach, we demonstrate that the augmented macro-models of the BBs can be calculated once and stored in a BB (foundry dependent) library and then used for the analysis of any desired circuit. The main advantage of this approach, shown here for the first time in photonics, is that the stochastic moments of an arbitrary photonic circuit can be evaluated by a single simulation only, without the need for repeated simulations. The accuracy and the significant speed-up with respect to the classical Monte Carlo analysis are verified by means of a classical photonic circuit example with multiple uncertain variables.
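
    A minimal sketch of a non-intrusive polynomial-chaos "augmented macro-model" for one building block with a single Gaussian-uncertain parameter: Hermite-chaos coefficients are computed once by Gauss-Hermite quadrature (stochastic collocation) and then give the stochastic moments without Monte Carlo runs. The response function is a made-up stand-in, not a real BB model from the paper.

    ```python
    # Sketch: Hermite polynomial-chaos expansion of a scalar building-block response.
    import numpy as np
    from numpy.polynomial import hermite_e as He
    from math import factorial, sqrt, pi

    def bb_response(xi):
        # Hypothetical transmission of a building block vs. a normalized N(0,1)
        # fabrication error xi.
        return np.cos(0.3 + 0.2 * xi) ** 2

    order = 6
    nodes, weights = He.hermegauss(20)                 # quadrature for weight exp(-x^2/2)
    norm = sqrt(2 * pi)

    coeffs = np.array([
        np.sum(weights * bb_response(nodes) * He.hermeval(nodes, [0] * n + [1]))
        / (norm * factorial(n))
        for n in range(order + 1)
    ])
    mean = coeffs[0]
    variance = sum(factorial(n) * coeffs[n] ** 2 for n in range(1, order + 1))

    xi = np.random.default_rng(9).normal(size=200_000)  # brute-force check
    print(f"PCE mean={mean:.5f}  var={variance:.6f}")
    print(f"MC  mean={bb_response(xi).mean():.5f}  var={bb_response(xi).var():.6f}")
    ```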

  11. Visualization and Image Analysis of Yeast Cells.

    PubMed

    Bagley, Steve

    2016-01-01

    When converting real-life data via visualization to numbers and then into statistics, the whole system needs to be considered so that the conversion from the analogue to the digital is accurate and repeatable. Here we describe the points to consider when approaching the visualization, processing, and analysis of a yeast cell population by screening techniques.

  12. Statistical Inference for Data Adaptive Target Parameters.

    PubMed

    Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J

    2016-05-01

    Suppose one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample into V equally sized sub-samples, and use this partitioning to define V splits into an estimation sample (one of the V sub-samples) and a corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
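
    A minimal sketch of the sample-split construction described here: each parameter-generating sample selects the target (in this toy example, the covariate most correlated with the outcome), the complementary estimation sample estimates it, and the V fold-specific estimates are averaged; the fold-based standard error is a rough stand-in for the formal inference in the paper.

    ```python
    # Sketch: V-fold sample-split data adaptive target parameter.
    import numpy as np

    rng = np.random.default_rng(10)
    n, p, V = 500, 10, 5
    X = rng.normal(size=(n, p))
    y = 0.6 * X[:, 3] + rng.normal(size=n)

    folds = np.array_split(rng.permutation(n), V)
    estimates = []
    for v in range(V):
        est_idx = folds[v]                                                  # estimation sample
        gen_idx = np.concatenate([folds[u] for u in range(V) if u != v])    # parameter-generating
        corr = [abs(np.corrcoef(X[gen_idx, j], y[gen_idx])[0, 1]) for j in range(p)]
        j_star = int(np.argmax(corr))                       # data adaptive choice of target
        slope = np.polyfit(X[est_idx, j_star], y[est_idx], 1)[0]
        estimates.append(slope)

    estimates = np.array(estimates)
    se = estimates.std(ddof=1) / np.sqrt(V)                 # rough fold-based standard error
    print(f"data adaptive estimate = {estimates.mean():.3f} +/- {1.96 * se:.3f}")
    ```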

  13. Multivariate approaches for stability control of the olive oil reference materials for sensory analysis - part I: framework and fundamentals.

    PubMed

    Valverde-Som, Lucia; Ruiz-Samblás, Cristina; Rodríguez-García, Francisco P; Cuadros-Rodríguez, Luis

    2018-02-09

    Virgin olive oil is the only food product for which sensory analysis is regulated to classify it in different quality categories. To harmonize the results of the sensorial method, the use of standards or reference materials is crucial. The stability of sensory reference materials is required to enable their suitable control, aiming to confirm that their specific target values are maintained on an ongoing basis. Currently, such stability is monitored by means of sensory analysis and the sensory panels are in the paradoxical situation of controlling the standards that are devoted to controlling the panels. In the present study, several approaches based on similarity analysis are exploited. For each approach, the specific methodology to build a proper multivariate control chart to monitor the stability of the sensory properties is explained and discussed. The normalized Euclidean and Mahalanobis distances, the so-called nearness and hardiness indices respectively, have been defined as new similarity indices to range the values from 0 to 1. Also, the squared mean from Hotelling's T2-statistic and the Q2-statistic has been proposed as another similarity index. © 2018 Society of Chemical Industry.
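
    A minimal sketch of a multivariate control chart of the kind discussed (a generic Hotelling T2 chart, not the authors' similarity indices): the T2 statistic of a new sensory profile is compared against an F-based upper control limit derived from a reference set. All values are simulated.

    ```python
    # Sketch: Hotelling T^2 monitoring of multivariate sensory profiles.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(11)
    p, m = 5, 40                                       # attributes, reference measurements
    reference = rng.normal(size=(m, p))                # in-control reference profiles
    mu = reference.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))

    def t2(x):
        d = x - mu
        return float(d @ cov_inv @ d)

    alpha = 0.01                                       # phase-II limit, single new observation
    ucl = p * (m + 1) * (m - 1) / (m * (m - p)) * stats.f.ppf(1 - alpha, p, m - p)

    in_control = rng.normal(size=p)
    drifted = rng.normal(size=p) + 1.5                 # simulated drift of the material
    print(f"UCL = {ucl:.2f}")
    print(f"T2(in control) = {t2(in_control):.2f},  T2(drifted) = {t2(drifted):.2f}")
    ```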

  14. Aftershock identification problem via the nearest-neighbor analysis for marked point processes

    NASA Astrophysics Data System (ADS)

    Gabrielov, A.; Zaliapin, I.; Wong, H.; Keilis-Borok, V.

    2007-12-01

    The centennial observations on the world seismicity have revealed a wide variety of clustering phenomena that unfold in the space-time-energy domain and provide most reliable information about the earthquake dynamics. However, there is neither a unifying theory nor a convenient statistical apparatus that would naturally account for the different types of seismic clustering. In this talk we present a theoretical framework for nearest-neighbor analysis of marked processes and obtain new results on the hierarchical approach to studying seismic clustering introduced by Baiesi and Paczuski (2004). Recall that under this approach one defines an asymmetric distance D in the space-time-energy domain such that the nearest-neighbor spanning graph with respect to D becomes a time-oriented tree. We demonstrate how this approach can be used to detect earthquake clustering. We apply our analysis to the observed seismicity of California and synthetic catalogs from the ETAS model and show that the earthquake clustering part is statistically different from the homogeneous part. This finding may serve as a basis for an objective aftershock identification procedure.
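
    A minimal sketch of the space-time-magnitude nearest-neighbor distance commonly used in this approach (eta = dt * r^df * 10^(-b*m_parent), following Baiesi and Paczuski, 2004), applied to a synthetic catalog rather than the California or ETAS data; the b-value and fractal dimension below are assumed illustrative values.

    ```python
    # Sketch: nearest-neighbor (parent) assignment with the Baiesi-Paczuski distance.
    import numpy as np

    rng = np.random.default_rng(12)
    n = 200
    t = np.sort(rng.uniform(0, 365, n))                    # occurrence times (days)
    x, y = rng.uniform(0, 100, n), rng.uniform(0, 100, n)  # epicenters (km)
    m = rng.exponential(1.0, n) + 2.0                      # magnitudes
    b, df = 1.0, 1.6                                       # Gutenberg-Richter b, fractal dimension

    parents = np.full(n, -1)
    nn_dist = np.full(n, np.inf)
    for j in range(1, n):
        dt = t[j] - t[:j]                                  # only earlier events are candidates
        r = np.hypot(x[j] - x[:j], y[j] - y[:j])
        eta = dt * r ** df * 10.0 ** (-b * m[:j])
        parents[j] = int(np.argmin(eta))
        nn_dist[j] = eta[parents[j]]

    log_eta = np.log10(nn_dist[1:])
    print("median log10(eta):", round(float(np.median(log_eta)), 2))
    # In real catalogs log10(eta) is bimodal: the lower mode corresponds to
    # clustered (aftershock-like) pairs, the upper mode to background seismicity.
    ```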

  15. Object-oriented productivity metrics

    NASA Technical Reports Server (NTRS)

    Connell, John L.; Eller, Nancy

    1992-01-01

    Software productivity metrics are useful for sizing and costing proposed software and for measuring development productivity. Estimating and measuring source lines of code (SLOC) has proven to be a bad idea because it encourages writing more lines of code and using lower level languages. Function Point Analysis is an improved software metric system, but it is not compatible with newer rapid prototyping and object-oriented approaches to software development. A process is presented here for counting object-oriented effort points, based on a preliminary object-oriented analysis. It is proposed that this approach is compatible with object-oriented analysis, design, programming, and rapid prototyping. Statistics gathered on actual projects are presented to validate the approach.

  16. A simulations approach for meta-analysis of genetic association studies based on additive genetic model.

    PubMed

    John, Majnu; Lencz, Todd; Malhotra, Anil K; Correll, Christoph U; Zhang, Jian-Ping

    2018-06-01

    Meta-analysis of genetic association studies is being increasingly used to assess phenotypic differences between genotype groups. When the underlying genetic model is assumed to be dominant or recessive, assessing the phenotype differences based on summary statistics, reported for individual studies in a meta-analysis, is a valid strategy. However, when the genetic model is additive, a similar strategy based on summary statistics will lead to biased results. This fact about the additive model is one of the things that we establish in this paper, using simulations. The main goal of this paper is to present an alternate strategy for the additive model based on simulating data for the individual studies. We show that the alternate strategy is far superior to the strategy based on summary statistics.

  17. Transitioning to multiple imputation : a new method to impute missing blood alcohol concentration (BAC) values in FARS

    DOT National Transportation Integrated Search

    2002-01-01

    The National Center for Statistics and Analysis (NCSA) of the National Highway Traffic Safety : Administration (NHTSA) has undertaken several approaches to remedy the problem of missing blood alcohol : test results in the Fatality Analysis Reporting ...

  18. Applied immuno-epidemiological research: an approach for integrating existing knowledge into the statistical analysis of multiple immune markers.

    PubMed

    Genser, Bernd; Fischer, Joachim E; Figueiredo, Camila A; Alcântara-Neves, Neuza; Barreto, Mauricio L; Cooper, Philip J; Amorim, Leila D; Saemann, Marcus D; Weichhart, Thomas; Rodrigues, Laura C

    2016-05-20

    Immunologists often measure several correlated immunological markers, such as concentrations of different cytokines produced by different immune cells and/or measured under different conditions, to draw insights from complex immunological mechanisms. Although there have been recent methodological efforts to improve the statistical analysis of immunological data, a framework is still needed for the simultaneous analysis of multiple, often correlated, immune markers. This framework would allow the immunologists' hypotheses about the underlying biological mechanisms to be integrated. We present an analytical approach for statistical analysis of correlated immune markers, such as those commonly collected in modern immuno-epidemiological studies. We demonstrate i) how to deal with interdependencies among multiple measurements of the same immune marker, ii) how to analyse association patterns among different markers, iii) how to aggregate different measures and/or markers to immunological summary scores, iv) how to model the inter-relationships among these scores, and v) how to use these scores in epidemiological association analyses. We illustrate the application of our approach to multiple cytokine measurements from 818 children enrolled in a large immuno-epidemiological study (SCAALA Salvador), which aimed to quantify the major immunological mechanisms underlying atopic diseases or asthma. We demonstrate how to aggregate systematically the information captured in multiple cytokine measurements to immunological summary scores aimed at reflecting the presumed underlying immunological mechanisms (Th1/Th2 balance and immune regulatory network). We show how these aggregated immune scores can be used as predictors in regression models with outcomes of immunological studies (e.g. specific IgE) and compare the results to those obtained by a traditional multivariate regression approach. The proposed analytical approach may be especially useful to quantify complex immune responses in immuno-epidemiological studies, where investigators examine the relationship among epidemiological patterns, immune response, and disease outcomes.

  19. A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data

    PubMed Central

    Zhu, Yun; Fan, Ruzong; Xiong, Momiao

    2017-01-01

    Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information for achieving a deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple-phenotype association analysis paradigm lacks breadth (the number of phenotypes and genetic variants jointly analyzed at the same time) and depth (the hierarchical structure of phenotypes and genotypes). A key issue for high-dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high-dimensional genotype and phenotype data. To explore the correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new statistical method, referred to as quadratically regularized functional CCA (QRFCCA), for association analysis, which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that QRFCCA has much higher power than the ten competing statistics while retaining appropriate type I error rates. To further evaluate performance, QRFCCA and the ten other statistics are applied to the whole-genome sequencing dataset from the TwinsUK study. Using QRFCCA, we identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits. The results show that QRFCCA substantially outperforms the ten other statistics. PMID:29040274

  20. Sparse approximation of currents for statistics on curves and surfaces.

    PubMed

    Durrleman, Stanley; Pennec, Xavier; Trouvé, Alain; Ayache, Nicholas

    2008-01-01

    Computing, processing, and visualizing statistics on shapes such as curves or surfaces is a real challenge, with applications ranging from medical image analysis to computational geometry. Modelling such geometrical primitives with currents avoids both feature-based approaches and point-correspondence methods. This framework has proved powerful for registering brain surfaces and for measuring geometrical invariants. However, while state-of-the-art methods perform pairwise registrations efficiently, new numerical schemes are required to process groupwise statistics, owing to the increasing complexity as the size of the database grows. Statistics such as the mean and principal modes of a set of shapes often have a heavy and highly redundant representation. We therefore propose to find an adapted basis on which the mean and principal modes have a sparse decomposition. Besides the computational improvement, this sparse representation offers a way to visualize and interpret statistics on currents. Experiments show the relevance of the approach on 34 sets of 70 sulcal lines and on 50 sets of 10 meshes of deep brain structures.

  1. Detecting temporal change in freshwater fisheries surveys: statistical power and the important linkages between management questions and monitoring objectives

    USGS Publications Warehouse

    Wagner, Tyler; Irwin, Brian J.; Bence, James R.; Hayes, Daniel B.

    2016-01-01

    Monitoring to detect temporal trends in biological and habitat indices is a critical component of fisheries management. Thus, it is important that management objectives are linked to monitoring objectives. This linkage requires a definition of what constitutes a management-relevant “temporal trend.” It is also important to develop expectations for the amount of time required to detect a trend (i.e., statistical power) and for choosing an appropriate statistical model for analysis. We provide an overview of temporal trends commonly encountered in fisheries management, review published studies that evaluated statistical power of long-term trend detection, and illustrate dynamic linear models in a Bayesian context, as an additional analytical approach focused on shorter term change. We show that monitoring programs generally have low statistical power for detecting linear temporal trends and argue that often management should be focused on different definitions of trends, some of which can be better addressed by alternative analytical approaches.
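    The power question raised above can be explored with a small Monte Carlo experiment: simulate an annual index with a linear trend plus noise, fit ordinary least squares, and count how often the slope is detected. The effect size, noise level, and alpha below are arbitrary assumptions, not values from the paper.

```python
# Minimal sketch: Monte Carlo power for detecting a linear temporal trend with OLS.
import numpy as np
from scipy import stats

def trend_power(n_years, slope=0.02, sigma=0.3, alpha=0.05, n_sim=2000, seed=1):
    rng = np.random.default_rng(seed)
    t = np.arange(n_years)
    hits = 0
    for _ in range(n_sim):
        y = 1.0 + slope * t + rng.normal(scale=sigma, size=n_years)  # simulated annual index
        hits += stats.linregress(t, y).pvalue < alpha                # trend detected?
    return hits / n_sim

for n in (5, 10, 20, 30):
    print(n, "years -> power ~", round(trend_power(n), 2))
```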

  2. Variable system: An alternative approach for the analysis of mediated moderation.

    PubMed

    Kwan, Joyce Lok Yin; Chan, Wai

    2018-06-01

    Mediated moderation (meMO) occurs when the moderation effect of the moderator (W) on the relationship between the independent variable (X) and the dependent variable (Y) is transmitted through a mediator (M). To examine this process empirically, 2 different model specifications (Type I meMO and Type II meMO) have been proposed in the literature. However, both specifications are found to be problematic, either conceptually or statistically. For example, it can be shown that each type of meMO model is statistically equivalent to a particular form of moderated mediation (moME), another process that examines the condition when the indirect effect from X to Y through M varies as a function of W. Consequently, it is difficult for one to differentiate these 2 processes mathematically. This study therefore has 2 objectives. First, we attempt to differentiate moME and meMO by proposing an alternative specification for meMO. Conceptually, this alternative specification is intuitively meaningful and interpretable, and, statistically, it offers meMO a unique representation that is no longer identical to its moME counterpart. Second, using structural equation modeling, we propose an integrated approach for the analysis of meMO as well as for other general types of conditional path models. VS, a computer software program that implements the proposed approach, has been developed to facilitate the analysis of conditional path models for applied researchers. Real examples are considered to illustrate how the proposed approach works in practice and to compare its performance against the traditional methods. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  3. Evaluating the quality of a cell counting measurement process via a dilution series experimental design.

    PubMed

    Sarkar, Sumona; Lund, Steven P; Vyzasatya, Ravi; Vanguri, Padmavathy; Elliott, John T; Plant, Anne L; Lin-Gibson, Sheng

    2017-12-01

    Cell counting measurements are critical in the research, development and manufacturing of cell-based products, yet determining cell quantity with accuracy and precision remains a challenge. Validating and evaluating a cell counting measurement process can be difficult because of the lack of appropriate reference material. Here we describe an experimental design and statistical analysis approach to evaluate the quality of a cell counting measurement process in the absence of appropriate reference materials or reference methods. The experimental design is based on a dilution series study with replicate samples and observations as well as measurement process controls. The statistical analysis evaluates the precision and proportionality of the cell counting measurement process and can be used to compare the quality of two or more counting methods. As an illustration of this approach, cell counting measurement processes (automated and manual methods) were compared for a human mesenchymal stromal cell (hMSC) preparation. For the hMSC preparation investigated, results indicated that the automated method performed better than the manual counting methods in terms of precision and proportionality. By conducting well controlled dilution series experimental designs coupled with appropriate statistical analysis, quantitative indicators of repeatability and proportionality can be calculated to provide an assessment of cell counting measurement quality. This approach does not rely on the use of a reference material or comparison to "gold standard" methods known to have limited assurance of accuracy and precision. The approach presented here may help the selection, optimization, and/or validation of a cell counting measurement process. Published by Elsevier Inc.
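    The sketch below illustrates the flavour of such a dilution-series evaluation on simulated counts: proportionality is summarized by a zero-intercept fit and its R², and precision by the replicate coefficient of variation. The quality metrics and simulated values are generic illustrations and not the exact formulation used in the paper.

```python
# Minimal sketch: precision and proportionality checks for a simulated dilution series.
import numpy as np

rng = np.random.default_rng(2)
fractions = np.repeat([1.0, 0.8, 0.6, 0.4, 0.2], 4)          # 5 dilution levels x 4 replicates
true_stock = 1.0e6                                            # assumed stock concentration (cells/mL)
counts = rng.poisson(true_stock * fractions * rng.normal(1.0, 0.03, fractions.size))

# Proportionality: least-squares slope through the origin and its R^2
slope = np.sum(fractions * counts) / np.sum(fractions ** 2)
resid = counts - slope * fractions
r2 = 1 - np.sum(resid ** 2) / np.sum((counts - counts.mean()) ** 2)

# Precision: coefficient of variation within each dilution level
cvs = [counts[fractions == f].std(ddof=1) / counts[fractions == f].mean()
       for f in np.unique(fractions)]
print(f"slope = {slope:.3e}  R2 = {r2:.4f}  mean CV = {np.mean(cvs):.3f}")
```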

  4. Application of Turchin's method of statistical regularization

    NASA Astrophysics Data System (ADS)

    Zelenyi, Mikhail; Poliakova, Mariia; Nozik, Alexander; Khudyakov, Alexey

    2018-04-01

    During the analysis of experimental data, one usually needs to restore a signal after it has been convolved with some kind of apparatus function. According to Hadamard's definition this problem is ill-posed and requires regularization to provide sensible results. In this article we describe an implementation of Turchin's method of statistical regularization, based on a Bayesian approach to the regularization strategy.
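    For intuition, the sketch below unfolds a signal convolved with a Gaussian apparatus function using plain Tikhonov regularization with a smoothness operator. Turchin's method chooses the regularization strength through a Bayesian argument; here alpha is simply fixed by hand, so this is only the deterministic core of the idea.

```python
# Minimal sketch: Tikhonov-regularized unfolding of a convolved, noisy signal.
import numpy as np

n = 100
x = np.linspace(0, 1, n)
signal = np.exp(-((x - 0.3) / 0.05) ** 2) + 0.6 * np.exp(-((x - 0.7) / 0.08) ** 2)

# Apparatus (convolution) matrix with a Gaussian kernel, rows normalized
K = np.exp(-((x[:, None] - x[None, :]) / 0.04) ** 2)
K /= K.sum(axis=1, keepdims=True)
data = K @ signal + np.random.default_rng(3).normal(scale=0.01, size=n)

# Second-derivative smoothness operator and the regularized solution
D = np.diff(np.eye(n), 2, axis=0)
alpha = 1e-3                                   # hand-picked, not Bayesian-selected
restored = np.linalg.solve(K.T @ K + alpha * D.T @ D, K.T @ data)
print("max reconstruction error:", np.abs(restored - signal).max())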

  5. A statistical mechanics approach to autopoietic immune networks

    NASA Astrophysics Data System (ADS)

    Barra, Adriano; Agliari, Elena

    2010-07-01

    In this work we aim to bridge theoretical immunology and disordered statistical mechanics. We introduce a model for the behavior of B-cells which naturally merges the clonal selection theory and the autopoietic network theory as a whole. From the analysis of its features we recover several basic phenomena such as low-dose tolerance, dynamical memory of antigens and self/non-self discrimination.

  6. A statistical power analysis of woody carbon flux from forest inventory data

    Treesearch

    James A. Westfall; Christopher W. Woodall; Mark A. Hatfield

    2013-01-01

    At a national scale, the carbon (C) balance of numerous forest ecosystem C pools can be monitored using a stock change approach based on national forest inventory data. Given the potential influence of disturbance events and/or climate change processes, the statistical detection of changes in forest C stocks is paramount to maintaining the net sequestration status of...

  7. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments

    PubMed Central

    Avalappampatty Sivasamy, Aneetha; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
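    A stripped-down version of the detection step is sketched below: Hotelling's T² distances of new feature vectors are computed against a baseline sample and compared with a chi-square threshold. The four-dimensional synthetic features and the large-sample threshold are assumptions; the paper's KDD Cup'99 features and thresholding procedure are not reproduced.

```python
# Minimal sketch: flag anomalies with Hotelling's T^2 against a baseline sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
normal = rng.normal(size=(500, 4))                       # baseline "normal traffic" features
new = np.vstack([rng.normal(size=(20, 4)),               # normal-looking traffic
                 rng.normal(loc=3.0, size=(5, 4))])      # injected anomalies

mu = normal.mean(axis=0)
S_inv = np.linalg.inv(np.cov(normal, rowvar=False))
d = new - mu
t2 = np.einsum("ij,jk,ik->i", d, S_inv, d)               # T^2 distance for each observation

threshold = stats.chi2.ppf(0.99, df=normal.shape[1])     # large-sample approximation
print("flagged as attack:", np.where(t2 > threshold)[0])
```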

  8. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments.

    PubMed

    Sivasamy, Aneetha Avalappampatty; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.

  9. Adaptive filtering in biological signal processing.

    PubMed

    Iyer, V K; Ploysongsang, Y; Ramamoorthy, P A

    1990-01-01

    The high dependence of conventional optimal filtering methods on a priori knowledge of the signal and noise statistics renders them ineffective in dealing with signals whose statistics cannot be predetermined accurately. Adaptive filtering methods offer a better alternative, since a priori knowledge of the statistics is less critical, real-time processing is possible, and the computations are less expensive. Adaptive filtering methods compute the filter coefficients "on-line", converging to the optimal values in the least-mean-square (LMS) error sense. Adaptive filtering is therefore apt for dealing with the "unknown" statistics situation and has been applied extensively in areas such as communication, speech, radar, sonar, seismology, and biological signal processing and analysis, for channel equalization, interference and echo cancelling, line enhancement, signal detection, system identification, spectral analysis, beamforming, modeling, control, etc. In this review article, adaptive filtering in the context of biological signals is reviewed. An intuitive approach to the underlying theory of adaptive filters and its applicability are presented. Applications of the principles in biological signal processing are discussed in a manner that brings out the key ideas involved. Current and potential future directions in adaptive biological signal processing are also discussed.
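    The LMS update mentioned above is short enough to show directly. The sketch below cancels a synthetic powerline interference from a simulated slow physiological signal using a correlated reference input; the sampling rate, filter length, and step size are illustrative assumptions.

```python
# Minimal sketch: LMS adaptive noise cancellation with a correlated reference input.
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(0, 4, 1 / 500)                         # 4 s at an assumed 500 Hz sampling rate
clean = np.sin(2 * np.pi * 1.2 * t)                  # slow "physiological" component
reference = np.sin(2 * np.pi * 50 * t)               # reference picked up near the noise source
primary = clean + 0.8 * np.sin(2 * np.pi * 50 * t + 0.3) + 0.05 * rng.normal(size=t.size)

M, mu = 8, 0.01                                      # filter length and LMS step size
w = np.zeros(M)
out = np.zeros(t.size)
for n in range(M, t.size):
    x = reference[n - M:n][::-1]                     # most recent reference samples
    y = w @ x                                        # filter output = noise estimate
    e = primary[n] - y                               # error = cleaned signal sample
    w += 2 * mu * e * x                              # LMS coefficient update
    out[n] = e
print("residual error power ratio:",
      round(np.var(out[M:] - clean[M:]) / np.var(primary - clean), 3))
```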

  10. Validating Coherence Measurements Using Aligned and Unaligned Coherence Functions

    NASA Technical Reports Server (NTRS)

    Miles, Jeffrey Hilton

    2006-01-01

    This paper describes a novel approach, based on coherence functions and statistical theory, for sensor validation in a harsh environment. By using aligned and unaligned coherence functions together with statistical theory, one can test for sensor degradation, total sensor failure, or changes in the signal. The advanced diagnostic approach and the novel data-processing methodology discussed provide a single number that conveys this information. This number, calculated with standard statistical procedures for comparing the means of two distributions, is compared with results obtained using Yuen's robust statistical method to create confidence intervals. Examination of experimental data from Kulite pressure transducers mounted in a Pratt & Whitney PW4098 combustor, using spectrum analysis methods on aligned and unaligned time histories, has verified the effectiveness of the proposed method. All the procedures produce good results, which demonstrates the robustness of the technique.
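    The basic quantity involved, magnitude-squared coherence between two sensor channels, can be computed in a few lines. The sketch below uses two simulated pressure signals sharing a common tone; a drop in coherence near the tone would suggest sensor degradation. This is a generic illustration, not the paper's aligned/unaligned procedure.

```python
# Minimal sketch: magnitude-squared coherence between two simulated sensor signals.
import numpy as np
from scipy import signal

rng = np.random.default_rng(6)
fs = 2048                                           # assumed sampling rate (Hz)
t = np.arange(0, 8, 1 / fs)
common = np.sin(2 * np.pi * 120 * t)                # component seen by both sensors
s1 = common + 0.5 * rng.normal(size=t.size)
s2 = 0.9 * common + 0.5 * rng.normal(size=t.size)

f, Cxy = signal.coherence(s1, s2, fs=fs, nperseg=1024)
print("coherence near 120 Hz:", round(Cxy[np.argmin(np.abs(f - 120))], 2))
```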

  11. An astronomer's guide to period searching

    NASA Astrophysics Data System (ADS)

    Schwarzenberg-Czerny, A.

    2003-03-01

    We concentrate on the analysis of unevenly sampled time series, interrupted by periodic gaps, as often encountered in astronomy. While some of our conclusions may appear surprising, all are based on the classical statistical principles of Fisher and his successors. Except for the discussion of resolution issues, it is best for the reader to forget temporarily about Fourier transforms and to concentrate on the problem of fitting a time series with a model curve. According to their statistical content, we divide the issues into several sections: statistical and numerical aspects of model fitting; evaluation of fitted models as hypothesis testing; the role of orthogonal models in signal detection; conditions for the equivalence of periodograms; and rating sensitivity by test power. An experienced observer working with individual objects would benefit little from a formalized statistical approach. However, we demonstrate the usefulness of this approach in evaluating the performance of periodograms and in the quantitative design of large variability surveys.
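    For readers who want to experiment with the periodogram comparisons discussed here, the sketch below runs a Lomb-Scargle periodogram on an unevenly sampled, noisy sinusoid. The cadence, period, and frequency grid are arbitrary assumptions for illustration.

```python
# Minimal sketch: Lomb-Scargle period search on an unevenly sampled time series.
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0, 100, 300))                 # uneven observation times (days)
true_period = 7.3
y = np.sin(2 * np.pi * t / true_period) + 0.4 * rng.normal(size=t.size)

periods = np.linspace(2, 20, 5000)
omega = 2 * np.pi / periods                           # angular frequency grid
power = lombscargle(t, y - y.mean(), omega)
print("best period ~", round(periods[np.argmax(power)], 2), "days")
```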

  12. A statistical approach to optimizing concrete mixture design.

    PubMed

    Ahmad, Shamsad; Alghamdi, Saeid A

    2014-01-01

    A step-by-step statistical approach is proposed to obtain optimum proportioning of concrete mixtures using the data obtained through a statistically planned experimental program. The utility of the proposed approach for optimizing the design of concrete mixture is illustrated considering a typical case in which trial mixtures were considered according to a full factorial experiment design involving three factors and their three levels (3³). A total of 27 concrete mixtures with three replicates (81 specimens) were considered by varying the levels of key factors affecting compressive strength of concrete, namely, water/cementitious materials ratio (0.38, 0.43, and 0.48), cementitious materials content (350, 375, and 400 kg/m³), and fine/total aggregate ratio (0.35, 0.40, and 0.45). The experimental data were utilized to carry out analysis of variance (ANOVA) and to develop a polynomial regression model for compressive strength in terms of the three design factors considered in this study. The developed statistical model was used to show how optimization of concrete mixtures can be carried out with different possible options.
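    The analysis pipeline described (full factorial design, ANOVA, then a polynomial regression model) can be mimicked on simulated data as sketched below. The factor levels mirror the abstract, but the strength values, response-surface terms, and coefficients are invented for illustration only.

```python
# Minimal sketch: ANOVA and a polynomial regression for a simulated 3x3x3 factorial experiment.
import itertools
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(8)
rows = []
for wc, cm, fa in itertools.product([0.38, 0.43, 0.48], [350, 375, 400], [0.35, 0.40, 0.45]):
    for _ in range(3):                                   # three replicates per mixture
        strength = 90 - 80 * wc + 0.02 * cm + 5 * fa + rng.normal(scale=1.0)
        rows.append(dict(wc=wc, cm=cm, fa=fa, strength=strength))
df = pd.DataFrame(rows)

anova_fit = smf.ols("strength ~ C(wc) + C(cm) + C(fa)", data=df).fit()   # factorial ANOVA
print(anova_lm(anova_fit))
rsm_fit = smf.ols("strength ~ wc + cm + fa + I(wc**2) + wc:fa", data=df).fit()  # response surface
print(rsm_fit.params)
```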

  13. A Statistical Approach to Optimizing Concrete Mixture Design

    PubMed Central

    Ahmad, Shamsad; Alghamdi, Saeid A.

    2014-01-01

    A step-by-step statistical approach is proposed to obtain optimum proportioning of concrete mixtures using the data obtained through a statistically planned experimental program. The utility of the proposed approach for optimizing the design of concrete mixture is illustrated considering a typical case in which trial mixtures were considered according to a full factorial experiment design involving three factors and their three levels (3³). A total of 27 concrete mixtures with three replicates (81 specimens) were considered by varying the levels of key factors affecting compressive strength of concrete, namely, water/cementitious materials ratio (0.38, 0.43, and 0.48), cementitious materials content (350, 375, and 400 kg/m³), and fine/total aggregate ratio (0.35, 0.40, and 0.45). The experimental data were utilized to carry out analysis of variance (ANOVA) and to develop a polynomial regression model for compressive strength in terms of the three design factors considered in this study. The developed statistical model was used to show how optimization of concrete mixtures can be carried out with different possible options. PMID:24688405

  14. A graph theory approach to identify resonant and non-resonant transmission paths in statistical modal energy distribution analysis

    NASA Astrophysics Data System (ADS)

    Aragonès, Àngels; Maxit, Laurent; Guasch, Oriol

    2015-08-01

    Statistical modal energy distribution analysis (SmEdA) extends classical statistical energy analysis (SEA) to the mid-frequency range by establishing power balance equations between modes in different subsystems. This circumvents the SEA requirement of modal energy equipartition and enables SmEdA to be applied to cases of low modal overlap and locally excited subsystems, and to complex heterogeneous subsystems as well. Yet widening the range of application of SEA comes at the price of large models, because the number of modes per subsystem can become considerable as the frequency increases. Therefore, it would be worthwhile to have at one's disposal tools for quick identification and ranking of the resonant and non-resonant paths involved in modal energy transmission between subsystems. It will be shown that graph theory algorithms previously developed for transmission path analysis (TPA) in SEA can be adapted to SmEdA and prove useful for that purpose. The case of airborne transmission between two cavities separated by homogeneous and ribbed plates is first addressed to illustrate the potential of the graph approach. A more complex case representing transmission between non-contiguous cavities in a shipbuilding structure is also presented.

  15. A statistical approach to deriving subsystem specifications. [for spacecraft shock and vibrational environment tests

    NASA Technical Reports Server (NTRS)

    Keegan, W. B.

    1974-01-01

    In order to produce cost-effective environmental test programs, test specifications must be realistic and, to be useful, must be available early in the life of a program. This paper describes a method for achieving such specifications for subsystems by utilizing the results of a statistical analysis of data acquired at subsystem mounting locations during system-level environmental tests. The paper describes the details of this statistical analysis. The resulting recommended levels are a function of the subsystem's mounting location in the spacecraft, and methods of determining this mounting 'zone' are described. Recommendations are then made as to which of the various problem areas encountered should be pursued further.

  16. Statistical analysis for understanding and predicting battery degradations in real-life electric vehicle use

    NASA Astrophysics Data System (ADS)

    Barré, Anthony; Suard, Frédéric; Gérard, Mathias; Montaru, Maxime; Riu, Delphine

    2014-01-01

    This paper describes the statistical analysis of data parameters recorded during electric battery ageing in electric vehicle use. These data permit traditional battery ageing investigation based on the evolution of capacity fade and resistance rise. The measured variables are examined in order to explain the correlation between battery ageing and operating conditions during the experiments; such a study enables us to identify the main ageing factors. Detailed statistical dependency explorations then identify the factors responsible for battery ageing phenomena, and predictive battery ageing models are built from this approach. The results demonstrate and quantify a relationship between the operating variables and global battery ageing observations, and also allow accurate battery ageing diagnosis through the predictive models.

  17. Performance analysis of clustering techniques over microarray data: A case study

    NASA Astrophysics Data System (ADS)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in statistical data analysis, and cluster analysis plays a vital role in dealing with such large-scale data. There are many clustering techniques with different cluster analysis approaches, but which approach suits a particular dataset is difficult to predict. To deal with this problem, a grading approach is introduced over several clustering techniques to identify a stable technique. Because the grading depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experiments are conducted over five microarray datasets with seven validity indices. The finding of the grading approach, that one clustering technique is significantly better, is further established by the Nemenyi post-hoc hypothesis test.
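    The per-dataset grading step amounts to scoring each clustering algorithm with several validity indices, as sketched below on synthetic data with two of the five techniques and two internal indices (the cross-dataset Nemenyi post-hoc step is omitted). The dataset, cluster count, and index choice are illustrative assumptions.

```python
# Minimal sketch: score clustering techniques on one dataset with internal validity indices.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import silhouette_score, davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=9)
methods = {
    "k-means": KMeans(n_clusters=4, n_init=10, random_state=9),
    "AGNES": AgglomerativeClustering(n_clusters=4),
}
for name, est in methods.items():
    labels = est.fit_predict(X)
    print(name,
          "silhouette:", round(silhouette_score(X, labels), 3),
          "Davies-Bouldin:", round(davies_bouldin_score(X, labels), 3))
```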

  18. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Belianinov, Alex; Ganesh, Panchapakesan; Lin, Wenzhi; Sales, Brian C.; Sefat, Athena S.; Jesse, Stephen; Pan, Minghu; Kalinin, Sergei V.

    2014-12-01

    Atomic level spatial variability of electronic structure in Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling-spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled calculated density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data including separation of atomic identities, proximity, and local configuration effects and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  19. Statistical shape analysis using 3D Poisson equation--A quantitatively validated approach.

    PubMed

    Gao, Yi; Bouix, Sylvain

    2016-05-01

    Statistical shape analysis has been an important area of research with applications in biology, anatomy, neuroscience, agriculture, paleontology, etc. Unfortunately, the proposed methods are rarely quantitatively evaluated, and as shown in recent studies, when they are evaluated, significant discrepancies exist in their outputs. In this work, we concentrate on the problem of finding the consistent location of deformation between two population of shapes. We propose a new shape analysis algorithm along with a framework to perform a quantitative evaluation of its performance. Specifically, the algorithm constructs a Signed Poisson Map (SPoM) by solving two Poisson equations on the volumetric shapes of arbitrary topology, and statistical analysis is then carried out on the SPoMs. The method is quantitatively evaluated on synthetic shapes and applied on real shape data sets in brain structures. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Gauging Skills of Hospital Security Personnel: a Statistically-driven, Questionnaire-based Approach.

    PubMed

    Rinkoo, Arvind Vashishta; Mishra, Shubhra; Rahesuddin; Nabi, Tauqeer; Chandra, Vidha; Chandra, Hem

    2013-01-01

    This study aims to gauge the technical and soft skills of the hospital security personnel so as to enable prioritization of their training needs. A cross sectional questionnaire based study was conducted in December 2011. Two separate predesigned and pretested questionnaires were used for gauging soft skills and technical skills of the security personnel. Extensive statistical analysis, including Multivariate Analysis (Pillai-Bartlett trace along with Multi-factorial ANOVA) and Post-hoc Tests (Bonferroni Test) was applied. The 143 participants performed better on the soft skills front with an average score of 6.43 and standard deviation of 1.40. The average technical skills score was 5.09 with a standard deviation of 1.44. The study avowed a need for formal hands on training with greater emphasis on technical skills. Multivariate analysis of the available data further helped in identifying 20 security personnel who should be prioritized for soft skills training and a group of 36 security personnel who should receive maximum attention during technical skills training. This statistically driven approach can be used as a prototype by healthcare delivery institutions worldwide, after situation specific customizations, to identify the training needs of any category of healthcare staff.
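    The multivariate step reported above (a Pillai-Bartlett trace from a MANOVA across groups of personnel) looks roughly like the sketch below on simulated scores. The grouping variable, the simulated soft and technical scores, and the group labels are assumptions for illustration only.

```python
# Minimal sketch: MANOVA (including Pillai's trace) on simulated skill scores.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(10)
n = 143
df = pd.DataFrame({
    "post": rng.choice(["gate", "ward", "control_room"], size=n),   # hypothetical duty posts
    "soft": rng.normal(6.4, 1.4, size=n),                           # simulated soft-skill scores
    "technical": rng.normal(5.1, 1.4, size=n),                      # simulated technical scores
})
fit = MANOVA.from_formula("soft + technical ~ post", data=df)
print(fit.mv_test())          # multivariate tests, including Pillai's trace
```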

  1. Gauging Skills of Hospital Security Personnel: a Statistically-driven, Questionnaire-based Approach

    PubMed Central

    Rinkoo, Arvind Vashishta; Mishra, Shubhra; Rahesuddin; Nabi, Tauqeer; Chandra, Vidha; Chandra, Hem

    2013-01-01

    Objectives This study aims to gauge the technical and soft skills of the hospital security personnel so as to enable prioritization of their training needs. Methodology A cross sectional questionnaire based study was conducted in December 2011. Two separate predesigned and pretested questionnaires were used for gauging soft skills and technical skills of the security personnel. Extensive statistical analysis, including Multivariate Analysis (Pillai-Bartlett trace along with Multi-factorial ANOVA) and Post-hoc Tests (Bonferroni Test) was applied. Results The 143 participants performed better on the soft skills front with an average score of 6.43 and standard deviation of 1.40. The average technical skills score was 5.09 with a standard deviation of 1.44. The study avowed a need for formal hands on training with greater emphasis on technical skills. Multivariate analysis of the available data further helped in identifying 20 security personnel who should be prioritized for soft skills training and a group of 36 security personnel who should receive maximum attention during technical skills training. Conclusion This statistically driven approach can be used as a prototype by healthcare delivery institutions worldwide, after situation specific customizations, to identify the training needs of any category of healthcare staff. PMID:23559904

  2. Applications of Ergodic Theory to Coverage Analysis

    NASA Technical Reports Server (NTRS)

    Lo, Martin W.

    2003-01-01

    The study of differential equations, or dynamical systems in general, has two fundamentally different approaches. We are most familiar with the construction of solutions to differential equations. Another approach is to study the statistical behavior of the solutions, and Ergodic Theory is one of the most developed methods for doing so. In the theory of satellite orbits, the statistical behavior of the orbits is used to produce 'Coverage Analysis', that is, how often a spacecraft is in view of a site on the ground. In this paper, we consider the use of Ergodic Theory for Coverage Analysis. This allows us to greatly simplify the computation of quantities such as the total time for which a ground station can see a satellite, without ever integrating the trajectory (see Lo 1, 2). Moreover, for any quantity that is an integrable function of the ground track, its average may be computed similarly without integrating the trajectory. For example, the data rate for a simple telecom system is a function of the distance between the satellite and the ground station. We show that such a function may be averaged using the Ergodic Theorem.

  3. Validation of a heteroscedastic hazards regression model.

    PubMed

    Wu, Hong-Dar Isaac; Hsieh, Fushing; Chen, Chen-Hsin

    2002-03-01

    A Cox-type regression model accommodating heteroscedasticity, with a power factor of the baseline cumulative hazard, is investigated for analyzing data with crossing-hazards behavior. Since the partial likelihood approach cannot eliminate the baseline hazard, an overidentified estimating equation (OEE) approach is introduced in the estimation procedure. Its by-product, a model-checking statistic, is presented to test the overall adequacy of the heteroscedastic model. Further, under the heteroscedastic model setting, we propose two statistics to test the proportional hazards assumption. Implementation of this model is illustrated in a data analysis of a cancer clinical trial.

  4. Calibration of Response Data Using MIRT Models with Simple and Mixed Structures

    ERIC Educational Resources Information Center

    Zhang, Jinming

    2012-01-01

    It is common to assume during a statistical analysis of a multiscale assessment that the assessment is composed of several unidimensional subtests or that it has simple structure. Under this assumption, the unidimensional and multidimensional approaches can be used to estimate item parameters. These two approaches are equivalent in parameter…

  5. Genetic programming based models in plant tissue culture: An addendum to traditional statistical approach.

    PubMed

    Mridula, Meenu R; Nair, Ashalatha S; Kumar, K Satheesh

    2018-02-01

    In this paper, we compared the efficacy of an observation-based modelling approach using a genetic algorithm with regular statistical analysis as an alternative methodology in plant research. Preliminary experimental data on in vitro rooting were taken for this study, with the aim of understanding the effect of charcoal and naphthalene acetic acid (NAA) on successful rooting and of optimizing the two variables for maximum result. Observation-based modelling, as well as the traditional approach, could identify NAA as a critical factor in rooting of the plantlets under the experimental conditions employed. Symbolic regression analysis using the software deployed here optimised the treatments studied and was successful in identifying the complex non-linear interaction among the variables with minimal preliminary data. The presence of charcoal in the culture medium has a significant impact on root generation by reducing basal callus mass formation. Such an approach is advantageous for establishing in vitro culture protocols, as these models have significant potential for saving time and expenditure in plant tissue culture laboratories, and it further reduces the need for a specialised background.

  6. Prior approval: the growth of Bayesian methods in psychology.

    PubMed

    Andrews, Mark; Baguley, Thom

    2013-02-01

    Within the last few years, Bayesian methods of data analysis in psychology have proliferated. In this paper, we briefly review the history of the Bayesian approach to statistics and consider the implications that Bayesian methods have for the theory and practice of data analysis in psychology.

  7. Patterns of medicinal plant use: an examination of the Ecuadorian Shuar medicinal flora using contingency table and binomial analyses.

    PubMed

    Bennett, Bradley C; Husby, Chad E

    2008-03-28

    Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet the requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated the assumptions of linear regression) reduced R² from 0.59 to 0.38 but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model; these 10 families are largely responsible for the patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
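    The two tests favoured over regression can be run in a few lines, as sketched below on made-up counts: a contingency-table test across families, and a binomial test asking whether one family's share of medicinal species departs from the flora-wide proportion. The counts are toy values, not the Shuar data.

```python
# Minimal sketch: contingency table and binomial tests on toy family-level counts.
import numpy as np
from scipy import stats

# Rows = plant families (toy data), columns = (medicinal, non-medicinal) species counts
table = np.array([[30, 70], [12, 88], [45, 155], [8, 92]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print("contingency test: chi2 =", round(chi2, 2), "p =", round(p, 4))

flora_prop = table[:, 0].sum() / table.sum()             # flora-wide medicinal proportion
fam_med, fam_total = table[0, 0], table[0].sum()
print("binomial test, family 1: p =",
      round(stats.binomtest(fam_med, fam_total, flora_prop).pvalue, 4))
```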

  8. A Nonparametric Statistical Approach to the Validation of Computer Simulation Models

    DTIC Science & Technology

    1985-11-01

    Ballistic Research Laboratory, the Experimental Design and Analysis Branch of the Systems Engineering and Concepts Analysis Division was funded to ... Winter, E. M., Wisemiler, D. P., and Ujihara, J. K., "Verification and Validation of Engineering Simulations with Minimal Data," Proceedings of the 1976 Summer ... used by numerous authors. Law has augmented their approach with specific suggestions for each of the three stages: 1. develop high face-validity

  9. A Baseline for the Multivariate Comparison of Resting-State Networks

    PubMed Central

    Allen, Elena A.; Erhardt, Erik B.; Damaraju, Eswar; Gruner, William; Segall, Judith M.; Silva, Rogers F.; Havlicek, Martin; Rachakonda, Srinivas; Fries, Jill; Kalyanam, Ravi; Michael, Andrew M.; Caprihan, Arvind; Turner, Jessica A.; Eichele, Tom; Adelsheim, Steven; Bryan, Angela D.; Bustillo, Juan; Clark, Vincent P.; Feldstein Ewing, Sarah W.; Filbey, Francesca; Ford, Corey C.; Hutchison, Kent; Jung, Rex E.; Kiehl, Kent A.; Kodituwakku, Piyadasa; Komesu, Yuko M.; Mayer, Andrew R.; Pearlson, Godfrey D.; Phillips, John P.; Sadek, Joseph R.; Stevens, Michael; Teuscher, Ursina; Thoma, Robert J.; Calhoun, Vince D.

    2011-01-01

    As the size of functional and structural MRI datasets expands, it becomes increasingly important to establish a baseline from which diagnostic relevance may be determined, a processing strategy that efficiently prepares data for analysis, and a statistical approach that identifies important effects in a manner that is both robust and reproducible. In this paper, we introduce a multivariate analytic approach that optimizes sensitivity and reduces unnecessary testing. We demonstrate the utility of this mega-analytic approach by identifying the effects of age and gender on the resting-state networks (RSNs) of 603 healthy adolescents and adults (mean age: 23.4 years, range: 12–71 years). Data were collected on the same scanner, preprocessed using an automated analysis pipeline based in SPM, and studied using group independent component analysis. RSNs were identified and evaluated in terms of three primary outcome measures: time course spectral power, spatial map intensity, and functional network connectivity. Results revealed robust effects of age on all three outcome measures, largely indicating decreases in network coherence and connectivity with increasing age. Gender effects were of smaller magnitude but suggested stronger intra-network connectivity in females and more inter-network connectivity in males, particularly with regard to sensorimotor networks. These findings, along with the analysis approach and statistical framework described here, provide a useful baseline for future investigations of brain networks in health and disease. PMID:21442040

  10. Quantity of dates trumps quality of dates for dense Bayesian radiocarbon sediment chronologies - Gas ion source 14C dating instructed by simultaneous Bayesian accumulation rate modeling

    NASA Astrophysics Data System (ADS)

    Rosenheim, B. E.; Firesinger, D.; Roberts, M. L.; Burton, J. R.; Khan, N.; Moyer, R. P.

    2016-12-01

    Radiocarbon (14C) sediment core chronologies benefit from a high density of dates, even when precision of individual dates is sacrificed. This is demonstrated by a combined approach of rapid 14C analysis of CO2 gas generated from carbonates and organic material coupled with Bayesian statistical modeling. Analysis of 14C is facilitated by the gas ion source on the Continuous Flow Accelerator Mass Spectrometry (CFAMS) system at the Woods Hole Oceanographic Institution's National Ocean Sciences Accelerator Mass Spectrometry facility. This instrument is capable of producing a 14C determination of +/- 100 14C y precision every 4-5 minutes, with limited sample handling (dissolution of carbonates and/or combustion of organic carbon in evacuated containers). Rapid analysis allows over-preparation of samples to include replicates at each depth and/or comparison of different sample types at particular depths in a sediment or peat core. Analysis priority is given to depths that have the least chronologic precision as determined by Bayesian modeling of the chronology of calibrated ages. Use of such a statistical approach to determine the order in which samples are run ensures that the chronology constantly improves so long as material is available for the analysis of chronologic weak points. Ultimately, accuracy of the chronology is determined by the material that is actually being dated, and our combined approach allows testing of different constituents of the organic carbon pool and the carbonate minerals within a core. We will present preliminary results from a deep-sea sediment core abundant in deep-sea foraminifera as well as coastal wetland peat cores to demonstrate statistical improvements in sediment- and peat-core chronologies obtained by increasing the quantity and decreasing the quality of individual dates.

  11. Review of U.S. Army Unmanned Aerial Systems Accident Reports: Analysis of Human Error Contributions

    DTIC Science & Technology

    2018-03-20

    USAARL Report No. 2018-08: Review of U.S. Army Unmanned Aerial Systems Accident Reports: Analysis of Human Error Contributions. By Kathryn A... Contents include: Statistical Analysis Approach; Results; Introduction. The success of unmanned aerial systems (UAS) operations relies upon a variety of factors, including, but not limited to

  12. The use of decision analysis to evaluate the economic effects of heat mount detectors in two dairy herds.

    PubMed

    Williamson, N B

    1975-03-01

    This paper reports a decrease in the interval from calving to conception in two commercial dairy herds, associated with the use of KaMaR Heat Mount Detectors. An economic analysis of the results uses a neoclassical decision theory approach to demonstrate that the use of heat mount detectors is likely to be profitable, with an expected net return of $154.18 per 100 calvings. The analysis demonstrates the suitability of a decision-theoretic approach to the analysis of applied research, and illustrates some of the weaknesses of "Classical" statistical analysis in such circumstances.

  13. Analysis and meta-analysis of single-case designs with a standardized mean difference statistic: a primer and applications.

    PubMed

    Shadish, William R; Hedges, Larry V; Pustejovsky, James E

    2014-04-01

    This article presents a d-statistic for single-case designs that is in the same metric as the d-statistic used in between-subjects designs such as randomized experiments and offers some reasons why such a statistic would be useful in SCD research. The d has a formal statistical development, is accompanied by appropriate power analyses, and can be estimated using user-friendly SPSS macros. We discuss both advantages and disadvantages of d compared to other approaches such as previous d-statistics, overlap statistics, and multilevel modeling. It requires at least three cases for computation and assumes normally distributed outcomes and stationarity, assumptions that are discussed in some detail. We also show how to test these assumptions. The core of the article then demonstrates in depth how to compute d for one study, including estimation of the autocorrelation and the ratio of between case variance to total variance (between case plus within case variance), how to compute power using a macro, and how to use the d to conduct a meta-analysis of studies using single-case designs in the free program R, including syntax in an appendix. This syntax includes how to read data, compute fixed and random effect average effect sizes, prepare a forest plot and a cumulative meta-analysis, estimate various influence statistics to identify studies contributing to heterogeneity and effect size, and do various kinds of publication bias analyses. This d may prove useful for both the analysis and meta-analysis of data from SCDs. Copyright © 2013 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
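    The paper works in R, but the fixed- and random-effects averaging step it describes can be written compactly in a few lines; below is a DerSimonian-Laird random-effects combination of made-up effect sizes and variances, not the paper's macros or data.

```python
# Minimal sketch: fixed- and DerSimonian-Laird random-effects combination of effect sizes.
import numpy as np

d = np.array([0.45, 0.80, 0.30, 0.60, 0.52])      # per-study effect sizes (toy values)
v = np.array([0.04, 0.06, 0.03, 0.05, 0.04])      # their sampling variances (toy values)

w = 1 / v                                          # fixed-effect weights
d_fixed = np.sum(w * d) / np.sum(w)
Q = np.sum(w * (d - d_fixed) ** 2)                 # heterogeneity statistic
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (len(d) - 1)) / c)            # DerSimonian-Laird between-study variance

w_re = 1 / (v + tau2)                              # random-effects weights
d_re = np.sum(w_re * d) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))
print(f"random-effects d = {d_re:.3f} +/- {1.96 * se_re:.3f}, tau^2 = {tau2:.3f}")
```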

  14. Measuring the statistical validity of summary meta-analysis and meta-regression results for use in clinical practice.

    PubMed

    Willis, Brian H; Riley, Richard D

    2017-09-20

    An important question for clinicians appraising a meta-analysis is: are the findings likely to be valid in their own practice? Does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity, where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple ('leave-one-out') cross-validation technique, we demonstrate how we may test meta-analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta-analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta-analysis and a tailored meta-regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within-study variance, between-study variance, study sample size, and the number of studies in the meta-analysis. Finally, we apply Vn to two published meta-analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta-analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
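    The sketch below illustrates the general leave-one-out idea in the spirit of, but not identical to, the Vn statistic: each study estimate is compared with the inverse-variance summary of the remaining studies, on a standardized scale. The data, the pooling rule, and the standardization are generic illustrative choices.

```python
# Minimal sketch: generic leave-one-out discrepancy check for a meta-analytic summary.
import numpy as np

y = np.array([0.45, 0.80, 0.30, 0.60, 0.52])       # study estimates (toy values)
v = np.array([0.04, 0.06, 0.03, 0.05, 0.04])       # within-study variances (toy values)

z_loo = []
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    w = 1 / v[keep]
    pooled = np.sum(w * y[keep]) / np.sum(w)        # summary from the remaining studies
    se = np.sqrt(v[i] + 1 / np.sum(w))              # variance of the difference (no heterogeneity)
    z_loo.append((y[i] - pooled) / se)
print("standardized leave-one-out discrepancies:", np.round(z_loo, 2))
```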

  15. Statistical approach to partial equilibrium analysis

    NASA Astrophysics Data System (ADS)

    Wang, Yougui; Stanley, H. E.

    2009-04-01

    A statistical approach to market equilibrium and efficiency analysis is proposed in this paper. One factor that governs the exchange decisions of traders in a market, termed the willingness price, is highlighted and underpins the whole theory. The supply and demand functions are formulated as the distributions of the corresponding willing exchange over the willingness price, and the laws of supply and demand can be derived directly from these distributions. The characteristics of the excess demand function are analyzed and the necessary conditions for the existence and uniqueness of the equilibrium point of the market are specified. Rationing rates of buyers and sellers are introduced to describe the ratio of realized exchange to willing exchange, and their dependence on the market price is studied in the cases of shortage and surplus. The realized market surplus, which is the criterion of market efficiency, can be written as a function of the distributions of willing exchange and the rationing rates. With this approach we can rigorously prove that a market is efficient in the state of equilibrium.
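    A numerical toy version of this construction is sketched below: supply and demand are built from assumed Gaussian distributions of willingness prices, and the equilibrium price is located where excess demand crosses zero. The distributions and trader counts are illustrative assumptions, not part of the paper.

```python
# Minimal sketch: equilibrium price from assumed willingness-price distributions.
from scipy import stats, optimize

# Buyers will buy at any price below their willingness price;
# sellers will sell at any price above theirs (assumed Gaussian distributions).
buyer_wp = stats.norm(loc=12.0, scale=2.0)
seller_wp = stats.norm(loc=8.0, scale=2.0)
n_buyers = n_sellers = 1000

def demand(p):                 # willing demand: buyers with willingness price >= p
    return n_buyers * buyer_wp.sf(p)

def supply(p):                 # willing supply: sellers with willingness price <= p
    return n_sellers * seller_wp.cdf(p)

p_star = optimize.brentq(lambda p: demand(p) - supply(p), 0.0, 30.0)
print(f"equilibrium price ~ {p_star:.2f}, quantity ~ {demand(p_star):.0f}")
```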

  16. Using expert knowledge to incorporate uncertainty in cause-of-death assignments for modeling of cause-specific mortality

    USGS Publications Warehouse

    Walsh, Daniel P.; Norton, Andrew S.; Storm, Daniel J.; Van Deelen, Timothy R.; Heisy, Dennis M.

    2018-01-01

    Implicit and explicit use of expert knowledge to inform ecological analyses is becoming increasingly common because it often represents the sole source of information in many circumstances. Thus, there is a need to develop statistical methods that explicitly incorporate expert knowledge, and can successfully leverage this information while properly accounting for associated uncertainty during analysis. Studies of cause-specific mortality provide an example of implicit use of expert knowledge when causes-of-death are uncertain and assigned based on the observer's knowledge of the most likely cause. To explicitly incorporate this use of expert knowledge and the associated uncertainty, we developed a statistical model for estimating cause-specific mortality using a data augmentation approach within a Bayesian hierarchical framework. Specifically, for each mortality event, we elicited the observer's belief of cause-of-death by having them specify the probability that the death was due to each potential cause. These probabilities were then used as prior predictive values within our framework. This hierarchical framework permitted a simple and rigorous estimation method that was easily modified to include covariate effects and regularizing terms. Although applied to survival analysis, this method can be extended to any event-time analysis with multiple event types, for which there is uncertainty regarding the true outcome. We conducted simulations to determine how our framework compared to traditional approaches that use expert knowledge implicitly and assume that cause-of-death is specified accurately. Simulation results supported the inclusion of observer uncertainty in cause-of-death assignment in modeling of cause-specific mortality to improve model performance and inference. Finally, we applied the statistical model we developed and a traditional method to cause-specific survival data for white-tailed deer, and compared results. We demonstrate that model selection results changed between the two approaches, and incorporating observer knowledge in cause-of-death increased the variability associated with parameter estimates when compared to the traditional approach. These differences between the two approaches can impact reported results, and therefore, it is critical to explicitly incorporate expert knowledge in statistical methods to ensure rigorous inference.

  17. The impact of varicocelectomy on sperm parameters: a meta-analysis.

    PubMed

    Schauer, Ingrid; Madersbacher, Stephan; Jost, Romy; Hübner, Wilhelm Alexander; Imhof, Martin

    2012-05-01

    We determined the impact of 3 surgical techniques (high ligation, inguinal varicocelectomy and the subinguinal approach) for varicocelectomy on sperm parameters (count and motility) and pregnancy rates. By searching the literature using MEDLINE and the Cochrane Library with the last search performed in February 2011, focusing on the last 20 years, a total of 94 articles published between 1975 and 2011 reporting on sperm parameters before and after varicocelectomy were identified. Inclusion criteria for this meta-analysis were at least 2 semen analyses (before and 3 or more months after the procedure), patient age older than 19 years, clinical subfertility and/or abnormal semen parameters, and a clinically palpable varicocele. To rule out skewing factors a bias analysis was performed, and statistical analysis was done with RevMan5® and SPSS 15.0®. A total of 14 articles were included in the statistical analysis. All 3 surgical approaches led to significant or highly significant postoperative improvement of both parameters with only slight numeric differences among the techniques. This difference did not reach statistical significance for sperm count (p = 0.973) or sperm motility (p = 0.372). After high ligation surgery sperm count increased by 10.85 million per ml (p = 0.006) and motility by 6.80% (p <0.00001) on the average. Inguinal varicocelectomy led to an improvement in sperm count of 7.17 million per ml (p <0.0001) while motility changed by 9.44% (p = 0.001). Subinguinal varicocelectomy provided an increase in sperm count of 9.75 million per ml (p = 0.002) and sperm motility by 12.25% (p = 0.001). Inguinal varicocelectomy showed the highest pregnancy rate of 41.48% compared to 26.90% and 26.56% after high ligation and subinguinal varicocelectomy, respectively, and the difference was statistically significant (p = 0.035). This meta-analysis suggests that varicocelectomy leads to significant improvements in sperm count and motility regardless of surgical technique, with the inguinal approach offering the highest pregnancy rate. Copyright © 2012 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.

  18. Level statistics of words: Finding keywords in literary texts and symbolic sequences

    NASA Astrophysics Data System (ADS)

    Carpena, P.; Bernaola-Galván, P.; Hackenberg, M.; Coronado, A. V.; Oliver, J. L.

    2009-03-01

    Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach able to extract automatically keywords in literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract each other), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method works also in generic symbolic sequences (continuous texts without spaces), thus suggesting its general applicability.
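    A very small-scale version of the clustering idea is sketched below: words are ranked by the normalized standard deviation of the gaps between their occurrences, which is roughly 1 for randomly placed words and larger for self-attracting (keyword-like) words. The toy word list and the particular clustering score are simplified assumptions, not the authors' exact level-statistics measure.

```python
# Minimal sketch: rank words by how strongly their occurrences cluster along a text.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(12)
# Toy "text": a random background vocabulary plus one keyword ("currents") whose
# occurrences are concentrated in two passages; in practice the word list would
# come from tokenizing a real document.
words = list(rng.choice(["the", "of", "and", "shape", "data", "model"], size=2000))
for i in list(rng.integers(100, 160, size=15)) + list(rng.integers(1500, 1560, size=15)):
    words[i] = "currents"

positions = defaultdict(list)
for i, w in enumerate(words):
    positions[w].append(i)

scores = {}
for w, pos in positions.items():
    if len(pos) < 10:                       # need enough occurrences for a stable estimate
        continue
    gaps = np.diff(pos)
    scores[w] = gaps.std() / gaps.mean()    # ~1 for randomly placed words, >1 if clustered
for w in sorted(scores, key=scores.get, reverse=True)[:5]:
    print(w, round(scores[w], 2))
```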

  19. Renormalization Group Tutorial

    NASA Technical Reports Server (NTRS)

    Bell, Thomas L.

    2004-01-01

    Complex physical systems sometimes have statistical behavior characterized by power- law dependence on the parameters of the system and spatial variability with no particular characteristic scale as the parameters approach critical values. The renormalization group (RG) approach was developed in the fields of statistical mechanics and quantum field theory to derive quantitative predictions of such behavior in cases where conventional methods of analysis fail. Techniques based on these ideas have since been extended to treat problems in many different fields, and in particular, the behavior of turbulent fluids. This lecture will describe a relatively simple but nontrivial example of the RG approach applied to the diffusion of photons out of a stellar medium when the photons have wavelengths near that of an emission line of atoms in the medium.

  20. A statistical physics perspective on alignment-independent protein sequence comparison.

    PubMed

    Chattopadhyay, Amit K; Nasiev, Diar; Flower, Darren R

    2015-08-01

    Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from 'first passage probability distribution' to summarize statistics of ensemble-averaged amino acid propensity values. In this article, we introduce and elaborate this approach. © The Author 2015. Published by Oxford University Press.

  1. Suggestions for presenting the results of data analyses

    USGS Publications Warehouse

    Anderson, David R.; Link, William A.; Johnson, Douglas H.; Burnham, Kenneth P.

    2001-01-01

    We give suggestions for the presentation of research results from frequentist, information-theoretic, and Bayesian analysis paradigms, followed by several general suggestions. The information-theoretic and Bayesian methods offer alternative approaches to data analysis and inference compared to traditionally used methods. Guidance is lacking on the presentation of results under these alternative procedures and on non-testing aspects of classical frequentist methods of statistical analysis. Null hypothesis testing has come under intense criticism. We recommend less reporting of the results of statistical tests of null hypotheses in cases where the null is surely false anyway, or where the null hypothesis is of little interest to science or management.

  2. Global Sensitivity Analysis of Environmental Systems via Multiple Indices based on Statistical Moments of Model Outputs

    NASA Astrophysics Data System (ADS)

    Guadagnini, A.; Riva, M.; Dell'Oca, A.

    2017-12-01

    We propose to ground sensitivity of uncertain parameters of environmental models on a set of indices based on the main (statistical) moments, i.e., mean, variance, skewness and kurtosis, of the probability density function (pdf) of a target model output. This enables us to perform Global Sensitivity Analysis (GSA) of a model in terms of multiple statistical moments and yields a quantification of the impact of model parameters on features driving the shape of the pdf of model output. Our GSA approach includes the possibility of being coupled with the construction of a reduced complexity model that allows approximating the full model response at a reduced computational cost. We demonstrate our approach through a variety of test cases. These include a commonly used analytical benchmark, a simplified model representing pumping in a coastal aquifer, a laboratory-scale tracer experiment, and the migration of fracturing fluid through a naturally fractured reservoir (source) to reach an overlying formation (target). Our strategy allows discriminating the relative importance of model parameters to the four statistical moments considered. We also provide an appraisal of the error associated with the evaluation of our sensitivity metrics by replacing the original system model through the selected surrogate model. Our results suggest that one might need to construct a surrogate model with increasing level of accuracy depending on the statistical moment considered in the GSA. The methodological framework we propose can assist the development of analysis techniques targeted to model calibration, design of experiment, uncertainty quantification and risk assessment.
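
    A minimal Monte Carlo sketch of the idea, under illustrative assumptions: for each uncertain parameter, the four statistical moments of the output are recomputed with that parameter held fixed and compared with the unconditional moments; the toy model and the index definition are placeholders, not the paper's exact formulation.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def model(x):
        # toy stand-in for an environmental model (illustrative only)
        return x[:, 0] ** 2 + 3.0 * x[:, 1] + np.sin(x[:, 2])

    def moments(y):
        return np.array([y.mean(), y.var(), stats.skew(y), stats.kurtosis(y)])

    n_outer, n_inner, n_par = 200, 500, 3
    base = moments(model(rng.uniform(-1, 1, size=(n_outer * n_inner, n_par))))

    indices = np.zeros((n_par, 4))
    for i in range(n_par):
        diffs = []
        for _ in range(n_outer):
            x = rng.uniform(-1, 1, size=(n_inner, n_par))
            x[:, i] = rng.uniform(-1, 1)      # hold parameter i fixed at a random value
            diffs.append(np.abs(moments(model(x)) - base))
        # average absolute shift of each moment when parameter i is fixed
        # (larger values mean the parameter is more influential for that moment)
        indices[i] = np.mean(diffs, axis=0)

    for i, row in enumerate(indices):
        print(f"x{i}: mean {row[0]:.3f}  var {row[1]:.3f}  skew {row[2]:.3f}  kurt {row[3]:.3f}")
    ```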

  3. Structural Analysis of Covariance and Correlation Matrices.

    ERIC Educational Resources Information Center

    Joreskog, Karl G.

    1978-01-01

    A general approach to analysis of covariance structures is considered, in which the variances and covariances or correlations of the observed variables are directly expressed in terms of the parameters of interest. The statistical problems of identification, estimation and testing of such covariance or correlation structures are discussed.…

  4. Exploring Rating Quality in Rater-Mediated Assessments Using Mokken Scale Analysis

    ERIC Educational Resources Information Center

    Wind, Stefanie A.; Engelhard, George, Jr.

    2016-01-01

    Mokken scale analysis is a probabilistic nonparametric approach that offers statistical and graphical tools for evaluating the quality of social science measurement without placing potentially inappropriate restrictions on the structure of a data set. In particular, Mokken scaling provides a useful method for evaluating important measurement…

  5. Opportunities for Applied Behavior Analysis in the Total Quality Movement.

    ERIC Educational Resources Information Center

    Redmon, William K.

    1992-01-01

    This paper identifies critical components of recent organizational quality improvement programs and specifies how applied behavior analysis can contribute to quality technology. Statistical Process Control and Total Quality Management approaches are compared, and behavior analysts are urged to build their research base and market behavior change…

  6. Intuitive Analysis of Variance-- A Formative Assessment Approach

    ERIC Educational Resources Information Center

    Trumpower, David

    2013-01-01

    This article describes an assessment activity that can show students how much they intuitively understand about statistics, but also alert them to common misunderstandings. How the activity can be used formatively to help improve students' conceptual understanding of analysis of variance is discussed. (Contains 1 figure and 1 table.)

  7. Single-Level and Multilevel Mediation Analysis

    ERIC Educational Resources Information Center

    Tofighi, Davood; Thoemmes, Felix

    2014-01-01

    Mediation analysis is a statistical approach used to examine how the effect of an independent variable on an outcome is transmitted through an intervening variable (mediator). In this article, we provide a gentle introduction to single-level and multilevel mediation analyses. Using single-level data, we demonstrate an application of structural…

  8. A General Approach to Causal Mediation Analysis

    ERIC Educational Resources Information Center

    Imai, Kosuke; Keele, Luke; Tingley, Dustin

    2010-01-01

    Traditionally in the social sciences, causal mediation analysis has been formulated, understood, and implemented within the framework of linear structural equation models. We argue and demonstrate that this is problematic for 3 reasons: the lack of a general definition of causal mediation effects independent of a particular statistical model, the…

  9. Seasonal rationalization of river water quality sampling locations: a comparative study of the modified Sanders and multivariate statistical approaches.

    PubMed

    Varekar, Vikas; Karmakar, Subhankar; Jha, Ramakar

    2016-02-01

    The design of surface water quality sampling locations is a crucial decision-making process for the rationalization of a monitoring network. The quantity, quality, and types of the available dataset (watershed characteristics and water quality data) may affect the selection of an appropriate design methodology. The modified Sanders approach and multivariate statistical techniques [particularly factor analysis (FA)/principal component analysis (PCA)] are well-accepted and widely used techniques for the design of sampling locations. However, their performance may vary significantly with the quantity, quality, and types of the available dataset. In this paper, an attempt has been made to evaluate the performance of these techniques by accounting for the effect of seasonal variation, under a situation of limited water quality data but extensive watershed characteristics information, as continuous and consistent river water quality data are usually difficult to obtain, whereas watershed information may be made available through the application of geospatial techniques. A case study of the Kali River, Western Uttar Pradesh, India, is selected for the analysis. The monitoring was carried out at 16 sampling locations. The discrete and diffuse pollution loads at the different sampling sites were estimated and accounted for using the modified Sanders approach, whereas the monitored physical and chemical water quality parameters were utilized as inputs for FA/PCA. The optimum numbers of sampling locations designed by the modified Sanders approach for the monsoon and non-monsoon seasons are eight and seven, while those for FA/PCA are eleven and nine, respectively. Little variation in the number and locations of the designed sampling sites was obtained by either technique, which shows the stability of the results. A geospatial analysis has also been carried out to check the significance of the designed sampling locations with respect to the river basin characteristics and land use of the study area. Both methods are equally efficient; however, the modified Sanders approach outperforms FA/PCA when limited water quality data and extensive watershed information are available. The available water quality dataset is limited, and the FA/PCA-based approach fails to identify monitoring locations with higher variation, as these multivariate statistical approaches are data-driven. The priority/hierarchy and number of sampling sites designed by the modified Sanders approach are well justified by the land use practices and observed river basin characteristics of the study area.

  10. Advanced statistical energy analysis

    NASA Astrophysics Data System (ADS)

    Heron, K. H.

    1994-09-01

    A high-frequency theory (advanced statistical energy analysis (ASEA)) is developed which takes account of the mechanism of tunnelling and uses a ray theory approach to track the power flowing around a plate or a beam network and then uses statistical energy analysis (SEA) to take care of any residual power. ASEA divides the energy of each sub-system into energy that is freely available for transfer to other sub-systems and energy that is fixed within sub-systems that are physically separate. ASEA can be interpreted as a series of mathematical models, the first of which is identical to standard SEA; subsequent higher-order models converge on an accurate prediction. Using a structural assembly of six rods as an example, ASEA is shown to converge onto the exact results while SEA is shown to overpredict by up to 60 dB.

  11. Experimental analysis of computer system dependability

    NASA Technical Reports Server (NTRS)

    Iyer, Ravishankar K.; Tang, Dong

    1993-01-01

    This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software dependability, and fault diagnosis. The discussion involves several important issues studied in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance.

  12. Experimental Investigations on Two Potential Sound Diffuseness Measures in Enclosures

    NASA Astrophysics Data System (ADS)

    Bai, Xin

    This study investigates two different approaches to measure sound field diffuseness in enclosures from monophonic room impulse responses. One approach quantifies sound field diffuseness in enclosures by calculating the kurtosis of the pressure samples of room impulse responses. Kurtosis is a statistical measure that is known to describe the peakedness or tailedness of the distribution of a set of data. High kurtosis indicates low diffuseness of the sound field of interest. The other one relies on multifractal detrended fluctuation analysis which is a way to evaluate the statistical self-affinity of a signal to measure diffuseness. To test these two approaches, room impulse responses are obtained under varied room-acoustic diffuseness configurations, achieved by using varied degrees of diffusely reflecting interior surfaces. This paper will analyze experimentally measured monophonic room impulse responses, and discuss results from these two approaches.
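
    A minimal, self-contained sketch of the kurtosis measure on synthetic impulse responses (a decaying-noise "diffuse" response versus a sparse-reflection response); the decay times and reflection counts are illustrative, and the multifractal detrended fluctuation analysis is not reproduced here.

    ```python
    import numpy as np
    from scipy.stats import kurtosis

    rng = np.random.default_rng(0)
    fs, T = 44100, 1.0
    t = np.arange(int(fs * T)) / fs

    # synthetic "diffuse" impulse response: exponentially decaying Gaussian noise
    diffuse = rng.normal(0, 1, t.size) * np.exp(-t / 0.3)

    # synthetic non-diffuse response: a handful of strong discrete reflections
    sparse = np.zeros_like(t)
    sparse[rng.integers(0, t.size, 12)] = rng.uniform(0.3, 1.0, 12)

    for name, h in [("diffuse", diffuse), ("sparse", sparse)]:
        k = kurtosis(h, fisher=False)   # Pearson kurtosis; Gaussian data gives ~3
        print(f"{name:8s} kurtosis = {k:8.1f}   (higher -> less diffuse)")
    ```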

  13. Comparison of algebraic and analytical approaches to the formulation of the statistical model-based reconstruction problem for X-ray computed tomography.

    PubMed

    Cierniak, Robert; Lorent, Anna

    2016-09-01

    The main aim of this paper is to investigate the conditioning-related properties of our originally formulated statistical model-based iterative approach to the problem of image reconstruction from projections, and, in this manner, to prove the superiority of this approach over those recently used by other authors. The reconstruction algorithm based on this conception uses maximum likelihood estimation with an objective adjusted to the probability distribution of measured signals obtained from an X-ray computed tomography system with parallel beam geometry. The analysis and experimental results presented here show that our analytical approach outperforms the referential algebraic methodology, which is explored widely in the literature and exploited in various commercial implementations. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Statistical Analysis of CFD Solutions from the Fourth AIAA Drag Prediction Workshop

    NASA Technical Reports Server (NTRS)

    Morrison, Joseph H.

    2010-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from the U.S., Europe, Asia, and Russia using a variety of grid systems and turbulence models for the June 2009 4th Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was a new subsonic transport model, the Common Research Model, designed using a modern approach for the wing and included a horizontal tail. The fourth workshop focused on the prediction of both absolute and incremental drag levels for wing-body and wing-body-horizontal tail configurations. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with earlier workshops using the statistical framework.

  15. A Classification and Analysis of National Contract Management Journal Articles from 1966 Through 1989

    DTIC Science & Technology

    1991-06-01

    THEORETICAL, NORMATIVE, EMPIRICAL, INDUCTIVE [24] "New Approaches for Quantifying Risk and Determining Sharing Arrangements," Raymond S. Lieber, pp. ... the negotiation process can be enhanced by quantifying risk using statistical methods. The article discusses two approaches which allow the ... in Incentive Contracting," Melvin W. Lifson, pp. 59-80. The purpose of this article is to suggest a general approach for defining and quantifying

  16. Information categorization approach to literary authorship disputes

    NASA Astrophysics Data System (ADS)

    Yang, Albert C.-C.; Peng, C.-K.; Yien, H.-W.; Goldberger, Ary L.

    2003-11-01

    Scientific analysis of the linguistic styles of different authors has generated considerable interest. We present a generic approach to measuring the similarity of two symbolic sequences that requires minimal background knowledge about a given human language. Our analysis is based on word rank order-frequency statistics and phylogenetic tree construction. We demonstrate the applicability of this method to historic authorship questions related to the classic Chinese novel “The Dream of the Red Chamber,” to the plays of William Shakespeare, and to the Federalist papers. This method may also provide a simple approach to other large databases based on their information content.
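
    A minimal sketch of a word rank order-frequency comparison between two texts: each text's words are ranked by frequency and the average rank shift of shared words is used as a crude distance; the distance definition and file names are illustrative, and the phylogenetic tree construction is not reproduced.

    ```python
    from collections import Counter

    def rank_map(text):
        """Map each word to its frequency rank (1 = most frequent)."""
        counts = Counter(text.lower().split())
        ordered = sorted(counts, key=counts.get, reverse=True)
        return {w: r for r, w in enumerate(ordered, start=1)}

    def rank_distance(text_a, text_b):
        """Average rank shift of shared words; smaller suggests more similar style."""
        ra, rb = rank_map(text_a), rank_map(text_b)
        shared = set(ra) & set(rb)
        if not shared:
            return float("inf")
        return sum(abs(ra[w] - rb[w]) for w in shared) / len(shared)

    # hypothetical text samples from two candidate authors
    a = open("author1_sample.txt", encoding="utf8").read()
    b = open("author2_sample.txt", encoding="utf8").read()
    print(rank_distance(a, b))
    ```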

  17. Discovering human germ cell mutagens with whole genome sequencing: Insights from power calculations reveal the importance of controlling for between-family variability.

    PubMed

    Webster, R J; Williams, A; Marchetti, F; Yauk, C L

    2018-07-01

    Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed effect model sampling between two to four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40 to 10% respectively. Modeling family variability using mixed effect models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
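
    The power question lends itself to simulation; below is a minimal sketch under strong simplifying assumptions (Gaussian mutation counts, a family-level random intercept standing in for the paper's random paternal-age slope, and illustrative effect sizes and sample sizes), using the linear mixed model in statsmodels.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)

    def simulate_power(n_families=20, n_sibs=4, exposure_effect=5.0,
                       age_slope=1.0, family_sd=3.0, resid_sd=5.0, n_rep=100):
        """Fraction of simulated studies in which the exposure effect is detected (p < 0.05)."""
        hits = 0
        for _ in range(n_rep):
            rows = []
            for fam in range(2 * n_families):            # two treatment groups
                exposed = fam < n_families
                fam_effect = rng.normal(0, family_sd)     # between-family variability
                for _ in range(n_sibs):
                    age = rng.uniform(20, 45)             # paternal age at conception
                    mu = 40 + age_slope * age + fam_effect + (exposure_effect if exposed else 0.0)
                    rows.append(dict(family=fam, exposed=int(exposed), age=age,
                                     mutations=rng.normal(mu, resid_sd)))
            df = pd.DataFrame(rows)
            fit = smf.mixedlm("mutations ~ age + exposed", df,
                              groups=df["family"]).fit(reml=False)
            hits += fit.pvalues["exposed"] < 0.05
        return hits / n_rep

    print(f"estimated power: {simulate_power():.2f}")
    ```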

  18. Visualizing statistical significance of disease clusters using cartograms.

    PubMed

    Kronenfeld, Barry J; Wong, David W S

    2017-05-15

    Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, there are no existing guidelines for the visual assessment of statistical uncertainty. To address this shortcoming, we develop techniques for visual determination of statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster can be determined from the rate, area and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate framework for visually assessing the statistical significance of spatial clusters.

  19. Statistical comparison of a hybrid approach with approximate and exact inference models for Fusion 2+

    NASA Astrophysics Data System (ADS)

    Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew

    2007-04-01

    One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian Networks, and Possibility Theory, in the form of Fuzzy Logic systems, has recently been introduced to provide a rigorous framework for high-level inference. In previous research, the theoretical basis and benefits of the hybrid approach have been developed. However, a concrete experimental comparison of the hybrid framework with traditional fusion methods, demonstrating and quantifying this benefit, has been lacking. The goal of this research, therefore, is to provide a statistical comparison of the accuracy and performance of hybrid network theory with pure Bayesian and Fuzzy systems and with an inexact Bayesian system approximated using Particle Filtering. To accomplish this task, domain-specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo Simulation, in comparison to situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed, to quantify the benefit of hybrid inference over other fusion tools.

  20. Neural network approach in multichannel auditory event-related potential analysis.

    PubMed

    Wu, F Y; Slater, J D; Ramsay, R E

    1994-04-01

    Even though there are presently no clearly defined criteria for the assessment of P300 event-related potential (ERP) abnormality, it is strongly indicated through statistical analysis that such criteria exist for classifying control subjects and patients with diseases resulting in neuropsychological impairment such as multiple sclerosis (MS). We have demonstrated the feasibility of artificial neural network (ANN) methods in classifying ERP waveforms measured at a single channel (Cz) from control subjects and MS patients. In this paper, we report the results of multichannel ERP analysis and a modified network analysis methodology to enhance automation of the classification rule extraction process. The proposed methodology significantly reduces the work of statistical analysis. It also helps to standardize the criteria of P300 ERP assessment and facilitate the computer-aided analysis on neuropsychological functions.

  1. Bayesian correction for covariate measurement error: A frequentist evaluation and comparison with regression calibration.

    PubMed

    Bartlett, Jonathan W; Keogh, Ruth H

    2018-06-01

    Bayesian approaches for handling covariate measurement error are well established and yet arguably are still relatively little used by researchers. For some this is likely due to unfamiliarity or disagreement with the Bayesian inferential paradigm. For others a contributory factor is the inability of standard statistical packages to perform such Bayesian analyses. In this paper, we first give an overview of the Bayesian approach to handling covariate measurement error, and contrast it with regression calibration, arguably the most commonly adopted approach. We then argue why the Bayesian approach has a number of statistical advantages compared to regression calibration and demonstrate that implementing the Bayesian approach is usually quite feasible for the analyst. Next, we describe the closely related maximum likelihood and multiple imputation approaches and explain why we believe the Bayesian approach to generally be preferable. We then empirically compare the frequentist properties of regression calibration and the Bayesian approach through simulation studies. The flexibility of the Bayesian approach to handle both measurement error and missing data is then illustrated through an analysis of data from the Third National Health and Nutrition Examination Survey.
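
    A minimal sketch of regression calibration, the comparator method, assuming classical measurement error and two replicate error-prone measurements of the covariate; the variable names, effect sizes, and sample size are illustrative.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 1000

    # true covariate X, two replicate error-prone measurements W1, W2, outcome Y
    x = rng.normal(0, 1, n)
    w1 = x + rng.normal(0, 0.8, n)
    w2 = x + rng.normal(0, 0.8, n)
    y = 1.0 + 2.0 * x + rng.normal(0, 1, n)

    # regression calibration under classical measurement error:
    # E[X | Wbar] = mu + lambda * (Wbar - mu), with lambda estimated from the replicates
    wbar = (w1 + w2) / 2
    sigma_u2 = np.var(w1 - w2, ddof=1) / 2              # measurement-error variance
    lam = (np.var(wbar, ddof=1) - sigma_u2 / 2) / np.var(wbar, ddof=1)
    x_hat = wbar.mean() + lam * (wbar - wbar.mean())    # calibrated covariate

    naive = sm.OLS(y, sm.add_constant(wbar)).fit()
    calib = sm.OLS(y, sm.add_constant(x_hat)).fit()
    print("naive slope:      %.2f" % naive.params[1])   # attenuated toward 0
    print("calibrated slope: %.2f" % calib.params[1])   # close to the true value 2
    ```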

  2. Nonparametric Residue Analysis of Dynamic PET Data With Application to Cerebral FDG Studies in Normals.

    PubMed

    O'Sullivan, Finbarr; Muzi, Mark; Spence, Alexander M; Mankoff, David M; O'Sullivan, Janet N; Fitzgerald, Niall; Newman, George C; Krohn, Kenneth A

    2009-06-01

    Kinetic analysis is used to extract metabolic information from dynamic positron emission tomography (PET) uptake data. The theory of indicator dilutions, developed in the seminal work of Meier and Zierler (1954), provides a probabilistic framework for representation of PET tracer uptake data in terms of a convolution between an arterial input function and a tissue residue. The residue is a scaled survival function associated with tracer residence in the tissue. Nonparametric inference for the residue, a deconvolution problem, provides a novel approach to kinetic analysis, critically one that is not reliant on specific compartmental modeling assumptions. A practical computational technique based on regularized cubic B-spline approximation of the residence time distribution is proposed. Nonparametric residue analysis allows formal statistical evaluation of specific parametric models to be considered. This analysis needs to properly account for the increased flexibility of the nonparametric estimator. The methodology is illustrated using data from a series of cerebral studies with PET and fluorodeoxyglucose (FDG) in normal subjects. Comparisons are made between key functionals of the residue, tracer flux, flow, etc., resulting from a parametric analysis (the standard two-compartment model of Phelps et al. 1979) and a nonparametric analysis. Strong statistical evidence against the compartment model is found. Primarily these differences relate to the representation of the early temporal structure of the tracer residence, largely a function of the vascular supply network. There are convincing physiological arguments against the representations implied by the compartmental approach but this is the first time that a rigorous statistical confirmation using PET data has been reported. The compartmental analysis produces suspect values for flow but, notably, the impact on the metabolic flux, though statistically significant, is limited to deviations on the order of 3%-4%. The general advantage of the nonparametric residue analysis is the ability to provide a valid kinetic quantitation in the context of studies where there may be heterogeneity or other uncertainty about the accuracy of a compartmental model approximation of the tissue residue.
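
    A minimal sketch of the deconvolution idea, substituting a plain Tikhonov (second-difference) roughness penalty for the paper's regularized cubic B-spline machinery; the input function, time grid, noise level, and penalty weight are all illustrative placeholders.

    ```python
    import numpy as np
    from scipy.linalg import toeplitz

    rng = np.random.default_rng(3)
    dt = 5.0                                   # frame duration (s), illustrative
    t = np.arange(0.0, 3600.0, dt)
    n = len(t)

    # toy arterial input function and "true" residue (scaled survival function)
    cp = (t / 60.0) * np.exp(-t / 300.0)
    r_true = 0.04 * (0.7 * np.exp(-t / 500.0) + 0.3 * np.exp(-t / 8000.0))

    # tissue curve = discrete convolution of input with residue, plus noise
    A = toeplitz(cp, np.r_[cp[0], np.zeros(n - 1)]) * dt
    y = A @ r_true + rng.normal(0, 0.5, n)

    # nonparametric estimate: penalized least squares with a roughness penalty
    D = np.diff(np.eye(n), 2, axis=0)          # second-difference operator
    lam = 1e4                                  # smoothing weight (illustrative)
    r_hat = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)

    print("estimated vs true residue at t = 0, 600, 1800 s:")
    for idx in (0, 120, 360):
        print(f"  {r_hat[idx]:.4f}  vs  {r_true[idx]:.4f}")
    ```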

  3. Comparing geological and statistical approaches for element selection in sediment tracing research

    NASA Astrophysics Data System (ADS)

    Laceby, J. Patrick; McMahon, Joe; Evrard, Olivier; Olley, Jon

    2015-04-01

    Elevated suspended sediment loads reduce reservoir capacity and significantly increase the cost of operating water treatment infrastructure, making the management of sediment supply to reservoirs of increasing importance. Sediment fingerprinting techniques can be used to determine the relative contributions of different sources of sediment accumulating in reservoirs. The objective of this research is to compare geological and statistical approaches to element selection for sediment fingerprinting modelling. Time-integrated samplers (n=45) were used to obtain source samples from four major subcatchments flowing into the Baroon Pocket Dam in South East Queensland, Australia. The geochemistry of potential sources was compared to the geochemistry of sediment cores (n=12) sampled in the reservoir. The geochemical approach selected elements for modelling that provided expected, observed and statistical discrimination between sediment sources. Two statistical approaches selected elements for modelling with the Kruskal-Wallis H-test and Discriminatory Function Analysis (DFA). In particular, two different significance levels (0.05 & 0.35) for the DFA were included to investigate the importance of element selection on modelling results. A distribution model determined the relative contributions of different sources to sediment sampled in the Baroon Pocket Dam. Elemental discrimination was expected between one subcatchment (Obi Obi Creek) and the remaining subcatchments (Lexys, Falls and Bridge Creek). Six major elements were expected to provide discrimination. Of these six, only Fe2O3 and SiO2 provided expected, observed and statistical discrimination. Modelling results with this geological approach indicated 36% (+/- 9%) of sediment sampled in the reservoir cores were from mafic-derived sources and 64% (+/- 9%) were from felsic-derived sources. The geological and the first statistical approach (DFA0.05) differed by only 1% (σ 5%) for 5 out of 6 model groupings with only the Lexys Creek modelling results differing significantly (35%). The statistical model with expanded elemental selection (DFA0.35) differed from the geological model by an average of 30% for all 6 models. Elemental selection for sediment fingerprinting therefore has the potential to impact modelling results. Accordingly, it is important to incorporate both robust geological and statistical approaches when selecting elements for sediment fingerprinting. For the Baroon Pocket Dam, management should focus on reducing the supply of sediments derived from felsic sources in each of the subcatchments.
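
    A minimal sketch of the statistical element-selection step: a Kruskal-Wallis H-test per element across the source groups, retaining discriminating elements, followed by a toy two-source un-mixing calculation; the file name, column labels, source names, the 0.05 cut-off, and the sediment value are illustrative assumptions.

    ```python
    import pandas as pd
    from scipy.stats import kruskal

    # hypothetical source-geochemistry table: one row per sample,
    # a 'source' label plus element-concentration columns
    df = pd.read_csv("source_samples.csv")
    elements = [c for c in df.columns if c != "source"]

    selected = []
    for el in elements:
        groups = [g[el].values for _, g in df.groupby("source")]
        h, p = kruskal(*groups)
        if p < 0.05:                           # element discriminates between sources
            selected.append(el)
    print("elements retained for modelling:", selected)

    # toy two-source mixing for one retained element:
    # sediment = f * mean(source A) + (1 - f) * mean(source B)
    el = selected[0]
    a = df.loc[df.source == "mafic", el].mean()
    b = df.loc[df.source == "felsic", el].mean()
    sediment_value = 42.0                      # measured value in a reservoir core (illustrative)
    f = (sediment_value - b) / (a - b)
    print(f"estimated mafic contribution: {f:.2f}")
    ```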

  4. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

    PubMed

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-07-01

    A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco. Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
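
    metaCCA itself operates on GWAS summary statistics; purely as a conceptual illustration of the underlying technique, the sketch below runs an ordinary canonical correlation analysis on simulated individual-level genotype and phenotype matrices with scikit-learn (the summary-statistics extension and the covariance shrinkage step are not reproduced).

    ```python
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(4)
    n = 500

    # toy individual-level data: 3 genetic variants, 4 correlated phenotypes
    G = rng.binomial(2, 0.3, size=(n, 3)).astype(float)          # genotype dosages
    shared = G[:, 0] + 0.5 * G[:, 1]                             # latent genetic signal
    P = np.column_stack([shared + rng.normal(0, 1, n) for _ in range(4)])

    cca = CCA(n_components=1)
    U, V = cca.fit_transform(G, P)                               # canonical variates
    r = np.corrcoef(U[:, 0], V[:, 0])[0, 1]                      # first canonical correlation
    print(f"first canonical correlation: {r:.2f}")
    ```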

  5. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

    PubMed Central

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-01-01

    Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689

  6. Power analysis on the time effect for the longitudinal Rasch model.

    PubMed

    Feddag, M L; Blanchin, M; Hardouin, J B; Sebille, V

    2014-01-01

    Statistics literature in the social, behavioral, and biomedical sciences typically stresses the importance of power analysis. Patient Reported Outcomes (PRO) such as quality of life and other perceived health measures (pain, fatigue, stress,...) are increasingly used as important health outcomes in clinical trials or in epidemiological studies. They cannot be directly observed or measured like other clinical or biological data, and they are often collected through questionnaires with binary or polytomous items. The Rasch model is a well-known item response theory (IRT) model for binary data. The article proposes an approach to evaluate the statistical power of the time effect for the longitudinal Rasch model with two time points. The performance of this method is compared to that obtained by a simulation study. Finally, the proposed approach is illustrated on one subscale of the SF-36 questionnaire.

  7. Using Genome-Wide Expression Profiling to Define Gene Networks Relevant to the Study of Complex Traits: From RNA Integrity to Network Topology

    PubMed Central

    O'Brien, M.A.; Costin, B.N.; Miles, M.F.

    2014-01-01

    Postgenomic studies of the function of genes and their role in disease have now become an area of intense study since efforts to define the raw sequence material of the genome have largely been completed. The use of whole-genome approaches such as microarray expression profiling and, more recently, RNA-sequence analysis of transcript abundance has allowed an unprecedented look at the workings of the genome. However, the accurate derivation of such high-throughput data and their analysis in terms of biological function has been critical to truly leveraging the postgenomic revolution. This chapter will describe an approach that focuses on the use of gene networks to both organize and interpret genomic expression data. Such networks, derived from statistical analysis of large genomic datasets and the application of multiple bioinformatics data resources, potentially allow the identification of key control elements for networks associated with human disease, and thus may lead to derivation of novel therapeutic approaches. However, as discussed in this chapter, the leveraging of such networks cannot occur without a thorough understanding of the technical and statistical factors influencing the derivation of genomic expression data. Thus, while the catch phrase may be “it's the network … stupid,” the understanding of factors extending from RNA isolation to genomic profiling technique, multivariate statistics, and bioinformatics are all critical to defining fully useful gene networks for study of complex biology. PMID:23195313

  8. Validating an Air Traffic Management Concept of Operation Using Statistical Modeling

    NASA Technical Reports Server (NTRS)

    He, Yuning; Davies, Misty Dawn

    2013-01-01

    Validating a concept of operation for a complex, safety-critical system (like the National Airspace System) is challenging because of the high dimensionality of the controllable parameters and the infinite number of states of the system. In this paper, we use statistical modeling techniques to explore the behavior of a conflict detection and resolution algorithm designed for the terminal airspace. These techniques predict the robustness of the system simulation to both nominal and off-nominal behaviors within the overall airspace. They also can be used to evaluate the output of the simulation against recorded airspace data. Additionally, the techniques carry with them a mathematical value of the worth of each prediction, a statistical uncertainty for any robustness estimate. Uncertainty Quantification (UQ) is the process of quantitative characterization and ultimately a reduction of uncertainties in complex systems. UQ is important for understanding the influence of uncertainties on the behavior of a system and therefore is valuable for design, analysis, and verification and validation. In this paper, we apply advanced statistical modeling methodologies and techniques to an advanced air traffic management system, namely the Terminal Tactical Separation Assured Flight Environment (T-TSAFE). We show initial results for a parameter analysis and safety boundary (envelope) detection in the high-dimensional parameter space. For our boundary analysis, we developed a new sequential approach based upon the design of computer experiments, allowing us to incorporate knowledge from domain experts into our modeling and to determine the most likely boundary shapes and their parameters. We carried out the analysis on system parameters and describe an initial approach that will allow us to include time-series inputs, such as the radar track data, into the analysis.

  9. Comparison of Unidimensional and Multidimensional Approaches to IRT Parameter Estimation. Research Report. ETS RR-04-44

    ERIC Educational Resources Information Center

    Zhang, Jinming

    2004-01-01

    It is common to assume during statistical analysis of a multiscale assessment that the assessment has simple structure or that it is composed of several unidimensional subtests. Under this assumption, both the unidimensional and multidimensional approaches can be used to estimate item parameters. This paper theoretically demonstrates that these…

  10. Statistical Tools And Artificial Intelligence Approaches To Predict Fracture In Bulk Forming Processes

    NASA Astrophysics Data System (ADS)

    Di Lorenzo, R.; Ingarao, G.; Fonti, V.

    2007-05-01

    The crucial task in the prevention of ductile fracture is the availability of a tool for predicting the occurrence of such defects. The technical literature presents a wide investigation of this topic, with contributions from many authors following different approaches. The main class of approaches regards the development of fracture criteria: generally, such criteria are expressed by determining a critical value of a damage function which depends on the stress and strain paths, and ductile fracture is assumed to occur when this critical value is reached during the analysed process. A relevant drawback of ductile fracture criteria is that each criterion usually performs well in the prediction of fracture for particular stress-strain paths, i.e., it works very well for certain processes but may provide poor results for others. On the other hand, approaches based on damage mechanics formulations are very effective from a theoretical point of view but are complex and quite difficult to calibrate properly. In this paper, two different approaches are investigated to predict fracture occurrence in cold forming operations. The final aim of the proposed method is a tool with general reliability, i.e., one able to predict fracture for different forming processes. The proposed approach represents a step forward within a research project focused on the utilization of innovative predictive tools for ductile fracture. The paper presents a comparison between an artificial neural network design procedure and an approach based on statistical tools; both approaches are aimed at predicting fracture occurrence/absence based on a set of stress and strain path data. The proposed approach exploits experimental data available, for a given material, on fracture occurrence in different processes. In more detail, the approach consists of the analysis of experimental tests in which fracture occurs, followed by numerical simulations of such processes in order to track the stress-strain paths in the workpiece region where fracture is expected. Such data are used to build a data set which is utilized both to train an artificial neural network and to perform a statistical analysis aimed at predicting fracture occurrence. The developed statistical tool is properly designed and optimized and is able to recognize fracture occurrence. The reliability and predictive capability of the statistical method are compared with those obtained from an artificial neural network developed to predict fracture occurrence. Moreover, the approach is also validated in forming processes characterized by complex fracture mechanics.
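
    As a hedged illustration of the two classes of tools compared (an artificial neural network versus a statistical classifier), the sketch below trains scikit-learn stand-ins on synthetic stress-strain path features; the feature names, the toy damage rule used to label the data, and all settings are illustrative, not the paper's experimental dataset or models.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(5)
    n = 600

    # illustrative features extracted from simulated stress-strain paths
    triaxiality = rng.uniform(-0.3, 1.0, n)     # mean stress / equivalent stress
    eq_strain = rng.uniform(0.0, 1.5, n)        # accumulated equivalent plastic strain
    max_shear = rng.uniform(0.0, 0.8, n)
    X = np.column_stack([triaxiality, eq_strain, max_shear])

    # toy "damage" rule generating fracture/no-fracture labels (for the sketch only)
    damage = eq_strain * np.clip(triaxiality, 0, None) + 0.3 * max_shear
    y = (damage + rng.normal(0, 0.05, n) > 0.45).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    ann = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0))
    logit = make_pipeline(StandardScaler(), LogisticRegression())

    for name, clf in [("neural network", ann), ("logistic regression", logit)]:
        clf.fit(X_tr, y_tr)
        print(f"{name}: test accuracy = {clf.score(X_te, y_te):.2f}")
    ```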

  11. Analysis of the procedures used to evaluate suicide crime scenes in Brazil: a statistical approach to interpret reports.

    PubMed

    Bruni, Aline Thaís; Velho, Jesus Antonio; Ferreira, Arthur Serra Lopes; Tasso, Maria Júlia; Ferrari, Raíssa Santos; Yoshida, Ricardo Luís; Dias, Marcos Salvador; Leite, Vitor Barbanti Pereira

    2014-08-01

    This study uses statistical techniques to evaluate reports on suicide scenes; it utilizes 80 reports from different locations in Brazil, randomly collected from both federal and state jurisdictions. We aimed to assess a heterogeneous group of cases in order to obtain an overall perspective of the problem. We evaluated variables regarding the characteristics of the crime scene, such as the detected traces (blood, instruments and clothes) that were found and we addressed the methodology employed by the experts. A qualitative approach using basic statistics revealed a wide distribution as to how the issue was addressed in the documents. We examined a quantitative approach involving an empirical equation and we used multivariate procedures to validate the quantitative methodology proposed for this empirical equation. The methodology successfully identified the main differences in the information presented in the reports, showing that there is no standardized method of analyzing evidences. Copyright © 2014 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  12. Statistical approach for the retrieval of phytoplankton community structures from in situ fluorescence measurements.

    PubMed

    Wang, Shengqiang; Xiao, Cong; Ishizaka, Joji; Qiu, Zhongfeng; Sun, Deyong; Xu, Qian; Zhu, Yuanli; Huan, Yu; Watanabe, Yuji

    2016-10-17

    Knowledge of phytoplankton community structures is important to the understanding of various marine biogeochemical processes and ecosystems. Fluorescence excitation spectra (F(λ)) provide great potential for studying phytoplankton communities because their spectral variability depends on changes in the pigment compositions related to distinct phytoplankton groups. Commercial spectrofluorometers have been developed to analyze phytoplankton communities by measuring the field F(λ), but estimations using the default methods are not always accurate because of their strong dependence on norm spectra, which are obtained by culturing pure algae of a given group and are assumed to be constant. In this study, we proposed a novel approach for estimating the chlorophyll a (Chl a) fractions of brown algae, cyanobacteria, green algae and cryptophytes based on a data set collected in the East China Sea (ECS) and the Tsushima Strait (TS), with concurrent measurements of in vivo F(λ) and phytoplankton communities derived from pigment analysis. The new approach blends various statistical features by computing the band ratios and continuum-removed spectra of F(λ) without requiring a priori knowledge of the norm spectra. The model evaluations indicate that our approach yields good estimations of the Chl a fractions, with root-mean-square errors of 0.117, 0.078, 0.072 and 0.060 for brown algae, cyanobacteria, green algae and cryptophytes, respectively. The statistical analysis shows that the models are generally robust to uncertainty in F(λ). We recommend using a site-specific model for more accurate estimations. To develop a site-specific model in the ECS and TS, approximately 26 samples are sufficient for using our approach, but this conclusion needs to be validated in additional regions. Overall, our approach provides a useful technical basis for estimating phytoplankton communities from measurements of F(λ).
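
    A minimal sketch of the two kinds of spectral features mentioned (band ratios and continuum removal), with the continuum taken simply as the straight line joining the spectrum's endpoints; the wavelengths, band positions, and synthetic spectrum are illustrative, not the instrument's actual excitation channels.

    ```python
    import numpy as np

    # hypothetical fluorescence excitation spectrum F(lambda)
    wavelengths = np.arange(400, 601, 5).astype(float)           # nm
    f = (np.exp(-((wavelengths - 470) / 40.0) ** 2)
         + 0.4 * np.exp(-((wavelengths - 540) / 25.0) ** 2))

    def band_ratio(wl, spec, num_nm, den_nm):
        """Ratio of the spectrum at two excitation bands (nearest channels)."""
        i = np.argmin(np.abs(wl - num_nm))
        j = np.argmin(np.abs(wl - den_nm))
        return spec[i] / spec[j]

    def continuum_removed(wl, spec):
        """Divide the spectrum by the straight line joining its endpoints."""
        line = np.interp(wl, [wl[0], wl[-1]], [spec[0], spec[-1]])
        return spec / line

    # feature vector that could feed a statistical model of group fractions
    features = np.r_[band_ratio(wavelengths, f, 470, 540),
                     continuum_removed(wavelengths, f)]
    print(features[:5])
    ```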

  13. Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data

    PubMed Central

    Hu, Ming; Deng, Ke; Qin, Zhaohui; Liu, Jun S.

    2015-01-01

    Understanding how chromosomes fold provides insights into the transcription regulation, hence, the functional state of the cell. Using the next generation sequencing technology, the recently developed Hi-C approach enables a global view of spatial chromatin organization in the nucleus, which substantially expands our knowledge about genome organization and function. However, due to multiple layers of biases, noises and uncertainties buried in the protocol of Hi-C experiments, analyzing and interpreting Hi-C data poses great challenges, and requires novel statistical methods to be developed. This article provides an overview of recent Hi-C studies and their impacts on biomedical research, describes major challenges in statistical analysis of Hi-C data, and discusses some perspectives for future research. PMID:26124977

  14. Statistical strategy for anisotropic adventitia modelling in IVUS.

    PubMed

    Gil, Debora; Hernández, Aura; Rodriguez, Oriol; Mauri, Josepa; Radeva, Petia

    2006-06-01

    Vessel plaque assessment by analysis of intravascular ultrasound sequences is a useful tool for cardiac disease diagnosis and intervention. Manual detection of luminal (inner) and media-adventitia (external) vessel borders is the main activity of physicians in the process of lumen narrowing (plaque) quantification. The difficult definition of vessel border descriptors, as well as shades, artifacts, and blurred signal response due to ultrasound physical properties, hampers automated adventitia segmentation. In order to efficiently approach such a complex problem, we propose blending advanced anisotropic filtering operators and statistical classification techniques into a vessel border modelling strategy. Our systematic statistical analysis shows that the reported adventitia detection achieves an accuracy in the range of interobserver variability regardless of plaque nature, vessel geometry, and incomplete vessel borders.

  15. Tests of Mediation: Paradoxical Decline in Statistical Power as a Function of Mediator Collinearity

    PubMed Central

    Beasley, T. Mark

    2013-01-01

    Increasing the correlation between the independent variable and the mediator (the a coefficient) increases the effect size (ab) for mediation analysis; however, increasing a by definition increases collinearity in mediation models. As a result, the standard errors of product tests increase. The variance inflation due to increases in a at some point outweighs the increase in the effect size (ab) and results in a loss of statistical power. This phenomenon also occurs with nonparametric bootstrapping approaches because the variance of the bootstrap distribution of ab approximates the variance expected from normal theory. Both variances increase dramatically when a exceeds the b coefficient, thus explaining the power decline with increases in a. Implications for statistical analysis and applied researchers are discussed. PMID:24954952
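
    A simulation sketch of the phenomenon described: with standardized variables and b held fixed, the Sobel-type test of ab first gains and then loses power as a grows, because the standard error of b inflates with the X-M collinearity; sample size, effect sizes, and the grid of a values are illustrative.

    ```python
    import numpy as np
    from scipy.stats import norm
    import statsmodels.api as sm

    rng = np.random.default_rng(6)

    def sobel_power(a, b=0.2, n=100, n_rep=500):
        """Monte Carlo power of the Sobel test of ab for standardized X, M."""
        hits = 0
        for _ in range(n_rep):
            x = rng.normal(0, 1, n)
            m = a * x + np.sqrt(1 - a ** 2) * rng.normal(0, 1, n)   # keeps Var(M) = 1
            y = b * m + rng.normal(0, 1, n)
            fit_m = sm.OLS(m, sm.add_constant(x)).fit()
            fit_y = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()
            a_hat, sa = fit_m.params[1], fit_m.bse[1]
            b_hat, sb = fit_y.params[2], fit_y.bse[2]
            z = (a_hat * b_hat) / np.sqrt(a_hat ** 2 * sb ** 2 + b_hat ** 2 * sa ** 2)
            hits += abs(z) > norm.ppf(0.975)
        return hits / n_rep

    for a in (0.3, 0.6, 0.9, 0.99):
        print(f"a = {a:.2f}: power = {sobel_power(a):.2f}")   # power eventually declines
    ```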

  16. Separate-channel analysis of two-channel microarrays: recovering inter-spot information.

    PubMed

    Smyth, Gordon K; Altman, Naomi S

    2013-05-26

    Two-channel (or two-color) microarrays are cost-effective platforms for comparative analysis of gene expression. They are traditionally analysed in terms of the log-ratios (M-values) of the two channel intensities at each spot, but this analysis does not use all the information available in the separate channel observations. Mixed models have been proposed to analyse intensities from the two channels as separate observations, but such models can be complex to use and the gain in efficiency over the log-ratio analysis is difficult to quantify. Mixed models yield test statistics for which the null distributions can be specified only approximately, and some approaches do not borrow strength between genes. This article reformulates the mixed model to clarify the relationship with the traditional log-ratio analysis, to facilitate information borrowing between genes, and to obtain an exact distributional theory for the resulting test statistics. The mixed model is transformed to operate on the M-values and A-values (average log-expression for each spot) instead of on the log-expression values. The log-ratio analysis is shown to ignore information contained in the A-values. The relative efficiency of the log-ratio analysis is shown to depend on the size of the intraspot correlation. A new separate channel analysis method is proposed that assumes a constant intra-spot correlation coefficient across all genes. This approach permits the mixed model to be transformed into an ordinary linear model, allowing the data analysis to use a well-understood empirical Bayes analysis pipeline for linear modeling of microarray data. This yields statistically powerful test statistics that have an exact distributional theory. The log-ratio, mixed model and common correlation methods are compared using three case studies. The results show that separate channel analyses that borrow strength between genes are more powerful than log-ratio analyses. The common correlation analysis is the most powerful of all. The common correlation method proposed in this article for separate-channel analysis of two-channel microarray data is no more difficult to apply in practice than the traditional log-ratio analysis. It provides an intuitive and powerful means to conduct analyses and make comparisons that might otherwise not be possible.
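
    A minimal sketch of the M-value/A-value transformation on which the reformulated model operates, using simulated two-channel intensities; the empirical Bayes linear-model fit and the common intra-spot correlation estimation described in the article are not reproduced here.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)

    # hypothetical background-corrected channel intensities for 5 spots
    red = rng.lognormal(mean=8, sigma=1, size=5)      # e.g. Cy5 channel
    green = rng.lognormal(mean=8, sigma=1, size=5)    # e.g. Cy3 channel

    M = np.log2(red) - np.log2(green)                 # within-spot log-ratio
    A = 0.5 * (np.log2(red) + np.log2(green))         # within-spot average log-intensity

    for m, a in zip(M, A):
        print(f"M = {m:+.2f}   A = {a:.2f}")
    ```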

  17. Statistical Projections for Multi-resolution, Multi-dimensional Visual Data Exploration and Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoa T. Nguyen; Stone, Daithi; E. Wes Bethel

    2016-01-01

    An ongoing challenge in visual exploration and analysis of large, multi-dimensional datasets is how to present useful, concise information to a user for some specific visualization tasks. Typical approaches to this problem have proposed either reduced-resolution versions of data, or projections of data, or both. These approaches still have limitations, such as high computational cost or susceptibility to error. In this work, we explore the use of a statistical metric as the basis for both projections and reduced-resolution versions of data, with a particular focus on preserving one key trait in data, namely variation. We use two different case studies to explore this idea, one that uses a synthetic dataset, and another that uses a large ensemble collection produced by an atmospheric modeling code to study long-term changes in global precipitation. The primary finding of our work is that, in terms of preserving the variation signal inherent in data, a statistical measure more faithfully preserves this key characteristic across both multi-dimensional projections and multi-resolution representations than a methodology based upon averaging.

  18. In defence of model-based inference in phylogeography

    PubMed Central

    Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent

    2017-01-01

    Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, either when used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924
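
    As a minimal illustration of the ABC logic defended here, the sketch below runs rejection-sampling ABC on a toy binomial model; a real phylogeographic application would replace the simulator with a coalescent model and the summary with genetic summary statistics, so the data, prior, and tolerance are purely illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(8)

    # observed data: 37 "successes" out of 100 trials (toy stand-in for a summary statistic)
    n_trials, observed = 100, 37

    def simulate(theta):
        return rng.binomial(n_trials, theta)

    # ABC rejection: sample theta from a Uniform(0, 1) prior, simulate data,
    # and keep draws whose simulated summary lies close to the observed summary
    tolerance = 2
    accepted = []
    for _ in range(100_000):
        theta = rng.uniform(0, 1)
        if abs(simulate(theta) - observed) <= tolerance:
            accepted.append(theta)

    posterior = np.array(accepted)
    print(f"posterior mean ~ {posterior.mean():.3f}, 95% interval "
          f"({np.quantile(posterior, 0.025):.3f}, {np.quantile(posterior, 0.975):.3f})")
    ```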

  19. Contour plot assessment of existing meta-analyses confirms robust association of statin use and acute kidney injury risk.

    PubMed

    Chevance, Aurélie; Schuster, Tibor; Steele, Russell; Ternès, Nils; Platt, Robert W

    2015-10-01

    Robustness of an existing meta-analysis can justify decisions on whether to conduct an additional study addressing the same research question. We illustrate the graphical assessment of the potential impact of an additional study on an existing meta-analysis using published data on statin use and the risk of acute kidney injury. A previously proposed graphical augmentation approach is used to assess the sensitivity of the current test and heterogeneity statistics extracted from existing meta-analysis data. In addition, we extended the graphical augmentation approach to assess potential changes in the pooled effect estimate after updating a current meta-analysis and applied the three graphical contour definitions to data from meta-analyses on statin use and acute kidney injury risk. In the considered example data, the pooled effect estimates and heterogeneity indices proved to be considerably robust to the addition of a future study. Moreover, for some previously inconclusive meta-analyses, a study update might yield a statistically significant increase in kidney injury risk associated with higher statin exposure. The illustrated contour approach should become a standard tool for the assessment of the robustness of meta-analyses. It can guide decisions on whether to conduct additional studies addressing a relevant research question. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. Bayesian Factor Analysis as a Variable Selection Problem: Alternative Priors and Consequences

    PubMed Central

    Lu, Zhao-Hua; Chow, Sy-Miin; Loken, Eric

    2016-01-01

    Factor analysis is a popular statistical technique for multivariate data analysis. Developments in the structural equation modeling framework have enabled the use of hybrid confirmatory/exploratory approaches in which factor loading structures can be explored relatively flexibly within a confirmatory factor analysis (CFA) framework. Recently, a Bayesian structural equation modeling (BSEM) approach (Muthén & Asparouhov, 2012) has been proposed as a way to explore the presence of cross-loadings in CFA models. We show that the issue of determining factor loading patterns may be formulated as a Bayesian variable selection problem in which Muthén and Asparouhov’s approach can be regarded as a BSEM approach with ridge regression prior (BSEM-RP). We propose another Bayesian approach, denoted herein as the Bayesian structural equation modeling with spike and slab prior (BSEM-SSP), which serves as a one-stage alternative to the BSEM-RP. We review the theoretical advantages and disadvantages of both approaches and compare their empirical performance relative to two modification indices-based approaches and exploratory factor analysis with target rotation. A teacher stress scale data set (Byrne, 2012; Pettegrew & Wolf, 1982) is used to demonstrate our approach. PMID:27314566

  1. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis

    PubMed Central

    Steele, Joe; Bastola, Dhundy

    2014-01-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base–base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel–Ziv techniques from data compression. PMID:23904502
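
    The word-frequency methods reviewed above lend themselves to a compact illustration. The sketch below builds k-mer (word) count vectors for two short DNA strings and computes a raw D2 score (the count of shared word matches) together with a cosine similarity of the frequency vectors. The sequences, the word length k = 3, and the function names are illustrative assumptions, not material from the review.

    ```python
    # Minimal sketch: alignment-free comparison of two DNA sequences via word
    # (k-mer) frequencies and the D2 statistic (sum of matched word counts).
    from collections import Counter

    def kmer_counts(seq, k):
        """Count overlapping k-mers (words) in a sequence."""
        return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    def d2_statistic(seq_a, seq_b, k):
        """D2 = sum over all words w of count_A(w) * count_B(w)."""
        a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
        return sum(a[w] * b[w] for w in a.keys() & b.keys())

    def cosine_similarity(seq_a, seq_b, k):
        """Angle-based similarity of the two word-frequency vectors."""
        a, b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
        words = set(a) | set(b)
        dot = sum(a[w] * b[w] for w in words)
        norm_a = sum(v * v for v in a.values()) ** 0.5
        norm_b = sum(v * v for v in b.values()) ** 0.5
        return dot / (norm_a * norm_b)

    if __name__ == "__main__":
        s1 = "ATGCGTACGTTAGCATGCGT"
        s2 = "ATGCGTTCGTTAGCATGAGT"
        print("D2  :", d2_statistic(s1, s2, k=3))
        print("cos :", round(cosine_similarity(s1, s2, k=3), 3))
    ```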

  2. Stochastic or statistic? Comparing flow duration curve models in ungauged basins and changing climates

    NASA Astrophysics Data System (ADS)

    Müller, M. F.; Thompson, S. E.

    2015-09-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by a strong wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are strongly favored over statistical models.
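
    As a point of reference for the purely statistical side of this comparison, the sketch below builds an empirical flow duration curve from a synthetic daily streamflow record and reads off the Q5 and Q95 quantiles. The log-normal flow record and the Weibull plotting position are illustrative assumptions, not the models used in the study.

    ```python
    # Minimal sketch of an empirical flow duration curve: rank daily streamflow
    # and pair each discharge with its exceedance probability.
    import numpy as np

    def flow_duration_curve(flows):
        """Return (exceedance probability, sorted flows) for an FDC."""
        q = np.sort(np.asarray(flows))[::-1]           # descending discharge
        n = q.size
        exceedance = np.arange(1, n + 1) / (n + 1)     # Weibull plotting position
        return exceedance, q

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Synthetic daily flows: log-normal, mimicking a skewed streamflow record
        flows = rng.lognormal(mean=1.0, sigma=0.8, size=365)
        p, q = flow_duration_curve(flows)
        # Common summary points: Q5 (high flow) and Q95 (low flow)
        q5 = np.interp(0.05, p, q)
        q95 = np.interp(0.95, p, q)
        print(f"Q5 ~ {q5:.2f}, Q95 ~ {q95:.2f}")
    ```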

  3. Comparing statistical and process-based flow duration curve models in ungauged basins and changing rain regimes

    NASA Astrophysics Data System (ADS)

    Müller, M. F.; Thompson, S. E.

    2016-02-01

    The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by frequent wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are favored over statistical models.

  4. Revisiting regional flood frequency analysis in Slovakia: the region-of-influence method vs. traditional regional approaches

    NASA Astrophysics Data System (ADS)

    Gaál, Ladislav; Kohnová, Silvia; Szolgay, Ján

    2010-05-01

    During the last 10-15 years, the Slovak hydrologists and water resources managers have been devoting considerable efforts to develop statistical tools for modelling probabilities of flood occurrence in a regional context. Initially, these models followed concepts of regional flood frequency analysis that were based on fixed regions; later the Hosking and Wallis's (HW; 1997) theory was adopted and modified. Nevertheless, it turned out that delineating homogeneous regions using these approaches is not a straightforward task, mostly due to the complex orography of the country. In this poster we aim at revisiting flood frequency analyses so far accomplished for Slovakia by adopting one of the pooling approaches, i.e. the region-of-influence (ROI) approach (Burn, 1990). In the ROI approach, unique pooling groups of similar sites are defined for each site under study. The similarity of sites is defined through Euclidean distance in the space of site attributes that had also proved applicable in former cluster analyses: catchment area, afforested area, hydrogeological catchment index and the mean annual precipitation. The homogeneity of the proposed pooling groups is evaluated by the built-in homogeneity test by Lu and Stedinger (1992). Two alternatives of the ROI approach are examined: in the first one the target size of the pooling groups is adjusted to the target return period T of the estimated flood quantiles, while in the other one, the target size is fixed, regardless of the target T. The statistical models of the ROI approach are compared with the conventional regionalization approach based on the HW methodology, where the parameters of flood frequency distributions were derived by means of L-moment statistics and a regional formula for the estimation of the index flood was derived by multiple regression methods using physiographic and climatic catchment characteristics. The inter-comparison of different frequency models is evaluated by means of the root mean square error of data from Monte Carlo simulations. The analysis is based on the annual peak discharges from 168 small and mid-sized catchments from Slovakia. The study is supported by the Grant Agency of AS CR under project B300420801; the Slovak Research and Development Agency under the contract No. APVV-0443-07 and the Slovak VEGA Grant Agency under the project No. 1/0103/10. Burn, D.H., 1990: Evaluation of regional flood frequency analysis with a region of influence approach. Water Resources Research, 26(10), 2257-2265. Hosking, J.R.M., Wallis, J.R., 1997: Regional frequency analysis: an approach based on L-moments. Cambridge University Press, Cambridge. Lu, L.-H., Stedinger, J.R., 1992: Sampling variance of normalized GEV/PWM quantile estimators and a regional homogeneity test. Journal of Hydrology, 138(1-2), 223-245.

  5. Work domain constraints for modelling surgical performance.

    PubMed

    Morineau, Thierry; Riffaud, Laurent; Morandi, Xavier; Villain, Jonathan; Jannin, Pierre

    2015-10-01

    Three main approaches can be identified for modelling surgical performance: a competency-based approach, a task-based approach, both largely explored in the literature, and a less known work domain-based approach. The work domain-based approach first describes the work domain properties that constrain the agent's actions and shape the performance. This paper presents a work domain-based approach for modelling performance during cervical spine surgery, based on the idea that anatomical structures delineate the surgical performance. This model was evaluated through an analysis of junior and senior surgeons' actions. Twenty-four cervical spine surgeries performed by two junior and two senior surgeons were recorded in real time by an expert surgeon. According to a work domain-based model describing an optimal progression through anatomical structures, the degree of adjustment of each surgical procedure to a statistical polynomial function was assessed. Each surgical procedure fitted the model significantly, with regression coefficient values around 0.9. However, the surgeries performed by senior surgeons fitted this model significantly better than those performed by junior surgeons. Analysis of the relative frequencies of actions on anatomical structures showed that some specific anatomical structures discriminate senior from junior performances. The work domain-based modelling approach can provide an overall statistical indicator of surgical performance, but in particular, it can highlight specific points of interest among anatomical structures that the surgeons dwelled on according to their level of expertise.

  6. Assessment of pleiotropic transcriptome perturbations in Arabidopsis engineered for indirect insect defence.

    PubMed

    Houshyani, Benyamin; van der Krol, Alexander R; Bino, Raoul J; Bouwmeester, Harro J

    2014-06-19

    Molecular characterization is an essential step of risk/safety assessment of genetically modified (GM) crops. Holistic approaches for molecular characterization using omics platforms can be used to confirm the intended impact of the genetic engineering, but can also reveal the unintended changes at the omics level as a first assessment of potential risks. The potential of omics platforms for risk assessment of GM crops has rarely been used for this purpose because of the lack of a consensus reference and statistical methods to judge the significance or importance of the pleiotropic changes in GM plants. Here we propose a meta data analysis approach to the analysis of GM plants, by measuring the transcriptome distance to untransformed wild-types. In the statistical analysis of the transcriptome distance between GM and wild-type plants, values are compared with naturally occurring transcriptome distances in non-GM counterparts obtained from a database. Using this approach we show that the pleiotropic effect of genes involved in indirect insect defence traits is substantially equivalent to the variation in gene expression occurring naturally in Arabidopsis. Transcriptome distance is a useful screening method to obtain insight into the pleiotropic effects of genetic modification.

  7. Complex networks as a unified framework for descriptive analysis and predictive modeling in climate

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Steinhaeuser, Karsten J K; Chawla, Nitesh; Ganguly, Auroop R

    The analysis of climate data has relied heavily on hypothesis-driven statistical methods, while projections of future climate are based primarily on physics-based computational models. However, in recent years a wealth of new datasets has become available. Therefore, we take a more data-centric approach and propose a unified framework for studying climate, with an aim towards characterizing observed phenomena as well as discovering new knowledge in the climate domain. Specifically, we posit that complex networks are well-suited for both descriptive analysis and predictive modeling tasks. We show that the structural properties of climate networks have useful interpretation within the domain. Further, we extract clusters from these networks and demonstrate their predictive power as climate indices. Our experimental results establish that the network clusters are statistically significantly better predictors than clusters derived using a more traditional clustering approach. Using complex networks as data representation thus enables the unique opportunity for descriptive and predictive modeling to inform each other.

  8. Pathway-GPS and SIGORA: identifying relevant pathways based on the over-representation of their gene-pair signatures

    PubMed Central

    Foroushani, Amir B.K.; Brinkman, Fiona S.L.

    2013-01-01

    Motivation. Predominant pathway analysis approaches treat pathways as collections of individual genes and consider all pathway members as equally informative. As a result, at times spurious and misleading pathways are inappropriately identified as statistically significant, solely due to components that they share with the more relevant pathways. Results. We introduce the concept of Pathway Gene-Pair Signatures (Pathway-GPS) as pairs of genes that, as a combination, are specific to a single pathway. We devised and implemented a novel approach to pathway analysis, Signature Over-representation Analysis (SIGORA), which focuses on the statistically significant enrichment of Pathway-GPS in a user-specified gene list of interest. In a comparative evaluation of several published datasets, SIGORA outperformed traditional methods by delivering biologically more plausible and relevant results. Availability. An efficient implementation of SIGORA, as an R package with precompiled GPS data for several human and mouse pathway repositories is available for download from http://sigora.googlecode.com/svn/. PMID:24432194

  9. Analysis of Landsat-4 Thematic Mapper data for classification of forest stands in Baldwin County, Alabama

    NASA Technical Reports Server (NTRS)

    Hill, C. L.

    1984-01-01

    A computer-implemented classification has been derived from Landsat-4 Thematic Mapper data acquired over Baldwin County, Alabama on January 15, 1983. One set of spectral signatures was developed from the data by utilizing a 3x3 pixel sliding window approach. An analysis of the classification produced from this technique identified forested areas. Additional information regarding only the forested areas was extracted by employing a pixel-by-pixel signature development program which derived spectral statistics only for pixels within the forested land covers. The spectral statistics from both approaches were integrated and the data classified. This classification was evaluated by comparing the spectral classes produced from the data against corresponding ground verification polygons. This iterative data analysis technique resulted in an overall classification accuracy of 88.4 percent correct for slash pine, young pine, loblolly pine, natural pine, and mixed hardwood-pine. An accuracy assessment matrix has been produced for the classification.

  10. Tables of square-law signal detection statistics for Hann spectra with 50 percent overlap

    NASA Technical Reports Server (NTRS)

    Deans, Stanley R.; Cullers, D. Kent

    1991-01-01

    The Search for Extraterrestrial Intelligence, currently being planned by NASA, will require that an enormous amount of data be analyzed in real time by special purpose hardware. It is expected that overlapped Hann data windows will play an important role in this analysis. In order to understand the statistical implication of this approach, it has been necessary to compute detection statistics for overlapped Hann spectra. Tables of signal detection statistics are given for false alarm rates from 10^-14 to 10^-1 and signal detection probabilities from 0.50 to 0.99; the number of computed spectra ranges from 4 to 2000.

  11. Kurtosis Approach Nonlinear Blind Source Separation

    NASA Technical Reports Server (NTRS)

    Duong, Vu A.; Stubberud, Allen R.

    2005-01-01

    In this paper, we introduce a new algorithm for blind source signal separation for post-nonlinear mixtures. The mixtures are assumed to be linearly mixed from unknown sources first and then distorted by memoryless nonlinear functions. The nonlinear functions are assumed to be smooth and can be approximated by polynomials. Both the coefficients of the unknown mixing matrix and the coefficients of the approximated polynomials are estimated by the gradient descent method conditional on the higher order statistical requirements. The results of simulation experiments presented in this paper demonstrate the validity and usefulness of our approach for nonlinear blind source signal separation. Keywords: Independent Component Analysis, Kurtosis, Higher order statistics.
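
    The higher-order-statistics criterion behind this algorithm can be illustrated in a much simpler setting. The sketch below treats only a linear 2x2 mixture (not the post-nonlinear case addressed by the paper) and recovers one component by gradient ascent on the absolute kurtosis of a whitened projection; the sources, mixing matrix, learning rate, and iteration count are illustrative assumptions.

    ```python
    # Simplified sketch (linear case only): maximize |kurtosis| of w^T z, where
    # z are whitened mixtures, to recover one independent component.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 20000
    s = np.vstack([np.sign(rng.standard_normal(n)),   # sub-Gaussian source
                   rng.laplace(size=n)])              # super-Gaussian source
    A = np.array([[1.0, 0.6], [0.4, 1.0]])            # unknown mixing matrix
    x = A @ s

    # Whiten the mixtures (zero mean, identity covariance)
    x -= x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x))
    z = (E / np.sqrt(d)) @ E.T @ x

    def kurtosis(y):
        return np.mean(y**4) - 3 * np.mean(y**2) ** 2

    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    lr = 0.1
    for _ in range(200):
        y = w @ z
        # Gradient of kurt(w^T z) w.r.t. w for whitened data
        grad = 4 * (z * y**3).mean(axis=1) - 12 * np.mean(y**2) * (z * y).mean(axis=1)
        w += lr * np.sign(kurtosis(y)) * grad         # ascend |kurtosis|
        w /= np.linalg.norm(w)

    print("kurtosis of recovered component:", round(kurtosis(w @ z), 3))
    ```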

  12. Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis

    NASA Technical Reports Server (NTRS)

    Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)

    1979-01-01

    The advantages and disadvantages of the causal (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling for grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first generation wheat-yield model is given. Topics discussed include truncation, trend variable, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.

  13. Analyzing Responses of Chemical Sensor Arrays

    NASA Technical Reports Server (NTRS)

    Zhou, Hanying

    2007-01-01

    NASA is developing a third-generation electronic nose (ENose) capable of continuous monitoring of the International Space Station's cabin atmosphere for specific, harmful airborne contaminants. Previous generations of the ENose have been described in prior NASA Tech Briefs issues. Sensor selection is critical in both (prefabrication) sensor material selection and (post-fabrication) data analysis of the ENose, which detects several analytes that are difficult to detect, or that are at very low concentration ranges. Existing sensor selection approaches usually include limited statistical measures, where selectivity is more important but reliability and sensitivity are not of concern. When reliability and sensitivity can be major limiting factors in detecting target compounds reliably, the existing approach is not able to provide meaningful selection that will actually improve data analysis results. The approach and software reported here consider more statistical measures (factors) than existing approaches for a similar purpose. The result is a more balanced and robust sensor selection from a less than ideal sensor array. The software offers quick, flexible, optimal sensor selection and weighting for a variety of purposes without a time-consuming, iterative search by performing sensor calibrations to a known linear or nonlinear model, evaluating the individual sensor's statistics, scoring the individual sensor's overall performance, finding the best sensor array size to maximize class separation, finding optimal weights for the remaining sensor array, estimating limits of detection for the target compounds, evaluating fingerprint distance between group pairs, and finding the best event-detecting sensors.

  14. Where statistics and molecular microarray experiments biology meet.

    PubMed

    Kelmansky, Diana M

    2013-01-01

    This review chapter presents a statistical point of view to microarray experiments with the purpose of understanding the apparent contradictions that often appear in relation to their results. We give a brief introduction of molecular biology for nonspecialists. We describe microarray experiments from their construction and the biological principles the experiments rely on, to data acquisition and analysis. The role of epidemiological approaches and sample size considerations are also discussed.

  15. Statistical inference of dynamic resting-state functional connectivity using hierarchical observation modeling.

    PubMed

    Sojoudi, Alireza; Goodyear, Bradley G

    2016-12-01

    Spontaneous fluctuations of blood-oxygenation level-dependent functional magnetic resonance imaging (BOLD fMRI) signals are highly synchronous between brain regions that serve similar functions. This provides a means to investigate functional networks; however, most analysis techniques assume functional connections are constant over time. This may be problematic in the case of neurological disease, where functional connections may be highly variable. Recently, several methods have been proposed to determine moment-to-moment changes in the strength of functional connections over an imaging session (so-called dynamic connectivity). Here a novel analysis framework based on a hierarchical observation modeling approach was proposed, to permit statistical inference of the presence of dynamic connectivity. A two-level linear model was described, composed of overlapping sliding windows of fMRI signals and incorporating the fact that overlapping windows are not independent. To test this approach, datasets were synthesized whereby functional connectivity was either constant (significant or insignificant) or modulated by an external input. The method successfully determines the statistical significance of a functional connection in phase with the modulation, and it exhibits greater sensitivity and specificity in detecting regions with variable connectivity, when compared with sliding-window correlation analysis. For real data, this technique possesses greater reproducibility and provides a more discriminative estimate of dynamic connectivity than sliding-window correlation analysis. Hum Brain Mapp 37:4566-4580, 2016. © 2016 Wiley Periodicals, Inc.
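
    For orientation, the sketch below implements the sliding-window correlation benchmark that the abstract compares against: the correlation between two synthetic signals is recomputed in overlapping windows so that a change in connectivity strength becomes visible over the session. The window length, stride, and signal construction are illustrative assumptions; this is not the hierarchical observation model itself.

    ```python
    # Minimal sliding-window correlation sketch on two synthetic "BOLD" signals
    # whose coupling switches on halfway through the session.
    import numpy as np

    rng = np.random.default_rng(2)
    n_vol, win, step = 600, 60, 20                 # volumes, window length, stride
    t = np.arange(n_vol)
    shared = np.sin(2 * np.pi * t / 80)            # common fluctuation
    coupled = (t > n_vol // 2).astype(float)       # connectivity "switches on"
    a = coupled * shared + rng.standard_normal(n_vol)
    b = shared + rng.standard_normal(n_vol)

    starts = range(0, n_vol - win + 1, step)
    dyn_corr = [np.corrcoef(a[s:s + win], b[s:s + win])[0, 1] for s in starts]

    for s, r in zip(starts, dyn_corr):
        print(f"window starting at volume {s:3d}: r = {r:+.2f}")
    ```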

  16. SAFER, an Analysis Method of Quantitative Proteomic Data, Reveals New Interactors of the C. elegans Autophagic Protein LGG-1.

    PubMed

    Yi, Zhou; Manil-Ségalen, Marion; Sago, Laila; Glatigny, Annie; Redeker, Virginie; Legouis, Renaud; Mucchielli-Giorgi, Marie-Hélène

    2016-05-06

    Affinity purifications followed by mass spectrometric analysis are used to identify protein-protein interactions. Because quantitative proteomic data are noisy, it is necessary to develop statistical methods to eliminate false-positives and identify true partners. We present here a novel approach for filtering false interactors, named "SAFER" for mass Spectrometry data Analysis by Filtering of Experimental Replicates, which is based on the reproducibility of the replicates and the fold-change of the protein intensities between bait and control. To identify regulators or targets of autophagy, we characterized the interactors of LGG-1, a ubiquitin-like protein involved in autophagosome formation in C. elegans. LGG-1 partners were purified by affinity, analyzed by nanoLC-MS/MS mass spectrometry, and quantified by a label-free proteomic approach based on the mass spectrometric signal intensity of peptide precursor ions. Because the selection of confident interactions depends on the method used for statistical analysis, we compared SAFER with several statistical tests and different scoring algorithms on this set of data. We show that SAFER recovers high-confidence interactors that have been ignored by the other methods and identified new candidates involved in the autophagy process. We further validated our method on a public data set and conclude that SAFER notably improves the identification of protein interactors.

  17. Objective definition of rosette shape variation using a combined computer vision and data mining approach.

    PubMed

    Camargo, Anyela; Papadopoulou, Dimitra; Spyropoulou, Zoi; Vlachonasios, Konstantinos; Doonan, John H; Gay, Alan P

    2014-01-01

    Computer-vision based measurements of phenotypic variation have implications for crop improvement and food security because they are intrinsically objective. It should be possible therefore to use such approaches to select robust genotypes. However, plants are morphologically complex and identification of meaningful traits from automatically acquired image data is not straightforward. Bespoke algorithms can be designed to capture and/or quantitate specific features but this approach is inflexible and is not generally applicable to a wide range of traits. In this paper, we have used industry-standard computer vision techniques to extract a wide range of features from images of genetically diverse Arabidopsis rosettes growing under non-stimulated conditions, and then used statistical analysis to identify those features that provide good discrimination between ecotypes. This analysis indicates that almost all the observed shape variation can be described by 5 principal components. We describe an easily implemented pipeline including image segmentation, feature extraction and statistical analysis. This pipeline provides a cost-effective and inherently scalable method to parameterise and analyse variation in rosette shape. The acquisition of images does not require any specialised equipment and the computer routines for image processing and data analysis have been implemented using open source software. Source code for the data analysis is written in R. The equations used to calculate the image descriptors are also provided.
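
    A hedged sketch of the statistical end of such a pipeline: principal component analysis of generic rosette shape descriptors to see how many components capture most of the shape variation. The descriptor names (area, perimeter, circularity, eccentricity), their simulated values, and the use of Python rather than the authors' R code are all assumptions made for illustration.

    ```python
    # PCA of simulated rosette shape descriptors, standardised before analysis.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(9)
    n_plants = 120
    size = rng.normal(1.0, 0.3, n_plants)                 # latent "size" factor
    features = np.column_stack([
        40 * size + rng.normal(0, 2, n_plants),           # area (cm^2)
        25 * size + rng.normal(0, 2, n_plants),           # perimeter (cm)
        rng.normal(0.7, 0.05, n_plants),                  # circularity
        rng.normal(0.5, 0.10, n_plants),                  # eccentricity
    ])

    pca = PCA().fit(StandardScaler().fit_transform(features))
    print("explained variance ratio per component:",
          np.round(pca.explained_variance_ratio_, 2))
    ```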

  18. When is hub gene selection better than standard meta-analysis?

    PubMed

    Langfelder, Peter; Mischel, Paul S; Horvath, Steve

    2013-01-01

    Since hub nodes have been found to play important roles in many networks, highly connected hub genes are expected to play an important role in biology as well. However, the empirical evidence remains ambiguous. An open question is whether (or when) hub gene selection leads to more meaningful gene lists than a standard statistical analysis based on significance testing when analyzing genomic data sets (e.g., gene expression or DNA methylation data). Here we address this question for the special case when multiple genomic data sets are available. This is of great practical importance since for many research questions multiple data sets are publicly available. In this case, the data analyst can decide between a standard statistical approach (e.g., based on meta-analysis) and a co-expression network analysis approach that selects intramodular hubs in consensus modules. We assess the performance of these two types of approaches according to two criteria. The first criterion evaluates the biological insights gained and is relevant in basic research. The second criterion evaluates the validation success (reproducibility) in independent data sets and often applies in clinical diagnostic or prognostic applications. We compare meta-analysis with consensus network analysis based on weighted correlation network analysis (WGCNA) in three comprehensive and unbiased empirical studies: (1) Finding genes predictive of lung cancer survival, (2) finding methylation markers related to age, and (3) finding mouse genes related to total cholesterol. The results demonstrate that intramodular hub gene status with respect to consensus modules is more useful than a meta-analysis p-value when identifying biologically meaningful gene lists (reflecting criterion 1). However, standard meta-analysis methods perform as well as (if not better than) a consensus network approach in terms of validation success (criterion 2). The article also reports a comparison of meta-analysis techniques applied to gene expression data and presents novel R functions for carrying out consensus network analysis, network-based screening, and meta-analysis.
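
    A toy contrast of the two screening criteria discussed above: a meta-analysis z-score combined across two simulated data sets (Stouffer's method) versus a simple hub-gene score (soft-thresholded intramodular connectivity). The simulated expression data, the soft power beta, and the gene count are invented; this is a sketch of the idea, not the WGCNA consensus-module implementation or the authors' R functions.

    ```python
    # Compare gene rankings from a Stouffer-combined correlation test with
    # rankings from a soft-thresholded connectivity (hub) score.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n_samples, n_genes, beta = 50, 30, 6
    module_driver = rng.standard_normal(n_samples)         # latent module activity
    expr1 = np.outer(module_driver, rng.uniform(0.3, 1.0, n_genes)) \
            + rng.standard_normal((n_samples, n_genes))
    expr2 = np.outer(module_driver, rng.uniform(0.3, 1.0, n_genes)) \
            + rng.standard_normal((n_samples, n_genes))
    trait = module_driver + rng.standard_normal(n_samples)

    def fisher_z(expr, y):
        """Correlation of each gene with the trait, on the Fisher z scale."""
        r = np.array([stats.pearsonr(expr[:, j], y)[0] for j in range(expr.shape[1])])
        return np.arctanh(r) * np.sqrt(expr.shape[0] - 3)

    meta_z = (fisher_z(expr1, trait) + fisher_z(expr2, trait)) / np.sqrt(2)  # Stouffer

    # Hub score: soft-thresholded intramodular connectivity in the first data set
    adj = np.abs(np.corrcoef(expr1, rowvar=False)) ** beta
    connectivity = adj.sum(axis=0) - 1                     # drop self-correlation

    print("top 5 genes by meta-analysis |z|:", np.argsort(-np.abs(meta_z))[:5])
    print("top 5 genes by hub connectivity :", np.argsort(-connectivity)[:5])
    ```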

  19. Flexible statistical modelling detects clinical functional magnetic resonance imaging activation in partially compliant subjects.

    PubMed

    Waites, Anthony B; Mannfolk, Peter; Shaw, Marnie E; Olsrud, Johan; Jackson, Graeme D

    2007-02-01

    Clinical functional magnetic resonance imaging (fMRI) occasionally fails to detect significant activation, often due to variability in task performance. The present study seeks to test whether a more flexible statistical analysis can better detect activation, by accounting for variance associated with variable compliance to the task over time. Experimental results and simulated data both confirm that even at 80% compliance to the task, such a flexible model outperforms standard statistical analysis when assessed using the extent of activation (experimental data), goodness of fit (experimental data), and area under the receiver operating characteristic curve (simulated data). Furthermore, retrospective examination of 14 clinical fMRI examinations reveals that in patients where the standard statistical approach yields activation, there is a measurable gain in model performance in adopting the flexible statistical model, with little or no penalty in lost sensitivity. This indicates that a flexible model should be considered, particularly for clinical patients who may have difficulty complying fully with the study task.

  20. Statistical Methods for the Analysis of Discrete Choice Experiments: A Report of the ISPOR Conjoint Analysis Good Research Practices Task Force.

    PubMed

    Hauber, A Brett; González, Juan Marcos; Groothuis-Oudshoorn, Catharina G M; Prior, Thomas; Marshall, Deborah A; Cunningham, Charles; IJzerman, Maarten J; Bridges, John F P

    2016-06-01

    Conjoint analysis is a stated-preference survey method that can be used to elicit responses that reveal preferences, priorities, and the relative importance of individual features associated with health care interventions or services. Conjoint analysis methods, particularly discrete choice experiments (DCEs), have been increasingly used to quantify preferences of patients, caregivers, physicians, and other stakeholders. Recent consensus-based guidance on good research practices, including two recent task force reports from the International Society for Pharmacoeconomics and Outcomes Research, has aided in improving the quality of conjoint analyses and DCEs in outcomes research. Nevertheless, uncertainty regarding good research practices for the statistical analysis of data from DCEs persists. There are multiple methods for analyzing DCE data. Understanding the characteristics and appropriate use of different analysis methods is critical to conducting a well-designed DCE study. This report will assist researchers in evaluating and selecting among alternative approaches to conducting statistical analysis of DCE data. We first present a simplistic DCE example and a simple method for using the resulting data. We then present a pedagogical example of a DCE and one of the most common approaches to analyzing data from such a question format: conditional logit. We then describe some common alternative methods for analyzing these data and the strengths and weaknesses of each alternative. We present the ESTIMATE checklist, which includes a list of questions to consider when justifying the choice of analysis method, describing the analysis, and interpreting the results. Copyright © 2016 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
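
    A hedged sketch of the conditional logit approach mentioned above: simulate a tiny DCE (three alternatives per choice task, two attributes) and estimate attribute weights by maximum likelihood. The attribute interpretation and true weights are invented for illustration; this is not the task force's reference analysis.

    ```python
    # Simulate choices from a conditional logit model and recover the weights.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(4)
    n_tasks, n_alts, n_attr = 400, 3, 2
    X = rng.normal(size=(n_tasks, n_alts, n_attr))     # attribute levels
    beta_true = np.array([1.0, -0.5])                  # e.g. efficacy (+), cost (-)

    # Simulated choices: utility = systematic part + Gumbel noise
    util = X @ beta_true + rng.gumbel(size=(n_tasks, n_alts))
    choice = util.argmax(axis=1)

    def neg_log_lik(beta):
        v = X @ beta                                   # systematic utilities
        v -= v.max(axis=1, keepdims=True)              # numerical stability
        log_p = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
        return -log_p[np.arange(n_tasks), choice].sum()

    fit = minimize(neg_log_lik, x0=np.zeros(n_attr), method="BFGS")
    print("estimated attribute weights:", np.round(fit.x, 2))
    ```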

  1. mapDIA: Preprocessing and statistical analysis of quantitative proteomics data from data independent acquisition mass spectrometry.

    PubMed

    Teo, Guoshou; Kim, Sinae; Tsou, Chih-Chiang; Collins, Ben; Gingras, Anne-Claude; Nesvizhskii, Alexey I; Choi, Hyungwon

    2015-11-03

    Data independent acquisition (DIA) mass spectrometry is an emerging technique that offers more complete detection and quantification of peptides and proteins across multiple samples. DIA allows fragment-level quantification, which can be considered as repeated measurements of the abundance of the corresponding peptides and proteins in the downstream statistical analysis. However, few statistical approaches are available for aggregating these complex fragment-level data into peptide- or protein-level statistical summaries. In this work, we describe a software package, mapDIA, for statistical analysis of differential protein expression using DIA fragment-level intensities. The workflow consists of three major steps: intensity normalization, peptide/fragment selection, and statistical analysis. First, mapDIA offers normalization of fragment-level intensities by total intensity sums as well as a novel alternative normalization by local intensity sums in retention time space. Second, mapDIA removes outlier observations and selects peptides/fragments that preserve the major quantitative patterns across all samples for each protein. Last, using the selected fragments and peptides, mapDIA performs model-based statistical significance analysis of protein-level differential expression between specified groups of samples. Using a comprehensive set of simulation datasets, we show that mapDIA detects differentially expressed proteins with accurate control of the false discovery rates. We also describe the analysis procedure in detail using two recently published DIA datasets generated for 14-3-3β dynamic interaction network and prostate cancer glycoproteome. The software was written in C++ language and the source code is available for free through SourceForge website http://sourceforge.net/projects/mapdia/. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. Measuring the statistical validity of summary meta‐analysis and meta‐regression results for use in clinical practice

    PubMed Central

    Riley, Richard D.

    2017-01-01

    An important question for clinicians appraising a meta‐analysis is: are the findings likely to be valid in their own practice—does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity—where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple (‘leave‐one‐out’) cross‐validation technique, we demonstrate how we may test meta‐analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta‐analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta‐analysis and a tailored meta‐regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within‐study variance, between‐study variance, study sample size, and the number of studies in the meta‐analysis. Finally, we apply Vn to two published meta‐analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta‐analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28620945
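
    An illustrative sketch of the leave-one-out idea described above: for each study, the random-effects pooled estimate is recomputed without that study and compared with the study's observed effect, standardised by its total variance. This mirrors the cross-validation concept only; it is not the exact Vn statistic or its derived distribution from the paper, and the study effects and variances below are invented.

    ```python
    # Leave-one-out check of a DerSimonian-Laird random-effects pooled estimate.
    import numpy as np

    # Synthetic study effects (e.g. log odds ratios) and within-study variances
    y = np.array([0.30, 0.10, 0.45, 0.25, 0.05, 0.38, 0.20, 0.15])
    v = np.array([0.04, 0.06, 0.05, 0.03, 0.08, 0.05, 0.04, 0.07])

    def dersimonian_laird(y, v):
        """Random-effects pooled estimate with DerSimonian-Laird tau^2."""
        w = 1 / v
        mu_fixed = np.sum(w * y) / np.sum(w)
        q = np.sum(w * (y - mu_fixed) ** 2)
        tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))
        w_re = 1 / (v + tau2)
        return np.sum(w_re * y) / np.sum(w_re), tau2

    z_loo = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        mu_i, tau2_i = dersimonian_laird(y[keep], v[keep])
        z_loo.append((y[i] - mu_i) / np.sqrt(v[i] + tau2_i))

    print("leave-one-out standardised deviations:", np.round(z_loo, 2))
    ```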

  3. UNITY: Confronting Supernova Cosmology's Statistical and Systematic Uncertainties in a Unified Bayesian Framework

    NASA Astrophysics Data System (ADS)

    Rubin, D.; Aldering, G.; Barbary, K.; Boone, K.; Chappell, G.; Currie, M.; Deustua, S.; Fagrelius, P.; Fruchter, A.; Hayden, B.; Lidman, C.; Nordin, J.; Perlmutter, S.; Saunders, C.; Sofiatti, C.; Supernova Cosmology Project, The

    2015-11-01

    While recent supernova (SN) cosmology research has benefited from improved measurements, current analysis approaches are not statistically optimal and will prove insufficient for future surveys. This paper discusses the limitations of current SN cosmological analyses in treating outliers, selection effects, shape- and color-standardization relations, unexplained dispersion, and heterogeneous observations. We present a new Bayesian framework, called UNITY (Unified Nonlinear Inference for Type-Ia cosmologY), that incorporates significant improvements in our ability to confront these effects. We apply the framework to real SN observations and demonstrate smaller statistical and systematic uncertainties. We verify earlier results that SNe Ia require nonlinear shape and color standardizations, but we now include these nonlinear relations in a statistically well-justified way. This analysis was primarily performed blinded, in that the basic framework was first validated on simulated data before transitioning to real data. We also discuss possible extensions of the method.

  4. An Investigation of Civilians Preparedness to Compete with Individuals with Military Experience for Army Board Select Acquisition Positions

    DTIC Science & Technology

    2017-05-25

    The research employed a mixed research methodology (quantitative with descriptive statistical analysis and qualitative with a thematic analysis approach), using interviews to collect the data. The interviews included demographic and open-ended ...

  5. Effect Size Measure and Analysis of Single Subject Designs

    ERIC Educational Resources Information Center

    Swaminathan, Hariharan; Horner, Robert H.; Rogers, H. Jane; Sugai, George

    2012-01-01

    This study is aimed at addressing the criticisms that have been leveled at the currently available statistical procedures for analyzing single subject designs (SSD). One of the vexing problems in the analysis of SSD is in the assessment of the effect of intervention. Serial dependence notwithstanding, the linear model approach that has been…

  6. Principle Component Analysis with Incomplete Data: A simulation of R pcaMethods package in Constructing an Environmental Quality Index with Missing Data

    EPA Science Inventory

    Missing data is a common problem in the application of statistical techniques. In principal component analysis (PCA), a technique for dimensionality reduction, incomplete data points are either discarded or imputed using interpolation methods. Such approaches are less valid when ...

  7. Important Literature in Endocrinology: Citation Analysis and Historical Methodology.

    ERIC Educational Resources Information Center

    Hurt, C. D.

    1982-01-01

    Results of a study comparing two approaches to the identification of important literature in endocrinology reveal that the association between rankings of cited items using the two methods is not statistically significant and that the use of citation or historical analysis alone will not result in the same set of literature. Forty-two sources are appended. (EJS)

  8. Multilevel Factor Analysis by Model Segregation: New Applications for Robust Test Statistics

    ERIC Educational Resources Information Center

    Schweig, Jonathan

    2014-01-01

    Measures of classroom environments have become central to policy efforts that assess school and teacher quality. This has sparked a wide interest in using multilevel factor analysis to test measurement hypotheses about classroom-level variables. One approach partitions the total covariance matrix and tests models separately on the…

  9. Estimating annual bole biomass production using uncertainty analysis

    Treesearch

    Travis J. Woolley; Mark E. Harmon; Kari B. O'Connell

    2007-01-01

    Two common sampling methodologies coupled with a simple statistical model were evaluated to determine the accuracy and precision of annual bole biomass production (BBP) and inter-annual variability estimates using this type of approach. We performed an uncertainty analysis using Monte Carlo methods in conjunction with radial growth core data from trees in three Douglas...

  10. A statistical analysis of energy and power demand for the tractive purposes of an electric vehicle in urban traffic - an analysis of a short and long observation period

    NASA Astrophysics Data System (ADS)

    Slaski, G.; Ohde, B.

    2016-09-01

    The article presents the results of a statistical dispersion analysis of the energy and power demand for the tractive purposes of a battery electric vehicle. The authors compare data distributions for different values of average speed in two approaches, namely a short and a long period of observation. The short period of observation (generally around several hundred meters) results from a previously proposed macroscopic energy consumption model based on an average speed per road section. This approach yielded high values of standard deviation and coefficient of variation (the ratio between standard deviation and the mean), around 0.7-1.2. The long period of observation (several kilometers) is similar in length to standardized speed cycles used in testing a vehicle's energy consumption and available range. The data were analysed to determine the impact of observation length on the energy and power demand variation. The analysis was based on a simulation of electric power and energy consumption performed with speed profile data recorded in the Poznan agglomeration.
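
    A small illustration of the effect described above, under invented numbers: the coefficient of variation of energy demand per unit distance shrinks as the observation window lengthens, because short-section fluctuations average out. The synthetic per-100 m energy series and the two window lengths are assumptions for illustration only.

    ```python
    # Coefficient of variation of windowed energy sums for short vs long windows.
    import numpy as np

    rng = np.random.default_rng(8)
    # Synthetic energy use per 100 m segment (kWh), noisy around 0.015 kWh/100 m
    e_per_100m = np.abs(rng.normal(0.015, 0.010, size=5000))

    def cv_over_windows(series, window):
        """Coefficient of variation of sums over consecutive windows."""
        n = len(series) // window
        sums = series[:n * window].reshape(n, window).sum(axis=1)
        return sums.std() / sums.mean()

    for window, label in [(5, "short (~0.5 km)"), (100, "long (~10 km)")]:
        print(f"{label:>16s}: CV = {cv_over_windows(e_per_100m, window):.2f}")
    ```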

  11. Effects of music therapy for children and adolescents with psychopathology: a meta-analysis.

    PubMed

    Gold, Christian; Voracek, Martin; Wigram, Tony

    2004-09-01

    The objectives of this review were to examine the overall efficacy of music therapy for children and adolescents with psychopathology, and to examine how the size of the effect of music therapy is influenced by the type of pathology, client's age, music therapy approach, and type of outcome. Eleven studies were included for analysis, which resulted in a total of 188 subjects for the meta-analysis. Effect sizes from these studies were combined, with weighting for sample size, and their distribution was examined. After exclusion of an extreme positive outlying value, the analysis revealed that music therapy has a medium to large positive effect (ES = .61) on clinically relevant outcomes that was statistically highly significant (p < .001) and statistically homogeneous. No evidence of a publication bias was identified. Effects tended to be greater for behavioural and developmental disorders than for emotional disorders; greater for eclectic, psychodynamic, and humanistic approaches than for behavioural models; and greater for behavioural and developmental outcomes than for social skills and self-concept. Implications for clinical practice and research are discussed.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Fangyan; Zhang, Song; Chung Wong, Pak

    Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, the size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.
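
    A minimal sketch of the kind of comparison described above: sample a scale-free graph by node sampling and by edge sampling at the same rate, then compare simple statistical properties (mean degree, clustering) against the full graph. The graph model, size, sampling rate, and chosen statistics are illustrative assumptions, not the benchmark's configuration.

    ```python
    # Node sampling vs edge sampling of a Barabasi-Albert (scale-free) graph.
    import random
    import networkx as nx

    random.seed(5)
    G = nx.barabasi_albert_graph(n=2000, m=3)
    rate = 0.2

    # Node sampling: keep a random subset of nodes and their induced edges
    nodes = random.sample(list(G.nodes), int(rate * G.number_of_nodes()))
    G_node = G.subgraph(nodes)

    # Edge sampling: keep a random subset of edges and their endpoints
    edges = random.sample(list(G.edges), int(rate * G.number_of_edges()))
    G_edge = G.edge_subgraph(edges)

    def summary(name, H):
        degs = [d for _, d in H.degree()]
        mean_deg = sum(degs) / len(degs) if degs else 0.0
        print(f"{name:12s} nodes={H.number_of_nodes():5d} "
              f"mean degree={mean_deg:5.2f} clustering={nx.average_clustering(H):.3f}")

    summary("full graph", G)
    summary("node sample", G_node)
    summary("edge sample", G_edge)
    ```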

  13. New advances in the statistical parton distributions approach

    NASA Astrophysics Data System (ADS)

    Soffer, Jacques; Bourrely, Claude

    2016-03-01

    The quantum statistical parton distributions approach proposed more than one decade ago is revisited by considering a larger set of recent and accurate Deep Inelastic Scattering experimental results. It enables us to improve the description of the data by means of a new determination of the parton distributions. This global next-to-leading order QCD analysis leads to a good description of several structure functions, involving unpolarized parton distributions and helicity distributions, in terms of a rather small number of free parameters. There are many serious challenging issues. The predictions of this theoretical approach will be tested for single-jet production and charge asymmetry in W± production in p̄p and pp collisions up to LHC energies, using recent data and also for forthcoming experimental results. Presented by J. Soffer at POETIC 2015

  14. Chemical modeling of groundwater in the Banat Plain, southwestern Romania, with elevated As content and co-occurring species by combining diagrams and unsupervised multivariate statistical approaches.

    PubMed

    Butaciu, Sinziana; Senila, Marin; Sarbu, Costel; Ponta, Michaela; Tanaselia, Claudiu; Cadar, Oana; Roman, Marius; Radu, Emil; Sima, Mihaela; Frentiu, Tiberiu

    2017-04-01

    The study proposes a combined model based on diagrams (Gibbs, Piper, Stuyfzand Hydrogeochemical Classification System) and unsupervised statistical approaches (Cluster Analysis, Principal Component Analysis, Fuzzy Principal Component Analysis, Fuzzy Hierarchical Cross-Clustering) to describe natural enrichment of inorganic arsenic and co-occurring species in groundwater in the Banat Plain, southwestern Romania. Speciation of inorganic As (arsenite, arsenate) and determination of ion concentrations (Na+, K+, Ca2+, Mg2+, HCO3-, Cl-, F-, SO4^2-, PO4^3-, NO3-), pH, redox potential, conductivity and total dissolved substances were performed. Classical diagrams provided the hydrochemical characterization, while statistical approaches were helpful to establish (i) the mechanism of natural occurrence of As and F- species and the anthropogenic one for NO3-, SO4^2-, PO4^3- and K+ and (ii) the classification of groundwater based on the content of arsenic species. The HCO3- type of local groundwater and alkaline pH (8.31-8.49) were found to be responsible for the enrichment of arsenic species and the occurrence of F-, but by different paths. The PO4^3-/AsO4^3- ion exchange and water-rock interaction (silicate hydrolysis and desorption from clay) were associated with arsenate enrichment in the oxidizing aquifer. Fuzzy Hierarchical Cross-Clustering was the strongest tool for the rapid simultaneous classification of groundwaters as a function of arsenic content and hydrogeochemical characteristics. The approach indicated the Na+/F-/pH cluster as a marker for groundwater with naturally elevated As and highlighted which parameters need to be monitored. A chemical conceptual model illustrating the natural and anthropogenic paths and enrichment of As and co-occurring species in the local groundwater, supported by mineralogical analysis of rocks, was established. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. Statistical models for the analysis and design of digital polymerase chain (dPCR) experiments

    USGS Publications Warehouse

    Dorazio, Robert; Hunter, Margaret

    2015-01-01

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log–log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model’s parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.
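
    The core relationship behind these models can be illustrated directly. With partition volume V and concentration lambda, the probability that a partition is positive is p = 1 - exp(-lambda * V), so cloglog(p) = log(-log(1 - p)) = log(lambda) + log(V), which is a binomial GLM with a complementary log-log link and offset log(V). The sketch below uses the closed-form single-sample estimate and a delta-method interval; the partition counts and volume are invented, and this is not the authors' code.

    ```python
    # Closed-form dPCR concentration estimate from the fraction of positive
    # partitions, with an approximate 95% confidence interval.
    import numpy as np

    def dpcr_concentration(n_positive, n_total, volume):
        """Maximum-likelihood concentration (copies per unit volume)."""
        p_hat = n_positive / n_total
        lam = -np.log(1.0 - p_hat) / volume
        # Delta-method standard error on the log-concentration scale
        se_log_lam = np.sqrt(p_hat / (n_total * (1 - p_hat))) / (-np.log(1 - p_hat))
        return lam, se_log_lam

    if __name__ == "__main__":
        # e.g. 6,000 of 20,000 partitions positive, 0.85 nL partitions (in uL)
        lam, se = dpcr_concentration(6000, 20000, volume=0.85e-3)
        lo, hi = lam * np.exp(-1.96 * se), lam * np.exp(1.96 * se)
        print(f"estimated concentration: {lam:.1f} copies/uL "
              f"(95% CI {lo:.1f}-{hi:.1f})")
    ```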

  16. Statistical Models for the Analysis and Design of Digital Polymerase Chain Reaction (dPCR) Experiments.

    PubMed

    Dorazio, Robert M; Hunter, Margaret E

    2015-11-03

    Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary, log-log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model's parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.

  17. Implementation of novel statistical procedures and other advanced approaches to improve analysis of CASA data.

    PubMed

    Ramón, M; Martínez-Pastor, F

    2018-04-23

    Computer-aided sperm analysis (CASA) produces a wealth of data that is frequently ignored. The use of multiparametric statistical methods can help explore these datasets, unveiling the subpopulation structure of sperm samples. In this review we analyse the significance of the internal heterogeneity of sperm samples and its relevance. We also provide a brief description of the statistical tools used for extracting sperm subpopulations from the datasets, namely unsupervised clustering (with non-hierarchical, hierarchical and two-step methods) and the most advanced supervised methods, based on machine learning. The former method has allowed exploration of subpopulation patterns in many species, whereas the latter offers further possibilities, especially considering functional studies and the practical use of subpopulation analysis. We also consider novel approaches, such as the use of geometric morphometrics or imaging flow cytometry. Finally, although the data provided by CASA systems yield valuable information on sperm samples when clustering analyses are applied, there are several caveats. Protocols for capturing and analysing motility or morphometry should be standardised and adapted to each experiment, and the algorithms should be open in order to allow comparison of results between laboratories. Moreover, we must be aware of new technology that could change the paradigm for studying sperm motility and morphology.
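
    A hedged sketch of the non-hierarchical clustering step mentioned above: k-means applied to standardised CASA-style motility descriptors to extract subpopulations. The descriptor names (VCL, VSL, ALH), the simulated values, and the fixed number of clusters are assumptions for illustration; in practice the cluster count would be chosen with a validity index.

    ```python
    # k-means clustering of simulated motility descriptors into subpopulations.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(6)
    # Simulate three latent subpopulations of spermatozoa (rows = cells)
    slow   = rng.normal([ 60, 25, 2.0], [10,  5, 0.5], size=(200, 3))
    medium = rng.normal([120, 60, 3.5], [15,  8, 0.7], size=(200, 3))
    fast   = rng.normal([200, 90, 5.0], [20, 10, 1.0], size=(200, 3))
    X = np.vstack([slow, medium, fast])          # columns: VCL, VSL, ALH

    X_std = StandardScaler().fit_transform(X)    # standardise before clustering
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_std)

    for k in range(3):
        centroid = X[labels == k].mean(axis=0)
        print(f"subpopulation {k}: n={np.sum(labels == k):3d}, "
              f"mean VCL={centroid[0]:5.1f}, VSL={centroid[1]:5.1f}, "
              f"ALH={centroid[2]:4.2f}")
    ```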

  18. Statistical mixture design and multivariate analysis of inkjet printed a-WO3/TiO2/WOX electrochromic films.

    PubMed

    Wojcik, Pawel Jerzy; Pereira, Luís; Martins, Rodrigo; Fortunato, Elvira

    2014-01-13

    An efficient mathematical strategy in the field of solution-processed electrochromic (EC) films is outlined as a combination of experimental work, modeling, and information extraction from massive computational data via statistical software. Design of Experiment (DOE) was used for statistical multivariate analysis and prediction of mixtures through a multiple regression model, as well as the optimization of a five-component sol-gel precursor subjected to complex constraints. This approach significantly reduces the number of experiments to be realized, from 162 in the full factorial (L=3) and 72 in the extreme vertices (D=2) approach down to only 30 runs, while still maintaining a high accuracy of the analysis. By carrying out a finite number of experiments, the empirical modeling in this study shows reasonably good prediction ability in terms of the overall EC performance. An optimized ink formulation was employed in a prototype of a passive EC matrix fabricated in order to test and trial this optically active material system together with a solid-state electrolyte for the prospective application in EC displays. Coupling of DOE with chromogenic material formulation shows the potential to maximize the capabilities of these systems and ensures increased productivity in many potential solution-processed electrochemical applications.

  19. Environmental Health Practice: Statistically Based Performance Measurement

    PubMed Central

    Enander, Richard T.; Gagnon, Ronald N.; Hanumara, R. Choudary; Park, Eugene; Armstrong, Thomas; Gute, David M.

    2007-01-01

    Objectives. State environmental and health protection agencies have traditionally relied on a facility-by-facility inspection-enforcement paradigm to achieve compliance with government regulations. We evaluated the effectiveness of a new approach that uses a self-certification random sampling design. Methods. Comprehensive environmental and occupational health data from a 3-year statewide industry self-certification initiative were collected from representative automotive refinishing facilities located in Rhode Island. Statistical comparisons between baseline and postintervention data facilitated a quantitative evaluation of statewide performance. Results. The analysis of field data collected from 82 randomly selected automotive refinishing facilities showed statistically significant improvements (P<.05, Fisher exact test) in 4 major performance categories: occupational health and safety, air pollution control, hazardous waste management, and wastewater discharge. Statistical significance was also shown when a modified Bonferroni adjustment for multiple comparisons was performed. Conclusions. Our findings suggest that the new self-certification approach to environmental and worker protection is effective and can be used as an adjunct to further enhance state and federal enforcement programs. PMID:17267709
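
    A minimal sketch of the kind of statistical comparison used here: a Fisher exact test on baseline versus post-intervention compliance counts for one performance category. The counts below are invented for illustration and do not reproduce the study's data.

    ```python
    # Fisher exact test on a 2x2 table of compliance counts.
    from scipy.stats import fisher_exact

    # rows: baseline / post-intervention; columns: compliant / non-compliant
    table = [[45, 37],
             [68, 14]]
    odds_ratio, p_value = fisher_exact(table)
    print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
    ```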

  20. Assessing population exposure for landslide risk analysis using dasymetric cartography

    NASA Astrophysics Data System (ADS)

    Garcia, Ricardo A. C.; Oliveira, Sergio C.; Zezere, Jose L.

    2015-04-01

    Exposed population is a major topic that needs to be taken into account in a full landslide risk analysis. Usually, risk analysis is based on a count of inhabitants or on inhabitant density over statistical or administrative terrain units, such as NUTS or parishes. However, this kind of approach may skew the results, underestimating the importance of population, mainly in territorial units dominated by rural land use. Furthermore, the landslide susceptibility scores calculated for each terrain unit are frequently more detailed and accurate than the location of the exposed population inside each territorial unit based on Census data. These drawbacks are not the ideal setting when landslide risk analysis is performed for urban management and emergency planning. Dasymetric cartography, which uses a parameter or set of parameters to restrict the spatial distribution of a particular phenomenon, is a methodology that may help to enhance the resolution of Census data and therefore give a more realistic representation of the population distribution. Therefore, this work aims to map and compare the population distribution based on a traditional approach (population per administrative terrain unit) and based on dasymetric cartography (population per building). The study is developed in the Region North of Lisbon using 2011 population data and follows three main steps: i) landslide susceptibility assessment based on independently validated statistical models; ii) evaluation of the population distribution (absolute and density) for different administrative territorial units (parishes and BGRI, the basic statistical unit in the Portuguese Census); and iii) dasymetric population cartography based on building areal weighting. Preliminary results show that in sparsely populated administrative units, population density differs by more than a factor of two depending on whether the traditional approach or dasymetric cartography is applied. This work was supported by the FCT - Portuguese Foundation for Science and Technology.
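    The building areal weighting step can be illustrated with a small, hypothetical example: each census unit's population is redistributed to its buildings in proportion to building footprint area. The data and column names below are invented.

```python
# Minimal dasymetric disaggregation by building areal weighting.
import pandas as pd

census = pd.DataFrame({"unit": ["A", "B"], "population": [1200, 300]})
buildings = pd.DataFrame({
    "building": [1, 2, 3, 4, 5],
    "unit":     ["A", "A", "A", "B", "B"],
    "area_m2":  [400.0, 250.0, 350.0, 500.0, 100.0],
})

# weight = building area / total residential building area of its census unit
buildings["weight"] = (buildings["area_m2"]
                       / buildings.groupby("unit")["area_m2"].transform("sum"))
dasy = buildings.merge(census, on="unit")
dasy["population"] = dasy["weight"] * dasy["population"]  # population per building
print(dasy[["building", "unit", "population"]])
```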

  1. Power-law statistics of neurophysiological processes analyzed using short signals

    NASA Astrophysics Data System (ADS)

    Pavlova, Olga N.; Runnova, Anastasiya E.; Pavlov, Alexey N.

    2018-04-01

    We discuss the problem of quantifying power-law statistics of complex processes from short signals. Based on the analysis of electroencephalograms (EEG), we compare three interrelated approaches that enable characterization of the power spectral density (PSD) and show that application of detrended fluctuation analysis (DFA) or the wavelet-transform modulus maxima (WTMM) method represents a useful way of indirectly characterizing PSD features from short data sets. We conclude that although DFA- and WTMM-based measures can be obtained from the estimated PSD, these tools outperform standard spectral analysis when the analyzed regime must be characterized from a very limited amount of data.
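    A generic detrended fluctuation analysis (DFA) sketch, assuming a one-dimensional signal such as a short EEG trace; this is a textbook-style implementation for illustration, not the authors' code.

```python
# Detrended fluctuation analysis: integrate the signal, detrend it piecewise,
# and fit the scaling of the residual fluctuation with window size.
import numpy as np

def dfa(x, scales):
    """Return fluctuation F(s) per window size s; the slope of
    log F vs log s estimates the scaling exponent alpha."""
    y = np.cumsum(x - np.mean(x))                 # integrated profile
    F = []
    for s in scales:
        n_seg = len(y) // s
        segs = y[: n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        # least-squares linear detrending in each segment
        resid = [seg - np.polyval(np.polyfit(t, seg, 1), t) for seg in segs]
        F.append(np.sqrt(np.mean(np.concatenate(resid) ** 2)))
    return np.asarray(F)

rng = np.random.default_rng(0)
signal = np.cumsum(rng.standard_normal(2000))     # toy 1/f^2-like series
scales = np.unique(np.logspace(2, 3, 10).astype(int))
F = dfa(signal, scales)
alpha = np.polyfit(np.log(scales), np.log(F), 1)[0]
print(f"DFA exponent alpha ~ {alpha:.2f}")        # PSD slope beta ~ 2*alpha - 1
```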

  2. Surveying Turkish high school and university students' attitudes and approaches to physics problem solving

    NASA Astrophysics Data System (ADS)

    Balta, Nuri; Mason, Andrew J.; Singh, Chandralekha

    2016-06-01

    Students' attitudes and approaches to physics problem solving can impact how well they learn physics and how successful they are in solving physics problems. Prior research in the U.S. using a validated Attitude and Approaches to Problem Solving (AAPS) survey suggests that there are major differences between students in introductory physics and astronomy courses and physics experts in terms of their attitudes and approaches to physics problem solving. Here we discuss the validation, administration, and analysis of data for the Turkish version of the AAPS survey for high school and university students in Turkey. After the validation and administration of the Turkish version of the survey, the analysis of the data was conducted by grouping the data by grade level, school type, and gender. While there are no statistically significant differences between the averages of various groups on the survey overall, the university students in Turkey were more expertlike than vocational high school students. On an item-by-item basis, there are statistically significant differences between the averages of the groups on many items. For example, on average, the university students demonstrated less expertlike attitudes about the role of equations and formulas in problem solving, in solving difficult problems, and in knowing when the solution is not correct, whereas they displayed more expertlike attitudes and approaches on items related to metacognition in physics problem solving. A principal component analysis of the data yields item clusters into which the student responses on various survey items can be grouped. A comparison of the responses of the Turkish and American university students enrolled in algebra-based introductory physics courses shows that on more than half of the items, the responses of these two groups were statistically significantly different, with the U.S. students on average responding to the items in a more expertlike manner.

  3. Community Learning Approach (CLA) for Literacy Promotion of Women in the Fishing Villages of Region I (Philippines).

    ERIC Educational Resources Information Center

    Manlongat, Sylvia

    After an analysis of 1990 Philippines National Statistics Office data showed a high incidence of illiteracy among women in the fishing villages, a project, Community Learning Approach (CLA), was developed to raise the literacy level. It was designed as an alternative delivery system of educating women in 24 villages for functional literacy and…

  4. Introducing linear functions: an alternative statistical approach

    NASA Astrophysics Data System (ADS)

    Nolan, Caroline; Herbert, Sandra

    2015-12-01

    The introduction of linear functions is the turning point where many students decide if mathematics is useful or not. This means the role of parameters and variables in linear functions could be considered to be `threshold concepts'. There is recognition that linear functions can be taught in context through the exploration of linear modelling examples, but this has its limitations. Currently, statistical data is easily attainable, and graphics or computer algebra system (CAS) calculators are common in many classrooms. The use of this technology provides ease of access to different representations of linear functions as well as the ability to fit a least-squares line for real-life data. This means these calculators could support a possible alternative approach to the introduction of linear functions. This study compares the results of an end-of-topic test for two classes of Australian middle secondary students at a regional school to determine if such an alternative approach is feasible. In this study, test questions were grouped by concept and subjected to concept by concept analysis of the means of test results of the two classes. This analysis revealed that the students following the alternative approach demonstrated greater competence with non-standard questions.

  5. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool

    PubMed Central

    Clark, Neil R.; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D.; Jones, Matthew R.; Ma’ayan, Avi

    2016-01-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed well in benchmarking tests and compared well with the most popular methods of differential gene expression analysis. As reported, this approach has a natural extension to gene set analysis, which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method had not been assessed, nor had it been implemented as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs well. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community. PMID:26848405
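    To convey the geometric idea behind the method (not the published PAEA implementation), the sketch below computes principal angles between a subspace spanned by differential-expression signatures and a subspace defined by a gene-set indicator; all data are synthetic.

```python
# Principal angles between two subspaces as an enrichment-style score.
# Synthetic data only; the real PAEA tool defines its subspaces differently.
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)
expression_signatures = rng.standard_normal((500, 3))   # 500 genes, 3 signature vectors
gene_set_indicator = np.zeros((500, 1))
gene_set_indicator[rng.choice(500, size=40, replace=False)] = 1.0  # a 40-gene set

angles = subspace_angles(expression_signatures, gene_set_indicator)
score = np.cos(angles).max()          # small principal angle -> strong enrichment
print(f"smallest principal angle: {np.degrees(angles.min()):.1f} deg, score {score:.3f}")
```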

  6. Principal Angle Enrichment Analysis (PAEA): Dimensionally Reduced Multivariate Gene Set Enrichment Analysis Tool.

    PubMed

    Clark, Neil R; Szymkiewicz, Maciej; Wang, Zichen; Monteiro, Caroline D; Jones, Matthew R; Ma'ayan, Avi

    2015-11-01

    Gene set analysis of differential expression, which identifies collectively differentially expressed gene sets, has become an important tool for biology. The power of this approach lies in its reduction of the dimensionality of the statistical problem and its incorporation of biological interpretation by construction. Many approaches to gene set analysis have been proposed, but benchmarking their performance in the setting of real biological data is difficult due to the lack of a gold standard. In a previously published work we proposed a geometrical approach to differential expression which performed well in benchmarking tests and compared well with the most popular methods of differential gene expression analysis. As reported, this approach has a natural extension to gene set analysis, which we call Principal Angle Enrichment Analysis (PAEA). PAEA employs dimensionality reduction and a multivariate approach for gene set enrichment analysis. However, the performance of this method had not been assessed, nor had it been implemented as a web-based tool. Here we describe new benchmarking protocols for gene set analysis methods and find that PAEA performs well. The PAEA method is implemented as a user-friendly web-based tool, which contains 70 gene set libraries and is freely available to the community.

  7. Using statistical process control for monitoring the prevalence of hospital-acquired pressure ulcers.

    PubMed

    Kottner, Jan; Halfens, Ruud

    2010-05-01

    Institutionally acquired pressure ulcers are used as outcome indicators to assess the quality of pressure ulcer prevention programs. Determining whether quality improvement projects that aim to decrease the proportions of institutionally acquired pressure ulcers lead to real changes in clinical practice depends on the measurement method and statistical analysis used. To examine whether nosocomial pressure ulcer prevalence rates in hospitals in the Netherlands changed, a secondary analysis of annual (1998-2008) nationwide nursing-sensitive health problem prevalence studies in the Netherlands was conducted using different statistical approaches. Institutions that participated regularly in all survey years were identified. Risk-adjusted nosocomial pressure ulcer prevalence rates, grade 2 to 4 (European Pressure Ulcer Advisory Panel system), were calculated per year and hospital. Descriptive statistics, chi-square trend tests, and P charts based on statistical process control (SPC) were applied and compared. Six of the 905 healthcare institutions participated in every survey year, and 11,444 patients in these six hospitals were identified as being at risk for pressure ulcers. Prevalence rates per year ranged from 0.05 to 0.22. Chi-square trend tests revealed statistically significant downward trends in four hospitals, but based on SPC methods, prevalence rates of five hospitals varied by chance only. Results of chi-square trend tests and SPC methods were not comparable, making it impossible to decide which approach is more appropriate. P charts provide more valuable information than single P values and are more helpful for monitoring institutional performance. Empirical evidence about the decrease of nosocomial pressure ulcer prevalence rates in the Netherlands is contradictory and limited.
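    A minimal P chart sketch with 3-sigma limits around the pooled proportion illustrates the SPC reasoning discussed above; the yearly counts are invented, not the Dutch survey data.

```python
# P chart: pooled proportion as centre line, 3-sigma limits that vary with
# subgroup size; points outside the limits signal special-cause variation.
import numpy as np

cases   = np.array([22, 18, 25, 15, 12, 10, 14, 9, 11, 8, 7])     # ulcers per year
at_risk = np.array([180, 175, 190, 170, 160, 165, 172, 150, 158, 149, 140])
p       = cases / at_risk
p_bar   = cases.sum() / at_risk.sum()                  # centre line
sigma   = np.sqrt(p_bar * (1 - p_bar) / at_risk)
ucl     = p_bar + 3 * sigma
lcl     = np.clip(p_bar - 3 * sigma, 0, None)

for year, (pi, lo, hi) in enumerate(zip(p, lcl, ucl), start=1998):
    flag = "special-cause" if (pi > hi or pi < lo) else "common-cause"
    print(f"{year}: p = {pi:.3f}, limits [{lo:.3f}, {hi:.3f}] -> {flag}")
```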

  8. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    DOE PAGES

    Belianinov, Alex; Panchapakesan, G.; Lin, Wenzhi; ...

    2014-12-02

    Atomic level spatial variability of electronic structure in the Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near-neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled (calculated) density of states of chemically inhomogeneous FeTe1-xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data, including separation of atomic identities, proximity, and local configuration effects, and can be universally applicable to chemically and electronically inhomogeneous surfaces.
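    The multivariate workflow described above (dimensionality reduction followed by clustering of a grid of spectra) can be sketched generically as follows; the synthetic array stands in for the actual current-imaging tunneling spectroscopy data.

```python
# Generic PCA-plus-clustering treatment of a spatial grid of spectra.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# 64 x 64 spatial grid, 200-point dI/dV spectrum at each pixel (synthetic)
spectra = rng.standard_normal((64 * 64, 200))

scores = PCA(n_components=5).fit_transform(spectra)          # low-dimensional representation
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
cluster_map = labels.reshape(64, 64)                         # map of dissimilar electronic regions
print(np.bincount(labels))                                   # pixels per electronic class
```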

  9. Research Update: Spatially resolved mapping of electronic structure on atomic level by multivariate statistical analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Belianinov, Alex, E-mail: belianinova@ornl.gov; Ganesh, Panchapakesan; Lin, Wenzhi

    2014-12-01

    Atomic level spatial variability of electronic structure in the Fe-based superconductor FeTe0.55Se0.45 (Tc = 15 K) is explored using current-imaging tunneling spectroscopy. Multivariate statistical analysis of the data differentiates regions of dissimilar electronic behavior that can be identified with the segregation of chalcogen atoms, as well as boundaries between terminations and near-neighbor interactions. Subsequent clustering analysis allows identification of the spatial localization of these dissimilar regions. Similar statistical analysis of modeled (calculated) density of states of chemically inhomogeneous FeTe1−xSex structures further confirms that the two types of chalcogens, i.e., Te and Se, can be identified by their electronic signature and differentiated by their local chemical environment. This approach allows detailed chemical discrimination of the scanning tunneling microscopy data, including separation of atomic identities, proximity, and local configuration effects, and can be universally applicable to chemically and electronically inhomogeneous surfaces.

  10. Treatment of missing data in follow-up studies of randomised controlled trials: A systematic review of the literature.

    PubMed

    Sullivan, Thomas R; Yelland, Lisa N; Lee, Katherine J; Ryan, Philip; Salter, Amy B

    2017-08-01

    After completion of a randomised controlled trial, an extended follow-up period may be initiated to learn about longer term impacts of the intervention. Since extended follow-up studies often involve additional eligibility restrictions and consent processes for participation, and a longer duration of follow-up entails a greater risk of participant attrition, missing data can be a considerable threat in this setting. As a potential source of bias, it is critical that missing data are appropriately handled in the statistical analysis, yet little is known about the treatment of missing data in extended follow-up studies. The aims of this review were to summarise the extent of missing data in extended follow-up studies and the use of statistical approaches to address this potentially serious problem. We performed a systematic literature search in PubMed to identify extended follow-up studies published from January to June 2015. Studies were eligible for inclusion if the original randomised controlled trial results were also published and if the main objective of extended follow-up was to compare the original randomised groups. We recorded information on the extent of missing data and the approach used to treat missing data in the statistical analysis of the primary outcome of the extended follow-up study. Of the 81 studies included in the review, 36 (44%) reported additional eligibility restrictions and 24 (30%) consent processes for entry into extended follow-up. Data were collected at a median of 7 years after randomisation. Excluding 28 studies with a time to event primary outcome, 51/53 studies (96%) reported missing data on the primary outcome. The median percentage of randomised participants with complete data on the primary outcome was just 66% in these studies. The most common statistical approach to address missing data was complete case analysis (51% of studies), while likelihood-based analyses were also well represented (25%). Sensitivity analyses around the missing data mechanism were rarely performed (25% of studies), and when they were, they often involved unrealistic assumptions about the mechanism. Despite missing data being a serious problem in extended follow-up studies, statistical approaches to addressing missing data were often inadequate. We recommend researchers clearly specify all sources of missing data in follow-up studies and use statistical methods that are valid under a plausible assumption about the missing data mechanism. Sensitivity analyses should also be undertaken to assess the robustness of findings to assumptions about the missing data mechanism.
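    A hedged sketch of the contrast between complete-case analysis and a simple multiple-imputation analysis for a continuous follow-up outcome; the data-generating model, the imputation engine, and the pooling shortcut (point estimates only) are illustrative choices, not recommendations from the review.

```python
# Complete-case analysis vs. a simple multiple-imputation analysis.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({"group": rng.integers(0, 2, n), "baseline": rng.standard_normal(n)})
df["outcome"] = 0.4 * df["group"] + 0.8 * df["baseline"] + rng.standard_normal(n)
df.loc[rng.random(n) < 0.3 + 0.2 * df["group"], "outcome"] = np.nan   # outcome-dependent dropout

# 1) complete-case analysis
cc = df.dropna()
print(sm.OLS(cc["outcome"], sm.add_constant(cc[["group", "baseline"]])).fit().params)

# 2) multiple imputation: impute, analyse, and pool the point estimates
effects = []
for m in range(20):
    imp = IterativeImputer(sample_posterior=True, random_state=m).fit_transform(df)
    comp = pd.DataFrame(imp, columns=df.columns)
    fit = sm.OLS(comp["outcome"], sm.add_constant(comp[["group", "baseline"]])).fit()
    effects.append(fit.params["group"])
print("pooled group effect:", np.mean(effects))   # Rubin's rules would also pool variances
```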

  11. Multivariate analysis and geochemical approach for assessment of metal pollution state in sediment cores.

    PubMed

    Jamshidi-Zanjani, Ahmad; Saeedi, Mohsen

    2017-07-01

    The vertical distribution of metals (Cu, Zn, Cr, Fe, Mn, Pb, Ni, Cd, and Li) in four sediment core samples (C1, C2, C3, and C4) from the Anzali international wetland, located southwest of the Caspian Sea, was examined. The background concentration of each metal was calculated according to different statistical approaches. The results of multivariate statistical analysis showed that Fe and Mn might have a significant role in the fate of Ni and Zn in the sediment core samples. Different sediment quality indexes were utilized to assess metal pollution in the sediment cores. Moreover, a new sediment quality index, named the aggregative toxicity index (ATI) and based on sediment quality guidelines (SQGs), was developed to assess the degree of metal toxicity in an aggregative manner. The increasing pattern of metal pollution and toxicity in the upper layers of the core samples indicated increasing effects of anthropogenic sources in the study area.

  12. Navigational Traffic Conflict Technique: A Proactive Approach to Quantitative Measurement of Collision Risks in Port Waters

    NASA Astrophysics Data System (ADS)

    Debnath, Ashim Kumar; Chin, Hoong Chor

    Navigational safety analysis relying on collision statistics is often hampered because of the low number of observations. A promising alternative approach that overcomes this problem is proposed in this paper. By analyzing critical vessel interactions this approach proactively measures collision risk in port waters. The proposed method is illustrated for quantitative measurement of collision risks in Singapore port fairways, and validated by examining correlations between the measured risks with those perceived by pilots. This method is an ethically appealing alternative to the collision-based analysis for fast, reliable and effective safety assessment, thus possessing great potential for managing collision risks in port waters.

  13. The Statistical Consulting Center for Astronomy (SCCA)

    NASA Technical Reports Server (NTRS)

    Akritas, Michael

    2001-01-01

    The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi-squared minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers with the use of sophisticated tools and matched these tools with specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing statistically oriented papers submitted to the Astrophysical Journal, giving talks at meetings (including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting), and publishing papers of astrostatistical content.

  14. Addressing the statistical mechanics of planet orbits in the solar system

    NASA Astrophysics Data System (ADS)

    Mogavero, Federico

    2017-10-01

    The chaotic nature of planet dynamics in the solar system suggests the relevance of a statistical approach to planetary orbits. In such a statistical description, the time-dependent position and velocity of the planets are replaced by the probability density function (PDF) of their orbital elements. It is natural to set up this kind of approach in the framework of statistical mechanics. In the present paper, I focus on the collisionless excitation of eccentricities and inclinations via gravitational interactions in a planetary system. The future planet trajectories in the solar system constitute the prototype of this kind of dynamics. I thus address the statistical mechanics of the solar system planet orbits and try to reproduce the PDFs numerically constructed by Laskar (2008, Icarus, 196, 1). I show that the microcanonical ensemble of the Laplace-Lagrange theory accurately reproduces the statistics of the giant planet orbits. To model the inner planets I then investigate the ansatz of equiprobability in the phase space constrained by the secular integrals of motion. The eccentricity and inclination PDFs of Earth and Venus are reproduced with no free parameters. Within the limitations of a stationary model, the predictions also show a reasonable agreement with Mars PDFs and that of Mercury inclination. The eccentricity of Mercury demands in contrast a deeper analysis. I finally revisit the random walk approach of Laskar to the time dependence of the inner planet PDFs. Such a statistical theory could be combined with direct numerical simulations of planet trajectories in the context of planet formation, which is likely to be a chaotic process.

  15. Comparative Analysis of Academic Grades in Compulsory Secondary Education in Spain Using Statistical Techniques

    ERIC Educational Resources Information Center

    Veas, Alejandro; Gilar, Raquel; Miñano, Pablo; Castejón, Juan Luis

    2017-01-01

    The present study, based on the construct comparability approach, performs a comparative analysis of general points average for seven courses, using exploratory factor analysis (EFA) and the Partial Credit model (PCM) with a sample of 1398 student subjects (M = 12.5, SD = 0.67) from 8 schools in the province of Alicante (Spain). EFA confirmed a…

  16. Collaboration and Synergy among Government, Industry and Academia in M&S Domain: Turkey’s Approach

    DTIC Science & Technology

    2009-10-01

    Analysis, Decision Support System Design and Implementation, Simulation Output Analysis, Statistical Data Analysis, Virtual Reality, Artificial... virtual and constructive visual simulation systems as well as integrated advanced analytical models. Collaboration and Synergy among Government... simulation systems that are ready to use, credible, integrated with C4ISR systems. Creating synthetic environments and/or virtual prototypes of concepts

  17. Asymptotic modal analysis and statistical energy analysis

    NASA Technical Reports Server (NTRS)

    Dowell, Earl H.

    1988-01-01

    Statistical Energy Analysis (SEA) is defined by considering the asymptotic limit of Classical Modal Analysis, an approach called Asymptotic Modal Analysis (AMA). The general approach is described for both structural and acoustical systems. The theoretical foundation is presented for structural systems, and experimental verification is presented for a structural plate responding to a random force. Work accomplished subsequent to the grant initiation focuses on the acoustic response of an interior cavity (i.e., an aircraft or spacecraft fuselage) with a portion of the wall vibrating in a large number of structural modes. First results were presented at the ASME Winter Annual Meeting in December 1987, and accepted for publication in the Journal of Vibration, Acoustics, Stress and Reliability in Design. It is shown that asymptotically as the number of acoustic modes excited becomes large, the pressure level in the cavity becomes uniform except at the cavity boundaries. However, the mean square pressure at the cavity corner, edge and wall is, respectively, 8, 4, and 2 times the value in the cavity interior. It is also shown that when the portion of the wall which is vibrating is near a cavity corner or edge, the response is significantly higher.

  18. Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science).

    PubMed

    Zeng, Irene Sui Lan; Lumley, Thomas

    2018-01-01

    Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary, with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspect and streamline these learning methods within the statistical learning framework. The intriguing findings of the review are that the methods used are generalizable to other disciplines with complex systematic structure, and that integrated omics is part of an integrated information science that collates and integrates different types of information for inference and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization to induce sparsity when there are fewer observations than features, and Bayesian approaches when there is prior knowledge to be integrated, are also included in the commentary. For completeness of the review, a table of currently available software and packages for omics from 23 publications is summarized in the appendix.

  19. A Novel Approach to Detect Accelerated Aged and Surface-Mediated Degradation in Explosives by UPLC-ESI-MS.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beppler, Christina L

    2015-12-01

    A new approach was created for studying energetic material degradation. This approach involved detecting and tentatively identifying, by liquid chromatography-mass spectrometry (LC-MS) with multivariate statistical data analysis, the non-volatile chemical species that form as the CL-20 energetic material thermally degrades. Multivariate data analysis showed clear separation and clustering of samples based on sample group: either pristine or aged material. Further analysis showed counter-clockwise trends in the scores plots of principal component analysis (PCA), a type of multivariate data analysis. These trends may indicate that there was a discrete shift in the chemical markers as the material went from pristine to aged, and then again when the aged CL-20, mixed with a potentially incompatible material, was thermally aged for 4, 6, or 9 months. This new approach to studying energetic material degradation should provide greater knowledge of potential degradation markers in these materials.

  20. Parametric and experimental analysis using a power flow approach

    NASA Technical Reports Server (NTRS)

    Cuschieri, J. M.

    1988-01-01

    Having defined and developed a structural power flow approach for the analysis of structure-borne transmission of structural vibrations, the technique is used to perform an analysis of the influence of structural parameters on the transmitted energy. As a base for comparison, the parametric analysis is first performed using a Statistical Energy Analysis approach and the results compared with those obtained using the power flow approach. The advantages of using structural power flow are thus demonstrated by comparing the type of results obtained by the two methods. Additionally, to demonstrate the advantages of using the power flow method and to show that the power flow results represent a direct physical parameter that can be measured on a typical structure, an experimental investigation of structural power flow is also presented. Results are presented for an L-shaped beam for which an analytical solution has already been obtained. Furthermore, the various methods available to measure vibrational power flow are compared to investigate the advantages and disadvantages of each method.

  1. Valid Statistical Analysis for Logistic Regression with Multiple Sources

    NASA Astrophysics Data System (ADS)

    Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.

    Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex and a number of privacy issues arise for the linked individual files that go well beyond those that are considered with regard to the data within individual sources. In this paper, we propose an approach that gives a full statistical analysis of the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied to other statistical models as well.

  2. Air Quality Forecasting through Different Statistical and Artificial Intelligence Techniques

    NASA Astrophysics Data System (ADS)

    Mishra, D.; Goyal, P.

    2014-12-01

    Urban air pollution forecasting has emerged as an acute problem in recent years because of severe environmental degradation due to the increase of harmful air pollutants in the ambient atmosphere. In this study, different statistical and artificial intelligence techniques are used for forecasting and analysis of air pollution over the Delhi urban area. These techniques are principal component analysis (PCA), multiple linear regression (MLR) and artificial neural networks (ANN), and the forecasts are in good agreement with the concentrations observed by the Central Pollution Control Board (CPCB) at different locations in Delhi. However, such methods have limited accuracy because they are unable to predict extreme values: the pollution maxima and minima cannot be determined using such an approach. With the advancement of technology and research, an alternative to these traditional methods has been proposed, namely the coupling of statistical techniques with artificial intelligence (AI) for forecasting purposes. The coupling of PCA, ANN and fuzzy logic is used for forecasting of air pollutants over the Delhi urban area. The statistical measures, e.g., correlation coefficient (R), normalized mean square error (NMSE), fractional bias (FB) and index of agreement (IOA), of the proposed model compare favourably with those of all the other models. Hence, the coupling of statistical and artificial intelligence techniques can be used for the forecasting of air pollutants over urban areas.
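    A sketch of the statistical baseline described above, chaining principal component analysis with multiple linear regression to forecast a pollutant concentration; the synthetic predictors stand in for meteorological and emission variables and are not CPCB data.

```python
# PCA followed by multiple linear regression in a single pipeline,
# evaluated with correlation coefficient (R) and NMSE on held-out days.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.standard_normal((365, 8))                                  # daily predictors
y = 60 + X @ rng.standard_normal(8) + 5 * rng.standard_normal(365)  # daily concentration

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = make_pipeline(StandardScaler(), PCA(n_components=4), LinearRegression())
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

r = np.corrcoef(y_te, pred)[0, 1]
nmse = np.mean((y_te - pred) ** 2) / (np.mean(y_te) * np.mean(pred))
print(f"R = {r:.2f}, NMSE = {nmse:.3f}")
```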

  3. Robustness of Multiple Objective Decision Analysis Preference Functions

    DTIC Science & Technology

    2002-06-01

    p, p′: the probability of some event. p_i, q_i: the probability of event i. Π: an aggregation of proportional data used in calculating a test ... statistical tests of the significance of the term and also is conducted in a multivariate framework rather than the ROSA univariate approach. ... The residual error is ê = y − ŷ (45). The coefficient provides a ready indicator of the contribution for the associated variable and statistical tests

  4. Measuring Efficiency and Tradeoffs in Attainment of EEO Goals.

    DTIC Science & Technology

    1982-02-01

    in FY78 and FY79, i.e., these goals are based on undifferentiated Civilian Labor Force (CLF) ratios required for reporting by the Equal Employment... Lewis and R.J. Niehaus, "Design and Development of Equal Employment Opportunity Human Resources Planning Models," NPDRC TR 79-141 (San Diego: Navy... Approach to Analysis of Tradeoffs Among Household Production Outputs," American Statistical Association 1979 Proceedings of the Social Statistics Section

  5. Paleomagnetism.org: An online multi-platform open source environment for paleomagnetic data analysis

    NASA Astrophysics Data System (ADS)

    Koymans, Mathijs R.; Langereis, Cor G.; Pastor-Galán, Daniel; van Hinsbergen, Douwe J. J.

    2016-08-01

    This contribution provides an overview of Paleomagnetism.org, an open-source, multi-platform online environment for paleomagnetic data analysis. Paleomagnetism.org provides an interactive environment where paleomagnetic data can be interpreted, evaluated, visualized, and exported. The Paleomagnetism.org application is split into an interpretation portal, a statistics portal, and a portal for miscellaneous paleomagnetic tools. In the interpretation portal, principal component analysis can be performed on visualized demagnetization diagrams. Interpreted directions and great circles can be combined to find great circle solutions. These directions can be used in the statistics portal, or exported as data and figures. The tools in the statistics portal cover standard Fisher statistics for directions and VGPs, including other statistical parameters used as reliability criteria. Other available tools include an eigenvector-approach foldtest, two reversal tests including a Monte Carlo simulation on mean directions, and a coordinate bootstrap on the original data. An implementation is included for the detection and correction of inclination shallowing in sediments following TK03.GAD. Finally, we provide a module to visualize VGPs and expected paleolatitudes, declinations, and inclinations relative to widely used global apparent polar wander path models in coordinates of major continent-bearing plates. The tools in the miscellaneous portal include a net tectonic rotation (NTR) analysis to restore a body to its paleo-vertical and a bootstrapped oroclinal test using linear regression techniques, including a modified foldtest around a vertical axis. Paleomagnetism.org provides an integrated approach for researchers to work with visualized (e.g. hemisphere projections, Zijderveld diagrams) paleomagnetic data. The application constructs a custom exportable file that can be shared freely and included in public databases. This exported file contains all data and can later be imported to the application by other researchers. The accessibility and simplicity through which paleomagnetic data can be interpreted, analyzed, visualized, and shared makes Paleomagnetism.org of interest to the community.
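    The standard Fisher (1953) statistics computed in the statistics portal can be sketched in a few lines; the declinations and inclinations below are made-up example directions, and the code is illustrative rather than the Paleomagnetism.org implementation.

```python
# Fisher mean direction, precision parameter k, and alpha95 cone of confidence.
import numpy as np

dec = np.radians(np.array([351.0, 355.0, 2.0, 8.0, 358.0, 5.0]))   # declinations (deg)
inc = np.radians(np.array([48.0, 52.0, 50.0, 47.0, 55.0, 49.0]))   # inclinations (deg)

# unit vectors (x north, y east, z down)
xyz = np.column_stack([np.cos(inc) * np.cos(dec),
                       np.cos(inc) * np.sin(dec),
                       np.sin(inc)])
R = np.linalg.norm(xyz.sum(axis=0))                 # resultant vector length
N = len(dec)
mean = xyz.sum(axis=0) / R
mean_dec = np.degrees(np.arctan2(mean[1], mean[0])) % 360
mean_inc = np.degrees(np.arcsin(mean[2]))
k = (N - 1) / (N - R)                               # Fisher precision parameter
a95 = np.degrees(np.arccos(1 - (N - R) / R * ((1 / 0.05) ** (1 / (N - 1)) - 1)))
print(f"D = {mean_dec:.1f}, I = {mean_inc:.1f}, k = {k:.1f}, a95 = {a95:.1f} deg")
```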

  6. An intelligent system based on fuzzy probabilities for medical diagnosis– a study in aphasia diagnosis*

    PubMed Central

    Moshtagh-Khorasani, Majid; Akbarzadeh-T, Mohammad-R; Jahangiri, Nader; Khoobdel, Mehdi

    2009-01-01

    BACKGROUND: Aphasia diagnosis is particularly challenging due to the linguistic uncertainty and vagueness, inconsistencies in the definition of aphasic syndromes, large number of measurements with imprecision, natural diversity and subjectivity in test objects as well as in opinions of experts who diagnose the disease. METHODS: Fuzzy probability is proposed here as the basic framework for handling the uncertainties in medical diagnosis and particularly aphasia diagnosis. To efficiently construct this fuzzy probabilistic mapping, statistical analysis is performed that constructs input membership functions as well as determines an effective set of input features. RESULTS: Considering the high sensitivity of performance measures to different distribution of testing/training sets, a statistical t-test of significance is applied to compare fuzzy approach results with NN results as well as author's earlier work using fuzzy logic. The proposed fuzzy probability estimator approach clearly provides better diagnosis for both classes of data sets. Specifically, for the first and second type of fuzzy probability classifiers, i.e. spontaneous speech and comprehensive model, P-values are 2.24E-08 and 0.0059, respectively, strongly rejecting the null hypothesis. CONCLUSIONS: The technique is applied and compared on both comprehensive and spontaneous speech test data for diagnosis of four Aphasia types: Anomic, Broca, Global and Wernicke. Statistical analysis confirms that the proposed approach can significantly improve accuracy using fewer Aphasia features. PMID:21772867

  7. A Class of Population Covariance Matrices in the Bootstrap Approach to Covariance Structure Analysis

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Hayashi, Kentaro; Yanagihara, Hirokazu

    2007-01-01

    Model evaluation in covariance structure analysis is critical before the results can be trusted. Due to finite sample sizes and unknown distributions of real data, existing conclusions regarding a particular statistic may not be applicable in practice. The bootstrap procedure automatically takes care of the unknown distribution and, for a given…

  8. Reply to discussion: ground water response to forest harvest: implications for hillslope stability

    Treesearch

    Amod Dhakal; Roy C. Sidle; A.C. Johnson; R.T. Edwards

    2008-01-01

    Dhakal and Sidle (this volume) have requested clarification of some of the rationales and approaches used in the analyses described by Johnson et al. (2007). Here we further describe hydrologic conditions typical of southeast Alaska and elaborate on an accepted methodology for conducting analysis of covariance (ANCOVA). We discuss Dhakal and Sidle...

  9. Prerequisites for Systems Analysts: Analytic and Management Demands of a New Approach to Educational Administration.

    ERIC Educational Resources Information Center

    Ammentorp, William

    There is much to be gained by using systems analysis in educational administration. Most administrators, presently relying on classical statistical techniques restricted to problems having few variables, should be trained to use more sophisticated tools such as systems analysis. The systems analyst, interested in the basic processes of a group or…

  10. Manipulating measurement scales in medical statistical analysis and data mining: A review of methodologies

    PubMed Central

    Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario

    2014-01-01

    Background: Selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are studied using several medical examples. We present two ordinal-variable clustering examples, as ordinal variables are more challenging to analyze, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: The sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: By using an appropriate clustering algorithm based on the measurement scale of the variables in the study, high performance is achieved. Moreover, descriptive and inferential statistics, in addition to the modeling approach, must be selected based on the scale of the variables. PMID:24672565

  11. Face recognition using an enhanced independent component analysis approach.

    PubMed

    Kwak, Keun-Chang; Pedrycz, Witold

    2007-03-01

    This paper is concerned with an enhanced independent component analysis (ICA) and its application to face recognition. Typically, face representations obtained by ICA involve unsupervised learning and high-order statistics. In this paper, we develop an enhancement of the generic ICA by augmenting this method by the Fisher linear discriminant analysis (LDA); hence, its abbreviation, FICA. The FICA is systematically developed and presented along with its underlying architecture. A comparative analysis explores four distance metrics, as well as classification with support vector machines (SVMs). We demonstrate that the FICA approach leads to the formation of well-separated classes in low-dimension subspace and is endowed with a great deal of insensitivity to large variation in illumination and facial expression. The comprehensive experiments are completed for the facial-recognition technology (FERET) face database; a comparative analysis demonstrates that FICA comes with improved classification rates when compared with some other conventional approaches such as eigenface, fisherface, and the ICA itself.

  12. Local coexistence of VO2 phases revealed by deep data analysis

    DOE PAGES

    Strelcov, Evgheni; Ievlev, Anton; Tselev, Alexander; ...

    2016-07-07

    We report a synergistic approach of micro-Raman spectroscopic mapping and deep data analysis to study the distribution of crystallographic phases and ferroelastic domains in a defected Al-doped VO2 microcrystal. Bayesian linear unmixing revealed an uneven distribution of the T phase, stabilized by surface defects and uneven local doping, that went undetected by other classical analysis techniques such as PCA and SIMPLISMA. This work demonstrates the impact of information recovery via statistical analysis and full mapping in spectroscopic studies of vanadium dioxide systems, which is commonly substituted by averaging or single-point probing approaches, both of which suffer from information misinterpretation due to low resolving power.

  13. A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data

    PubMed Central

    Chen, Yi-Hau

    2017-01-01

    Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample sizes. This limitation also leads to the practice of using a competitive null as a common approach, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimation if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA. PMID:28622336
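    A hedged sketch of a Hotelling-type T2 computed with a knowledge-based covariance for a small pathway; the confidence matrix mimics interaction scores such as those from STRING, but the numbers and the exact construction are invented and do not reproduce the published T2GA procedure.

```python
# Multivariate T2-style statistic with an externally supplied covariance
# structure instead of the sample covariance; all values are synthetic.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n_samples, n_proteins = 6, 4
log_ratios = 0.5 + 0.3 * rng.standard_normal((n_samples, n_proteins))  # pathway members

# symmetric interaction-confidence matrix in [0, 1], ones on the diagonal
conf = np.array([[1.0, 0.8, 0.4, 0.2],
                 [0.8, 1.0, 0.6, 0.3],
                 [0.4, 0.6, 1.0, 0.5],
                 [0.2, 0.3, 0.5, 1.0]])
var = np.var(log_ratios, axis=0, ddof=1)
sigma = conf * np.sqrt(np.outer(var, var))      # correlation-like structure scaled by variances

d = log_ratios.mean(axis=0)                     # mean log expression ratio per protein
t2 = n_samples * d @ np.linalg.solve(sigma, d)  # self-contained null: no change, d = 0
p = chi2.sf(t2, df=n_proteins)                  # rough chi-square reference distribution
print(f"T2 = {t2:.2f}, approx p = {p:.3g}")
```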

  14. A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.

    PubMed

    Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin

    2017-06-01

    Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample sizes. This limitation also leads to the practice of using a competitive null as a common approach, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimation if the sample size is very limited, which is usually the case for the data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA.

  15. Models of dyadic social interaction.

    PubMed Central

    Griffin, Dale; Gonzalez, Richard

    2003-01-01

    We discuss the logic of research designs for dyadic interaction and present statistical models with parameters that are tied to psychologically relevant constructs. Building on Karl Pearson's classic nineteenth-century statistical analysis of within-organism similarity, we describe several approaches to indexing dyadic interdependence and provide graphical methods for visualizing dyadic data. We also describe several statistical and conceptual solutions to the 'levels of analysis' problem in analysing dyadic data. These analytic strategies allow the researcher to examine and measure psychological questions of interdependence and social influence. We provide illustrative data from casually interacting and romantic dyads. PMID:12689382

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kolker, Eugene

    Our project focused primarily on analysis of different types of data produced by global high-throughput technologies, data integration of gene annotation, and gene and protein expression information, as well as on getting a better functional annotation of Shewanella genes. Specifically, four of our numerous major activities and achievements include the development of: statistical models for identification and expression proteomics, superior to currently available approaches (including our own earlier ones); approaches to improve gene annotations on the whole-organism scale; standards for annotation, transcriptomics and proteomics approaches; and generalized approaches for data integration of gene annotation, gene and protein expression information.

  17. Assessing statistical differences between parameters estimates in Partial Least Squares path modeling.

    PubMed

    Rodríguez-Entrena, Macario; Schuberth, Florian; Gelhard, Carsten

    2018-01-01

    Structural equation modeling using partial least squares (PLS-SEM) has become a mainstream modeling approach in various disciplines. Nevertheless, the prior literature still lacks practical guidance on how to properly test for differences between parameter estimates. Whereas existing techniques such as parametric and non-parametric approaches in PLS multi-group analysis only allow the assessment of differences between parameters that are estimated for different subpopulations, the study at hand introduces a technique that also allows assessing whether two parameter estimates derived from the same sample are statistically different. To illustrate this advancement to PLS-SEM, we refer to a reduced version of the well-established technology acceptance model.
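    The general idea, testing whether two parameter estimates derived from the same sample differ, can be sketched with a bootstrap confidence interval for the difference between two coefficients; the example below uses ordinary regression rather than an actual PLS-SEM model.

```python
# Bootstrap CI for the difference between two coefficients from the same sample.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 200
X = rng.standard_normal((n, 2))
y = 0.5 * X[:, 0] + 0.3 * X[:, 1] + rng.standard_normal(n)

def coef_difference(idx):
    Xb, yb = sm.add_constant(X[idx]), y[idx]
    b = sm.OLS(yb, Xb).fit().params
    return b[1] - b[2]                    # difference between the two slopes

boot = np.array([coef_difference(rng.integers(0, n, n)) for _ in range(2000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"95% bootstrap CI for the difference: [{lo:.3f}, {hi:.3f}]")
print("statistically different" if lo > 0 or hi < 0 else "no evidence of a difference")
```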

  18. Sensitivity analysis of helicopter IMC decelerating steep approach and landing performance to navigation system parameters

    NASA Technical Reports Server (NTRS)

    Karmali, M. S.; Phatak, A. V.

    1982-01-01

    Results of a study to investigate, by means of a computer simulation, the performance sensitivity of helicopter IMC DSAL operations as a function of navigation system parameters are presented. A mathematical model representing a generic navigation system is formulated. The scenario simulated consists of a straight-in helicopter approach to landing along a 6 deg glideslope. The deceleration magnitude chosen is 0.3g. The navigation model parameters are varied and the statistics of the total system errors (TSE) computed. These statistics are used to determine the critical navigation system parameters that affect the performance of the closed-loop navigation, guidance and control system of a UH-1H helicopter.

  19. Probabilistic Modeling and Visualization of the Flexibility in Morphable Models

    NASA Astrophysics Data System (ADS)

    Lüthi, M.; Albrecht, T.; Vetter, T.

    Statistical shape models, and in particular morphable models, have gained widespread use in computer vision, computer graphics and medical imaging. Researchers have started to build models of almost any anatomical structure in the human body. While these models provide a useful prior for many image analysis tasks, relatively little information about the shape represented by the morphable model is exploited. We propose a method for computing and visualizing the remaining flexibility when a part of the shape is fixed. Our method, which is based on Probabilistic PCA, not only leads to an approach for reconstructing the full shape from partial information, but also allows us to investigate and visualize the uncertainty of a reconstruction. To show the feasibility of our approach we performed experiments on a statistical model of the human face and the femur bone. The visualization of the remaining flexibility allows for greater insight into the statistical properties of the shape.

  20. A computational visual saliency model based on statistics and machine learning.

    PubMed

    Lin, Ru-Je; Lin, Wei-Song

    2014-08-01

    Identifying the type of stimuli that attracts human visual attention has been an appealing topic for scientists for many years. In particular, marking the salient regions in images is useful for both psychologists and many computer vision applications. In this paper, we propose a computational approach for producing saliency maps using statistics and machine learning methods. Based on four assumptions, three properties (Feature-Prior, Position-Prior, and Feature-Distribution) can be derived and combined by a simple intersection operation to obtain a saliency map. These properties are implemented by a similarity computation, support vector regression (SVR) technique, statistical analysis of training samples, and information theory using low-level features. This technique is able to learn the preferences of human visual behavior while simultaneously considering feature uniqueness. Experimental results show that our approach performs better in predicting human visual attention regions than 12 other models in two test databases. © 2014 ARVO.

  1. A d-statistic for single-case designs that is equivalent to the usual between-groups d-statistic.

    PubMed

    Shadish, William R; Hedges, Larry V; Pustejovsky, James E; Boyajian, Jonathan G; Sullivan, Kristynn J; Andrade, Alma; Barrientos, Jeannette L

    2014-01-01

    We describe a standardised mean difference statistic (d) for single-case designs that is equivalent to the usual d in between-groups experiments. We show how it can be used to summarise treatment effects over cases within a study, to do power analyses in planning new studies and grant proposals, and to meta-analyse effects across studies of the same question. We discuss limitations of this d-statistic, and possible remedies to them. Even so, this d-statistic is better founded statistically than other effect size measures for single-case design, and unlike many general linear model approaches such as multilevel modelling or generalised additive models, it produces a standardised effect size that can be integrated over studies with different outcome measures. SPSS macros for both effect size computation and power analysis are available.
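    For orientation only, the sketch below computes a raw between-phase standardised mean difference for a single AB series; the published d-statistic additionally models between-case variance and autocorrelation, which this toy computation omits.

```python
# Naive standardised mean difference for one AB single-case series.
import numpy as np

baseline  = np.array([12.0, 14.0, 11.0, 13.0, 12.0])    # phase A observations
treatment = np.array([7.0, 6.0, 8.0, 5.0, 6.0, 7.0])    # phase B observations

n_a, n_b = len(baseline), len(treatment)
pooled_sd = np.sqrt(((n_a - 1) * baseline.var(ddof=1) +
                     (n_b - 1) * treatment.var(ddof=1)) / (n_a + n_b - 2))
d = (treatment.mean() - baseline.mean()) / pooled_sd
print(f"standardised mean difference d = {d:.2f}")
```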

  2. Comparing meta-analysis and ecological-longitudinal analysis in time-series studies. A case study of the effects of air pollution on mortality in three Spanish cities.

    PubMed

    Saez, M; Figueiras, A; Ballester, F; Pérez-Hoyos, S; Ocaña, R; Tobías, A

    2001-06-01

    The objective of this paper is to introduce a different approach, called the ecological-longitudinal, to carrying out pooled analysis in time series ecological studies. Because it yields a larger number of data points and hence greater statistical power, this approach accommodates features that are difficult to implement in conventional approaches, such as random effect models, lags, and interactions between pollutants and between pollutants and meteorological variables. The approach is illustrated by providing quantitative estimates of the short-term effects of air pollution on mortality in three Spanish cities, Barcelona, Valencia and Vigo, for the period 1992-1994. Because the dependent variable was a count, a Poisson generalised linear model was first specified. Several modelling issues are worth mentioning. Firstly, because the relations between mortality and explanatory variables were non-linear, cubic splines were used for covariate control, leading to a generalised additive model, GAM. Secondly, the effects of the predictors on the response were allowed to occur with some lag. Thirdly, the residual autocorrelation, due to imperfect control, was controlled for by means of an autoregressive Poisson GAM. Finally, the longitudinal design demanded accounting for individual heterogeneity, which required mixed models. The estimates of the relative risks obtained from the individual analyses varied across cities, particularly those associated with sulphur dioxide. The highest relative risks corresponded to black smoke in Valencia. These estimates were higher than those obtained from the ecological-longitudinal analysis. Relative risks estimated from this latter analysis were practically identical across cities, 1.00638 (95% CI 1.0002, 1.0011) for a black smoke increase of 10 microg/m(3) and 1.00415 (95% CI 1.0001, 1.0007) for an increase of 10 microg/m(3) of sulphur dioxide. Because the statistical power is higher than in the individual analyses, more interactions were statistically significant, especially those among air pollutants and meteorological variables. Air pollutant levels were related to mortality in the three cities of the study, Barcelona, Valencia and Vigo. These results are consistent with similar studies in other cities and with other multicentric studies, and they are coherent with both the previous individual analyses for each city and multicentric studies of all three cities.
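
    As a rough illustration of the kind of lagged Poisson regression described above (a plain GLM rather than the autoregressive Poisson GAM with spline covariate control and mixed effects actually used), the sketch below regresses daily mortality counts on lagged pollutant levels for a single hypothetical city; all variable names and data are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical daily time series for one city: mortality counts, black smoke,
# SO2 and temperature (replace with real data).
rng = np.random.default_rng(1)
n = 365
df = pd.DataFrame({
    "deaths": rng.poisson(20, n),
    "black_smoke": rng.gamma(4, 10, n),
    "so2": rng.gamma(3, 8, n),
    "temp": 15 + 10 * np.sin(np.arange(n) * 2 * np.pi / 365),
})

# Allow pollutant effects to act with a lag (here 2 days), as in the study design.
for var in ["black_smoke", "so2"]:
    df[f"{var}_lag2"] = df[var].shift(2)
df = df.dropna()

X = sm.add_constant(df[["black_smoke_lag2", "so2_lag2", "temp"]])
model = sm.GLM(df["deaths"], X, family=sm.families.Poisson()).fit()
# exp(10 * beta) approximates the relative risk for a 10 microg/m3 increase.
print(np.exp(10 * model.params[["black_smoke_lag2", "so2_lag2"]]))
```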

  3. Decision Process to Identify Lessons for Transition to a Distributed (or Blended) Learning Instructional Format

    DTIC Science & Technology

    2009-09-01

    instructional format. Using a mixed-method coding and analysis approach, the sample of POIs were categorized, coded, statistically analyzed, and a... transition to a distributed (or blended) learning format. Procedure: A mixed-methods approach, combining qualitative coding procedures with basic

  4. Intelligent Systems Approaches to Product Sound Quality Analysis

    NASA Astrophysics Data System (ADS)

    Pietila, Glenn M.

    As a product market becomes more competitive, consumers become more discriminating in the way in which they differentiate between engineered products. The consumer often makes a purchasing decision based on the sound emitted from the product during operation by using the sound to judge quality or annoyance. Therefore, in recent years, many sound quality analysis tools have been developed to evaluate the consumer preference as it relates to a product sound and to quantify this preference based on objective measurements. This understanding can be used to direct a product design process in order to help differentiate the product from competitive products or to establish an impression on consumers regarding a product's quality or robustness. The sound quality process is typically a statistical tool that is used to model subjective preference, or merit score, based on objective measurements, or metrics. In this way, new product developments can be evaluated in an objective manner without the laborious process of gathering a sample population of consumers for subjective studies each time. The most common model used today is the Multiple Linear Regression (MLR), although recently non-linear Artificial Neural Network (ANN) approaches are gaining popularity. This dissertation will review publicly available published literature and present additional intelligent systems approaches that can be used to improve on the current sound quality process. The focus of this work is to address shortcomings in the current paired comparison approach to sound quality analysis. This research will propose a framework for an adaptive jury analysis approach as an alternative to the current Bradley-Terry model. The adaptive jury framework uses statistical hypothesis testing to focus on sound pairings that are most interesting and is expected to address some of the restrictions required by the Bradley-Terry model. It will also provide a more amicable framework for an intelligent systems approach. Next, an unsupervised jury clustering algorithm is used to identify and classify subgroups within a jury who have conflicting preferences. In addition, a nested Artificial Neural Network (ANN) architecture is developed to predict subjective preference based on objective sound quality metrics, in the presence of non-linear preferences. Finally, statistical decomposition and correlation algorithms are reviewed that can help an analyst establish a clear understanding of the variability of the product sounds used as inputs into the jury study and to identify correlations between preference scores and sound quality metrics in the presence of non-linearities.

  5. Multiple imputation for IPD meta-analysis: allowing for heterogeneity and studies with missing covariates.

    PubMed

    Quartagno, M; Carpenter, J R

    2016-07-30

    Recently, multiple imputation has been proposed as a tool for individual patient data meta-analysis with sporadically missing observations, and it has been suggested that within-study imputation is usually preferable. However, such within study imputation cannot handle variables that are completely missing within studies. Further, if some of the contributing studies are relatively small, it may be appropriate to share information across studies when imputing. In this paper, we develop and evaluate a joint modelling approach to multiple imputation of individual patient data in meta-analysis, with an across-study probability distribution for the study specific covariance matrices. This retains the flexibility to allow for between-study heterogeneity when imputing while allowing (i) sharing information on the covariance matrix across studies when this is appropriate, and (ii) imputing variables that are wholly missing from studies. Simulation results show both equivalent performance to the within-study imputation approach where this is valid, and good results in more general, practically relevant, scenarios with studies of very different sizes, non-negligible between-study heterogeneity and wholly missing variables. We illustrate our approach using data from an individual patient data meta-analysis of hypertension trials. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  6. Evaluation of peak-picking algorithms for protein mass spectrometry.

    PubMed

    Bauer, Chris; Cramer, Rainer; Schuchhardt, Johannes

    2011-01-01

    Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. Methods investigated encompass signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. Functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
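
    The sketch below contrasts two of the peak-picking strategies compared above (a simple signal-to-noise threshold and a continuous wavelet transform picker) on a synthetic MALDI-like trace. It is only an illustrative stand-in for the implementations that were evaluated; the thresholds and peak widths are arbitrary.

```python
import numpy as np
from scipy.signal import find_peaks, find_peaks_cwt

# Synthetic MALDI-like trace: a few Gaussian peaks on a noisy baseline.
rng = np.random.default_rng(2)
mz = np.linspace(1000, 1100, 2000)
signal = sum(a * np.exp(-0.5 * ((mz - c) / 0.3) ** 2)
             for a, c in [(50, 1020), (80, 1045), (35, 1078)])
trace = signal + rng.normal(0, 2, mz.size)

# 1) Signal-to-noise approach: keep maxima exceeding k * (robust noise estimate).
noise = np.median(np.abs(trace - np.median(trace))) / 0.6745   # MAD-based sigma
snr_peaks, _ = find_peaks(trace, height=5 * noise, distance=20)

# 2) Continuous wavelet transform approach over a range of expected peak widths.
cwt_peaks = find_peaks_cwt(trace, widths=np.arange(3, 15))

print("SNR picker:", mz[snr_peaks])
print("CWT picker:", mz[cwt_peaks])
```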

  7. Clinical Evaluation of Dental Restorative Materials

    DTIC Science & Technology

    1989-01-01

    use of an Actuarial Life Table Survival Analysis procedure. The median survival time for anterior composites was 13.5 years, as compared to 12.1 years...dental materials. For the first time in clinical biomaterials research, we used a statistical approach of Survival Analysis which utilized the... analysis has been established to assure uniformity in usage. This scale is now in use by clinical investigators throughout the country. Its use at the

  8. Breath Analysis as a Potential and Non-Invasive Frontier in Disease Diagnosis: An Overview

    PubMed Central

    Pereira, Jorge; Porto-Figueira, Priscilla; Cavaco, Carina; Taunk, Khushman; Rapole, Srikanth; Dhakne, Rahul; Nagarajaram, Hampapathalu; Câmara, José S.

    2015-01-01

    Currently, a small number of diseases, particularly cardiovascular (CVDs), oncologic (ODs), neurodegenerative (NDDs), chronic respiratory diseases, as well as diabetes, form a severe burden to most of the countries worldwide. Hence, there is an urgent need for development of efficient diagnostic tools, particularly those enabling reliable detection of diseases, at their early stages, preferably using non-invasive approaches. Breath analysis is a non-invasive approach relying only on the characterisation of volatile composition of the exhaled breath (EB) that in turn reflects the volatile composition of the bloodstream and airways and therefore the status and condition of the whole organism metabolism. Advanced sampling procedures (solid-phase and needle traps microextraction) coupled with modern analytical technologies (proton transfer reaction mass spectrometry, selected ion flow tube mass spectrometry, ion mobility spectrometry, e-noses, etc.) allow the characterisation of EB composition to an unprecedented level. However, a key challenge in EB analysis is the proper statistical analysis and interpretation of the large and heterogeneous datasets obtained from EB research. There is no standard statistical framework/protocol yet available in literature that can be used for EB data analysis towards discovery of biomarkers for use in a typical clinical setup. Nevertheless, EB analysis has immense potential towards the development of biomarkers for the early diagnosis of diseases. PMID:25584743

  9. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

    PubMed

    Bonham-Carter, Oliver; Steele, Joe; Bastola, Dhundy

    2014-11-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
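
    As a minimal illustration of word-based, alignment-free comparison, the sketch below counts overlapping k-mers in two toy DNA sequences and computes the basic D2 statistic (the sum, over shared words, of the product of their counts). The refinements discussed in the review (normalised D2 variants, compositional vectors, Lempel-Ziv measures) are not shown.

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Count overlapping words (k-mers) in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2_statistic(seq_a, seq_b, k=3):
    """Basic D2 statistic: sum over shared words of the product of their counts."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())

s1 = "ATGCGATACGCTTGCATGCA"
s2 = "ATGCGTTACGCATGCATGAA"
print(d2_statistic(s1, s2, k=3))
```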

  10. Breath analysis as a potential and non-invasive frontier in disease diagnosis: an overview.

    PubMed

    Pereira, Jorge; Porto-Figueira, Priscilla; Cavaco, Carina; Taunk, Khushman; Rapole, Srikanth; Dhakne, Rahul; Nagarajaram, Hampapathalu; Câmara, José S

    2015-01-09

    Currently, a small number of diseases, particularly cardiovascular (CVDs), oncologic (ODs), neurodegenerative (NDDs), chronic respiratory diseases, as well as diabetes, form a severe burden to most of the countries worldwide. Hence, there is an urgent need for development of efficient diagnostic tools, particularly those enabling reliable detection of diseases, at their early stages, preferably using non-invasive approaches. Breath analysis is a non-invasive approach relying only on the characterisation of volatile composition of the exhaled breath (EB) that in turn reflects the volatile composition of the bloodstream and airways and therefore the status and condition of the whole organism metabolism. Advanced sampling procedures (solid-phase and needle traps microextraction) coupled with modern analytical technologies (proton transfer reaction mass spectrometry, selected ion flow tube mass spectrometry, ion mobility spectrometry, e-noses, etc.) allow the characterisation of EB composition to an unprecedented level. However, a key challenge in EB analysis is the proper statistical analysis and interpretation of the large and heterogeneous datasets obtained from EB research. There is no standard statistical framework/protocol yet available in literature that can be used for EB data analysis towards discovery of biomarkers for use in a typical clinical setup. Nevertheless, EB analysis has immense potential towards the development of biomarkers for the early diagnosis of diseases.

  11. How to Perform a Systematic Review and Meta-analysis of Diagnostic Imaging Studies.

    PubMed

    Cronin, Paul; Kelly, Aine Marie; Altaee, Duaa; Foerster, Bradley; Petrou, Myria; Dwamena, Ben A

    2018-05-01

    A systematic review is a comprehensive search, critical evaluation, and synthesis of all the relevant studies on a specific (clinical) topic that can be applied to the evaluation of diagnostic and screening imaging studies. It can be a qualitative or a quantitative (meta-analysis) review of available literature. A meta-analysis uses statistical methods to combine and summarize the results of several studies. In this review, a 12-step approach to performing a systematic review (and meta-analysis) is outlined under the four domains: (1) Problem Formulation and Data Acquisition, (2) Quality Appraisal of Eligible Studies, (3) Statistical Analysis of Quantitative Data, and (4) Clinical Interpretation of the Evidence. This review is specifically geared toward the performance of a systematic review and meta-analysis of diagnostic test accuracy (imaging) studies. Copyright © 2018 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.

  12. Evaluation of the prediction precision capability of partial least squares regression approach for analysis of high alloy steel by laser induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Sarkar, Arnab; Karki, Vijay; Aggarwal, Suresh K.; Maurya, Gulab S.; Kumar, Rohit; Rai, Awadhesh K.; Mao, Xianglei; Russo, Richard E.

    2015-06-01

    Laser induced breakdown spectroscopy (LIBS) was applied for elemental characterization of high alloy steel using partial least squares regression (PLSR), with the objective of evaluating the analytical performance of this multivariate approach. The optimization of the number of principal components for minimizing error in the PLSR algorithm was investigated. The effect of different pre-treatment procedures on the raw spectral data before PLSR analysis was evaluated based on several statistical parameters (standard error of prediction, percentage relative error of prediction, etc.). The pre-treatment with the "NORM" parameter gave the optimum statistical results. The analytical performance of the PLSR model improved with an increasing number of laser pulses accumulated per spectrum as well as by truncating the spectrum to an appropriate wavelength region. It was found that the statistical benefit of truncating the spectrum can also be accomplished by increasing the number of laser pulses per accumulation without spectral truncation. The constituents (Co and Mo) present at hundreds of ppm were determined with a relative precision of 4-9% (2σ), whereas the major constituents Cr and Ni (present at a few percent levels) were determined with a relative precision of ~2% (2σ).
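
    A hedged sketch of the general PLSR workflow is shown below: the number of latent variables is chosen by cross-validated RMSE on stand-in spectra. It does not reproduce the study's LIBS spectra, the "NORM" pre-treatment, or its figures of merit; the data are synthetic.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

# Hypothetical stand-in for LIBS spectra (rows) and a reference concentration (y).
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 500))                    # 40 spectra x 500 wavelength channels
true_w = rng.normal(size=500)
y = X @ true_w * 0.01 + rng.normal(0, 0.05, 40)   # synthetic concentrations

# Pick the number of latent variables that minimizes cross-validated RMSE.
rmse = []
for n_comp in range(1, 11):
    y_cv = cross_val_predict(PLSRegression(n_components=n_comp), X, y, cv=5)
    rmse.append(np.sqrt(np.mean((y - y_cv.ravel()) ** 2)))
best = int(np.argmin(rmse)) + 1
print("optimal number of components:", best, "RMSECV:", round(min(rmse), 4))
```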

  13. Modelling and analysis of FMS productivity variables by ISM, SEM and GTMA approach

    NASA Astrophysics Data System (ADS)

    Jain, Vineet; Raj, Tilak

    2014-09-01

    Productivity has often been cited as a key factor in flexible manufacturing system (FMS) performance, and actions to increase it are said to improve profitability and the wage-earning capacity of employees. Improving productivity is seen as a key issue for the long-term survival and success of a manufacturing system. The purpose of this paper is to model and analyse the productivity variables of an FMS. The study was performed using different approaches, viz. interpretive structural modelling (ISM), structural equation modelling (SEM), graph theory and matrix approach (GTMA), and a cross-sectional survey within manufacturing firms in India. ISM was used to develop a model of the productivity variables, which was then analyzed. Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are powerful statistical techniques; CFA is carried out through SEM. EFA was applied to extract the factors in FMS using the Statistical Package for the Social Sciences (SPSS 20) software, and these factors were confirmed by CFA using the Analysis of Moment Structures (AMOS 20) software. Twenty productivity variables were identified through the literature and four factors were extracted that capture the productivity of FMS: people, quality, machine and flexibility. SEM using AMOS 20 was used to fit the first-order four-factor structure. GTMA is a multiple attribute decision making (MADM) methodology used to quantify the intensity of the productivity variables in an organization. An FMS productivity index is proposed to quantify the intensity of the factors which affect FMS.

  14. Multivariate statistical analysis of a high rate biofilm process treating kraft mill bleach plant effluent.

    PubMed

    Goode, C; LeRoy, J; Allen, D G

    2007-01-01

    This study reports on a multivariate analysis of the moving bed biofilm reactor (MBBR) wastewater treatment system at a Canadian pulp mill. The modelling approach involved a data overview by principal component analysis (PCA) followed by partial least squares (PLS) modelling with the objective of explaining and predicting changes in the BOD output of the reactor. Over two years of data with 87 process measurements were used to build the models. Variables were collected from the MBBR control scheme as well as upstream in the bleach plant and in digestion. To account for process dynamics, a variable lagging approach was used for variables with significant temporal correlations. It was found that wood type pulped at the mill was a significant variable governing reactor performance. Other important variables included flow parameters, faults in the temperature or pH control of the reactor, and some potential indirect indicators of biomass activity (residual nitrogen and pH out). The most predictive model was found to have an RMSEP value of 606 kgBOD/d, representing a 14.5% average error. This was a good fit, given the measurement error of the BOD test. Overall, the statistical approach was effective in describing and predicting MBBR treatment performance.

  15. Integrality in cervical cancer care: evaluation of access

    PubMed Central

    Brito-Silva, Keila; Bezerra, Adriana Falangola Benjamin; Chaves, Lucieli Dias Pedreschi; Tanaka, Oswaldo Yoshimi

    2014-01-01

    OBJECTIVE To evaluate integrity of access to uterine cervical cancer prevention, diagnosis and treatment services. METHODS The tracer condition was analyzed using a mixed quantitative and qualitative approach. The quantitative approach was based on secondary data from the analysis of cytology and biopsy exams performed between 2008 and 2010 on 25 to 59 year-old women in a municipality with a large population and with the necessary technological resources. Data were obtained from the Health Information System and the Regional Cervical Cancer Information System. Statistical analysis was performed using PASW Statistics 17.0 software. The qualitative approach involved semi-structured interviews with service managers, health care professionals and users. NVivo 9.0 software was used for the content analysis of the primary data. RESULTS Pap smear coverage was low, possibly due to insufficient screening and the difficulty of making appointments in primary care. The numbers of biopsies conducted are similar to those of abnormal cytologies, reflecting easy access to the specialized services. There was higher coverage among younger women. More serious diagnoses, for both cytologies and biopsies, were more prevalent in older women. CONCLUSIONS The insufficient coverage of cytologies reported by the interviewees points to access difficulties in primary care, as well as to the fragility of the screening strategies.

  16. Analysis of Emergency Diesel Generators Failure Incidents in Nuclear Power Plants

    NASA Astrophysics Data System (ADS)

    Hunt, Ronderio LaDavis

    In their early years of operation, emergency diesel generators (EDGs) had a minimal rate of demand failures. EDGs are designed to operate as a backup when the main source of electricity has been disrupted. More recently, EDGs have been failing at nuclear power plants (NPPs) around the United States, causing either station blackouts or loss of onsite and offsite power. These failures were of a specific type called demand failures. This thesis evaluated a problem of growing concern in the nuclear industry: the rate rose from an average of 1 EDG demand failure per year in 1997 to an excessive event of 4 EDG demand failures in a single year in 2011. To estimate when such an extreme event might recur and to identify its possible causes, two analyses were conducted: a statistical analysis and a root cause analysis. The statistical analysis applied an extreme event probability approach to estimate the year of the next occurrence of an excessive event as well as the probability of that event occurring. The root cause analysis examined the potential causes of the excessive event by evaluating the EDG manufacturers, aging, policy changes and maintenance practices, and failure components, and it investigated the correlation between the demand failure data and historical data. The statistical analysis indicated that an excessive event could be expected within a fixed range of probability, with a wider range of probability obtained from the extreme event probability approach. The root cause analysis of the demand failure data followed historical statistics for EDG manufacturer, aging, and policy changes/maintenance practices, but pointed to the failure components as a possible cause of the excessive event. The conclusions showed that predicting, with an acceptable confidence level, the next excessive demand failure year, its probability, and the next occurrence year of such failures was difficult, but that this type of failure is unlikely to be a 100-year event. Notably, the majority of EDG demand failures since 2005 occurred within the main components. The overall analysis, based on the computed percentages, supports the statement that the excessive event was caused by the overall age (wear and tear) of the emergency diesel generators in nuclear power plants. Future work will better determine the return period of the excessive event, once it has occurred a second time, by applying the extreme event probability approach.
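
    The abstract does not specify the form of the extreme event probability approach, so the sketch below is only a generic illustration of the underlying idea: if demand failures in a year are treated as Poisson events with a historical mean rate, the probability of an excessive year (four or more failures) and its return period follow directly. The rate used is hypothetical.

```python
from scipy.stats import poisson

lam = 1.0            # hypothetical historical mean rate: demand failures per year
threshold = 4        # the "excessive event": 4 or more failures in one year

p_exceed = 1.0 - poisson.cdf(threshold - 1, lam)   # P(N >= 4) in a given year
return_period = 1.0 / p_exceed                     # expected years between such events

print(f"P(>= {threshold} failures in a year) = {p_exceed:.4f}")
print(f"approximate return period = {return_period:.0f} years")
```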

  17. To What Extent Is Mathematical Ability Predictive of Performance in a Methodology and Statistics Course? Can an Action Research Approach Be Used to Understand the Relevance of Mathematical Ability in Psychology Undergraduates?

    ERIC Educational Resources Information Center

    Bourne, Victoria J.

    2014-01-01

    Research methods and statistical analysis is typically the least liked and most anxiety provoking aspect of a psychology undergraduate degree, in large part due to the mathematical component of the content. In this first cycle of a piece of action research, students' mathematical ability is examined in relation to their performance across…

  18. Incorporating principal component analysis into air quality model evaluation

    EPA Science Inventory

    The efficacy of standard air quality model evaluation techniques is becoming compromised as the simulation periods continue to lengthen in response to ever increasing computing capacity. Accordingly, the purpose of this paper is to demonstrate a statistical approach called Princi...

  19. The use of multivariate statistics in studies of wildlife habitat

    Treesearch

    David E. Capen

    1981-01-01

    This report contains edited and reviewed versions of papers presented at a workshop held at the University of Vermont in April 1980. Topics include sampling avian habitats, multivariate methods, applications, examples, and new approaches to analysis and interpretation.

  20. Accounting for competing risks in randomized controlled trials: a review and recommendations for improvement.

    PubMed

    Austin, Peter C; Fine, Jason P

    2017-04-15

    In studies with survival or time-to-event outcomes, a competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. Specialized statistical methods must be used to analyze survival data in the presence of competing risks. We conducted a review of randomized controlled trials with survival outcomes that were published in high-impact general medical journals. Of 40 studies that we identified, 31 (77.5%) were potentially susceptible to competing risks. However, in the majority of these studies, the potential presence of competing risks was not accounted for in the statistical analyses that were described. Of the 31 studies potentially susceptible to competing risks, 24 (77.4%) reported the results of a Kaplan-Meier survival analysis, while only five (16.1%) reported using cumulative incidence functions to estimate the incidence of the outcome over time in the presence of competing risks. The former approach will tend to result in an overestimate of the incidence of the outcome over time, while the latter approach will result in unbiased estimation of the incidence of the primary outcome over time. We provide recommendations on the analysis and reporting of randomized controlled trials with survival outcomes in the presence of competing risks. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
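
    To make the contrast concrete, the sketch below computes a nonparametric cumulative incidence function for the event of interest in the presence of a competing risk, the quantity the authors recommend in place of 1 minus the Kaplan-Meier estimate. It is a generic textbook-style estimator on made-up data, not the paper's code.

```python
import numpy as np

def cumulative_incidence(time, event, cause=1):
    """Nonparametric cumulative incidence for `cause`, treating other causes as competing.

    event: 0 = censored, 1, 2, ... = cause of failure.
    Returns the event times and the estimated CIF at those times.
    """
    time, event = np.asarray(time, float), np.asarray(event, int)
    order = np.argsort(time)
    time, event = time[order], event[order]
    times, cif = [], []
    surv = 1.0          # all-cause Kaplan-Meier survival just before t
    inc = 0.0
    for t in np.unique(time):
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (event == cause))
        d_all = np.sum((time == t) & (event > 0))
        inc += surv * d_cause / at_risk          # increment of the CIF at t
        surv *= 1.0 - d_all / at_risk            # update all-cause survival
        times.append(t)
        cif.append(inc)
    return np.array(times), np.array(cif)

t = [2, 3, 3, 5, 8, 9, 12, 14]
e = [1, 0, 2, 1, 2, 1, 0, 1]   # 1 = event of interest, 2 = competing event, 0 = censored
print(cumulative_incidence(t, e, cause=1))
```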

  1. A DMAIC approach for process capability improvement an engine crankshaft manufacturing process

    NASA Astrophysics Data System (ADS)

    Sharma, G. V. S. S.; Rao, P. Srinivasa

    2014-05-01

    The define-measure-analyze-improve-control (DMAIC) approach is a five-stratum, scientific approach for reducing deviations and improving the capability levels of manufacturing processes. The present work elaborates on the DMAIC approach applied to reducing the process variation of the stub-end-hole boring operation in crankshaft manufacture. This statistical process control study starts with selection of the critical-to-quality (CTQ) characteristic in the define stratum. The next stratum constitutes the collection of dimensional measurement data for the identified CTQ characteristic. This is followed by the analysis and improvement strata, where various quality control tools such as the Ishikawa diagram, physical mechanism analysis, failure modes and effects analysis, and analysis of variance are applied. Finally, process monitoring charts are deployed at the workplace for regular monitoring and control of the concerned CTQ characteristic. By adopting the DMAIC approach, the standard deviation was reduced from 0.003 to 0.002, the process potential capability index (Cp) improved from 1.29 to 2.02, and the process performance capability index (Cpk) improved from 0.32 to 1.45.
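
    A minimal sketch of the capability indices quoted above is given below: Cp compares the specification width with six standard deviations of the process, and Cpk additionally penalises off-centre processes. The specification limits and measurements in the example are hypothetical; only the formulas correspond to the indices reported.

```python
import numpy as np

def process_capability(data, lsl, usl):
    """Process potential (Cp) and performance (Cpk) capability indices."""
    mu, sigma = np.mean(data), np.std(data, ddof=1)
    cp = (usl - lsl) / (6.0 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3.0 * sigma)
    return cp, cpk

# Hypothetical stub-end-hole diameters (mm) and specification limits.
rng = np.random.default_rng(4)
measurements = rng.normal(loc=25.000, scale=0.002, size=50)
cp, cpk = process_capability(measurements, lsl=24.988, usl=25.012)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```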

  2. Application of microarray analysis on computer cluster and cloud platforms.

    PubMed

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
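
    The computational independence mentioned above is what makes these analyses easy to parallelise. The sketch below splits a permutation test for a difference in means across worker processes with Python's multiprocessing; it is a generic illustration, not the microarray procedures or the cluster and cloud setups benchmarked in the article.

```python
import numpy as np
from multiprocessing import Pool

def permuted_stats(args):
    """One worker's batch of permutation replicates of the mean difference."""
    x, y, n_perm, seed = args
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    stats = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(pooled)
        stats[i] = perm[:len(x)].mean() - perm[len(x):].mean()
    return stats

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    x, y = rng.normal(0.3, 1, 30), rng.normal(0.0, 1, 30)
    observed = x.mean() - y.mean()
    n_workers, per_worker = 4, 2500
    with Pool(n_workers) as pool:                 # each worker runs independently
        batches = pool.map(permuted_stats, [(x, y, per_worker, s) for s in range(n_workers)])
    null = np.concatenate(batches)
    p_value = np.mean(np.abs(null) >= abs(observed))
    print(f"permutation p-value: {p_value:.4f}")
```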

  3. Toward improved analysis of concentration data: Embracing nondetects.

    PubMed

    Shoari, Niloofar; Dubé, Jean-Sébastien

    2018-03-01

    Various statistical tests on concentration data serve to support decision-making regarding characterization and monitoring of contaminated media, assessing exposure to a chemical, and quantifying the associated risks. However, the routine statistical protocols cannot be directly applied because of challenges arising from nondetects or left-censored observations, which are concentration measurements below the detection limit of measuring instruments. Despite the existence of techniques based on survival analysis that can adjust for nondetects, these are seldom taken into account properly. A comprehensive review of the literature showed that managing policies regarding analysis of censored data do not always agree and that guidance from regulatory agencies may be outdated. Therefore, researchers and practitioners commonly resort to the most convenient way of tackling the censored data problem by substituting nondetects with arbitrary constants prior to data analysis, although this is generally regarded as a bias-prone approach. Hoping to improve the interpretation of concentration data, the present article aims to familiarize researchers in different disciplines with the significance of left-censored observations and provides theoretical and computational recommendations (under both frequentist and Bayesian frameworks) for adequate analysis of censored data. In particular, the present article synthesizes key findings from previous research with respect to 3 noteworthy aspects of inferential statistics: estimation of descriptive statistics, hypothesis testing, and regression analysis. Environ Toxicol Chem 2018;37:643-656. © 2017 SETAC.
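
    One of the standard alternatives to substituting constants is to let nondetects contribute their censoring probability to the likelihood. The sketch below fits a lognormal to concentration data with left-censored observations by maximum likelihood; it is a minimal frequentist illustration under an assumed lognormal model, not the article's full set of recommendations.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def fit_lognormal_censored(values, detected):
    """MLE of a lognormal for left-censored data.

    values: measured concentration if detected, else the detection limit.
    detected: boolean array, False for nondetects (< detection limit).
    """
    logs = np.log(np.asarray(values, float))
    detected = np.asarray(detected, bool)

    def negloglik(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)
        ll_det = norm.logpdf(logs[detected], mu, sigma)        # exact observations
        ll_cen = norm.logcdf(logs[~detected], mu, sigma)       # P(X < detection limit)
        return -(ll_det.sum() + ll_cen.sum())

    start = np.array([logs.mean(), np.log(logs.std() + 1e-6)])
    res = minimize(negloglik, start, method="Nelder-Mead")
    mu, sigma = res.x[0], np.exp(res.x[1])
    return mu, sigma           # parameters of the log-concentration distribution

conc = [0.5, 0.5, 1.2, 3.4, 0.9, 0.5, 2.2, 5.1]        # 0.5 is the detection limit
det = [False, False, True, True, True, False, True, True]
mu, sigma = fit_lognormal_censored(conc, det)
print("geometric mean:", round(np.exp(mu), 3), "GSD:", round(np.exp(sigma), 3))
```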

  4. Assessment of Reliable Change Using 95% Credible Intervals for the Differences in Proportions: A Statistical Analysis for Case-Study Methodology.

    PubMed

    Unicomb, Rachael; Colyvas, Kim; Harrison, Elisabeth; Hewat, Sally

    2015-06-01

    Case-study methodology studying change is often used in the field of speech-language pathology, but it can be criticized for not being statistically robust. Yet with the heterogeneous nature of many communication disorders, case studies allow clinicians and researchers to closely observe and report on change. Such information is valuable and can further inform large-scale experimental designs. In this research note, a statistical analysis for case-study data is outlined that employs a modification to the Reliable Change Index (Jacobson & Truax, 1991). The relationship between reliable change and clinical significance is discussed. Example data are used to guide the reader through the use and application of this analysis. A method of analysis is detailed that is suitable for assessing change in measures with binary categorical outcomes. The analysis is illustrated using data from one individual, measured before and after treatment for stuttering. The application of this approach to assess change in categorical, binary data has potential application in speech-language pathology. It enables clinicians and researchers to analyze results from case studies for their statistical and clinical significance. This new method addresses a gap in the research design literature, that is, the lack of analysis methods for noncontinuous data (such as counts, rates, proportions of events) that may be used in case-study designs.

  5. Exploring the relationship between the engineering and physical sciences and the health and life sciences by advanced bibliometric methods.

    PubMed

    Waltman, Ludo; van Raan, Anthony F J; Smart, Sue

    2014-01-01

    We investigate the extent to which advances in the health and life sciences (HLS) are dependent on research in the engineering and physical sciences (EPS), particularly physics, chemistry, mathematics, and engineering. The analysis combines two different bibliometric approaches. The first approach to analyze the 'EPS-HLS interface' is based on term map visualizations of HLS research fields. We consider 16 clinical fields and five life science fields. On the basis of expert judgment, EPS research in these fields is studied by identifying EPS-related terms in the term maps. In the second approach, a large-scale citation-based network analysis is applied to publications from all fields of science. We work with about 22,000 clusters of publications, each representing a topic in the scientific literature. Citation relations are used to identify topics at the EPS-HLS interface. The two approaches complement each other. The advantages of working with textual data compensate for the limitations of working with citation relations and the other way around. An important advantage of working with textual data is in the in-depth qualitative insights it provides. Working with citation relations, on the other hand, yields many relevant quantitative statistics. We find that EPS research contributes to HLS developments mainly in the following five ways: new materials and their properties; chemical methods for analysis and molecular synthesis; imaging of parts of the body as well as of biomaterial surfaces; medical engineering mainly related to imaging, radiation therapy, signal processing technology, and other medical instrumentation; mathematical and statistical methods for data analysis. In our analysis, about 10% of all EPS and HLS publications are classified as being at the EPS-HLS interface. This percentage has remained more or less constant during the past decade.

  6. Exploring the Relationship between the Engineering and Physical Sciences and the Health and Life Sciences by Advanced Bibliometric Methods

    PubMed Central

    Waltman, Ludo; van Raan, Anthony F. J.; Smart, Sue

    2014-01-01

    We investigate the extent to which advances in the health and life sciences (HLS) are dependent on research in the engineering and physical sciences (EPS), particularly physics, chemistry, mathematics, and engineering. The analysis combines two different bibliometric approaches. The first approach to analyze the ‘EPS-HLS interface’ is based on term map visualizations of HLS research fields. We consider 16 clinical fields and five life science fields. On the basis of expert judgment, EPS research in these fields is studied by identifying EPS-related terms in the term maps. In the second approach, a large-scale citation-based network analysis is applied to publications from all fields of science. We work with about 22,000 clusters of publications, each representing a topic in the scientific literature. Citation relations are used to identify topics at the EPS-HLS interface. The two approaches complement each other. The advantages of working with textual data compensate for the limitations of working with citation relations and the other way around. An important advantage of working with textual data is in the in-depth qualitative insights it provides. Working with citation relations, on the other hand, yields many relevant quantitative statistics. We find that EPS research contributes to HLS developments mainly in the following five ways: new materials and their properties; chemical methods for analysis and molecular synthesis; imaging of parts of the body as well as of biomaterial surfaces; medical engineering mainly related to imaging, radiation therapy, signal processing technology, and other medical instrumentation; mathematical and statistical methods for data analysis. In our analysis, about 10% of all EPS and HLS publications are classified as being at the EPS-HLS interface. This percentage has remained more or less constant during the past decade. PMID:25360616

  7. Diagnosis of students' ability in a statistical course based on Rasch probabilistic outcome

    NASA Astrophysics Data System (ADS)

    Mahmud, Zamalia; Ramli, Wan Syahira Wan; Sapri, Shamsiah; Ahmad, Sanizah

    2017-06-01

    Measuring students' ability and performance is important in assessing how well students have learned and mastered statistical courses. Any improvement in learning will depend on the students' approaches to learning, which relate to several factors of learning, namely assessment methods comprising quizzes, tests, assignments and a final examination. This study attempts an alternative approach to measuring students' ability in an undergraduate statistics course based on the Rasch probabilistic model. Firstly, the study explores the learning outcome patterns of students in a statistics course (Applied Probability and Statistics) based on an Entrance-Exit survey. This is followed by investigating students' perceived learning ability based on four Course Learning Outcomes (CLOs) and students' actual learning ability based on their final examination scores. Rasch analysis revealed that students perceived themselves as lacking the ability to understand about 95% of the statistics concepts at the beginning of the class, but they eventually reached a good understanding by the end of the 14-week class. In terms of students' performance in the final examination, their ability to understand the topics varies with the probability values implied by the ability of the students and the difficulty of the questions. The majority found the probability and counting rules topic to be the most difficult to learn.
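
    For orientation, the dichotomous Rasch model underlying such an analysis gives the probability of a correct response as a logistic function of the difference between person ability and item difficulty. The sketch below evaluates that probability for hypothetical ability and difficulty values; it does not use the study's data or software.

```python
import numpy as np

def rasch_probability(theta, b):
    """Dichotomous Rasch model: P(correct | ability theta, item difficulty b)."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

abilities = np.array([-1.0, 0.0, 1.5])        # hypothetical student abilities (logits)
difficulty = 0.8                               # e.g. a hard "probability and counting rules" item
print(np.round(rasch_probability(abilities, difficulty), 3))
```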

  8. Correlation spectrometer for filtering of (quasi) elastic neutron scattering with variable resolution

    NASA Astrophysics Data System (ADS)

    Magazù, Salvatore; Mezei, Ferenc; Migliardo, Federica

    2018-05-01

    In a variety of applications of inelastic neutron scattering spectroscopy the goal is to single out the elastic scattering contribution from the total scattered spectrum as a function of momentum transfer and sample environment parameters. The elastic part of the spectrum is defined in such a case by the energy resolution of the spectrometer. Variable elastic energy resolution offers a way to distinguish between elastic and quasi-elastic intensities. Correlation spectroscopy lends itself as an efficient, high intensity approach for accomplishing this both at continuous and pulsed neutron sources. On the one hand, in beam modulation methods the Liouville theorem coupling between intensity and resolution is relaxed and time-of-flight velocity analysis of the neutron velocity distribution can be performed with 50 % duty factor exposure for all available resolutions. On the other hand, the (quasi)elastic part of the spectrum generally contains the major part of the integrated intensity at a given detector, and thus correlation spectroscopy can be applied with most favorable signal to statistical noise ratio. The novel spectrometer CORELLI at SNS is an example for this type of application of the correlation technique at a pulsed source. On a continuous neutron source a statistical chopper can be used for quasi-random time dependent beam modulation and the total time-of-flight of the neutron from the statistical chopper to detection is determined by the analysis of the correlation between the temporal fluctuation of the neutron detection rate and the statistical chopper beam modulation pattern. The correlation analysis can either be used for the determination of the incoming neutron velocity or for the scattered neutron velocity, depending of the position of the statistical chopper along the neutron trajectory. These two options are considered together with an evaluation of spectrometer performance compared to conventional spectroscopy, in particular for variable resolution elastic neutron scattering (RENS) studies of relaxation processes and the evolution of mean square displacements. A particular focus of our analysis is the unique feature of correlation spectroscopy of delivering high and resolution independent beam intensity, thus the same statistical chopper scan contains both high intensity and high resolution information at the same time, and can be evaluated both ways. This flexibility for variable resolution data handling represents an additional asset for correlation spectroscopy in variable resolution work. Changing the beam width for the same statistical chopper allows us to additionally trade resolution for intensity in two different experimental runs, similarly for conventional single slit chopper spectroscopy. The combination of these two approaches is a capability of particular value in neutron spectroscopy studies requiring variable energy resolution, such as the systematic study of quasi-elastic scattering and mean square displacement. Furthermore the statistical chopper approach is particularly advantageous for studying samples with low scattering intensity in the presence of a high, sample independent background.
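
    The sketch below is an idealized, noise-free illustration of the correlation analysis described above: a time-of-flight spectrum is modulated by a pseudo-random (maximum-length sequence) chopper pattern and recovered by cross-correlating the detector signal with that pattern, exploiting the two-valued autocorrelation of the sequence. It is not a model of CORELLI or of any specific instrument, and the spectrum is invented.

```python
import numpy as np
from scipy.signal import max_len_seq

# Pseudo-random (maximum-length sequence) chopper pattern: open (1) / closed (0).
nbits = 8
m = max_len_seq(nbits)[0].astype(float)      # length N = 2**nbits - 1 = 255
N = m.size

# Hypothetical time-of-flight spectrum to be recovered (two quasi-elastic-like lines).
t = np.arange(N)
s = 3.0 * np.exp(-0.5 * ((t - 90) / 6.0) ** 2) + 1.5 * np.exp(-0.5 * ((t - 160) / 10.0) ** 2)

# Detector count rate: circular convolution of the spectrum with the chopper pattern.
z = np.array([np.sum(m[(k - t) % N] * s) for k in range(N)])

# Correlation analysis: cross-correlate the count rate with the chopper pattern.
C = np.array([np.sum(z * m[(np.arange(N) - k) % N]) for k in range(N)])

# For a 0/1 m-sequence the autocorrelation is two-valued, so the spectrum follows from
# s_hat[k] = 4*C[k]/(N+1) - S_total, with S_total recovered from the total counts.
S_total = 2.0 * z.sum() / (N + 1)
s_hat = 4.0 * C / (N + 1) - S_total
print("max reconstruction error:", np.abs(s_hat - s).max())
```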

  9. The effect of normalization of Partial Directed Coherence on the statistical assessment of connectivity patterns: a simulation study.

    PubMed

    Toppi, J; Petti, M; Vecchiato, G; Cincotti, F; Salinari, S; Mattia, D; Babiloni, F; Astolfi, L

    2013-01-01

    Partial Directed Coherence (PDC) is a spectral multivariate estimator of effective connectivity, relying on the concept of Granger causality. Although its original definition derived directly from information theory, two modifications were introduced in order to provide better physiological interpretations of the estimated networks: i) normalization of the estimator according to rows, ii) squared transformation. In the present paper we investigated the effect of PDC normalization on the performance achieved by applying the statistical validation process to the investigated connectivity patterns under different conditions of signal-to-noise ratio (SNR) and amount of data available for the analysis. Results of the statistical analysis revealed an effect of PDC normalization only on the percentages of type I and type II errors incurred when using the shuffling procedure for the assessment of connectivity patterns. The PDC formulation had no effect on the performance achieved when the validation process was instead executed by means of the asymptotic statistic approach. Moreover, the percentages of both false positives and false negatives committed by the asymptotic statistic approach are always lower than those achieved by the shuffling procedure for each type of normalization.

  10. External validation of ADO, DOSE, COTE and CODEX at predicting death in primary care patients with COPD using standard and machine learning approaches.

    PubMed

    Morales, Daniel R; Flynn, Rob; Zhang, Jianguo; Trucco, Emmanuel; Quint, Jennifer K; Zutis, Kris

    2018-05-01

    Several models for predicting the risk of death in people with chronic obstructive pulmonary disease (COPD) exist but have not undergone large scale validation in primary care. The objective of this study was to externally validate these models using statistical and machine learning approaches. We used a primary care COPD cohort identified using data from the UK Clinical Practice Research Datalink. Age-standardised mortality rates were calculated for the population by gender and discrimination of ADO (age, dyspnoea, airflow obstruction), COTE (COPD-specific comorbidity test), DOSE (dyspnoea, airflow obstruction, smoking, exacerbations) and CODEX (comorbidity, dyspnoea, airflow obstruction, exacerbations) at predicting death over 1-3 years measured using logistic regression and a support vector machine learning (SVM) method of analysis. The age-standardised mortality rate was 32.8 (95%CI 32.5-33.1) and 25.2 (95%CI 25.4-25.7) per 1000 person years for men and women respectively. Complete data were available for 54879 patients to predict 1-year mortality. ADO performed the best (c-statistic of 0.730) compared with DOSE (c-statistic 0.645), COTE (c-statistic 0.655) and CODEX (c-statistic 0.649) at predicting 1-year mortality. Discrimination of ADO and DOSE improved at predicting 1-year mortality when combined with COTE comorbidities (c-statistic 0.780 ADO + COTE; c-statistic 0.727 DOSE + COTE). Discrimination did not change significantly over 1-3 years. Comparable results were observed using SVM. In primary care, ADO appears superior at predicting death in COPD. Performance of ADO and DOSE improved when combined with COTE comorbidities suggesting better models may be generated with additional data facilitated using novel approaches. Copyright © 2018. Published by Elsevier Ltd.
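
    The c-statistic reported above is the area under the ROC curve. The sketch below computes it for a logistic regression of 1-year death on a few hypothetical ADO-style predictors; the data are simulated and the model is not the validated ADO, DOSE, COTE or CODEX score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical predictors standing in for ADO-style components (age, dyspnoea grade, FEV1 %).
rng = np.random.default_rng(6)
n = 5000
X = np.column_stack([rng.normal(70, 8, n), rng.integers(0, 5, n), rng.normal(55, 15, n)])
logit = 0.06 * (X[:, 0] - 70) + 0.4 * X[:, 1] - 0.03 * (X[:, 2] - 55) - 2.5
died_1yr = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(X, died_1yr, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
c_statistic = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"c-statistic (ROC AUC): {c_statistic:.3f}")
```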

  11. A Dimensionally Reduced Clustering Methodology for Heterogeneous Occupational Medicine Data Mining.

    PubMed

    Saâdaoui, Foued; Bertrand, Pierre R; Boudet, Gil; Rouffiac, Karine; Dutheil, Frédéric; Chamoux, Alain

    2015-10-01

    Clustering is a set of statistical learning techniques aimed at finding structure in heterogeneous data by grouping homogeneous observations into clusters. There are several fields in which clustering has been successfully applied, such as medicine, biology, finance and economics. In this paper, we introduce the notion of clustering in multifactorial data analysis problems. A case study is conducted for an occupational medicine problem with the purpose of analyzing patterns in a population of 813 individuals. To reduce the dimensionality of the data set, we base our approach on Principal Component Analysis (PCA), which is the statistical tool most commonly used in factorial analysis. However, problems in nature, and especially in medicine, are often based on heterogeneous qualitative-quantitative measurements, whereas PCA only processes quantitative ones. Moreover, qualitative data are originally unobservable quantitative responses that are usually binary-coded. Hence, we propose a new set of strategies allowing quantitative and qualitative data to be handled simultaneously. The principle of this approach is to project the qualitative variables onto the subspaces spanned by the quantitative ones. Subsequently, an optimal model is allocated to the resulting PCA-regressed subspaces.

  12. Stochastic dynamic analysis of marine risers considering Gaussian system uncertainties

    NASA Astrophysics Data System (ADS)

    Ni, Pinghe; Li, Jun; Hao, Hong; Xia, Yong

    2018-03-01

    This paper performs a stochastic dynamic response analysis of marine risers with material uncertainties, i.e. in the mass density and elastic modulus, by using the Stochastic Finite Element Method (SFEM) and a model reduction technique. These uncertainties are assumed to have Gaussian distributions. The random mass density and elastic modulus are represented by using the Karhunen-Loève (KL) expansion. The Polynomial Chaos (PC) expansion is adopted to represent the vibration response because the covariance of the output is unknown. Model reduction based on the Iterated Improved Reduced System (IIRS) technique is applied to eliminate the PC coefficients of the slave degrees of freedom and thus reduce the dimension of the stochastic system. Monte Carlo Simulation (MCS) is conducted to obtain the reference response statistics. Two numerical examples are studied in this paper. The response statistics from the proposed approach are compared with those from MCS. It is noted that the computational time is significantly reduced while the accuracy is maintained. The results demonstrate the efficiency of the proposed approach for stochastic dynamic response analysis of marine risers.

  13. Trabecular bone analysis in CT and X-ray images of the proximal femur for the assessment of local bone quality.

    PubMed

    Fritscher, Karl; Grunerbl, Agnes; Hanni, Markus; Suhm, Norbert; Hengg, Clemens; Schubert, Rainer

    2009-10-01

    Currently, conventional X-ray and CT images as well as invasive methods performed during the surgical intervention are used to judge the local quality of a fractured proximal femur. However, these approaches are either dependent on the surgeon's experience or cannot assist diagnostic and planning tasks preoperatively. Therefore, in this work a method for the individual analysis of local bone quality in the proximal femur, based on model-based analysis of CT and X-ray images of femur specimens, is proposed. A combined representation of shape and spatial intensity distribution of an object and different statistical approaches for dimensionality reduction are used to create a statistical appearance model in order to assess the local bone quality in CT and X-ray images. The developed algorithms are tested and evaluated on 28 femur specimens. It is shown that the tools and algorithms presented herein are highly suitable for automatically and objectively predicting bone mineral density values as well as a biomechanical parameter of the bone that can be measured intraoperatively.

  14. Mediation analysis in nursing research: a methodological review.

    PubMed

    Liu, Jianghong; Ulrich, Connie

    2016-12-01

    Mediation statistical models help clarify the relationship between independent predictor variables and dependent outcomes of interest by assessing the impact of third variables. This type of statistical analysis is applicable for many clinical nursing research questions, yet its use within nursing remains low. Indeed, mediational analyses may help nurse researchers develop more effective and accurate prevention and treatment programs as well as help bridge the gap between scientific knowledge and clinical practice. In addition, this statistical approach allows nurse researchers to ask - and answer - more meaningful and nuanced questions that extend beyond merely determining whether an outcome occurs. Therefore, the goal of this paper is to provide a brief tutorial on the use of mediational analyses in clinical nursing research by briefly introducing the technique and, through selected empirical examples from the nursing literature, demonstrating its applicability in advancing nursing science.
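
    As a minimal illustration of the product-of-coefficients logic behind mediation analysis, the sketch below estimates the path from exposure to mediator (a) and from mediator to outcome adjusting for exposure (b) with ordinary least squares, and reports the indirect effect a*b. The variables are invented, and modern practice would add bootstrap confidence intervals or a causal-inference formulation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: does a nursing intervention (x) reduce a readmission risk score (y)
# partly through improved self-care ability (m, the mediator)?
rng = np.random.default_rng(7)
n = 300
x = rng.binomial(1, 0.5, n)                      # intervention (0/1)
m = 0.8 * x + rng.normal(0, 1, n)                # mediator
y = 0.5 * m + 0.3 * x + rng.normal(0, 1, n)      # outcome
df = pd.DataFrame({"x": x, "m": m, "y": y})

path_a = smf.ols("m ~ x", df).fit().params["x"]          # x -> m
fit_b = smf.ols("y ~ m + x", df).fit()
path_b = fit_b.params["m"]                               # m -> y, adjusting for x
direct = fit_b.params["x"]                               # direct effect of x on y
indirect = path_a * path_b                               # mediated (indirect) effect
print(f"indirect effect = {indirect:.3f}, direct effect = {direct:.3f}")
```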

  15. Structure-Specific Statistical Mapping of White Matter Tracts

    PubMed Central

    Yushkevich, Paul A.; Zhang, Hui; Simon, Tony; Gee, James C.

    2008-01-01

    We present a new model-based framework for the statistical analysis of diffusion imaging data associated with specific white matter tracts. The framework takes advantage of the fact that several of the major white matter tracts are thin sheet-like structures that can be effectively modeled by medial representations. The approach involves segmenting major tracts and fitting them with deformable geometric medial models. The medial representation makes it possible to average and combine tensor-based features along directions locally perpendicular to the tracts, thus reducing data dimensionality and accounting for errors in normalization. The framework enables the analysis of individual white matter structures, and provides a range of possibilities for computing statistics and visualizing differences between cohorts. The framework is demonstrated in a study of white matter differences in pediatric chromosome 22q11.2 deletion syndrome. PMID:18407524

  16. Feasibility Study on the Use of On-line Multivariate Statistical Process Control for Safeguards Applications in Natural Uranium Conversion Plants

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ladd-Lively, Jennifer L

    2014-01-01

    The objective of this work was to determine the feasibility of using on-line multivariate statistical process control (MSPC) for safeguards applications in natural uranium conversion plants. Multivariate statistical process control is commonly used throughout industry for the detection of faults. For safeguards applications in uranium conversion plants, faults could include the diversion of intermediate products such as uranium dioxide, uranium tetrafluoride, and uranium hexafluoride. This study was limited to a 100 metric ton of uranium (MTU) per year natural uranium conversion plant (NUCP) using the wet solvent extraction method for the purification of uranium ore concentrate. A key component in the multivariate statistical methodology is the Principal Component Analysis (PCA) approach for the analysis of data, development of the base case model, and evaluation of future operations. The PCA approach was implemented through the use of singular value decomposition of the data matrix where the data matrix represents normal operation of the plant. Component mole balances were used to model each of the process units in the NUCP. However, this approach could be applied to any data set. The monitoring framework developed in this research could be used to determine whether or not a diversion of material has occurred at an NUCP as part of an International Atomic Energy Agency (IAEA) safeguards system. This approach can be used to identify the key monitoring locations, as well as locations where monitoring is unimportant. Detection limits at the key monitoring locations can also be established using this technique. Several faulty scenarios were developed to test the monitoring framework after the base case or normal operating conditions of the PCA model were established. In all of the scenarios, the monitoring framework was able to detect the fault. Overall this study was successful at meeting the stated objective.
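
    A generic sketch of the PCA-based monitoring scheme is shown below: a model is built by singular value decomposition of normal-operation data, and new observations are flagged through Hotelling's T-squared and the squared prediction error (SPE/Q). It does not reproduce the NUCP mole-balance data, detection limits, or diversion scenarios from the study; the data are random stand-ins.

```python
import numpy as np

def fit_pca_monitor(X_normal, n_components):
    """Build a PCA monitoring model (via SVD) from normal-operation data."""
    mu, sd = X_normal.mean(axis=0), X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mu) / sd
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                           # loadings
    lam = (S[:n_components] ** 2) / (Z.shape[0] - 1)  # variances of the retained scores
    return mu, sd, P, lam

def monitor(x_new, mu, sd, P, lam):
    """Hotelling's T^2 and squared prediction error (SPE/Q) for one new observation."""
    z = (x_new - mu) / sd
    t = P.T @ z                                       # scores
    t2 = np.sum(t ** 2 / lam)
    residual = z - P @ t
    spe = residual @ residual
    return t2, spe

rng = np.random.default_rng(8)
X_normal = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))   # "normal operation"
mu, sd, P, lam = fit_pca_monitor(X_normal, n_components=3)

x_ok = X_normal[0]
x_fault = x_ok + 6 * sd * rng.normal(size=10)       # a simulated process fault / diversion
print("normal:", monitor(x_ok, mu, sd, P, lam))
print("fault: ", monitor(x_fault, mu, sd, P, lam))
```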

  17. Principal components analysis of an evaluation of the hemiplegic subject based on the Bobath approach.

    PubMed

    Corriveau, H; Arsenault, A B; Dutil, E; Lepage, Y

    1992-01-01

    An evaluation based on the Bobath approach to treatment has previously been developed and partially validated. The purpose of the present study was to verify the content validity of this evaluation with the use of a statistical approach known as principal components analysis. Thirty-eight hemiplegic subjects participated in the study. The scores on each of six parameters (sensorium, active movements, muscle tone, reflex activity, postural reactions, and pain) were analyzed on three occasions across a 2-month period. Each time this produced three factors that contained 70% of the variation in the data set. The first component mainly reflected variations in mobility, the second mainly variations in muscle tone, and the third mainly variations in sensorium and pain. The results of such exploratory analysis highlight the fact that some of the parameters are not only important but also interrelated. These results seem to partially support the conceptual framework substantiating the Bobath approach to treatment.

  18. Response to Comments on "Ducklings imprint on the relational concept of 'same or different'".

    PubMed

    Martinho, Antone; Kacelnik, Alex

    2017-02-24

    Two Comments by Hupé and by Langbein and Puppe address our choice of statistical analysis in assigning preference between sets of stimuli to individual ducklings in our paper. We believe that our analysis remains the most appropriate approach for our data and experimental design. Copyright © 2017, American Association for the Advancement of Science.

  19. Citation Patterns of Engineering, Statistics, and Computer Science Researchers: An Internal and External Citation Analysis across Multiple Engineering Subfields

    ERIC Educational Resources Information Center

    Kelly, Madeline

    2015-01-01

    This study takes a multidimensional approach to citation analysis, examining citations in multiple subfields of engineering, from both scholarly journals and doctoral dissertations. The three major goals of the study are to determine whether there are differences between citations drawn from dissertations and those drawn from journal articles; to…

  20. Using "Excel" for White's Test--An Important Technique for Evaluating the Equality of Variance Assumption and Model Specification in a Regression Analysis

    ERIC Educational Resources Information Center

    Berenson, Mark L.

    2013-01-01

    There is consensus in the statistical literature that severe departures from its assumptions invalidate the use of regression modeling for purposes of inference. The assumptions of regression modeling are usually evaluated subjectively through visual, graphic displays in a residual analysis but such an approach, taken alone, may be insufficient…

  1. Entropy Production in Collisionless Systems. II. Arbitrary Phase-space Occupation Numbers

    NASA Astrophysics Data System (ADS)

    Barnes, Eric I.; Williams, Liliya L. R.

    2012-04-01

    We present an analysis of two thermodynamic techniques for determining equilibria of self-gravitating systems. One is the Lynden-Bell (LB) entropy maximization analysis that introduced violent relaxation. Since we do not use the Stirling approximation, which is invalid at small occupation numbers, our systems have finite mass, unlike LB's isothermal spheres. (Instead of Stirling, we utilize a very accurate smooth approximation for ln x!.) The second analysis extends entropy production extremization to self-gravitating systems, also without the use of the Stirling approximation. In addition to the LB statistical family characterized by the exclusion principle in phase space, and designed to treat collisionless systems, we also apply the two approaches to the Maxwell-Boltzmann (MB) families, which have no exclusion principle and hence represent collisional systems. We implicitly assume that all of the phase space is equally accessible. We derive entropy production expressions for both families and give the extremum conditions for entropy production. Surprisingly, our analysis indicates that extremizing entropy production rate results in systems that have maximum entropy, in both LB and MB statistics. In other words, both thermodynamic approaches lead to the same equilibrium structures.

  2. Automation method to identify the geological structure of seabed using spatial statistic analysis of echo sounding data

    NASA Astrophysics Data System (ADS)

    Kwon, O.; Kim, W.; Kim, J.

    2017-12-01

    Construction of subsea tunnels has recently increased globally. For safe construction of a subsea tunnel, identifying geological structures, including faults, at the design and construction stages is critically important. Unlike tunnels on land, however, it is very difficult to obtain data on geological structure because of the limits of geological surveys at sea. This study addresses these difficulties by developing a technology to identify the geological structure of the seabed automatically using echo sounding data. When investigating a potential site for a deep subsea tunnel, boreholes and geophysical investigations face technical and economic limits; in contrast, echo sounding data are easily obtainable and comparatively reliable. The study is aimed at developing an algorithm that identifies large-scale geological structures of the seabed using a geostatistical approach, based on the structural-geology premise that topographic features indicate geological structure. The basic concept of the algorithm is as follows: (1) convert the seabed topography to grid data using echo sounding data, (2) apply a moving window of optimal size to the grid data, (3) estimate the spatial statistics of the grid data in the window area, (4) set a percentile standard for the spatial statistics, (5) display the values satisfying the standard on the map, and (6) visualize the geological structure on the map. The key elements of this study are the optimal size of the moving window, the choice of spatial statistics, and the determination of the optimal percentile standard. To determine these optimal elements, numerous simulations were performed. Eventually, a user program based on R was developed using the resulting analysis algorithm. The program was designed to show the variation of various spatial statistics and lets the user easily designate the type of spatial statistic and the percentile standard, so that the geological structure can be analysed under varying spatial statistics. This research was supported by the Korea Agency for Infrastructure Technology Advancement under the Ministry of Land, Infrastructure and Transport of the Korean government. (Project Number: 13 Construction Research T01)
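
    Steps (2)-(5) of the algorithm sketched above can be illustrated with a simple moving-window computation. The Python sketch below is a generic stand-in, not the study's R program: the grid, window size, choice of statistic (local standard deviation) and percentile threshold are all placeholder assumptions.

```python
# Moving-window sketch of steps (2)-(5): compute a local spatial statistic over a
# sliding window and flag grid cells above a chosen percentile. Window size,
# statistic and percentile are placeholders, not the study's tuned values.
import numpy as np

def local_stat_map(grid, win=5, stat=np.nanstd):
    """Return a map of `stat` computed over a win x win neighbourhood."""
    half = win // 2
    padded = np.pad(grid, half, mode="edge")
    out = np.empty_like(grid, dtype=float)
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            out[i, j] = stat(padded[i:i + win, j:j + win])
    return out

rng = np.random.default_rng(2)
depth = rng.normal(-50.0, 1.0, size=(200, 200))   # hypothetical seabed grid (m)
depth[:, 100:] -= 5.0                             # a step mimicking a fault scarp

roughness = local_stat_map(depth, win=5)          # step (3): local statistic
threshold = np.percentile(roughness, 95)          # step (4): percentile standard
lineament_mask = roughness >= threshold           # step (5): cells to display
print(lineament_mask.sum(), "flagged cells")
```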

  3. Statistical Analysis of the Uncertainty in Pre-Flight Aerodynamic Database of a Hypersonic Vehicle

    NASA Astrophysics Data System (ADS)

    Huh, Lynn

    The objective of the present research was to develop a new method to derive the aerodynamic coefficients and the associated uncertainties for flight vehicles via post-flight inertial navigation analysis using data from the inertial measurement unit. Statistical estimates of vehicle state and aerodynamic coefficients are derived using Monte Carlo simulation. Trajectory reconstruction using the inertial navigation system (INS) is a simple and well-used method. However, deriving realistic uncertainties in the reconstructed state and any associated parameters is not so straightforward. Extended Kalman filters, batch minimum variance estimation and other approaches have been used. However, these methods generally depend on assumed physical models, assumed statistical distributions (usually Gaussian) or have convergence issues for non-linear problems. The approach here assumes no physical models, is applicable to any statistical distribution, and does not have any convergence issues. The new approach obtains the statistics directly from a sufficient number of Monte Carlo samples using only the generally well known gyro and accelerometer specifications and could be applied to systems of non-linear form and non-Gaussian distribution. When redundant data are available, the set of Monte Carlo simulations is constrained to satisfy the redundant data within the uncertainties specified for the additional data. The proposed method was applied to validate the uncertainty in the pre-flight aerodynamic database of the X-43A Hyper-X research vehicle. In addition to gyro and acceleration data, the actual flight data include redundant measurements of position and velocity from the global positioning system (GPS). The criteria derived from the blend of GPS and INS accuracy were used to select valid trajectories for statistical analysis. The aerodynamic coefficients were derived from the selected trajectories either by a direct extraction method based on the equations of dynamics or by querying the pre-flight aerodynamic database. After the application of the proposed method to the case of the X-43A Hyper-X research vehicle, it was found that 1) there were consistent differences in the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis, 2) the pre-flight estimation of the pitching moment coefficients was significantly different from the post-flight analysis, 3) the distributions of the states from the Monte Carlo simulation were affected by those of the perturbation parameters, 4) the uncertainties in the pre-flight model were overestimated, 5) the range where the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis are in closest agreement is between Mach *.* and *.* and more data points may be needed between Mach * and ** in the pre-flight aerodynamic database, 6) the selection criterion for valid trajectories from the Monte Carlo simulations was driven mostly by the horizontal velocity error, 7) the selection criterion must be based on a reasonable model to ensure the validity of the statistics from the proposed method, and 8) the results from the proposed method applied to two different flights with identical geometry and similar flight profiles were consistent.

  4. Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

    PubMed

    Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

    2017-11-01

    Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. The aim of this paper is to introduce statistical approaches such as propensity score matching and mixed models into representative real-world analyses. Additionally, the implementation in the statistical software R is presented to allow the results to be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data, and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples: the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighed. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
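
    For readers unfamiliar with the matching step, the sketch below shows a minimal propensity-score workflow: estimate the score with logistic regression and form 1:1 nearest-neighbour matches on it. Note that the study above uses genetic matching and mixed models in R; this simpler Python variant, with simulated data and no caliper, is only meant to illustrate the idea.

```python
# Minimal propensity-score sketch: score by logistic regression, then greedy 1:1
# nearest-neighbour matching on the score. Data and covariates are simulated
# placeholders; the study above uses genetic matching, not this variant.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 3))                              # baseline covariates
treated = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))    # selection depends on X

ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

treated_idx = np.flatnonzero(treated == 1)
control_idx = np.flatnonzero(treated == 0)

# Greedy 1:1 nearest-neighbour matching on the propensity score (no caliper)
available = set(control_idx)
pairs = []
for t in treated_idx:
    if not available:
        break
    c = min(available, key=lambda j: abs(ps[j] - ps[t]))
    pairs.append((t, c))
    available.remove(c)

matched = np.array(pairs)
gap = np.mean(np.abs(ps[matched[:, 0]] - ps[matched[:, 1]]))
print(f"{len(matched)} matched pairs; mean score gap = {gap:.4f}")
```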

  5. Simultaneous Analysis and Quality Assurance for Diffusion Tensor Imaging

    PubMed Central

    Lauzon, Carolyn B.; Asman, Andrew J.; Esparza, Michael L.; Burns, Scott S.; Fan, Qiuyun; Gao, Yurui; Anderson, Adam W.; Davis, Nicole; Cutting, Laurie E.; Landman, Bennett A.

    2013-01-01

    Diffusion tensor imaging (DTI) enables non-invasive, cyto-architectural mapping of in vivo tissue microarchitecture through voxel-wise mathematical modeling of multiple magnetic resonance imaging (MRI) acquisitions, each differently sensitized to water diffusion. DTI computations are fundamentally estimation processes and are sensitive to noise and artifacts. Despite widespread adoption in the neuroimaging community, maintaining consistent DTI data quality remains challenging given the propensity for patient motion, artifacts associated with fast imaging techniques, and the possibility of hardware changes/failures. Furthermore, the quantity of data acquired per voxel, the non-linear estimation process, and numerous potential use cases complicate traditional visual data inspection approaches. Currently, quality inspection of DTI data has relied on visual inspection and individual processing in DTI analysis software programs (e.g. DTIPrep, DTI-studio). However, recent advances in applied statistical methods have yielded several different metrics to assess noise level, artifact propensity, quality of tensor fit, variance of estimated measures, and bias in estimated measures. To date, these metrics have been largely studied in isolation. Herein, we select complementary metrics for integration into an automatic DTI analysis and quality assurance pipeline. The pipeline completes in 24 hours, stores statistical outputs, and produces a graphical summary quality analysis (QA) report. We assess the utility of this streamlined approach for empirical quality assessment on 608 DTI datasets from pediatric neuroimaging studies. The efficiency and accuracy of quality analysis using the proposed pipeline is compared with quality analysis based on visual inspection. The unified pipeline is found to save a statistically significant amount of time (over 70%) while improving the consistency of QA between a DTI expert and a pool of research associates. Projection of QA metrics to a low-dimensional manifold reveals qualitative, but clear, QA-study associations and suggests that automated outlier/anomaly detection would be feasible. PMID:23637895

  6. Simultaneous analysis and quality assurance for diffusion tensor imaging.

    PubMed

    Lauzon, Carolyn B; Asman, Andrew J; Esparza, Michael L; Burns, Scott S; Fan, Qiuyun; Gao, Yurui; Anderson, Adam W; Davis, Nicole; Cutting, Laurie E; Landman, Bennett A

    2013-01-01

    Diffusion tensor imaging (DTI) enables non-invasive, cyto-architectural mapping of in vivo tissue microarchitecture through voxel-wise mathematical modeling of multiple magnetic resonance imaging (MRI) acquisitions, each differently sensitized to water diffusion. DTI computations are fundamentally estimation processes and are sensitive to noise and artifacts. Despite widespread adoption in the neuroimaging community, maintaining consistent DTI data quality remains challenging given the propensity for patient motion, artifacts associated with fast imaging techniques, and the possibility of hardware changes/failures. Furthermore, the quantity of data acquired per voxel, the non-linear estimation process, and numerous potential use cases complicate traditional visual data inspection approaches. Currently, quality inspection of DTI data has relied on visual inspection and individual processing in DTI analysis software programs (e.g. DTIPrep, DTI-studio). However, recent advances in applied statistical methods have yielded several different metrics to assess noise level, artifact propensity, quality of tensor fit, variance of estimated measures, and bias in estimated measures. To date, these metrics have been largely studied in isolation. Herein, we select complementary metrics for integration into an automatic DTI analysis and quality assurance pipeline. The pipeline completes in 24 hours, stores statistical outputs, and produces a graphical summary quality analysis (QA) report. We assess the utility of this streamlined approach for empirical quality assessment on 608 DTI datasets from pediatric neuroimaging studies. The efficiency and accuracy of quality analysis using the proposed pipeline is compared with quality analysis based on visual inspection. The unified pipeline is found to save a statistically significant amount of time (over 70%) while improving the consistency of QA between a DTI expert and a pool of research associates. Projection of QA metrics to a low-dimensional manifold reveals qualitative, but clear, QA-study associations and suggests that automated outlier/anomaly detection would be feasible.

  7. Mass detection, localization and estimation for wind turbine blades based on statistical pattern recognition

    NASA Astrophysics Data System (ADS)

    Colone, L.; Hovgaard, M. K.; Glavind, L.; Brincker, R.

    2018-07-01

    A method for mass change detection on wind turbine blades using natural frequencies is presented. The approach is based on two statistical tests. The first test decides if there is a significant mass change and the second test is a statistical group classification based on Linear Discriminant Analysis. The frequencies are identified by means of Operational Modal Analysis using natural excitation. Based on the assumption of Gaussianity of the frequencies, a multi-class statistical model is developed by combining finite element model sensitivities in 10 classes of change location on the blade, the smallest area being 1/5 of the span. The method is experimentally validated for a full scale wind turbine blade in a test setup and loaded by natural wind. Mass change from natural causes was imitated with sand bags and the algorithm was observed to perform well with an experimental detection rate of 1, localization rate of 0.88 and mass estimation rate of 0.72.
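
    The two-stage logic described above (a global test for a significant frequency change, followed by classification of the change location) can be imitated with generic tools. The sketch below uses a chi-square test on standardized frequency shifts and scikit-learn's Linear Discriminant Analysis; the "finite element sensitivities", class structure and thresholds are simulated stand-ins, not the blade model from the study.

```python
# Two-stage sketch: (1) decide whether identified natural frequencies have shifted
# significantly from a healthy reference, (2) classify the shift pattern into
# location classes with LDA. All training patterns are simulated placeholders.
import numpy as np
from scipy import stats
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)
n_modes, n_classes = 6, 10

# Hypothetical training data: frequency-shift patterns per mass-location class
class_means = rng.normal(0.0, 0.02, size=(n_classes, n_modes))
X_train = np.vstack([mu + rng.normal(0, 0.005, size=(50, n_modes)) for mu in class_means])
y_train = np.repeat(np.arange(n_classes), 50)
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

# Healthy reference statistics and a new set of identified frequency shifts
ref_mean, ref_std = np.zeros(n_modes), np.full(n_modes, 0.005)
new_shift = class_means[3] + rng.normal(0, 0.005, size=n_modes)

# Stage 1: global test for "any change" (chi-square on standardized shifts)
chi2 = np.sum(((new_shift - ref_mean) / ref_std) ** 2)
changed = stats.chi2.sf(chi2, df=n_modes) < 0.05

# Stage 2: locate the change only if stage 1 fires
if changed:
    print("predicted location class:", lda.predict(new_shift[None, :])[0])
```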

  8. An ANOVA approach for statistical comparisons of brain networks.

    PubMed

    Fraiman, Daniel; Fraiman, Ricardo

    2018-03-16

    The study of brain networks has developed extensively over the last couple of decades. By contrast, techniques for the statistical analysis of these networks are less developed. In this paper, we focus on the statistical comparison of brain networks in a nonparametric framework and discuss the associated detection and identification problems. We tested network differences between groups with an analysis of variance (ANOVA) test we developed specifically for networks. We also propose and analyse the behaviour of a new statistical procedure designed to identify different subnetworks. As an example, we show the application of this tool in resting-state fMRI data obtained from the Human Connectome Project. We identify, among other variables, that the amount of sleep in the days before the scan is a relevant variable that must be controlled. Finally, we discuss the potential bias in neuroimaging findings that is generated by some behavioural and brain structure variables. Our method can also be applied to other kinds of networks such as protein interaction networks, gene networks or social networks.

  9. Enhancing efficiency and quality of statistical estimation of immunogenicity assay cut points through standardization and automation.

    PubMed

    Su, Cheng; Zhou, Lei; Hu, Zheng; Weng, Winnie; Subramani, Jayanthi; Tadkod, Vineet; Hamilton, Kortney; Bautista, Ami; Wu, Yu; Chirmule, Narendra; Zhong, Zhandong Don

    2015-10-01

    Biotherapeutics can elicit immune responses, which can alter the exposure, safety, and efficacy of the therapeutics. A well-designed and robust bioanalytical method is critical for the detection and characterization of relevant anti-drug antibodies (ADA) and the success of an immunogenicity study. As a fundamental criterion in immunogenicity testing, assay cut points need to be statistically established with a risk-based approach to reduce subjectivity. This manuscript describes the development of a validated, web-based, multi-tier customized assay statistical tool (CAST) for assessing cut points of ADA assays. The tool provides an intuitive web interface that allows users to import experimental data generated from a standardized experimental design, select the assay factors, run the standardized analysis algorithms, and generate tables, figures, and listings (TFL). It allows bioanalytical scientists to perform complex statistical analysis at the click of a button to produce reliable assay parameters in support of immunogenicity studies. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. Bayesian Reconstruction of Disease Outbreaks by Combining Epidemiologic and Genomic Data

    PubMed Central

    Jombart, Thibaut; Cori, Anne; Didelot, Xavier; Cauchemez, Simon; Fraser, Christophe; Ferguson, Neil

    2014-01-01

    Recent years have seen progress in the development of statistically rigorous frameworks to infer outbreak transmission trees (“who infected whom”) from epidemiological and genetic data. Making use of pathogen genome sequences in such analyses remains a challenge, however, with a variety of heuristic approaches having been explored to date. We introduce a statistical method exploiting both pathogen sequences and collection dates to unravel the dynamics of densely sampled outbreaks. Our approach identifies likely transmission events and infers dates of infections, unobserved cases and separate introductions of the disease. It also proves useful for inferring numbers of secondary infections and identifying heterogeneous infectivity and super-spreaders. After testing our approach using simulations, we illustrate the method with the analysis of the beginning of the 2003 Singaporean outbreak of Severe Acute Respiratory Syndrome (SARS), providing new insights into the early stage of this epidemic. Our approach is the first tool for disease outbreak reconstruction from genetic data widely available as free software, the R package outbreaker. It is applicable to various densely sampled epidemics, and improves previous approaches by detecting unobserved and imported cases, as well as allowing multiple introductions of the pathogen. Because of its generality, we believe this method will become a tool of choice for the analysis of densely sampled disease outbreaks, and will form a rigorous framework for subsequent methodological developments. PMID:24465202

  11. Optimal spectral tracking--adapting to dynamic regime change.

    PubMed

    Brittain, John-Stuart; Halliday, David M

    2011-01-30

    Real world data do not always obey the statistical restraints imposed upon them by sophisticated analysis techniques. In spectral analysis for instance, an ergodic process--the interchangeability of temporal for spatial averaging--is assumed for a repeat-trial design. Many evolutionary scenarios, such as learning and motor consolidation, do not conform to such linear behaviour and should be approached from a more flexible perspective. To this end we previously introduced the method of optimal spectral tracking (OST) in the study of trial-varying parameters. In this extension to our work we modify the OST routines to provide an adaptive implementation capable of reacting to dynamic transitions in the underlying system state. In so doing, we generalise our approach to characterise both slow-varying and rapid fluctuations in time-series, simultaneously providing a metric of system stability. The approach is first applied to a surrogate dataset and compared to both our original non-adaptive solution and spectrogram approaches. The adaptive OST is seen to display fast convergence and desirable statistical properties. All three approaches are then applied to a neurophysiological recording obtained during a study on anaesthetic monitoring. Local field potentials acquired from the posterior hypothalamic region of a deep brain stimulation patient undergoing anaesthesia were analysed. The characterisation of features such as response delay, time-to-peak and modulation brevity are considered. Copyright © 2010 Elsevier B.V. All rights reserved.

  12. On Learning Cluster Coefficient of Private Networks

    PubMed Central

    Wang, Yue; Wu, Xintao; Zhu, Jun; Xiang, Yang

    2013-01-01

    Enabling accurate analysis of social network data while preserving differential privacy has been challenging since graph features such as clustering coefficient or modularity often have high sensitivity, which is different from traditional aggregate functions (e.g., count and sum) on tabular data. In this paper, we treat a graph statistic as a function f and develop a divide and conquer approach to enforce differential privacy. The basic procedure of this approach is to first decompose the target computation f into several less complex unit computations f1, …, fm connected by basic mathematical operations (e.g., addition, subtraction, multiplication, division), then perturb the output of each fi with Laplace noise derived from its own sensitivity value and the distributed privacy threshold εi, and finally combine those perturbed fi as the perturbed output of computation f. We examine how various operations affect the accuracy of complex computations. When unit computations have large global sensitivity values, we enforce the differential privacy by calibrating noise based on the smooth sensitivity, rather than the global sensitivity. By doing this, we achieve the strict differential privacy guarantee with smaller magnitude noise. We illustrate our approach by using clustering coefficient, which is a popular statistic used in social network analysis. Empirical evaluations on five real social networks and various synthetic graphs generated from three random graph models show that the developed divide and conquer approach outperforms the direct approach. PMID:24429843
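
    The divide-and-conquer recipe above (perturb each unit computation f_i with Laplace noise scaled to its sensitivity and its share ε_i of the privacy budget, then recombine) can be sketched generically. In the Python example below the unit computations, their sensitivities and the budget split are placeholder assumptions, and the smooth-sensitivity refinement used in the paper is not reproduced.

```python
# Generic divide-and-conquer Laplace-mechanism sketch: perturb each unit value
# with Laplace noise calibrated to its own (placeholder) sensitivity and privacy
# share, then combine. Not the paper's smooth-sensitivity calibration.
import numpy as np

rng = np.random.default_rng(5)

def laplace_release(value, sensitivity, eps):
    """Release value + Laplace(sensitivity / eps) noise."""
    return value + rng.laplace(scale=sensitivity / eps)

# Example: clustering coefficient of a node as triangles / (deg*(deg-1)/2),
# released by perturbing numerator and denominator separately.
triangles, degree = 40.0, 20.0
eps_total = 1.0
eps_num, eps_den = eps_total / 2, eps_total / 2      # assumed even budget split

noisy_tri = laplace_release(triangles, sensitivity=3.0, eps=eps_num)          # placeholder sensitivity
noisy_pairs = laplace_release(degree * (degree - 1) / 2, sensitivity=degree,  # placeholder sensitivity
                              eps=eps_den)

noisy_cc = min(1.0, max(0.0, noisy_tri / max(noisy_pairs, 1.0)))
print(noisy_cc)
```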

  13. Relative Velocity as a Metric for Probability of Collision Calculations

    NASA Technical Reports Server (NTRS)

    Frigm, Ryan Clayton; Rohrbaugh, Dave

    2008-01-01

    Collision risk assessment metrics, such as the probability of collision calculation, are based largely on assumptions about the interaction of two objects during their close approach. Specifically, the approach to probabilistic risk assessment can be performed more easily if the relative trajectories of the two close approach objects are assumed to be linear during the encounter. It is shown in this analysis that one factor in determining linearity is the relative velocity of the two encountering bodies, in that the assumption of linearity breaks down at low relative approach velocities. The first part of this analysis is the determination of the relative velocity threshold below which the assumption of linearity becomes invalid. The second part is a statistical study of conjunction interactions between representative asset spacecraft and the associated debris field environment to determine the likelihood of encountering a low relative velocity close approach. This analysis is performed for both the LEO and GEO orbit regimes. Both parts comment on the resulting effects to collision risk assessment operations.

  14. Objective determination of image end-members in spectral mixture analysis of AVIRIS data

    NASA Technical Reports Server (NTRS)

    Tompkins, Stefanie; Mustard, John F.; Pieters, Carle M.; Forsyth, Donald W.

    1993-01-01

    Spectral mixture analysis has been shown to be a powerful, multifaceted tool for analysis of multi- and hyper-spectral data. Applications of AVIRIS data have ranged from mapping soils and bedrock to ecosystem studies. During the first phase of the approach, a set of end-members are selected from an image cube (image end-members) that best account for its spectral variance within a constrained, linear least squares mixing model. These image end-members are usually selected using a priori knowledge and successive trial and error solutions to refine the total number and physical location of the end-members. However, in many situations a more objective method of determining these essential components is desired. We approach the problem of image end-member determination objectively by using the inherent variance of the data. Unlike purely statistical methods such as factor analysis, this approach derives solutions that conform to a physically realistic model.
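
    The constrained linear mixing model mentioned above can be illustrated for a single pixel: given candidate end-member spectra, solve for non-negative abundance fractions by non-negative least squares and renormalize them to sum to one. The sketch below uses simulated spectra and does not reproduce the paper's objective end-member selection.

```python
# Linear spectral unmixing sketch for one pixel: non-negative least squares on
# candidate end-member spectra, then renormalization to sum to one. Spectra and
# fractions are simulated placeholders, not AVIRIS data.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(6)
bands, n_end = 224, 3                         # AVIRIS-like band count, 3 end-members
E = np.abs(rng.normal(size=(bands, n_end)))   # hypothetical end-member spectra

true_f = np.array([0.6, 0.3, 0.1])
pixel = E @ true_f + rng.normal(0, 0.01, size=bands)   # simulated mixed pixel

f, _ = nnls(E, pixel)                         # non-negativity constraint
f = f / f.sum()                               # sum-to-one constraint (renormalized)
print(np.round(f, 3))
```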

  15. Identifiability of PBPK Models with Applications to ...

    EPA Pesticide Factsheets

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss different types of identifiability that occur in PBPK models and give reasons why they occur. We particularly focus on how the mathematical structure of a PBPK model and lack of appropriate data can lead to statistical models in which it is impossible to estimate at least some parameters precisely. Methods are reviewed which can determine whether a purely linear PBPK model is globally identifiable. We propose a theorem which determines when identifiability at a set of finite and specific values of the mathematical PBPK model (global discrete identifiability) implies identifiability of the statistical model. However, we are unable to establish conditions that imply global discrete identifiability, and conclude that the only safe approach to analysis of PBPK models involves Bayesian analysis with truncated priors. Finally, computational issues regarding posterior simulations of PBPK models are discussed. The methodology is very general and can be applied to numerous PBPK models which can be expressed as linear time-invariant systems. A real data set of a PBPK model for exposure to dimethyl arsinic acid (DMA(V)) is presented to illustrate the proposed methodology.

  16. Cross-Population Joint Analysis of eQTLs: Fine Mapping and Functional Annotation

    PubMed Central

    Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger

    2015-01-01

    Mapping expression quantitative trait loci (eQTLs) has been shown to be a powerful tool to uncover the genetic underpinnings of many complex traits at the molecular level. In this paper, we present an integrative analysis approach that leverages eQTL data collected from multiple population groups. In particular, our approach effectively identifies multiple independent cis-eQTL signals that are consistent across populations, accounting for population heterogeneity in allele frequencies and linkage disequilibrium patterns. Furthermore, by integrating genomic annotations, our analysis framework enables high-resolution functional analysis of eQTLs. We applied our statistical approach to analyze the GEUVADIS data consisting of samples from five population groups. From this analysis, we concluded that i) joint analysis across population groups greatly improves the power of eQTL discovery and the resolution of fine mapping of causal eQTLs; ii) many genes harbor multiple independent eQTLs in their cis regions; and iii) genetic variants that disrupt transcription factor binding are significantly enriched in eQTLs (p-value = 4.93 × 10⁻²²). PMID:25906321

  17. Do different data analytic approaches generate discrepant findings when measuring mother-infant HPA axis attunement?

    PubMed

    Bernard, Nicola K; Kashy, Deborah A; Levendosky, Alytia A; Bogat, G Anne; Lonstein, Joseph S

    2017-03-01

    Attunement between mothers and infants in their hypothalamic-pituitary-adrenal (HPA) axis responsiveness to acute stressors is thought to benefit the child's emerging physiological and behavioral self-regulation, as well as their socioemotional development. However, there is no universally accepted definition of attunement in the literature, which appears to have resulted in inconsistent statistical analyses for determining its presence or absence, and contributed to discrepant results. We used a series of data analytic approaches, some previously used in the attunement literature and others not, to evaluate the attunement between 182 women and their 1-year-old infants in their HPA axis responsivity to acute stress. Cortisol was measured in saliva samples taken from mothers and infants before and twice after a naturalistic laboratory stressor (infant arm restraint). The results of the data analytic approaches were mixed, with some analyses suggesting attunement while others did not. The strengths and weaknesses of each statistical approach are discussed, and an analysis using a cross-lagged model that considered both time and interactions between mother and infant appeared the most appropriate. Greater consensus in the field about the conceptualization and analysis of physiological attunement would be valuable in order to advance our understanding of this phenomenon. © 2016 Wiley Periodicals, Inc.

  18. IDENTIFICATION OF REGIME SHIFTS IN TIME SERIES USING NEIGHBORHOOD STATISTICS

    EPA Science Inventory

    The identification of alternative dynamic regimes in ecological systems requires several lines of evidence. Previous work on time series analysis of dynamic regimes includes mainly model-fitting methods. We introduce two methods that do not use models. These approaches use state-...

  19. Addressing the issue of insufficient information in data-based bridge health monitoring : final report.

    DOT National Transportation Integrated Search

    2015-11-01

    One of the most efficient ways to solve the damage detection problem using the statistical pattern recognition : approach is that of exploiting the methods of outlier analysis. Cast within the pattern recognition framework, : damage detection assesse...

  20. Who Was the Real William Shakespeare?

    ERIC Educational Resources Information Center

    Edwards, Michael Todd

    2009-01-01

    This article highlights a project that encourages students to connect reading and mathematics instruction by using a data analysis approach. Students analyze sonnets from statistical, literary, and historical points of view in an effort to uncover the true identity of William Shakespeare. (Contains 10 figures.)

  1. Seeking for the rational basis of the median model: the optimal combination of multi-model ensemble results

    NASA Astrophysics Data System (ADS)

    Riccio, A.; Giunta, G.; Galmarini, S.

    2007-04-01

    In this paper we present an approach for the statistical analysis of multi-model ensemble results. The models considered here are operational long-range transport and dispersion models, also used for the real-time simulation of pollutant dispersion or the accidental release of radioactive nuclides. We first introduce the theoretical basis (with its roots sinking into the Bayes theorem) and then apply this approach to the analysis of model results obtained during the ETEX-1 exercise. We recover some interesting results, supporting the heuristic approach called "median model", originally introduced in Galmarini et al. (2004a, b). This approach also provides a way to systematically reduce (and quantify) model uncertainties, thus supporting the decision-making process and/or regulatory-purpose activities in a very effective manner.

  2. Seeking for the rational basis of the Median Model: the optimal combination of multi-model ensemble results

    NASA Astrophysics Data System (ADS)

    Riccio, A.; Giunta, G.; Galmarini, S.

    2007-12-01

    In this paper we present an approach for the statistical analysis of multi-model ensemble results. The models considered here are operational long-range transport and dispersion models, also used for the real-time simulation of pollutant dispersion or the accidental release of radioactive nuclides. We first introduce the theoretical basis (with its roots sinking into the Bayes theorem) and then apply this approach to the analysis of model results obtained during the ETEX-1 exercise. We recover some interesting results, supporting the heuristic approach called "median model", originally introduced in Galmarini et al. (2004a, b). This approach also provides a way to systematically reduce (and quantify) model uncertainties, thus supporting the decision-making process and/or regulatory-purpose activities in a very effective manner.

  3. The mediating effect of calling on the relationship between medical school students' academic burnout and empathy.

    PubMed

    Chae, Su Jin; Jeong, So Mi; Chung, Yoon-Sok

    2017-09-01

    This study is aimed at identifying the relationships between medical school students' academic burnout, empathy, and calling, and determining whether their calling has a mediating effect on the relationship between academic burnout and empathy. A mixed-methods study was conducted. One hundred twenty-seven medical students completed a survey. Scales measuring academic burnout, medical students' empathy, and calling were utilized. For the statistical analysis, correlation analysis, descriptive statistical analysis, and hierarchical multiple regression analyses were conducted. For the qualitative approach, eight medical students participated in a focus group interview. The study found that empathy has a statistically significant, negative correlation with academic burnout, while having a significant, positive correlation with calling. Sense of calling proved to be an effective mediator of the relationship between academic burnout and empathy. This result demonstrates that calling is a key variable that mediates the relationship between medical students' academic burnout and empathy. As such, this study provides baseline data for an education that could improve medical students' empathy skills.

  4. Statistical approaches to account for missing values in accelerometer data: Applications to modeling physical activity.

    PubMed

    Yue Xu, Selene; Nelson, Sandahl; Kerr, Jacqueline; Godbole, Suneeta; Patterson, Ruth; Merchant, Gina; Abramson, Ian; Staudenmayer, John; Natarajan, Loki

    2018-04-01

    Physical inactivity is a recognized risk factor for many chronic diseases. Accelerometers are increasingly used as an objective means to measure daily physical activity. One challenge in using these devices is missing data due to device nonwear. We used a well-characterized cohort of 333 overweight postmenopausal breast cancer survivors to examine missing data patterns of accelerometer outputs over the day. Based on these observed missingness patterns, we created pseudo-simulated datasets with realistic missing data patterns. We developed statistical methods to design imputation and variance weighting algorithms to account for missing data effects when fitting regression models. Bias and precision of each method were evaluated and compared. Our results indicated that not accounting for missing data in the analysis yielded unstable estimates in the regression analysis. Incorporating variance weights and/or subject-level imputation improved precision by >50%, compared to ignoring missing data. We recommend that these simple, easy-to-implement statistical tools be used to improve analysis of accelerometer data.
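
    A toy version of the two ideas above, subject-level imputation and completeness-based variance weights feeding a weighted regression, is sketched below. The data, the weighting rule and the outcome model are simulated stand-ins and are not the authors' algorithms.

```python
# Toy sketch: fill a subject's missing wear-time blocks with that subject's own
# mean, derive weights from wear completeness, and use them in weighted least
# squares. All data, weights and the outcome model are simulated placeholders.
import numpy as np

rng = np.random.default_rng(10)
n_subj, n_blocks = 100, 14                    # e.g. 14 daily wear blocks per subject
activity = rng.gamma(2.0, 150.0, size=(n_subj, n_blocks))
missing = rng.random((n_subj, n_blocks)) < 0.25
activity[missing] = np.nan

# Subject-level imputation and completeness-based weights
subj_mean = np.nanmean(activity, axis=1)
imputed = np.where(np.isnan(activity), subj_mean[:, None], activity)
daily_avg = imputed.mean(axis=1)
weights = 1.0 - missing.mean(axis=1)          # more wear time -> larger weight

# Weighted least squares of a hypothetical outcome on average activity
outcome = 30.0 - 0.005 * daily_avg + rng.normal(0, 1.5, size=n_subj)
X = np.column_stack([np.ones(n_subj), daily_avg])
W = np.diag(weights)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ outcome)
print(beta)                                   # intercept and activity slope
```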

  5. Comparison of Exposure in the Kaplan Versus the Kocher Approach in the Treatment of Radial Head Fractures.

    PubMed

    Barnes, Leslie Fink; Lombardi, Joseph; Gardner, Thomas R; Strauch, Robert J; Rosenwasser, Melvin P

    2018-01-01

    The aim of this study was to compare the complete visible surface area of the radial head, neck, and coronoid in the Kaplan and Kocher approaches to the lateral elbow. The hypothesis was that the Kaplan approach would afford greater visibility due to the differential anatomy of the intermuscular planes. Ten cadavers were dissected with the Kaplan and Kocher approaches, and the visible surface area was measured in situ using a 3-dimensional digitizer. Six measurements were taken for each approach by 2 surgeons, and the mean of these measurements was analyzed. The mean surface area visible with the lateral collateral ligament (LCL) preserved in the Kaplan approach was 616.6 mm² in comparison with the surface area of 136.2 mm² visible in the Kocher approach when the LCL was preserved. Using a 2-way analysis of variance, the difference between these 2 approaches was statistically significant. When the LCL complex was incised in the Kocher approach, the average visible surface area of the Kocher approach was 456.1 mm² and was statistically smaller than that of the Kaplan approach. The average surface area of the coronoid visible using a proximally extended Kaplan approach was 197.8 mm². The Kaplan approach affords significantly greater visible surface area of the proximal radius than the Kocher approach.

  6. Functional MRI Preprocessing in Lesioned Brains: Manual Versus Automated Region of Interest Analysis

    PubMed Central

    Garrison, Kathleen A.; Rogalsky, Corianne; Sheng, Tong; Liu, Brent; Damasio, Hanna; Winstein, Carolee J.; Aziz-Zadeh, Lisa S.

    2015-01-01

    Functional magnetic resonance imaging (fMRI) has significant potential in the study and treatment of neurological disorders and stroke. Region of interest (ROI) analysis in such studies allows for testing of strong a priori clinical hypotheses with improved statistical power. A commonly used automated approach to ROI analysis is to spatially normalize each participant’s structural brain image to a template brain image and define ROIs using an atlas. However, in studies of individuals with structural brain lesions, such as stroke, the gold standard approach may be to manually hand-draw ROIs on each participant’s non-normalized structural brain image. Automated approaches to ROI analysis are faster and more standardized, yet are susceptible to preprocessing error (e.g., normalization error) that can be greater in lesioned brains. The manual approach to ROI analysis has high demand for time and expertise, but may provide a more accurate estimate of brain response. In this study, commonly used automated and manual approaches to ROI analysis were directly compared by reanalyzing data from a previously published hypothesis-driven cognitive fMRI study, involving individuals with stroke. The ROI evaluated is the pars opercularis of the inferior frontal gyrus. Significant differences were identified in task-related effect size and percent-activated voxels in this ROI between the automated and manual approaches to ROI analysis. Task interactions, however, were consistent across ROI analysis approaches. These findings support the use of automated approaches to ROI analysis in studies of lesioned brains, provided they employ a task interaction design. PMID:26441816

  7. A Finite-Volume "Shaving" Method for Interfacing NASA/DAO's Physical Space Statistical Analysis System to the Finite-Volume GCM with a Lagrangian Control-Volume Vertical Coordinate

    NASA Technical Reports Server (NTRS)

    Lin, Shian-Jiann; DaSilva, Arlindo; Atlas, Robert (Technical Monitor)

    2001-01-01

    Toward the development of a finite-volume Data Assimilation System (fvDAS), a consistent finite-volume methodology is developed for interfacing the NASA/DAO's Physical Space Statistical Analysis System (PSAS) to the joint NASA/NCAR finite volume CCM3 (fvCCM3). To take advantage of the Lagrangian control-volume vertical coordinate of the fvCCM3, a novel "shaving" method is applied to the lowest few model layers to reflect the surface pressure changes as implied by the final analysis. Analysis increments (from PSAS) to the upper air variables are then consistently put onto the Lagrangian layers as adjustments to the volume-mean quantities during the analysis cycle. This approach is demonstrated to be superior to the conventional method of using independently computed "tendency terms" for surface pressure and upper air prognostic variables.

  8. Spatial Differentiation of Landscape Values in the Murray River Region of Victoria, Australia

    NASA Astrophysics Data System (ADS)

    Zhu, Xuan; Pfueller, Sharron; Whitelaw, Paul; Winter, Caroline

    2010-05-01

    This research advances the understanding of the location of perceived landscape values through a statistically based approach to spatial analysis of value densities. Survey data were obtained from a sample of people living in and using the Murray River region, Australia, where declining environmental quality prompted a reevaluation of its conservation status. When densities of 12 perceived landscape values were mapped using geographic information systems (GIS), valued places clustered along the entire river bank and in associated National/State Parks and reserves. While simple density mapping revealed high value densities in various locations, it did not indicate what density of a landscape value could be regarded as a statistically significant hotspot or distinguish whether overlapping areas of high density for different values indicate identical or adjacent locations. A spatial statistic Getis-Ord Gi* was used to indicate statistically significant spatial clusters of high value densities or “hotspots”. Of 251 hotspots, 40% were for single non-use values, primarily spiritual, therapeutic or intrinsic. Four hotspots had 11 landscape values. Two, lacking economic value, were located in ecologically important river red gum forests and two, lacking wilderness value, were near the major towns of Echuca-Moama and Albury-Wodonga. Hotspots for eight values showed statistically significant associations with another value. There were high associations between learning and heritage values while economic and biological diversity values showed moderate associations with several other direct and indirect use values. This approach may improve confidence in the interpretation of spatial analysis of landscape values by enhancing understanding of value relationships.
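
    For readers unfamiliar with the statistic, the sketch below computes Getis-Ord Gi* z-scores with a simple binary distance-band weight matrix (self-inclusive, hence the "star"). The point locations, values and distance band are simulated placeholders; the survey's actual density surfaces and neighbourhood definition are not used.

```python
# Getis-Ord Gi* hotspot sketch with binary distance-band weights. Locations and
# values are simulated; the distance band and z-score cut-off are placeholders.
import numpy as np

def getis_ord_gi_star(values, coords, band):
    """Return the Gi* z-score for every location (self-inclusive weights)."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = (d <= band).astype(float)                 # includes w_ii = 1 (the "star")
    xbar, s = x.mean(), x.std(ddof=0)
    sw, sw2 = w.sum(1), (w ** 2).sum(1)
    num = w @ x - xbar * sw
    den = s * np.sqrt((n * sw2 - sw ** 2) / (n - 1))
    return num / den

rng = np.random.default_rng(7)
coords = rng.uniform(0, 100, size=(300, 2))
values = rng.poisson(2.0, size=300).astype(float)
values[coords[:, 0] < 15] += 4                    # plant a cluster of high values

z = getis_ord_gi_star(values, coords, band=10.0)
hotspots = np.flatnonzero(z > 1.96)               # ~95% confidence hotspots
print(len(hotspots), "hotspot locations")
```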

  9. Moment-based metrics for global sensitivity analysis of hydrological systems

    NASA Astrophysics Data System (ADS)

    Dell'Oca, Aronne; Riva, Monica; Guadagnini, Alberto

    2017-12-01

    We propose new metrics to assist global sensitivity analysis, GSA, of hydrological and Earth systems. Our approach allows assessing the impact of uncertain parameters on main features of the probability density function, pdf, of a target model output, y. These include the expected value of y, the spread around the mean and the degree of symmetry and tailedness of the pdf of y. Since reliable assessment of higher-order statistical moments can be computationally demanding, we couple our GSA approach with a surrogate model, approximating the full model response at a reduced computational cost. Here, we consider the generalized polynomial chaos expansion (gPCE), other model reduction techniques being fully compatible with our theoretical framework. We demonstrate our approach through three test cases, including an analytical benchmark, a simplified scenario mimicking pumping in a coastal aquifer and a laboratory-scale conservative transport experiment. Our results allow ascertaining which parameters can impact some moments of the model output pdf while being uninfluential to others. We also investigate the error associated with the evaluation of our sensitivity metrics by replacing the original system model through a gPCE. Our results indicate that the construction of a surrogate model with increasing level of accuracy might be required depending on the statistical moment considered in the GSA. The approach is fully compatible with (and can assist the development of) analysis techniques employed in the context of reduction of model complexity, model calibration, design of experiment, uncertainty quantification and risk assessment.

  10. Detecting Disease Specific Pathway Substructures through an Integrated Systems Biology Approach

    PubMed Central

    Alaimo, Salvatore; Marceca, Gioacchino Paolo; Ferro, Alfredo; Pulvirenti, Alfredo

    2017-01-01

    In the era of network medicine, pathway analysis methods play a central role in the prediction of phenotype from high throughput experiments. In this paper, we present a network-based systems biology approach capable of extracting disease-perturbed subpathways within pathway networks in connection with expression data taken from The Cancer Genome Atlas (TCGA). Our system extends pathways with missing regulatory elements, such as microRNAs, and their interactions with genes. The framework enables the extraction, visualization, and analysis of statistically significant disease-specific subpathways through an easy to use web interface. Our analysis shows that the methodology is able to fill the gap in current techniques, allowing a more comprehensive analysis of the phenomena underlying disease states. PMID:29657291

  11. A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing

    PubMed Central

    Wang, Guoli; Ebrahimi, Nader

    2014-01-01

    Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ W H. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data. PMID:25821345
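
    As background for the multiplicative updates mentioned above, the sketch below implements the classical Lee-Seung updates for the KL (Poisson-likelihood) divergence, which the generalized Renyi-divergence framework contains as a special case. It is a minimal illustration on simulated count data, not the paper's unified algorithm.

```python
# NMF with the classical Lee-Seung multiplicative updates for the KL (Poisson)
# divergence. Rank, iteration count and the simulated count matrix are placeholders.
import numpy as np

def nmf_kl(V, rank, n_iter=200, eps=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)   # update H
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)   # update W
    return W, H

V = np.random.default_rng(1).poisson(5.0, size=(100, 60)).astype(float)
W, H = nmf_kl(V, rank=5)
print(np.abs(V - W @ H).mean())   # mean reconstruction error of the factorization
```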

  12. A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing.

    PubMed

    Devarajan, Karthik; Wang, Guoli; Ebrahimi, Nader

    2015-04-01

    Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix V into the product of two nonnegative matrices, W and H, such that V ∼ W H. It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi's divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for W and H. In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data.

  13. Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses

    PubMed Central

    Liu, Ruijie; Holik, Aliaksei Z.; Su, Shian; Jansz, Natasha; Chen, Kelan; Leong, Huei San; Blewitt, Marnie E.; Asselin-Labat, Marie-Liesse; Smyth, Gordon K.; Ritchie, Matthew E.

    2015-01-01

    Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean–variance relationship of the log-counts-per-million using ‘voom’. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source ‘limma’ package. PMID:25925576

  14. Protein Multiplexed Immunoassay Analysis with R.

    PubMed

    Breen, Edmond J

    2017-01-01

    Plasma samples from 177 control and type 2 diabetes patients collected at three Australian hospitals are screened for 14 analytes using six custom-made multiplex kits across 60 96-well plates. In total, 354 samples were collected from the patients, representing one baseline and one end point sample from each patient. R methods and source code for analyzing the analyte fluorescence response obtained from these samples by Luminex Bio-Plex® xMap multiplexed immunoassay technology are disclosed. Techniques and R procedures for reading Bio-Plex® result files for statistical analysis and data visualization are also presented. The need for technical replicates and the number of technical replicates are addressed as well as plate layout design strategies. Multinomial regression is used to determine plate-to-sample covariate balance. Methods for matching clinical covariate information to Bio-Plex® results and vice versa are given. Methods for measuring and inspecting the quality of the fluorescence responses are also presented. Both fixed and mixed-effect approaches for immunoassay statistical differential analysis are presented and discussed. A random effect approach to outlier analysis and detection is also shown. The bioinformatics R methodology presented here provides a foundation for rigorous and reproducible analysis of the fluorescence response obtained from multiplexed immunoassays.

  15. Software for Statistical Analysis of Weibull Distributions with Application to Gear Fatigue Data: User Manual with Verification

    NASA Technical Reports Server (NTRS)

    Krantz, Timothy L.

    2002-01-01

    The Weibull distribution has been widely adopted for the statistical description and inference of fatigue data. This document provides user instructions, examples, and verification for software to analyze gear fatigue test data. The software was developed presuming the data are adequately modeled using a two-parameter Weibull distribution. The calculations are based on likelihood methods, and the approach taken is valid for data that include type 1 censoring. The software was verified by reproducing results published by others.
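
    The estimation problem described above (a two-parameter Weibull fit by likelihood methods with type I censoring) can be sketched directly. The example below simulates censored fatigue lives and maximizes the censored log-likelihood numerically; it is a generic illustration, not the NASA software being documented.

```python
# Two-parameter Weibull maximum-likelihood fit with type I (right) censoring.
# Fatigue lives, censor time and starting values are simulated placeholders.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
shape_true, scale_true, censor_time = 2.0, 100.0, 120.0
t_raw = scale_true * rng.weibull(shape_true, size=40)
observed = t_raw <= censor_time            # 1 for failures, 0 for suspensions
t = np.minimum(t_raw, censor_time)         # censored tests stop at censor_time

def neg_log_lik(theta):
    log_k, log_lam = theta                 # log-parameters keep k, lambda positive
    k, lam = np.exp(log_k), np.exp(log_lam)
    ll_fail = np.sum(observed * (np.log(k) - k * np.log(lam) + (k - 1) * np.log(t)))
    ll_all = -np.sum((t / lam) ** k)       # survival term, failures and censored alike
    return -(ll_fail + ll_all)

res = minimize(neg_log_lik, x0=[0.0, np.log(t.mean())], method="Nelder-Mead")
k_hat, lam_hat = np.exp(res.x)
print(f"shape = {k_hat:.2f}, scale = {lam_hat:.1f}")
```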

  16. Statistical analysis of time transfer data from Timation 2. [US Naval Observatory and Australia

    NASA Technical Reports Server (NTRS)

    Luck, J. M.; Morgan, P.

    1974-01-01

    Between July 1973 and January 1974, three time transfer experiments using the Timation 2 satellite were conducted to measure time differences between the U.S. Naval Observatory and Australia. Statistical tests showed that the results were unaffected by the satellite's position with respect to the sunrise/sunset line or by its closest-approach azimuth at the Australian station. Further tests revealed that forward predictions of time scale differences, based on the measurements, can be made with high confidence.
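
    The forward-prediction idea mentioned above amounts to fitting a drift model to the measured time-scale differences and extrapolating it; the sketch below does this with a simple least-squares line in Python, using entirely hypothetical offset values rather than the Timation 2 measurements.

```python
import numpy as np

# Hypothetical time-transfer offsets (microseconds) between two time scales,
# measured on a handful of days; a least-squares drift line is one simple way
# to make forward predictions of the time-scale difference.
day = np.array([0, 7, 14, 21, 28, 35], dtype=float)
offset_us = np.array([1.20, 1.45, 1.71, 1.94, 2.21, 2.48])

slope, intercept = np.polyfit(day, offset_us, 1)   # linear drift fit
predict_day = 49.0
predicted = slope * predict_day + intercept
print(f"drift = {slope * 1000:.1f} ns/day, "
      f"predicted offset at day {predict_day:.0f}: {predicted:.2f} us")
```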

  18. FMRI group analysis combining effect estimates and their variances

    PubMed Central

    Chen, Gang; Saad, Ziad S.; Nath, Audrey R.; Beauchamp, Michael S.; Cox, Robert W.

    2012-01-01

    Conventional functional magnetic resonance imaging (FMRI) group analysis makes two key assumptions that are not always justified. First, the data from each subject are condensed into a single number per voxel, under the assumption that within-subject variance for the effect of interest is the same across all subjects or is negligible relative to the cross-subject variance. Second, it is assumed that all data values are drawn from the same Gaussian distribution with no outliers. We propose a computationally efficient frequentist approach to FMRI group analysis that does not make such strong assumptions, which we term mixed-effects multilevel analysis (MEMA), and which incorporates both the variability across subjects and the precision estimate of each effect of interest from the individual subject analyses. On average, the more accurate tests result in higher statistical power, especially when conventional variance assumptions do not hold or in the presence of outliers. In addition, various heterogeneity measures are available with MEMA that may assist the investigator in further improving the modeling. Our method allows group-effect t-tests and comparisons among conditions and among groups, and it can incorporate subject-specific covariates such as age, IQ, or behavioral data. Simulations were performed to illustrate power comparisons and the capability of controlling type I errors among various significance-testing methods, and the results indicated that the testing statistic we adopted struck a good balance between power gain and type I error control. Our approach is instantiated in an open-source, freely distributed program that may be used on any dataset stored in the Neuroimaging Informatics Technology Initiative (NIfTI) format. To date, the main impediment to more accurate testing that incorporates both within- and cross-subject variability has been the high computational cost; our efficient implementation makes this approach practical, and we recommend its use in lieu of the less accurate conventional group analysis. PMID:22245637
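
    The essence of the approach above is to weight each subject's effect estimate by its precision while also allowing for genuine cross-subject variance. The sketch below shows that idea for a single voxel in Python, using a simple moment (DerSimonian–Laird-style) estimate of the between-subject variance; the per-subject estimates are placeholder values, and the actual MEMA implementation uses a more sophisticated estimator and test statistic.

```python
import numpy as np

def random_effects_combine(beta, var):
    """Combine per-subject effect estimates with their variances (one voxel).

    Uses a moment estimate of the cross-subject variance tau^2, then an
    inverse-variance weighted group estimate.
    """
    k = len(beta)
    w_fixed = 1.0 / var
    beta_fixed = np.sum(w_fixed * beta) / np.sum(w_fixed)
    q = np.sum(w_fixed * (beta - beta_fixed) ** 2)        # Cochran's Q
    c = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
    tau2 = max(0.0, (q - (k - 1)) / c)                    # between-subject variance
    w = 1.0 / (var + tau2)
    beta_group = np.sum(w * beta) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return beta_group, se, beta_group / se                # estimate, SE, z-like statistic

# Hypothetical per-subject effect estimates and within-subject variances:
beta_i = np.array([0.8, 1.1, 0.5, 1.4, 0.9, 2.5])
var_i = np.array([0.10, 0.08, 0.20, 0.12, 0.09, 0.60])    # the noisy subject gets less weight
print(random_effects_combine(beta_i, var_i))
```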

  19. Multi-classification of cell deformation based on object alignment and run length statistic.

    PubMed

    Li, Heng; Liu, Zhiwen; An, Xing; Shi, Yonggang

    2014-01-01

    Cellular morphology is widely applied in digital pathology and is essential for improving our understanding of the basic physiological processes of organisms. One of the main challenges in such applications is developing efficient methods for measuring cell deformation. We propose an indirect approach to analyzing dynamic cell morphology in image sequences. The proposed approach considers both cellular shape change and cytoplasm variation, and takes every frame of the image sequence into account. Cell deformation is measured by the minimum of an object-alignment energy function, which is invariant to object pose. An indirect analysis strategy based on run-length statistics is then employed to overcome the limitation posed by gradual deformation. We demonstrate the power of the proposed approach with one application: multi-class classification of cell deformation. Experimental results show that the proposed method is sensitive to morphology variation and performs better than standard shape-representation methods.
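
    To make the run-length idea concrete, the sketch below quantizes a per-frame deformation measure into two states and computes simple run-length features in Python; the energy values, the threshold, and the feature set are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def run_lengths(symbols):
    """Return the values and lengths of consecutive runs in a 1-D sequence."""
    symbols = np.asarray(symbols)
    change = np.flatnonzero(symbols[1:] != symbols[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(symbols)]))
    return symbols[starts], ends - starts

# Hypothetical per-frame deformation measure (e.g. an alignment energy),
# quantized into low/high states before computing run-length features.
energy = np.array([0.1, 0.2, 0.9, 1.1, 1.0, 0.2, 0.1, 0.1, 1.2, 1.3])
states = (energy > 0.5).astype(int)
values, lengths = run_lengths(states)
features = {
    "n_runs": len(lengths),
    "mean_run_length": lengths.mean(),
    "longest_high_run": lengths[values == 1].max(),
}
print(features)   # simple run-length features that could feed a classifier
```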

  20. The cutting edge - Micro-CT for quantitative toolmark analysis of sharp force trauma to bone.

    PubMed

    Norman, D G; Watson, D G; Burnett, B; Fenne, P M; Williams, M A

    2018-02-01

    Toolmark analysis involves examining marks created on an object to identify the likely tool responsible for creating those marks (e.g., a knife). Although a potentially powerful forensic tool, knife mark analysis is still in its infancy and the validation of imaging techniques as well as quantitative approaches is ongoing. This study builds on previous work by simulating real-world stabbings experimentally and statistically exploring quantitative toolmark properties, such as cut mark angle captured by micro-CT imaging, to predict the knife responsible. In Experiment 1 a mechanical stab rig and two knives were used to create 14 knife cut marks on dry pig ribs. The toolmarks were laser and micro-CT scanned to allow for quantitative measurements of numerous toolmark properties. The findings from Experiment 1 demonstrated that the two knives produced statistically different cut mark widths, wall angles and shapes. Experiment 2 examined knife marks created on fleshed pig torsos with conditions designed to better simulate real-world stabbings. Eight knives were used to generate 64 incision cut marks that were also micro-CT scanned. Statistical exploration of these cut marks suggested that knife type, serrated or plain, can be predicted from cut mark width and wall angle. Preliminary results suggest that knife type can be predicted from cut mark width, and that knife edge thickness correlates with cut mark width. An additional 16 cut mark walls were imaged for striation marks using scanning electron microscopy, with results suggesting that this approach might not be useful for knife mark analysis. Results also indicated that observer judgements of cut mark shape were more consistent when rated from micro-CT images than from light microscopy images. The potential to combine micro-CT data, medical-grade CT data and photographs to develop highly realistic virtual models for visualisation and 3D printing is also demonstrated. This is the first study to statistically explore simulated real-world knife marks imaged by micro-CT to demonstrate the potential of quantitative approaches in knife mark analysis. Findings and methods presented in this study are relevant to both forensic toolmark researchers and practitioners. Limitations of the experimental methodologies and imaging techniques are discussed, and further work is recommended. Copyright © 2017 Elsevier B.V. All rights reserved.
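
    A minimal way to frame the "predict serrated versus plain from cut mark width and wall angle" result above is as a binary classifier over those two measurements. The sketch below does this with logistic regression and cross-validation in Python using scikit-learn on synthetic, made-up measurements; neither the data nor the effect directions are taken from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic, hypothetical cut-mark measurements (not the study's data):
# width in mm and wall angle in degrees, with a serrated/plain label.
rng = np.random.default_rng(7)
n = 64
serrated = rng.integers(0, 2, n)
width = 0.8 + 0.4 * serrated + rng.normal(0, 0.15, n)    # serrated marks assumed wider
wall_angle = 75 - 10 * serrated + rng.normal(0, 5, n)    # and assumed to have shallower walls
X = np.column_stack([width, wall_angle])

# Logistic regression as one simple way to predict knife class from the
# quantitative toolmark properties described above.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, serrated, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```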
