general linear statistical: Topics by Science.gov

Sample records for general linear statistical

Generalized massive optimal data compression

NASA Astrophysics Data System (ADS)

Alsing, Justin; Wandelt, Benjamin

2018-05-01

In this paper, we provide a general procedure for optimally compressing N data down to n summary statistics, where n is equal to the number of parameters of interest. We show that compression to the score function - the gradient of the log-likelihood with respect to the parameters - yields n compressed statistics that are optimal in the sense that they preserve the Fisher information content of the data. Our method generalizes earlier work on linear Karhunen-Loève compression for Gaussian data whilst recovering both lossless linear compression and quadratic estimation as special cases when they are optimal. We give a unified treatment that also includes the general non-Gaussian case as long as mild regularity conditions are satisfied, producing optimal non-linear summary statistics when appropriate. As a worked example, we derive explicitly the n optimal compressed statistics for Gaussian data in the general case where both the mean and covariance depend on the parameters.
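Score compression can be sketched for the simplest case it covers: Gaussian data with a parameter-dependent mean and a fixed, known diagonal covariance. This is a minimal sketch, not the authors' code. The linear mean model mu_i(theta) = theta0 + theta1 * x_i, the noise levels and all function names are illustrative assumptions. At a fiducial point theta*, the score is t_a = sum_i (d mu_i/d theta_a)(d_i - mu_i)/sigma_i^2, and the Fisher matrix is F_ab = sum_i (d mu_i/d theta_a)(d mu_i/d theta_b)/sigma_i^2.

```python
"""Score compression of N Gaussian data points to n = 2 summary statistics.

Illustrative assumptions (not from the paper): mean mu_i(theta) =
theta0 + theta1 * x_i and a fixed, known diagonal covariance diag(sigma_i^2).
"""


def mean_and_gradient(theta, x):
    """Model mean mu_i and its gradient (d mu_i/d theta0, d mu_i/d theta1)."""
    theta0, theta1 = theta
    mu = [theta0 + theta1 * xi for xi in x]
    grad = [(1.0, xi) for xi in x]
    return mu, grad


def compress(data, x, sigma, theta_fid):
    """Score t_a = sum_i g_ia (d_i - mu_i) / sigma_i^2 at the fiducial point."""
    mu, grad = mean_and_gradient(theta_fid, x)
    t = [0.0, 0.0]
    for d_i, mu_i, g_i, s_i in zip(data, mu, grad, sigma):
        for a in range(2):
            t[a] += g_i[a] * (d_i - mu_i) / s_i**2
    return t


def fisher_matrix(x, sigma, theta_fid):
    """Fisher matrix F_ab = sum_i g_ia g_ib / sigma_i^2 (covariance fixed)."""
    _, grad = mean_and_gradient(theta_fid, x)
    F = [[0.0, 0.0], [0.0, 0.0]]
    for g_i, s_i in zip(grad, sigma):
        for a in range(2):
            for b in range(2):
                F[a][b] += g_i[a] * g_i[b] / s_i**2
    return F


def newton_step_estimate(t, F, theta_fid):
    """theta_fid + F^{-1} t, using the explicit inverse of the 2x2 matrix F."""
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    inv = [[F[1][1] / det, -F[0][1] / det], [-F[1][0] / det, F[0][0] / det]]
    return [theta_fid[a] + inv[a][0] * t[0] + inv[a][1] * t[1] for a in range(2)]


x = [0.0, 1.0, 2.0, 3.0, 4.0]
sigma = [1.0, 1.0, 2.0, 2.0, 1.0]
data = [1.5 - 0.5 * xi for xi in x]  # noise-free data generated at theta = (1.5, -0.5)
theta_fid = (1.0, 0.0)
t = compress(data, x, sigma, theta_fid)  # N = 5 numbers compressed to 2
print(t, newton_step_estimate(t, fisher_matrix(x, sigma, theta_fid), theta_fid))
```

In this sketch the mean is linear in theta and the covariance is fixed. Under those conditions theta* + F^{-1} t equals the weighted least-squares estimate, so the noise-free data are mapped back to (1.5, -0.5) up to rounding. For a nonlinear mean, this one-step estimate is only an approximation.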
Statistical inference for template aging

NASA Astrophysics Data System (ADS)

Schuckers, Michael E.

2006-04-01

A change in classification error rates for a biometric device is often referred to as template aging. Here we offer two methods for determining whether the effect of time is statistically significant. The first of these is the use of a generalized linear model to determine if these error rates change linearly over time. This approach generalizes previous work assessing the impact of covariates using generalized linear models. The second approach uses likelihood ratio test methodology. The focus here is on statistical methods for estimation, not on the underlying cause of the change in error rates over time. These methodologies are applied to data from the National Institute of Standards and Technology Biometric Score Set Release 1. The results of these applications are discussed.
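The abstract does not spell out its likelihood ratio tests, so the following is only a minimal sketch under an assumed setup. Error counts in two time periods are modeled as independent binomials, and H0 (one common error rate) is tested against separate rates for each period. The statistic G = 2(l_alt - l_null) is compared with a chi-square distribution with 1 degree of freedom, whose survival function is erfc(sqrt(G/2)). The function names and example counts are hypothetical.

```python
import math


def binom_loglik(errors, trials, p):
    """Binomial log-likelihood without the constant log C(trials, errors) term."""
    ll = 0.0
    if errors > 0:
        ll += errors * math.log(p)
    if trials - errors > 0:
        ll += (trials - errors) * math.log(1.0 - p)
    return ll


def error_rate_lrt(e1, n1, e2, n2):
    """LRT of H0: p1 == p2 for error counts e1/n1 (period 1) and e2/n2 (period 2).

    Returns (G, p_value) with G referred to a chi-square with 1 df.
    """
    p0 = (e1 + e2) / (n1 + n2)  # pooled MLE under H0
    ll_alt = binom_loglik(e1, n1, e1 / n1) + binom_loglik(e2, n2, e2 / n2)
    ll_null = binom_loglik(e1, n1, p0) + binom_loglik(e2, n2, p0)
    g = max(2.0 * (ll_alt - ll_null), 0.0)  # clip tiny negative rounding error
    p_value = math.erfc(math.sqrt(g / 2.0))  # P(chi2_1 > g)
    return g, p_value


print(error_rate_lrt(10, 1000, 40, 1000))  # hypothetical counts
```

With these hypothetical counts (1% versus 4% errors), G is roughly 20 and the asymptotic p-value is far below 0.001. A GLM with time as a covariate, the first approach in the abstract, would instead model a trend across many periods.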
Analyzing longitudinal data with the linear mixed models procedure in SPSS.

PubMed

West, Brady T

2009-09-01

Many applied researchers analyzing longitudinal data share a common misconception: that specialized statistical software is necessary to fit hierarchical linear models (also known as linear mixed models [LMMs], or multilevel models) to longitudinal data sets. Although several specialized statistical software programs of high quality are available that allow researchers to fit these models to longitudinal data sets (e.g., HLM), rapid advances in general purpose statistical software packages have recently enabled analysts to fit these same models when using preferred packages that also enable other more common analyses. One of these general purpose statistical packages is SPSS, which includes a very flexible and powerful procedure for fitting LMMs to longitudinal data sets with continuous outcomes. This article aims to present readers with a practical discussion of how to analyze longitudinal data using the LMMs procedure in the SPSS statistical software package.
Noise limitations in optical linear algebra processors.

PubMed

Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

1990-05-10

A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.
Extending local canonical correlation analysis to handle general linear contrasts for FMRI data.

PubMed

Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar

2012-01-01

Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic.
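For reference, here is a sketch of the conventional GLM t-test that the abstract uses as its baseline. It is not the CCA-based directional statistic proposed in the paper. The inputs are a design matrix X, a response y and a contrast vector c. The sketch computes the OLS estimate beta = (X^T X)^{-1} X^T y and the residual variance sigma^2 = RSS/(n - p). It then forms t = c^T beta / sqrt(sigma^2 c^T (X^T X)^{-1} c). It assumes independent, homoscedastic errors; real fMRI analyses also model temporal autocorrelation. The names and the toy design are illustrative.

```python
import math


def mat_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def glm_contrast_t(X, y, c):
    """OLS fit of y = X beta + e and the t statistic for the contrast c^T beta."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    xtx_inv = mat_inverse(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(p)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - p)  # unbiased residual variance
    c_var = sum(c[a] * xtx_inv[a][b] * c[b] for a in range(p) for b in range(p))
    t = sum(ca * ba for ca, ba in zip(c, beta)) / math.sqrt(sigma2 * c_var)
    return beta, t


A = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]  # hypothetical condition-A regressor
B = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]  # hypothetical condition-B regressor
noise = [0.1, -0.1] * 6
X = [[1.0, float(a), float(b)] for a, b in zip(A, B)]
y = [1.0 + 2.0 * a + 0.5 * b + e for a, b, e in zip(A, B, noise)]
beta, t = glm_contrast_t(X, y, [0.0, 1.0, -1.0])  # contrast: A minus B
print(beta, t)
```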
Extending Local Canonical Correlation Analysis to Handle General Linear Contrasts for fMRI Data

PubMed Central

Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar

2012-01-01

Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic. PMID:22461786
Image-analysis library

NASA Technical Reports Server (NTRS)

1980-01-01

The MATHPAC image-analysis library is a collection of general-purpose mathematical and statistical routines and special-purpose data-analysis and pattern-recognition routines for image analysis. The MATHPAC library consists of Linear Algebra, Optimization, Statistical-Summary, Densities and Distribution, Regression, and Statistical-Test packages.
A Constrained Linear Estimator for Multiple Regression

ERIC Educational Resources Information Center

Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.

2010-01-01

"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, 1979), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
On Generalizations of Cochran’s Theorem and Projection Matrices.

DTIC Science & Technology

1980-08-01

Definiteness of the Estimated Dispersion Matrix in a Multivariate Linear Model," F. Pukelsheim and George P.H. Styan, May 1978. TECHNICAL REPORTS... with applications to the analysis of covariance," Proc. Cambridge Philos. Soc., 30, pp. 178-191. Graybill, F. A. and Marsaglia, G. (1957... "Idempotent matrices and quadratic forms in the general linear hypothesis," Ann. Math. Statist., 28, pp. 678-686. Greub, W. (1975). Linear Algebra (4th ed
Commentary on the statistical properties of noise and its implication on general linear models in functional near-infrared spectroscopy.

PubMed

Huppert, Theodore J

2016-01-01

Functional near-infrared spectroscopy (fNIRS) is a noninvasive neuroimaging technique that uses low levels of light to measure changes in cerebral blood oxygenation levels. In the majority of NIRS functional brain studies, analysis of this data is based on a statistical comparison of hemodynamic levels between a baseline and task or between multiple task conditions by means of a linear regression model: the so-called general linear model. Although these methods are similar to their implementation in other fields, particularly for functional magnetic resonance imaging, the specific application of these methods in fNIRS research differs in several key ways related to the sources of noise and artifacts unique to fNIRS. In this brief communication, we discuss the application of linear regression models in fNIRS and the modifications needed to generalize these models in order to deal with structured (colored) noise due to systemic physiology and noise heteroscedasticity due to motion artifacts. The objective of this work is to present an overview of these noise properties in the context of the linear model as it applies to fNIRS data. This work is aimed at explaining these mathematical issues to the general fNIRS experimental researcher but is not intended to be a complete mathematical treatment of these concepts.
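A common way to handle the serially correlated (colored) noise described above is to prewhiten the model. For illustration only, the sketch below assumes AR(1) noise and a single regressor with an intercept. It applies a two-step fit in the style of Cochrane-Orcutt. First, the AR coefficient rho is estimated from the OLS residuals. Next, both y and x are filtered with y_t - rho*y_{t-1}, and the model is refitted. fNIRS methods typically use higher-order AR models and robust weighting for motion artifacts, which are not shown here. All names and the simulated data are hypothetical.

```python
import random


def ols_simple(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def prewhitened_fit(x, y):
    """Two-step AR(1) prewhitening (Cochrane-Orcutt style) for one regressor."""
    a, b = ols_simple(x, y)
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Lag-1 autocorrelation of the OLS residuals estimates rho
    rho = sum(r1 * r0 for r0, r1 in zip(resid, resid[1:])) / sum(r * r for r in resid)
    # Filter both sides: y_t - rho*y_{t-1} = a(1 - rho) + b(x_t - rho*x_{t-1}) + e_t
    xs = [x1 - rho * x0 for x0, x1 in zip(x, x[1:])]
    ys = [y1 - rho * y0 for y0, y1 in zip(y, y[1:])]
    a_star, b_star = ols_simple(xs, ys)
    return a_star / (1.0 - rho), b_star, rho


# Hypothetical example: boxcar task regressor (20 on / 20 off) plus AR(1) noise
rng = random.Random(0)
n, true_rho = 4000, 0.8
x = [1.0 if (t // 20) % 2 == 0 else 0.0 for t in range(n)]
noise, e = [], 0.0
for _ in range(n):
    e = true_rho * e + rng.gauss(0.0, 0.5)
    noise.append(e)
y = [2.0 + 1.0 * xi + ei for xi, ei in zip(x, noise)]
a_hat, b_hat, rho_hat = prewhitened_fit(x, y)
print(a_hat, b_hat, rho_hat)
```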
Evaluation of airborne lidar data to predict vegetation Presence/Absence

USGS Publications Warehouse

Palaseanu-Lovejoy, M.; Nayegandhi, A.; Brock, J.; Woodman, R.; Wright, C.W.

2009-01-01

This study evaluates the capabilities of the Experimental Advanced Airborne Research Lidar (EAARL) in delineating vegetation assemblages in Jean Lafitte National Park, Louisiana. Five-meter-resolution grids of bare earth, canopy height, canopy-reflection ratio, and height of median energy were derived from EAARL data acquired in September 2006. Ground-truth data were collected along transects to assess species composition, canopy cover, and ground cover. To decide which model is more accurate, comparisons of general linear models and generalized additive models were conducted using conventional evaluation methods (i.e., sensitivity, specificity, Kappa statistics, and area under the curve) and two new indexes, net reclassification improvement and integrated discrimination improvement. Generalized additive models were superior to general linear models in modeling presence/absence in training vegetation categories, but no statistically significant differences between the two models were found in the classification accuracy at validation locations using conventional evaluation methods, although statistically significant improvements in net reclassification were observed. © 2009 Coastal Education and Research Foundation.
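The conventional evaluation measures named above can be computed from binary predictions and scores. The following is an illustrative implementation, not the study's code. It covers sensitivity, specificity and Cohen's kappa. It also computes the AUC as the Mann-Whitney probability that a randomly chosen presence scores above a randomly chosen absence, with ties counted as one half. The net reclassification and integrated discrimination indexes are not included. The function names are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and Cohen's kappa for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of the 2x2 confusion table
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (observed - expected) / (1.0 - expected)
    return sensitivity, specificity, kappa


def auc(y_true, scores):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparisons."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0]  # hypothetical presence/absence at validation points
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
print(binary_metrics(y_true, y_pred), auc(y_true, scores))
```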
The photon gas formulation of thermal radiation

NASA Technical Reports Server (NTRS)

Ried, R. C., Jr.

1975-01-01

A statistical consideration of the energy, the linear momentum, and the angular momentum of the photons that make up a thermal radiation field was presented. A general nonequilibrium statistical thermodynamics approach toward a macroscopic description of thermal radiation transport was developed and then applied to the restricted equilibrium statistical thermostatics derivation of the energy, linear momentum, and intrinsic angular momentum equations for an isotropic photon gas. A brief treatment of a nonisotropic photon gas, as an example of the results produced by the nonequilibrium statistical thermodynamics approach, was given. The relativistic variation of temperature and the invariance of entropy were illustrated.
Statistical Tutorial | Center for Cancer Research

Cancer.gov

Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Tutorial (ST) is designed as a follow-up to Statistical Analysis of Research Data (SARD), held in April 2018. The tutorial will apply the general principles of statistical analysis of research data, including descriptive statistics, z- and t-tests of means and mean differences, simple and multiple linear regression, ANOVA tests, and the chi-squared distribution.
Generalized linear and generalized additive models in studies of species distributions: Setting the scene

USGS Publications Warehouse

Guisan, Antoine; Edwards, T.C.; Hastie, T.

2002-01-01

An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. © 2002 Elsevier Science B.V. All rights reserved.
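As a concrete instance of a GLM fit, the sketch below fits a logistic regression, the usual model for presence/absence data, by iteratively reweighted least squares (IRLS). The model uses the binomial family with the canonical logit link. For a canonical link, IRLS coincides with Newton-Raphson, so each step solves I(b) delta = U(b), where U is the score and I is the Fisher information. This is a minimal single-predictor sketch with hypothetical names and data. It has no safeguards for perfectly separated data, where the estimates diverge.

```python
import math


def fit_logistic_irls(x, y, n_iter=25, tol=1e-10):
    """Fit logit(p_i) = b0 + b1 * x_i by IRLS (Newton-Raphson for the logit link)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Score vector U and Fisher information I at the current estimate
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)  # IRLS weight = binomial variance
            u0 += yi - p
            u1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system I * delta = U explicitly
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


x = [0, 1, 2, 3, 4, 5, 6, 7]  # hypothetical environmental gradient
y = [0, 0, 1, 0, 1, 0, 1, 1]  # hypothetical presence (1) / absence (0)
print(fit_logistic_irls(x, y))
```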
The Development of Web-based Graphical User Interface for Unified Modeling Data with Multi (Correlated) Responses

NASA Astrophysics Data System (ADS)

Made Tirta, I.; Anggraeni, Dian

2018-04-01

Statistical models have developed rapidly in various directions to accommodate various types of data. Data collected from longitudinal, repeated-measures, or clustered designs (whether continuous, binary, count, or ordinal) are likely to be correlated. Therefore, statistical models for independent responses, such as the Generalized Linear Model (GLM) and the Generalized Additive Model (GAM), are not appropriate. Several models are available for correlated responses, including GEEs (Generalized Estimating Equations) for marginal models and various mixed-effects models such as GLMM (Generalized Linear Mixed Models) and HGLM (Hierarchical Generalized Linear Models) for subject-specific models. These models are available in the free, open-source software R, but they can only be accessed through a command-line interface (using scripts). On the other hand, most practical researchers rely heavily on menu-based or Graphical User Interfaces (GUIs). Using the Shiny framework, we develop a standard pull-down-menu Web-GUI that unifies most models for correlated responses. The Web-GUI accommodates almost all needed features. It enables users to fit and compare various models for repeated-measures data (GEE, GLMM, HGLM, GEE for nominal responses) much more easily through online menus. This paper discusses the features of the Web-GUI and illustrates their use. In general, we find that GEE, GLMM, and HGLM gave very close results.
Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

PubMed

Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

2015-05-01

Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.
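Estimates from m imputed data sets are usually pooled with Rubin's rules, sketched below for a single scalar parameter. This is general multiple-imputation background, not the paper's estimator. The abstract proposes its own consistent variance estimator, which may differ. The pooled estimate is the mean of the per-imputation estimates. The total variance is T = W + (1 + 1/m) B, where W is the mean within-imputation variance and B is the between-imputation variance. The function name and inputs are hypothetical.

```python
from statistics import mean, variance


def rubins_rules(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors.

    Returns (pooled estimate, total variance W + (1 + 1/m) * B).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = mean(estimates)  # pooled point estimate
    w_bar = mean(variances)  # within-imputation variance
    b = variance(estimates)  # between-imputation variance (m - 1 denominator)
    return q_bar, w_bar + (1.0 + 1.0 / m) * b


# Hypothetical regression coefficients and variances from m = 3 imputations
print(rubins_rules([1.0, 1.2, 0.8], [0.04, 0.05, 0.03]))
```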
Response statistics of rotating shaft with non-linear elastic restoring forces by path integration

NASA Astrophysics Data System (ADS)

Gaidai, Oleg; Naess, Arvid; Dimentberg, Michael

2017-07-01

Extreme statistics of random vibrations are studied for a Jeffcott rotor under uniaxial white noise excitation. The restoring force is modelled as elastic non-linear, and a comparison is made with a linearized restoring force to see the effect of force non-linearity on the response statistics. While analytical solutions and stability conditions are available for the linear model, this is not generally the case for the non-linear system except for some special cases. The statistics of the non-linear case are studied by applying the path integration (PI) method, which is based on the Markov property of the coupled dynamic system. The Jeffcott rotor response statistics can be obtained by solving the Fokker-Planck (FP) equation of the 4D dynamic system. An efficient implementation of the PI algorithm is applied, namely the fast Fourier transform (FFT) is used to simulate the additive noise of the dynamic system. This significantly reduces computational time compared to classical PI. Excitation is modelled as Gaussian white noise; however, white noise with any distribution can be implemented with the same PI technique. Multidirectional Markov noise can also be modelled with PI in the same way as unidirectional noise. PI is accelerated by using the joint probability density function (PDF) estimated by Monte Carlo (MC) simulation as the initial input. The symmetry of the dynamic system was utilized to afford higher mesh resolution. Both internal (rotating) and external damping are included in the mechanical model of the rotor. The main advantage of using PI rather than MC is that PI offers high accuracy in the tail of the probability distribution. This is of critical importance for, e.g., extreme value statistics, system reliability, and first passage probability.
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

PubMed

Chu, Annie; Cui, Jenny; Dinov, Ivo D

2009-03-01

The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also includes several hypothesis test models, such as contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), in the hope of contributing to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with an API (Application Programming Interface) have been implemented for statistical summaries, least-squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is ongoing and more functions and tools are being added to it, these resources are constantly improved.
The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.
Normality of raw data in general linear models: The most widespread myth in statistics

USGS Publications Warehouse

Kery, Marc; Hatfield, Jeff S.

2003-01-01

In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any) when parametric tests would indeed be appropriate, or to the use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
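The distinction between raw data and residuals can be shown numerically with simulated data. In this hypothetical two-group example (group means 10 and 20, normal errors with SD 1), the pooled raw response is strongly bimodal. Its excess kurtosis is far below 0, the value for a normal distribution. The residuals from the group means behave like the normal errors. Excess kurtosis is used here only as a crude numerical summary, not as a normality test; the authors display the distributions graphically instead. All names and values are illustrative.

```python
import random
from statistics import mean


def excess_kurtosis(values):
    """Moment-based sample excess kurtosis (about 0 for normal data)."""
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    m4 = mean((v - m) ** 4 for v in values)
    return m4 / m2**2 - 3.0


def group_residuals(y, groups):
    """Residuals of a one-way model (t test / ANOVA): y minus its group mean."""
    means = {g: mean(v for v, h in zip(y, groups) if h == g) for g in set(groups)}
    return [v - means[g] for v, g in zip(y, groups)]


rng = random.Random(1)
groups = ["A"] * 200 + ["B"] * 200
y = [(10.0 if g == "A" else 20.0) + rng.gauss(0.0, 1.0) for g in groups]
resid = group_residuals(y, groups)
print(round(excess_kurtosis(y), 2), round(excess_kurtosis(resid), 2))
```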
A note about high blood pressure in childhood

NASA Astrophysics Data System (ADS)

Teodoro, M. Filomena; Simão, Carla

2017-06-01

In the medical, behavioral and social sciences it is common to obtain a binary outcome. In the present work, information is collected in which some of the outcomes are binary variables (1='yes'/ 0='no'). In [14] a preliminary study about caregivers' perception of pediatric hypertension was introduced. An experimental questionnaire was designed to be answered by the caregivers of routine pediatric consultation attendees at Santa Maria's hospital (HSM). The collected data were statistically analyzed using a descriptive analysis and a predictive model. Significant relations between some socio-demographic variables and the assessed knowledge were obtained. A statistical data analysis using part of the questionnaire's information can be found in [14]. The present article completes the statistical approach by estimating Generalized Linear Models (GLM) for the relevant remaining questions of the questionnaire. Exploring the binary outcome issue, we intend to extend this approach using Generalized Linear Mixed Models (GLMM), but the process is still ongoing.

On Fitting Generalized Linear Mixed-effects Models for Binary Responses using Different Statistical Packages

PubMed Central

Zhang, Hui; Lu, Naiji; Feng, Changyong; Thurston, Sally W.; Xia, Yinglin; Tu, Xin M.

2011-01-01

The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice. PMID:21671252
A General Accelerated Degradation Model Based on the Wiener Process.

PubMed

Liu, Le; Li, Xiaoyang; Sun, Fuqiang; Wang, Ning

2016-12-06

Accelerated degradation testing (ADT) is an efficient tool for material service reliability and safety evaluations based on the analysis of performance degradation data. Traditional stochastic process models are mainly for linear or linearizable degradation paths. However, those methods are not applicable in situations where the degradation processes cannot be linearized. Hence, in this paper, a general ADT model based on the Wiener process is proposed to solve the problem of accelerated degradation data analysis. The general model can account for the unit-to-unit variation and temporal variation of the degradation process, and is suitable for both linear and nonlinear ADT analyses with single or multiple acceleration variables. Statistical inference procedures are given to estimate the unknown parameters in both constant-stress and step-stress ADT. The simulation example and two real applications demonstrate that the proposed method can yield reliable lifetime evaluation results compared with the existing linear and time-scale transformation Wiener processes in both linear and nonlinear ADT analyses.
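The time-scale transformation Wiener process mentioned above, in its simplest form, can be sketched as X(t) = mu * L(t) + sigma * B(L(t)) with L(t) = t**b. Under this model, each increment over a transformed time step dL is normally distributed with mean mu*dL and variance sigma^2*dL. With b known, the maximum-likelihood estimates then have closed forms. This is an assumed baseline model, not the paper's general ADT model. That model additionally handles unit-to-unit variation and acceleration variables, which are omitted here. The parameter values and names are hypothetical.

```python
import math
import random


def simulate_path(mu, sigma, b, times, rng):
    """Simulate X(t) = mu*L(t) + sigma*B(L(t)), L(t) = t**b, X(0) = 0."""
    x, prev_lam, path = 0.0, 0.0, []
    for t in times:
        lam = t**b
        dlam = lam - prev_lam
        x += mu * dlam + sigma * math.sqrt(dlam) * rng.gauss(0.0, 1.0)
        path.append(x)
        prev_lam = lam
    return path


def fit_wiener(paths, times, b):
    """MLEs of mu and sigma^2 from independent Gaussian increments, b known."""
    lams = [t**b for t in times]
    dlams = [lams[0]] + [l2 - l1 for l1, l2 in zip(lams, lams[1:])]
    incs = [[p[0]] + [x2 - x1 for x1, x2 in zip(p, p[1:])] for p in paths]
    n_units, n_steps = len(paths), len(dlams)
    # dX ~ N(mu*dL, sigma^2*dL): MLE of mu is total rise / total transformed time
    mu = sum(sum(inc) for inc in incs) / (n_units * sum(dlams))
    ss = sum((dx - mu * dl) ** 2 / dl for inc in incs for dx, dl in zip(inc, dlams))
    return mu, ss / (n_units * n_steps)


rng = random.Random(42)
times = [float(t) for t in range(1, 21)]  # hypothetical inspection times
paths = [simulate_path(0.5, 0.2, 1.3, times, rng) for _ in range(50)]
mu_hat, s2_hat = fit_wiener(paths, times, 1.3)
print(mu_hat, math.sqrt(s2_hat))
```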
A General Accelerated Degradation Model Based on the Wiener Process

PubMed Central

Liu, Le; Li, Xiaoyang; Sun, Fuqiang; Wang, Ning

2016-01-01

Accelerated degradation testing (ADT) is an efficient tool for material service reliability and safety evaluations based on the analysis of performance degradation data. Traditional stochastic process models are mainly for linear or linearizable degradation paths. However, those methods are not applicable in situations where the degradation processes cannot be linearized. Hence, in this paper, a general ADT model based on the Wiener process is proposed to solve the problem of accelerated degradation data analysis. The general model can account for the unit-to-unit variation and temporal variation of the degradation process, and is suitable for both linear and nonlinear ADT analyses with single or multiple acceleration variables. Statistical inference procedures are given to estimate the unknown parameters in both constant-stress and step-stress ADT. The simulation example and two real applications demonstrate that the proposed method can yield reliable lifetime evaluation results compared with the existing linear and time-scale transformation Wiener processes in both linear and nonlinear ADT analyses. PMID:28774107
Optimal linear and nonlinear feature extraction based on the minimization of the increased risk of misclassification. [Bayes theorem - statistical analysis/data processing]

NASA Technical Reports Server (NTRS)

Defigueiredo, R. J. P.

1974-01-01

General classes of nonlinear and linear transformations were investigated for the reduction of the dimensionality of the classification (feature) space so that, for a prescribed dimension m of this space, the increase of the misclassification risk is minimized.
A General Linear Method for Equating with Small Samples

ERIC Educational Resources Information Center

Albano, Anthony D.

2015-01-01

Research on equating with small samples has shown that methods with stronger assumptions and fewer statistical estimates can lead to decreased error in the estimated equating function. This article introduces a new approach to linear observed-score equating, one which provides flexible control over how form difficulty is assumed versus estimated…
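For orientation, the sketch below shows the conventional linear observed-score equating function, l(x) = mu_Y + (s_Y/s_X)(x - mu_X). A random-groups design is assumed. It is not Albano's general linear method. The estimate_slope switch only illustrates, as an assumption, the idea of assuming rather than estimating part of the function. Fixing the slope at 1 gives mean equating, which relies on a stronger assumption but requires one fewer estimate. The names and scores are hypothetical.

```python
from statistics import mean, stdev


def make_equating_function(x_scores, y_scores, estimate_slope=True):
    """Return a function mapping form-X scores onto the form-Y scale.

    estimate_slope=True gives linear equating, l(x) = mu_Y + (s_Y/s_X)(x - mu_X);
    estimate_slope=False fixes the slope at 1 (mean equating).
    """
    mu_x, mu_y = mean(x_scores), mean(y_scores)
    slope = stdev(y_scores) / stdev(x_scores) if estimate_slope else 1.0
    return lambda x: mu_y + slope * (x - mu_x)


form_x = [10, 12, 14, 16, 18]  # hypothetical scores on form X
form_y = [20, 24, 28, 32, 36]  # hypothetical scores on form Y
linear = make_equating_function(form_x, form_y)
mean_only = make_equating_function(form_x, form_y, estimate_slope=False)
print(linear(15), mean_only(15))
```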
Summary goodness-of-fit statistics for binary generalized linear models with noncanonical link functions.

PubMed

Canary, Jana D; Blizzard, Leigh; Barry, Ronald P; Hosmer, David W; Quinn, Stephen J

2016-05-01

Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLM with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of the link function chosen. We generalize the Tsiatis GOF statistic (TG), originally developed for logistic GLMCCs, so that it can be applied under any link function. Further, we show that the algebraically related Hosmer-Lemeshow (HL) and Pigeon-Heyse (J²) statistics can be applied directly. In a simulation study, TG, HL, and J² were used to evaluate the fit of probit, log-log, complementary log-log, and log models, all calculated with a common grouping method. The TG statistic consistently maintained Type I error rates, while those of HL and J² were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, TG had more power than HL or J². © 2015 John Wiley & Sons Ltd/London School of Economics.
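The Hosmer-Lemeshow statistic can be sketched as follows. Observations are sorted by fitted probability and split into G groups of nearly equal size. Within each group, the observed count of events O_g is compared with the expected count E_g = n_g * pi_g. The statistic is HL = sum_g (O_g - E_g)^2 / (n_g pi_g (1 - pi_g)), conventionally referred to a chi-square distribution with G - 2 degrees of freedom. Only fitted probabilities enter, which is why it can be computed for any link function. The equal-size grouping rule is an assumption, since the abstract does not specify its common grouping method. TG and J² are not shown, and the names are hypothetical.

```python
def hosmer_lemeshow(y, p, n_groups=10):
    """Hosmer-Lemeshow statistic from 0/1 outcomes y and fitted probabilities p.

    Observations are sorted by p and split into n_groups groups of (nearly)
    equal size. Only fitted probabilities are used, so the fit may come from
    any link function. Returns (statistic, reference df = n_groups - 2).
    """
    n = len(p)
    if n < n_groups:
        raise ValueError("need at least one observation per group")
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        pi_bar = expected / n_g  # mean fitted probability in the group
        stat += (observed - expected) ** 2 / (n_g * pi_bar * (1.0 - pi_bar))
    return stat, n_groups - 2


# Hypothetical fitted probabilities (e.g. from a probit or log-link GLM)
p_fit = [0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25]
print(hosmer_lemeshow([1, 0, 1, 0, 1, 0, 0, 0], p_fit, n_groups=2))
```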
A statistical package for computing time and frequency domain analysis

NASA Technical Reports Server (NTRS)

Brownlow, J.

1978-01-01

The spectrum analysis (SPA) program is a general purpose digital computer program designed to aid in data analysis. The program does time and frequency domain statistical analyses as well as some preanalysis data preparation. The capabilities of the SPA program include linear trend removal and/or digital filtering of data, plotting and/or listing of both filtered and unfiltered data, time domain statistical characterization of data, and frequency domain statistical characterization of data.
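Two of the operations listed above, linear trend removal and frequency-domain characterization, can be sketched as follows. The sketch is not the SPA program itself. It removes a least-squares straight line and then computes a raw periodogram |DFT|^2/n at frequencies k/n with a direct O(n^2) discrete Fourier transform. A production tool would use an FFT and usually windowing or averaging. The function names and the test signal are hypothetical.

```python
import cmath
import math


def detrend_linear(x):
    """Remove the least-squares straight line a + b*t, with t = 0..n-1."""
    n = len(x)
    t_mean = (n - 1) / 2.0
    x_mean = sum(x) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x)) / sxx
    a = x_mean - b * t_mean
    return [v - (a + b * t) for t, v in enumerate(x)]


def periodogram(x):
    """Raw periodogram |DFT_k|^2 / n for k = 0..n//2 (direct O(n^2) DFT)."""
    n = len(x)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


# Hypothetical record: linear drift plus a sinusoid with 5 cycles in 64 samples
signal = [0.3 * t + math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
power = periodogram(detrend_linear(signal))
print(max(range(1, len(power)), key=lambda k: power[k]))  # index of the peak
```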
Transmit Designs for the MIMO Broadcast Channel With Statistical CSI

NASA Astrophysics Data System (ADS)

Wu, Yongpeng; Jin, Shi; Gao, Xiqi; McKay, Matthew R.; Xiao, Chengshan

2014-09-01

We investigate the multiple-input multiple-output broadcast channel with statistical channel state information available at the transmitter. The so-called linear assignment operation is employed, and necessary conditions are derived for the optimal transmit design under general fading conditions. Based on this, we introduce an iterative algorithm to maximize the linear assignment weighted sum-rate by applying a gradient descent method. To reduce complexity, we derive an upper bound of the linear assignment achievable rate of each receiver, from which a simplified closed-form expression for a near-optimal linear assignment matrix is derived. This reveals an interesting construction analogous to that of dirty-paper coding. In light of this, a low complexity transmission scheme is provided. Numerical examples illustrate the significant performance of the proposed low complexity scheme.
Investigation of Sunspot Area Varying with Sunspot Number

NASA Astrophysics Data System (ADS)

Li, K. J.; Li, F. Y.; Zhang, J.; Feng, W.

2016-11-01

The statistical relationship between sunspot area (SA) and sunspot number (SN) is investigated through analysis of their daily observation records from May 1874 to April 2015. On a total of 1607 days, representing 3% of the total interval considered, either SA or SN had a value of zero while the other did not. These occurrences most likely reflect the reporting of short-lived spots by a single observatory and the subsequent averaging of zero values over multiple stations. The main results are as follows: i) The number of spotless days around the minimum of a solar cycle is statistically negatively correlated with the maximum strength of solar activity in that cycle. ii) The probability distribution of SA generally decreases monotonically with SA, whereas the distribution of SN generally first increases and then decreases overall. These different probability distributions of SA and SN should strengthen their nonlinear relation, and the correction factor k in the definition of SN may be one of the factors that cause the nonlinearity. iii) The nonlinear relation between SA and SN does exist statistically, and it is clearer during the maximum epoch of a solar cycle.
The Effects of Measurement Error on Statistical Models for Analyzing Change. Final Report.

ERIC Educational Resources Information Center

Dunivant, Noel

The results of six major projects are discussed, including a comprehensive mathematical and statistical analysis of the problems caused by errors of measurement in linear models for assessing change. In a general matrix representation of the problem, several new analytic results are proved concerning the parameters which affect bias in…
On fitting generalized linear mixed-effects models for binary responses using different statistical packages.

PubMed

Zhang, Hui; Lu, Naiji; Feng, Changyong; Thurston, Sally W; Xia, Yinglin; Zhu, Liang; Tu, Xin M

2011-09-10

The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice. Copyright © 2011 John Wiley & Sons, Ltd.
Right-Sizing Statistical Models for Longitudinal Data

PubMed Central

Wood, Phillip K.; Steinley, Douglas; Jackson, Kristina M.

2015-01-01

Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to “right-size” the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically under-identified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A three-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation/covariation patterns. The orthogonal, free-curve slope-intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the Factor Mean model (FM, McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, Hierarchical Linear Models (HLM), Repeated Measures MANOVA, and the Linear Slope Intercept (LinearSI) Growth Model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. PMID:26237507
Right-sizing statistical models for longitudinal data.

PubMed

Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M

2015-12-01

Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. (c) 2015 APA, all rights reserved.
Generalized t-statistic for two-group classification.

PubMed

Komori, Osamu; Eguchi, Shinto; Copas, John B

2015-06-01

In the classic discriminant model of two multivariate normal distributions with equal variance matrices, the linear discriminant function is optimal both in terms of the log likelihood ratio and in terms of maximizing the standardized difference (the t-statistic) between the means of the two distributions. In a typical case-control study, normality may be sensible for the control sample but heterogeneity and uncertainty in diagnosis may suggest that a more flexible model is needed for the cases. We generalize the t-statistic approach by finding the linear function which maximizes a standardized difference but with data from one of the groups (the cases) filtered by a possibly nonlinear function U. We study conditions for consistency of the method and find the function U which is optimal in the sense of asymptotic efficiency. Optimality may also extend to other measures of discriminatory efficiency such as the area under the receiver operating characteristic curve. The optimal function U depends on a scalar probability density function which can be estimated non-parametrically using a standard numerical algorithm. A lasso-like version for variable selection is implemented by adding L1-regularization to the generalized t-statistic. Two microarray data sets in the study of asthma and various cancers are used as motivating examples. © 2014, The International Biometric Society.
Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

ERIC Educational Resources Information Center

Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

2013-01-01

Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Neuroimaging Research: from Null-Hypothesis Falsification to Out-Of-Sample Generalization

ERIC Educational Resources Information Center

Bzdok, Danilo; Varoquaux, Gaël; Thirion, Bertrand

2017-01-01

Brain-imaging technology has boosted the quantification of neurobiological phenomena underlying human mental operations and their disturbances. Since its inception, drawing inference on neurophysiological effects has hinged on classical statistical methods, especially the general linear model. The tens of thousands of variables per brain scan were…
A General Approach to Causal Mediation Analysis

ERIC Educational Resources Information Center

Imai, Kosuke; Keele, Luke; Tingley, Dustin

2010-01-01

Traditionally in the social sciences, causal mediation analysis has been formulated, understood, and implemented within the framework of linear structural equation models. We argue and demonstrate that this is problematic for 3 reasons: the lack of a general definition of causal mediation effects independent of a particular statistical model, the…
Predicting oropharyngeal tumor volume throughout the course of radiation therapy from pretreatment computed tomography data using general linear models.

PubMed

Yock, Adam D; Rao, Arvind; Dong, Lei; Beadle, Beth M; Garden, Adam S; Kudchadker, Rajat J; Court, Laurence E

2014-05-01

The purpose of this work was to develop and evaluate the accuracy of several predictive models of variation in tumor volume throughout the course of radiation therapy. Nineteen patients with oropharyngeal cancers were imaged daily with CT-on-rails for image-guided alignment per an institutional protocol. The daily volumes of 35 tumors in these 19 patients were determined and used to generate (1) a linear model in which tumor volume changed at a constant rate, (2) a general linear model that utilized the power fit relationship between the daily and initial tumor volumes, and (3) a functional general linear model that identified and exploited the primary modes of variation between time series describing the changing tumor volumes. Primary and nodal tumor volumes were examined separately. The accuracy of these models in predicting daily tumor volumes was compared with that of static and linear reference models using leave-one-out cross-validation. In predicting the daily volume of primary tumors, the general linear model and the functional general linear model were more accurate than the static reference model by 9.9% (range: -11.6% to 23.8%) and 14.6% (range: -7.3% to 27.5%), respectively, and were more accurate than the linear reference model by 14.2% (range: -6.8% to 40.3%) and 13.1% (range: -1.5% to 52.5%), respectively. In predicting the daily volume of nodal tumors, only the 14.4% (range: -11.1% to 20.5%) improvement in accuracy of the functional general linear model compared to the static reference model was statistically significant. A general linear model and a functional general linear model trained on data from a small population of patients can predict the primary tumor volume throughout the course of radiation therapy with greater accuracy than standard reference models. These more accurate models may increase the prognostic value of information about the tumor garnered from pretreatment computed tomography images and facilitate improved treatment management.
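The abstract names a "power fit relationship between the daily and initial tumor volumes" without giving its form. One plausible reading, assumed here purely for illustration, is V_daily = a · V_initial^b, which becomes a linear model after taking logs and can be fitted by ordinary least squares (both function names are hypothetical):

```python
import math


def fit_power_law(v_initial, v_daily):
    """Fit v_daily ~ a * v_initial**b by least squares on the log-log scale.

    Assumed model form for illustration; all volumes must be positive.
    """
    lx = [math.log(v) for v in v_initial]
    ly = [math.log(v) for v in v_daily]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    b = sxy / sxx                 # exponent = slope on the log-log scale
    a = math.exp(my - b * mx)     # prefactor from the log-scale intercept
    return a, b


def predict_volume(a, b, v0):
    """Predicted daily volume for a tumor with initial volume v0."""
    return a * v0 ** b
```

In a leave-one-out setting, such a fit would be repeated with each patient held out and the held-out patient's volume predicted from its pretreatment value; whether the study fitted one (a, b) per treatment day is not stated in the abstract.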
Quantum description of light propagation in generalized media

NASA Astrophysics Data System (ADS)

Häyrynen, Teppo; Oksanen, Jani

2016-02-01

Linear quantum input-output relation based models are widely applied to describe light propagation in a lossy medium. The details of the interaction and the associated added noise depend on whether the device is configured to operate as an amplifier or an attenuator. Using the traveling wave (TW) approach, we generalize the linear material model to simultaneously account for both the emission and absorption processes and to have point-wise defined noise field statistics and intensity-dependent interaction strengths. Thus, our approach describes the quantum input-output relations of linear media with net attenuation, amplification or transparency without pre-selection of the operation point. The TW approach is then applied to investigate materials at thermal equilibrium, inverted materials, the transparency limit where losses are compensated, and saturating amplifiers. We also apply the approach to investigate media in nonuniform states, which can arise, for example, from a temperature gradient over the medium or a position-dependent inversion of the amplifier. Furthermore, by using the generalized model we investigate devices with intensity-dependent interactions and show how an initial thermal field transforms to a field having coherent statistics due to gain saturation.
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit

PubMed Central

Chu, Annie; Cui, Jenny; Dinov, Ivo D.

2011-01-01

The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include models commonly used in undergraduate statistics courses, such as linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category and the Wilcoxon rank sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also includes several hypothesis test models, such as contingency tables, Friedman's test, and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), in the hope of contributing to the efforts of the statistical computing community. The code includes functionality for each specific analysis model, as well as general utilities that can be applied in various statistical computing tasks. For example, concrete methods with an API (Application Programming Interface) have been implemented for statistical summaries, least-squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is ongoing and more functions and tools are being added to it, these resources are constantly improved.
The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994

Robust Combining of Disparate Classifiers Through Order Statistics

NASA Technical Reports Server (NTRS)

Tumer, Kagan; Ghosh, Joydeep

2001-01-01

Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in the performance of individual classifiers. Based on a mathematical model of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the ith order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real-world data and standard public domain data sets corroborate these findings.
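To make the combiners concrete, here is a minimal sketch that combines per-class scores from several classifiers using the median, the maximum, the ith order statistic, and a trimmed mean. The trimmed mean is offered as an example of a linear combination of ordered outputs in the spirit of the "trim" combiner; the paper's exact trim and spread definitions may differ, and the function name is hypothetical:

```python
import statistics


def order_statistic_combiner(outputs, kind="median", i=None, trim=1):
    """Combine per-class scores from several classifiers.

    outputs: list of score vectors, one per classifier, each of length n_classes.
    Returns (predicted_class, combined_scores).
    """
    n_classes = len(outputs[0])
    combined = []
    for c in range(n_classes):
        scores = sorted(o[c] for o in outputs)  # ascending order statistics
        if kind == "median":
            value = statistics.median(scores)
        elif kind == "max":
            value = scores[-1]
        elif kind == "order":
            if i is None:
                raise ValueError("kind='order' needs an index i")
            value = scores[i]  # i-th order statistic (0-based)
        elif kind == "trim":
            if 2 * trim >= len(scores):
                raise ValueError("trim removes every output")
            kept = scores[trim:len(scores) - trim]  # drop extremes on both ends
            value = sum(kept) / len(kept)
        else:
            raise ValueError(f"unknown combiner: {kind}")
        combined.append(value)
    best = max(range(n_classes), key=lambda c: combined[c])
    return best, combined
```

For example, with outputs `[[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]`, one classifier disagrees sharply with the other two: the max combiner follows the outlier and picks class 0, while the median and trimmed combiners pick class 1. This illustrates the robustness to uneven classifier performance that motivates order-statistic combiners.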
Linear Least Squares for Correlated Data

NASA Technical Reports Server (NTRS)

Dean, Edwin B.

1988-01-01

Throughout the literature, authors have consistently voiced the suspicion that regression results were less than satisfactory when the independent variables were correlated. Camm, Gulledge, and Womer, and Womer and Marcotte, provide excellent applied examples of these concerns. Many authors have obtained partial solutions to this problem, as discussed by Womer and Marcotte and by Wonnacott and Wonnacott; these result in generalized least squares algorithms that solve restricted cases. This paper presents a simple but relatively general multivariate method for obtaining linear least squares coefficients that are free of the statistical distortion created by correlated independent variables.
A General Family of Limited Information Goodness-of-Fit Statistics for Multinomial Data

ERIC Educational Resources Information Center

Joe, Harry; Maydeu-Olivares, Alberto

2010-01-01

Maydeu-Olivares and Joe (J. Am. Stat. Assoc. 100:1009-1020, 2005; Psychometrika 71:713-732, 2006) introduced classes of chi-square tests for (sparse) multidimensional multinomial data based on low-order marginal proportions. Our extension provides general conditions under which quadratic forms in linear functions of cell residuals are…
Drivers' willingness to pay progressive rate for street parking.

DOT National Transportation Integrated Search

2015-01-01

This study finds willingness to pay and price elasticity for on-street parking demand using stated preference data obtained from 238 respondents. Descriptive, statistical and economic analyses including regression, generalized linear model, and f...
Estimation and Selection via Absolute Penalized Convex Minimization And Its Multistage Adaptive Applications

PubMed Central

Huang, Jian; Zhang, Cun-Hui

2013-01-01

The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
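The weighted ℓ1-penalized estimator and its multistage adaptive use can be illustrated with squared-error loss, the simplest convex loss covered by the abstract. The sketch below uses cyclic coordinate descent with soft-thresholding (a standard solver, not necessarily the authors'). It then reweights with w_j = 1/|b_j| from the previous stage, the usual adaptive-Lasso choice, assumed here for illustration. The function names and the `eps` floor are hypothetical:

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Cyclic coordinate descent for
    (1/(2n)) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residuals y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes b_j.
            rho = sum(col[i] * r[i] for i in range(n)) / n + z * b[j]
            new = soft_threshold(rho, lam * weights[j]) / z
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= col[i] * delta
                b[j] = new
    return b


def adaptive_lasso(X, y, lam, stages=2, eps=1e-8):
    """Multistage reweighting: stage 1 is the plain Lasso, later stages use
    weights 1/|b_j| so that small coefficients are penalized more heavily."""
    p = len(X[0])
    b = weighted_lasso(X, y, lam, [1.0] * p)
    for _ in range(stages - 1):
        w = [1.0 / max(abs(bj), eps) for bj in b]
        b = weighted_lasso(X, y, lam, w)
    return b
```

With the orthogonal design `X = [[1, 0], [-1, 0], [0, 1], [0, -1]]`, `y = [2, -2, 0.1, -0.1]` and `lam = 0.2`, the plain Lasso gives about `[1.6, 0.0]`, while the two-stage adaptive version gives about `[1.75, 0.0]`. The second stage shrinks the large coefficient less toward zero (the least-squares value is 2), which illustrates why reweighting reduces the Lasso's bias.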
Flow Equation Approach to the Statistics of Nonlinear Dynamical Systems

NASA Astrophysics Data System (ADS)

Marston, J. B.; Hastings, M. B.

2005-03-01

The probability distribution function of nonlinear dynamical systems is governed by a linear framework that resembles quantum many-body theory, in which stochastic forcing and/or averaging over initial conditions play the role of nonzero ℏ. Besides the well-known Fokker-Planck approach, there is a related Hopf functional method (Uriel Frisch, Turbulence: The Legacy of A. N. Kolmogorov, Cambridge University Press, 1995, chapter 9.5); in both formalisms, zero modes of linear operators describe the stationary non-equilibrium statistics. To access the statistics, we investigate the method of continuous unitary transformations (S. D. Glazek and K. G. Wilson, Phys. Rev. D 48, 5863 (1993); Phys. Rev. D 49, 4214 (1994)), also known as the flow equation approach (F. Wegner, Ann. Phys. 3, 77 (1994)), suitably generalized to the diagonalization of non-Hermitian matrices. Comparison to the more traditional cumulant expansion method is illustrated with low-dimensional attractors. The treatment of high-dimensional dynamical systems is also discussed.
On Fluctuations of Eigenvalues of Random Band Matrices

NASA Astrophysics Data System (ADS)

Shcherbina, M.

2015-10-01

We consider the fluctuations of linear eigenvalue statistics of random band matrices whose entries have the form with i.i.d. possessing the th moment, where the function u has a finite support , so that M has only nonzero diagonals. The parameter b (called the bandwidth) is assumed to grow with n in a way such that . Without any additional assumptions on the growth of b we prove a CLT for linear eigenvalue statistics for a rather wide class of test functions. Thus we improve and generalize the results of the previous papers (Jana et al., arXiv:1412.2445; Li et al. Random Matrices 2:04, 2013), where the CLT was proven under the assumption . Moreover, we develop a method which allows one to prove the CLT automatically for linear eigenvalue statistics of smooth test functions for almost all classical models of random matrix theory: deformed Wigner and sample covariance matrices, sparse matrices, diluted random matrices, matrices with heavy tails, etc.
The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded.

PubMed

Nakagawa, Shinichi; Johnson, Paul C D; Schielzeth, Holger

2017-09-01

The coefficient of determination R² quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R² for generalized linear mixed models (GLMMs) remains challenging. We have previously introduced a version of R² that we called R²_GLMM for Poisson and binomial GLMMs, but not for other distributional families. Similarly, we earlier discussed how to estimate intra-class correlation coefficients (ICCs) using Poisson and binomial GLMMs. In this paper, we generalize our methods to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data. While expanding our approach, we highlight two useful concepts for biologists, Jensen's inequality and the delta method, both of which help us understand the properties of GLMMs. Jensen's inequality has important implications for the biologically meaningful interpretation of GLMMs, whereas the delta method allows a general derivation of the variance associated with non-Gaussian distributions. We also discuss some special considerations for binomial GLMMs with binary or proportion data. We illustrate the implementation of our extension by worked examples from the field of ecology and evolution in the R environment. However, our method can be used across disciplines and regardless of statistical environment. © 2017 The Author(s).
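As a concrete reference point, the earlier latent-scale form of R²_GLMM for a logit-link binomial model divides variance components by a total that includes the distribution-specific variance π²/3. The marginal R² counts only fixed effects, and the conditional R² also counts the random effects. A minimal sketch, assuming the variance components have already been estimated elsewhere; the delta-method generalization to other families described in this abstract is not reproduced, and `r2_glmm_logit` is a hypothetical name:

```python
import math


def r2_glmm_logit(var_fixed, var_random, var_overdispersion=0.0):
    """Marginal and conditional R^2 on the latent (logit) scale.

    var_fixed: variance of the fixed-effect linear predictor X @ beta.
    var_random: list of random-effect variance components.
    var_overdispersion: additive overdispersion variance (often 0 for binary data).
    """
    var_dist = math.pi ** 2 / 3  # distribution-specific variance for the logit link
    var_rand_total = sum(var_random)
    total = var_fixed + var_rand_total + var_overdispersion + var_dist
    marginal = var_fixed / total                        # fixed effects only
    conditional = (var_fixed + var_rand_total) / total  # fixed + random effects
    return marginal, conditional
```

For instance, `r2_glmm_logit(1.0, [0.5])` returns a marginal value near 0.209 and a conditional value near 0.313. Because the latent-scale denominator always includes π²/3, neither value can reach 1 for a logit model.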
Massive parallelization of serial inference algorithms for a complex generalized linear model

PubMed Central

Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David

2014-01-01

Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
Online Statistical Modeling (Regression Analysis) for Independent Responses

NASA Astrophysics Data System (ADS)

Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus

2017-06-01

Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Nowadays, statistical models have been developed in various directions to model various types of complex relationships in data. A rich variety of advanced and recent statistical models is available, mostly in open-source software (one example is R). However, these advanced statistical modelling tools are not very friendly to novice R users, since they are based on programming scripts or a command-line interface. Our research aims to develop a web interface (based on R and shiny) so that the most recent and advanced statistical models are readily available, accessible, and applicable on the web. We previously made interfaces in the form of e-tutorials for several modern and advanced statistical models in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM, and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them into a data-analysis interface, including models using computer-intensive statistics (bootstrap and Markov chain Monte Carlo/MCMC). All are readily accessible on our online Virtual Statistics Laboratory. The web interface makes statistical modelling easier to apply and makes it easier to compare models in order to find the most appropriate one for the data.
Quantifying the evolution of flow boiling bubbles by statistical testing and image analysis: toward a general model.

PubMed

Xiao, Qingtai; Xu, Jianxin; Wang, Hua

2016-08-16

A new index, the estimate of the error variance, was proposed to quantify the evolution of flow patterns when multiphase components or tracers are difficult to distinguish. The degree of homogeneity of the luminance spatial distribution behind the viewing windows in the direct-contact boiling heat transfer process was explored. Using image analysis and a linear statistical model, an F-test was used to test whether the light was uniform, and a nonlinear method was used to determine the direction and position of a fixed light source. The experimental results showed that the inflection point of the new index was approximately equal to the mixing time. The new index has been extended and applied to a multiphase macro-mixing process driven by top blowing in a stirred tank. Moreover, a general quantifying model was introduced to describe the relationship between the flow patterns of the bubble swarms and heat transfer. The results can be applied to investigate other mixing processes in which the target is very difficult to recognize.
Quantifying the evolution of flow boiling bubbles by statistical testing and image analysis: toward a general model

PubMed Central

Xiao, Qingtai; Xu, Jianxin; Wang, Hua

2016-01-01

A new index, the estimate of the error variance, was proposed to quantify the evolution of flow patterns when multiphase components or tracers are difficult to distinguish. The degree of homogeneity of the luminance spatial distribution behind the viewing windows in the direct-contact boiling heat transfer process was explored. Using image analysis and a linear statistical model, an F-test was used to test whether the light was uniform, and a nonlinear method was used to determine the direction and position of a fixed light source. The experimental results showed that the inflection point of the new index was approximately equal to the mixing time. The new index has been extended and applied to a multiphase macro-mixing process driven by top blowing in a stirred tank. Moreover, a general quantifying model was introduced to describe the relationship between the flow patterns of the bubble swarms and heat transfer. The results can be applied to investigate other mixing processes in which the target is very difficult to recognize. PMID:27527065
Statistics of Smoothed Cosmic Fields in Perturbation Theory. I. Formulation and Useful Formulae in Second-Order Perturbation Theory

NASA Astrophysics Data System (ADS)

Matsubara, Takahiko

2003-02-01

We formulate a general method for perturbative evaluations of statistics of smoothed cosmic fields and provide useful formulae for applying perturbation theory to various statistics. This formalism is an extensive generalization of the method used by Matsubara, who derived a weakly nonlinear formula for the genus statistic in a three-dimensional density field. After describing the general method, we apply the formalism to a series of statistics, including genus statistics, level-crossing statistics, Minkowski functionals, and a density extrema statistic, regardless of the dimensions in which each statistic is defined. The relation between the Minkowski functionals and other geometrical statistics is clarified. These statistics can be applied to several cosmic fields, including the three-dimensional density field, the three-dimensional velocity field, the two-dimensional projected density field, and so forth. The results are detailed for the second-order theory of the formalism. The effect of bias is discussed. The statistics of smoothed cosmic fields as functions of the threshold rescaled by volume fraction are discussed in the framework of second-order perturbation theory. In CDM-like models, their functional deviations from linear predictions plotted against the rescaled threshold are generally much smaller than those plotted against the direct threshold. There is still a slight meatball shift against the rescaled threshold, which is characterized by asymmetry in the depths of troughs in the genus curve. A theory-motivated asymmetry factor in the genus curve is proposed.
Tools for Basic Statistical Analysis

NASA Technical Reports Server (NTRS)

Luz, Paul L.

2005-01-01

Statistical Analysis Toolset is a collection of eight Microsoft Excel spreadsheet programs, each of which performs calculations pertaining to an aspect of statistical analysis. These programs present input and output data in user-friendly, menu-driven formats, with automatic execution. The following types of calculations are performed: Descriptive statistics are computed for a set of data x_i (i = 1, 2, 3, ...) entered by the user. Normal Distribution Estimates calculates the statistical value that corresponds to a given cumulative probability, given a sample mean and standard deviation of the normal distribution. Normal Distribution from two Data Points extends and generates a cumulative normal distribution for the user, given two data points and their associated probability values. Two programs perform two-way analysis of variance (ANOVA) with no replication or generalized ANOVA for two factors with four levels and three repetitions. Linear Regression-ANOVA fits data to the linear equation y = f(x) and performs an ANOVA to check its significance.
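The "Normal Distribution from two Data Points" calculation has a closed form: if the CDF passes through (x1, p1) and (x2, p2), then x = μ + σ·z with z = Φ⁻¹(p), so two points determine σ and μ. The spreadsheet's internals are not described, so the following is only a sketch using Python's standard `statistics.NormalDist` (`normal_from_two_points` is a hypothetical name):

```python
from statistics import NormalDist


def normal_from_two_points(x1, p1, x2, p2):
    """Return the normal distribution whose CDF passes through (x1, p1) and (x2, p2)."""
    z1 = NormalDist().inv_cdf(p1)  # standard-normal quantiles of the two probabilities
    z2 = NormalDist().inv_cdf(p2)
    if z1 == z2:
        raise ValueError("the two probabilities must differ")
    sigma = (x2 - x1) / (z2 - z1)
    if sigma <= 0:
        raise ValueError("x and p must increase together")
    mu = x1 - sigma * z1
    return NormalDist(mu, sigma)
```

The companion "Normal Distribution Estimates" calculation then corresponds to `NormalDist(mu, sigma).inv_cdf(p)`, which returns the value at cumulative probability `p`.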
Detecting temporal change in freshwater fisheries surveys: statistical power and the important linkages between management questions and monitoring objectives

USGS Publications Warehouse

Wagner, Tyler; Irwin, Brian J.; Bence, James R.; Hayes, Daniel B.

2016-01-01

Monitoring to detect temporal trends in biological and habitat indices is a critical component of fisheries management. Thus, it is important that management objectives are linked to monitoring objectives. This linkage requires a definition of what constitutes a management-relevant “temporal trend.” It is also important to develop expectations for the amount of time required to detect a trend (i.e., statistical power) and to choose an appropriate statistical model for analysis. We provide an overview of temporal trends commonly encountered in fisheries management, review published studies that evaluated the statistical power of long-term trend detection, and illustrate dynamic linear models in a Bayesian context as an additional analytical approach focused on shorter-term change. We show that monitoring programs generally have low statistical power for detecting linear temporal trends and argue that management should often focus on different definitions of trends, some of which can be better addressed by alternative analytical approaches.
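Dynamic linear models can be illustrated with the simplest case, a local-level (random-walk-plus-noise) model, y_t = mu_t + v_t and mu_t = mu_{t-1} + w_t, with Gaussian errors. When the two variances are known, the Kalman filter gives the exact Bayesian posterior mean and variance of the level at each time. This sketch assumes known variances and a vague Gaussian prior on the initial level; the authors' models, and how they treat unknown variances, may differ, and the function name is hypothetical.

```python
def local_level_filter(y, obs_var, level_var, m0=0.0, c0=1e6):
    """Kalman filter for the local-level model
        y_t  = mu_t + v_t,       v_t ~ N(0, obs_var)
        mu_t = mu_{t-1} + w_t,   w_t ~ N(0, level_var)
    with prior mu_0 ~ N(m0, c0). Returns posterior means and variances of mu_t.
    """
    m, c = m0, c0
    means, variances = [], []
    for obs in y:
        r = c + level_var          # prior variance of mu_t given data up to t-1
        k = r / (r + obs_var)      # Kalman gain
        m = m + k * (obs - m)      # posterior mean
        c = (1.0 - k) * r          # posterior variance
        means.append(m)
        variances.append(c)
    return means, variances

# Usage: a step change in a simulated index, followed by the filtered level.
series = [10.0] * 15 + [12.0] * 15
means, _ = local_level_filter(series, obs_var=1.0, level_var=0.1)
print([round(v, 2) for v in means[13:20]])
```

A larger level_var lets the level adapt faster to short-term change at the cost of noisier estimates, which is the trade-off these models expose directly.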
A Generalization of Pythagoras's Theorem and Application to Explanations of Variance Contributions in Linear Models. Research Report. ETS RR-14-18

ERIC Educational Resources Information Center

Carlson, James E.

2014-01-01

Many aspects of the geometry of linear statistical models and least squares estimation are well known. Discussions of the geometry may be found in many sources. Some aspects of the geometry relating to the partitioning of variation that can be explained using a little-known theorem of Pappus and have not been discussed previously are the topic of…
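The Pythagorean relation behind the usual partitioning of variation in least squares is SST = SSR + SSE: when the model includes an intercept, the fitted deviations (fitted - mean) and the residuals (y - fitted) are orthogonal. A minimal NumPy check of that identity on simulated data follows; it illustrates only the classical result, not the Pappus-based generalization the report develops, and the function name is hypothetical.

```python
import numpy as np

def variance_partition(X, y):
    """Return (SST, SSR, SSE) for an OLS fit of y on X with an intercept."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])   # prepend the intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ beta
    y_bar = y.mean()
    sst = np.sum((y - y_bar) ** 2)        # total sum of squares
    ssr = np.sum((fitted - y_bar) ** 2)   # explained by the regression
    sse = np.sum((y - fitted) ** 2)       # residual sum of squares
    return sst, ssr, sse

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=50)
sst, ssr, sse = variance_partition(X, y)
print(sst, ssr + sse, ssr / sst)   # the first two agree; the last is R^2
```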
A Mathematics Software Database Update.

ERIC Educational Resources Information Center

Cunningham, R. S.; Smith, David A.

1987-01-01

Contains an update of an earlier listing of software for mathematics instruction at the college level. Topics are: advanced mathematics, algebra, calculus, differential equations, discrete mathematics, equation solving, general mathematics, geometry, linear and matrix algebra, logic, statistics and probability, and trigonometry. (PK)
A Refined Model for Radar Homing Intercepts.

DTIC Science & Technology

1983-10-27

Helge Toutenburg, Prior Information in Linear Models (Wiley, NY, 1982). 7. F. A. Graybill, Introduction to Matrices with Applications in Statistics ... linear target trajectory model $z_i = a_0 + a_1 r_i + w_i$, where $w_i$, $i = 1, \ldots, N$, is a sequence of uncorrelated zero-mean noise, the general formula for ... $z_i$ ($i = 1, \ldots, N$) at $r_i$ and a linear regression model $z_i = a_0 + a_1 r_i + w_i$ (A1), where $w_i$ is the corruption noise; the problem is to estimate $a_0$
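For a linear regression model z_i = a_0 + a_1 r_i + w_i with uncorrelated, zero-mean, equal-variance noise, ordinary least squares gives the textbook estimates of a_0 and a_1 and their variances. The following minimal Python sketch shows that baseline. It does not reproduce the report's refined model or any use of prior information, and fit_line is a hypothetical name.

```python
import random

def fit_line(r, z):
    """Ordinary least squares for z_i = a0 + a1 * r_i + w_i.

    Returns (a0, a1, var_a0, var_a1); the variances use the usual
    residual-variance estimate with n - 2 degrees of freedom.
    """
    n = len(r)
    if n < 3:
        raise ValueError("need at least three points to estimate the noise variance")
    r_bar = sum(r) / n
    z_bar = sum(z) / n
    s_rr = sum((ri - r_bar) ** 2 for ri in r)
    s_rz = sum((ri - r_bar) * (zi - z_bar) for ri, zi in zip(r, z))
    a1 = s_rz / s_rr
    a0 = z_bar - a1 * r_bar
    resid = [zi - a0 - a1 * ri for ri, zi in zip(r, z)]
    s2 = sum(e * e for e in resid) / (n - 2)          # estimated noise variance
    return a0, a1, s2 * (1.0 / n + r_bar ** 2 / s_rr), s2 / s_rr

# Usage with simulated, illustrative measurements.
rng = random.Random(0)
r = [float(i) for i in range(1, 21)]
z = [5.0 + 0.8 * ri + rng.gauss(0.0, 0.5) for ri in r]
print(fit_line(r, z))
```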
A general science-based framework for dynamical spatio-temporal models

USGS Publications Warehouse

Wikle, C.K.; Hooten, M.B.

2010-01-01

Spatio-temporal statistical models are increasingly being used across a wide variety of scientific disciplines to describe and predict spatially-explicit processes that evolve over time. Correspondingly, in recent years there has been a significant amount of research on new statistical methodology for such models. Although descriptive models that approach the problem from the second-order (covariance) perspective are important, and innovative work is being done in this regard, many real-world processes are dynamic, and it can be more efficient in some cases to characterize the associated spatio-temporal dependence by the use of dynamical models. The chief challenge with the specification of such dynamical models has been related to the curse of dimensionality. Even in fairly simple linear, first-order Markovian, Gaussian error settings, statistical models are often overparameterized. Hierarchical models have proven invaluable in their ability to deal to some extent with this issue by allowing dependency among groups of parameters. In addition, this framework has allowed for the specification of science-based parameterizations (and associated prior distributions) in which classes of deterministic dynamical models (e.g., partial differential equations (PDEs), integro-difference equations (IDEs), matrix models, and agent-based models) are used to guide specific parameterizations. Most of the focus for the application of such models in statistics has been in the linear case. The problems mentioned above with linear dynamic models are compounded in the case of nonlinear models. In this sense, the need for coherent and sensible model parameterizations is not only helpful, it is essential. Here, we present an overview of a framework for incorporating scientific information to motivate dynamical spatio-temporal models. First, we illustrate the methodology with the linear case. 
We then develop a general nonlinear spatio-temporal framework that we call general quadratic nonlinearity and demonstrate that it accommodates many different classes of science-based parameterizations as special cases. The model is presented in a hierarchical Bayesian framework and is illustrated with examples from ecology and oceanography. © 2010 Sociedad de Estadística e Investigación Operativa.
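To make a science-based parameterization concrete in the linear, first-order Markovian case: a 1-D diffusion equation, discretized with forward Euler, supplies the transition matrix M in u_t = M u_{t-1} + eta_t, so a single diffusion coefficient delta replaces the n^2 free entries of a generic transition matrix. The sketch assumes reflecting boundaries and a fixed delta; in the hierarchical Bayesian setting described above, delta and the noise variance would instead receive prior distributions. It does not implement the general quadratic nonlinearity model, and the function names are hypothetical.

```python
import numpy as np

def diffusion_transition_matrix(n, delta, dt, dx):
    """Forward-Euler discretization of du/dt = delta * d2u/dx2 on n grid points
    with reflecting (zero-flux) boundaries. Columns sum to 1, so mass is conserved."""
    lam = delta * dt / dx ** 2
    if lam > 0.5:
        raise ValueError("forward Euler is unstable for delta*dt/dx^2 > 0.5")
    M = np.eye(n) * (1.0 - 2.0 * lam)
    idx = np.arange(n - 1)
    M[idx, idx + 1] = lam               # exchange with the right neighbour
    M[idx + 1, idx] = lam               # exchange with the left neighbour
    M[0, 0] = M[-1, -1] = 1.0 - lam     # end points have a single neighbour
    return M

def simulate(M, u0, n_steps, noise_sd, rng):
    """Simulate u_t = M u_{t-1} + eta_t with eta_t ~ N(0, noise_sd^2 I)."""
    path = np.empty((n_steps + 1, len(u0)))
    path[0] = u0
    for t in range(1, n_steps + 1):
        path[t] = M @ path[t - 1] + rng.normal(0.0, noise_sd, size=len(u0))
    return path

M = diffusion_transition_matrix(n=30, delta=0.5, dt=0.1, dx=0.5)
u0 = np.zeros(30)
u0[15] = 1.0                            # initial pulse in the middle of the domain
path = simulate(M, u0, 100, 0.01, np.random.default_rng(0))
print(path[-1].round(3))
```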
DOE Office of Scientific and Technical Information (OSTI.GOV)

Tessore, Nicolas; Metcalf, R. Benton; Winther, Hans A.

A number of alternatives to general relativity exhibit gravitational screening in the non-linear regime of structure formation. We describe a set of algorithms that can produce weak lensing maps of large scale structure in such theories and can be used to generate mock surveys for cosmological analysis. By analysing a few basic statistics we indicate how these alternatives can be distinguished from general relativity with future weak lensing surveys.

Gain optimization with non-linear controls

NASA Technical Reports Server (NTRS)

Slater, G. L.; Kandadai, R. D.

1984-01-01

An algorithm has been developed for the analysis and design of controls for non-linear systems. The technical approach is to use statistical linearization to model the non-linear dynamics of a system by a quasi-Gaussian model. A covariance analysis is performed to determine the behavior of the dynamical system and a quadratic cost function. Expressions for the cost function and its derivatives are determined so that numerical optimization techniques can be applied to determine optimal feedback laws. The primary application of this paper is the design of controls for nominally linear systems in which the controls are saturated or limited by fixed constraints. The analysis is general, however, and numerical computation requires only that the specific non-linearity be considered in the analysis.
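For a zero-mean Gaussian input x with standard deviation sigma, statistical linearization replaces a nonlinearity f by the gain N = E[x f(x)] / E[x^2]. For a symmetric saturation at +/-L this gain has the closed form N = erf(L / (sigma sqrt(2))), which is also the probability that the input stays inside the linear range. The Python sketch compares that formula with a Monte Carlo estimate; it is a textbook single-input example, not the paper's multivariable algorithm, and the function names are hypothetical.

```python
import math
import random

def saturation(x, limit):
    """Symmetric saturation: linear inside [-limit, limit], clipped outside."""
    return max(-limit, min(limit, x))

def equivalent_gain(sigma, limit):
    """Statistical-linearization gain N = E[x f(x)] / E[x^2] of a saturation
    for a zero-mean Gaussian input with standard deviation sigma."""
    return math.erf(limit / (sigma * math.sqrt(2.0)))

def monte_carlo_gain(sigma, limit, n=100_000, seed=0):
    """Estimate the same ratio by sampling the Gaussian input."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sigma)
        num += x * saturation(x, limit)
        den += x * x
    return num / den

for sigma in (0.5, 1.0, 2.0):
    # The effective gain drops as the input spends more time in saturation.
    print(sigma, round(equivalent_gain(sigma, 1.0), 4), round(monte_carlo_gain(sigma, 1.0), 4))
```

Because the gain depends on sigma, which itself comes from the covariance analysis, the linearized model and the covariance must be solved together, which is why the cost and its derivatives are evaluated numerically.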
Towards a General Turbulence Model for Planetary Boundary Layers Based on Direct Statistical Simulation

NASA Astrophysics Data System (ADS)

Skitka, J.; Marston, B.; Fox-Kemper, B.

2016-02-01

Sub-grid turbulence models for planetary boundary layers are typically constructed additively, starting with local flow properties and including non-local (KPP) or higher-order (Mellor-Yamada) parameters until a desired level of predictive capacity is achieved or a manageable threshold of complexity is surpassed. Such approaches are necessarily limited in general circumstances, such as global circulation models, because they are optimized for particular flow phenomena. By building a model reductively, starting with the infinite hierarchy of turbulence statistics, truncating at a given order, and stripping degrees of freedom from the flow, we offer the prospect of a turbulence model and investigative tool that is equally applicable to all flow types and able to take full advantage of the wealth of nonlocal information in any flow. Direct statistical simulation (DSS) based upon expansion in equal-time cumulants can be used to compute flow statistics of arbitrary order. We investigate the feasibility of a second-order closure (CE2) by performing simulations of the ocean boundary layer in a quasi-linear approximation for which CE2 is exact. As oceanographic examples, wind-driven Langmuir turbulence and thermal convection are studied by comparison of the quasi-linear and fully nonlinear statistics. We also characterize the computational advantages and physical uncertainties of CE2 defined on a reduced basis determined via proper orthogonal decomposition (POD) of the flow fields.
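The reduced basis mentioned in the last sentence can be illustrated with the snapshot form of proper orthogonal decomposition: subtract the time mean, take an SVD of the space-by-time snapshot matrix, and keep the leading left singular vectors, whose squared singular values give the fraction of fluctuation energy each mode captures. This NumPy sketch uses synthetic two-mode data and an unweighted inner product (an assumption; flow applications often weight by grid-cell volume); it does not implement CE2 itself, and the function name is hypothetical.

```python
import numpy as np

def pod(snapshots, n_modes):
    """Snapshot POD via the SVD.

    snapshots: array of shape (n_points, n_times).
    Returns (modes, energy_fraction, coefficients) for the leading n_modes.
    """
    fluct = snapshots - snapshots.mean(axis=1, keepdims=True)   # remove the time mean
    U, s, _ = np.linalg.svd(fluct, full_matrices=False)
    energy = s ** 2 / np.sum(s ** 2)      # fraction of fluctuation energy per mode
    modes = U[:, :n_modes]
    coeffs = modes.T @ fluct              # time coefficients in the reduced basis
    return modes, energy[:n_modes], coeffs

# Synthetic field built from two spatial patterns plus weak noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 64)
t = np.arange(200)
field = (np.outer(np.sin(x), 3.0 * np.cos(0.1 * t))
         + np.outer(np.sin(2.0 * x), np.sin(0.37 * t))
         + 0.01 * rng.normal(size=(64, 200)))
modes, energy, coeffs = pod(field, n_modes=2)
print(energy.round(4), energy.sum().round(4))
```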
Scaling of Perceptual Errors Can Predict the Shape of Neural Tuning Curves

NASA Astrophysics Data System (ADS)

Shouval, Harel Z.; Agarwal, Animesh; Gavornik, Jeffrey P.

2013-04-01

Weber’s law, first characterized in the 19th century, states that errors estimating the magnitude of perceptual stimuli scale linearly with stimulus intensity. This linear relationship is found in most sensory modalities, generalizes to temporal interval estimation, and even applies to some abstract variables. Despite its generality and long experimental history, the neural basis of Weber’s law remains unknown. This work presents a simple theory explaining the conditions under which Weber’s law can result from neural variability and predicts that the tuning curves of neural populations which adhere to Weber’s law will have a log-power form with parameters that depend on spike-count statistics. The prevalence of Weber’s law suggests that it might be optimal in some sense. We examine this possibility, using variational calculus, and show that Weber’s law is optimal only when observed real-world variables exhibit power-law statistics with a specific exponent. Our theory explains how physiology gives rise to the behaviorally characterized Weber’s law and may represent a general governing principle relating perception to neural activity.
MIDAS: Regionally linear multivariate discriminative statistical mapping.

PubMed

Varol, Erdem; Sotiras, Aristeidis; Davatzikos, Christos

2018-07-01

Statistical parametric maps formed via voxel-wise mass-univariate tests, such as the general linear model, are commonly used to test hypotheses about regionally specific effects in neuroimaging cross-sectional studies where each subject is represented by a single image. Despite being informative, these techniques remain limited because they ignore multivariate relationships in the data. Most importantly, the commonly employed local Gaussian smoothing, which is important for accounting for registration errors and for making the data more nearly Gaussian, is usually chosen in an ad hoc fashion. Thus, it is often suboptimal for the task of detecting group differences and correlations with non-imaging variables. Information mapping techniques, such as searchlight, which use pattern classifiers to exploit multivariate information and obtain more powerful statistical maps, have become increasingly popular in recent years. However, existing methods may lead to important interpretation errors in practice (e.g., misidentifying a cluster as informative, or failing to detect truly informative voxels), while often being computationally expensive. To address these issues, we introduce a novel efficient multivariate statistical framework for cross-sectional studies, termed MIDAS, seeking highly sensitive and specific voxel-wise brain maps, while leveraging the power of regional discriminant analysis. In MIDAS, locally linear discriminative learning is applied to estimate the pattern that best discriminates between two groups, or predicts a variable of interest. This pattern is equivalent to local filtering by an optimal kernel whose coefficients are the weights of the linear discriminant. By composing information from all neighborhoods that contain a given voxel, MIDAS produces a statistic that collectively reflects the contribution of the voxel to the regional classifiers as well as the discriminative power of the classifiers. 
Critically, MIDAS efficiently assesses the statistical significance of the derived statistic by analytically approximating its null distribution without the need for computationally expensive permutation tests. The proposed framework was extensively validated using simulated atrophy in structural magnetic resonance imaging (MRI) and further tested using data from a task-based functional MRI study as well as a structural MRI study of cognitive performance. The performance of the proposed framework was evaluated against standard voxel-wise general linear models and other information mapping methods. The experimental results showed that MIDAS achieves relatively higher sensitivity and specificity in detecting group differences. Together, our results demonstrate the potential of the proposed approach to efficiently map effects of interest in both structural and functional data. Copyright © 2018. Published by Elsevier Inc.
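For the voxel-wise general linear model baseline that MIDAS is compared against, a two-group comparison reduces to fitting y = b0 + b1 * group + e at every voxel and reporting the t statistic for b1, which equals the pooled-variance two-sample t statistic. The NumPy sketch computes that map for all voxels at once and includes a per-voxel regression to check the equivalence. The data are simulated, no smoothing or multiple-comparison correction is applied, MIDAS itself is not implemented, and the function names are hypothetical.

```python
import numpy as np

def voxelwise_t(group_a, group_b):
    """Pooled-variance two-sample t statistic at every voxel.

    group_a, group_b: arrays of shape (n_subjects, n_voxels).
    """
    na, nb = len(group_a), len(group_b)
    diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    pooled = ((na - 1) * group_a.var(axis=0, ddof=1)
              + (nb - 1) * group_b.var(axis=0, ddof=1)) / (na + nb - 2)
    return diff / np.sqrt(pooled * (1.0 / na + 1.0 / nb))

def glm_t(y, group):
    """t statistic for b1 in the single-voxel GLM y = b0 + b1 * group + e."""
    X = np.column_stack([np.ones(len(y)), group])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)            # covariance of the estimates
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
a = rng.normal(size=(15, 100))
b = rng.normal(size=(12, 100))
a[:, :5] += 2.0                                  # simulated effect in the first 5 voxels
t_map = voxelwise_t(a, b)
y0 = np.concatenate([a[:, 0], b[:, 0]])
g = np.concatenate([np.ones(15), np.zeros(12)])
print(t_map[:5].round(2), round(glm_t(y0, g), 4), round(t_map[0], 4))
```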
Beam-plasma instability in the presence of low-frequency turbulence. [during type 3 solar emission]

NASA Technical Reports Server (NTRS)

Goldman, M. V.; Dubois, D. F.

1982-01-01

General equations are derived for a linear beam-plasma instability in the presence of low-frequency turbulence. Within a 'quasi-linear' statistical approximation, these equations contain Langmuir wave scattering, diffusion, resonant and nonresonant anomalous absorption, and a 'plasma laser' effect. It is proposed that naturally occurring density irregularities in the solar wind may stabilize the beam-unstable Langmuir waves which occur during type III solar emissions.
Linear and Order Statistics Combiners for Pattern Classification

NASA Technical Reports Server (NTRS)

Tumer, Kagan; Ghosh, Joydeep; Lau, Sonie (Technical Monitor)

2001-01-01

Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first-order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the 'added' error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum, and in general the i-th order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
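The factor-of-N result can be checked numerically under the first-order picture: if each classifier's boundary deviates from the Bayes boundary by an independent, zero-mean offset, and the classifiers' outputs share the same local slope near the boundary (an assumption made here so that the averaged output crosses at the mean offset), then the combined boundary's variance, and hence the added error, falls by about N. The sketch also tries a median combiner, one of the order-statistics combiners; for Gaussian offsets its variance reduction is smaller than that of the mean. The function name and parameter values are hypothetical.

```python
import numpy as np

def boundary_offset_variance(n_classifiers, boundary_sd, combiner="mean",
                             n_trials=200_000, seed=0):
    """Variance of the combined decision-boundary offset from the Bayes boundary.

    Each classifier's offset is an independent N(0, boundary_sd^2) draw. Assuming
    equal local output slopes, averaging outputs moves the boundary to the mean
    offset, and a median combiner moves it to the median offset.
    """
    rng = np.random.default_rng(seed)
    offsets = rng.normal(0.0, boundary_sd, size=(n_trials, n_classifiers))
    if combiner == "mean":
        combined = offsets.mean(axis=1)
    elif combiner == "median":
        combined = np.median(offsets, axis=1)
    else:
        raise ValueError("combiner must be 'mean' or 'median'")
    return combined.var()

single = boundary_offset_variance(1, 0.3)
for n in (3, 5, 9):
    mean_gain = single / boundary_offset_variance(n, 0.3)
    median_gain = single / boundary_offset_variance(n, 0.3, combiner="median")
    print(n, round(mean_gain, 2), round(median_gain, 2))
```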
Adaptive Error Estimation in Linearized Ocean General Circulation Models

NASA Technical Reports Server (NTRS)

Chechelnitsky, Michael Y.

1999-01-01

Data assimilation methods are routinely used in oceanography. The statistics of the model and measurement errors need to be specified a priori. This study addresses the problem of estimating model and measurement error statistics from observations. We start by testing innovation-based methods of adaptive error estimation with low-dimensional models in the North Pacific (5-60 deg N, 132-252 deg E), using TOPEX/POSEIDON (T/P) sea level anomaly data, acoustic tomography data from the ATOC project, and the MIT General Circulation Model (GCM). A reduced-state linear model that describes large-scale internal (baroclinic) error dynamics is used. The methods are shown to be sensitive to the initial guess for the error statistics and to the type of observations. A new off-line approach is developed, the covariance matching approach (CMA), in which covariance matrices of model-data residuals are "matched" to their theoretical expectations using familiar least squares methods. This method uses observations directly instead of the innovations sequence and is shown to be related to the MT method and the method of Fu et al. (1993). Twin experiments using the same linearized MIT GCM suggest that altimetric data are ill-suited to the estimation of internal GCM errors, but that such estimates can in theory be obtained using acoustic data. The CMA is then applied to T/P sea level anomaly data and a linearization of a global GFDL GCM which uses two vertical modes. We show that the CMA method can be used with a global model and a global data set, and that the estimates of the error statistics are robust. We show that the fraction of the GCM-T/P residual variance explained by the model error is larger than that derived in Fukumori et al. (1999) with the method of Fu et al. (1993). Most of the model error is explained by the barotropic mode. However, we find that the impact of the change in the error statistics on the data assimilation estimates is very small. 
This is explained by the large representation error, i.e., the dominance of the mesoscale eddies in the T/P signal, which are not part of the 2° by 1° GCM. Therefore, the impact of the observations on the assimilation is very small even after the adjustment of the error statistics. This work demonstrates that simultaneous estimation of the model and measurement error statistics for data assimilation with global ocean data sets and linearized GCMs is possible. However, the error covariance estimation problem is in general highly underdetermined, much more so than the state estimation problem. In other words, there exist a very large number of statistical models that can be made consistent with the available data. Therefore, methods for obtaining quantitative error estimates, powerful though they may be, cannot replace physical insight. Used in the right context, as a tool for guiding the choice of a small number of model error parameters, covariance matching can be a useful addition to the repertory of tools available to oceanographers.
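A toy version of covariance matching: if model errors and measurement errors are independent and the model is sampled directly at the observation points (H = I, an assumption), the expected residual covariance is alpha * P0 + beta * R0 for assumed known shapes P0 and R0, and the scale factors alpha and beta can be estimated by least squares on the entries of the sample covariance. The NumPy sketch does exactly that with simulated residuals. The thesis's CMA works with real model-data residuals and richer error models, and the function name and covariance shapes here are hypothetical.

```python
import numpy as np

def covariance_matching(residuals, p_shape, r_shape):
    """Least-squares fit of the sample residual covariance to alpha*P0 + beta*R0.

    residuals: array (n_samples, n_obs) of model-minus-data residuals.
    """
    S = np.cov(residuals, rowvar=False)                 # sample covariance of residuals
    A = np.column_stack([p_shape.ravel(), r_shape.ravel()])
    coef, *_ = np.linalg.lstsq(A, S.ravel(), rcond=None)
    return coef[0], coef[1]

n_obs = 10
i = np.arange(n_obs)
P0 = np.exp(-np.abs(i[:, None] - i[None, :]) / 3.0)     # assumed correlated model-error shape
R0 = np.eye(n_obs)                                      # assumed white measurement error
rng = np.random.default_rng(0)
model_err = rng.multivariate_normal(np.zeros(n_obs), 2.0 * P0, size=20_000)
meas_err = rng.normal(0.0, np.sqrt(0.5), size=(20_000, n_obs))
alpha, beta = covariance_matching(model_err + meas_err, P0, R0)
print(round(alpha, 3), round(beta, 3))   # values used in the simulation: 2.0 and 0.5
```

The off-diagonal entries constrain only alpha, while the diagonal constrains alpha + beta; with fewer distinguishable structures the problem quickly becomes underdetermined, as the abstract warns.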
Modeling exposure–lag–response associations with distributed lag non-linear models

PubMed Central

Gasparrini, Antonio

2014-01-01

In biomedical research, a health effect is frequently associated with protracted exposures of varying intensity sustained in the past. The main complexity of modeling and interpreting such phenomena lies in the additional temporal dimension needed to express the association, as the risk depends on both intensity and timing of past exposures. This type of dependency is defined here as exposure–lag–response association. In this contribution, I illustrate a general statistical framework for such associations, established through the extension of distributed lag non-linear models, originally developed in time series analysis. This modeling class is based on the definition of a cross-basis, obtained by the combination of two functions to flexibly model linear or nonlinear exposure-responses and the lag structure of the relationship, respectively. The methodology is illustrated with an example application to cohort data and validated through a simulation study. This modeling framework generalizes to various study designs and regression models, and can be applied to study the health effects of protracted exposures to environmental factors, drugs or carcinogenic agents, among others. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24027094
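A cross-basis can be sketched as follows: build the matrix of lagged exposures, apply each exposure basis function to every lagged value, and sum across lags weighted by each lag basis function, giving one column per (exposure function, lag function) pair. This sketch uses simple polynomial bases in both dimensions as an assumption (Gasparrini's dlnm package for R offers splines and other choices). The resulting columns would then enter a regression model, for example a Poisson GLM for daily counts, and the function names are hypothetical.

```python
import numpy as np

def lag_matrix(x, max_lag):
    """Row t holds x[t], x[t-1], ..., x[t-max_lag]; NaN where the history is unavailable."""
    x = np.asarray(x, dtype=float)
    Q = np.full((len(x), max_lag + 1), np.nan)
    for lag in range(max_lag + 1):
        Q[lag:, lag] = x[: len(x) - lag]
    return Q

def cross_basis(x, max_lag, exposure_degree, lag_degree):
    """Cross-basis from polynomial exposure and lag bases (a simple assumed choice).

    Column (j, k) = sum over lags l of x[t-l]**j * l**k,
    for j = 1..exposure_degree and k = 0..lag_degree.
    """
    Q = lag_matrix(x, max_lag)
    lags = np.arange(max_lag + 1, dtype=float)
    lag_basis = np.column_stack([lags ** k for k in range(lag_degree + 1)])
    columns = []
    for j in range(1, exposure_degree + 1):
        exposure_j = Q ** j                                  # exposure function j at every lag
        for k in range(lag_degree + 1):
            columns.append(exposure_j @ lag_basis[:, k])     # combine across lags
    return np.column_stack(columns)

rng = np.random.default_rng(0)
exposure = rng.gamma(2.0, 5.0, size=365)                     # hypothetical daily exposure series
cb = cross_basis(exposure, max_lag=7, exposure_degree=2, lag_degree=2)
print(cb.shape)
```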
Role of sufficient statistics in stochastic thermodynamics and its implication to sensory adaptation

NASA Astrophysics Data System (ADS)

Matsumoto, Takumi; Sagawa, Takahiro

2018-04-01

A sufficient statistic is an important concept in statistics: a random variable that carries all the information required for an inference task. We investigate the roles of sufficient statistics and related quantities in stochastic thermodynamics. Specifically, we prove that for general continuous-time bipartite networks, the existence of a sufficient statistic implies that an informational quantity called the sensory capacity attains its maximum. Since the maximal sensory capacity imposes the constraint that the energetic efficiency cannot exceed one-half, our result implies that the existence of a sufficient statistic is inevitably accompanied by energetic dissipation. We also show that, in a particular parameter region of linear Langevin systems, there exists an optimal noise intensity at which the sensory capacity, the information-thermodynamic efficiency, and the total entropy production are optimized at the same time. We apply our general result to a model of sensory adaptation of E. coli and find that the sensory capacity is nearly maximal with experimentally realistic parameters.
Digital photography and transparency-based methods for measuring wound surface area.

PubMed

Bhedi, Amul; Saxena, Atul K; Gadani, Ravi; Patel, Ritesh

2013-04-01

To compare linear, transparency, and photographic methods of measuring wound surface area, in order to determine a credible method for accurately monitoring the progress of wound healing and to ascertain whether these methods differ significantly. From April 2005 to December 2006, 40 patients (30 men, 5 women, 5 children) admitted to the surgical ward of Shree Sayaji General Hospital, Baroda, had clean or infected wounds following trauma, debridement, pressure sores, venous ulcers, or incision and drainage. Wound surface areas were measured by these three methods (linear, transparency, and photographic) simultaneously on alternate days. The linear method differed significantly from the transparency and photographic methods (P < 0.05), but there was no significant difference between the transparency and photographic methods (P > 0.05). Photographic and transparency methods provided equivalent measurements of wound surface area, with no statistically significant difference between them.
Regularized learning of linear ordered-statistic constant false alarm rate filters (Conference Presentation)

NASA Astrophysics Data System (ADS)

Havens, Timothy C.; Cummings, Ian; Botts, Jonathan; Summers, Jason E.

2017-05-01

The linear ordered statistic (LOS) is a parameterized ordered statistic (OS) that is a weighted average of a rank-ordered sample. LOS operators are useful generalizations of aggregation, as they can represent any linear aggregation from minimum to maximum, including conventional aggregations such as the mean and median. In the fuzzy logic field, these aggregations are called ordered weighted averages (OWAs). Here, we present a method for learning LOS operators from training data, that is, data for which the output of the desired LOS is known. We then extend the learning process with regularization, such that a lower-complexity or sparse LOS can be learned. We also discuss what 'lower complexity' means in this context and how to represent it in the optimization procedure. Finally, we apply our learning methods to the well-known constant-false-alarm-rate (CFAR) detection problem, specifically for the case of background levels modeled by long-tailed distributions, such as the K-distribution. These backgrounds arise in several pertinent imaging problems, including the modeling of clutter in synthetic aperture radar and sonar (SAR and SAS) and in wireless communications.
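A minimal NumPy sketch of an LOS operator and of learning its weights: after sorting in descending order, the weights [1, 0, ..., 0], [0, ..., 0, 1], uniform weights, and a one-hot middle weight recover the maximum, minimum, mean, and (odd-length) median. The fit is plain least squares with an optional ridge penalty, an assumed simplification; the paper's regularization for lower-complexity or sparse LOS, and any constraints on the weights, are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def los(sample, weights):
    """Linear ordered statistic: weights applied to the sample sorted in descending order."""
    ordered = np.sort(np.asarray(sample, dtype=float))[::-1]
    return float(np.dot(weights, ordered))

def fit_los_weights(samples, targets, ridge=0.0):
    """Least-squares LOS weights from (sample, desired output) pairs.

    `ridge` adds an L2 penalty; this is a simple stand-in, not the paper's regularizer.
    """
    X = np.sort(np.asarray(samples, dtype=float), axis=1)[:, ::-1]   # each row descending
    n = X.shape[1]
    y = np.asarray(targets, dtype=float)
    return np.linalg.solve(X.T @ X + ridge * np.eye(n), X.T @ y)

x = [3.0, 1.0, 4.0, 1.0, 5.0]
print(los(x, [1, 0, 0, 0, 0]), los(x, [0, 0, 0, 0, 1]), los(x, [0.2] * 5), los(x, [0, 0, 1, 0, 0]))

# Learn the median of 5 long-tailed samples from examples.
rng = np.random.default_rng(0)
train = rng.exponential(size=(200, 5))
print(fit_los_weights(train, np.median(train, axis=1)).round(6))
```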
Statistics of Macroturbulence from Flow Equations

NASA Astrophysics Data System (ADS)

Marston, Brad; Iadecola, Thomas; Qi, Wanming

2012-02-01

Probability distribution functions of stochastically-driven and frictionally-damped fluids are governed by a linear framework that resembles quantum many-body theory. Besides the Fokker-Planck approach, there is a closely related Hopf functional method (Ookie Ma and J. B. Marston, J. Stat. Phys. Th. Exp. P10007 (2005)); in both formalisms, zero modes of linear operators describe the stationary non-equilibrium statistics. To access the statistics, we generalize the flow equation approach (F. Wegner, Ann. Phys. 3, 77 (1994)), also known as the method of continuous unitary transformations (S. D. Glazek and K. G. Wilson, Phys. Rev. D 48, 5863 (1993); Phys. Rev. D 49, 4214 (1994)), to find the zero mode. We test the approach using a prototypical model of geophysical and astrophysical flows on a rotating sphere that spontaneously organizes into a coherent jet. Good agreement is found with low-order equal-time statistics accumulated by direct numerical simulation, the traditional method. Different choices for the generators of the continuous transformations, and for closure approximations of the operator algebra, are discussed.
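The statement that zero modes of linear operators describe stationary statistics can be illustrated on a much smaller linear evolution equation: a master equation dp/dt = W p for a truncated birth-death chain, standing in here for the Fokker-Planck operator. Its stationary distribution is the eigenvector of W with eigenvalue zero. This sketch finds it by direct diagonalization (not by the flow-equation method) and checks it against the truncated Poisson distribution implied by detailed balance; the function names are hypothetical.

```python
import numpy as np
from math import factorial

def birth_death_generator(n_states, birth, death):
    """Generator W of dp/dt = W p for a truncated birth-death chain.

    W[j, i] is the rate from state i to state j, so every column sums to zero.
    The birth rate is constant; the death rate in state i is death * i.
    """
    W = np.zeros((n_states, n_states))
    for i in range(n_states):
        if i + 1 < n_states:
            W[i + 1, i] += birth
            W[i, i] -= birth
        if i > 0:
            W[i - 1, i] += death * i
            W[i, i] -= death * i
    return W

def zero_mode(W):
    """Stationary distribution: eigenvector of W whose eigenvalue is closest to 0, normalized."""
    vals, vecs = np.linalg.eig(W)
    p = np.real(vecs[:, np.argmin(np.abs(vals))])
    return p / p.sum()

W = birth_death_generator(30, birth=3.0, death=1.0)
p = zero_mode(W)
# Detailed balance gives a truncated Poisson distribution with mean birth / death.
poisson = np.array([3.0 ** n / factorial(n) for n in range(30)])
poisson /= poisson.sum()
print(np.max(np.abs(p - poisson)), np.max(np.abs(W @ p)))
```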
Using complexity metrics with R-R intervals and BPM heart rate measures.

PubMed

Wallot, Sebastian; Fusaroli, Riccardo; Tylén, Kristian; Jegindø, Else-Marie

2013-01-01

Lately, growing attention in the health sciences has been paid to the dynamics of heart rate as an indicator of impending failures and for prognoses. Likewise, in the social and cognitive sciences, heart rate is increasingly employed as a measure of arousal and emotional engagement and as a marker of interpersonal coordination. However, there is no consensus about which measurements and analytical tools are most appropriate for mapping the temporal dynamics of heart rate, and quite different metrics are reported in the literature. As complexity metrics of heart rate variability depend critically on the variability of the data, different choices regarding the kind of measure can have a substantial impact on the results. In this article we compare linear and non-linear statistics on two prominent types of heart beat data, beat-to-beat intervals (R-R intervals) and beats per minute (BPM). As a proof of concept, we employ a simple rest-exercise-rest task and show that non-linear statistics, namely fractal (DFA) and recurrence (RQA) analyses, reveal information about heart beat activity above and beyond the simple level of heart rate. Non-linear statistics unveil sustained post-exercise effects on heart rate dynamics, but their power to do so critically depends on the type of data employed: while R-R intervals are well suited to non-linear analyses, the success of non-linear methods for BPM data critically depends on how the BPM series are constructed. Generally, "oversampled" BPM time series can be recommended, as they retain most of the information about non-linear aspects of heart beat dynamics.
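Detrended fluctuation analysis, the fractal method named above, can be sketched in a few steps: integrate the mean-removed series, cut it into windows of size n, remove a least-squares linear trend in each window, and take the root-mean-square residual F(n); the slope of log F(n) against log n is the scaling exponent alpha. Well-known reference values are about 0.5 for uncorrelated noise and about 1.5 for a random walk. This NumPy sketch (first-order DFA, non-overlapping windows, an assumed range of scales, hypothetical function name) is not necessarily the authors' exact configuration.

```python
import numpy as np

def dfa(x, scales):
    """First-order detrended fluctuation analysis.

    Returns the fluctuation function F(n) for each window size n and the
    scaling exponent alpha (slope of log F versus log n).
    """
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())            # integrated, mean-removed series
    fluct = []
    for n in scales:
        n_windows = len(profile) // n
        segments = profile[: n_windows * n].reshape(n_windows, n)
        t = np.arange(n)
        sq = []
        for seg in segments:
            trend = np.polyval(np.polyfit(t, seg, 1), t)   # local linear trend
            sq.append(np.mean((seg - trend) ** 2))
        fluct.append(np.sqrt(np.mean(sq)))
    alpha = np.polyfit(np.log(scales), np.log(fluct), 1)[0]
    return np.array(fluct), alpha

rng = np.random.default_rng(0)
scales = [16, 32, 64, 128, 256]
_, alpha_white = dfa(rng.normal(size=10_000), scales)
_, alpha_walk = dfa(np.cumsum(rng.normal(size=10_000)), scales)
print(round(alpha_white, 2), round(alpha_walk, 2))
```

Applied to an R-R interval series, the same function returns one alpha per recording; how BPM series are resampled before this step is exactly the construction choice the abstract says matters.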
Statistical downscaling of precipitation using long short-term memory recurrent neural networks

NASA Astrophysics Data System (ADS)

Misra, Saptarshi; Sarkar, Sudeshna; Mitra, Pabitra

2017-11-01

Hydrological impacts of global climate change at the regional scale are generally assessed by downscaling large-scale climatic variables, simulated by General Circulation Models (GCMs), to regional, small-scale hydrometeorological variables such as precipitation and temperature. In this study, we propose a new statistical downscaling model based on a recurrent neural network with long short-term memory (LSTM) that captures the spatio-temporal dependencies in local rainfall. Previous studies have used several other methods, such as linear regression, quantile regression, kernel regression, beta regression, and artificial neural networks. Deep neural networks and recurrent neural networks have been shown to be highly promising for modeling complex and highly non-linear relationships between input and output variables in different domains, and hence we investigated their performance in the task of statistical downscaling. We tested this model on two datasets, one on precipitation in the Mahanadi basin in India and the second on precipitation in the Campbell River basin in Canada. Our autoencoder-coupled long short-term memory recurrent neural network model performs best among the methods compared on both datasets with respect to temporal cross-correlation, mean squared error, and capturing the extremes.
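For readers unfamiliar with the LSTM cell, here is a NumPy forward pass of a standard LSTM over a sequence of daily predictor vectors, followed by a linear read-out to one local value. The weights are random and untrained, the sizes are illustrative, and the read-out is a hypothetical head; the authors' autoencoder coupling, training procedure, and predictor set are not reproduced.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,).
    Gate blocks are ordered input, forget, output, candidate (an arbitrary convention)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[:H])             # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:])         # candidate cell update
    c = f * c + i * g              # cell state carries long-range memory
    h = o * np.tanh(c)             # hidden state
    return h, c

def lstm_predict(xs, W, U, b, v, v0):
    """Run the LSTM over a sequence and map the final hidden state to one output
    with a linear read-out (hypothetical head)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
    return float(v @ h + v0), h

rng = np.random.default_rng(0)
D, H, T = 6, 8, 30                 # predictors per day, hidden units, days (illustrative)
W = rng.normal(0.0, 0.3, (4 * H, D))
U = rng.normal(0.0, 0.3, (4 * H, H))
b = np.zeros(4 * H)
v = rng.normal(0.0, 0.3, H)
xs = rng.normal(size=(T, D))       # stand-in for large-scale GCM predictors
y_hat, h = lstm_predict(xs, W, U, b, v, 0.0)
print(round(y_hat, 4), h.shape)
```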
Minimal agent based model for financial markets II. Statistical properties of the linear and multiplicative dynamics

NASA Astrophysics Data System (ADS)

Alfi, V.; Cristelli, M.; Pietronero, L.; Zaccaria, A.

2009-02-01

We present a detailed study of the statistical properties of the Agent Based Model introduced in paper I [Eur. Phys. J. B, DOI: 10.1140/epjb/e2009-00028-4] and of its generalization to multiplicative dynamics. The aim of the model is to identify the minimal elements needed to understand the origin of the stylized facts and their self-organization. The key elements are fundamentalist agents, chartist agents, herding dynamics, and price behavior. The first two elements correspond to the competition between stability and instability tendencies in the market. The herding behavior governs the possibility of agents changing strategy and is a crucial element of this class of models. We consider a linear approximation for the price dynamics, which permits a simple interpretation of the model dynamics and, for many properties, allows analytical results to be derived. The generalized nonlinear dynamics turns out to be much more sensitive to the parameters and much more difficult to analyze and control. The main results on the nature and self-organization of the stylized facts are, however, very similar in the two cases. The main peculiarity of the nonlinear dynamics is an enhancement of the fluctuations and a more marked evidence of the stylized facts. We also discuss some modifications of the model that introduce more realistic elements with respect to real markets.
Graph embedding and extensions: a general framework for dimensionality reduction.

PubMed

Yan, Shuicheng; Xu, Dong; Zhang, Benyu; Zhang, Hong-Jiang; Yang, Qiang; Lin, Stephen

2007-01-01

Over the past few decades, a large family of algorithms, supervised or unsupervised and stemming from statistics or geometry theory, has been designed to provide different solutions to the problem of dimensionality reduction. Despite the different motivations of these algorithms, we present in this paper a general formulation known as graph embedding to unify them within a common framework. In graph embedding, each algorithm can be considered as the direct graph embedding, or its linear/kernel/tensor extension, of a specific intrinsic graph that describes certain desired statistical or geometric properties of a data set, with constraints from scale normalization or a penalty graph that characterizes a statistical or geometric property that should be avoided. Furthermore, the graph embedding framework can be used as a general platform for developing new dimensionality reduction algorithms. By utilizing this framework as a tool, we propose a new supervised dimensionality reduction algorithm called Marginal Fisher Analysis (MFA), in which the intrinsic graph characterizes the intraclass compactness and connects each data point with its neighboring points of the same class, while the penalty graph connects the marginal points and characterizes the interclass separability. We show that MFA effectively overcomes the limitations of the traditional Linear Discriminant Analysis (LDA) algorithm due to data distribution assumptions and available projection directions. Face recognition experiments on real data show the superiority of the proposed MFA over LDA, including for the corresponding kernel and tensor extensions.
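As a small instance of direct graph embedding, the NumPy sketch takes a heat-kernel intrinsic graph W, sets the constraint matrix B to the degree matrix D (the choice that, as we understand the framework, recovers Laplacian eigenmaps), and solves L y = lambda D y with L = D - W through the symmetric normalized Laplacian. The kernel width t and the two-cluster data are assumptions, MFA's intrinsic and penalty graphs are not constructed here, and the function names are hypothetical.

```python
import numpy as np

def heat_kernel_graph(X, t):
    """Fully connected intrinsic graph with weights exp(-||xi - xj||^2 / t)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / t)
    np.fill_diagonal(W, 0.0)
    return W

def direct_graph_embedding(W, dim):
    """Solve L y = lambda D y (constraint matrix B = D) and return the `dim`
    smallest non-trivial eigenpairs, via the symmetric normalized Laplacian."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = d_isqrt[:, None] * L * d_isqrt[None, :]   # D^{-1/2} L D^{-1/2}
    vals, vecs = np.linalg.eigh(L_sym)                # ascending eigenvalues
    Y = d_isqrt[:, None] * vecs[:, 1:dim + 1]         # back-transform; skip the trivial vector
    return vals[1:dim + 1], Y

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(20, 2)),
               rng.normal([3.0, 0.0], 0.5, size=(20, 2))])
W = heat_kernel_graph(X, t=1.0)
vals, Y = direct_graph_embedding(W, dim=1)
print(vals.round(5), np.sign(Y[:, 0]).astype(int))
```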
The Impact of New Technology on Accounting Education.

ERIC Educational Resources Information Center

Shaoul, Jean

The introduction of computers in the Department of Accounting and Finance at Manchester University is described. General background outlining the increasing need for microcomputers in the accounting curriculum (including financial modelling tools and decision support systems such as linear programming, statistical packages, and simulation) is…
Frequency-dependent scaling from mesoscale to macroscale in viscoelastic random composites

PubMed Central

Zhang, Jun

2016-01-01

This paper investigates the scaling from a statistical volume element (SVE; i.e. mesoscale level) to representative volume element (RVE; i.e. macroscale level) of spatially random linear viscoelastic materials, focusing on the quasi-static properties in the frequency domain. Requiring the material statistics to be spatially homogeneous and ergodic, the mesoscale bounds on the RVE response are developed from the Hill–Mandel homogenization condition adapted to viscoelastic materials. The bounds are obtained from two stochastic initial-boundary value problems set up, respectively, under uniform kinematic and traction boundary conditions. The frequency and scale dependencies of mesoscale bounds are obtained through computational mechanics for composites with planar random chessboard microstructures. In general, the frequency-dependent scaling to RVE can be described through a complex-valued scaling function, which generalizes the concept originally developed for linear elastic random composites. This scaling function is shown to apply for all different phase combinations on random chessboards and, essentially, is only a function of the microstructure and mesoscale. PMID:27274689
A new statistical method for transfer coefficient calculations in the framework of the general multiple-compartment model of transport for radionuclides in biological systems.

PubMed

Garcia, F; Arruda-Neto, J D; Manso, M V; Helene, O M; Vanin, V R; Rodriguez, O; Mesa, J; Likhachev, V P; Filho, J W; Deppman, A; Perez, G; Guzman, F; de Camargo, S P

1999-10-01

A new and simple statistical procedure (STATFLUX) for calculating transfer coefficients of radionuclide transport to animals and plants is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. Using experimentally available curves of radionuclide concentration versus time for each animal compartment (organ), flow parameters are estimated with a least-squares procedure whose consistency is tested. Numerical results are presented to compare the STATFLUX transfer coefficients with those from other works and with experimental data.
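The least-squares step can be illustrated in its simplest form, assuming a single compartment with first-order loss. Under that assumption C(t) = C0·exp(−k t), so ln C is linear in t and k follows from an ordinary least-squares line. This is not STATFLUX itself, whose multi-compartment equations and volume terms are not given in the abstract. The name `fit_first_order_rate` is hypothetical.

```python
import numpy as np

def fit_first_order_rate(t, conc):
    """Least-squares estimate of (C0, k) in C(t) = C0 * exp(-k * t).

    Assumes a single compartment with first-order loss; fits ln C = ln C0 - k t.
    """
    t = np.asarray(t, dtype=float)
    y = np.log(np.asarray(conc, dtype=float))
    A = np.column_stack([np.ones_like(t), -t])       # columns: ln C0, k
    (log_c0, k), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.exp(log_c0)), float(k)
```

With noisy measurements, the residuals of this fit give a first check of the model's consistency. A multi-compartment version would stack one such linear relation per compartment.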
The ultrasound-enhanced bioscouring performance of four polygalacturonase enzymes obtained from rhizopus oryzae

USDA-ARS's Scientific Manuscript database

An analytical and statistical method has been developed to measure the ultrasound-enhanced bioscouring performance of milligram quantities of endo- and exo-polygalacturonase enzymes obtained from Rhizopus oryzae fungi. UV-Vis spectrophotometric data and a general linear mixed models procedure indic...

Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations.

PubMed

Schaid, Daniel J

2010-01-01

Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
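To make the positive-semidefiniteness requirement concrete, here is a hedged sketch of one commonly used genomic similarity, an identity-by-state (IBS) kernel for genotypes coded 0/1/2, together with a numerical PSD check. The function names are hypothetical, and the choice of IBS is an illustrative assumption. The review covers kernels in general.

```python
import numpy as np

def ibs_kernel(G):
    """IBS similarity for an (n_subjects, m_snps) genotype matrix coded 0/1/2.

    K[i, j] = sum_k (2 - |G[i, k] - G[j, k]|) / (2 m); identical subjects get 1.
    """
    G = np.asarray(G, dtype=float)
    m = G.shape[1]
    diff = np.abs(G[:, None, :] - G[None, :, :]).sum(axis=-1)
    return (2 * m - diff) / (2 * m)

def is_psd(K, tol=1e-10):
    """A valid kernel matrix must be symmetric with no negative eigenvalues."""
    K = np.asarray(K, dtype=float)
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)
```

Once `is_psd` holds, the matrix can be passed to kernel-based tools such as score tests, kernel machine regression, or mixed models that treat K as a covariance.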
Dimensional Reduction for the General Markov Model on Phylogenetic Trees.

PubMed

Sumner, Jeremy G

2017-03-01

We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.
On the Problems of Construction and Statistical Inference Associated with a Generalization of Canonical Variables.

DTIC Science & Technology

1982-02-01

of them are presented in this paper. As an application, important practical problems similar to the one posed by Gnanadesikan (1977), p. 77 can be... Gnanadesikan and Wilk (1969) to search for a non-linear combination, giving rise to a non-linear first principal component. So, a p-dimensional vector can...distribution, Gnanadesikan and Gupta (1970) and earlier Eaton (1967) have considered the problem of ranking the r underlying populations according to the
A heteroscedastic generalized linear model with a non-normal speed factor for responses and response times.

PubMed

Molenaar, Dylan; Bolsinova, Maria

2017-05-01

In generalized linear modelling of responses and response times, the observed response time variables are commonly transformed to make their distribution approximately normal. A normal distribution for the transformed response times is desirable as it justifies the linearity and homoscedasticity assumptions in the underlying linear model. Past research has, however, shown that the transformed response times are not always normal. Models have been developed to accommodate this violation. In the present study, we propose a modelling approach for responses and response times to test and model non-normality in the transformed response times. Most importantly, we distinguish between non-normality due to heteroscedastic residual variances, and non-normality due to a skewed speed factor. In a simulation study, we establish parameter recovery and the power to separate both effects. In addition, we apply the model to a real data set. © 2017 The Authors. British Journal of Mathematical and Statistical Psychology published by John Wiley & Sons Ltd on behalf of British Psychological Society.
Development of a technique for estimating noise covariances using multiple observers

NASA Technical Reports Server (NTRS)

Bundick, W. Thomas

1988-01-01

Friedland's technique for estimating the unknown noise variances of a linear system using multiple observers has been extended by developing a general solution for the estimates of the variances, developing the statistics (mean and standard deviation) of these estimates, and demonstrating the solution on two examples.
Statistical considerations in the analysis of data from replicated bioassays

USDA-ARS's Scientific Manuscript database

Multiple-dose bioassay is generally the preferred method for characterizing the virulence of insect pathogens. Linear regression of probit mortality on log dose enables estimation of LD50/LC50 and slope, the latter having a substantial effect on LD90/95s (doses of considerable interest in pest management)...
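Why the slope matters for LD90/95 can be shown with the inverse of a fitted probit line. This sketch assumes probit(P) = a + b·log10(dose), using the standard-normal quantile without the historical "+5" offset. The intercept `a`, slope `b` and the name `lethal_dose` are illustrative placeholders, not estimates from the manuscript.

```python
from statistics import NormalDist

def lethal_dose(p, intercept, slope):
    """Dose killing a fraction p under probit(P) = intercept + slope * log10(dose)."""
    z = NormalDist().inv_cdf(p)            # standard-normal quantile of p
    return 10 ** ((z - intercept) / slope)
```

Because LDp / LD50 = 10^(z_p / b), doubling the slope shrinks the gap between LD50 and LD90. Two assays with the same LD50 can therefore imply very different field doses.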
Modeling Systematicity and Individuality in Nonlinear Second Language Development: The Case of English Grammatical Morphemes

ERIC Educational Resources Information Center

Murakami, Akira

2016-01-01

This article introduces two sophisticated statistical modeling techniques that allow researchers to analyze systematicity, individual variation, and nonlinearity in second language (L2) development. Generalized linear mixed-effects models can be used to quantify individual variation and examine systematic effects simultaneously, and generalized…
INTERANNUAL VARIATION IN METEOROLOGICALLY ADJUSTED OZONE LEVELS IN THE EASTERN UNITED STATES: A COMPARISON OF TWO APPROACHES

EPA Science Inventory

Assessing the influence of abatement efforts and other human activities on ozone levels is complicated by the atmosphere's changeable nature. Two statistical methods, the dynamic linear model (DLM) and the generalized additive model (GAM), are used to estimate ozone trends in the...
Variable Selection with Prior Information for Generalized Linear Models via the Prior LASSO Method.

PubMed

Jiang, Yuan; He, Yunxiao; Zhang, Heping

LASSO is a popular statistical tool, often used in conjunction with generalized linear models, that can simultaneously select variables and estimate parameters. When there are many variables of interest, as in current biological and biomedical studies, the power of LASSO can be limited. Fortunately, large amounts of biological and biomedical data have already been collected, and they may contain useful information about the importance of certain variables. This paper proposes an extension of LASSO, namely, prior LASSO (pLASSO), to incorporate such prior information into penalized generalized linear models. This is achieved by adding to the LASSO criterion function an additional measure of the discrepancy between the prior information and the model. For linear regression, the whole solution path of the pLASSO estimator can be found with a procedure similar to Least Angle Regression (LARS). Asymptotic theory and simulation results show that pLASSO provides significant improvement over LASSO when the prior information is relatively accurate. When the prior information is less reliable, pLASSO is highly robust to the misspecification. We illustrate the application of pLASSO using a real data set from a genome-wide association study.
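A minimal sketch for linear regression shows how a prior-discrepancy term can fold into an ordinary LASSO fit. It assumes the discrepancy is measured as ½·η·‖Xβ_prior − Xβ‖², where β_prior is a hypothetical prior coefficient vector and η a hypothetical weight. Completing the square, the criterion ½‖y − Xβ‖² + ½η‖Xβ_prior − Xβ‖² + λ‖β‖₁ equals a plain LASSO on z = (y + η Xβ_prior)/(1 + η) with penalty λ/(1 + η), up to a constant and an overall factor 1 + η. Whether this matches the paper's exact discrepancy measure is an assumption. The solver is cyclic coordinate descent, not the LARS-type path algorithm mentioned above.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Proximal operator of gamma * |.|: shrink z toward zero by gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    r = y.copy()                          # current residual y - X beta
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r + col_sq[j] * beta[j]   # x_j' (partial residual)
            new = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * (new - beta[j])            # keep residual in sync
            beta[j] = new
    return beta

def prior_lasso(X, y, beta_prior, lam, eta):
    """Hypothetical pLASSO-style fit: LASSO on the prior-shifted response z."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (y + eta * X @ np.asarray(beta_prior, dtype=float)) / (1.0 + eta)
    return lasso_cd(X, z, lam / (1.0 + eta))
```

Setting `eta=0` recovers plain LASSO. Larger `eta` pulls the fit toward the prior, which is where robustness to an unreliable prior has to be checked.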
Optimal generalized multistep integration formulae for real-time digital simulation

NASA Technical Reports Server (NTRS)

Moerder, D. D.; Halyo, N.

1985-01-01

The problem of discretizing a dynamical system for real-time digital simulation is considered. Treating the system and its simulation as stochastic processes leads to a statistical characterization of simulator fidelity. A plant discretization procedure based on an efficient matrix generalization of explicit linear multistep discrete integration formulae is introduced, which minimizes a weighted sum of the mean squared steady-state and transient error between the system and simulator outputs.
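For orientation, an explicit linear multistep formula advances the state using only past derivative values, y_{n+1} = y_n + h·Σ_j β_j·f_{n−j}. That is what makes such formulae attractive for real-time use. The sketch below uses the classical second-order Adams-Bashforth weights (3/2, −1/2) as a stand-in. The paper's optimized matrix coefficients are not given in the abstract, so `beta` is left as an assumed, user-supplied parameter.

```python
AB2 = (1.5, -0.5)   # Adams-Bashforth 2 weights for f_n, f_{n-1} (stand-in, not optimized)

def explicit_multistep(f, y0, h, n_steps, beta=AB2):
    """Integrate y' = f(t, y) with y_{n+1} = y_n + h * sum_j beta[j] * f_{n-j}.

    The first steps, before enough past derivatives exist, use forward Euler.
    """
    k = len(beta)
    t, y = 0.0, y0
    history = [f(t, y)]          # past derivatives, most recent first
    ys = [y]
    for _ in range(n_steps):
        if len(history) < k:
            y = y + h * history[0]                                   # Euler start-up
        else:
            y = y + h * sum(b * fv for b, fv in zip(beta, history))  # multistep update
        t += h
        ys.append(y)
        history.insert(0, f(t, y))
        history = history[:k]
    return ys
```

Consistency requires the weights to sum to 1. An optimized formula in the paper's sense would choose different weights, for example matrix-valued ones, to minimize the simulator's output error statistics.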
Log-normal frailty models fitted as Poisson generalized linear mixed models.

PubMed

Hirsch, Katharina; Wienke, Andreas; Kuss, Oliver

2016-12-01

The equivalence of a survival model with a piecewise constant baseline hazard function and a Poisson regression model has been known for decades. As shown in recent studies, this equivalence carries over to clustered survival data: a frailty model with a log-normal frailty term can be interpreted and estimated as a generalized linear mixed model with a binary response, a Poisson likelihood, and a specific offset. Proceeding this way, statistical theory and software for generalized linear mixed models are readily available for fitting frailty models. This gain in flexibility comes at the small price of (1) having to fix the number of pieces for the baseline hazard in advance and (2) having to "explode" the data set by the number of pieces. In this paper we extend the simulations of former studies by using a more realistic baseline hazard (Gompertz) and by comparing the model under consideration with competing models. Furthermore, the SAS macro %PCFrailty is introduced to apply the Poisson generalized linear mixed approach to frailty models. The simulations show good results for the shared frailty model. Our new %PCFrailty macro provides proper estimates, especially in the case of 4 events per piece. The suggested Poisson generalized linear mixed approach for log-normal frailty models based on the %PCFrailty macro provides several advantages in the analysis of clustered survival data with respect to more flexible modelling of fixed and random effects, exact (in the sense of non-approximate) maximum likelihood estimation, and standard errors and different types of confidence intervals for all variance parameters. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
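The "explode" step can be sketched as follows. Each subject's follow-up is split at the piece boundaries into one row per piece at risk, with a binary event indicator and an offset log(exposure time). The cut points and the name `explode` are illustrative assumptions. This is not the %PCFrailty macro.

```python
import math

def explode(time, event, cuts):
    """Split one subject's follow-up into piecewise-constant-hazard rows.

    `cuts` are the left ends of the pieces (starting at 0); the last piece is open.
    Returns (piece_index, event_in_piece, offset) with offset = log(exposure).
    """
    edges = list(cuts) + [math.inf]
    rows = []
    for k in range(len(edges) - 1):
        start, end = edges[k], edges[k + 1]
        if time <= start:                     # follow-up ended before this piece
            break
        exposure = min(time, end) - start
        died_here = int(bool(event) and time <= end)
        rows.append((k, died_here, math.log(exposure)))
    return rows
```

The exploded rows would then be fitted with a Poisson GLMM, for instance event ~ piece + covariates with log link, offset(log exposure), and a normal random intercept per cluster. That random intercept is the log-normal frailty.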
Statistical Entropy of Vaidya-de Sitter Black Hole to All Orders in Planck Length

NASA Astrophysics Data System (ADS)

Sun, HangBin; He, Feng; Huang, Hai

2012-06-01

Considering corrections to all orders in the Planck length to the quantum state density from the generalized uncertainty principle, we calculate the statistical entropy of a scalar field near the event horizon and the cosmological horizon of the Vaidya-de Sitter black hole without any artificial cutoff. It is shown that the entropy is a linear sum of the event horizon area and the cosmological horizon area, with similar proportionality parameters related to the rate of change of the horizon position. This differs from the static and stationary cases.
An evaluation of three statistical estimation methods for assessing health policy effects on prescription drug claims.

PubMed

Mittal, Manish; Harrison, Donald L; Thompson, David M; Miller, Michael J; Farmer, Kevin C; Ng, Yu-Tze

2016-01-01

While the choice of analytical approach affects study results and their interpretation, there is no consensus to guide the choice of statistical approaches to evaluate public health policy change. This study compared and contrasted three statistical estimation procedures in the assessment of a U.S. Food and Drug Administration (FDA) suicidality warning, communicated in January 2008 and implemented in May 2009, on antiepileptic drug (AED) prescription claims. Longitudinal designs were utilized to evaluate Oklahoma (U.S. State) Medicaid claim data from January 2006 through December 2009. The study included 9289 continuously eligible individuals with prevalent diagnoses of epilepsy and/or psychiatric disorder. Segmented regression models using three estimation procedures [i.e., generalized linear models (GLM), generalized estimating equations (GEE), and generalized linear mixed models (GLMM)] were used to estimate trends of AED prescription claims across three time periods: before (January 2006-January 2008); during (February 2008-May 2009); and after (June 2009-December 2009) the FDA warning. All three statistical procedures estimated an increasing trend (P < 0.0001) in AED prescription claims before the FDA warning period. No procedures detected a significant change in trend during (GLM: -30.0%, 99% CI: -60.0% to 10.0%; GEE: -20.0%, 99% CI: -70.0% to 30.0%; GLMM: -23.5%, 99% CI: -58.8% to 1.2%) and after (GLM: 50.0%, 99% CI: -70.0% to 160.0%; GEE: 80.0%, 99% CI: -20.0% to 200.0%; GLMM: 47.1%, 99% CI: -41.2% to 135.3%) the FDA warning when compared to the pre-warning period. Although the three procedures provided consistent inferences, the GEE and GLMM approaches accounted appropriately for correlation. Further, marginal models estimated using GEE produced more robust and valid population-level estimates. Copyright © 2016 Elsevier Inc. All rights reserved.
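The segmented (interrupted time series) structure shared by all three procedures can be sketched as a design matrix. It has an intercept and a baseline trend, plus a level change and a slope change at each period boundary. Months are indexed from January 2006 = 0, which puts February 2008 at month 25 and June 2009 at month 41. The sketch fits ordinary least squares to a synthetic linear outcome only to show that the coefficients are identifiable. The study itself fitted GLM, GEE and GLMM to claim counts, where a log link would turn these terms into multiplicative (percent) changes.

```python
import numpy as np

def segmented_design(n_months, breaks):
    """Columns: intercept, baseline trend, then a level and a slope change per break."""
    t = np.arange(n_months, dtype=float)
    cols = [np.ones(n_months), t]
    for b in breaks:
        after = (t >= b).astype(float)
        cols.append(after)              # immediate level change at the break
        cols.append(after * (t - b))    # change in slope after the break
    return np.column_stack(cols)

# January 2006 = month 0, ..., December 2009 = month 47.
X = segmented_design(48, breaks=[25, 41])
```

The coefficient on `after * (t - b)` for each break is the "change in trend" that the abstract compares across GLM, GEE and GLMM.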
Light propagation and the distance-redshift relation in a realistic inhomogeneous universe

NASA Technical Reports Server (NTRS)

Futamase, Toshifumi; Sasaki, Misao

1989-01-01

The propagation of light rays in a clumpy universe constructed using a cosmological version of the post-Newtonian approximation was investigated. It is shown that the linear approximation to the propagation equations is valid in the region where zeta is approximately less than 1, even if the density contrast is much larger than unity. Based on a general order-of-magnitude statistical consideration, it is argued that the linear approximation is still valid where zeta is approximately greater than 1. A general formula for the distance-redshift relation in a clumpy universe is given. An explicit expression is derived for a simplified situation in which the effect of the gravitational potential of inhomogeneities dominates. In the light of the derived relation, the validity of the Dyer-Roeder distance is discussed. Also, statistical properties of light rays are investigated for a simple model of an inhomogeneous universe. The result of this example supports the validity of the linear approximation.
Differential gene expression detection and sample classification using penalized linear regression models.

PubMed

Wu, Baolin

2006-02-15

Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' setting is over-fitting: just by chance, we are likely to find some non-differentially expressed genes that classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple and intuitive and have proved useful in empirical studies. Recently, Wu proposed penalized t/F-statistics with shrinkage by formally using L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using L1-penalized regression models. We also show that penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape

PubMed Central

Coupé, Christophe

2018-01-01

As statistical approaches are increasingly used in linguistics, attention must be paid to the choice of methods and algorithms. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is, however, being made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships.
Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables. PMID:29713298
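A quick first signal of the overdispersion that motivates going beyond the Poisson distribution is the variance-to-mean ratio. It is about 1 for Poisson counts and well above 1 when counts are overdispersed; a negative binomial, for instance, has variance μ + μ²/θ. This is a hedged, marginal check only. Overdispersion conditional on the predictors should be judged from model residuals, and `dispersion_index` is a hypothetical helper, not part of GAMLSS.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 under a Poisson model)."""
    return variance(counts) / mean(counts)
```

Values far above 1 suggest that a Poisson GLM will understate standard errors. In that case candidate families with a separate dispersion (scale) parameter, as assessed above, become relevant.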
Role of diversity in ICA and IVA: theory and applications

NASA Astrophysics Data System (ADS)

Adalı, Tülay

2016-05-01

Independent component analysis (ICA) has been the most popular approach for solving the blind source separation problem. Starting from a simple linear mixing model and the assumption of statistical independence, ICA can recover a set of linearly mixed sources up to a scaling and permutation ambiguity. It has been successfully applied to numerous data analysis problems in areas as diverse as biomedicine, communications, finance, geophysics, and remote sensing. ICA can be achieved using different types of diversity (i.e., statistical properties) and can be posed to account simultaneously for multiple types of diversity, such as higher-order statistics, sample dependence, non-circularity, and nonstationarity. Independent vector analysis (IVA), a recent generalization of ICA, extends it to multiple data sets and adds one more type of diversity, statistical dependence across the data sets, for jointly achieving independent decompositions of multiple data sets. With the addition of each new diversity type, a broader class of signals becomes identifiable; in the case of IVA, this includes sources that are independent and identically distributed Gaussians. We review the fundamentals and properties of ICA and IVA when multiple types of diversity are taken into account, and then ask whether diversity plays an important role in practical applications as well. Examples from various domains are presented to demonstrate that in many scenarios it might be worthwhile to jointly account for multiple statistical properties. This paper is submitted in conjunction with the talk delivered for the "Unsupervised Learning and ICA Pioneer Award" at the 2016 SPIE Conference on Sensing and Analysis Technologies for Biomedical and Cognitive Applications.
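To make the linear mixing model x = A s concrete, here is a minimal FastICA sketch using deflation with a tanh contrast. It exploits only one diversity type, higher-order statistics, and ignores sample dependence, non-circularity and nonstationarity. It is a standard textbook variant written for illustration, not the author's method, and `fast_ica` is a hypothetical name.

```python
import numpy as np

def fast_ica(X, n_components, n_iter=200, tol=1e-8, seed=0):
    """Minimal FastICA (deflation, g = tanh). X has shape (n_signals, n_samples).

    Returns estimated sources, recoverable only up to scale, sign and order.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))              # whitening: cov(Z) = I
    Z = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X
    W = np.zeros((n_components, Z.shape[0]))
    for i in range(n_components):
        w = rng.standard_normal(Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            g = np.tanh(w @ Z)
            # Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w.
            w_new = (Z * g).mean(axis=1) - (1.0 - g ** 2).mean() * w
            w_new -= W[:i].T @ (W[:i] @ w_new)    # deflation: orthogonalise
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return W @ Z
```

Whitening removes second-order dependence. The tanh fixed point then picks out maximally non-Gaussian directions, which is why Gaussian i.i.d. sources cannot be separated this way, as noted above.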
Estimating PM2.5 Concentrations in Xi'an City Using a Generalized Additive Model with Multi-Source Monitoring Data

PubMed Central

Song, Yong-Ze; Yang, Hong-Lei; Peng, Jun-Huan; Song, Yi-Rong; Sun, Qian; Li, Yuan

2015-01-01

Particulate matter with an aerodynamic diameter <2.5 μm (PM2.5) is a severe environmental problem with a negative impact on human health. Xi'an City, with a population of 6.5 million, has among the highest PM2.5 concentrations in China. In 2013, there were 191 days in Xi'an City on which PM2.5 concentrations exceeded 100 μg/m3. Recently, a few studies have explored the potential causes of high PM2.5 concentration using remote sensing data such as the MODIS aerosol optical thickness (AOT) product. Linear regression is commonly used to find statistical relationships between PM2.5 concentrations and other pollutants, including CO, NO2, SO2, and O3, which can be indicative of emission sources. The relationships among these variables, however, are usually complicated and non-linear. Therefore, a generalized additive model (GAM) is used to estimate the statistical relationships between potential explanatory variables and PM2.5 concentrations. This model contains linear functions of SO2 and CO, univariate smoothing non-linear functions of NO2, O3, AOT and temperature, and bivariate smoothing non-linear functions of location and wind variables. The model explains 69.50% of the variation in PM2.5 concentrations, with R2 = 0.691, which improves on the result of a stepwise linear regression (R2 = 0.582) by 18.73%. The two most significant variables, CO concentration and AOT, account for 20.65% and 19.54% of the deviance, respectively, while the three other gas-phase concentrations, SO2, NO2, and O3, account for 10.88% of the total deviance. These results show that in Xi'an City, traffic and other industrial emissions are the primary sources of PM2.5. Temperature, location, and wind variables are also non-linearly related to PM2.5. PMID:26540446
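The 18.73% figure is a relative gain in R², (0.691 − 0.582)/0.582. It is not an absolute difference, which would be 10.9 percentage points. A one-line check, with a hypothetical helper name:

```python
def relative_improvement(r2_new, r2_old):
    """Percentage gain of r2_new over r2_old, relative to r2_old."""
    return 100.0 * (r2_new - r2_old) / r2_old

gain = relative_improvement(0.691, 0.582)   # about 18.73 (%)
```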
How multiplicity determines entropy and the derivation of the maximum entropy principle for complex systems.

PubMed

Hanel, Rudolf; Thurner, Stefan; Gell-Mann, Murray

2014-05-13

The maximum entropy principle (MEP) is a method for obtaining the most likely distribution functions of observables from statistical systems by maximizing entropy under constraints. The MEP has found hundreds of applications in ergodic and Markovian systems in statistical mechanics, information theory, and statistics. For several decades there has been an ongoing controversy over whether the notion of the maximum entropy principle can be extended in a meaningful way to nonextensive, nonergodic, and complex statistical systems and processes. In this paper we start by reviewing how Boltzmann-Gibbs-Shannon entropy is related to multiplicities of independent random processes. We then show how the relaxation of independence naturally leads to the most general entropies that are compatible with the first three Shannon-Khinchin axioms, the (c,d)-entropies. We demonstrate that the MEP is a perfectly consistent concept for nonergodic and complex statistical systems if their relative entropy can be factored into a generalized multiplicity and a constraint term. The problem of finding such a factorization reduces to finding an appropriate representation of relative entropy in a linear basis. In a particular example we show that path-dependent random processes with memory naturally require specific generalized entropies. The example is to our knowledge the first exact derivation of a generalized entropy from the microscopic properties of a path-dependent random process.
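For contrast with the generalized entropies discussed above, here is the standard Boltzmann-Gibbs-Shannon case in code. Maximizing entropy subject to normalization and a fixed mean gives the Gibbs form p_i ∝ exp(λ x_i), and λ can be found by bisection because the mean increases monotonically in λ. This covers only the classical independent case. The name `max_entropy_distribution` and the bracketing interval are assumptions, and no (c,d)-entropy is implemented.

```python
import math

def max_entropy_distribution(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """BGS maximum-entropy distribution on `values` with a prescribed mean.

    Requires min(values) < target_mean < max(values).
    """
    def gibbs(lam):
        shift = max(lam * x for x in values)           # avoid overflow in exp
        w = [math.exp(lam * x - shift) for x in values]
        s = sum(w)
        return [wi / s for wi in w]

    def mean_of(lam):
        return sum(p * x for p, x in zip(gibbs(lam), values))

    for _ in range(iters):                             # bisection on lambda
        mid = 0.5 * (lo + hi)
        if mean_of(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return gibbs(0.5 * (lo + hi))
```

For a six-sided die constrained to mean 4.5 (a classic textbook example), the result tilts exponentially toward high faces. For mean 3.5 it returns the uniform distribution.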

Time Advice and Learning Questions in Computer Simulations

ERIC Educational Resources Information Center

Rey, Gunter Daniel

2011-01-01

Students (N = 101) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without time advice) x 3 (with learning questions and corrective feedback, with…
Tests of Alignment among Assessment, Standards, and Instruction Using Generalized Linear Model Regression

ERIC Educational Resources Information Center

Fulmer, Gavin W.; Polikoff, Morgan S.

2014-01-01

An essential component in school accountability efforts is for assessments to be well-aligned with the standards or curriculum they are intended to measure. However, relatively little prior research has explored methods to determine statistical significance of alignment or misalignment. This study explores analyses of alignment as a special case…
Factor Scores, Structure and Communality Coefficients: A Primer

ERIC Educational Resources Information Center

Odum, Mary

2011-01-01

(Purpose) The purpose of this paper is to present an easy-to-understand primer on three important concepts of factor analysis: Factor scores, structure coefficients, and communality coefficients. Given that statistical analyses are a part of a global general linear model (GLM), and utilize weights as an integral part of analyses (Thompson, 2006;…
Prescriptive Statements and Educational Practice: What Can Structural Equation Modeling (SEM) Offer?

ERIC Educational Resources Information Center

Martin, Andrew J.

2011-01-01

Longitudinal structural equation modeling (SEM) can be a basis for making prescriptive statements on educational practice and offers advantages over "traditional" statistical techniques under the general linear model. The extent to which prescriptive statements can be made will rely on the appropriate accommodation of key elements of research design,…
Power Analysis for Complex Mediational Designs Using Monte Carlo Methods

ERIC Educational Resources Information Center

Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.

2010-01-01

Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex…
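The abstract is truncated before the framework is described. The following is only a minimal sketch of Monte Carlo power estimation for the simplest single-mediator model X → M → Y. It assumes standard-normal errors and uses the joint-significance test (both the a and b paths significant at |t| > 1.96) as the decision rule. The function names, the decision rule and the defaults are illustrative assumptions, not the authors' method.

```python
import numpy as np

def ols_t(X, y):
    """OLS coefficients and their t statistics; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    df = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / df
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

def mediation_power(a, b, c_prime, n, reps=500, crit=1.96, seed=0):
    """Share of simulated data sets in which both mediation paths are significant.

    Hypothetical helper: M = a*X + e1, Y = b*M + c_prime*X + e2, all errors N(0, 1).
    """
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.standard_normal(n)
        m = a * x + rng.standard_normal(n)
        y = b * m + c_prime * x + rng.standard_normal(n)
        ones = np.ones(n)
        _, t_a = ols_t(np.column_stack([ones, x]), m)      # a path: M on X
        _, t_b = ols_t(np.column_stack([ones, x, m]), y)   # b path: Y on M, controlling for X
        if abs(t_a[1]) > crit and abs(t_b[2]) > crit:      # joint-significance rule
            hits += 1
    return hits / reps
```

Latent-variable or growth-curve mediators would replace the two regressions with the corresponding model fits. The simulate, fit and count loop stays the same.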
Electronic Resource Expenditure and the Decline in Reference Transaction Statistics in Academic Libraries

ERIC Educational Resources Information Center

Dubnjakovic, Ana

2012-01-01

The current study investigates factors influencing an increase in reference transactions in a typical week in academic libraries across the United States of America. Employing multiple regression analysis and general linear modeling, variables of interest from the "Academic Library Survey (ALS) 2006" survey (sample size 3960 academic libraries) were…
Correlations between human mobility and social interaction reveal general activity patterns.

PubMed

Mollgaard, Anders; Lehmann, Sune; Mathiesen, Joachim

2017-01-01

A day in the life of a person involves a broad range of activities which are common across many people. Going beyond diurnal cycles, a central question is: to what extent do individuals act according to patterns shared across an entire population? Here we investigate the interplay between different activity types, namely communication, motion, and physical proximity by analyzing data collected from smartphones distributed among 638 individuals. We explore two central questions: Which underlying principles govern the formation of the activity patterns? Are the patterns specific to each individual or shared across the entire population? We find that statistics of the entire population allows us to successfully predict 71% of the activity and 85% of the inactivity involved in communication, mobility, and physical proximity. Surprisingly, individual level statistics only result in marginally better predictions, indicating that a majority of activity patterns are shared across our sample population. Finally, we predict short-term activity patterns using a generalized linear model, which suggests that a simple linear description might be sufficient to explain a wide range of actions, whether they be of social or of physical character.
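The abstract does not give the model specification. As a generic illustration of a generalized linear model for a binary outcome (for example, a hypothetical active vs. inactive coding per time bin), here is a minimal sketch of logistic regression (binomial family, logit link) fitted by iteratively reweighted least squares (IRLS) with NumPy. It is not the authors' implementation.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25, tol=1e-10):
    """Fit a logistic-regression GLM by IRLS.

    X: (n, p) design matrix including an intercept column; y: array of 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                       # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))      # inverse logit gives the fitted mean
        w = mu * (1.0 - mu)                  # IRLS weights = binomial variance function
        z = eta + (y - mu) / w               # working response
        XtW = X.T * w                        # X^T W without building the diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

At convergence the score equations X^T (y - mu) = 0 hold, which is a convenient check on any GLM fit.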
Focal activation of primary visual cortex following supra-choroidal electrical stimulation of the retina: Intrinsic signal imaging and linear model analysis.

PubMed

Cloherty, Shaun L; Hietanen, Markus A; Suaning, Gregg J; Ibbotson, Michael R

2010-01-01

We performed optical intrinsic signal imaging of cat primary visual cortex (Area 17 and 18) while delivering bipolar electrical stimulation to the retina by way of a supra-choroidal electrode array. Using a general linear model (GLM) analysis we identified statistically significant (p < 0.01) activation in a localized region of cortex following supra-threshold electrical stimulation at a single retinal locus. These results (1) demonstrate that intrinsic signal imaging combined with linear model analysis provides a powerful tool for assessing cortical responses to prosthetic stimulation, and (2) confirm that supra-choroidal electrical stimulation can achieve localized activation of the cortex consistent with focal activation of the retina.
A d-statistic for single-case designs that is equivalent to the usual between-groups d-statistic.

PubMed

Shadish, William R; Hedges, Larry V; Pustejovsky, James E; Boyajian, Jonathan G; Sullivan, Kristynn J; Andrade, Alma; Barrientos, Jeannette L

2014-01-01

We describe a standardised mean difference statistic (d) for single-case designs that is equivalent to the usual d in between-groups experiments. We show how it can be used to summarise treatment effects over cases within a study, to do power analyses in planning new studies and grant proposals, and to meta-analyse effects across studies of the same question. We discuss limitations of this d-statistic, and possible remedies to them. Even so, this d-statistic is better founded statistically than other effect size measures for single-case design, and unlike many general linear model approaches such as multilevel modelling or generalised additive models, it produces a standardised effect size that can be integrated over studies with different outcome measures. SPSS macros for both effect size computation and power analysis are available.
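For reference, here is the "usual" between-groups standardized mean difference that the single-case d is designed to match: the difference in group means divided by the pooled standard deviation. Hedges' small-sample correction J = 1 - 3/(4N - 9), with N the total sample size, is often applied to it. This sketch covers the between-groups statistic only. It is not the authors' single-case estimator or their SPSS macros, and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def between_groups_d(treatment, control):
    """Standardized mean difference using the pooled (n - 1 weighted) standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)

def hedges_g(treatment, control):
    """d multiplied by the small-sample correction J = 1 - 3 / (4N - 9)."""
    n = len(treatment) + len(control)
    return (1 - 3 / (4 * n - 9)) * between_groups_d(treatment, control)
```

For example, treatment scores [5, 6, 7] and control scores [3, 4, 5] have means 6 and 4 and a pooled SD of 1, so d = 2. With N = 6, the correction factor is 0.8 and g = 1.6.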
Advanced statistics: linear regression, part II: multiple linear regression.

PubMed

Marill, Keith A

2004-01-01

The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
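The article itself takes a graphical rather than a mathematical approach. As a complementary sketch, ordinary least squares with an interaction term can be fitted with NumPy. Variance inflation factors (VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the others) are a common multicollinearity diagnostic, though the abstract does not name them. The helper names are illustrative.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X @ beta + error (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def r_squared(X, y):
    """Coefficient of determination of the OLS fit of y on X."""
    resid = y - X @ ols(X, y)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def vif(predictors):
    """Variance inflation factor for each column of an (n, k) predictor matrix (no intercept)."""
    n, k = predictors.shape
    out = []
    for j in range(k):
        others = np.delete(predictors, j, axis=1)
        X = np.column_stack([np.ones(n), others])
        out.append(1.0 / (1.0 - r_squared(X, predictors[:, j])))
    return np.array(out)
```

An interaction effect enters simply as an extra column (x1 * x2) in the design matrix. Nearly collinear predictors show up as large VIFs, often flagged above 10.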
Generalized linear mixed models with varying coefficients for longitudinal data.

PubMed

Zhang, Daowen

2004-03-01

The routinely assumed parametric functional form in the linear predictor of a generalized linear mixed model for longitudinal data may be too restrictive to represent true underlying covariate effects. We relax this assumption by representing these covariate effects by smooth but otherwise arbitrary functions of time, with random effects used to model the correlation induced by among-subject and within-subject variation. Due to the usually intractable integration involved in evaluating the quasi-likelihood function, the double penalized quasi-likelihood (DPQL) approach of Lin and Zhang (1999, Journal of the Royal Statistical Society, Series B 61, 381-400) is used to estimate the varying coefficients and the variance components simultaneously by representing a nonparametric function by a linear combination of fixed effects and random effects. A scaled chi-squared test based on the mixed model representation of the proposed model is developed to test whether an underlying varying coefficient is a polynomial of certain degree. We evaluate the performance of the procedures through simulation studies and illustrate their application with Indonesian children infectious disease data.
Multivariate statistical analysis: Principles and applications to coorbital streams of meteorite falls

NASA Technical Reports Server (NTRS)

Wolf, S. F.; Lipschutz, M. E.

1993-01-01

Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 - 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that, by a totally different criterion, the labile trace element contents (hence thermal histories) of 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.
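This is a minimal two-group Fisher linear discriminant, one of the two techniques named above. Rows would be meteorites and columns hypothetical features such as trace-element contents. The midpoint threshold assumes equal prior probabilities and is one common choice, not necessarily the one used in the paper.

```python
import numpy as np

def fisher_lda(X0, X1):
    """Two-group Fisher linear discriminant. X0, X1: (n_i, p) arrays of observations."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-group scatter matrix
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)        # discriminant direction
    threshold = w @ (m0 + m1) / 2.0         # midpoint of projected group means (equal priors)
    return w, threshold

def classify(X, w, threshold):
    """Assign 1 (second group) when the projected score exceeds the threshold, else 0."""
    return (X @ w > threshold).astype(int)
```

Unlike logistic regression, this approach assumes a common within-group covariance. It yields a single linear score that can be inspected directly.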
Instructional Advice, Time Advice and Learning Questions in Computer Simulations

ERIC Educational Resources Information Center

Rey, Gunter Daniel

2010-01-01

Undergraduate students (N = 97) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without instructional advice) x 2 (with or without time advice) x 2…
Wave kinetics of random fibre lasers

PubMed Central

Churkin, D V.; Kolokolov, I V.; Podivilov, E V.; Vatnik, I D.; Nikulin, M A.; Vergeles, S S.; Terekhov, I S.; Lebedev, V V.; Falkovich, G.; Babin, S A.; Turitsyn, S K.

2015-01-01

Traditional wave kinetics describes the slow evolution of systems with many degrees of freedom to equilibrium via numerous weak non-linear interactions and fails for a very important class of dissipative (active) optical systems with cyclic gain and losses, such as lasers with non-linear intracavity dynamics. Here we introduce a conceptually new class of cyclic wave systems, characterized by non-uniform double-scale dynamics with strong periodic changes of the energy spectrum and slow evolution from cycle to cycle to a statistically steady state. Taking a practically important example—random fibre laser—we show that a model describing such a system is close to integrable non-linear Schrödinger equation and needs a new formalism of wave kinetics, developed here. We derive a non-linear kinetic theory of the laser spectrum, generalizing the seminal linear model of Schawlow and Townes. Experimental results agree with our theory. The work has implications for describing kinetics of cyclical systems beyond photonics. PMID:25645177
Using a generalized linear mixed model approach to explore the role of age, motor proficiency, and cognitive styles in children's reach estimation accuracy.

PubMed

Caçola, Priscila M; Pant, Mohan D

2014-10-01

The purpose was to use a multi-level statistical technique to analyze how children's age, motor proficiency, and cognitive styles interact to affect accuracy on reach estimation tasks via Motor Imagery and Visual Imagery. Results from the Generalized Linear Mixed Model analysis (GLMM) indicated that only the 7-year-old age group had significant random intercepts for both tasks. Motor proficiency predicted accuracy in reach tasks, and cognitive styles (object scale) predicted accuracy in the motor imagery task. GLMM analysis is suitable to explore age and other parameters of development. In this case, it allowed an assessment of motor proficiency interacting with age to shape how children represent, plan, and act on the environment.
Does competition improve financial stability of the banking sector in ASEAN countries? An empirical analysis.

PubMed

Noman, Abu Hanifa Md; Gee, Chan Sok; Isa, Che Ruhana

2017-01-01

This study examines the influence of competition on the financial stability of the commercial banks of Association of Southeast Asian Nations (ASEAN) over the 1990 to 2014 period. Panzar-Rosse H-statistic, Lerner index and Herfindahl-Hirschman Index (HHI) are used as measures of competition, while Z-score, non-performing loan (NPL) ratio and equity ratio are used as measures of financial stability. Two-step system Generalized Method of Moments (GMM) estimates demonstrate that competition measured by H-statistic is positively related to Z-score and equity ratio, and negatively related to non-performing loan ratio. Conversely, market power measured by Lerner index is negatively related to Z-score and equity ratio and positively related to NPL ratio. These results strongly support the competition-stability view for ASEAN banks. We also capture the non-linear relationship between competition and financial stability by incorporating a quadratic term of competition in our models. The results show that the coefficient of the quadratic term of H-statistic is negative for the Z-score model given a positive coefficient of the linear term in the same model. These results support the non-linear relationship between competition and financial stability of the banking sector. The study contains significant policy implications for improving the financial stability of the commercial banks.
Does competition improve financial stability of the banking sector in ASEAN countries? An empirical analysis

PubMed Central

Gee, Chan Sok; Isa, Che Ruhana

2017-01-01

This study examines the influence of competition on the financial stability of the commercial banks of Association of Southeast Asian Nations (ASEAN) over the 1990 to 2014 period. Panzar-Rosse H-statistic, Lerner index and Herfindahl-Hirschman Index (HHI) are used as measures of competition, while Z-score, non-performing loan (NPL) ratio and equity ratio are used as measures of financial stability. Two-step system Generalized Method of Moments (GMM) estimates demonstrate that competition measured by H-statistic is positively related to Z-score and equity ratio, and negatively related to non-performing loan ratio. Conversely, market power measured by Lerner index is negatively related to Z-score and equity ratio and positively related to NPL ratio. These results strongly support the competition-stability view for ASEAN banks. We also capture the non-linear relationship between competition and financial stability by incorporating a quadratic term of competition in our models. The results show that the coefficient of the quadratic term of H-statistic is negative for the Z-score model given a positive coefficient of the linear term in the same model. These results support the non-linear relationship between competition and financial stability of the banking sector. The study contains significant policy implications for improving the financial stability of the commercial banks. PMID:28486548
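Two of the competition measures named above have simple closed forms, and so does the turning point implied by a quadratic specification. This sketch assumes market shares given as fractions (HHI is often reported on a 0 to 10,000 scale using percentages instead) and the Lerner index defined as (P - MC)/P. Estimating marginal cost, computing the Panzar-Rosse H-statistic and running the GMM estimation are outside its scope. With a positive linear and a negative quadratic coefficient, as reported for the Z-score model, the fitted curve is an inverted U that peaks at -b1/(2*b2). The coefficient values in the example are hypothetical.

```python
def hhi(shares):
    """Herfindahl-Hirschman Index from market shares expressed as fractions summing to 1."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(s * s for s in shares)

def lerner_index(price, marginal_cost):
    """Price-cost margin (P - MC) / P; higher values indicate more market power."""
    return (price - marginal_cost) / price

def quadratic_turning_point(b1, b2):
    """x at which b0 + b1*x + b2*x**2 peaks (b2 < 0) or bottoms out (b2 > 0)."""
    return -b1 / (2.0 * b2)

# Example: three banks with 50%, 30% and 20% of the market.
concentration = hhi([0.5, 0.3, 0.2])          # 0.25 + 0.09 + 0.04 = 0.38
peak = quadratic_turning_point(0.8, -0.4)     # hypothetical coefficients; peak at x = 1.0
```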
Comparing a single case to a control group - Applying linear mixed effects models to repeated measures data.

PubMed

Huber, Stefan; Klein, Elise; Moeller, Korbinian; Willmes, Klaus

2015-10-01

In neuropsychological research, single-cases are often compared with a small control sample. Crawford and colleagues developed inferential methods (i.e., the modified t-test) for such a research design. In the present article, we suggest an extension of the methods of Crawford and colleagues employing linear mixed models (LMM). We first show that a t-test for the significance of a dummy coded predictor variable in a linear regression is equivalent to the modified t-test of Crawford and colleagues. As an extension to this idea, we then generalized the modified t-test to repeated measures data by using LMMs to compare the performance difference in two conditions observed in a single participant to that of a small control group. The performance of LMMs regarding Type I error rates and statistical power were tested based on Monte-Carlo simulations. We found that starting with about 15-20 participants in the control sample Type I error rates were close to the nominal Type I error rate using the Satterthwaite approximation for the degrees of freedom. Moreover, statistical power was acceptable. Therefore, we conclude that LMMs can be applied successfully to statistically evaluate performance differences between a single-case and a control sample. Copyright © 2015 Elsevier Ltd. All rights reserved.
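The first step in the abstract, the equivalence between Crawford and Howell's modified t-test and the t-test on a dummy-coded predictor, can be checked numerically. This sketch assumes the modified t-test has its usual form, t = (x_case - mean) / (s * sqrt((n + 1) / n)) with n - 1 degrees of freedom, where n is the number of controls. The mixed-model extension to repeated measures is not shown, and the function names are illustrative.

```python
import math
from statistics import mean, stdev

import numpy as np

def crawford_howell_t(case_score, controls):
    """Modified t-test comparing one case with a small control sample (df = n - 1)."""
    n = len(controls)
    return (case_score - mean(controls)) / (stdev(controls) * math.sqrt((n + 1) / n))

def dummy_regression_t(case_score, controls):
    """t statistic of a 0/1 'is the case' predictor in an OLS regression (df = n - 1)."""
    y = np.array(list(controls) + [case_score], dtype=float)
    d = np.zeros_like(y)
    d[-1] = 1.0                                   # dummy: 1 for the single case, 0 for controls
    X = np.column_stack([np.ones_like(y), d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta                          # the case's residual is exactly 0
    df = len(y) - 2
    cov = (resid @ resid / df) * np.linalg.inv(X.T @ X)
    return beta[1] / math.sqrt(cov[1, 1])
```

The two agree because the slope equals the case-minus-control-mean difference, the residual variance equals the control sample variance, and the slope's variance factor is (n + 1)/n.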
Generalized two-dimensional chiral QED: Anomaly and exotic statistics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saradzhev, F.M.

1997-07-01

We study the influence of the anomaly on the physical quantum picture of the generalized chiral Schwinger model defined on S¹. We show that the anomaly (i) results in the background linearly rising electric field and (ii) makes the spectrum of the physical Hamiltonian nonrelativistic without a massive boson. The physical matter fields acquire exotic statistics. We construct explicitly the algebra of the Poincaré generators and show that it differs from the Poincaré one. We exhibit the role of the vacuum Berry phase in the failure of the Poincaré algebra to close. We prove that, in spite of the background electric field, such phenomenon as the total screening of external charges characteristic for the standard Schwinger model takes place in the generalized chiral Schwinger model, too. © 1997 The American Physical Society
Do US metropolitan core counties have lower scope 1 and 2 CO2 emissions than less urbanized counties?

NASA Astrophysics Data System (ADS)

Tamayao, M. M.; Blackhurst, M. F.; Matthews, H. S.

2014-10-01

Recent sustainability research has focused on urban systems given their high share of environmental impacts and potential for centralized impact mitigation. Recent research emphasizes descriptive statistics from place-based case studies to argue for policy action. This limits the potential for general insights and decision support. Here, we implement generalized linear and multiple linear regression analyses to obtain more robust insights on the relationship between urbanization and greenhouse gas (GHG) emissions in the US. We used consistently derived county-level scope 1 and scope 2 GHG inventories for our response variable while predictor variables included dummy-coded variables for county geographic type (central, outlying, and nonmetropolitan), median household income, population density, and climate indices (heating degree days (HDD) and cooling degree days (CDD)). We find that there is not enough statistical evidence indicating per capita scope 1 and 2 emissions differ by geographic type, ceteris paribus. These results are robust for different assumed electricity emissions factors. We do find statistically significant differences in per capita emissions by sector for different county types, with transportation and residential emissions highest in nonmetropolitan (rural) counties, transportation emissions lowest in central counties, and commercial sector emissions highest in central counties. These results indicate the importance of regional land use and transportation dynamics when planning local emissions mitigation measures.

Identifiability of PBPK Models with Applications to ...

EPA Pesticide Factsheets

Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss different types of identifiability that occur in PBPK models and give reasons why they occur. We particularly focus on how the mathematical structure of a PBPK model and lack of appropriate data can lead to statistical models in which it is impossible to estimate at least some parameters precisely. Methods are reviewed which can determine whether a purely linear PBPK model is globally identifiable. We propose a theorem which determines when identifiability at a set of finite and specific values of the mathematical PBPK model (global discrete identifiability) implies identifiability of the statistical model. However, we are unable to establish conditions that imply global discrete identifiability, and conclude that the only safe approach to analysis of PBPK models involves Bayesian analysis with truncated priors. Finally, computational issues regarding posterior simulations of PBPK models are discussed. The methodology is very general and can be applied to numerous PBPK models which can be expressed as linear time-invariant systems. A real data set of a PBPK model for exposure to dimethyl arsinic acid (DMA(V)) is presented to illustrate the proposed methodology.
An R² statistic for fixed effects in the linear mixed model.

PubMed

Edwards, Lloyd J; Muller, Keith E; Wolfinger, Russell D; Qaqish, Bahjat F; Schabenberger, Oliver

2008-12-20

Statisticians most often use the linear mixed model to analyze Gaussian longitudinal data. The value and familiarity of the R² statistic in the linear univariate model naturally create great interest in extending it to the linear mixed model. We define and describe how to compute a model R² statistic for the linear mixed model by using only a single model. The proposed R² statistic measures multivariate association between the repeated outcomes and the fixed effects in the linear mixed model. The R² statistic arises as a one-to-one function of an appropriate F statistic for testing all fixed effects (except typically the intercept) in a full model. The statistic compares the full model with a null model with all fixed effects deleted (except typically the intercept) while retaining exactly the same covariance structure. Furthermore, the R² statistic leads immediately to a natural definition of a partial R² statistic. A mixed model in which ethnicity gives a very small p-value as a longitudinal predictor of blood pressure (BP) compellingly illustrates the value of the statistic. In sharp contrast to the extreme p-value, a very small R², a measure of statistical and scientific importance, indicates that ethnicity has an almost negligible association with the repeated BP outcomes for the study.
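In the ordinary univariate linear model, the one-to-one map from the F statistic for all non-intercept coefficients to R² is R² = (qF/ν) / (1 + qF/ν), where q and ν are the numerator and denominator degrees of freedom. The sketch below checks this identity on an ordinary least-squares fit. Plugging the mixed-model F and its degrees of freedom into the same mapping to obtain the paper's statistic is an assumption here, not something the abstract spells out.

```python
import numpy as np

def r2_from_f(F, q, nu):
    """R^2 as a one-to-one function of an F statistic with q and nu degrees of freedom."""
    ratio = q * F / nu
    return ratio / (1.0 + ratio)

def overall_f_and_r2(X, y):
    """OLS with the intercept in column 0; F-test all other coefficients.

    Returns (F, q, nu, R2), where R2 compares the full model with the intercept-only model.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ beta) ** 2)        # full model
    sst = np.sum((y - y.mean()) ** 2)        # null model: intercept only
    q, nu = p - 1, n - p
    F = ((sst - sse) / q) / (sse / nu)
    return F, q, nu, 1.0 - sse / sst
```

Algebraically, qF/ν = (SST - SSE)/SSE, so ratio / (1 + ratio) = (SST - SSE)/SST, which is R². A tiny R² can therefore coexist with a very small p-value when ν is large.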
Longitudinal data analyses using linear mixed models in SPSS: concepts, procedures and illustrations.

PubMed

Shek, Daniel T L; Ma, Cecilia M S

2011-01-05

Although different methods are available for the analyses of longitudinal data, analyses based on generalized linear models (GLM) are criticized as violating the assumption of independence of observations. Alternatively, linear mixed models (LMM) are commonly used to understand changes in human behavior over time. In this paper, the basic concepts surrounding LMM (or hierarchical linear models) are outlined. Although SPSS is a statistical analyses package commonly used by researchers, documentation on LMM procedures in SPSS is not thorough or user friendly. With reference to this limitation, the related procedures for performing analyses based on LMM in SPSS are described. To demonstrate the application of LMM analyses in SPSS, findings based on six waves of data collected in the Project P.A.T.H.S. (Positive Adolescent Training through Holistic Social Programmes) in Hong Kong are presented.
Longitudinal Data Analyses Using Linear Mixed Models in SPSS: Concepts, Procedures and Illustrations

PubMed Central

Shek, Daniel T. L.; Ma, Cecilia M. S.

2011-01-01

Although different methods are available for the analyses of longitudinal data, analyses based on generalized linear models (GLM) are criticized as violating the assumption of independence of observations. Alternatively, linear mixed models (LMM) are commonly used to understand changes in human behavior over time. In this paper, the basic concepts surrounding LMM (or hierarchical linear models) are outlined. Although SPSS is a statistical analyses package commonly used by researchers, documentation on LMM procedures in SPSS is not thorough or user friendly. With reference to this limitation, the related procedures for performing analyses based on LMM in SPSS are described. To demonstrate the application of LMM analyses in SPSS, findings based on six waves of data collected in the Project P.A.T.H.S. (Positive Adolescent Training through Holistic Social Programmes) in Hong Kong are presented. PMID:21218263
Application of the Hyper-Poisson Generalized Linear Model for Analyzing Motor Vehicle Crashes.

PubMed

Khazraee, S Hadi; Sáez-Castillo, Antonio Jose; Geedipally, Srinivas Reddy; Lord, Dominique

2015-05-01

The hyper-Poisson distribution can handle both over- and underdispersion, and its generalized linear model formulation allows the dispersion of the distribution to be observation-specific and dependent on model covariates. This study's objective is to examine the potential applicability of a newly proposed generalized linear model framework for the hyper-Poisson distribution in analyzing motor vehicle crash count data. The hyper-Poisson generalized linear model was first fitted to intersection crash data from Toronto, characterized by overdispersion, and then to crash data from railway-highway crossings in Korea, characterized by underdispersion. The results of this study are promising. When fitted to the Toronto data set, the goodness-of-fit measures indicated that the hyper-Poisson model with a variable dispersion parameter provided a statistical fit as good as the traditional negative binomial model. The hyper-Poisson model was also successful in handling the underdispersed data from Korea; the model performed as well as the gamma probability model and the Conway-Maxwell-Poisson model previously developed for the same data set. The advantages of the hyper-Poisson model studied in this article are noteworthy. Unlike the negative binomial model, which has difficulties in handling underdispersed data, the hyper-Poisson model can handle both over- and underdispersed crash data. Although not a major issue for the Conway-Maxwell-Poisson model, the effect of each variable on the expected mean of crashes is easily interpretable in the case of this new model. © 2014 Society for Risk Analysis.
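Here is a sketch of the hyper-Poisson probability mass function. It assumes the parameterization P(Y = y) = lambda^y / ((gamma)_y * 1F1(1; gamma; lambda)), where (gamma)_y is the rising factorial. With gamma = 1 it reduces to the Poisson distribution, gamma > 1 gives overdispersion and gamma < 1 gives underdispersion. In the GLM described above, lambda and gamma would both be linked to covariates. That regression layer is not shown, and lambda is not itself the mean. The function names are illustrative.

```python
import math

def _scaled_terms(lam, gamma, kmax):
    """Yield lam**k / (gamma)_k for k = 0..kmax, with (gamma)_k the rising factorial."""
    term = 1.0
    for k in range(kmax + 1):
        yield term
        term *= lam / (gamma + k)

def hyper_poisson_pmf(y, lam, gamma, kmax=1000):
    """P(Y = y) for integer y >= 0; the normalizer is a truncated series for 1F1(1; gamma; lam).

    Adequate for moderate lam; very large lam would overflow the unscaled terms.
    """
    terms = list(_scaled_terms(lam, gamma, max(y, kmax)))
    return terms[y] / sum(terms)

# gamma = 1 recovers the Poisson pmf, e.g. P(Y = 4) for lambda = 2.5:
p4 = hyper_poisson_pmf(4, 2.5, 1.0)   # equals math.exp(-2.5) * 2.5**4 / 24 up to rounding
```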
Linear maps preserving maximal deviation and the Jordan structure of quantum systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hamhalter, Jan

2012-12-15

In the algebraic approach to quantum theory, a quantum observable is given by an element of a Jordan algebra and a state of the system is modelled by a normalized positive functional on the underlying algebra. Maximal deviation of a quantum observable is the largest statistical deviation one can obtain in a particular state of the system. The main result of the paper shows that each linear bijective transformation between JBW algebras preserving maximal deviations is formed by a Jordan isomorphism or a minus Jordan isomorphism perturbed by a linear functional multiple of an identity. It shows that only one numerical statistical characteristic has the power to determine the Jordan algebraic structure completely. As a consequence, we obtain that only very special maps can preserve the diameter of the spectra of elements. Nonlinear maps preserving the pseudometric given by maximal deviation are also described. The results generalize hitherto known theorems on preservers of maximal deviation in the case of self-adjoint parts of von Neumann algebras proved by Molnar.
Cocaine Dependence Treatment Data: Methods for Measurement Error Problems With Predictors Derived From Stationary Stochastic Processes

PubMed Central

Guan, Yongtao; Li, Yehua; Sinha, Rajita

2011-01-01

In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
Fluorescent biopsy of biological tissues in differentiation of benign and malignant tumors of prostate

NASA Astrophysics Data System (ADS)

Trifoniuk, L. I.; Ushenko, Yu. A.; Sidor, M. I.; Minzer, O. P.; Gritsyuk, M. V.; Novakovskaya, O. Y.

2014-08-01

The work presents the results of an investigation into the diagnostic efficiency of a new azimuthally stable Mueller-matrix method for analyzing the coordinate distributions of laser autofluorescence in histological sections of biological tissues. A new model of generalized optical anisotropy of biological tissue protein networks is proposed in order to define the processes of laser autofluorescence. The influence of complex mechanisms of both phase anisotropy (linear birefringence and optical activity) and linear (circular) dichroism is taken into account. The interconnections between the azimuthally stable Mueller-matrix elements characterizing laser autofluorescence and different mechanisms of optical anisotropy are determined. A statistical analysis of the coordinate distributions of these Mueller-matrix rotation invariants is proposed. On this basis, quantitative criteria (statistical moments of the 1st to the 4th order) for differentiating histological sections of uterus wall tumor, group 1 (dysplasia) and group 2 (adenocarcinoma), are estimated.
System of polarization correlometry of polycrystalline layers of urine in the differentiation stage of diabetes

NASA Astrophysics Data System (ADS)

Ushenko, Yu. O.; Pashkovskaya, N. V.; Marchuk, Y. F.; Dubolazov, O. V.; Savich, V. O.

2015-08-01

The work presents the results of an investigation into the diagnostic efficiency of a new azimuthally stable Mueller-matrix method for analyzing the coordinate distributions of laser autofluorescence of biological liquid layers. A new model of generalized optical anisotropy of biological tissue protein networks is proposed in order to define the processes of laser autofluorescence. The influence of complex mechanisms of both phase anisotropy (linear birefringence and optical activity) and linear (circular) dichroism is taken into account. The interconnections between the azimuthally stable Mueller-matrix elements characterizing laser autofluorescence and different mechanisms of optical anisotropy are determined. A statistical analysis of the coordinate distributions of these Mueller-matrix rotation invariants is proposed. On this basis, quantitative criteria (statistical moments of the 1st to the 4th order) for differentiating polycrystalline layers of human urine are estimated, in order to diagnose and differentiate cholelithiasis with underlying chronic cholecystitis (group 1) and diabetes mellitus of degree II (group 2).
Mueller-matrix of laser-induced autofluorescence of polycrystalline films of dried peritoneal fluid in diagnostics of endometriosis

NASA Astrophysics Data System (ADS)

Ushenko, Yuriy A.; Koval, Galina D.; Ushenko, Alexander G.; Dubolazov, Olexander V.; Ushenko, Vladimir A.; Novakovskaia, Olga Yu.

2016-07-01

This research presents the results of an investigation into the diagnostic efficiency of an azimuthally stable Mueller-matrix method for analyzing the laser autofluorescence of polycrystalline films of dried uterine cavity peritoneal fluid. A model of the generalized optical anisotropy of films of dried peritoneal fluid is proposed in order to define the processes of laser autofluorescence. The influence of complex mechanisms of both phase (linear and circular birefringence) and amplitude (linear and circular dichroism) anisotropies is taken into consideration. The interconnections between the azimuthally stable Mueller-matrix elements characterizing laser autofluorescence and different mechanisms of optical anisotropy are determined. The statistical analysis of coordinate distributions of such Mueller-matrix rotation invariants is proposed. Thereupon the quantitative criteria (statistical moments of the first to the fourth order) for differentiating polycrystalline films of dried peritoneal fluid, group 1 (healthy donors) and group 2 (uterus endometriosis patients), are determined.
Methods and means of Fourier-Stokes polarimetry and the spatial-frequency filtering of phase anisotropy manifestations in endometriosis diagnostics

NASA Astrophysics Data System (ADS)

Ushenko, A. G.; Dubolazov, O. V.; Ushenko, Vladimir A.; Ushenko, Yu. A.; Sakhnovskiy, M. Yu.; Prydiy, O. G.; Lakusta, I. I.; Novakovskaya, O. Yu.; Melenko, S. R.

2016-12-01

This research presents investigation results on the diagnostic efficiency of a new azimuthally stable Mueller-matrix method of analysis of laser autofluorescence coordinate distributions of dried polycrystalline films of uterine cavity peritoneal fluid. A new model of generalized optical anisotropy of biological tissue protein networks is proposed in order to define the processes of laser autofluorescence. The influence of complex mechanisms of both phase anisotropy (linear birefringence and optical activity) and linear (circular) dichroism is taken into account. The interconnections between the azimuthally stable Mueller-matrix elements characterizing laser autofluorescence and different mechanisms of optical anisotropy are determined. A statistical analysis of the coordinate distributions of such Mueller-matrix rotation invariants is proposed. On this basis, quantitative criteria (statistical moments of the first to fourth order) are estimated for differentiating dried polycrystalline films of peritoneal fluid from group 1 (healthy donors) and group 2 (patients with uterine endometriosis).
On Association Coefficients for 2x2 Tables and Properties that Do Not Depend on the Marginal Distributions

ERIC Educational Resources Information Center

Warrens, Matthijs J.

2008-01-01

We discuss properties that association coefficients may have in general, e.g., zero value under statistical independence, and we examine coefficients for 2x2 tables with respect to these properties. Furthermore, we study a family of coefficients that are linear transformations of the observed proportion of agreement given the marginal…
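Cohen's kappa is one familiar coefficient of the kind described: once the marginals of the 2x2 table are fixed, it is a linear function of the observed proportion of agreement. The Python sketch below is an illustration, not code from the paper; the cell layout `[[a, b], [c, d]]` and the example counts are assumptions. It returns the intercept and slope of that linear map alongside kappa.

```python
def kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2x2 table [[a, b], [c, d]] (hypothetical layout:
    rows = first rating, columns = second rating).

    Returns (p_o, kappa, intercept, slope) with kappa = intercept + slope * p_o
    for fixed marginals.
    """
    n = a + b + c + d
    p_o = (a + d) / n                      # observed proportion of agreement
    row1, row2 = (a + b) / n, (c + d) / n  # row marginals
    col1, col2 = (a + c) / n, (b + d) / n  # column marginals
    p_e = row1 * col1 + row2 * col2        # agreement expected under independence
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    slope = 1 / (1 - p_e)
    intercept = -p_e / (1 - p_e)
    kappa = intercept + slope * p_o        # identical to (p_o - p_e) / (1 - p_e)
    return p_o, kappa, intercept, slope


p_o, kappa, _, _ = kappa_2x2(40, 10, 5, 45)
print(round(p_o, 3), round(kappa, 3))  # 0.85 0.7
```

For a table whose cells match independence exactly, such as `kappa_2x2(25, 25, 25, 25)`, kappa is 0, which is the "zero value under statistical independence" property mentioned above.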
Generalized linear models and point count data: statistical considerations for the design and analysis of monitoring studies

Treesearch

Nathaniel E. Seavy; Suhel Quader; John D. Alexander; C. John Ralph

2005-01-01

The success of avian monitoring programs to effectively guide management decisions requires that studies be efficiently designed and data be properly analyzed. A complicating factor is that point count surveys often generate data with non-normal distributional properties. In this paper we review methods of dealing with deviations from normal assumptions, and we focus...
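A common generalized linear model for non-normal point-count data is Poisson regression with a log link. The pure-Python sketch below fits it by iteratively reweighted least squares (IRLS) and reports the Pearson dispersion statistic, where values well above 1 are a usual hint of overdispersion. It is an illustration under stated assumptions, not the authors' analysis: the single covariate, the simulated data and the fixed 25 iterations are all hypothetical choices.

```python
import math
import random


def wls_line(x, z, w):
    """Weighted least squares fit of z = b0 + b1 * x; returns (b0, b1)."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * zi for wi, zi in zip(w, z))
    t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    b1 = (s0 * t1 - s1 * t0) / (s0 * s2 - s1 * s1)
    b0 = (t0 - b1 * s1) / s0
    return b0, b1


def poisson_glm(x, y, iters=25):
    """Poisson regression log(mu) = b0 + b1 * x fitted by IRLS."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        b0, b1 = wls_line(x, z, mu)  # under the log link the working weights are mu
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    # Pearson chi-square divided by residual degrees of freedom
    dispersion = sum((yi - m) ** 2 / m for yi, m in zip(y, mu)) / (len(y) - 2)
    return b0, b1, dispersion


def rpois(lam, rng):
    """Knuth's method for one Poisson draw (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(1)
x = [rng.random() for _ in range(2000)]  # hypothetical site covariate
y = [rpois(math.exp(0.5 + 1.0 * xi), rng) for xi in x]
b0, b1, phi = poisson_glm(x, y)
print(round(b0, 2), round(b1, 2), round(phi, 2))
```

Because the counts here are simulated from a Poisson model, the dispersion should land near 1; with real survey counts a clearly larger value would point toward negative binomial or quasi-Poisson alternatives.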
Building out a Measurement Model to Incorporate Complexities of Testing in the Language Domain

ERIC Educational Resources Information Center

Wilson, Mark; Moore, Stephen

2011-01-01

This paper provides a summary of a novel and integrated way to think about the item response models (most often used in measurement applications in social science areas such as psychology, education, and especially testing of various kinds) from the viewpoint of the statistical theory of generalized linear and nonlinear mixed models. In addition,…
Analysis of Parasite and Other Skewed Counts

PubMed Central

Alexander, Neal

2012-01-01

Objective To review methods for the statistical analysis of parasite and other skewed count data. Methods Statistical methods for skewed count data are described and compared, with reference to those used over a ten year period of Tropical Medicine and International Health. Two parasitological datasets are used for illustration. Results Ninety papers were identified, 89 with descriptive and 60 with inferential analysis. A lack of clarity is noted in identifying measures of location, in particular the Williams and geometric mean. The different measures are compared, emphasizing the legitimacy of the arithmetic mean for skewed data. In the published papers, the t test and related methods were often used on untransformed data, which is likely to be invalid. Several approaches to inferential analysis are described, emphasizing 1) non-parametric methods, while noting that they are not simply comparisons of medians, and 2) generalized linear modelling, in particular with the negative binomial distribution. Additional methods, such as the bootstrap, with potential for greater use are described. Conclusions Clarity is recommended when describing transformations and measures of location. It is suggested that non-parametric methods and generalized linear models are likely to be sufficient for most analyses. PMID:22943299
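The distinction between measures of location can be made concrete. The sketch below assumes the usual definition of the Williams mean as the geometric mean of (x + 1), minus 1, and uses made-up egg counts. It shows why the plain geometric mean breaks down when zeros are present and how far both log-based means can sit from the arithmetic mean for skewed data.

```python
import math


def arithmetic_mean(counts):
    return sum(counts) / len(counts)


def geometric_mean(counts):
    """Returns None if any count is zero, a common situation with parasite data."""
    if any(c <= 0 for c in counts):
        return None
    return math.exp(sum(math.log(c) for c in counts) / len(counts))


def williams_mean(counts):
    """Geometric mean of (count + 1), minus 1."""
    return math.exp(sum(math.log(c + 1) for c in counts) / len(counts)) - 1


eggs = [0, 1, 2, 3, 100]  # hypothetical skewed counts
print(arithmetic_mean(eggs))          # 21.2
print(geometric_mean(eggs))           # None
print(round(williams_mean(eggs), 2))  # 3.75
```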
[Analysis of variance of repeated data measured by water maze with SPSS].

PubMed

Qiu, Hong; Jin, Guo-qin; Jin, Ru-feng; Zhao, Wei-kang

2007-01-01

To introduce a method for analyzing repeated data measured with the water maze in SPSS 11.0, and to offer a reference statistical method for clinical and basic medicine researchers who use repeated-measures designs. The repeated-measures and multivariate analysis of variance (ANOVA) procedures of the general linear model in SPSS were used, with pairwise comparisons among groups and among measurement times. First, Mauchly's test of sphericity should be used to judge whether there were relations among the repeatedly measured data. If any (P
Removing an intersubject variance component in a general linear model improves multiway factoring of event-related spectral perturbations in group EEG studies.

PubMed

Spence, Jeffrey S; Brier, Matthew R; Hart, John; Ferree, Thomas C

2013-03-01

Linear statistical models are used very effectively to assess task-related differences in EEG power spectral analyses. Mixed models, in particular, accommodate more than one variance component in a multisubject study, where many trials of each condition of interest are measured on each subject. Generally, intra- and intersubject variances are both important to determine correct standard errors for inference on functions of model parameters, but it is often assumed that intersubject variance is the most important consideration in a group study. In this article, we show that, under common assumptions, estimates of some functions of model parameters, including estimates of task-related differences, are properly tested relative to the intrasubject variance component only. A substantial gain in statistical power can arise from the proper separation of variance components when there is more than one source of variability. We first develop this result analytically, then show how it benefits a multiway factoring of spectral, spatial, and temporal components from EEG data acquired in a group of healthy subjects performing a well-studied response inhibition task. Copyright © 2011 Wiley Periodicals, Inc.
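The separation of intra- and intersubject variance can be illustrated with the simplest balanced case. The sketch below is not the authors' multiway model. It estimates the two variance components of a one-way random-effects layout, y[i][j] = mu + u_i + e_ij, by the ANOVA method of moments. Under the assumption of an additive subject effect with no subject-by-task interaction, a within-subject task contrast cancels u_i, so its variance involves only the intrasubject component (for two task means of n trials each, 2 * sigma2_within / n).

```python
def variance_components(data):
    """Method-of-moments variance components for a balanced one-way
    random-effects layout y[i][j] = mu + u_i + e_ij (i = subject, j = trial).

    Returns (sigma2_between, sigma2_within).
    """
    a = len(data)     # number of subjects
    n = len(data[0])  # trials per subject
    if any(len(row) != n for row in data):
        raise ValueError("this sketch assumes a balanced design")
    means = [sum(row) / n for row in data]
    grand = sum(means) / a
    msb = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    msw = sum((y - m) ** 2 for row, m in zip(data, means) for y in row) / (a * (n - 1))
    sigma2_within = msw
    sigma2_between = max(0.0, (msb - msw) / n)  # truncated at zero
    return sigma2_between, sigma2_within


print(variance_components([[9, 11], [19, 21], [29, 31]]))  # (99.0, 2.0)
```

In this toy example the between-subject component dominates, yet a within-subject contrast would be tested against the much smaller within-subject variance of 2.0.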
Robust biological parametric mapping: an improved technique for multimodal brain image analysis

NASA Astrophysics Data System (ADS)

Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.

2011-03-01

Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.
Central Limit Theorem for Exponentially Quasi-local Statistics of Spin Models on Cayley Graphs

NASA Astrophysics Data System (ADS)

Reddy, Tulasi Ram; Vadlamani, Sreekar; Yogeshwaran, D.

2018-04-01

Central limit theorems for linear statistics of lattice random fields (including spin models) are usually proven under suitable mixing conditions or quasi-associativity. Many interesting examples of spin models do not satisfy mixing conditions, and on the other hand, it does not seem easy to show a central limit theorem for local statistics via quasi-associativity. In this work, we prove general central limit theorems for local statistics and exponentially quasi-local statistics of spin models on discrete Cayley graphs with polynomial growth. Further, we supplement these results by proving similar central limit theorems for random fields on discrete Cayley graphs taking values in a countable space, but under the stronger assumptions of α-mixing (for local statistics) and exponential α-mixing (for exponentially quasi-local statistics). All our central limit theorems assume a suitable variance lower bound, like many others in the literature. We illustrate our general central limit theorem with specific examples of lattice spin models and statistics arising in computational topology, statistical physics and random networks. Examples of clustering spin models include quasi-associated spin models with fast decaying covariances like the off-critical Ising model, level sets of Gaussian random fields with fast decaying covariances like the massive Gaussian free field, and determinantal point processes with fast decaying kernels. Examples of local statistics include intrinsic volumes, face counts and component counts of random cubical complexes, while exponentially quasi-local statistics include nearest neighbour distances in spin models and Betti numbers of sub-critical random cubical complexes.
Estimation of Quasi-Stiffness of the Human Hip in the Stance Phase of Walking

PubMed Central

Shamaei, Kamran; Sawicki, Gregory S.; Dollar, Aaron M.

2013-01-01

This work presents a framework for selecting the subject-specific quasi-stiffness of hip orthoses and exoskeletons, and other devices that are intended to emulate the biological performance of this joint during walking. The hip joint exhibits linear moment-angular excursion behavior in both the extension and flexion stages of the resilient loading-unloading phase that consists of terminal stance and initial swing phases. Here, we establish statistical models that can closely estimate the slope of linear fits to the moment-angle graph of the hip in this phase, termed the quasi-stiffness of the hip. Employing an inverse dynamics analysis, we identify a series of parameters that can capture the nearly linear hip quasi-stiffnesses in the resilient loading phase. We then employ regression analysis on experimental moment-angle data of 216 gait trials across 26 human adults walking over a wide range of gait speeds (0.75–2.63 m/s) to obtain a set of general-form statistical models that estimate the hip quasi-stiffnesses using body weight and height, gait speed, and hip excursion. We show that the general-form models can closely estimate the hip quasi-stiffness in the extension (R² = 92%) and flexion portions (R² = 89%) of the resilient loading phase of the gait. We further simplify the general-form models and present a set of stature-based models that can estimate the hip quasi-stiffness for the preferred gait speed using only body weight and height, with an average error of 27% for the extension stage and 37% for the flexion stage. PMID:24349136
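The quasi-stiffness defined above is the slope of a linear fit to the moment-angle graph over one stage of the gait phase. The sketch below computes that slope and its R² by ordinary least squares. The angle and moment samples are made-up numbers, angle is taken in radians (so the slope is in N·m/rad), and the paper's regression models that predict quasi-stiffness from body weight, height and gait speed are not reproduced.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, with R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical samples from one stage of the hip moment-angle curve:
# hip angle in rad, hip moment in N*m (made-up numbers for illustration).
angle = [0.00, 0.05, 0.10, 0.15, 0.20]
moment = [2.0, 6.1, 9.9, 14.2, 18.0]
k, _, r2 = linear_fit(angle, moment)
print(f"quasi-stiffness ~ {k:.1f} N*m/rad, R^2 = {r2:.3f}")
```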

Using complexity metrics with R-R intervals and BPM heart rate measures

PubMed Central

Wallot, Sebastian; Fusaroli, Riccardo; Tylén, Kristian; Jegindø, Else-Marie

2013-01-01

Lately, growing attention in the health sciences has been paid to the dynamics of heart rate as an indicator of impending failures and for prognoses. Likewise, in the social and cognitive sciences, heart rate is increasingly employed as a measure of arousal and emotional engagement, and as a marker of interpersonal coordination. However, there is no consensus about which measurements and analytical tools are most appropriate for mapping the temporal dynamics of heart rate, and quite different metrics are reported in the literature. As complexity metrics of heart rate variability depend critically on variability of the data, different choices regarding the kind of measures can have a substantial impact on the results. In this article we compare linear and non-linear statistics on two prominent types of heart beat data, beat-to-beat intervals (R-R interval) and beats-per-min (BPM). As a proof-of-concept, we employ a simple rest-exercise-rest task and show that non-linear statistics, namely fractal (DFA) and recurrence (RQA) analyses, reveal information about heart beat activity above and beyond the simple level of heart rate. Non-linear statistics unveil sustained post-exercise effects on heart rate dynamics, but their power to do so critically depends on the type of data that is employed: while R-R intervals are very susceptible to non-linear analyses, the success of non-linear methods for BPM data critically depends on their construction. Generally, “oversampled” BPM time-series can be recommended as they retain most of the information about non-linear aspects of heart beat dynamics. PMID:23964244
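Detrended fluctuation analysis (DFA), one of the two non-linear methods named above, reduces a series to a single scaling exponent alpha. The pure-Python sketch below implements standard first-order DFA; the window sizes, series length and simulated inputs are assumptions, since the abstract does not give the authors' settings. White noise should give alpha near 0.5 and a random walk near 1.5, which makes the sketch easy to sanity-check before applying it to R-R or BPM series.

```python
import math
import random


def _ols(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def dfa_alpha(series, scales):
    """First-order DFA: slope of log F(n) against log n."""
    mean = sum(series) / len(series)
    profile, acc = [], 0.0
    for v in series:  # integrate the mean-centred series
        acc += v - mean
        profile.append(acc)
    log_n, log_f = [], []
    for n in scales:
        n_seg = len(profile) // n  # non-overlapping windows; remainder dropped
        t = list(range(n))
        sq = 0.0
        for k in range(n_seg):
            seg = profile[k * n:(k + 1) * n]
            b, a = _ols(t, seg)  # local linear trend in this window
            sq += sum((y - (a + b * ti)) ** 2 for ti, y in zip(t, seg))
        log_n.append(math.log(n))
        log_f.append(math.log(math.sqrt(sq / (n_seg * n))))  # RMS fluctuation F(n)
    alpha, _ = _ols(log_n, log_f)
    return alpha


rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(4096)]  # stand-in for an R-R interval series
walk, acc = [], 0.0
for v in noise:
    acc += v
    walk.append(acc)
scales = [16, 32, 64, 128, 256]
print(round(dfa_alpha(noise, scales), 2))  # should be near 0.5
print(round(dfa_alpha(walk, scales), 2))   # should be near 1.5
```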
Order Selection for General Expression of Nonlinear Autoregressive Model Based on Multivariate Stepwise Regression

NASA Astrophysics Data System (ADS)

Shi, Jinfei; Zhu, Songqing; Chen, Ruwen

2017-12-01

An order selection method based on multiple stepwise regression is proposed for the General Expression of Nonlinear Autoregressive (GNAR) model; it converts the model order problem into variable selection for a multiple linear regression equation. The partial autocorrelation function is used to define the linear terms of the GNAR model. The result is taken as the initial model, and nonlinear terms are then introduced gradually. Statistics are chosen to evaluate how much both newly introduced and previously included variables improve the model, and these are used to decide which variables to retain or eliminate. The optimal model is thus obtained through a data-fitting measure or a significance test. Simulations and experiments on classic time-series data show that the proposed method is simple and reliable and can be applied to practical engineering.
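The first step described above, using the partial autocorrelation function (PACF) to pick the linear lags, can be sketched in pure Python with the Durbin-Levinson recursion. Only that step is shown; the stepwise introduction of nonlinear terms is not. The approximate 95% bound of 1.96 / sqrt(n) for flagging significant lags is a common rule of thumb and an assumption here, as is the simulated AR(1) series.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag (divisor n)."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0 for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations at lags 1..max_lag via Durbin-Levinson."""
    r = acf(x, max_lag)
    out, phi_prev = [], []  # phi_prev[j - 1] holds phi_{k-1, j}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j - 1] * r[k - j] for j in range(1, k))
        den = 1 - sum(phi_prev[j - 1] * r[j] for j in range(1, k))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi_prev = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_lags(x, max_lag):
    """Lags whose PACF magnitude exceeds the approximate bound 1.96 / sqrt(n)."""
    bound = 1.96 / math.sqrt(len(x))
    return [k + 1 for k, p in enumerate(pacf(x, max_lag)) if abs(p) > bound]


rng = random.Random(42)
ar1, prev = [], 0.0
for _ in range(5000):  # simulated AR(1) series with coefficient 0.6
    prev = 0.6 * prev + rng.gauss(0, 1)
    ar1.append(prev)
print([round(p, 2) for p in pacf(ar1, 5)])  # lag 1 near 0.6, later lags near 0
print(significant_lags(ar1, 5))
```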
Molecular representation of molar domain (volume), evolution equations, and linear constitutive relations for volume transport.

PubMed

Eu, Byung Chan

2008-09-07

In the traditional theories of irreversible thermodynamics and fluid mechanics, the specific volume and molar volume have been interchangeably used for pure fluids, but in this work we show that they should be distinguished from each other and given distinctive statistical mechanical representations. In this paper, we present a general formula for the statistical mechanical representation of molecular domain (volume or space) by using the Voronoi volume and its mean value that may be regarded as molar domain (volume) and also the statistical mechanical representation of volume flux. By using their statistical mechanical formulas, the evolution equations of volume transport are derived from the generalized Boltzmann equation of fluids. Approximate solutions of the evolution equations of volume transport provide kinetic theory formulas for the molecular domain, the constitutive equations for molar domain (volume) and volume flux, and the dissipation of energy associated with volume transport. Together with the constitutive equation for the mean velocity of the fluid obtained in a previous paper, the evolution equations for volume transport not only shed fresh light on, and insight into, irreversible phenomena in fluids but also can be applied to study fluid flow problems in a manner hitherto unavailable in fluid dynamics and irreversible thermodynamics. Their roles in the generalized hydrodynamics will be considered in the sequel.
Evaluating and implementing temporal, spatial, and spatio-temporal methods for outbreak detection in a local syndromic surveillance system

PubMed Central

Lall, Ramona; Levin-Rector, Alison; Sell, Jessica; Paladini, Marc; Konty, Kevin J.; Olson, Don; Weiss, Don

2017-01-01

The New York City Department of Health and Mental Hygiene has operated an emergency department syndromic surveillance system since 2001, using temporal and spatial scan statistics run on a daily basis for cluster detection. Since the system was originally implemented, a number of new methods have been proposed for use in cluster detection. We evaluated six temporal and four spatial/spatio-temporal detection methods using syndromic surveillance data spiked with simulated injections. The algorithms were compared on several metrics, including sensitivity, specificity, positive predictive value, coherence, and timeliness. We also evaluated each method’s implementation, programming time, run time, and the ease of use. Among the temporal methods, at a set specificity of 95%, a Holt-Winters exponential smoother performed the best, detecting 19% of the simulated injects across all shapes and sizes, followed by an autoregressive moving average model (16%), a generalized linear model (15%), a modified version of the Early Aberration Reporting System’s C2 algorithm (13%), a temporal scan statistic (11%), and a cumulative sum control chart (<2%). Of the spatial/spatio-temporal methods we tested, a spatial scan statistic detected 3% of all injects, a Bayes regression found 2%, and a generalized linear mixed model and a space-time permutation scan statistic detected none at a specificity of 95%. Positive predictive value was low (<7%) for all methods. Overall, the detection methods we tested did not perform well in identifying the temporal and spatial clusters of cases in the inject dataset. The spatial scan statistic, our current method for spatial cluster detection, performed slightly better than the other tested methods across different inject magnitudes and types. Furthermore, we found the scan statistics, as applied in the SaTScan software package, to be the easiest to program and implement for daily data analysis. PMID:28886112
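The cumulative sum control chart is the simplest of the temporal methods compared above. The sketch below is a textbook one-sided (upward) CUSUM on standardized daily counts, not the department's configuration: the reference value k = 0.5, the decision threshold h = 4, the known baseline mean and standard deviation, and the choice not to reset the sum after an alarm are all assumptions made for illustration.

```python
def upper_cusum(counts, baseline_mean, baseline_sd, k=0.5, h=4.0):
    """One-sided CUSUM: S_t = max(0, S_{t-1} + z_t - k); flag day t when S_t > h.

    Returns (statistics, flagged_day_indices). The sum is not reset after an alarm.
    """
    s, stats, alarms = 0.0, [], []
    for t, x in enumerate(counts):
        z = (x - baseline_mean) / baseline_sd  # standardized count
        s = max(0.0, s + z - k)
        stats.append(s)
        if s > h:
            alarms.append(t)
    return stats, alarms


visits = [10] * 10 + [14] * 5  # hypothetical daily counts with a shift on day 10
_, flagged = upper_cusum(visits, baseline_mean=10, baseline_sd=2)
print(flagged)  # [12, 13, 14]
```

The two-day delay before the first alarm reflects the usual CUSUM trade-off: small k and h react faster but raise more false alarms, which matters when specificity is fixed at 95% as in the comparison above.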
Evaluating and implementing temporal, spatial, and spatio-temporal methods for outbreak detection in a local syndromic surveillance system.

PubMed

Mathes, Robert W; Lall, Ramona; Levin-Rector, Alison; Sell, Jessica; Paladini, Marc; Konty, Kevin J; Olson, Don; Weiss, Don

2017-01-01

The New York City Department of Health and Mental Hygiene has operated an emergency department syndromic surveillance system since 2001, using temporal and spatial scan statistics run on a daily basis for cluster detection. Since the system was originally implemented, a number of new methods have been proposed for use in cluster detection. We evaluated six temporal and four spatial/spatio-temporal detection methods using syndromic surveillance data spiked with simulated injections. The algorithms were compared on several metrics, including sensitivity, specificity, positive predictive value, coherence, and timeliness. We also evaluated each method's implementation, programming time, run time, and the ease of use. Among the temporal methods, at a set specificity of 95%, a Holt-Winters exponential smoother performed the best, detecting 19% of the simulated injects across all shapes and sizes, followed by an autoregressive moving average model (16%), a generalized linear model (15%), a modified version of the Early Aberration Reporting System's C2 algorithm (13%), a temporal scan statistic (11%), and a cumulative sum control chart (<2%). Of the spatial/spatio-temporal methods we tested, a spatial scan statistic detected 3% of all injects, a Bayes regression found 2%, and a generalized linear mixed model and a space-time permutation scan statistic detected none at a specificity of 95%. Positive predictive value was low (<7%) for all methods. Overall, the detection methods we tested did not perform well in identifying the temporal and spatial clusters of cases in the inject dataset. The spatial scan statistic, our current method for spatial cluster detection, performed slightly better than the other tested methods across different inject magnitudes and types. Furthermore, we found the scan statistics, as applied in the SaTScan software package, to be the easiest to program and implement for daily data analysis.
Low-complexity stochastic modeling of wall-bounded shear flows

NASA Astrophysics Data System (ADS)

Zare, Armin

Turbulent flows are ubiquitous in nature and they appear in many engineering applications. Transition to turbulence, in general, increases skin-friction drag in air/water vehicles, compromising their fuel efficiency, and reduces the efficiency and longevity of wind turbines. While traditional flow control techniques combine physical intuition with costly experiments, their effectiveness can be significantly enhanced by control design based on low-complexity models and optimization. In this dissertation, we develop a theoretical and computational framework for the low-complexity stochastic modeling of wall-bounded shear flows. Part I of the dissertation is devoted to the development of a modeling framework which incorporates data-driven techniques to refine physics-based models. We consider the problem of completing partially known sample statistics in a way that is consistent with underlying stochastically driven linear dynamics. Neither the statistics nor the dynamics are precisely known. Thus, our objective is to reconcile the two in a parsimonious manner. To this end, we formulate optimization problems to identify the dynamics and directionality of input excitation in order to explain and complete available covariance data. For problem sizes that general-purpose solvers cannot handle, we develop customized optimization algorithms based on alternating direction methods. The solution to the optimization problem provides information about critical directions that have maximal effect in bringing model and statistics into agreement. In Part II, we employ our modeling framework to account for statistical signatures of turbulent channel flow using low-complexity stochastic dynamical models. We demonstrate that white-in-time stochastic forcing is not sufficient to explain turbulent flow statistics and develop models for colored-in-time forcing of the linearized Navier-Stokes equations.
We also examine the efficacy of stochastically forced linearized NS equations and their parabolized equivalents in the receptivity analysis of velocity fluctuations to external sources of excitation as well as capturing the effect of the slowly-varying base flow on streamwise streaks and Tollmien-Schlichting waves. In Part III, we develop a model-based approach to design surface actuation of turbulent channel flow in the form of streamwise traveling waves. This approach is capable of identifying the drag reducing trends of traveling waves in a simulation-free manner. We also use the stochastically forced linearized NS equations to examine the Reynolds number independent effects of spanwise wall oscillations on drag reduction in turbulent channel flows. This allows us to extend the predictive capability of our simulation-free approach to high Reynolds numbers.
Robust Linear Models for Cis-eQTL Analysis.

PubMed

Rantalainen, Mattias; Lindgren, Cecilia M; Holmes, Christopher C

2015-01-01

Expression Quantitative Trait Loci (eQTL) analysis enables characterisation of functional genetic variation influencing expression levels of individual genes. In outbred populations, including humans, eQTLs are commonly analysed using the conventional linear model, adjusting for relevant covariates, assuming an allelic dosage model and a Gaussian error term. However, gene expression data generally have noise that induces heavy-tailed errors relative to the Gaussian distribution and often include atypical observations, or outliers. Such departures from modelling assumptions can lead to an increased rate of type II errors (false negatives), and to some extent also type I errors (false positives). Careful model checking can reduce the risk of type I errors but often not type II errors, since it is generally too time-consuming to carefully check all models with a non-significant effect in large-scale and genome-wide studies. Here we propose the application of a robust linear model for eQTL analysis to reduce adverse effects of deviations from the assumption of Gaussian residuals. We present results from a simulation study as well as results from the analysis of real eQTL data sets. Our findings suggest that in many situations robust models have the potential to provide more reliable eQTL results compared to conventional linear models, particularly with respect to reducing type II errors due to non-Gaussian noise. Post-genomic data, such as that generated in genome-wide eQTL studies, are often noisy and frequently contain atypical observations. Robust statistical models have the potential to provide more reliable results and increased statistical power under non-Gaussian conditions. The results presented here suggest that robust models should be considered routinely alongside other commonly used methodologies for eQTL analysis.
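One widely used robust linear model is Huber M-estimation, fitted by iteratively reweighted least squares. The sketch below applies it to an allelic-dosage regression; whether the authors used exactly this estimator is not stated here, so treat it as an illustration. The tuning constant c = 1.345, the MAD-type scale (median absolute residual divided by 0.6745), the 50 iterations and the simulated data with three gross outliers are all assumptions.

```python
import statistics


def wls_line(x, y, w):
    """Weighted least squares for y = b0 + b1 * x; returns (b0, b1)."""
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    b1 = (s0 * t1 - s1 * t0) / (s0 * s2 - s1 * s1)
    return (t0 - b1 * s1) / s0, b1


def huber_fit(x, y, c=1.345, iters=50):
    """Huber M-estimate of y = b0 + b1 * x by iteratively reweighted least squares."""
    b0, b1 = wls_line(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(iters):
        r = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        scale = max(statistics.median([abs(ri) for ri in r]) / 0.6745, 1e-12)
        # full weight for small residuals, down-weighting proportional to 1/|r| beyond c*scale
        w = [1.0 if abs(ri) <= c * scale else c * scale / abs(ri) for ri in r]
        b0, b1 = wls_line(x, y, w)
    return b0, b1


# Hypothetical eQTL-style data: dosage 0/1/2, true slope 0.5, small noise,
# plus three gross outliers among the dosage-0 samples.
dosage = [i % 3 for i in range(30)]
expr = [1.0 + 0.5 * g + 0.1 * ((i % 5) - 2) for i, g in enumerate(dosage)]
for i in (0, 3, 6):
    expr[i] += 10.0
ols_slope = wls_line(dosage, expr, [1.0] * 30)[1]
huber_slope = huber_fit(dosage, expr)[1]
print(round(ols_slope, 2), round(huber_slope, 2))
```

With these three outliers the ordinary least-squares slope even changes sign, while the Huber estimate stays close to the simulated effect of 0.5.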
Advanced Statistics for Exotic Animal Practitioners.

PubMed

Hodsoll, John; Hellier, Jennifer M; Ryan, Elizabeth G

2017-09-01

Correlation and regression assess the association between 2 or more variables. This article reviews the core knowledge needed to understand these analyses, moving from visual analysis in scatter plots through correlation, simple and multiple linear regression, and logistic regression. Correlation estimates the strength and direction of a relationship between 2 variables. Regression can be considered more general and quantifies the numerical relationships between an outcome and 1 or multiple variables in terms of a best-fit line, allowing predictions to be made. Each technique is discussed with examples and the statistical assumptions underlying their correct application. Copyright © 2017 Elsevier Inc. All rights reserved.
Statistics of primordial density perturbations from discrete seed masses

NASA Technical Reports Server (NTRS)

Scherrer, Robert J.; Bertschinger, Edmund

1991-01-01

The statistics of density perturbations for general distributions of seed masses with arbitrary matter accretion is examined. Formal expressions for the power spectrum, the N-point correlation functions, and the density distribution function are derived. These results are applied to the case of uncorrelated seed masses, and power spectra are derived for accretion of both hot and cold dark matter plus baryons. The reduced moments (cumulants) of the density distribution are computed and used to obtain a series expansion for the density distribution function. Analytic results are obtained for the density distribution function in the case of a distribution of seed masses with a spherical top-hat accretion pattern. More generally, the formalism makes it possible to give a complete characterization of the statistical properties of any random field generated from a discrete linear superposition of kernels. In particular, the results can be applied to density fields derived by smoothing a discrete set of points with a window function.
Quantifying variation in speciation and extinction rates with clade data.

PubMed

Paradis, Emmanuel; Tedesco, Pablo A; Hugueny, Bernard

2013-12-01

High-level phylogenies are very common in evolutionary analyses, although they are often treated as incomplete data. Here, we provide statistical tools to analyze what we name "clade data," which are the ages of clades together with their numbers of species. We develop a general approach for the statistical modeling of variation in speciation and extinction rates, including temporal variation, unknown variation, and linear and nonlinear modeling. We show how this approach can be generalized to a wide range of situations, including testing the effects of life-history traits and environmental variables on diversification rates. We report the results of an extensive simulation study to assess the performance of some statistical tests presented here as well as of the estimators of speciation and extinction rates. These latter results suggest that extinction rates can be estimated correctly in the absence of fossils. An example with data on fish is presented. © 2013 The Author(s). Evolution © 2013 The Society for the Study of Evolution.
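The simplest special case of modeling clade data is a constant-rate pure-birth (Yule) process with no extinction. The sketch below covers only that case and is not the authors' method, which also handles extinction and rate variation. If a clade starts from one lineage t time units ago, its species count is geometric, P(N = n) = exp(-λt)(1 - exp(-λt))^(n-1), so the speciation rate λ can be estimated by maximizing the summed log-likelihood over clades. The bisection solver and the simulated ages and sizes are assumptions.

```python
import math
import random


def yule_score(lam, ages, sizes):
    """Derivative of the pure-birth log-likelihood for clade (age, size) data."""
    total = 0.0
    for t, n in zip(ages, sizes):
        e = math.exp(-lam * t)
        total += -t + (n - 1) * t * e / (1.0 - e)
    return total


def yule_mle(ages, sizes, tol=1e-10):
    """Maximum-likelihood speciation rate by bisection on the decreasing score."""
    lo, hi = 1e-9, 1.0
    while yule_score(hi, ages, sizes) > 0:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if yule_score(mid, ages, sizes) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(7)
true_rate = 0.2
ages = [rng.uniform(5.0, 20.0) for _ in range(300)]  # hypothetical clade stem ages
sizes = []
for t in ages:
    p = math.exp(-true_rate * t)  # geometric "success" probability
    u = 1.0 - rng.random()        # u in (0, 1]
    sizes.append(int(math.log(u) / math.log(1.0 - p)) + 1)  # geometric draw on {1, 2, ...}
print(round(yule_mle(ages, sizes), 3))  # should be near 0.2
```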
The microcomputer scientific software series 3: general linear model--analysis of variance.

Treesearch

Harold M. Rauscher

1985-01-01

A BASIC language set of programs, designed for use on microcomputers, is presented. This set of programs will perform the analysis of variance for any statistical model describing either balanced or unbalanced designs. The program computes and displays the degrees of freedom, Type I sum of squares, and the mean square for the overall model, the error, and each factor...
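The quantities such a program reports can be shown for the one-factor case. The Python sketch below is not a translation of the BASIC programs, which handle general models. It computes the degrees of freedom, sums of squares, mean squares and F ratio for a one-way layout, balanced or unbalanced. With a single factor, the Type I (sequential) sum of squares for that factor coincides with the usual between-group sum of squares; with several factors in an unbalanced design it would depend on the order of entry, which this sketch does not address.

```python
def one_way_anova(groups):
    """ANOVA table for a one-way layout (balanced or unbalanced)."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_model = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_model, df_error = k - 1, n - k
    ms_model, ms_error = ss_model / df_model, ss_error / df_error
    return {
        "df": (df_model, df_error),
        "ss": (ss_model, ss_error),
        "ms": (ms_model, ms_error),
        "F": ms_model / ms_error,
    }


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```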
Analysis of statistical and standard algorithms for detecting muscle onset with surface electromyography.

PubMed

Tenan, Matthew S; Tweedell, Andrew J; Haynes, Courtney A

2017-01-01

The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated, while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60-90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine whether this class of algorithms performs equally well when the time series has multiple bursts of muscle activity.
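One of the established methods listed above combines the Teager-Kaiser energy operator (TKEO) with a linear envelope. The sketch below shows that pipeline with a simple threshold rule; it does not implement the Bayesian changepoint analysis that performed best. The trailing 25-sample moving average, the rule "baseline mean + h × baseline SD" with h = 10, and the simulated signal are assumptions. For a pure tone A·cos(Ωn + φ) the operator returns the constant A²·sin²(Ω), a convenient check.

```python
import math
import random
import statistics


def tkeo(x):
    """Teager-Kaiser energy operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].
    The first and last samples are set to 0 because they lack a neighbour."""
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] ** 2 - x[n - 1] * x[n + 1]
    return psi


def linear_envelope(x, window):
    """Full-wave rectification followed by a trailing moving average."""
    rect = [abs(v) for v in x]
    env, acc = [], 0.0
    for n, v in enumerate(rect):
        acc += v
        if n >= window:
            acc -= rect[n - window]
        env.append(acc / min(n + 1, window))
    return env


def threshold_onset(env, baseline_len, h):
    """First index after the baseline where env exceeds mean + h * SD of the baseline."""
    base = env[:baseline_len]
    limit = statistics.mean(base) + h * statistics.stdev(base)
    for n in range(baseline_len, len(env)):
        if env[n] > limit:
            return n
    return None


tone = [2.0 * math.cos(0.3 * n + 0.1) for n in range(50)]
print(tkeo(tone)[1:4])  # each approximately 4 * sin(0.3) ** 2

rng = random.Random(3)  # simulated EMG: quiet baseline, burst starting at sample 500
emg = [rng.gauss(0, 0.05) for _ in range(500)] + [rng.gauss(0, 1.0) for _ in range(500)]
env = linear_envelope(tkeo(emg), window=25)
print(threshold_onset(env, baseline_len=400, h=10))  # expected shortly after 500
```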
New robust statistical procedures for the polytomous logistic regression models.

PubMed

Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

2018-05-17

This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real-life examples are presented to justify the requirement of suitable robust statistical procedures in place of likelihood-based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Grip Strength Is Associated With Cognitive Performance in Schizophrenia and the General Population: A UK Biobank Study of 476559 Participants.

PubMed

Firth, Joseph; Stubbs, Brendon; Vancampfort, Davy; Firth, Josh A; Large, Matthew; Rosenbaum, Simon; Hallgren, Mats; Ward, Philip B; Sarris, Jerome; Yung, Alison R

2018-06-06

Handgrip strength may provide an easily administered marker of cognitive functional status. However, further population-scale research examining relationships between grip strength and cognitive performance across multiple domains is needed. Additionally, relationships between grip strength and cognitive functioning in people with schizophrenia, who frequently experience cognitive deficits, have yet to be explored. Baseline data from the UK Biobank (2007-2010) were analyzed, including 475397 individuals from the general population and 1162 individuals with schizophrenia. Linear mixed models and generalized linear mixed models were used to assess the relationship between grip strength and 5 cognitive domains (visual memory, reaction time, reasoning, prospective memory, and number memory), controlling for age, gender, bodyweight, education, and geographical region. In the general population, maximal grip strength was positively and significantly related to visual memory (coefficient [coeff] = -0.1601, standard error [SE] = 0.003), reaction time (coeff = -0.0346, SE = 0.0004), reasoning (coeff = 0.2304, SE = 0.0079), number memory (coeff = 0.1616, SE = 0.0092), and prospective memory (coeff = 0.3486, SE = 0.0092; all P < .001). In the schizophrenia sample, grip strength was strongly related to visual memory (coeff = -0.155, SE = 0.042, P < .001) and reaction time (coeff = -0.049, SE = 0.009, P < .001), while prospective memory approached statistical significance (coeff = 0.233, SE = 0.132, P = .078), and no statistically significant association was found with number memory and reasoning (P > .1). Grip strength is significantly associated with cognitive functioning in the general population and individuals with schizophrenia, particularly for working memory and processing speed.
Future research should establish directionality, examine if grip strength also predicts functional and physical health outcomes in schizophrenia, and determine whether interventions which improve muscular strength impact on cognitive and real-world functioning.
Is There a Critical Distance for Fickian Transport? - a Statistical Approach to Sub-Fickian Transport Modelling in Porous Media

NASA Astrophysics Data System (ADS)

Most, S.; Nowak, W.; Bijeljic, B.

2014-12-01

Transport processes in porous media are frequently simulated as particle movement. This process can be formulated as a stochastic process of particle position increments. At the pore scale, the geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Recent experimental data suggest that the need to generalize has not yet been exhausted, because particle increments show statistical dependency beyond linear correlation and over many time steps. The goal of this work is to better understand the validity regions of commonly made assumptions. We investigate after which transport distances we can observe: (1) a statistical dependence between increments, modelled as an order-k Markov process, that reduces to order 1 (this would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks would start); (2) a bivariate statistical dependence that simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW); and (3) complete absence of statistical dependence (validity of classical PTRW/CTRW). The approach is to derive a statistical model for pore-scale transport from a powerful experimental data set via copula analysis. The model is formulated as a non-Gaussian, mutually dependent Markov process of higher order, which allows us to investigate the validity ranges of simpler models.
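A first step in a copula-style analysis is to work with rank-based pseudo-observations, which separate dependence from the marginal distribution of the increments. The sketch below computes a rank (Spearman-type) dependence between increments a given number of steps apart; tracking it against the lag is one simple way to see how long dependence persists. It is an illustrative diagnostic under the assumption of continuous data without ties, not the authors' copula model, and all names are hypothetical.

```python
import math
import random


def pseudo_observations(values):
    """Map data to (0, 1) via rank / (n + 1), the usual first step of an
    empirical copula analysis (assumes continuous data, i.e. no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def lag_rank_dependence(increments, lag):
    """Rank correlation between increments `lag` steps apart
    (Pearson correlation of pseudo-observations equals Spearman's rho)."""
    head, tail = increments[:-lag], increments[lag:]
    return pearson(pseudo_observations(head), pseudo_observations(tail))


if __name__ == "__main__":
    rng = random.Random(1)
    iid = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # independent increments
    corr = [0.0]
    for _ in range(1999):  # AR(1) increments with strong memory
        corr.append(0.9 * corr[-1] + rng.gauss(0.0, 1.0))
    for lag in (1, 5, 20):
        print(lag, lag_rank_dependence(iid, lag), lag_rank_dependence(corr, lag))
```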
Predicting Statistical Response and Extreme Events in Uncertainty Quantification through Reduced-Order Models

NASA Astrophysics Data System (ADS)

Qi, D.; Majda, A.

2017-12-01

A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in the principal model directions with the largest variability in high-dimensional turbulent systems and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, in which information theory and linear statistical response theory are combined systematically to achieve optimal model performance. The reduced-order method builds on a self-consistent mathematical framework for general systems with quadratic nonlinearity, in which crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections that replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor that characterizes the higher-order moments consistently and improves model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to demonstrate the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models accurately and efficiently. In addition, the reduced-order models are used to capture the crucial passive tracer field advected by the baroclinic turbulent flow.
It is demonstrated that crucial principal statistical quantities, such as the tracer spectrum and the fat tails of the tracer probability density functions at the most important large scales, can be captured efficiently and accurately by the reduced-order tracer model in various dynamical regimes of the flow field with distinct statistical structures.
On statistical inference in time series analysis of the evolution of road safety.

PubMed

Commandeur, Jacques J F; Bijleveld, Frits D; Bergel-Hayat, Ruth; Antoniou, Constantinos; Yannis, George; Papadimitriou, Eleonora

2013-11-01

Data collected for building a road safety observatory usually include observations made sequentially through time. Examples of such data, called time series data, include the annual (or monthly) number of road traffic accidents, traffic fatalities or vehicle kilometers driven in a country, as well as the corresponding values of safety performance indicators (e.g., data on speeding, seat belt use, alcohol use, etc.). Some commonly used statistical techniques imply assumptions that are often violated by the special properties of time series data, namely serial dependency among disturbances associated with the observations. The first objective of this paper is to demonstrate the impact of such violations on the applicability of standard methods of statistical inference: they lead to under- or overestimation of the standard error and consequently may produce erroneous inferences. Moreover, having established the adverse consequences of ignoring serial dependency issues, the paper aims to describe rigorous statistical techniques used to overcome them. In particular, appropriate time series analysis techniques of varying complexity are employed to describe the development over time, to relate accident occurrences to explanatory factors such as exposure measures or safety performance indicators, and to forecast the development into the near future. Traditional regression models (whether linear, generalized linear or nonlinear) are shown not to naturally capture the inherent dependencies in time series data. Dedicated time series analysis techniques, such as the ARMA-type and DRAG approaches, are discussed next, followed by structural time series models, which are a subclass of state space methods. The paper concludes with general recommendations and practice guidelines for the use of time series models in road safety research. Copyright © 2012 Elsevier Ltd. All rights reserved.
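One quick way to see the serial-dependency problem described above is to fit an ordinary trend regression and inspect its residuals. The sketch below uses the Durbin-Watson statistic, a standard check that the abstract does not itself name; values near 2 suggest uncorrelated residuals, and values well below 2 signal positive autocorrelation (roughly 2 * (1 - rho)), in which case the OLS standard errors are unreliable. The simulated series is hypothetical.

```python
import random


def ols_trend(y):
    """Fit y_t = a + b * t by ordinary least squares; return (a, b, residuals)."""
    n = len(y)
    t = list(range(n))
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    resid = [yi - (a + b * ti) for ti, yi in zip(t, y)]
    return a, b, resid


def durbin_watson(resid):
    """Durbin-Watson statistic of a residual series."""
    num = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


if __name__ == "__main__":
    rng = random.Random(2)
    noise = [0.0]
    for _ in range(199):  # AR(1) disturbances with rho = 0.8
        noise.append(0.8 * noise[-1] + rng.gauss(0.0, 1.0))
    counts = [100 - 0.5 * t + e for t, e in enumerate(noise)]  # hypothetical series
    _, slope, resid = ols_trend(counts)
    print(slope, durbin_watson(resid))  # statistic far below 2
```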
Temperature in and out of equilibrium: A review of concepts, tools and attempts

NASA Astrophysics Data System (ADS)

Puglisi, A.; Sarracino, A.; Vulpiani, A.

2017-11-01

We review the general aspects of the concept of temperature in equilibrium and non-equilibrium statistical mechanics. Although temperature is an old and well-established notion, it still presents controversial facets. After a short historical survey of the key role of temperature in thermodynamics and statistical mechanics, we tackle a series of issues which have been recently reconsidered. In particular, we discuss different definitions and their relevance for energy fluctuations. The interest in such a topic has been triggered by the recent observation of negative temperatures in condensed matter experiments. Moreover, the ability to manipulate systems at the micro- and nanoscale calls for understanding and clarifying some aspects related to the statistical properties of small systems (such as the issue of temperature 'fluctuations'). We also discuss the notion of temperature in a dynamical context, within the theory of linear response for Hamiltonian systems at equilibrium and stochastic models with detailed balance, and the generalized fluctuation-response relations, which provide a hint for an extension of the definition of temperature in far-from-equilibrium systems. To conclude, we consider non-Hamiltonian systems, such as granular materials, turbulence and active matter, where a general theoretical framework is still lacking.
Global Solutions to Folded Concave Penalized Nonconvex Learning

PubMed Central

Liu, Hongcheng; Yao, Tao; Li, Runze

2015-01-01

This paper is concerned with solving nonconvex learning problems with folded concave penalties. Although their global solutions entail desirable statistical properties, optimization techniques that guarantee global optimality in a general setting are lacking. In this paper, we show that a class of nonconvex learning problems is equivalent to general quadratic programs. This equivalence allows us to develop mixed integer linear programming reformulations, which admit finite algorithms that find a provably global optimal solution. We refer to this reformulation-based technique as the mixed integer programming-based global optimization (MIPGO). To our knowledge, this is the first global optimization scheme with a theoretical guarantee for folded concave penalized nonconvex learning with the SCAD penalty (Fan and Li, 2001) and the MCP penalty (Zhang, 2010). Numerical results indicate that MIPGO significantly outperforms the state-of-the-art solution scheme, local linear approximation, and other alternative solution techniques in the literature in terms of solution quality. PMID:27141126
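For reference, the two folded concave penalties named above can be written as scalar functions of a coefficient t. Both grow like lam * |t| near zero and then flatten to a constant, so large coefficients are not shrunk; that flattening is what makes the penalized objective nonconvex. The sketch uses the conventional SCAD default a = 3.7 and an illustrative gamma = 3.0 for MCP; it shows the penalties only, not the MIPGO reformulation.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001), piecewise in |t|."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty (Zhang, 2010), piecewise in |t|."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 4.0, 10.0):
        print(t, scad_penalty(t, lam=1.0), mcp_penalty(t, lam=1.0))
```

Both functions are continuous at their breakpoints (t = lam and t = a * lam for SCAD, t = gamma * lam for MCP).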
[Application of SAS macro to evaluate multiplicative and additive interaction in logistic and Cox regression in clinical practice].

PubMed

Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q

2016-05-01

Conditional and unconditional logistic regression analyses are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival data analysis. Most of the literature refers only to main-effect models; however, the generalized linear model differs from the general linear model, and interaction comprises multiplicative interaction and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written in SAS 9.4 to calculate the contrast ratio, attributable proportion due to interaction and synergy index together with the interaction terms of logistic and Cox regression, and Wald, delta and profile likelihood confidence intervals were used to evaluate additive interaction, as a reference for big data analysis in clinical epidemiology and for the analysis of genetic multiplicative and additive interactions.
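The point estimates behind additive interaction can be computed directly from the odds (or hazard) ratios of the three exposed groups relative to the doubly unexposed group. The Python sketch below (not the authors' SAS macros) uses the standard relative excess risk due to interaction (RERI), from which the attributable proportion is derived, plus the synergy index and the multiplicative-scale ratio. It covers point estimates only; the Wald, delta and profile likelihood intervals are not reproduced.

```python
def additive_interaction(or11, or10, or01):
    """Additive-interaction measures, each relative to the unexposed group (OR00 = 1).
    or11: both factors present; or10: only factor A; or01: only factor B."""
    reri = or11 - or10 - or01 + 1                # relative excess risk due to interaction
    ap = reri / or11                             # attributable proportion due to interaction
    si = (or11 - 1) / ((or10 - 1) + (or01 - 1))  # synergy index
    return reri, ap, si


def multiplicative_interaction(or11, or10, or01):
    """Ratio of odds ratios; equals exp(coefficient of the A x B product term)
    in a logistic model with both main effects and their product."""
    return or11 / (or10 * or01)


if __name__ == "__main__":
    print(additive_interaction(4.0, 2.0, 1.5))   # (1.5, 0.375, 2.0)
    print(multiplicative_interaction(4.0, 2.0, 1.5))
```

Under no additive interaction, RERI = 0, AP = 0 and SI = 1; under no multiplicative interaction, the ratio of odds ratios equals 1. The hypothetical inputs above show that the two scales can disagree.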

Generalized statistical mechanics of cosmic rays: Application to positron-electron spectral indices.

PubMed

Yalcin, G Cigdem; Beck, Christian

2018-01-29

Cosmic ray energy spectra exhibit power law distributions over many orders of magnitude that are very well described by the predictions of q-generalized statistical mechanics, based on a q-generalized Hagedorn theory for transverse momentum spectra and hard QCD scattering processes. QCD at the largest center-of-mass energies predicts the entropic index to be [Formula: see text]. Here we show that the escort duality of the nonextensive thermodynamic formalism predicts an energy split of effective temperature given by Δ [Formula: see text] MeV, where T_H is the Hagedorn temperature. We carefully analyse the measured data of the AMS-02 collaboration and provide evidence that the predicted temperature split is indeed observed, leading to a different energy dependence of the e+ and e- spectral indices. We also observe a distinguished energy scale E* ≈ 50 GeV where the e+ and e- spectral indices differ the most. Linear combinations of the escort and non-escort q-generalized canonical distributions yield excellent agreement with the measured AMS-02 data in the entire energy range.
Prediction of rainfall anomalies during the dry to wet transition season over the Southern Amazonia using machine learning tools

NASA Astrophysics Data System (ADS)

Shan, X.; Zhang, K.; Zhuang, Y.; Fu, R.; Hong, Y.

2017-12-01

Seasonal prediction of rainfall during the dry-to-wet transition season in austral spring (September-November) over southern Amazonia is central to improving crop planting and fire mitigation in that region. Previous studies have identified the key large-scale atmospheric dynamic and thermodynamic preconditions during the dry season (June-August) that influence the rainfall anomalies during the dry-to-wet transition season over southern Amazonia. Based on these key preconditions during the dry season, we have evaluated several statistical models and developed a Neural Network based statistical prediction system to predict rainfall during the dry-to-wet transition for southern Amazonia (5-15°S, 50-70°W). Multivariate Empirical Orthogonal Function (EOF) analysis is applied to the following four fields during JJA from the ECMWF Reanalysis (ERA-Interim) spanning 1979 to 2015: geopotential height at 200 hPa, surface relative humidity, convective inhibition energy (CIN) index and convective available potential energy (CAPE), to filter out noise and highlight the most coherent spatial and temporal variations. The first 10 EOF modes are retained as inputs to the statistical models, accounting for at least 70% of the total variance in the predictor fields. We have tested several linear and non-linear statistical methods. While the regularized Ridge Regression and Lasso Regression can generally capture the spatial pattern and magnitude of rainfall anomalies, we found that the Neural Network performs best, with an accuracy greater than 80%, as expected from the non-linear dependence of the rainfall on the large-scale atmospheric thermodynamic conditions and circulation. Further tests of various prediction skill metrics and hindcasts also suggest that this Neural Network prediction approach can significantly improve seasonal prediction skill relative to the dynamic predictions and regression-based statistical predictions.
Thus, this statistical prediction system shows potential to improve real-time seasonal rainfall predictions in the future.
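The EOF step and a ridge baseline can be sketched with NumPy as follows. This assumes the four predictor fields have already been standardized and concatenated along the spatial axis into one (years x grid points) array; the mode count is chosen by a cumulative explained-variance cut-off such as the 70% quoted above. The Neural Network itself is not shown, and the function names are hypothetical.

```python
import numpy as np


def eof_analysis(field, var_frac=0.70):
    """EOF analysis of an (n_time, n_space) array via SVD of the anomalies.
    Returns (pcs, eofs, explained_fraction, k), where k is the smallest number
    of modes whose cumulative explained variance reaches var_frac."""
    anomalies = field - field.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    k = int(np.searchsorted(np.cumsum(explained), var_frac)) + 1
    k = min(k, len(s))  # guard against round-off when var_frac is close to 1
    pcs = u[:, :k] * s[:k]  # principal-component time series (one column per mode)
    return pcs, vt[:k], explained, k


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression on centred data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    years = np.arange(37)  # 37 seasons, matching the 1979-2015 span
    field = np.outer(np.sin(years / 3.0), rng.standard_normal(50)) * 5.0
    field += 0.1 * rng.standard_normal((37, 50))  # small-scale noise
    pcs, eofs, explained, k = eof_analysis(field)
    print(k, explained[:3])
```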
Four points function fitted and first derivative procedure for determining the end points in potentiometric titration curves: statistical analysis and method comparison.

PubMed

Kholeif, S A

2001-06-01

A new method that belongs to the differential category for determining the end points from potentiometric titration curves is presented. It uses a preprocessing step to find first derivative values by fitting four data points in and around the region of inflection to a non-linear function, and then locates the end point, usually as a maximum or minimum, using an inverse parabolic interpolation procedure that has an analytical solution. The behavior and accuracy of the sigmoid and cumulative non-linear functions used are investigated against three factors. A statistical evaluation of the new method using linear least-squares method validation and multifactor data analysis is covered. The new method applies generally to symmetrical and unsymmetrical potentiometric titration curves, and the end point is calculated using numerical procedures only. It outperforms the "parent" regular differential method at almost all factor levels and gives accurate results comparable to the true or estimated true end points. Calculated end points from selected experimental titration curves compatible with the equivalence point category of methods, such as Gran or Fortuin, are also compared with the new method.
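The final locating step, inverse parabolic interpolation with an analytical solution, can be illustrated with a simplified pipeline. As a swapped-in simplification, the sketch below estimates first-derivative values by plain finite differences at interval midpoints rather than by the paper's four-point non-linear fit. It then places the end point at the vertex of the parabola through the largest |dE/dV| value and its two neighbours. The names and the synthetic tanh-shaped curve are hypothetical.

```python
import math


def parabola_vertex(x1, y1, x2, y2, x3, y3):
    """Abscissa of the extremum of the parabola through three points
    (inverse parabolic interpolation, closed form)."""
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    return x2 - 0.5 * num / den


def end_point(volumes, potentials):
    """Estimate the end point as the interpolated maximum of |dE/dV|."""
    mids = [(volumes[i] + volumes[i + 1]) / 2 for i in range(len(volumes) - 1)]
    slopes = [abs((potentials[i + 1] - potentials[i]) / (volumes[i + 1] - volumes[i]))
              for i in range(len(volumes) - 1)]
    k = max(range(len(slopes)), key=slopes.__getitem__)
    if k == 0 or k == len(slopes) - 1:
        return mids[k]  # peak at the edge: no neighbours to interpolate with
    return parabola_vertex(mids[k - 1], slopes[k - 1], mids[k], slopes[k],
                           mids[k + 1], slopes[k + 1])


if __name__ == "__main__":
    vols = [8.0 + 0.2 * i for i in range(21)]                     # titrant volume
    emf = [7.0 + 3.0 * math.tanh(2.0 * (v - 10.2)) for v in vols]  # synthetic curve
    print(end_point(vols, emf))  # close to 10.2
```

Taking the absolute slope lets the same code handle curves that rise or fall through the inflection.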
Asymptotic Linear Spectral Statistics for Spiked Hermitian Random Matrices

NASA Astrophysics Data System (ADS)

Passemier, Damien; McKay, Matthew R.; Chen, Yang

2015-07-01

Using the Coulomb Fluid method, this paper derives central limit theorems (CLTs) for linear spectral statistics of three "spiked" Hermitian random matrix ensembles. These include Johnstone's spiked model (i.e., central Wishart with spiked correlation), non-central Wishart with rank-one non-centrality, and a related class of non-central matrices. For a generic linear statistic, we derive simple and explicit CLT expressions as the matrix dimensions grow large. For all three ensembles under consideration, we find that the primary effect of the spike is to introduce a correction term to the asymptotic mean of the linear spectral statistic, which we characterize with simple formulas. The utility of our proposed framework is demonstrated through application to three different linear statistics problems: the classical likelihood ratio test for a population covariance, the capacity analysis of multi-antenna wireless communication systems with a line-of-sight transmission path, and a classical multiple sample significance testing problem.
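A linear spectral statistic is a sum of a fixed function over the eigenvalues of the matrix. The NumPy sketch below computes one for a sample covariance drawn from a rank-one spiked model. For simplicity it uses real Gaussian data, whereas the paper treats complex Hermitian ensembles. For f = log the statistic is log det S, which enters the classical likelihood ratio test. In that special case the spike shifts the statistic by exactly log(spike) for every realization, a shift that does not grow with n or p. For other f the effect of the spike is not this simple, and that is the situation the paper's CLTs address. Names are hypothetical.

```python
import numpy as np


def spiked_sample_covariance(n, p, spike, rng):
    """Sample covariance S = X^T X / n with rows of X ~ N(0, Sigma),
    Sigma = diag(spike, 1, ..., 1): a rank-one spiked model (real-valued here)."""
    scale = np.ones(p)
    scale[0] = np.sqrt(spike)
    x = rng.standard_normal((n, p)) * scale  # column 0 gets variance `spike`
    return x.T @ x / n


def linear_spectral_statistic(mat, f=np.log):
    """sum_i f(lambda_i) over the eigenvalues of a symmetric matrix."""
    return float(np.sum(f(np.linalg.eigvalsh(mat))))


if __name__ == "__main__":
    null = spiked_sample_covariance(200, 50, 1.0, np.random.default_rng(3))
    spiked = spiked_sample_covariance(200, 50, 4.0, np.random.default_rng(3))
    shift = linear_spectral_statistic(spiked) - linear_spectral_statistic(null)
    print(shift, np.log(4.0))  # equal up to round-off for f = log
```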
Exact Scheffé-type confidence intervals for output from groundwater flow models: 1. Use of hydrogeologic information

USGS Publications Warehouse

Cooley, Richard L.

1993-01-01

A new method is developed to efficiently compute exact Scheffé-type confidence intervals for output (or any other function of the parameters) g(β) derived from a groundwater flow model. The method is general in that parameter uncertainty can be specified by any statistical distribution having a log probability density function (log pdf) that can be expanded in a Taylor series. However, in this study, parameter uncertainty is specified by a statistical multivariate beta distribution that incorporates hydrogeologic information in the form of the investigator's best estimates of parameters and a grouping of random variables representing possible parameter values so that each group is defined by maximum and minimum bounds and an ordering according to increasing value. The new method forms the confidence intervals from maximum and minimum limits of g(β) on a contour of a linear combination of (1) the quadratic form for the parameters used by Cooley and Vecchia (1987) and (2) the log pdf for the multivariate beta distribution. Three example problems are used to compare characteristics of the confidence intervals for hydraulic head obtained using different weights for the linear combination. Different weights generally produced similar confidence intervals, whereas the method of Cooley and Vecchia (1987) often produced much larger confidence intervals.
The optimal hormonal replacement modality selection for multiple organ procurement from brain-dead organ donors

PubMed Central

Mi, Zhibao; Novitzky, Dimitri; Collins, Joseph F; Cooper, David KC

2015-01-01

The management of brain-dead organ donors is complex. The use of inotropic agents and replacement of depleted hormones (hormonal replacement therapy) is crucial for successful multiple organ procurement, yet the optimal hormonal replacement has not been identified, and the statistical adjustment to determine the best selection is not trivial. Traditional pair-wise comparisons between every pair of treatments, and multiple comparisons to all (MCA), are statistically conservative. Hsu’s multiple comparisons with the best (MCB), adapted from Dunnett’s multiple comparisons with a control (MCC), has been used for selecting the best treatment based on continuous variables. We selected the best hormonal replacement modality for successful multiple organ procurement using a two-step approach. First, we estimated the predicted margins by constructing generalized linear models (GLM) or generalized linear mixed models (GLMM), and then we applied the multiple comparison methods to identify the best hormonal replacement modality given that the testing of hormonal replacement modalities is independent. Based on 10-year data from the United Network for Organ Sharing (UNOS), among 16 hormonal replacement modalities, and using the 95% simultaneous confidence intervals, we found that the combination of thyroid hormone, a corticosteroid, antidiuretic hormone, and insulin was the best modality for multiple organ procurement for transplantation. PMID:25565890
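The logic of multiple comparisons with the best can be sketched for the simplest case: a balanced one-way layout with a common standard deviation, where larger means are better. Each treatment is compared with the best of the others, and the interval is constrained to include 0. The critical value `d` (a one-sided Dunnett quantile for k - 1 comparisons) must come from tables or statistical software; the value in the usage example is only a placeholder. This simplified setting is an assumption and does not reproduce the GLM/GLMM predicted margins used in the study.

```python
def mcb_intervals(means, sigma, n_per_group, d):
    """Constrained simultaneous intervals for mu_i - max_{j != i} mu_j
    (larger is better) in a balanced one-way layout with common sigma.
    d: one-sided Dunnett critical value for k - 1 comparisons (user-supplied)."""
    margin = d * sigma * (2.0 / n_per_group) ** 0.5
    out = []
    for i, m in enumerate(means):
        best_other = max(v for j, v in enumerate(means) if j != i)
        diff = m - best_other
        out.append((min(0.0, diff - margin), max(0.0, diff + margin)))
    return out


if __name__ == "__main__":
    # Hypothetical means; d = 2.2 is a placeholder, not a tabulated value.
    for lo, hi in mcb_intervals([1.0, 2.0, 5.0], sigma=1.0, n_per_group=50, d=2.2):
        print(lo, hi)
```

Reading the output: a lower limit of 0 identifies that treatment as the best, and an upper limit of 0 rules it out as the best.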
Variational Bayesian Parameter Estimation Techniques for the General Linear Model

PubMed Central

Starke, Ludger; Ostwald, Dirk

2017-01-01

Variational Bayes (VB), variational maximum likelihood (VML), restricted maximum likelihood (ReML), and maximum likelihood (ML) are cornerstone parametric statistical estimation techniques in the analysis of functional neuroimaging data. However, the theoretical underpinnings of these model parameter estimation techniques are rarely covered in introductory statistical texts. Because of the widespread practical use of VB, VML, ReML, and ML in the neuroimaging community, we reasoned that a theoretical treatment of their relationships and their application in a basic modeling scenario may be helpful for neuroimaging novices and practitioners alike. In this technical study, we thus revisit the conceptual and formal underpinnings of VB, VML, ReML, and ML and provide a detailed account of their mathematical relationships and implementational details. We further apply VB, VML, ReML, and ML to the general linear model (GLM) with non-spherical error covariance as commonly encountered in the first-level analysis of fMRI data. To this end, we explicitly derive the corresponding free energy objective functions and ensuing iterative algorithms. Finally, in the applied part of our study, we evaluate the parameter and model recovery properties of VB, VML, ReML, and ML, first in an exemplary setting and then in the analysis of experimental fMRI data acquired from a single participant under visual stimulation. PMID:28966572
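When the error covariance structure V of the GLM y = X beta + e, e ~ N(0, sigma^2 V), is known, the ML estimate of beta is the generalized least-squares solution (X^T V^-1 X)^-1 X^T V^-1 y. The NumPy sketch below computes it with an AR(1) correlation matrix as an illustrative non-spherical structure. ReML and VB additionally estimate the covariance hyperparameters, which this sketch does not attempt. Names are hypothetical.

```python
import numpy as np


def ar1_correlation(n, rho):
    """AR(1) error correlation matrix with entries V[i, j] = rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def gls_estimate(X, y, V):
    """ML / generalized least-squares estimate of beta for y = X beta + e,
    e ~ N(0, sigma^2 V) with V known; linear solves avoid forming V^-1."""
    vinv_x = np.linalg.solve(V, X)
    vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ vinv_x, X.T @ vinv_y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 40
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # intercept + regressor
    y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)
    print(gls_estimate(X, y, ar1_correlation(n, 0.4)))
```

With rho = 0 the correlation matrix is the identity and the estimate reduces to ordinary least squares.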
Spatially resolved regression analysis of pre-treatment FDG, FLT and Cu-ATSM PET from post-treatment FDG PET: an exploratory study

PubMed Central

Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert

2012-01-01

Purpose: To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods: Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness of fit of the regressions was assessed by R². Hypothesis testing of coefficients over the patient population was performed. Results: Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost ~ 0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost ~ FDGpre^0.93, p<0.001). Univariate mixture model fits of FDGpre improved R² from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions: Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748
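A power-law relation of the form FDGpost ~ c * FDGpre^b can be estimated in its simplest form by least squares on the log-log scale, as sketched below. This is a swapped-in simplification: the study fit generalized linear models, which weight errors differently from log-scale OLS, so estimates from the two approaches need not agree on real data. The sketch assumes strictly positive voxel values, and the function name is hypothetical.

```python
import math


def fit_power_law(x, y):
    """Fit y ~ c * x**b by least squares on log y = log c + b * log x.
    Requires strictly positive x and y. Returns (c, b, r_squared_on_log_scale)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx
    log_c = my - slope * mx
    ss_res = sum((v - (log_c + slope * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - my) ** 2 for v in ly)
    return math.exp(log_c), slope, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    pre = [1.0, 2.0, 4.0, 8.0, 16.0]           # hypothetical uptake values
    post = [1.5 * v ** 0.93 for v in pre]      # exact power law, exponent 0.93
    print(fit_power_law(pre, post))            # recovers c = 1.5, b = 0.93
```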
Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping.

PubMed

Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah

2018-07-01

In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, boosted regression trees achieved the highest Area Under the Receiver Operating Characteristic curve (AUROC, 0.975) and the generalized linear model the lowest (0.642). The proposed EMmedian, however, achieved the highest accuracy (0.976) among all models. Despite the outstanding performance of some models, variability among the predictions of individual models was considerable. Therefore, to reduce uncertainty and to create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, and in particular EMmedian, are recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
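The evaluation metric and the median ensemble can both be sketched in a few lines. AUROC is computed here as the probability that a randomly chosen flood point scores higher than a randomly chosen non-occurrence point, with ties counted as one half. The ensemble assumes, as its name suggests, that EMmedian takes the per-location median of the individual models' susceptibility scores; the study's exact implementation may differ. Names are hypothetical.

```python
import statistics


def auroc(labels, scores):
    """Area under the ROC curve as the probability that a random positive
    (label 1) outscores a random negative (label 0); ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def median_ensemble(predictions):
    """Combine per-model scores point by point with the median.
    predictions: one equally long list of scores per model."""
    return [statistics.median(vals) for vals in zip(*predictions)]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(auroc(labels, [0.9, 0.4, 0.5, 0.1]))  # 0.75
    models = [[0.9, 0.4, 0.5, 0.1], [0.8, 0.7, 0.3, 0.2], [0.6, 0.9, 0.4, 0.3]]
    print(auroc(labels, median_ensemble(models)))
```

The pairwise count is O(positives x negatives), which is fine for a few thousand points; a rank-based formula scales better for larger samples.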
General anesthesia in orthognathic surgeries: does it affect horizontal jaw relations?

PubMed

Yaghmaei, Masoud; Ejlali, Masoud; Nikzad, Sekieneh; Sayyedi, Ashraf; Shafaeifard, Shahrouz; Pourdanesh, Fereydoun

2013-10-01

The aim of this study was to evaluate the influence of general anesthesia on centric jaw relation (CR) records of orthognathic surgical patients in different postural positions. Fifty patients undergoing orthognathic surgery at Taleghani Hospital (Tehran, Iran) in 2008 were prospectively studied. CR records were obtained in conscious patients in 2 different positions (upright and supine) 1 day before surgery and in the supine position under general anesthesia. The impressions were made and the corresponding casts were mounted on a semiadjustable articulator. Differences were measured to the nearest 0.10 mm using a caliper. Paired t test and a general linear regression model were used for statistical analysis. Fifty patients (27 women and 23 men; mean age, 22.5 ± 3.5 yr) were enrolled. Angle Class I (group I), Class II (group II), and Class III (group III) malocclusions were detected in 16% (n = 8), 54% (n = 27), and 30% (n = 15) of patients, respectively. Although mean changes were smaller than 2 mm, statistically significant differences were found by paired t test in all Angle classification groups. No significant differences were found between the supine and conscious and the supine and unconscious patient positions in groups I and III (P > .05). However, in group II, this difference was statistically significant (P = .001). Regarding the impact of anesthesia on CR records of patients with different Angle classes, this study showed a significant effect, particularly in group II. Assessment of the outcome of interest (difference between the supine and conscious and the upright and conscious positions) versus position after adjustment for Angle class using a general linear regression model showed that the difference was significant only for Angle class (β = +0.29; t = 3.05; P = .003). General anesthesia may not adversely affect the mandibular condylar position in orthognathic patients in a supine position compared with a supine and conscious position. 
However, of all study groups, group II showed the most significant changes in CR records under general anesthesia. Oral and maxillofacial surgeons should be well aware of such changes in these particular positions and avoid possible mismanagement and potential complications. Copyright © 2013 American Association of Oral and Maxillofacial Surgeons. Published by Elsevier Inc. All rights reserved.
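The paired t test used above compares two measurements on the same patients through their differences. A minimal sketch of the statistic follows; the p-value would come from a t distribution with n - 1 degrees of freedom, which is left to a statistics library here. The input values are hypothetical.

```python
import math


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for the differences after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))  # sample SD
    return mean / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    upright = [0.0, 0.1, 0.2, 0.0, 0.3]  # hypothetical positions in mm
    supine = [0.4, 0.6, 0.5, 0.7, 0.9]
    print(paired_t(upright, supine))
```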
Statistical models for the analysis and design of digital polymerase chain reaction (dPCR) experiments

USGS Publications Warehouse

Dorazio, Robert; Hunter, Margaret

2015-01-01

Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary log-log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model’s parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.
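The link between the complementary log-log function and the partition-volume offset follows from assuming that copies are Poisson distributed across partitions of equal volume v. Under that assumption, P(partition positive) = 1 - exp(-lambda * v), so cloglog(p) = log(lambda) + log(v). The concentration enters linearly on the link scale, and log(v) is the offset. For a single sample without covariates, the MLE has the closed form computed below. The example partition count and volume are hypothetical, and the function names are illustrative.

```python
import math


def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


def concentration_mle(k_positive, n_partitions, volume):
    """MLE of concentration (copies per unit of `volume`) from one dPCR run,
    assuming Poisson-distributed copies and equal partition volumes."""
    if k_positive >= n_partitions:
        raise ValueError("all partitions positive: concentration not estimable")
    p_hat = k_positive / n_partitions
    return -math.log(1.0 - p_hat) / volume


if __name__ == "__main__":
    lam = concentration_mle(5000, 20000, 0.00085)  # hypothetical run, volume in uL
    print(lam)
    # On the link scale: cloglog(p_hat) = log(lam) + log(volume)
    print(cloglog(5000 / 20000), math.log(lam) + math.log(0.00085))
```

With covariates, the same model becomes a binomial GLM with a cloglog link and offset log(v), which standard GLM software can fit.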
Statistical Models for the Analysis and Design of Digital Polymerase Chain Reaction (dPCR) Experiments.

PubMed

Dorazio, Robert M; Hunter, Margaret E

2015-11-03

Statistical methods for the analysis and design of experiments using digital PCR (dPCR) have received only limited attention and have been misused in many instances. To address this issue and to provide a more general approach to the analysis of dPCR data, we describe a class of statistical models for the analysis and design of experiments that require quantification of nucleic acids. These models are mathematically equivalent to generalized linear models of binomial responses that include a complementary log-log link function and an offset that is dependent on the dPCR partition volume. These models are both versatile and easy to fit using conventional statistical software. Covariates can be used to specify different sources of variation in nucleic acid concentration, and a model's parameters can be used to quantify the effects of these covariates. For purposes of illustration, we analyzed dPCR data from different types of experiments, including serial dilution, evaluation of copy number variation, and quantification of gene expression. We also showed how these models can be used to help design dPCR experiments, as in selection of sample sizes needed to achieve desired levels of precision in estimates of nucleic acid concentration or to detect differences in concentration among treatments with prescribed levels of statistical power.
MWASTools: an R/bioconductor package for metabolome-wide association studies.

PubMed

Rodriguez-Martinez, Andrea; Posma, Joram M; Ayala, Rafael; Neves, Ana L; Anwar, Maryam; Petretto, Enrico; Emanueli, Costanza; Gauguier, Dominique; Nicholson, Jeremy K; Dumas, Marc-Emmanuel

2018-03-01

MWASTools is an R package designed to provide an integrated pipeline to analyse metabonomic data in large-scale epidemiological studies. Key functionalities of our package include: quality control analysis; metabolome-wide association analysis using various models (partial correlations, generalized linear models); visualization of statistical outcomes; metabolite assignment using statistical total correlation spectroscopy (STOCSY); and biological interpretation of metabolome-wide association study results. The MWASTools R package is implemented in R (version >= 3.4) and is available from Bioconductor: https://bioconductor.org/packages/MWASTools/. Contact: m.dumas@imperial.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Statistical mechanics of Fermi-Pasta-Ulam chains with the canonical ensemble

NASA Astrophysics Data System (ADS)

Demirel, Melik C.; Sayar, Mehmet; Atılgan, Ali R.

1997-03-01

Low-energy vibrations of a Fermi-Pasta-Ulam-β (FPU-β) chain with 16 repeat units are analyzed with the aid of numerical experiments and the statistical mechanics equations of the canonical ensemble. Constant temperature numerical integrations are performed by employing the cubic coupling scheme of Kusnezov et al. [Ann. Phys. 204, 155 (1990)]. Very good agreement is obtained between numerical results and theoretical predictions for the probability distributions of the generalized coordinates and momenta both of the chain and of the thermal bath. It is also shown that the average energy of the chain scales linearly with the bath temperature.
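Any constant-temperature integration of an FPU-β chain needs two ingredients: the standard FPU-β energy and the forces derived from it. The sketch below supplies both. Its conventions are our assumptions, not taken from the paper:

- unit masses;
- unit harmonic coupling;
- fixed walls at both ends (the paper's boundary conditions are not stated here).

The thermostat itself (the Kusnezov et al. coupling scheme) is not reproduced.

```python
def _bond_terms(q):
    # Bond stretches d_i = q_{i+1} - q_i, assuming fixed walls q_0 = q_{N+1} = 0
    ext = [0.0] + list(q) + [0.0]
    return [ext[i + 1] - ext[i] for i in range(len(ext) - 1)]


def fpu_beta_energy(q, p, beta):
    """H = sum p^2/2 + sum [d^2/2 + beta d^4/4] (unit masses, unit harmonic coupling)."""
    kinetic = 0.5 * sum(pi * pi for pi in p)
    potential = sum(0.5 * d * d + 0.25 * beta * d ** 4 for d in _bond_terms(q))
    return kinetic + potential


def fpu_beta_forces(q, beta):
    """F_j = -dH/dq_j = V'(d_j) - V'(d_{j-1}), with V'(d) = d + beta d^3."""
    dv = [d + beta * d ** 3 for d in _bond_terms(q)]
    return [dv[j] - dv[j - 1] for j in range(1, len(q) + 1)]


# Hypothetical low-energy configuration of a 16-particle chain
chain_q = [0.01 * ((-1) ** j) for j in range(16)]
chain_p = [0.0] * 16
energy = fpu_beta_energy(chain_q, chain_p, beta=0.1)
forces = fpu_beta_forces(chain_q, beta=0.1)
```

At the small amplitudes used here the quartic term is negligible and the chain is nearly harmonic. This is the low-energy regime the abstract analyzes.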
Using structural equation modeling for network meta-analysis.

PubMed

Tu, Yu-Kang; Wu, Yun-Chun

2017-07-14

Network meta-analysis overcomes the limitations of traditional pair-wise meta-analysis by incorporating all available evidence into a general statistical framework for simultaneous comparisons of several treatments. Currently, network meta-analyses are undertaken either within Bayesian hierarchical linear models or within frequentist generalized linear mixed models. Structural equation modeling (SEM) is a statistical method originally developed for modeling causal relations among observed and latent variables. Because random effects are modeled explicitly as latent variables in SEM, analysts can flexibly specify complex random-effect structures and impose linear and nonlinear constraints on parameters. The aim of this article is to show how to undertake a network meta-analysis within the statistical framework of SEM. We used an example dataset to demonstrate that the standard fixed and random effect network meta-analysis models can easily be implemented in SEM. The dataset contains the results of 26 studies that directly compared three treatment groups (A, B and C) for the prevention of first bleeding in patients with liver cirrhosis. We also showed that a new approach to network meta-analysis based on the unrestricted weighted least squares (UWLS) method can also be undertaken using SEM. For both the fixed and random effect network meta-analysis, SEM yielded coefficients and confidence intervals similar to those reported in the previous literature. The point estimates of the two UWLS models were identical to those of the fixed effect model, but the confidence intervals were wider. This is consistent with results from traditional pairwise meta-analyses. Compared with the UWLS model with a common variance adjusted factor, the UWLS model with a unique variance adjusted factor had wider confidence intervals when heterogeneity in the pairwise comparison was larger. The UWLS model with a unique variance adjusted factor reflects the difference in heterogeneity within each comparison. 
SEM provides a very flexible framework for univariate and multivariate meta-analysis, and its potential as a powerful tool for advanced meta-analysis is still to be explored.
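The relation between UWLS and the fixed effect model can be seen in the simpler pairwise case. UWLS regresses y_i / se_i on 1 / se_i without an intercept. This reproduces the inverse-variance fixed effect estimate and rescales its standard error by the square root of the residual mean square.

This is a minimal sketch for a single pairwise comparison only, not the SEM network implementation described above. Two further simplifications apply: the normal quantile stands in for a t quantile, and the effect sizes are hypothetical.

```python
import math


def fixed_effect_and_uwls(effects, ses, z=1.96):
    """Inverse-variance fixed effect estimate with fixed effect and UWLS intervals."""
    weights = [1.0 / s ** 2 for s in ses]
    sw = sum(weights)
    est = sum(w * y for w, y in zip(weights, effects)) / sw
    se_fe = math.sqrt(1.0 / sw)
    k = len(effects)
    q = sum(w * (y - est) ** 2 for w, y in zip(weights, effects))  # Cochran's Q
    mse = q / (k - 1)  # residual mean square of the no-intercept weighted regression
    se_uwls = se_fe * math.sqrt(mse)
    return {
        "estimate": est,
        "fe_ci": (est - z * se_fe, est + z * se_fe),
        "uwls_ci": (est - z * se_uwls, est + z * se_uwls),
    }


# Hypothetical log odds ratios from three studies of the same comparison
result = fixed_effect_and_uwls([0.1, 0.3, 0.5], [0.1, 0.1, 0.1])
```

The point estimate is the same under both models. When Q exceeds k - 1, the UWLS interval is wider than the fixed effect interval, which matches the pattern reported above.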
Conditionally Unbiased Bounded Influence Robust Regression with Applications to Generalized Linear Models.

DTIC Science & Technology

1987-03-01

...some general results and definitions from robust statistics (see Hampel et al., 1986). The influence function of an M-estimator is IC(y, x, θ) = ... estimator is ... The influence function of this ... the influence function for (3.7) and (3.8) is equal to $-\{E_0[\dot{\psi}_{\mathrm{cond}}(y, x, \beta)]\}^{-1}\,\psi_{\mathrm{cond}}(y, x, \beta)$ (4.3). On the other hand, the influence function for the
Analysis and generation of groundwater concentration time series

NASA Astrophysics Data System (ADS)

Crăciun, Maria; Vamoş, Călin; Suciu, Nicolae

2018-01-01

Concentration time series are provided by simulated concentrations of a nonreactive solute transported in groundwater, integrated over the transverse direction of a two-dimensional computational domain and recorded at the plume center of mass. The analysis of a statistical ensemble of time series reveals subtle features that are not captured by the first two moments which characterize the approximate Gaussian distribution of the two-dimensional concentration fields. The concentration time series exhibit a complex preasymptotic behavior driven by a nonstationary trend and correlated fluctuations with time-variable amplitude. Time series with almost the same statistics are generated by successively adding to a time-dependent trend a sum of linear regression terms, accounting for correlations between fluctuations around the trend and their increments in time, and terms of an amplitude modulated autoregressive noise of order one with time-varying parameter. The algorithm generalizes mixing models used in probability density function approaches. The well-known interaction by exchange with the mean mixing model is a special case consisting of a linear regression with constant coefficients.
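The generator described above combines a time-dependent trend with an amplitude-modulated AR(1) noise whose coefficient varies in time. The sketch below implements only that part. Two simplifications apply:

- The linear-regression terms that couple fluctuations to their increments are omitted.
- The trend, coefficient and amplitude functions in the example are hypothetical choices, not the authors' fitted ones.

```python
import math
import random


def generate_series(n_steps, trend, phi, amplitude, seed=None):
    """Trend plus amplitude-modulated AR(1) noise with a time-varying coefficient.

    trend, phi and amplitude are callables of the time index t; keeping
    |phi(t)| < 1 keeps the noise process stable.
    """
    rng = random.Random(seed)
    noise = 0.0
    series = []
    for t in range(n_steps):
        noise = phi(t) * noise + rng.gauss(0.0, 1.0)  # AR(1) step with coefficient phi(t)
        series.append(trend(t) + amplitude(t) * noise)
    return series


# Hypothetical choices: decaying trend, slowly increasing memory, shrinking amplitude
series = generate_series(
    500,
    trend=lambda t: 1.0 / math.sqrt(1.0 + t),
    phi=lambda t: 0.5 + 0.4 * t / 500,
    amplitude=lambda t: 0.1 * math.exp(-t / 200),
    seed=42,
)
```

Setting the amplitude to zero returns the trend exactly. This is a convenient check when calibrating the other terms.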
Automating approximate Bayesian computation by local linear regression.

PubMed

Thornton, Kevin R

2009-07-07

In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program that implements the method. The software package ABCreg implements the local linear-regression approach to ABC. Its advantages are: 1. The code is standalone and fully documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source and modular. Examples of applying the software to empirical data from Drosophila melanogaster, and of testing the procedure on simulated data, are shown. In practice, ABCreg simplifies implementing ABC based on local-linear regression.
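The local linear-regression step can be sketched for one parameter and one summary statistic:

1. Keep the simulations whose statistic is closest to the observed one.
2. Weight them with an Epanechnikov kernel.
3. Regress the parameter on the statistic's deviation.
4. Shift each accepted value along the fitted line to the observed statistic.

This is our own minimal illustration, not ABCreg's code. It makes these simplifications and choices:

- a single parameter and a single statistic;
- an acceptance fraction chosen by us;
- none of ABCreg's parameter transformations.

The function name is hypothetical.

```python
def abc_local_linear(thetas, stats, s_obs, accept_frac=0.1):
    """Rejection ABC with a local-linear regression adjustment (scalar case).

    thetas: parameter values drawn from the prior and used in the simulations
    stats:  the matching simulated summary statistic for each draw
    Returns the adjusted parameter values and their kernel weights.
    """
    dists = [abs(s - s_obs) for s in stats]
    k = max(3, int(len(thetas) * accept_frac))
    nearest = sorted(range(len(thetas)), key=dists.__getitem__)[:k]
    delta = dists[nearest[-1]]  # tolerance: k-th smallest distance
    if delta == 0:
        raise ValueError("tolerance is zero; increase accept_frac")
    xs, ys, ws = [], [], []
    for i in nearest:
        w = 1.0 - (dists[i] / delta) ** 2  # Epanechnikov weight
        if w > 0:
            xs.append(stats[i] - s_obs)
            ys.append(thetas[i])
            ws.append(w)
    # Weighted least-squares fit of theta = a + b * (s - s_obs)
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    adjusted = [y - b * x for x, y in zip(xs, ys)]  # project each draw onto s = s_obs
    return adjusted, ws


# Hypothetical, noise-free simulator s = 2 * theta, observed statistic 1.0
prior_draws = [i / 1000 for i in range(1000)]
simulated = [2.0 * t for t in prior_draws]
adjusted, weights = abc_local_linear(prior_draws, simulated, s_obs=1.0)
posterior_mean = sum(w * a for w, a in zip(weights, adjusted)) / sum(weights)
```

With a noise-free linear simulator, every adjusted draw collapses onto the value implied by the observed statistic. With a real simulator, the adjusted draws approximate the posterior sample.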
Characterizations of linear sufficient statistics

NASA Technical Reports Server (NTRS)

Peters, B. C., Jr.; Redner, R.; Decell, H. P., Jr.

1977-01-01

A surjective bounded linear operator T from a Banach space X to a Banach space Y must be a sufficient statistic for a dominated family of probability measures defined on the Borel sets of X. These results were applied to characterize linear sufficient statistics for families of the exponential type, including as special cases the Wishart and multivariate normal distributions. The latter result was used to establish precisely which procedures for sampling from a normal population have the property that the sample mean is a sufficient statistic.
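A textbook special case, which is not the authors' general Banach-space argument, shows how such a condition on the sampling procedure can arise. Consider a normal sample with unknown common mean and known covariance, and apply the factorization theorem:

```latex
x \sim N_n(\mu\mathbf{1}, \Sigma), \qquad \Sigma \text{ known, positive definite}

f(x;\mu) \;\propto\;
\underbrace{\exp\!\left(-\tfrac{1}{2}\, x^{\top}\Sigma^{-1}x\right)}_{h(x)}
\;\cdot\;
\underbrace{\exp\!\left(\mu\,\mathbf{1}^{\top}\Sigma^{-1}x
  - \tfrac{1}{2}\,\mu^{2}\,\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1}\right)}_{g_{\mu}\left(\mathbf{1}^{\top}\Sigma^{-1}x\right)}

\Sigma^{-1}\mathbf{1} = c\,\mathbf{1}
\;\Longrightarrow\;
\mathbf{1}^{\top}\Sigma^{-1}x = c\,n\,\bar{x}
```

So the linear statistic 1ᵀΣ⁻¹x is sufficient. When Σ⁻¹1 is proportional to 1, this statistic is a multiple of the sample mean, which is then also sufficient. Two covariance structures give this proportionality: independent sampling (Σ = σ²I) and equicorrelated sampling.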
Classification image analysis: estimation and statistical inference for two-alternative forced-choice experiments

NASA Technical Reports Server (NTRS)

Abbey, Craig K.; Eckstein, Miguel P.

2002-01-01

We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.
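Under a linear filter model, the classification image should recover the observer's filter weights. The sketch below illustrates this in a 2AFC task with a simulated observer, using one common estimator: the mean noise-field difference (signal alternative minus noise alternative) on correct trials, minus the same mean on incorrect trials. Everything here is an assumption for illustration and may differ in detail from the procedure in the paper:

- the observer model, with Gaussian pixel noise and Gaussian internal noise;
- the estimator's exact form;
- the template and the trial count.

```python
import math
import random


def simulate_2afc(template, signal, n_trials, internal_sd=1.0, seed=0):
    """Hypothetical linear observer: choose the alternative with the larger filter response."""
    rng = random.Random(seed)
    dim = len(template)
    diffs, correct = [], []
    for _ in range(n_trials):
        n_sig = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        n_noise = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        r_sig = sum(w * (s + n) for w, s, n in zip(template, signal, n_sig)) + rng.gauss(0.0, internal_sd)
        r_noise = sum(w * n for w, n in zip(template, n_noise)) + rng.gauss(0.0, internal_sd)
        diffs.append([a - b for a, b in zip(n_sig, n_noise)])
        correct.append(r_sig > r_noise)
    return diffs, correct


def classification_image(diffs, correct):
    """Mean noise difference on correct trials minus mean on incorrect trials."""
    dim = len(diffs[0])

    def mean_where(flag):
        rows = [d for d, c in zip(diffs, correct) if c == flag]
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    return [a - b for a, b in zip(mean_where(True), mean_where(False))]


template = [1, 2, 3, 4, 4, 3, 2, 1]     # hypothetical 1-D filter profile
signal = [0.15 * w for w in template]   # hypothetical signal contrast
diffs, correct = simulate_2afc(template, signal, n_trials=20000, seed=0)
estimate = classification_image(diffs, correct)
```

With enough trials, the estimate is proportional to the template up to sampling noise. This is the sense in which the classification image estimates the filter weights.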

VizieR Online Data Catalog: Supernova matter EOS (Buyukcizmeci+, 2014)

NASA Astrophysics Data System (ADS)

Buyukcizmeci, N.; Botvina, A. S.; Mishustin, I. N.

2017-03-01

The Statistical Model for Supernova Matter (SMSM) was developed in Botvina & Mishustin (2004, PhLB, 584, 233; 2010, NuPhA, 843, 98) as a direct generalization of the Statistical Multifragmentation Model (SMM; Bondorf et al. 1995, PhR, 257, 133). We treat supernova matter as a mixture of nuclear species, electrons, and photons in statistical equilibrium. The SMSM EOS tables cover the following ranges of control parameters: 1. Temperature: T = 0.2–25 MeV, giving 35 T values. 2. Electron fraction: Ye = 0.02–0.56 on a linear mesh with step ΔYe = 0.02, giving 28 Ye values; Ye is equal to the total proton fraction Xp because of charge neutrality. 3. Baryon number density fraction: ρ/ρ0 = 10^-8 to 0.32, giving 31 ρ/ρ0 values. (2 data files).
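As a quick arithmetic check of the grid sizes quoted above, the sketch below builds the three parameter meshes. Only the Ye mesh is stated to be linear (step 0.02, 28 values). The logarithmic spacing for T and ρ/ρ0 is our assumption, not taken from the catalog description, and the helper names are hypothetical.

```python
import math


def linear_grid(start, step, count):
    return [start + i * step for i in range(count)]


def log_grid(lo, hi, count):
    # Assumed log-uniform spacing between lo and hi (inclusive)
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + i * (b - a) / (count - 1)) for i in range(count)]


ye = linear_grid(0.02, 0.02, 28)            # stated: linear mesh, step 0.02
temperature = log_grid(0.2, 25.0, 35)       # assumption: log spacing, MeV
density_ratio = log_grid(1e-8, 0.32, 31)    # assumption: log spacing
n_points = len(temperature) * len(ye) * len(density_ratio)
print(n_points)  # 35 * 28 * 31 = 30380 table nodes
```

Whatever the actual T and density spacing, the node count follows from the stated numbers of values alone.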
2-Point microstructure archetypes for improved elastic properties

NASA Astrophysics Data System (ADS)

Adams, Brent L.; Gao, Xiang

2004-01-01

Rectangular models of material microstructure are described by their 1- and 2-point (spatial) correlation statistics of placement of local state. In the procedure described here the local state space is described in discrete form; and the focus is on placement of local state within a finite number of cells comprising rectangular models. It is illustrated that effective elastic properties (generalized Hashin–Shtrikman bounds) can be obtained that are linear in components of the correlation statistics. Within this framework the concept of an eigen-microstructure within the microstructure hull is useful. Given the practical innumerability of the microstructure hull, however, we introduce a method for generating a sequence of archetypes of eigen-microstructure, from the 2-point correlation statistics of local state, assuming that the 1-point statistics are stationary. The method is illustrated by obtaining an archetype for an imaginary two-phase material where the objective is to maximize the combination $C^{*}_{xxxx} + C^{*}_{xyxy}$.
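For a discrete two-phase rectangular model, a 2-point correlation statistic is the fraction of cell pairs at a given separation vector that both hold a chosen local state. The sketch below computes it directly. It assumes periodic boundaries, a choice that may differ from the paper's rectangular-cell conventions, and the function name is hypothetical.

```python
def two_point_correlation(grid, phase, dx, dy):
    """Fraction of cells (x, y) such that (x, y) and (x + dx, y + dy) are both
    in `phase`, with periodic wrap-around (assumed boundary convention)."""
    ny, nx = len(grid), len(grid[0])
    hits = 0
    for y in range(ny):
        for x in range(nx):
            if grid[y][x] == phase and grid[(y + dy) % ny][(x + dx) % nx] == phase:
                hits += 1
    return hits / (nx * ny)


# 4 x 4 checkerboard of phases 0 and 1
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
f00 = two_point_correlation(checker, 1, 0, 0)  # zero separation: volume fraction
f10 = two_point_correlation(checker, 1, 1, 0)  # nearest neighbours never match
f11 = two_point_correlation(checker, 1, 1, 1)  # diagonal neighbours always match
```

At zero separation, the 2-point statistic reduces to the 1-point statistic (the volume fraction). This is one way to sanity-check a set of correlation statistics.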
Characterizing driver-response relationships in marine pelagic ecosystems for improved ocean management.

PubMed

Hunsicker, Mary E; Kappel, Carrie V; Selkoe, Kimberly A; Halpern, Benjamin S; Scarborough, Courtney; Mease, Lindley; Amrhein, Alisan

2016-04-01

Scientists and resource managers often use methods and tools that assume ecosystem components respond linearly to environmental drivers and human stressors. However, a growing body of literature demonstrates that many relationships are non-linear, such that small changes in a driver prompt a disproportionately large ecological response. We aim to provide a comprehensive assessment of the relationships between drivers and ecosystem components to identify where and when non-linearities are likely to occur. We focused our analyses on one of the best-studied marine systems, pelagic ecosystems, which allowed us to apply robust statistical techniques to a large pool of previously published studies. In this synthesis, we (1) conduct a wide literature review on single driver-response relationships in pelagic systems, (2) use statistical models to identify the degree of non-linearity in these relationships, and (3) assess whether general patterns exist in the strengths and shapes of non-linear relationships across drivers. Overall, we found that non-linearities are common in pelagic ecosystems, comprising at least 52% of all driver-response relationships. This is likely an underestimate, as papers with higher-quality data and analytical approaches reported non-linear relationships at a higher frequency (on average 11% more). Consequently, in the absence of evidence for a linear relationship, it is safer to assume a relationship is non-linear. Strong non-linearities can lead to greater ecological and socioeconomic consequences if they are unknown (and/or unanticipated), but if known they may provide clear thresholds to inform management targets. In pelagic systems, strongly non-linear relationships are often driven by climate and trophodynamic variables but are also associated with local stressors, such as overfishing and pollution, that can be more easily controlled by managers.
Even when marine resource managers cannot influence ecosystem change, they can use information about threshold responses to guide how other stressors are managed and to adapt to new ocean conditions. As methods to detect and reduce uncertainty around threshold values improve, managers will be able to better understand and account for ubiquitous non-linear relationships.
A quasi-likelihood approach to non-negative matrix factorization

PubMed Central

Devarajan, Karthik; Cheung, Vincent C.K.

2017-01-01

A unified approach to non-negative matrix factorization based on the theory of generalized linear models is proposed. This approach embeds a variety of statistical models, including the exponential family, within a single theoretical framework and provides a unified view of such factorizations from the perspective of quasi-likelihood. Using this framework, a family of algorithms for handling signal-dependent noise is developed and its convergence proven using the Expectation-Maximization algorithm. In addition, a measure to evaluate the goodness-of-fit of the resulting factorization is described. The proposed methods allow modeling of non-linear effects via appropriate link functions and are illustrated using an application in biomedical signal processing. PMID:27348511
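The best-known member of this family is the Poisson (Kullback-Leibler) case, which the classic multiplicative updates solve. These updates never increase the divergence. The sketch below is that special case only, not the authors' unified quasi-likelihood algorithms. The function names, random initialization and example matrix are hypothetical.

```python
import math
import random


def _product(W, H):
    return [[sum(W[i][a] * H[a][j] for a in range(len(H))) for j in range(len(H[0]))]
            for i in range(len(W))]


def kl_divergence(V, WH):
    """Generalized KL divergence D(V || WH), with 0 * log 0 taken as 0."""
    total = 0.0
    for v_row, w_row in zip(V, WH):
        for v, w in zip(v_row, w_row):
            total += (v * math.log(v / w) if v > 0 else 0.0) - v + w
    return total


def nmf_poisson(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative updates for V ~ W H under the Poisson / KL objective."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(m)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(rank)]
    history = []
    for _ in range(n_iter):
        WH = _product(W, H)  # H update uses the current reconstruction
        for a in range(rank):
            denom = sum(W[i][a] for i in range(m))
            for j in range(n):
                num = sum(W[i][a] * V[i][j] / (WH[i][j] + eps) for i in range(m))
                H[a][j] *= num / (denom + eps)
        WH = _product(W, H)  # W update uses the refreshed reconstruction
        for i in range(m):
            for a in range(rank):
                denom = sum(H[a][j] for j in range(n))
                num = sum(H[a][j] * V[i][j] / (WH[i][j] + eps) for j in range(n))
                W[i][a] *= num / (denom + eps)
        history.append(kl_divergence(V, _product(W, H)))
    return W, H, history


V = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1], [1, 1, 1, 1, 1]]
W, H, history = nmf_poisson(V, rank=2)
```

Other exponential-family choices would change the update weights. The paper's framework is meant to cover exactly this kind of signal-dependent noise.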
General Multivariate Linear Modeling of Surface Shapes Using SurfStat

PubMed Central

Chung, Moo K.; Worsley, Keith J.; Nacewicz, Brendon M.; Dalton, Kim M.; Davidson, Richard J.

2010-01-01

Although there are many imaging studies on traditional ROI-based amygdala volumetry, there are very few studies on modeling amygdala shape variations. This paper presents a unified computational and statistical framework for modeling amygdala shape variations in a clinical population. The weighted spherical harmonic representation is used to parameterize, smooth, and normalize amygdala surfaces. The representation is subsequently used as the input to multivariate linear models that account for nuisance covariates such as age and brain size differences, using the SurfStat package, which completely avoids the complexity of specifying design matrices. The methodology has been applied to quantify abnormal local amygdala shape variations in 22 high-functioning autistic subjects. PMID:20620211
Accurate ocean bottom seismometer positioning method inspired by multilateration technique

USGS Publications Warehouse

Benazzouz, Omar; Pinheiro, Luis M.; Matias, Luis M. A.; Afilhado, Alexandra; Herold, Daniel; Haines, Seth S.

2018-01-01

The positioning of ocean bottom seismometers (OBS) is a key step in the processing flow of OBS data, especially for self-pop-up OBS instruments. Using first arrivals from airgun shots, rather than relying on the acoustic transponders mounted on the OBS, is becoming common practice and generally leads to more accurate positioning because of the statistics provided by a large number of shots. In this paper, a linearization of the OBS positioning problem via the multilateration technique is discussed. The linear solution solves jointly for the average water-layer velocity and the OBS position, using only shot locations and first-arrival times as input data.
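One way to make the problem linear is to expand each travel-time equation, with these conventions:

- (x, y, z) is the OBS position;
- v is the water velocity;
- (x_i, y_i) are the shot locations at the sea surface;
- t_i are the first-arrival times.

Each equation (x_i - x)² + (y_i - y)² + z² = v² t_i² then becomes linear in four unknowns: x, y, K = x² + y² + z², and w = v². The sketch below solves this system by least squares. It is our own formulation, not necessarily the authors', and it assumes:

- straight rays through a constant-velocity water layer;
- shots exactly at the sea surface;
- error-free travel times.

The function names and synthetic geometry are hypothetical.

```python
import math


def _solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol


def locate_obs(shots, times):
    """Least-squares solution of -2 x_i x - 2 y_i y + K - t_i^2 w = -(x_i^2 + y_i^2)."""
    rows = [[-2.0 * xi, -2.0 * yi, 1.0, -t * t] for (xi, yi), t in zip(shots, times)]
    rhs = [-(xi * xi + yi * yi) for xi, yi in shots]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    Atb = [sum(r[i] * bi for r, bi in zip(rows, rhs)) for i in range(4)]
    x, y, K, w = _solve(AtA, Atb)
    depth = math.sqrt(max(K - x * x - y * y, 0.0))  # z recovered from K
    return x, y, depth, math.sqrt(w)


# Synthetic test geometry (km, km/s): 25 shots on a surface grid above the OBS
true_x, true_y, true_depth, true_v = 0.1, -0.05, 2.0, 1.5
shots = [(sx, sy) for sx in (-3.0, -1.5, 0.0, 1.5, 3.0) for sy in (-3.0, -1.5, 0.0, 1.5, 3.0)]
times = [math.sqrt((sx - true_x) ** 2 + (sy - true_y) ** 2 + true_depth ** 2) / true_v
         for sx, sy in shots]
estimate = locate_obs(shots, times)
```

Working in kilometres and seconds keeps the normal equations well scaled. With real picks, the least-squares residuals also give a rough quality measure for the solution.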
Meta-analysis for the comparison of two diagnostic tests to a common gold standard: A generalized linear mixed model approach.

PubMed

Hoyer, Annika; Kuss, Oliver

2018-05-01

Meta-analysis of diagnostic studies is still a rapidly developing area of biostatistical research. In particular, there is increasing interest in methods to compare different diagnostic tests to a common gold standard. Restricting attention to the case of two diagnostic tests, the parameters of interest in these meta-analyses are the differences in sensitivity and specificity (with their corresponding confidence intervals) between the two diagnostic tests, while accounting for the various associations across single studies and between the two tests. We propose statistical models with a quadrivariate response (where sensitivity of test 1, specificity of test 1, sensitivity of test 2, and specificity of test 2 are the four responses) as a sensible approach to this task. A quadrivariate generalized linear mixed model naturally generalizes the standard bivariate model of meta-analysis for a single diagnostic test. If information on several thresholds of the tests is available, the quadrivariate model can be further generalized to compare full receiver operating characteristic (ROC) curves. We illustrate our model with an example in which two screening methods for the diagnosis of type 2 diabetes are compared.
Analysis of statistical and standard algorithms for detecting muscle onset with surface electromyography

PubMed Central

Tweedell, Andrew J.; Haynes, Courtney A.

2017-01-01

The timing of muscle activity is a commonly applied analytic method to understand how the nervous system controls movement. This study systematically evaluates six classes of standard and statistical algorithms to determine muscle onset in both experimental surface electromyography (EMG) and simulated EMG with a known onset time. Eighteen participants had EMG collected from the biceps brachii and vastus lateralis while performing a biceps curl or knee extension, respectively. Three established methods and three statistical methods for EMG onset were evaluated. Linear envelope, Teager-Kaiser energy operator + linear envelope and sample entropy were the established methods evaluated, while general time series mean/variance, sequential and batch processing of parametric and nonparametric tools, and Bayesian changepoint analysis were the statistical techniques used. Visual EMG onset (experimental data) and objective EMG onset (simulated data) were compared with algorithmic EMG onset via root mean square error and linear regression models for stepwise elimination of inferior algorithms. The top algorithms for both data types were analyzed for their mean agreement with the gold standard onset and evaluation of 95% confidence intervals. The top algorithms were all Bayesian changepoint analysis iterations where the parameter of the prior (p0) was zero. The best performing Bayesian algorithms were p0 = 0 and a posterior probability for onset determination at 60–90%. While existing algorithms performed reasonably, the Bayesian changepoint analysis methodology provides greater reliability and accuracy when determining the singular onset of EMG activity in a time series. Further research is needed to determine if this class of algorithms performs equally well when the time series has multiple bursts of muscle activity. PMID:28489897
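Two of the established pipelines named above can be sketched as follows:

1. Apply the discrete Teager-Kaiser energy operator, ψ[n] = x[n]² − x[n−1]x[n+1].
2. Take a linear envelope (full-wave rectification plus moving average).
3. Apply a baseline threshold rule.

This is an illustrative sketch, not the study's implementation. The following are all assumptions chosen for the example:

- the trailing moving-average window;
- the threshold of mean + h·SD of a quiet baseline;
- the minimum run length;
- the synthetic signal.

The Bayesian changepoint method that performed best is not shown.

```python
import math
import random


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    psi = [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return [psi[0]] + psi + [psi[-1]]  # pad the two endpoints with their neighbours


def linear_envelope(x, window):
    """Full-wave rectification followed by a trailing moving average."""
    rect = [abs(v) for v in x]
    out, acc = [], 0.0
    for i, v in enumerate(rect):
        acc += v
        if i >= window:
            acc -= rect[i - window]
        out.append(acc / min(i + 1, window))
    return out


def detect_onset(envelope, baseline_len, h=5.0, min_run=25):
    """First index after the baseline where the envelope stays above
    mean + h * SD of the baseline for at least min_run consecutive samples."""
    base = envelope[:baseline_len]
    mu = sum(base) / len(base)
    sd = math.sqrt(sum((b - mu) ** 2 for b in base) / (len(base) - 1))
    threshold = mu + h * sd
    run = 0
    for i in range(baseline_len, len(envelope)):
        run = run + 1 if envelope[i] > threshold else 0
        if run >= min_run:
            return i - min_run + 1
    return None


# Synthetic record: quiet baseline, then a burst of activity starting at sample 600
rng = random.Random(1)
emg = [rng.gauss(0.0, 0.05) for _ in range(600)] + [rng.gauss(0.0, 1.0) for _ in range(400)]
onset = detect_onset(linear_envelope(teager_kaiser(emg), 50), baseline_len=400)
```

Dropping the `teager_kaiser` call gives the plain linear-envelope method. Comparing the two on simulated data with a known onset mirrors the evaluation design described above.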
Regression modeling of ground-water flow

USGS Publications Warehouse

Cooley, R.L.; Naff, R.L.

1985-01-01

Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
[Comparison of application of Cochran-Armitage trend test and linear regression analysis for rate trend analysis in epidemiology study].

PubMed

Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H

2017-05-10

We described the time trend in the incidence rate of acute myocardial infarction (AMI) in Tianjin from 1999 to 2013 using the Cochran-Armitage trend (CAT) test and linear regression analysis, and compared the results. Based on the actual population, the CAT test had much stronger statistical power than linear regression analysis for both the overall incidence trend and age-specific incidence trends (Cochran-Armitage trend P value
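For reference, the Cochran-Armitage trend test compares case counts r_i out of n_i at risk across ordered groups with scores s_i (for example, calendar years). Its statistic is

z = Σ s_i (r_i − n_i p̄) / √( p̄(1 − p̄) [Σ n_i s_i² − (Σ n_i s_i)² / N] ).

The sketch below uses this common form without a continuity correction. It is not the study's code, and the counts are hypothetical.

```python
import math


def cochran_armitage(cases, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions."""
    if scores is None:
        scores = list(range(len(cases)))  # equally spaced scores, e.g. years
    n = sum(totals)
    p_bar = sum(cases) / n
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, cases, totals))
    s1 = sum(m * s for m, s in zip(totals, scores))
    s2 = sum(m * s * s for m, s in zip(totals, scores))
    var = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n)
    z = t / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p_value


# Hypothetical counts: cases per 100 people at risk over three periods
z, p_value = cochran_armitage([10, 20, 30], [100, 100, 100])
```

With the whole population at risk in each period entering as n_i, the test uses every individual rather than a single rate per year. This is a plausible reason for its greater power relative to regressing the annual rates on time.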
The use of generalized linear models and generalized estimating equations in bioarchaeological studies.

PubMed

Nikita, Efthymia

2014-03-01

The current article explores whether generalized linear models (GLM) and generalized estimating equations (GEE) can be used in place of conventional statistical analyses in the study of ordinal data that code an underlying continuous variable, like entheseal changes. The analysis of artificial data and ordinal data expressing entheseal changes in archaeological North African populations gave the following results. Parametric and nonparametric tests give convergent results, particularly for P values <0.1, irrespective of whether the underlying variable is normally distributed, provided that the samples involved in the tests are of approximately equal size. If this prerequisite holds and the samples also have equal variances, analysis of covariance may be adopted. GLM are not subject to these constraints and give results that converge with those obtained from all nonparametric tests. Therefore, they can be used instead of traditional tests, as they give the same amount of information, with the added advantage of allowing the study of the simultaneous impact of multiple predictors and their interactions and the modeling of the experimental data. However, GLM should be replaced by GEE for the study of bilateral asymmetry and in general when paired samples are tested, because GEE are appropriate for correlated data. Copyright © 2013 Wiley Periodicals, Inc.
Dose and dose rate extrapolation factors for malignant and non-malignant health endpoints after exposure to gamma and neutron radiation.

PubMed

Tran, Van; Little, Mark P

2017-11-01

Murine experiments were conducted at the JANUS reactor in Argonne National Laboratory from 1970 to 1992 to study the effect of acute and protracted radiation dose from whole-body exposure to gamma rays and fission neutrons. The present study reports the reanalysis of the JANUS data on 36,718 mice, of which 16,973 mice were irradiated with neutrons, 13,638 were irradiated with gamma rays, and 6107 were controls. Mice were mostly Mus musculus, but one experiment used Peromyscus leucopus. For both types of radiation exposure, a Cox proportional hazards model was used, with age as the timescale and stratification on sex and experiment. The optimal model was one with linear and quadratic terms in cumulative lagged dose, with adjustments to both linear and quadratic dose terms for low-dose rate irradiation (<5 mGy/h) and with adjustments to the dose for age at exposure and sex. After gamma-ray exposure there is significant non-linearity (generally with upward curvature) for all tumours and for lymphoreticular, respiratory, connective tissue and gastrointestinal tumours, and also for all non-tumour, other non-tumour, non-malignant pulmonary and non-malignant renal diseases (p < 0.001). Associated with this, the low-dose extrapolation factor, which measures the overestimation of low-dose risk resulting from linear extrapolation, is significantly elevated for lymphoreticular tumours, 1.16 (95% CI 1.06, 1.31), and is also elevated for a number of non-malignant endpoints, specifically all non-tumour diseases, 1.63 (95% CI 1.43, 2.00), non-malignant pulmonary disease, 1.70 (95% CI 1.17, 2.76) and other non-tumour diseases, 1.47 (95% CI 1.29, 1.82). However, for a larger group of malignant endpoints the low-dose extrapolation factor is significantly less than 1 (implying downward curvature), with central estimates generally ranging from 0.2 to 0.8, in particular for tumours of the respiratory system, vasculature, ovary, kidney/urinary bladder and testis. 
For neutron exposure most endpoints, malignant and non-malignant, show downward curvature in the dose response, and for most endpoints this is statistically significant (p < 0.05). Associated with this, the low-dose extrapolation factor associated with neutron exposure is generally statistically significantly less than 1 for most malignant and non-malignant endpoints, with central estimates mostly in the range 0.1-0.9. In contrast to the situation at higher dose rates, there are statistically non-significant decreases of risk per unit dose at gamma dose rates of less than or equal to 5 mGy/h for most malignant endpoints, and generally non-significant increases in risk per unit dose at gamma dose rates ≤5 mGy/h for most non-malignant endpoints. Associated with this, the dose-rate extrapolation factor, the ratio of high dose-rate to low dose-rate (≤5 mGy/h) gamma dose response slopes, for many tumour sites is in the range 1.2-2.3, albeit not statistically significantly elevated from 1, while for most non-malignant endpoints the gamma dose-rate extrapolation factor is less than 1, with most estimates in the range 0.2-0.8. After neutron exposure there are non-significant indications of lower risk per unit dose at dose rates ≤5 mGy/h compared to higher dose rates for most malignant endpoints, and for all tumours (p = 0.001), and respiratory tumours (p = 0.007) this reduction is conventionally statistically significant; for most non-malignant outcomes risks per unit dose non-significantly increase at lower dose rates. Associated with this, the neutron dose-rate extrapolation factor is less than 1 for most malignant and non-malignant endpoints, in many cases statistically significantly so, with central estimates mostly in the range 0.0-0.2.
Forecasting volatility with neural regression: a contribution to model adequacy.

PubMed

Refenes, A N; Holt, W T

2001-01-01

Neural nets' usefulness for forecasting is limited by problems of overfitting and the lack of rigorous procedures for model identification, selection and adequacy testing. This paper describes a methodology for neural model misspecification testing. We introduce a generalization of the Durbin-Watson statistic for neural regression and discuss the general issues of misspecification testing using residual analysis. We derive a generalized influence matrix for neural estimators which enables us to evaluate the distribution of the statistic. We deploy Monte Carlo simulation to compare the power of the test for neural and linear regressors. While residual testing is not a sufficient condition for model adequacy, it is nevertheless a necessary condition to demonstrate that the model is a good approximation to the data generating process, particularly as neural-network estimation procedures are susceptible to partial convergence. The work is also an important step toward developing rigorous procedures for neural model identification, selection and adequacy testing which have started to appear in the literature. We demonstrate its applicability in the nontrivial problem of forecasting implied volatility innovations using high-frequency stock index options. Each step of the model building process is validated using statistical tests to verify variable significance and model adequacy with the results confirming the presence of nonlinear relationships in implied volatility innovations.
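The statistic being generalized is the classical Durbin-Watson statistic, DW = Σ(e_t − e_{t−1})² / Σ e_t², computed on regression residuals e_t. Values near 2 indicate no first-order autocorrelation. Values toward 0 or 4 indicate positive or negative autocorrelation, respectively.

The sketch below is only the classical linear-regression form. The paper's neural generalization, with its generalized influence matrix, is not reproduced, and the residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Classical Durbin-Watson statistic for a sequence of regression residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strong negative autocorrelation
dw_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])       # strong positive autocorrelation
```

The statistic only looks at lag-one structure. This is one reason residual testing of this kind is a necessary but not a sufficient check of model adequacy.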
Tracking Electroencephalographic Changes Using Distributions of Linear Models: Application to Propofol-Based Depth of Anesthesia Monitoring.

PubMed

Kuhlmann, Levin; Manton, Jonathan H; Heyse, Bjorn; Vereecke, Hugo E M; Lipping, Tarmo; Struys, Michel M R F; Liley, David T J

2017-04-01

Tracking brain states with electrophysiological measurements often relies on short-term averages of extracted features, which may not adequately capture the variability of brain dynamics. The objective is to test the hypotheses, using anesthesia data, that this limitation can be overcome by tracking distributions of linear models, and that the anesthetic brain-state tracking performance of linear models is comparable to that of a high-performing depth-of-anesthesia monitoring feature. Individuals' brain states are classified by comparing the distribution of linear (autoregressive moving average, ARMA) model parameters, estimated from electroencephalographic (EEG) data with a sliding window, to the distributions of linear model parameters for each brain state. The method is applied to frontal EEG data from 15 subjects undergoing propofol anesthesia, with brain states classified by the Observer's Assessment of Alertness/Sedation (OAA/S) scale. Classification of the OAA/S score was performed using distributions of either ARMA parameters or the benchmark feature, Higuchi fractal dimension. The highest average testing sensitivity of 59% (chance sensitivity: 17%) was found for ARMA(2,1) models, and Higuchi fractal dimension achieved 52%; however, no statistically significant difference was observed. For the same ARMA case, there was no statistically significant difference if medians were used instead of distributions (sensitivity: 56%). The model-based distribution approach is not necessarily more effective than a median/short-term average approach; however, it performs well compared with a distribution approach based on a high-performing anesthesia monitoring measure. These techniques hold potential for anesthesia monitoring and may be generally applicable for tracking brain states.
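The benchmark feature, Higuchi fractal dimension, is computed per EEG window in three steps:

1. Measure the normalized curve length L(k) at several delays k.
2. Average L(k) over the k possible start offsets.
3. Take the slope of log L(k) against log(1/k).

The sketch below follows that standard recipe. The default `kmax`, the function name and the test signal are hypothetical, and the study's settings are not given in the abstract.

```python
import math


def higuchi_fd(x, kmax=8):
    """Higuchi fractal dimension: least-squares slope of log L(k) against log(1/k)."""
    n = len(x)
    log_inv_k, log_length = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # 0-based start offset
            steps = (n - 1 - m) // k
            if steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k]) for i in range(1, steps + 1))
            norm = (n - 1) / (steps * k)  # rescale for the number of increments used
            lengths.append(dist * norm / k)
        log_inv_k.append(math.log(1.0 / k))
        log_length.append(math.log(sum(lengths) / len(lengths)))
    mx = sum(log_inv_k) / len(log_inv_k)
    my = sum(log_length) / len(log_length)
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_inv_k, log_length))
    sxx = sum((a - mx) ** 2 for a in log_inv_k)
    return sxy / sxx


fd_line = higuchi_fd([float(i) for i in range(200)])  # a straight line has dimension 1
```

Irregular signals give values closer to 2. The feature tracks changes in EEG complexity with depth of anesthesia in this way.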
Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors

DTIC Science & Technology

2015-07-15

Long-term effects on cancer survivors’ quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...
Machine Learning-based discovery of closures for reduced models of dynamical systems

NASA Astrophysics Data System (ADS)

Pan, Shaowu; Duraisamy, Karthik

2017-11-01

Despite the successful application of machine learning (ML) in fields such as image processing and speech recognition, only a few attempts have been made to employ ML to represent the dynamics of complex physical systems. Previous attempts mostly focus on parameter calibration or data-driven augmentation of existing models. In this work we present an ML framework to discover closure terms in reduced models of dynamical systems and provide insights into potential problems associated with data-driven modeling. Based on exact closure models for linear systems, we propose a general linear closure framework from the viewpoint of optimization. The framework is based on a trapezoidal approximation of the convolution term. Hyperparameters that need to be determined include the temporal length of the memory effect, the number of sampling points, and the dimensions of the hidden states. To circumvent the explicit specification of the memory effect, a general framework inspired by neural networks is also proposed. We conduct both a priori and a posteriori evaluations of the resulting model on a number of nonlinear dynamical systems. This work was supported in part by AFOSR under the project "LES Modeling of Non-local effects using Statistical Coarse-graining" with Dr. Jean-Luc Cambier as the technical monitor.
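The trapezoidal approximation of the memory (convolution) term can be illustrated in a few lines. In this sketch the closure is assumed to have the form of a memory integral of a kernel against the past resolved state; `memory_length` and `n_points` play the role of the temporal length of the memory effect and the number of sampling points named above. The kernel here is a known function for illustration; in a data-driven closure the sampled kernel values would presumably be the quantities fitted by optimization (our assumption, not a statement of the authors' formulation).

```python
import math
from typing import Callable


def memory_term_trapezoid(
    kernel: Callable[[float], float],
    history: Callable[[float], float],
    t: float,
    memory_length: float,
    n_points: int,
) -> float:
    """Approximate the memory integral  int_0^T kernel(s) * history(t - s) ds
    with the composite trapezoidal rule on n_points equally spaced lags."""
    if n_points < 2:
        raise ValueError("need at least two sampling points")
    h = memory_length / (n_points - 1)  # spacing between sampled lags
    total = 0.0
    for j in range(n_points):
        s = j * h
        weight = 0.5 if j in (0, n_points - 1) else 1.0  # trapezoid end-point weights
        total += weight * kernel(s) * history(t - s)
    return h * total


# Example: exponentially decaying kernel and a constant resolved state,
# for which the exact memory integral over [0, 2] is 1 - exp(-2).
approx = memory_term_trapezoid(lambda s: math.exp(-s), lambda t: 1.0,
                               t=5.0, memory_length=2.0, n_points=201)
exact = 1.0 - math.exp(-2.0)
```

Fewer sampling points make the closure cheaper but coarser, which is the trade-off behind treating `n_points` as a hyperparameter.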
Sea surface temperature anomalies, planetary waves, and air-sea feedback in the middle latitudes

NASA Technical Reports Server (NTRS)

Frankignoul, C.

1985-01-01

Current analytical models for large-scale air-sea interactions in the middle latitudes are reviewed in terms of known sea-surface temperature (SST) anomalies. The scales and strength of different atmospheric forcing mechanisms are discussed, along with the damping and feedback processes controlling the evolution of the SST. Difficulties with effective SST modeling are described in terms of the techniques and results of case studies, numerical simulations of mixed-layer variability and statistical modeling. The relationship between SST and diabatic heating anomalies is considered and a linear model is developed for the response of the stationary atmosphere to the air-sea feedback. The results obtained with linear wave models are compared with the linear model results. Finally, sample data are presented from experiments with general circulation models into which specific SST anomaly data for the middle latitudes were introduced.
Statistical Analysis for Collision-free Boson Sampling.

PubMed

Huang, He-Liang; Zhong, Han-Sen; Li, Tan; Li, Feng-Guang; Fu, Xiang-Qun; Zhang, Shuo; Wang, Xiang; Bao, Wan-Su

2017-11-10

Boson sampling is strongly believed to be intractable for classical computers but solvable with photons in linear optics, which has attracted widespread interest as a rapid way to demonstrate quantum supremacy. However, because its solution is mathematically unverifiable, certifying the experimental results is a major difficulty in boson sampling experiments. Here, we develop a statistical analysis scheme to experimentally certify collision-free boson sampling. Numerical simulations are performed to show the feasibility and practicability of our scheme, and the effects of realistic experimental conditions are also considered, demonstrating that our proposed scheme is experimentally friendly. Moreover, our broad approach is expected to be generally applicable to investigating multi-particle coherent dynamics beyond boson sampling.
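As background for why the output is hard to verify: in ideal boson sampling, the probability of a collision-free output pattern is the squared modulus of the permanent of the submatrix of the interferometer unitary selected by the occupied input and output modes, and computing permanents is #P-hard in general. The sketch below evaluates that probability with Ryser's formula for small cases; it is background only, not the certification scheme proposed in the paper, and the function names are hypothetical.

```python
from itertools import combinations


def permanent_ryser(a):
    """Permanent of a square (possibly complex) matrix via Ryser's inclusion-exclusion formula."""
    n = len(a)
    total = 0
    for r in range(1, n + 1):                    # the empty column subset contributes 0
        for cols in combinations(range(n), r):
            prod = 1
            for row in a:
                prod *= sum(row[c] for c in cols)
            total += (-1) ** r * prod
    return (-1) ** n * total


def collision_free_probability(u, inputs, outputs):
    """|Perm(U_sub)|^2 for single photons in `inputs` detected one per mode in `outputs`."""
    sub = [[u[i][j] for j in inputs] for i in outputs]
    return abs(permanent_ryser(sub)) ** 2
```

For a balanced beam splitter with one photon in each input port, the coincidence probability is zero (the Hong-Ou-Mandel effect), which makes a convenient sanity check.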
On the repeated measures designs and sample sizes for randomized controlled trials.

PubMed

Tango, Toshiro

2016-04-01

For the analysis of longitudinal or repeated measures data, generalized linear mixed-effects models provide a flexible and powerful tool to deal with heterogeneity among subject response profiles. However, the typical statistical design adopted in usual randomized controlled trials is an analysis of covariance (ANCOVA)-type analysis using a predefined pair of "pre-post" data, in which the pre-randomization (baseline) data are used as a covariate for adjustment together with other covariates. The major design issue is then to calculate the sample size, or the number of subjects allocated to each treatment group. In this paper, we propose a new repeated measures design and sample size calculations combined with generalized linear mixed-effects models that depend not only on the number of subjects but also on the number of repeated measures before and after randomization per subject used for the analysis. The main advantages of the proposed design combined with the generalized linear mixed-effects models are that (1) it can easily handle missing data by applying likelihood-based ignorable analyses under the missing-at-random assumption and (2) it may lead to a reduction in sample size compared with the simple pre-post design. The proposed designs and the sample size calculations are illustrated with real data arising from randomized controlled trials. © The Author 2015. Published by Oxford University Press.
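Why extra repeated measures can shrink the required sample size is visible in a simpler setting than the paper's GLMM-based calculation. Assume, for illustration only, that the mean of `n_post` post-randomization measures is compared between groups after adjusting for the mean of `n_pre` baseline measures, and that all measures share a common correlation `rho` (compound symmetry). The residual variance factor is then (1 + (r - 1)ρ)/r - pρ²/(1 + (p - 1)ρ), which plugs into the usual normal-approximation formula. This is not Tango's method; it is a hedged sketch of the same design idea.

```python
import math
from statistics import NormalDist


def ancova_sample_size(delta, sigma, rho, n_pre, n_post, alpha=0.05, power=0.8):
    """Per-group sample size for comparing mean(post) adjusted for mean(pre),
    assuming compound-symmetric correlation rho among all repeated measures."""
    var_post = (1 + (n_post - 1) * rho) / n_post   # Var(mean of post) / sigma^2
    var_pre = (1 + (n_pre - 1) * rho) / n_pre      # Var(mean of pre) / sigma^2
    factor = var_post - rho ** 2 / var_pre          # residual variance after adjusting for pre
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 * sigma ** 2 * factor / delta ** 2)
```

With one pre and one post measure the factor reduces to the familiar ANCOVA value 1 - ρ²; adding post-randomization measures lowers it further when ρ < 1.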
Walking through the statistical black boxes of plant breeding.

PubMed

Xavier, Alencar; Muir, William M; Craig, Bruce; Rainey, Katy Martin

2016-10-01

The main statistical procedures in plant breeding are based on Gaussian processes and can be computed through mixed linear models. Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) like black boxes. Black boxes represent complex pieces of machinery with contents that are not fully understood by the user. The user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.
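One way to "solve mixed models" is through Henderson's mixed model equations, which give the best linear unbiased estimates of fixed effects and best linear unbiased predictions (BLUPs) of random effects in one linear solve. The sketch below assumes independent random effects (covariance `var_u * I`) and known variance components; in pedigree-based or genomic prediction the identity would be replaced by a relationship matrix A, so that `lam * I` becomes `lam * inv(A)`. The toy numbers are made up for illustration.

```python
import numpy as np


def solve_mme(y, X, Z, var_u, var_e):
    """Solve Henderson's mixed model equations for y = X b + Z u + e,
    assuming u ~ N(0, var_u * I) and e ~ N(0, var_e * I).
    Returns (b_hat, u_hat): BLUE of fixed effects and BLUP of random effects."""
    lam = var_e / var_u                      # variance ratio added to the Z'Z block
    p, q = X.shape[1], Z.shape[1]
    lhs = np.block([
        [X.T @ X, X.T @ Z],
        [Z.T @ X, Z.T @ Z + lam * np.eye(q)],
    ])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:p], sol[p:]


# Toy example (made-up numbers): 6 plots, 3 lines, overall mean as the only fixed effect.
y = np.array([10.0, 12.0, 15.0, 17.0, 16.0, 8.0])
line = np.array([0, 0, 1, 1, 1, 2])
X = np.ones((6, 1))
Z = np.eye(3)[line]                          # incidence matrix: plot -> line
b_hat, u_hat = solve_mme(y, X, Z, var_u=4.0, var_e=2.0)
```

The ratio `lam` controls shrinkage: lines with few plots or noisy data have their BLUPs pulled more strongly toward zero.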

Do non-targeted effects increase or decrease low dose risk in relation to the linear-non-threshold (LNT) model?

PubMed Central

Little, M.P.

2011-01-01

In this paper we review the evidence for departure from linearity for malignant and non-malignant disease and in the light of this assess likely mechanisms, and in particular the potential role for non-targeted effects. Excess cancer risks observed in the Japanese atomic bomb survivors and in many medically and occupationally exposed groups exposed at low or moderate doses are generally statistically compatible. For most cancer sites the dose–response in these groups is compatible with linearity over the range observed. The available data on biological mechanisms do not provide general support for the idea of a low-dose threshold or hormesis. This large body of evidence does not suggest, indeed is not statistically compatible with, any very large threshold in dose for cancer, or with possible hormetic effects, and there is little evidence of the sorts of non-linearity in response implied by non-DNA-targeted effects. There are also excess risks of various types of non-malignant disease in the Japanese atomic bomb survivors and in other groups. In particular, elevated risks of cardiovascular disease, respiratory disease and digestive disease are observed in the A-bomb data. In contrast with cancer, there is much less consistency in the patterns of risk between the various exposed groups; for example, radiation-associated respiratory and digestive diseases have not been seen in these other (non-A-bomb) groups. Cardiovascular risks have been seen in many exposed populations, particularly in medically exposed groups, but in contrast with cancer there is much less consistency in risk between studies: risks per unit dose in epidemiological studies vary over at least two orders of magnitude, possibly a result of confounding and effect modification by well-known (but unobserved) risk factors.
In the absence of a convincing mechanistic explanation of epidemiological evidence that is, at present, less than persuasive, a cause-and-effect interpretation of the reported statistical associations for cardiovascular disease is unreliable but cannot be excluded. Inflammatory processes are the most likely mechanism by which radiation could modify the atherosclerotic disease process. If there is to be modification by low doses of ionizing radiation of cardiovascular disease through this mechanism, a role for non-DNA-targeted effects cannot be excluded. PMID:20105434
Findings regarding the relationships between sociodemographic, psychological, comorbidity factors, and functional status, in geriatric inpatients.

PubMed

Capisizu, Ana; Aurelian, Sorina; Zamfirescu, Andreea; Omer, Ioana; Haras, Monica; Ciobotaru, Camelia; Onose, Liliana; Spircu, Tiberiu; Onose, Gelu

2015-01-01

To assess the impact of sociodemographic and comorbidity factors, and of quantified depressive symptoms, on disability in geriatric inpatients. Observational cross-sectional study including 80 elderly patients (16 men, 64 women; mean age 72.48 years; standard deviation 9.95 years) admitted to the Geriatrics Clinic of "St. Luca" Hospital, Bucharest, between May and July 2012. We used the Functional Independence Measure (FIM), the Geriatric Depression Scale (GDS) and an array of sociodemographic and polypathology parameters. Statistical analysis included Wilcoxon and Kruskal-Wallis tests for ordinal variables, linear bivariate correlations, general linear model analysis, and ANOVA. FIM scores were negatively correlated with age (R = -0.301; 95% CI = -0.439 to -0.163; p = 0.007); GDS scores had a statistically significant negative correlation with FIM scores (R = -0.322; 95% CI = -0.324 to -0.052; p = 0.004). A general linear model including other variables (gender, age, provenance, marital status, living conditions, education, and number of chronic illnesses) as factors found living conditions (p = 0.027) and the combination of marital status and gender (p = 0.004) to significantly influence FIM scores. ANOVA showed significant differences in FIM scores stratified by the number of chronic diseases (p = 0.035). Our study demonstrated the negative impact of depression on functional status. Interestingly, education had no influence on FIM scores, whereas living conditions and the combination of marital status and gender had an important impact: patients with living spouses showed better functional scores than divorced or widowed patients. The number of chronic diseases also affected FIM scores, which were lower in patients with significant polypathology. These findings should be considered when designing geriatric rehabilitation programs, especially for home care, including skilled home care.
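Correlations such as those between FIM scores and age or GDS scores are usually reported with a confidence interval obtained from Fisher's z-transformation, z = atanh(r) with standard error 1/sqrt(n - 3). The sketch below shows that common procedure; the abstract does not state how its intervals were computed, so this is not necessarily the study's method, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lower, upper = fisher_z_ci(-0.301, 80)   # interval for the age correlation with n = 80
```

Because the interval is built on the z scale and transformed back, it is generally asymmetric around r.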
Correction of the significance level when attempting multiple transformations of an explanatory variable in generalized linear models

PubMed Central

2013-01-01

Background: In statistical modeling, finding the most favorable coding for an explanatory quantitative variable involves many tests. This process raises multiple testing problems and requires correction of the significance level. Methods: For each coding, a test of the nullity of the coefficient associated with the newly coded variable is computed. The selected coding corresponds to the largest test statistic (or, equivalently, the smallest p-value). In the context of the generalized linear model, Liquet and Commenges (Stat Probability Lett, 71:33–38, 2005) proposed an asymptotic correction of the significance level. This procedure, based on the score test, has been developed for dichotomous and Box-Cox transformations. In this paper, we suggest the use of resampling methods to estimate the significance level for categorical transformations with more than two levels, which by definition involve more than one parameter in the model. The categorical transformation is a more flexible way to explore the unknown shape of the effect between an explanatory and a dependent variable. Results: The simulations we ran in this study showed good performance of the proposed methods. These methods were illustrated using data from a study of the relationship between cholesterol and dementia. Conclusion: The algorithms were implemented in R, and the associated CPMCGLM R package is available on CRAN. PMID:23758852
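The resampling idea can be sketched as a permutation test on the maximum statistic: compute the most extreme association over all candidate codings, then compare it with the distribution of that same maximum when the outcome is shuffled. For brevity this sketch uses dichotomous codings of x at hypothetical cutpoints and the absolute Pearson correlation as a stand-in for the GLM score statistic; it illustrates the principle and is not the CPMCGLM implementation.

```python
import random
import statistics


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 when either variable is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def max_stat(x, y, cutpoints):
    """Most extreme association over all candidate codings (here: dichotomies of x)."""
    return max(abs_corr([1.0 if v > c else 0.0 for v in x], y) for c in cutpoints)


def corrected_p_value(x, y, cutpoints, n_perm=999, seed=0):
    """Permutation p-value for the maximum statistic, accounting for trying many codings."""
    rng = random.Random(seed)
    observed = max_stat(x, y, cutpoints)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)                      # break any x-y association
        if max_stat(x, y_perm, cutpoints) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because every permutation repeats the full search over codings, the resulting p-value is corrected for selecting the best coding, which a naive per-coding p-value is not.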
Statistical Methods for Quality Control of Steel Coils Manufacturing Process using Generalized Linear Models

NASA Astrophysics Data System (ADS)

García-Díaz, J. Carlos

2009-11-01

Fault detection and diagnosis is an important problem in process engineering, as process equipment is subject to malfunctions during operation. Galvanized steel is a value-added product, furnishing effective performance by combining the corrosion resistance of zinc with the strength and formability of steel. In continuous hot-dip galvanizing, fault detection and diagnosis is likewise an important problem, and the increasingly stringent quality requirements of the automotive industry have also demanded ongoing efforts in process control to make the process more robust. When faults occur, they change the relationships among the observed process variables. This work compares different statistical regression models proposed in the literature for estimating the quality of galvanized steel coils on the basis of short time histories. Data for 26 batches were available. Five variables were selected for monitoring the process: the steel strip velocity, four bath temperatures and bath level. The entire data set, consisting of 48 galvanized steel coils, was divided into two sets: the first (training) data set contained 25 conforming coils and the second data set contained 23 nonconforming coils. Logistic regression is a modeling tool in which the dependent variable is categorical; in most applications, the dependent variable is binary. The results show that logistic generalized linear models do provide good estimates of coil quality and can be useful for quality control in the manufacturing process.
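A logistic generalized linear model of the kind used here is typically fitted by iteratively reweighted least squares (IRLS), which for the canonical logit link is the same as Newton-Raphson. The sketch below assumes a design matrix that already includes an intercept column and uses synthetic data with one standardized process variable; it does not use the study's coil data or its variable set.

```python
import numpy as np


def fit_logistic_irls(X, y, max_iter=50, tol=1e-10):
    """Logistic GLM (logit link) fitted by iteratively reweighted least squares,
    which for the canonical link coincides with Newton-Raphson.
    X must already contain an intercept column; y holds 0/1 labels."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                          # IRLS working weights
        score = X.T @ (y - p)                      # gradient of the log-likelihood
        info = X.T @ (X * w[:, None])              # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Synthetic illustration only: one standardized process variable, y = 1 for a nonconforming coil.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))).astype(float)
X = np.column_stack([np.ones_like(x), x])
beta_hat = fit_logistic_irls(X, y)
```

With perfectly separable classes the maximum likelihood estimate does not exist and IRLS diverges, which is worth checking on small quality-control data sets.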
Optimizing the general linear model for functional near-infrared spectroscopy: an adaptive hemodynamic response function approach

PubMed Central

Uga, Minako; Dan, Ippeita; Sano, Toshifumi; Dan, Haruka; Watanabe, Eiju

2014-01-01

An increasing number of functional near-infrared spectroscopy (fNIRS) studies use a general linear model (GLM) approach, which serves as a standard statistical method for functional magnetic resonance imaging (fMRI) data analysis. While fMRI measures only the blood-oxygen-level-dependent (BOLD) signal, fNIRS measures changes in oxy-hemoglobin (oxy-Hb) and deoxy-hemoglobin (deoxy-Hb) signals at a severalfold higher temporal resolution. This suggests the need to adjust the temporal parameters of a GLM for fNIRS signals. We therefore devised a GLM-based method using an adaptive hemodynamic response function (HRF). We sought the optimal temporal parameters to best explain the observed time series data during verbal fluency and naming tasks. The peak delay of the HRF was systematically varied to achieve the best-fit model for the observed oxy- and deoxy-Hb time series data. The optimized peak delay showed different values for each Hb signal and task. When the optimized peak delays were adopted, the deoxy-Hb data yielded activations comparable to those of the oxy-Hb data, with similar statistical power and spatial patterns. The adaptive HRF method could suitably explain the behavior of both Hb parameters during tasks with different cognitive loads over the time course, and thus would serve as an objective method to fully use the temporal structures of all fNIRS data. PMID:26157973
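The adaptive-HRF idea (vary the HRF peak delay and keep the GLM that fits best) can be sketched as a grid search. This sketch assumes a double-gamma HRF whose first gamma has its mode at the chosen peak delay, an undershoot at 16 s with ratio 1/6 (SPM-like defaults, assumed rather than taken from the paper), a single task regressor plus a constant, and a noise-free synthetic signal sampled at 10 Hz.

```python
import math

import numpy as np


def gamma_bump(t, mode):
    """Unit-scale gamma density whose mode equals `mode` (shape = mode + 1)."""
    shape = mode + 1.0
    return np.where(t > 0, t ** (shape - 1) * np.exp(-t) / math.gamma(shape), 0.0)


def double_gamma_hrf(t, peak_delay, undershoot_delay=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF with an adjustable peak delay (assumed parameterization)."""
    return gamma_bump(t, peak_delay) - ratio * gamma_bump(t, undershoot_delay)


def best_peak_delay(signal, stimulus, dt, candidate_delays, hrf_length=32.0):
    """Grid-search the HRF peak delay that minimizes the GLM residual sum of squares."""
    t = np.arange(0.0, hrf_length, dt)
    best_delay, best_rss = None, np.inf
    for delay in candidate_delays:
        regressor = np.convolve(stimulus, double_gamma_hrf(t, delay))[: len(signal)] * dt
        design = np.column_stack([regressor, np.ones(len(signal))])  # HRF regressor + baseline
        beta, *_ = np.linalg.lstsq(design, signal, rcond=None)
        rss = float(np.sum((signal - design @ beta) ** 2))
        if rss < best_rss:
            best_delay, best_rss = delay, rss
    return best_delay


dt = 0.1                                   # 10 Hz sampling (assumed)
n = 600                                    # 60 s of data
stimulus = np.zeros(n)
stimulus[50:150] = 1.0                     # hypothetical 10-s task blocks
stimulus[350:450] = 1.0
t = np.arange(0.0, 32.0, dt)
true_regressor = np.convolve(stimulus, double_gamma_hrf(t, 6.0))[:n] * dt
signal = 0.5 + 2.0 * true_regressor        # noise-free synthetic "Hb" trace
delay = best_peak_delay(signal, stimulus, dt, [3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
```

Running the search separately for oxy-Hb and deoxy-Hb traces is what allows each signal to receive its own optimized delay.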
Streamflow record extension using power transformations and application to sediment transport

NASA Astrophysics Data System (ADS)

Moog, Douglas B.; Whiting, Peter J.; Thomas, Robert B.

1999-01-01

To obtain a representative set of flow rates for a stream, it is often desirable to fill in missing data or extend measurements to a longer time period by correlation to a nearby gage with a longer record. Linear least squares regression of the logarithms of the flows is a traditional and still common technique. However, its purpose is to generate optimal estimates of each day's discharge, rather than the population of discharges, for which it tends to underestimate variance. Maintenance-of-variance-extension (MOVE) equations [Hirsch, 1982] were developed to correct this bias. This study replaces the logarithmic transformation by the more general Box-Cox scaled power transformation, generating a more linear, constant-variance relationship for the MOVE extension. Combining the Box-Cox transformation with the MOVE extension is shown to improve accuracy in estimating order statistics of flow rate, particularly for the nonextreme discharges which generally govern cumulative transport over time. This advantage is illustrated by prediction of cumulative fractions of total bed load transport.
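The MOVE.1 idea is to replace the regression slope r·s_y/s_x with sign(r)·s_y/s_x, so that the extended record keeps the variance of the short record; here the line is fitted after a Box-Cox transformation. This sketch applies one Box-Cox exponent `lam` to both gages (an assumption; the abstract does not say how the exponent is chosen), and `lam = 0` reduces to the traditional logarithmic version.

```python
import math
import statistics


def boxcox(x, lam):
    """Box-Cox power transform; lam = 0 reduces to the natural logarithm."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def inv_boxcox(z, lam):
    """Inverse of the Box-Cox transform (requires lam * z + 1 > 0 when lam != 0)."""
    return math.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


def move1_extend(x_concurrent, y_concurrent, x_new, lam):
    """MOVE.1 record extension carried out in Box-Cox space.
    x_*: flows at the long-record gage; y_concurrent: short-record site flows on the same days."""
    tx = [boxcox(v, lam) for v in x_concurrent]
    ty = [boxcox(v, lam) for v in y_concurrent]
    mx, my = statistics.fmean(tx), statistics.fmean(ty)
    sx, sy = statistics.stdev(tx), statistics.stdev(ty)
    cov_sign = math.copysign(1.0, sum((a - mx) * (b - my) for a, b in zip(tx, ty)))
    slope = cov_sign * sy / sx        # MOVE.1 slope; the OLS slope would be r * sy / sx
    return [inv_boxcox(my + slope * (boxcox(v, lam) - mx), lam) for v in x_new]
```

Applied to the concurrent period itself, MOVE.1 reproduces the mean and standard deviation of the transformed short record exactly, whereas ordinary least squares would shrink the spread by the factor |r|.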
Bayesian inference on risk differences: an application to multivariate meta-analysis of adverse events in clinical trials.

PubMed

Chen, Yong; Luo, Sheng; Chu, Haitao; Wei, Peng

2013-05-01

Multivariate meta-analysis is useful in combining evidence from independent studies that involve several comparisons among groups based on a single outcome. For binary outcomes, the statistical models commonly used for multivariate meta-analysis are multivariate generalized linear mixed-effects models, which assume that the risks, after some transformation, follow a multivariate normal distribution with possible correlations. In this article, we consider an alternative model for multivariate meta-analysis in which the risks are modeled by the multivariate beta distribution proposed by Sarmanov (1966). This model has several attractive features compared with conventional multivariate generalized linear mixed-effects models, including a simple likelihood function, no need to specify a link function, and a closed-form expression for the distribution functions of study-specific risk differences. We investigate the finite-sample performance of this model through simulation studies and illustrate its use with an application to multivariate meta-analysis of adverse events of tricyclic antidepressant treatment in clinical trials.
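In the bivariate case, the Sarmanov beta density multiplies two beta marginals by a linear correction term 1 + ω(p1 - μ1)(p2 - μ2); the marginals stay beta and the correlation equals ω times the product of the marginal standard deviations. The sketch below assumes that two-risk form and shows the constraint on ω that keeps the density nonnegative; the paper's closed-form risk-difference distribution is not reproduced.

```python
import math


def beta_pdf(p, a, b):
    """Beta(a, b) density at p in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(p) + (b - 1.0) * math.log1p(-p))


def omega_bounds(a1, b1, a2, b2):
    """Range of omega keeping 1 + omega*(p1-mu1)*(p2-mu2) >= 0 on the unit square."""
    mu1, mu2 = a1 / (a1 + b1), a2 / (a2 + b2)
    lower = -1.0 / max(mu1 * mu2, (1 - mu1) * (1 - mu2))
    upper = 1.0 / max(mu1 * (1 - mu2), mu2 * (1 - mu1))
    return lower, upper


def sarmanov_beta_pdf(p1, p2, a1, b1, a2, b2, omega):
    """Bivariate beta density in Sarmanov's form: beta marginals times a linear correction."""
    mu1, mu2 = a1 / (a1 + b1), a2 / (a2 + b2)
    return beta_pdf(p1, a1, b1) * beta_pdf(p2, a2, b2) * (1.0 + omega * (p1 - mu1) * (p2 - mu2))


def implied_correlation(a1, b1, a2, b2, omega):
    """Correlation between p1 and p2: omega times the product of marginal standard deviations."""
    var1 = a1 * b1 / ((a1 + b1) ** 2 * (a1 + b1 + 1))
    var2 = a2 * b2 / ((a2 + b2) ** 2 * (a2 + b2 + 1))
    return omega * math.sqrt(var1 * var2)
```

Because the correction term integrates to zero against the product of marginals, no link function or extra normalizing constant is needed, which is part of what keeps the likelihood simple.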
Measuring the individual benefit of a medical or behavioral treatment using generalized linear mixed-effects models.

PubMed

Diaz, Francisco J

2016-10-15

We propose statistical definitions of the individual benefit of a medical or behavioral treatment and of the severity of a chronic illness. These definitions are used to develop a graphical method that can be used by statisticians and clinicians in the data analysis of clinical trials from the perspective of personalized medicine. The method focuses on assessing and comparing individual effects of treatments rather than average effects and can be used with continuous and discrete responses, including dichotomous and count responses. The method is based on new developments in generalized linear mixed-effects models, which are introduced in this article. To illustrate, analyses of data from the Sequenced Treatment Alternatives to Relieve Depression clinical trial of sequences of treatments for depression and data from a clinical trial of respiratory treatments are presented. The estimation of individual benefits is also explained. Copyright © 2016 John Wiley & Sons, Ltd.
Random forests in non-invasive sensorimotor rhythm brain-computer interfaces: a practical and convenient non-linear classifier.

PubMed

Steyrl, David; Scherer, Reinhold; Faller, Josef; Müller-Putz, Gernot R

2016-02-01

There is general agreement in the brain-computer interface (BCI) community that, although non-linear classifiers can provide better results in some cases, linear classifiers are preferable, particularly because non-linear classifiers often involve a number of parameters that must be carefully chosen. However, new non-linear classifiers have been developed over the last decade. One of them is the random forest (RF) classifier. Although popular in other fields of science, RFs are not common in BCI research. In this work, we address three open questions regarding RFs in sensorimotor rhythm (SMR) BCIs: parametrization, online applicability, and performance compared with regularized linear discriminant analysis (LDA). We found that the performance of RFs is constant over a large range of parameter values. We demonstrate, for the first time, that RFs are applicable online in SMR-BCIs. Further, we show in an offline BCI simulation that RFs statistically significantly outperform regularized LDA by about 3%. These results confirm that RFs are practical and convenient non-linear classifiers for SMR-BCIs. Taking into account further properties of RFs, such as independence from feature distributions, maximum margin behavior, and multiclass and advanced data mining capabilities, we argue that RFs should be taken into consideration for future BCIs.
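The baseline classifier here, regularized LDA, can be sketched as two-class LDA whose pooled covariance is shrunk toward a scaled identity before inversion. In this sketch the shrinkage weight `gamma` is fixed by hand and equal class priors are assumed; practical BCI pipelines often choose the shrinkage analytically (for example with a Ledoit-Wolf-type estimate), and the study's exact regularization is not given in the abstract.

```python
import numpy as np


def fit_shrinkage_lda(X0, X1, gamma=0.1):
    """Two-class LDA with covariance shrunk toward a scaled identity (equal priors assumed)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    cov = centered.T @ centered / (len(centered) - 2)      # pooled within-class covariance
    d = cov.shape[0]
    nu = np.trace(cov) / d                                 # average eigenvalue
    cov_shrunk = (1 - gamma) * cov + gamma * nu * np.eye(d)
    w = np.linalg.solve(cov_shrunk, mu1 - mu0)             # discriminant direction
    b = -w @ (mu0 + mu1) / 2                               # threshold halfway between class means
    return w, b


def predict(w, b, X):
    """Return 1 for class 1, 0 for class 0."""
    return (X @ w + b > 0).astype(int)


# Synthetic two-class features (illustration only, not EEG band-power data).
rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, size=(200, 5))
X1 = rng.normal(0.0, 1.0, size=(200, 5)) + 2.0
w, b = fit_shrinkage_lda(X0, X1, gamma=0.1)
acc = (np.sum(predict(w, b, X0) == 0) + np.sum(predict(w, b, X1) == 1)) / 400
```

Shrinkage matters most when the number of features approaches the number of trials, a common situation in BCI calibration sessions.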
Objective assessment of image quality. IV. Application to adaptive optics

PubMed Central

Barrett, Harrison H.; Myers, Kyle J.; Devaney, Nicholas; Dainty, Christopher

2008-01-01

The methodology of objective assessment, which defines image quality in terms of the performance of specific observers on specific tasks of interest, is extended to temporal sequences of images with random point spread functions and applied to adaptive imaging in astronomy. The tasks considered include both detection and estimation, and the observers are the optimal linear discriminant (Hotelling observer) and the optimal linear estimator (Wiener). A general theory of first- and second-order spatiotemporal statistics in adaptive optics is developed. It is shown that the covariance matrix can be rigorously decomposed into three terms representing the effect of measurement noise, random point spread function, and random nature of the astronomical scene. Figures of merit are developed, and computational methods are discussed. PMID:17106464
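For a detection task, the Hotelling observer applies the template w = K⁻¹(ḡ₁ - ḡ₀) to the data, and its detectability is SNR² = Δḡᵀ K⁻¹ Δḡ, where K is the data covariance. The sketch below computes both quantities and, as an assumption for illustration, forms K as the sum of three hypothetical matrices standing in for the measurement-noise, random-PSF, and random-scene terms; the paper's rigorous decomposition is more involved.

```python
import numpy as np


def hotelling_observer(mean_present, mean_absent, cov):
    """Template w = K^{-1} (g1_bar - g0_bar) and SNR = sqrt(delta^T K^{-1} delta)."""
    delta = mean_present - mean_absent
    template = np.linalg.solve(cov, delta)
    snr = float(np.sqrt(delta @ template))
    return template, snr


# Hypothetical 2-pixel example with covariance built from three contributions.
K_noise = np.diag([1.0, 1.0])
K_psf = np.array([[0.5, 0.2], [0.2, 0.5]])
K_scene = np.array([[0.3, 0.1], [0.1, 0.3]])
template, snr = hotelling_observer(np.array([1.0, 0.5]), np.zeros(2), K_noise + K_psf + K_scene)
```

With white noise (K = σ²I) the template reduces to the signal itself and the SNR to ‖Δḡ‖/σ, which is a convenient check.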
Granger-causality maps of diffusion processes.

PubMed

Wahl, Benjamin; Feudel, Ulrike; Hlinka, Jaroslav; Wächter, Matthias; Peinke, Joachim; Freund, Jan A

2016-02-01

Granger causality is a statistical concept devised to reconstruct and quantify predictive information flow between stochastic processes. Although the general concept can be formulated model-free, it is often considered in the framework of linear stochastic processes. Here we show how local linear model descriptions can be employed to extend Granger causality into the realm of nonlinear systems. This novel treatment results in maps that resolve Granger causality in regions of state space. Through examples we provide a proof of concept and illustrate the utility of these maps. Moreover, by integration we convert the local Granger causality into a global measure that yields a consistent picture for a global Ornstein-Uhlenbeck process. Finally, we recover invariance transformations known from the theory of autoregressive processes.
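In the linear framework mentioned above, Granger causality from x to y is commonly measured as the log ratio of two residual variances: predicting y from its own past versus from its own past plus the past of x. The sketch below implements that global linear version with autoregressive order `order = 2` (an arbitrary choice); the local, state-space-resolved maps developed in the paper are not reproduced.

```python
import numpy as np


def granger_index(x, y, order=2):
    """Linear Granger causality from x to y:
    log(residual variance of y | own past) - log(residual variance of y | own past + x's past)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    rows = range(order, len(y))
    own = np.array([y[t - order:t][::-1] for t in rows])
    both = np.array([np.concatenate([y[t - order:t][::-1], x[t - order:t][::-1]]) for t in rows])
    target = y[order:]

    def resid_var(lags):
        design = np.column_stack([lags, np.ones(len(lags))])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ coef)

    return float(np.log(resid_var(own) / resid_var(both)))


# Synthetic coupled AR(1) pair: x drives y, but not the other way round.
rng = np.random.default_rng(0)
n = 3000
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()
```

The index is near zero in the direction without coupling and clearly positive in the driven direction; a local version would fit such regressions separately in regions of state space.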
Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models.

PubMed

Vsevolozhskaya, Olga A; Zaykin, Dmitri V; Barondess, David A; Tong, Xiaoren; Jadhav, Sneha; Lu, Qing

2016-04-01

Recent technological advances have equipped researchers with capabilities that go beyond traditional genotyping of loci known to be polymorphic in a general population. Genetic sequences of study participants can now be assessed directly. This capability has removed the technology-driven bias toward scoring predominantly common polymorphisms and lets researchers reveal a wealth of rare and sample-specific variants. Although the relative contributions of rare and common polymorphisms to trait variation are being debated, researchers need new statistical tools for the simultaneous evaluation of all variants within a region. Several research groups have demonstrated the flexibility and good statistical power of the functional linear model approach. In this work we extend previous developments to allow inclusion of multiple traits and adjustment for additional covariates. Our functional approach is unique in that it provides a nuanced depiction of effects and interactions for the variables in the model by representing them as curves varying over a genetic region. We demonstrate the flexibility and competitive power of our approach by contrasting its performance with commonly used statistical tools, and we illustrate its potential for discovery and characterization of the genetic architecture of complex traits using sequencing data from the Dallas Heart Study. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
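The core of a functional linear model is to treat the genetic effect as a curve β(t) over positions in the region and expand it in a small basis, so that the trait regression needs only a few coefficients regardless of the number of variants. This sketch assumes a single trait, no extra covariates, variant positions rescaled to [0, 1], and a Fourier basis (B-splines are a common alternative); it is a minimal illustration, not the authors' multi-trait model.

```python
import numpy as np


def fourier_basis(positions, n_basis):
    """Columns 1, sin(2*pi*t), cos(2*pi*t), sin(4*pi*t), ... evaluated at positions in [0, 1]."""
    cols = [np.ones_like(positions)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * positions))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * positions))
        k += 1
    return np.column_stack(cols)


def fit_functional_linear_model(genotypes, positions, trait, n_basis=3):
    """Least-squares fit of trait = a + integral G_i(t) beta(t) dt, with beta(t) = sum_k c_k phi_k(t).
    The integral is approximated by an average over equally spaced variant positions."""
    basis = fourier_basis(positions, n_basis)
    W = genotypes @ basis / genotypes.shape[1]     # subject-level functional scores
    design = np.column_stack([np.ones(len(trait)), W])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return coef[0], coef[1:], basis                # intercept, c_k, basis for evaluating beta(t)


# Synthetic noise-free example: 200 subjects, 30 variants coded 0/1/2.
rng = np.random.default_rng(2)
G = rng.integers(0, 3, size=(200, 30)).astype(float)
pos = np.linspace(0.0, 1.0, 30)
c_true = np.array([1.0, -2.0, 0.5])
trait = 0.3 + (G @ fourier_basis(pos, 3) / 30) @ c_true
a_hat, c_hat, basis = fit_functional_linear_model(G, pos, trait)
beta_curve = basis @ c_hat                         # estimated effect curve over the region
```

Evaluating `basis @ c_hat` gives the estimated effect curve, which is the kind of position-varying depiction the abstract describes.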
A Kp-based model of auroral boundaries

NASA Astrophysics Data System (ADS)

Carbary, James F.

2005-10-01

The auroral oval can serve as both a representation and a prediction of space weather on a global scale, so a competent model of the oval as a function of a geomagnetic index could conveniently appraise space weather itself. A simple model of the auroral boundaries is constructed by binning several months of images from the Polar Ultraviolet Imager by Kp index. The pixel intensities are first averaged into magnetic latitude-magnetic local time (MLAT-MLT) bins, and intensity profiles are then derived for each Kp level at 1-hour intervals of MLT. After background correction, the boundary latitudes of each profile are determined at a threshold of 4 photons cm⁻² s⁻¹. The peak locations and peak intensities are also determined. The boundary and peak locations vary linearly with Kp index, and the coefficients of the linear fits are tabulated for each MLT. As a general rule of thumb, the UV intensity peak shifts 1° in magnetic latitude for each increment in Kp. The fits are surprisingly good for Kp < 6 but begin to deteriorate at high Kp because of auroral boundary irregularities and poor statistics. The statistical model allows calculation of the auroral boundaries at most MLTs as a function of Kp and can serve as an approximation to the shape and extent of the statistical oval.
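A model of this form reduces to one least-squares line per MLT bin, latitude = a + b·Kp, stored in a lookup table. The sketch below fits and evaluates such a table; the latitudes are hypothetical values built to follow the 1°-per-Kp rule of thumb, with the feature assumed to move equatorward (to lower latitude) as Kp increases, and the guard at Kp ≥ 6 reflects the reported degradation of the fits.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares intercept and slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def boundary_latitude(coeffs, mlt_hour, kp):
    """Evaluate the tabulated linear model lat = a + b * Kp for one MLT bin."""
    if kp >= 6:
        raise ValueError("linear fits are reported to deteriorate at high Kp")
    a, b = coeffs[mlt_hour]
    return a + b * kp


# Hypothetical peak latitudes (degrees MLAT) for the midnight MLT bin.
kp_levels = [0, 1, 2, 3, 4, 5]
lat_peak = [67.0 - 1.0 * kp for kp in kp_levels]
coeffs = {0: linear_fit(kp_levels, lat_peak)}
```

A full table would hold 24 entries, one per 1-hour MLT bin, each with its own intercept and slope.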
Evaluating the statistical performance of less applied algorithms in classification of worldview-3 imagery data in an urbanized landscape

NASA Astrophysics Data System (ADS)

Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa

2018-03-01

In the recent decade, the analysis of remotely sensed imagery has become one of the most common and widely used procedures in environmental studies, and supervised image classification techniques play a central role in it. Using a high-resolution WorldView-3 image of a mixed urbanized landscape in Iran, three less commonly applied image classification methods (bagged CART, a stochastic gradient boosting model, and a neural network with feature extraction) were tested and compared with two prevalent methods: random forest and support vector machine with a linear kernel. Each method was run ten times, and three validation techniques were used to estimate the accuracy statistics: cross-validation, independent validation, and validation with the total training data. Moreover, ANOVA and Tukey's test were used to assess the statistical significance of differences between the classification methods. In general, the results showed that random forest, with a marginal difference compared with bagged CART and the stochastic gradient boosting model, is the best-performing method, whereas under independent validation there was no significant difference between the performances of the classification methods. Finally, the neural network with feature extraction and the linear support vector machine had better processing speed than the other methods.
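With ten runs per method, the ANOVA step compares the spread of mean accuracies between methods with the run-to-run spread within methods. The sketch below computes the one-way ANOVA F statistic from a list of groups (one list of accuracies per method); converting F to a p-value and running Tukey's post-hoc test would normally be left to a statistics library such as `scipy.stats.f_oneway`, which is not shown here.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy numbers only: three "methods" with three runs each.
f_toy = one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
```

A large F only says that some methods differ; identifying which pairs differ is the job of the Tukey comparison.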
Statistical methods for launch vehicle guidance, navigation, and control (GN&C) system design and analysis

NASA Astrophysics Data System (ADS)

Rose, Michael Benjamin

A novel trajectory and attitude control and navigation analysis tool for powered ascent is developed. The tool is capable of rapid trade-space analysis and is designed to ultimately reduce turnaround time for launch vehicle design, mission planning, and redesign work. It is streamlined to quickly determine trajectory and attitude control dispersions, propellant dispersions, orbit insertion dispersions, and navigation errors and their sensitivities to sensor errors, actuator execution uncertainties, and random disturbances. The tool is developed by applying both Monte Carlo and linear covariance analysis techniques to a closed-loop, launch vehicle guidance, navigation, and control (GN&C) system. The nonlinear dynamics and flight GN&C software models of a closed-loop, six-degree-of-freedom (6-DOF), Monte Carlo simulation are formulated and developed. The nominal reference trajectory (NRT) for the proposed lunar ascent trajectory is defined and generated. The Monte Carlo truth models and GN&C algorithms are linearized about the NRT, the linear covariance equations are formulated, and the linear covariance simulation is developed. The performance of the launch vehicle GN&C system is evaluated using both Monte Carlo and linear covariance techniques and their trajectory and attitude control dispersion, propellant dispersion, orbit insertion dispersion, and navigation error results are validated and compared. Statistical results from linear covariance analysis are generally within 10% of Monte Carlo results, and in most cases the differences are less than 5%. This is an excellent result given the many complex nonlinearities that are embedded in the ascent GN&C problem. Moreover, the real value of this tool lies in its speed, where the linear covariance simulation is 1036.62 times faster than the Monte Carlo simulation. 
Although the application and results presented are for a lunar, single-stage-to-orbit (SSTO), ascent vehicle, the tools, techniques, and mathematical formulations that are discussed are applicable to ascent on Earth or other planets as well as other rocket-powered systems such as sounding rockets and ballistic missiles.
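The speed advantage of linear covariance analysis comes from propagating one covariance matrix, P ← F P Fᵀ + Q, instead of simulating thousands of trajectories. For genuinely linear dynamics the two approaches agree up to Monte Carlo sampling error; in the nonlinear ascent problem, the reported agreement reflects how well the linearization about the reference trajectory holds. The sketch below compares both on a hypothetical two-state position/velocity model.

```python
import numpy as np


def linear_covariance(F, Q, P0, steps):
    """Propagate a state covariance through x_{k+1} = F x_k + w_k, w_k ~ N(0, Q)."""
    P = P0.copy()
    for _ in range(steps):
        P = F @ P @ F.T + Q
    return P


def monte_carlo_covariance(F, Q, P0, steps, n_samples, seed=0):
    """Estimate the same covariance by simulating n_samples independent trajectories."""
    rng = np.random.default_rng(seed)
    d = P0.shape[0]
    x = rng.multivariate_normal(np.zeros(d), P0, size=n_samples)
    for _ in range(steps):
        w = rng.multivariate_normal(np.zeros(d), Q, size=n_samples)
        x = x @ F.T + w                     # apply the dynamics to every sample (row)
    return np.cov(x, rowvar=False)


# Toy position/velocity model with unit time step (hypothetical numbers).
F = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = np.diag([1e-4, 1e-2])
P0 = np.diag([1.0, 0.1])
P_lin = linear_covariance(F, Q, P0, steps=10)
P_mc = monte_carlo_covariance(F, Q, P0, steps=10, n_samples=20000)
```

The Monte Carlo estimate converges only as one over the square root of the sample count, which is why a single linear covariance pass is so much cheaper.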
Generalized functional linear models for gene-based case-control association studies.

PubMed

Fan, Ruzong; Wang, Yifan; Mills, James L; Carter, Tonia C; Lobach, Iryna; Wilson, Alexander F; Bailey-Wilson, Joan E; Weeks, Daniel E; Xiong, Momiao

2014-11-01

By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease datasets. Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. © 2014 WILEY PERIODICALS, INC.
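Rao's efficient score test only requires fitting the model under the null hypothesis. For a logistic model logit P(y = 1) = α + Wc, where W holds the subject-level scores from projecting genotypes onto functional basis functions, the null fit is just the sample proportion, and the statistic is the score vector weighted by the inverse efficient information. The sketch below assumes an intercept-only null model (no additional covariates) and synthetic scores; under H0 the statistic is approximately chi-square with K degrees of freedom (a p-value could be obtained with `scipy.stats.chi2.sf`).

```python
import numpy as np


def logistic_score_test(W, y):
    """Rao efficient score statistic for H0: c = 0 in logit P(y=1) = alpha + W @ c,
    where only the intercept alpha is estimated under H0 (no extra covariates)."""
    y = np.asarray(y, dtype=float)
    p0 = y.mean()                                 # null MLE of P(y = 1)
    Wc = W - W.mean(axis=0)                       # centering profiles out the intercept
    score = Wc.T @ (y - p0)                       # score vector at the null estimate
    info = p0 * (1.0 - p0) * (Wc.T @ Wc)          # efficient information for c
    return float(score @ np.linalg.solve(info, score))  # approx. chi-square(K) under H0


# Synthetic functional scores for 300 subjects and K = 3 basis functions.
rng = np.random.default_rng(3)
W = rng.normal(size=(300, 3))
y_null = (rng.random(300) < 0.4).astype(float)
y_assoc = (rng.random(300) < 1.0 / (1.0 + np.exp(-2.0 * W[:, 0]))).astype(float)
stat_null = logistic_score_test(W, y_null)
stat_assoc = logistic_score_test(W, y_assoc)
```

Adjusting for covariates would replace the simple centering with a projection onto the covariate space of the null model.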
Generalized Functional Linear Models for Gene-based Case-Control Association Studies

PubMed Central

Mills, James L.; Carter, Tonia C.; Lobach, Iryna; Wilson, Alexander F.; Bailey-Wilson, Joan E.; Weeks, Daniel E.; Xiong, Momiao

2014-01-01

By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene are disease-related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease data sets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses. PMID:25203683
VoxelStats: A MATLAB Package for Multi-Modal Voxel-Wise Brain Image Analysis.

PubMed

Mathotaarachchi, Sulantha; Wang, Seqian; Shin, Monica; Pascoal, Tharick A; Benedet, Andrea L; Kang, Min Su; Beaudry, Thomas; Fonov, Vladimir S; Gauthier, Serge; Labbe, Aurélie; Rosa-Neto, Pedro

2016-01-01

In healthy individuals, behavioral outcomes are highly associated with variability in brain regional structure or neurochemical phenotypes. Similarly, in the context of neurodegenerative conditions, neuroimaging reveals that cognitive decline is linked to the magnitude of atrophy, neurochemical declines, or concentrations of abnormal protein aggregates across brain regions. However, modeling the effects of multiple regional abnormalities as determinants of cognitive decline at the voxel level remains largely unexplored by multimodal imaging research, given the high computational cost of estimating regression models for every single voxel from various imaging modalities. VoxelStats is a voxel-wise computational framework designed to overcome these computational limitations and to perform statistical operations on multiple scalar variables and imaging modalities at the voxel level. The VoxelStats package has been developed in MATLAB® and supports imaging formats such as NIfTI-1, ANALYZE, and MINC v2. Prebuilt functions in VoxelStats enable the user to perform voxel-wise general and generalized linear models and mixed effect models with multiple volumetric covariates. Importantly, VoxelStats can recognize scalar values or image volumes as response variables and can accommodate volumetric statistical covariates as well as their interaction effects with other variables. Furthermore, this package includes built-in functionality to perform voxel-wise receiver operating characteristic analysis and paired and unpaired group contrast analysis. VoxelStats was validated by comparing its linear regression functionality with existing toolboxes such as glim_image and RMINC. The validation results were identical to those of the existing methods, and the additional functionality was demonstrated by generating feature case assessments (t-statistics, odds ratio, and true positive rate maps). 
In summary, VoxelStats expands the current methods for multimodal imaging analysis by allowing the estimation of advanced regional association metrics at the voxel level.
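The computational point behind voxel-wise GLMs is that, when every voxel shares one design matrix, a single least-squares solve serves all voxels at once. The sketch below is written in Python with NumPy rather than MATLAB. It assumes a shared design `X` and ordinary least squares. VoxelStats' volumetric covariates make the design differ from voxel to voxel, and this simplification does not cover that case. `voxelwise_ols` is a hypothetical name, not a VoxelStats function.

```python
import numpy as np


def voxelwise_ols(X, Y):
    """Ordinary least squares of every voxel column of Y on one shared design X.

    X: (n_subjects, p) design; Y: (n_subjects, n_voxels) responses.
    Returns coefficients and t-statistics, both shaped (p, n_voxels).
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ (X.T @ Y)                        # one solve reused by every voxel
    resid = Y - X @ B
    sigma2 = (resid**2).sum(axis=0) / (n - p)      # residual variance per voxel
    se = np.sqrt(np.outer(np.diag(XtX_inv), sigma2))
    return B, B / se


rng = np.random.default_rng(0)
n, n_voxels = 40, 1000
X = np.column_stack([np.ones(n), rng.standard_normal(n)])   # intercept + one covariate
Y = rng.standard_normal((n, n_voxels))
Y[:, 0] += 3.0 * X[:, 1]                                     # plant an effect in voxel 0
B, T = voxelwise_ols(X, Y)
```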
Potential pitfalls when denoising resting state fMRI data using nuisance regression.

PubMed

Bright, Molly G; Tench, Christopher R; Murphy, Kevin

2017-07-01

In resting state fMRI, it is necessary to remove signal variance associated with noise sources, leaving cleaned fMRI time-series that more accurately reflect the underlying intrinsic brain fluctuations of interest. This is commonly achieved through nuisance regression, in which a noise model of head motion and physiological processes is fit to the fMRI data in a General Linear Model, and the "cleaned" residuals of this fit are used in further analysis. We examine the statistical assumptions and requirements of the General Linear Model, and whether these are met during nuisance regression of resting state fMRI data. Using toy examples and real data, we show how pre-whitening, temporal filtering and temporal shifting of regressors affect model fit. Based on our own observations, existing literature, and statistical theory, we make the following recommendations when employing nuisance regression: pre-whitening should be applied to achieve valid statistical inference of the noise model fit parameters; temporal filtering should be incorporated into the noise model to best account for changes in degrees of freedom; temporal shifting of regressors, although merited, should be achieved via optimisation and validation of a single temporal shift. We encourage all readers to make simple, practical changes to their fMRI denoising pipeline, and to regularly assess the appropriateness of the noise model used. By negotiating the potential pitfalls described in this paper, and by clearly reporting the details of nuisance regression in future manuscripts, we hope that the field will achieve more accurate and precise noise models for cleaning the resting state fMRI time-series. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
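A minimal sketch of nuisance regression in which the temporal filter is part of the noise model rather than applied separately. Low-frequency cosine (DCT) drift terms sit in the same design matrix as the nuisance regressors, so the residual degrees of freedom account for them. The function name, the choice of a DCT drift basis, and `n_dct` are assumptions for illustration. Pre-whitening and regressor shifting, also discussed above, are not shown.

```python
import numpy as np


def nuisance_regress(Y, nuisance, n_dct=0):
    """Regress nuisance signals and low-frequency drift out of Y in one GLM.

    Y: (T, V) time series; nuisance: (T, k) regressors such as motion parameters.
    n_dct: number of cosine drift terms included as an in-model high-pass filter.
    Returns the residual ("cleaned") series and the residual degrees of freedom.
    """
    T = Y.shape[0]
    t = np.arange(T)
    drift = [np.cos(np.pi * j * (2 * t + 1) / (2 * T)) for j in range(1, n_dct + 1)]
    design = np.column_stack([np.ones(T), nuisance, *drift])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    resid = Y - design @ beta
    dof = T - np.linalg.matrix_rank(design)     # every filter term costs one degree of freedom
    return resid, dof


rng = np.random.default_rng(0)
motion = rng.standard_normal((200, 6))                 # stand-in for six motion parameters
Y = rng.standard_normal((200, 5)) + motion @ rng.standard_normal((6, 5))
clean, dof = nuisance_regress(Y, motion, n_dct=3)
```

Suppose instead the data are filtered on their own and the regressors are left unfiltered. The model and the degrees-of-freedom count then no longer match the data actually being fit.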
Normalization Approaches for Removing Systematic Biases Associated with Mass Spectrometry and Label-Free Proteomics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Callister, Stephen J.; Barry, Richard C.; Adkins, Joshua N.

2006-02-01

Central tendency, linear regression, locally weighted regression, and quantile techniques were investigated for normalization of peptide abundance measurements obtained from high-throughput liquid chromatography-Fourier transform ion cyclotron resonance mass spectrometry (LC-FTICR MS). Arbitrary abundances of peptides were obtained from three sample sets, including a standard protein sample, two Deinococcus radiodurans samples taken from different growth phases, and two mouse striatum samples from control and methamphetamine-stressed mice (strain C57BL/6). The selected normalization techniques were evaluated in both the absence and presence of biological variability by estimating extraneous variability prior to and following normalization. Prior to normalization, replicate runs from each sample set were observed to be statistically different, while following normalization replicate runs were no longer statistically different. Although all techniques reduced systematic bias, assigned ranks among the techniques revealed significant trends. For most LC-FTICR MS analyses, linear regression normalization ranked either first or second among the four techniques, suggesting that this technique was more generally suitable for reducing systematic biases.
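Below is one plausible form of linear regression normalization, sketched under stated assumptions. Abundances are on a log scale with no missing values, and one run is chosen as the reference. Each run is mapped onto the reference scale by inverting a least-squares line fitted against that reference. The abstract does not specify these details, so `linear_regression_normalize` and `ref_index` are hypothetical.

```python
import numpy as np


def linear_regression_normalize(log_abund, ref_index=0):
    """Remove a fitted linear bias from each run (column) relative to a reference run.

    log_abund: (n_peptides, n_runs) log-scale abundances with no missing values.
    """
    ref = log_abund[:, ref_index]
    out = np.empty(log_abund.shape, dtype=float)
    for j in range(log_abund.shape[1]):
        slope, intercept = np.polyfit(ref, log_abund[:, j], 1)   # run_j ~ intercept + slope * ref
        out[:, j] = (log_abund[:, j] - intercept) / slope        # map run j onto the reference scale
    return out


rng = np.random.default_rng(0)
ref = rng.normal(20.0, 2.0, 500)
runs = np.column_stack([
    ref,
    0.5 + 0.95 * ref + rng.normal(0.0, 0.05, 500),   # biased, noisy replicate
    0.3 + 1.1 * ref,                                 # purely biased replicate
])
normalized = linear_regression_normalize(runs)
```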

Evaluating a linearized Euler equations model for strong turbulence effects on sound propagation.

PubMed

Ehrhardt, Loïc; Cheinet, Sylvain; Juvé, Daniel; Blanc-Benon, Philippe

2013-04-01

Sound propagation outdoors is strongly affected by atmospheric turbulence. Under strongly perturbed conditions or over long propagation paths, the sound fluctuations reach their asymptotic behavior; e.g., the intensity variance progressively saturates. The present study evaluates the ability of a numerical propagation model, based on a finite-difference time-domain solution of the linearized Euler equations, to quantitatively reproduce the wave statistics under strong and saturated intensity fluctuations. It continues a previous study in which weak intensity fluctuations were considered. The numerical propagation model is presented and tested with two-dimensional harmonic sound propagation over long paths and strong atmospheric perturbations. The results are compared to the quantitative theoretical or numerical predictions available for the wave statistics, including the log-amplitude variance and the probability density functions of the complex acoustic pressure. The match is excellent for the evaluated source frequencies and all sound fluctuation strengths. Hence, this model captures many aspects of strong atmospheric turbulence effects on sound propagation. Finally, the model results for the intensity probability density function are compared with a standard fit by a generalized gamma function.
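Two of the wave statistics named above can be computed directly from complex pressure samples. The sketch assumes an array of complex pressures at one receiver over many turbulence realizations, plus a reference free-field pressure `p_free`; the reference only shifts the log-amplitude and does not change its variance. As a sanity check, consider full saturation, where the pressure is a circular complex Gaussian. The intensity is then exponentially distributed, so the scintillation index tends to 1 and the log-amplitude variance to π²/24 ≈ 0.41. `wave_statistics` is a hypothetical helper.

```python
import numpy as np


def wave_statistics(p, p_free):
    """Log-amplitude variance and scintillation index from complex pressure samples p."""
    chi = np.log(np.abs(p) / np.abs(p_free))            # log-amplitude fluctuation
    intensity = np.abs(p) ** 2
    scint = intensity.var() / intensity.mean() ** 2     # normalized intensity variance
    return chi.var(), scint


rng = np.random.default_rng(0)
n = 200_000
p_saturated = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
chi_var, scint = wave_statistics(p_saturated, 1.0 + 0.0j)
```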
An M-estimator for reduced-rank system identification.

PubMed

Chen, Shaojie; Liu, Kai; Yang, Yuguang; Xu, Yuting; Lee, Seonjoo; Lindquist, Martin; Caffo, Brian S; Vogelstein, Joshua T

2017-01-15

High-dimensional time-series data from a wide variety of domains, such as neuroscience, are being generated every day. Fitting statistical models to such data, to enable parameter estimation and time-series prediction, is an important computational primitive. Existing methods, however, are unable to cope with the high-dimensional nature of these data, for both computational and statistical reasons. We mitigate both kinds of issues by proposing an M-estimator for Reduced-rank System IDentification (MR. SID). A combination of low-rank approximations, ℓ1 and ℓ2 penalties, and some numerical linear algebra tricks yields an estimator that is computationally efficient and numerically stable. Simulations and real data examples demonstrate the usefulness of this approach in a variety of problems. In particular, we demonstrate that MR. SID can accurately estimate spatial filters, connectivity graphs, and time-courses from native resolution functional magnetic resonance imaging data. MR. SID therefore enables big time-series data to be analyzed using standard methods, readying the field for further generalizations including non-linear and non-Gaussian state-space models.
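MR. SID itself combines low-rank approximations, ℓ1 and ℓ2 penalties, and further numerical tricks, none of which are reproduced here. The sketch below is a much simpler stand-in that shows two of those ingredients. It estimates a first-order vector autoregression x_{t+1} = A x_t + noise with a ridge (ℓ2) penalty, then truncates A to a chosen rank by SVD. The model, `lam`, and `rank` are assumptions, not the paper's estimator.

```python
import numpy as np


def ridge_lowrank_var1(X, lam=1e-2, rank=None):
    """Ridge estimate of A in x_{t+1} = A x_t + noise, optionally truncated to a given rank.

    X: (d, T) time series with one column per time point.
    """
    X0, X1 = X[:, :-1], X[:, 1:]
    d = X.shape[0]
    # Normal equations with l2 penalty: (X0 X0^T + lam I) A^T = X0 X1^T
    A = np.linalg.solve(X0 @ X0.T + lam * np.eye(d), X0 @ X1.T).T
    if rank is not None:
        U, s, Vt = np.linalg.svd(A)
        A = (U[:, :rank] * s[:rank]) @ Vt[:rank]    # keep the leading singular triplets
    return A


rng = np.random.default_rng(0)
d, T = 10, 5000
Q, _ = np.linalg.qr(rng.standard_normal((d, 2)))
A_true = Q @ np.diag([0.9, 0.7]) @ Q.T              # stable, rank-2 dynamics
X = np.zeros((d, T))
for t in range(T - 1):
    X[:, t + 1] = A_true @ X[:, t] + rng.standard_normal(d)
A_hat = ridge_lowrank_var1(X, lam=1e-2, rank=2)
```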
An M-estimator for reduced-rank system identification

PubMed Central

Chen, Shaojie; Liu, Kai; Yang, Yuguang; Xu, Yuting; Lee, Seonjoo; Lindquist, Martin; Caffo, Brian S.; Vogelstein, Joshua T.

2018-01-01

High-dimensional time-series data from a wide variety of domains, such as neuroscience, are being generated every day. Fitting statistical models to such data, to enable parameter estimation and time-series prediction, is an important computational primitive. Existing methods, however, are unable to cope with the high-dimensional nature of these data, for both computational and statistical reasons. We mitigate both kinds of issues by proposing an M-estimator for Reduced-rank System IDentification (MR. SID). A combination of low-rank approximations, ℓ1 and ℓ2 penalties, and some numerical linear algebra tricks yields an estimator that is computationally efficient and numerically stable. Simulations and real data examples demonstrate the usefulness of this approach in a variety of problems. In particular, we demonstrate that MR. SID can accurately estimate spatial filters, connectivity graphs, and time-courses from native resolution functional magnetic resonance imaging data. MR. SID therefore enables big time-series data to be analyzed using standard methods, readying the field for further generalizations including non-linear and non-Gaussian state-space models. PMID:29391659
A SIGNIFICANCE TEST FOR THE LASSO

PubMed Central

Lockhart, Richard; Taylor, Jonathan; Tibshirani, Ryan J.; Tibshirani, Robert

2014-01-01

In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ²₁ distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ²₁ under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ1 penalty. 
Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties—adaptivity and shrinkage—and its null distribution is tractable and asymptotically Exp(1). PMID:25574062
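Consider the first lasso step with an orthonormal predictor matrix. There the covariance statistic reduces to T_1 = λ_1(λ_1 − λ_2)/σ², where λ_1 ≥ λ_2 are the first two knots of the lasso path, i.e., the two largest values of |x_jᵀy|. The simulation sketch below rests on four assumptions: an orthonormal X, a known σ, the objective ½‖y − Xβ‖² + λ‖β‖₁, and data generated under the global null. It compares T_1 (reference Exp(1), mean 1) with the naive drop in RSS λ_1²/σ² (reference χ²₁, mean 1), showing how adaptivity inflates the naive statistic. `first_step_statistics` is a hypothetical helper.

```python
import numpy as np


def first_step_statistics(X, y, sigma):
    """Covariance test T_1 and naive RSS drop for the first lasso step (orthonormal X assumed)."""
    knots = np.sort(np.abs(X.T @ y))[::-1]        # lasso knots: lambda_1 >= lambda_2 >= ...
    t1 = knots[0] * (knots[0] - knots[1]) / sigma**2
    naive = knots[0] ** 2 / sigma**2               # RSS drop from adding the chosen variable by OLS
    return t1, naive


rng = np.random.default_rng(0)
n = p = 100
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # orthonormal columns
sims = np.array([first_step_statistics(X, rng.standard_normal(n), 1.0) for _ in range(2000)])
print("mean T1:", sims[:, 0].mean(), "mean naive RSS drop:", sims[:, 1].mean())
```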
Experimental cosmic statistics - I. Variance

NASA Astrophysics Data System (ADS)

Colombi, Stéphane; Szapudi, István; Jenkins, Adrian; Colberg, Jörg

2000-04-01

Counts-in-cells are measured in the τCDM Virgo Hubble Volume simulation. This large N-body experiment has 10⁹ particles in a cubic box of size 2000 h⁻¹ Mpc. The unprecedented combination of size and resolution allows, for the first time, a realistic numerical analysis of the cosmic errors and cosmic correlations of statistics related to counts-in-cells measurements, such as the probability distribution function P_N itself, its factorial moments F_k and the related cumulants ψ and S_N. These statistics are extracted from the whole simulation cube, as well as from 4096 subcubes of size 125 h⁻¹ Mpc, each representing a virtual random realization of the local universe. The measurements and their scatter over the subvolumes are compared to the theoretical predictions of Colombi, Bouchet & Schaeffer for P_0, and of Szapudi & Colombi (SC) and Szapudi, Colombi & Bernardeau (SCB) for the factorial moments and the cumulants. The general behaviour of experimental variance and cross-correlations as functions of scale and order is well described by theoretical predictions, with a few per cent accuracy in the weakly non-linear regime for the cosmic error on factorial moments. On highly non-linear scales, however, all variants of the hierarchical model used by SC and SCB to describe clustering appear to become increasingly approximate, which leads to a slight overestimation of the error, by about a factor of two in the worst case. Because of the supplementary perturbative approach required, the theory is less accurate for non-linear estimators, such as cumulants, than for factorial moments. The cosmic bias is evaluated as well, and, in agreement with SCB, is found to be insignificant compared with the cosmic variance in all regimes investigated. While higher order statistics were previously evaluated in several simulations, this work presents textbook quality measurements of S_N, 3 ≤ N ≤ 10, in an unprecedented dynamic range of 0.05 ≲ ψ ≲ 50. 
In the weakly non-linear regime the results confirm previous findings and agree remarkably well with perturbation theory predictions including the one-loop corrections based on spherical collapse by Fosalba & Gaztañaga. Extended perturbation theory is confirmed on all scales.
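The counts-in-cells statistics above can be estimated from a vector of cell counts. The sketch computes three quantities. The first is the factorial moments F_k = ⟨N(N−1)...(N−k+1)⟩. The second is the average two-point correlation over the cell, written ξ̄ here, from the standard relation F_2 = F_1²(1 + ξ̄). The third is the relative scatter of F_k across subvolumes, a crude stand-in for the cosmic error. Function names are hypothetical, and the Poisson counts used for checking are synthetic, not simulation data.

```python
import numpy as np


def factorial_moments(counts, kmax=4):
    """F_k = <N (N-1) ... (N-k+1)> over cells, for k = 1..kmax."""
    counts = np.asarray(counts, dtype=float)
    moments = []
    falling = np.ones_like(counts)
    for k in range(1, kmax + 1):
        falling = falling * (counts - (k - 1))     # builds N (N-1) ... (N-k+1) term by term
        moments.append(falling.mean())
    return np.array(moments)


def xibar(counts):
    """Average two-point correlation over the cell: F_2 / F_1^2 - 1 (zero for Poisson counts)."""
    F = factorial_moments(counts, kmax=2)
    return F[1] / F[0] ** 2 - 1.0


def cosmic_error(subvolume_counts, k):
    """Relative scatter of F_k across subvolumes (one array of cell counts per subvolume)."""
    Fk = np.array([factorial_moments(c, kmax=k)[k - 1] for c in subvolume_counts])
    return Fk.std(ddof=1) / Fk.mean()


rng = np.random.default_rng(0)
poisson_cells = rng.poisson(5.0, size=200_000)
subvolumes = rng.poisson(5.0, size=(64, 1000))
print(factorial_moments([0, 1, 2, 3], kmax=2), xibar(poisson_cells), cosmic_error(subvolumes, 2))
```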
Becoming angular momentum density flow through nonlinear mass transfer into a gravitating spheroidal body

NASA Astrophysics Data System (ADS)

Krot, A. M.

2009-04-01

A statistical theory for the formation of a cosmological body, based on the spheroidal body model, has been proposed in the works [1]-[4]. This work studies a slowly evolving process of gravitational condensation of a spheroidal body from an infinitely distributed gas-dust substance in space. The equation for the initial evolution of the mass density function of a gas-dust cloud is considered here. It is found that this equation coincides completely with the analogous equation for a slowly gravitationally compressed spheroidal body [5]. Conductive flow in dissipative systems was investigated by I. Prigogine (see, for example, [6], [7]). As found in [2], [5], there exists a conductive antidiffusion flow in a slowly compressible gravitating spheroidal body. Applying the equation of continuity to this conductive flow density, we obtain a linear antidiffusion equation [5]. However, if the intensity of the conductive flow density increases sharply, the linear antidiffusion equation becomes nonlinear. Indeed, as pointed out in [6], the analogous linear equations of diffusion and thermal conductivity likewise become nonlinear. In this case, the equation of continuity describes a nonlinear mass flow that is a source of instabilities in a gravitating spheroidal body, because the gravitational compression factor G is a function not only of time but also of mass density. Using an integral substitution, we can reduce the nonlinear antidiffusion equation to a linear antidiffusion equation in terms of a new function. If the factor G can be considered as a specific angular momentum, then the new function is an angular momentum density. Thus, a nonlinear momentum density flow induces a flow of angular momentum density, because the streamlines of the moving continuous substance come closer together within a gravitating spheroidal body. Indeed, the convergence of streamlines leads to tighter interactions of "liquid particles", which implies a superposition of their specific angular momenta. 
This superposition forms an antidiffusion flow of angular momentum density within a gravitating spheroidal body.
References:
[1] Krot, A.M. The statistical model of gravitational interaction of particles. Achievement in Modern Radioelectronics (spec. issue "Cosmic Radiophysics", Moscow), 1996, no. 8, pp. 66-81 (in Russian).
[2] Krot, A.M. Statistical description of gravitational field: a new approach. Proc. SPIE's 14th Annual Intern. Symp. "AeroSense", Orlando, Florida, USA, 2000, vol. 4038, pp. 1318-1329.
[3] Krot, A.M. The statistical model of rotating and gravitating spheroidal body with the point of view of general relativity. Proc. 35th COSPAR Scientific Assembly, Paris, France, 2004, Abstract A-00162.
[4] Krot, A. The statistical approach to exploring formation of Solar system. Proc. EGU General Assembly, Vienna, Austria, 2006, Geophys. Res. Abstracts, vol. 8, A-00216; SRef-ID: 1607-7962/gra/.
[5] Krot, A.M. A statistical approach to investigate the formation of the solar system. Chaos, Solitons and Fractals, 2008, doi:10.1016/j.chaos.2008.06.014.
[6] Glansdorff, P. and Prigogine, I. Thermodynamic Theory of Structure, Stability and Fluctuations. London, 1971.
[7] Nicolis, G. and Prigogine, I. Self-organization in Nonequilibrium Systems: From Dissipative Structures to Order through Fluctuation. John Wiley and Sons, New York etc., 1977.
Detector noise statistics in the non-linear regime

NASA Technical Reports Server (NTRS)

Shopbell, P. L.; Bland-Hawthorn, J.

1992-01-01

The statistical behavior of an idealized linear detector in the presence of threshold and saturation levels is examined. It is assumed that the noise is governed by the statistical fluctuations in the number of photons emitted by the source during an exposure. Since physical detectors cannot have infinite dynamic range, our model illustrates that all devices have non-linear regimes, particularly at high count rates. The primary effect is a decrease in the statistical variance about the mean signal due to a portion of the expected noise distribution being removed via clipping. Higher order statistical moments are also examined, in particular, skewness and kurtosis. In principle, the expected distortion in the detector noise characteristics can be calibrated using flatfield observations with count rates matched to the observations. For this purpose, some basic statistical methods that utilize Fourier analysis techniques are described.
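The clipping effect can be reproduced with a few lines of simulation. The sketch makes three assumptions. Counts are Poisson with the given mean. Counts below `threshold` are recorded at the threshold level, and counts above `saturation` at the saturation level, i.e., simple clipping at both ends. The parameter values in the usage lines are arbitrary, and `clipped_counts` and `skewness` are hypothetical helpers. Clipping the upper tail lowers the variance below the Poisson value and introduces negative skewness.

```python
import numpy as np


def clipped_counts(mean_counts, threshold, saturation, n_exposures, seed=0):
    """Poisson photon counts recorded by a detector that clips at threshold and saturation."""
    rng = np.random.default_rng(seed)
    raw = rng.poisson(mean_counts, size=n_exposures)
    recorded = np.clip(raw, threshold, saturation)    # part of the noise distribution is cut off
    return raw, recorded


def skewness(x):
    """Sample skewness: third central moment over variance^(3/2)."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return (d**3).mean() / (d**2).mean() ** 1.5


raw, recorded = clipped_counts(1000.0, 900, 1010, 100_000)
print(raw.var(), recorded.var(), skewness(raw), skewness(recorded))
```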
Quantile regression for the statistical analysis of immunological data with many non-detects.

PubMed

Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth

2012-07-07

Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an application to real data from a clinical trial. We show that by using quantile regression, groups can be compared and meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
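Linear quantile regression can be written as a linear program that minimizes the asymmetric check loss. The sketch below uses SciPy's `linprog` with the HiGHS solver; the function name and the toy data are hypothetical. The optimality condition depends only on which side of the fitted line each observation falls. So non-detects can be replaced by any placeholder below the detection limit without changing the fit, as long as they stay below the fitted quantile line. With a higher `tau`, the same holds for a larger share of non-detects.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X, y, tau=0.5):
    """Linear quantile regression as a linear program.

    Minimizes sum(tau * u_pos + (1 - tau) * u_neg) subject to X b + u_pos - u_neg = y,
    with u_pos, u_neg >= 0 and b unrestricted; returns b.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]


# Two groups; NaN marks a non-detect (true value below a detection limit of 1.0).
group = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
X = np.column_stack([np.ones(10), group])
observed = np.array([np.nan, np.nan, 2.0, 3.0, 4.0, np.nan, 5.0, 6.0, 7.0, 8.0])
fit_zero = quantile_regression(X, np.nan_to_num(observed, nan=0.0))   # non-detects -> 0
fit_half = quantile_regression(X, np.nan_to_num(observed, nan=0.5))   # non-detects -> DL / 2
print(fit_zero, fit_half)
```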
Rotation of EOFs by the Independent Component Analysis: Towards A Solution of the Mixing Problem in the Decomposition of Geophysical Time Series

NASA Technical Reports Server (NTRS)

Aires, Filipe; Rossow, William B.; Chedin, Alain; Hansen, James E. (Technical Monitor)

2001-01-01

Independent Component Analysis (ICA) is a recently developed technique for component extraction. This new method requires the statistical independence of the extracted components, a stronger constraint that uses higher-order statistics, instead of the classical decorrelation, a weaker constraint that uses only second-order statistics. This technique has recently been used for the analysis of geophysical time series with the goal of investigating the causes of variability in observed data (i.e., an exploratory approach). We demonstrate with a data simulation experiment that, if initialized with a Principal Component Analysis (PCA), ICA performs a rotation of the classical PCA (or EOF) solution. Unlike other rotation techniques (RT), this rotation uses no localization criterion; only the global generalization of decorrelation by statistical independence is used. This rotation of the PCA solution seems able to overcome the tendency of PCA to mix several physical phenomena, even when the signal is just their linear sum.
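The rotation argument can be checked numerically. The sketch first whitens the data with PCA, which is the EOF step. It then runs symmetric FastICA on the whitened data; FastICA is a standard ICA algorithm chosen here as an assumption, since the abstract does not name one. Every iterate is re-orthogonalized, so the ICA stage is an orthogonal matrix `R` applied to the principal components: a rotation of the PCA solution, up to sign flips. The mixing matrix and uniform sources in the usage lines are made up for illustration.

```python
import numpy as np


def pca_whiten(X):
    """Center X (n_samples, n_channels) and project it onto unit-variance principal components."""
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(Xc.T @ Xc / Xc.shape[0])
    order = np.argsort(eigval)[::-1]                 # leading EOF first
    return Xc @ (eigvec[:, order] / np.sqrt(eigval[order]))


def sym_orthogonalize(R):
    """Return the orthogonal matrix (R R^T)^(-1/2) R."""
    s, u = np.linalg.eigh(R @ R.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ R


def ica_rotation(Z, n_iter=500, tol=1e-10, seed=0):
    """Symmetric FastICA with g = tanh on whitened data Z; returns an orthogonal matrix R."""
    n, k = Z.shape
    R = sym_orthogonalize(np.random.default_rng(seed).standard_normal((k, k)))
    for _ in range(n_iter):
        g = np.tanh(Z @ R.T)                         # nonlinearity of current components
        R_new = sym_orthogonalize(g.T @ Z / n - np.diag((1.0 - g**2).mean(axis=0)) @ R)
        # Converged when each new row is parallel (up to sign) to the old one.
        done = np.max(np.abs(np.abs(np.sum(R_new * R, axis=1)) - 1.0)) < tol
        R = R_new
        if done:
            break
    return R


rng = np.random.default_rng(42)
S = rng.uniform(-1.0, 1.0, size=(5000, 2))           # two independent sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])                # hypothetical mixing matrix
Z = pca_whiten(S @ A.T)                               # PCA (EOF) solution, whitened
R = ica_rotation(Z)
Y = Z @ R.T                                           # ICA components = rotated PCs
```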
Nonclassical-light generation in a photonic-band-gap nonlinear planar waveguide

NASA Astrophysics Data System (ADS)

Peřina, Jan, Jr.; Sibilia, Concita; Tricca, Daniela; Bertolotti, Mario

2004-10-01

The optical parametric process occurring in a photonic-band-gap planar waveguide is studied from the point of view of nonclassical-light generation. The nonlinearly interacting optical fields are described by the generalized superposition of coherent signals and noise using the method of operator linear corrections to a classical strong solution. Scattered backward-propagating fields are taken into account. Squeezed light as well as light with sub-Poissonian statistics can be obtained in two-mode fields under the specified conditions.
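Sub-Poissonian statistics are commonly quantified by the Mandel Q parameter, Q = (⟨Δn²⟩ − ⟨n⟩)/⟨n⟩. Q is negative for sub-Poissonian light, zero for Poissonian (coherent) light and positive for super-Poissonian light. The minimal sketch below computes Q from a photon-number distribution. It is a generic measure, not necessarily the exact criterion used in the paper, and the example distributions are textbook cases.

```python
import numpy as np
from math import exp, factorial


def mandel_q(prob):
    """Mandel Q = (Var(n) - <n>) / <n> for photon-number probabilities prob[n], n = 0, 1, ..."""
    prob = np.asarray(prob, dtype=float)
    n = np.arange(prob.size)
    mean = np.sum(n * prob)
    var = np.sum(n**2 * prob) - mean**2
    return (var - mean) / mean


q_poisson = mandel_q([exp(-3.0) * 3.0**k / factorial(k) for k in range(60)])   # coherent light
q_fock = mandel_q([0.0, 0.0, 0.0, 0.0, 1.0])                                   # exactly 4 photons
q_thermal = mandel_q([2**k / 3 ** (k + 1) for k in range(400)])                # thermal, mean 2
print(q_poisson, q_fock, q_thermal)
```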
On Deriving and Solving the Generalized Bivariate, Linear Location Problems.

DTIC Science & Technology

1982-09-01

average (Eisenhart, 1978). Francis Galton indirectly coined the term "regression" in his 1885 publication, Natural Inheritance, when he studied sweet... David, F. N. Francis Galton. In W. H. Kruskal & J. Tanur (Eds.), International encyclopedia of statistics (Vol. 1). New York: Free Press, 1978. Dean, W...
Principal Curves on Riemannian Manifolds.

PubMed

Hauberg, Soren

2016-09-01

Euclidean statistics are often generalized to Riemannian manifolds by replacing straight-line interpolations with geodesic ones. While these Riemannian models are familiar-looking, they are restricted by the inflexibility of geodesics, and they rely on constructions which are optimal only in Euclidean domains. We consider extensions of Principal Component Analysis (PCA) to Riemannian manifolds. Classic Riemannian approaches seek a geodesic curve passing through the mean that optimizes a criterion of interest. The requirements that the solution be geodesic and pass through the mean tend to mean that these methods work well only when the manifold is mostly flat within the support of the generating distribution. We argue that instead of generalizing linear Euclidean models, it is more fruitful to generalize non-linear Euclidean models. Specifically, we extend the classic Principal Curves of Hastie & Stuetzle to data residing on a complete Riemannian manifold. We show that for elliptical distributions in the tangent spaces of spaces of constant curvature, the standard principal geodesic is a principal curve. The proposed model is simple to compute and avoids many of the pitfalls of traditional geodesic approaches. We empirically demonstrate the effectiveness of the Riemannian principal curves on several manifolds and datasets.
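The classic construction criticized above starts from the intrinsic (Fréchet or Karcher) mean, computed with the manifold's exponential and logarithm maps, and then looks for a geodesic through it. The minimal sketch below covers that first step on the unit sphere S². It assumes points given as unit vectors in R³ and a simple fixed-step gradient iteration. Neither the principal geodesic nor the proposed principal curves are implemented, and the helper names are hypothetical.

```python
import numpy as np


def sphere_log(p, q):
    """Log map at p: tangent vector toward q whose length is the geodesic distance."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    return theta / np.sin(theta) * (q - cos_t * p)


def sphere_exp(p, v):
    """Exp map at p: follow the geodesic from p in direction v for length |v|."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v


def karcher_mean(points, n_iter=100, tol=1e-12):
    """Intrinsic mean by gradient steps: p <- exp_p(mean of log_p(q_i))."""
    p = points[0] / np.linalg.norm(points[0])
    for _ in range(n_iter):
        step = np.mean([sphere_log(p, q) for q in points], axis=0)
        p = sphere_exp(p, step)
        p = p / np.linalg.norm(p)                    # guard against round-off drift
        if np.linalg.norm(step) < tol:
            break
    return p


theta = 0.3
pts = np.array([[np.sin(theta) * np.cos(a), np.sin(theta) * np.sin(a), np.cos(theta)]
                for a in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)])
m = karcher_mean(pts)
```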
Spatial and temporal trends in runoff at long-term streamgages within and near the Chesapeake Bay Watershed

USGS Publications Warehouse

Rice, Karen C.; Hirsch, Robert M.

2012-01-01

Long-term streamflow data within the Chesapeake Bay watershed and surrounding area were analyzed in an attempt to identify trends in streamflow. Data from 30 streamgages near and within the Chesapeake Bay watershed were selected from 1930 through 2010 for analysis. Streamflow data were converted to runoff, and trend slopes in percent change per decade were calculated. Trend slopes for three runoff statistics (the 7-day minimum, the mean, and the 1-day maximum) were analyzed annually and seasonally. The slopes also were analyzed both spatially and temporally. The spatial results indicated that trend slopes in the northern half of the watershed were generally greater than those in the southern half. The temporal analysis was done by splitting the 80-year flow record into two subsets; records for 28 streamgages were analyzed for 1930 through 1969 and records for 30 streamgages were analyzed for 1970 through 2010. The mean of the data for all sites for each year was plotted so that the following datasets were analyzed: the 7-day minimum runoff for the north, the 7-day minimum runoff for the south, the mean runoff for the north, the mean runoff for the south, the 1-day maximum runoff for the north, and the 1-day maximum runoff for the south. Results indicated that the period 1930 through 1969 was statistically different from the period 1970 through 2010. For the 7-day minimum runoff and the mean runoff, the latter period had significantly higher streamflow than did the earlier period, although within those two periods no significant linear trends were identified. For the 1-day maximum runoff, no step trend or linear trend could be shown to be statistically significant for the north, although the south showed a mixture of an upward step trend accompanied by linear downtrends within the periods. 
In no case was a change identified that indicated an increasing rate of change over time, and no general pattern was identified of hydrologic conditions becoming "more extreme" over time.
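A trend slope in percent change per decade can be obtained from a log-linear fit. The sketch assumes an ordinary least-squares fit of ln(runoff) on year, with the slope b converted to (e^{10b} − 1) × 100 percent per decade. The report's actual trend estimator is not given in the abstract, so treat this as an illustration only. `percent_change_per_decade` is a hypothetical helper.

```python
import numpy as np


def percent_change_per_decade(years, runoff):
    """Slope of an OLS fit of ln(runoff) on year, expressed as percent change per decade."""
    slope, _ = np.polyfit(years, np.log(runoff), 1)
    return (np.exp(10.0 * slope) - 1.0) * 100.0


years = np.arange(1970, 2011)
runoff = 100.0 * 1.01 ** (years - 1970)       # synthetic series growing 1 percent per year
pct = percent_change_per_decade(years, runoff)
```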
Representing Micro-Macro Linkages by Actor-Based Dynamic Network Models

PubMed Central

Snijders, Tom A.B.; Steglich, Christian E.G.

2014-01-01

Stochastic actor-based models for network dynamics have the primary aim of statistical inference about processes of network change, but may be regarded as a kind of agent-based model. Like many other agent-based models, they are based on local rules for actor behavior. Unlike many other agent-based models, they include elements of generalized linear statistical models and aim to be realistic, detailed representations of network dynamics in empirical data sets. Statistical parallels to micro-macro considerations can be found in the estimation of parameters determining local actor behavior from empirical data, and in the assessment of goodness of fit from the correspondence with network-level descriptives. This article studies several network-level consequences of dynamic actor-based models applied to represent cross-sectional network data. Two examples illustrate how network-level characteristics can be obtained as emergent features implied by micro-specifications of actor-based models. PMID:25960578
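The generalized-linear element of stochastic actor-based models is the choice rule. An actor picks its next tie change from the available options with multinomial-logit probabilities proportional to exp(f_i), where f_i is a linear combination of network statistics weighted by parameters. The minimal sketch below uses only two effects, outdegree and reciprocity, assumed for illustration. It is not the RSiena implementation, and the rate function and the estimation procedure are omitted.

```python
import numpy as np


def ministep_probabilities(x, i, beta_outdeg, beta_recip):
    """Choice probabilities for actor i's next mini-step in a directed 0/1 network x.

    Options: no change (index 0), or toggle the tie i->j for each j != i, in order of j.
    Each option is scored by f_i = beta_outdeg * outdegree_i + beta_recip * reciprocated ties of i.
    """
    n = x.shape[0]

    def objective(y):
        return beta_outdeg * y[i].sum() + beta_recip * (y[i] * y[:, i]).sum()

    options = [x.copy()]
    for j in range(n):
        if j != i:
            y = x.copy()
            y[i, j] = 1 - y[i, j]
            options.append(y)
    scores = np.array([objective(y) for y in options])
    w = np.exp(scores - scores.max())                 # numerically stable softmax
    return w / w.sum()


x = np.zeros((4, 4), dtype=int)
x[2, 0] = 1                                          # actor 2 already nominates actor 0
probs = ministep_probabilities(x, 0, beta_outdeg=-1.0, beta_recip=3.0)
```

With a strong reciprocity parameter, actor 0's most likely move is to reciprocate actor 2's tie, which is option index 2 here.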
On the correct implementation of Fermi-Dirac statistics and electron trapping in nonlinear electrostatic plane wave propagation in collisionless plasmas

NASA Astrophysics Data System (ADS)

Schamel, Hans; Eliasson, Bengt

2016-05-01

Quantum statistics and electron trapping have a decisive influence on the propagation characteristics of coherent stationary electrostatic waves. The description of these strictly nonlinear structures, which are of electron hole type and violate linear Vlasov theory due to the particle trapping at any excitation amplitude, is obtained by a correct reduction of the three-dimensional Fermi-Dirac distribution function to one dimension and by a proper incorporation of trapping. For small but finite amplitudes, the holes become of cnoidal wave type, and the electron density is shown to be described by an expansion in ϕ(x)^{1/2} rather than in ϕ(x), where ϕ(x) is the electrostatic potential. The general coefficients are presented for a degenerate plasma as well as the quantum statistical analogue to these steady state coherent structures, including the shape of ϕ(x) and the nonlinear dispersion relation, which describes their phase velocity.
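The reduction of the three-dimensional Fermi-Dirac distribution to one dimension amounts to integrating over the two perpendicular velocity components. The worked sketch below assumes an isotropic distribution in velocity space, a temperature T in energy units and a chemical potential μ, and it drops the overall normalization constant. The paper's own normalization and trapped-particle treatment are not reproduced.

```latex
f_{3D}(v_\parallel, v_\perp) \propto \left[1 + \exp\!\left(\frac{\tfrac{m}{2}(v_\parallel^2 + v_\perp^2) - \mu}{T}\right)\right]^{-1}

f_{1D}(v_\parallel) \propto \int_0^\infty \frac{2\pi v_\perp \, dv_\perp}{1 + \exp\!\left(\frac{\tfrac{m}{2}v_\parallel^2 + \tfrac{m}{2}v_\perp^2 - \mu}{T}\right)}
  = \frac{2\pi T}{m} \ln\!\left[1 + \exp\!\left(\frac{\mu - \tfrac{m}{2}v_\parallel^2}{T}\right)\right]
```

The substitution u = m v_⊥²/2 turns the perpendicular integral into an elementary one, which is where the logarithm comes from. In the nondegenerate limit μ ≪ −T the logarithm reduces to exp((μ − m v_∥²/2)/T), recovering a one-dimensional Maxwellian.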
Compositional Solution Space Quantification for Probabilistic Software Analysis

NASA Technical Reports Server (NTRS)

Borges, Mateus; Pasareanu, Corina S.; Filieri, Antonio; d'Amorim, Marcelo; Visser, Willem

2014-01-01

Probabilistic software analysis aims at quantifying how likely a target event is to occur during program execution. Current approaches rely on symbolic execution to identify the conditions to reach the target event and try to quantify the fraction of the input domain satisfying these conditions. Precise quantification is usually limited to linear constraints, while in general only approximate solutions can be provided through statistical approaches. However, statistical approaches may fail to converge to an acceptable accuracy within a reasonable time. We present a compositional statistical approach for the efficient quantification of solution spaces for arbitrarily complex constraints over bounded floating-point domains. The approach leverages interval constraint propagation to improve the accuracy of the estimation by focusing the sampling on the regions of the input domain containing the sought solutions. Preliminary experiments show significant improvements over previous approaches in both result accuracy and analysis time.
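The following is a simplified illustration of interval-guided sampling. The domain is split into boxes, and an interval evaluation of the constraint settles every box that is entirely satisfying or entirely violating. Monte Carlo samples are then spent only on the undecided boxes. The sketch uses plain interval evaluation on a fixed grid as a stand-in for the paper's interval constraint propagation. The constraint x² + y² ≤ 1 on [−1, 1]² (true fraction π/4) is an arbitrary test case, and the function names are hypothetical.

```python
import numpy as np


def sq_interval(lo, hi):
    """Interval extension of x**2 over [lo, hi]."""
    if lo >= 0:
        return lo * lo, hi * hi
    if hi <= 0:
        return hi * hi, lo * lo
    return 0.0, max(lo * lo, hi * hi)


def disk_fraction(grid=32, samples_per_box=500, seed=0):
    """Fraction of [-1, 1]^2 satisfying x**2 + y**2 <= 1, sampling only undecided boxes."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(-1.0, 1.0, grid + 1)
    box_weight = 1.0 / grid**2                       # each box is this fraction of the domain
    total, sampled_boxes = 0.0, 0
    for xl, xh in zip(edges[:-1], edges[1:]):
        for yl, yh in zip(edges[:-1], edges[1:]):
            ax, bx = sq_interval(xl, xh)
            ay, by = sq_interval(yl, yh)
            if bx + by <= 1.0:                       # constraint holds everywhere in the box
                total += box_weight
            elif ax + ay > 1.0:                      # constraint fails everywhere in the box
                continue
            else:                                    # undecided: estimate by sampling inside it
                xs = rng.uniform(xl, xh, samples_per_box)
                ys = rng.uniform(yl, yh, samples_per_box)
                total += box_weight * np.mean(xs**2 + ys**2 <= 1.0)
                sampled_boxes += 1
    return total, sampled_boxes


estimate, n_sampled = disk_fraction()
```

Only the boxes straddling the boundary are sampled, so the sampling budget is spent where the uncertainty actually is.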
Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data.

PubMed

Carmichael, Owen; Sakhanenko, Lyudmila

2015-05-15

We develop statistical methodology for HARDI, a popular brain imaging technique, based on the high-order tensor model of Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves, or fibers, which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense, as it blends linear algebra concepts from high-order tensors with asymptotic statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3] to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of deterministic tractography methods and delivers the same information as probabilistic tractography methods. Our method is computationally cheap, and it provides a well-founded mathematical and statistical framework in which diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way.
Estimation of integral curves from high angular resolution diffusion imaging (HARDI) data

PubMed Central

Carmichael, Owen; Sakhanenko, Lyudmila

2015-01-01

We develop statistical methodology for HARDI, a popular brain imaging technique, based on the high-order tensor model of Özarslan and Mareci [10]. We investigate how uncertainty in the imaging procedure propagates through all levels of the model: signals, tensor fields, vector fields, and fibers. We construct asymptotically normal estimators of the integral curves, or fibers, which allow us to trace the fibers together with confidence ellipsoids. The procedure is computationally intense, as it blends linear algebra concepts from high-order tensors with asymptotic statistical analysis. The theoretical results are illustrated on simulated and real datasets. This work generalizes the statistical methodology proposed for low angular resolution diffusion tensor imaging by Carmichael and Sakhanenko [3] to several fibers per voxel. It is also a pioneering statistical work on tractography from HARDI data. It avoids all the typical limitations of deterministic tractography methods and delivers the same information as probabilistic tractography methods. Our method is computationally cheap, and it provides a well-founded mathematical and statistical framework in which diverse functionals on fibers, directions and tensors can be studied in a systematic and rigorous way. PMID:25937674
Statistics of galaxy orientations - Morphology and large-scale structure

NASA Technical Reports Server (NTRS)

Lambas, Diego G.; Groth, Edward J.; Peebles, P. J. E.

1988-01-01

Using the Uppsala General Catalog of bright galaxies and the northern and southern maps of the Lick counts of galaxies, statistical evidence of a morphology-orientation effect is found. Major axes of elliptical galaxies are preferentially oriented along the large-scale features of the Lick maps. However, the orientations of the major axes of spiral and lenticular galaxies show no clear signs of significant nonrandom behavior at a level of less than about one-fifth of the effect seen for ellipticals. The angular scale of the detected alignment effect for Uppsala ellipticals extends to at least θ ≈ 2°, which at a redshift of z ≈ 0.02 corresponds to a linear scale of about 2 h⁻¹ Mpc.
fMRI paradigm designing and post-processing tools

PubMed Central

James, Jija S; Rajesh, PG; Chandran, Anuvitha VS; Kesavadas, Chandrasekharan

2014-01-01

In this article, we first review some aspects of functional magnetic resonance imaging (fMRI) paradigm design for major cognitive functions using stimulus delivery systems such as Cogent, E-Prime, Presentation, etc., along with their technical aspects. We also review the stimulus presentation possibilities (block, event-related) for visual or auditory paradigms and their advantages in both clinical and research settings. The second part mainly focuses on various fMRI data post-processing tools such as Statistical Parametric Mapping (SPM) and Brain Voyager, discusses the particulars of the various preprocessing steps (realignment, co-registration, normalization, smoothing) in these software packages, and covers the statistical analysis principles of General Linear Modeling for the final interpretation of a functional activation result. PMID:24851001
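To make the "General Linear Modeling" step concrete, the sketch below fits a single voxel's time series to a block-design regressor plus an intercept by ordinary least squares and forms a t statistic for the task contrast. It is a deliberately simplified illustration, not how SPM or Brain Voyager implement it: there is no haemodynamic response convolution, drift modelling, or temporal autocorrelation correction, and the design (10 scans on, 10 off), the simulated signal, and the name `glm_t_stat` are all hypothetical.

```python
import numpy as np


def glm_t_stat(y, X, contrast):
    """OLS fit of y = X @ beta + noise, and the t statistic for contrast c' beta."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = float(resid @ resid) / (n - p)          # residual variance estimate
    c = np.asarray(contrast, dtype=float)
    var_c = sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c)  # Var(c' beta_hat)
    return beta, float(c @ beta / np.sqrt(var_c))


# Hypothetical block design: 10 scans "on", 10 scans "off", repeated over 120 scans.
n_scans = 120
boxcar = (np.arange(n_scans) // 10 % 2 == 0).astype(float)
X = np.column_stack([boxcar, np.ones(n_scans)])      # task regressor + intercept
rng = np.random.default_rng(0)
y = 3.0 * boxcar + 10.0 + rng.normal(scale=0.5, size=n_scans)  # simulated voxel signal

beta, t = glm_t_stat(y, X, contrast=[1.0, 0.0])
print(np.round(beta, 2), round(t, 1))
```

In a real analysis the same fit is repeated for every voxel, and the resulting t map is thresholded with a multiple-comparison correction.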

Interpreting the g loadings of intelligence test composite scores in light of Spearman's law of diminishing returns.

PubMed

Reynolds, Matthew R

2013-03-01

The linear loadings of intelligence test composite scores on a general factor (g) have been investigated recently in factor analytic studies. Spearman's law of diminishing returns (SLODR), however, implies that the g loadings of test scores likely decrease in magnitude as g increases, or they are nonlinear. The purpose of this study was to (a) investigate whether the g loadings of composite scores from the Differential Ability Scales (2nd ed.) (DAS-II, C. D. Elliott, 2007a, Differential Ability Scales (2nd ed.). San Antonio, TX: Pearson) were nonlinear and (b) if they were nonlinear, to compare them with linear g loadings to demonstrate how SLODR alters the interpretation of these loadings. Linear and nonlinear confirmatory factor analysis (CFA) models were used to model Nonverbal Reasoning, Verbal Ability, Visual Spatial Ability, Working Memory, and Processing Speed composite scores in four age groups (5-6, 7-8, 9-13, and 14-17) from the DAS-II norming sample. The nonlinear CFA models provided better fit to the data than did the linear models. In support of SLODR, estimates obtained from the nonlinear CFAs indicated that g loadings decreased as g level increased. The nonlinear portion for the nonverbal reasoning loading, however, was not statistically significant across the age groups. Knowledge of general ability level informs composite score interpretation because g is less likely to produce differences, or is measured less, in those scores at higher g levels. One implication is that it may be more important to examine the pattern of specific abilities at higher general ability levels. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Escaping the snare of chronological growth and launching a free curve alternative: general deviance as latent growth model.

PubMed

Wood, Phillip Karl; Jackson, Kristina M

2013-08-01

Researchers studying longitudinal relationships among multiple problem behaviors sometimes characterize autoregressive relationships across constructs as indicating "protective" or "launch" factors or as "developmental snares." These terms are used to indicate that initial or intermediary states of one problem behavior subsequently inhibit or promote some other problem behavior. Such models are contrasted with models of "general deviance" over time in which all problem behaviors are viewed as indicators of a common linear trajectory. When the fit of the "general deviance" model is poor and the fit of one or more autoregressive models is good, this is taken as support for the inhibitory or enhancing effect of one construct on another. In this paper, we argue that researchers should consider competing models of growth before comparing deviance and time-bound models. Specifically, we propose use of the free curve slope intercept (FCSI) growth model (Meredith & Tisak, 1990) as a general model to typify change in a construct over time. The FCSI model includes, as nested special cases, several statistical models often used for prospective data, such as linear slope intercept models, repeated measures multivariate analysis of variance, various one-factor models, and hierarchical linear models. When considering models involving multiple constructs, we argue the construct of "general deviance" can be expressed as a single-trait multimethod model, permitting a characterization of the deviance construct over time without requiring restrictive assumptions about the form of growth over time. As an example, prospective assessments of problem behaviors from the Dunedin Multidisciplinary Health and Development Study (Silva & Stanton, 1996) are considered and contrasted with earlier analyses of Hussong, Curran, Moffitt, and Caspi (2008), which supported launch and snare hypotheses.
For antisocial behavior, the FCSI model fit better than other models, including the linear chronometric growth curve model used by Hussong et al. For models including multiple constructs, a general deviance model involving a single trait and multimethod factors (or a corresponding hierarchical factor model) fit the data better than either the "snares" alternatives or the general deviance model previously considered by Hussong et al. Taken together, the analyses support the view that linkages and turning points cannot be contrasted with general deviance models absent additional experimental intervention or control.
Escaping the snare of chronological growth and launching a free curve alternative: General deviance as latent growth model

PubMed Central

Wood, Phillip Karl; Jackson, Kristina M.

2014-01-01

Researchers studying longitudinal relationships among multiple problem behaviors sometimes characterize autoregressive relationships across constructs as indicating “protective” or “launch” factors or as “developmental snares.” These terms are used to indicate that initial or intermediary states of one problem behavior subsequently inhibit or promote some other problem behavior. Such models are contrasted with models of “general deviance” over time in which all problem behaviors are viewed as indicators of a common linear trajectory. When the fit of the “general deviance” model is poor and the fit of one or more autoregressive models is good, this is taken as support for the inhibitory or enhancing effect of one construct on another. In this paper, we argue that researchers should consider competing models of growth before comparing deviance and time-bound models. Specifically, we propose use of the free curve slope intercept (FCSI) growth model (Meredith & Tisak, 1990) as a general model to typify change in a construct over time. The FCSI model includes, as nested special cases, several statistical models often used for prospective data, such as linear slope intercept models, repeated measures multivariate analysis of variance, various one-factor models, and hierarchical linear models. When considering models involving multiple constructs, we argue the construct of “general deviance” can be expressed as a single-trait multimethod model, permitting a characterization of the deviance construct over time without requiring restrictive assumptions about the form of growth over time. As an example, prospective assessments of problem behaviors from the Dunedin Multidisciplinary Health and Development Study (Silva & Stanton, 1996) are considered and contrasted with earlier analyses of Hussong, Curran, Moffitt, and Caspi (2008), which supported launch and snare hypotheses.
For antisocial behavior, the FCSI model fit better than other models, including the linear chronometric growth curve model used by Hussong et al. For models including multiple constructs, a general deviance model involving a single trait and multimethod factors (or a corresponding hierarchical factor model) fit the data better than either the “snares” alternatives or the general deviance model previously considered by Hussong et al. Taken together, the analyses support the view that linkages and turning points cannot be contrasted with general deviance models absent additional experimental intervention or control. PMID:23880389
Machine learning-based methods for prediction of linear B-cell epitopes.

PubMed

Wang, Hsin-Wei; Pai, Tun-Wen

2014-01-01

B-cell epitope prediction helps immunologists design peptide-based vaccines and diagnostic tests, and supports disease prevention, treatment, and antibody production. Compared with T-cell epitope prediction, the performance of variable-length B-cell epitope prediction remains unsatisfactory. Fortunately, owing to increasingly available verified epitope databases, bioinformaticians can apply machine learning-based algorithms to all curated data to design improved prediction tools for biomedical researchers. Here, we review related epitope prediction papers, especially those on linear B-cell epitope prediction. It should be noted that combining selected propensity scales and statistics of epitope residues with machine learning-based tools has become a general way of constructing linear B-cell epitope prediction systems. Most comparison results also show that the kernel-based support vector machine (SVM) classifier outperformed other machine learning-based approaches. Hence, in this chapter, in addition to reviewing recently published papers, we introduce the fundamentals of B-cell epitopes and SVM techniques. In addition, an example of a linear B-cell epitope prediction system based on physicochemical features and amino acid combinations is illustrated in detail.
Analyzing linear spatial features in ecology.

PubMed

Buettel, Jessie C; Cole, Andrew; Dickey, John M; Brook, Barry W

2018-06-01

The spatial analysis of dimensionless points (e.g., tree locations on a plot map) is common in ecology, for instance using point-process statistics to detect and compare patterns. However, the treatment of one-dimensional linear features (fiber processes) is rarely attempted. Here we appropriate the methods of vector sums and dot products, used regularly in fields like astrophysics, to analyze a data set of mapped linear features (logs) measured in 12 × 1-ha forest plots. For this demonstrative case study, we ask two deceptively simple questions: do trees tend to fall downhill, and if so, does slope gradient matter? Despite noisy data and many potential confounders, we show clearly that topography (slope direction and steepness) of forest plots does matter to treefall. More generally, these results underscore the value of mathematical methods of physics to problems in the spatial analysis of linear features, and the opportunities that interdisciplinary collaboration provides. This work provides scope for a variety of future ecological analyses of fiber processes in space. © 2018 by the Ecological Society of America.
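A minimal sketch of the vector-sum and dot-product idea, assuming each log is reduced to a unit vector pointing from its base to its crown and each plot has a single downslope direction. Angles are taken in radians counter-clockwise from east (compass azimuths would need converting first); the function names and the example plot are hypothetical, and this is not the authors' code or their full analysis (which also considers slope steepness).

```python
import math


def mean_resultant(angles_rad):
    """Vector sum of unit fall-direction vectors, divided by their number.

    Its length lies in [0, 1]: near 1 when logs share a direction, near 0 when they do not.
    """
    n = len(angles_rad)
    return (sum(math.cos(a) for a in angles_rad) / n,
            sum(math.sin(a) for a in angles_rad) / n)


def downhill_alignment(angles_rad, downslope_rad):
    """Dot product of the mean resultant with the unit downslope vector.

    +1: every log points straight downhill; 0: no net alignment; -1: all uphill.
    """
    mx, my = mean_resultant(angles_rad)
    return mx * math.cos(downslope_rad) + my * math.sin(downslope_rad)


# Hypothetical plot: fall directions (base -> crown) and downslope direction, in radians.
logs = [0.1, -0.3, 0.4, 2.9, 0.0]
print(round(downhill_alignment(logs, downslope_rad=0.0), 3))
```

Because the dot product is linear, averaging per-log dot products and dotting the mean resultant with the downslope vector give the same number; the resultant form makes it easy to also report how concentrated the fall directions are.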
The Fermi-Pasta-Ulam Problem and Its Underlying Integrable Dynamics: An Approach Through Lyapunov Exponents

NASA Astrophysics Data System (ADS)

Benettin, G.; Pasquali, S.; Ponno, A.

2018-05-01

FPU models, in dimension one, are perturbations either of the linear model or of the Toda model; perturbations of the linear model include the usual β-model, and perturbations of Toda include the usual α+β model. In this paper we explore and compare two families, or hierarchies, of FPU models, closer and closer to either the linear or the Toda model, by computing numerically, for each model, the maximal Lyapunov exponent χ. More precisely, we consider statistically typical trajectories and study the asymptotics of χ for large N (the number of particles) and small ε (the specific energy E/N), and find, for all models, asymptotic power laws $\chi \simeq C\varepsilon^{a}$, with C and a depending on the model. The asymptotics turn out to be, in general, rather slow, and producing accurate results requires great computational effort. We also revisit and extend the analytic computation of χ introduced by Casetti, Livi and Pettini, originally formulated for the β-model. The evidence strongly indicates that the theory extends successfully to all models of the linear hierarchy, but not to models close to Toda.
Variable selection for marginal longitudinal generalized linear models.

PubMed

Cantoni, Eva; Flemming, Joanna Mills; Ronchetti, Elvezio

2005-06-01

Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this article, we propose a generalized version of Mallows's C(p) (GC(p)) suitable for use with both parametric and nonparametric models. GC(p) provides an estimate of a measure of a model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in practice: variable selection based on Wald-type or score-type tests. An application to real data further demonstrates the merits of our approach while at the same time emphasizing some important robust features inherent to GC(p).
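As background for GC(p), the sketch below computes the classical Mallows's C(p) for ordinary least squares submodels, C_p = SSE_p / σ̂² − n + 2p, with σ̂² estimated from the full model. This is the textbook criterion the paper generalizes, not GC(p) itself, and it does not handle GEE or longitudinal correlation; the simulated data and the name `mallows_cp` are hypothetical.

```python
import numpy as np


def mallows_cp(X_full, y, subset):
    """Classical Mallows's C_p for the OLS submodel using columns `subset` of X_full.

    sigma^2 is estimated from the full model; p is the number of submodel columns.
    """
    n, k = X_full.shape
    beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
    sigma2 = float(np.sum((y - X_full @ beta_full) ** 2)) / (n - k)
    X_sub = X_full[:, subset]
    beta_sub, *_ = np.linalg.lstsq(X_sub, y, rcond=None)
    sse_sub = float(np.sum((y - X_sub @ beta_sub) ** 2))
    return sse_sub / sigma2 - n + 2 * len(subset)


# Simulated data: intercept plus three predictors, only the first of which matters.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=n)
for cols in ([0, 1], [0, 1, 2, 3], [0, 2, 3]):
    print(cols, round(mallows_cp(X, y, cols), 2))
```

Submodels with C_p close to (or below) their p are usually regarded as adequate for prediction; the full model returns exactly k by construction, and omitting the relevant predictor inflates C_p sharply.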
Structural Equation Modeling: A Framework for Ocular and Other Medical Sciences Research

PubMed Central

Christ, Sharon L.; Lee, David J.; Lam, Byron L.; Diane, Zheng D.

2017-01-01

Structural equation modeling (SEM) is a modeling framework that encompasses many types of statistical models and can accommodate a variety of estimation and testing methods. SEM has been used primarily in social sciences but is increasingly used in epidemiology, public health, and the medical sciences. SEM provides many advantages for the analysis of survey and clinical data, including the ability to model latent constructs that may not be directly observable. Another major feature is simultaneous estimation of parameters in systems of equations that may include mediated relationships, correlated dependent variables, and in some instances feedback relationships. SEM allows for the specification of theoretically holistic models because multiple and varied relationships may be estimated together in the same model. SEM has recently expanded by adding generalized linear modeling capabilities that include the simultaneous estimation of parameters of different functional form for outcomes with different distributions in the same model. Therefore, mortality modeling and other relevant health outcomes may be evaluated. Random effects estimation using latent variables has been advanced in the SEM literature and software. In addition, SEM software has increased estimation options. Therefore, modern SEM is quite general and includes model types frequently used by health researchers, including generalized linear modeling, mixed effects linear modeling, and population average modeling. This article does not present any new information. It is meant as an introduction to SEM and its uses in ocular and other health research. PMID:24467557
Statistical properties of the radiation from SASE FEL operating in the linear regime

NASA Astrophysics Data System (ADS)

Saldin, E. L.; Schneidmiller, E. A.; Yurkov, M. V.

1998-02-01

The paper presents a comprehensive analysis of the statistical properties of the radiation from a self-amplified spontaneous emission (SASE) free electron laser operating in the linear mode. The investigation has been performed in a one-dimensional approximation, assuming the electron pulse length to be much larger than the coherence length of the radiation. The following statistical properties of the SASE FEL radiation have been studied: field correlations, the distribution of the radiation energy after a monochromator installed at the FEL amplifier exit, and the photoelectric counting statistics of SASE FEL radiation. It is shown that the radiation from a SASE FEL operating in the linear regime possesses all the features of completely chaotic polarized radiation.
Non-linear scaling of a musculoskeletal model of the lower limb using statistical shape models.

PubMed

Nolte, Daniel; Tsang, Chui Kit; Zhang, Kai Yu; Ding, Ziyun; Kedgley, Angela E; Bull, Anthony M J

2016-10-03

Accurate muscle geometry for musculoskeletal models is important to enable accurate subject-specific simulations. Commonly, linear scaling is used to obtain individualised muscle geometry. More advanced methods include non-linear scaling using segmented bone surfaces and manual or semi-automatic digitisation of muscle paths from medical images. In this study, a new scaling method combining non-linear scaling with reconstructions of bone surfaces using statistical shape modelling is presented. Statistical Shape Models (SSMs) of femur and tibia/fibula were used to reconstruct bone surfaces of nine subjects. Reference models were created by morphing manually digitised muscle paths to mean shapes of the SSMs using non-linear transformations, and inter-subject variability was calculated. Subject-specific models of muscle attachment and via points were created from three reference models. The accuracy was evaluated by calculating the differences between the scaled and manually digitised models. The points defining the muscle paths showed large inter-subject variability at the thigh and shank, up to 26 mm; this was found to limit the accuracy of all studied scaling methods. Errors for the subject-specific muscle point reconstructions of the thigh could be decreased by 9% to 20% by using the non-linear scaling compared to a typical linear scaling method. We conclude that the proposed non-linear scaling method is more accurate than linear scaling methods. Thus, when combined with the ability to reconstruct bone surfaces from incomplete or scattered geometry data using statistical shape models, our proposed method is an alternative to linear scaling methods. Copyright © 2016 The Author. Published by Elsevier Ltd. All rights reserved.
Bayesian reconstruction of projection reconstruction NMR (PR-NMR).

PubMed

Yoon, Ji Won

2014-11-01

Projection reconstruction nuclear magnetic resonance (PR-NMR) is a technique for generating multidimensional NMR spectra. A small number of projections from lower-dimensional NMR spectra are used to reconstruct the multidimensional NMR spectra. In our previous work, it was shown that multidimensional NMR spectra are efficiently reconstructed using peak-by-peak based reversible jump Markov chain Monte Carlo (RJMCMC) algorithm. We propose an extended and generalized RJMCMC algorithm replacing a simple linear model with a linear mixed model to reconstruct close NMR spectra into true spectra. This statistical method generates samples in a Bayesian scheme. Our proposed algorithm is tested on a set of six projections derived from the three-dimensional 700 MHz HNCO spectrum of a protein HasA. Copyright © 2014 Elsevier Ltd. All rights reserved.
Novel formulation of the ℳ model through the Generalized-K distribution for atmospheric optical channels.

PubMed

Garrido-Balsells, José María; Jurado-Navas, Antonio; Paris, José Francisco; Castillo-Vazquez, Miguel; Puerta-Notario, Antonio

2015-03-09

In this paper, a novel and deeper physical interpretation of the recently published Málaga or ℳ statistical distribution is provided. This distribution, which has gained wide acceptance in the scientific community, models the optical irradiance scintillation induced by atmospheric turbulence. Here, the previously published analytical expressions are modified in order to express them as a mixture of the known Generalized-K and discrete Binomial and Negative Binomial distributions. In particular, the probability density function (pdf) of the ℳ model is now obtained as a linear combination of Generalized-K pdfs, in which the coefficients depend directly on the parameters of the ℳ distribution. In this way, the Málaga model can be physically interpreted as a superposition of different optical sub-channels, each described by the corresponding Generalized-K fading model and weighted by the ℳ-dependent coefficients. The expressions proposed here are simpler than the equations of the original ℳ model and are validated by numerical simulations that generate ℳ-distributed random sequences and their associated histograms. This novel interpretation of the Málaga statistical distribution provides a valuable tool for analyzing the performance of atmospheric optical channels under every turbulence condition.
Two Computer Programs for the Statistical Evaluation of a Weighted Linear Composite.

ERIC Educational Resources Information Center

Sands, William A.

1978-01-01

Two computer programs (one batch, one interactive) are designed to provide statistics for a weighted linear combination of several component variables. Both programs provide mean, variance, standard deviation, and a validity coefficient. (Author/JKS)
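The statistics listed in the abstract follow from standard identities for a weighted sum C = Σ wᵢXᵢ: its mean is Σ wᵢ·mean(Xᵢ), its variance is wᵀΣw (Σ the components' covariance matrix), and its covariance with a criterion Y is Σ wᵢ·cov(Xᵢ, Y). The sketch below assumes the validity coefficient is the Pearson correlation between the composite and an external criterion; it is not a reconstruction of the 1978 programs, and the name `composite_stats` and the simulated data are hypothetical.

```python
import numpy as np


def composite_stats(scores, weights, criterion):
    """Mean, variance, SD and validity coefficient of the composite C = scores @ weights.

    scores: (n_people, n_components); weights: (n_components,); criterion: (n_people,).
    Validity is taken here to be the Pearson correlation of C with the criterion.
    """
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    y = np.asarray(criterion, dtype=float)
    # Joint sample covariance of the components and the criterion (last row/column).
    full = np.cov(np.column_stack([scores, y]), rowvar=False, ddof=1)
    cov_xx, cov_xy, var_y = full[:-1, :-1], full[:-1, -1], full[-1, -1]
    mean_c = float(w @ scores.mean(axis=0))   # E[sum w_i X_i] = sum w_i E[X_i]
    var_c = float(w @ cov_xx @ w)             # Var(sum w_i X_i) = w' Sigma w
    sd_c = var_c ** 0.5
    validity = float(w @ cov_xy) / (sd_c * var_y ** 0.5)
    return {"mean": mean_c, "variance": var_c, "sd": sd_c, "validity": validity}


# Hypothetical data: 50 examinees, 3 component tests, one criterion measure.
rng = np.random.default_rng(2)
S = rng.normal(size=(50, 3))
crit = S @ np.array([1.0, 0.0, 1.0]) + rng.normal(size=50)
print(composite_stats(S, [0.5, 1.0, 2.0], crit))
```

Working from the component covariance matrix rather than the composite scores means the same summary can be recomputed for many candidate weightings without rescoring anyone.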
Linear and Non-linear Information Flows In Rainfall Field

NASA Astrophysics Data System (ADS)

Molini, A.; La Barbera, P.; Lanza, L. G.

The rainfall process is the result of a complex framework of non-linear dynamical interactions between the different components of the atmosphere. It preserves the complexity and the intermittent features of the generating system in space and time, as well as the strong dependence of these properties on the scale of observation. Understanding and quantifying how the non-linearity of the generating process influences individual rain events are relevant research issues in hydro-meteorology, especially in applications where timely and effective forecasting of heavy rain events can reduce the risk of failure. This work focuses on the characterization of the non-linear properties of the observed rain process and on the influence of these features on hydrological models. Among the goals of such a survey are the search for regular structures in the rainfall phenomenon and the study of the information flows within the rain field. The research focuses on three basic directions of evolution for the system: in time, in space and across the different scales. In fact, the information flows that force the system to evolve generally represent a connection between different locations in space, different instants in time and, unless the hypothesis of scale invariance is verified "a priori", different characteristic scales. A first phase of the analysis is carried out by means of classic statistical methods; then a survey of the information flows within the field is developed by means of techniques borrowed from Information Theory; finally, the rain signal is analysed in the time and frequency domains, with particular reference to its intermittent structure.
The methods adopted in this last part of the work are both the classic techniques of statistical inference and a few procedures for the detection of non-linear and non-stationary features within the process starting from measured data.
Work-related Mental Consequences: Implications of Burnout on Mental Health Status Among Health Care Providers

PubMed Central

Papathanasiou, Ioanna V.

2015-01-01

Introduction: Burnout can create problems in every aspect of an individual’s life. It may have an adverse effect on interpersonal and family relations and can lead to a general negative attitude towards life. Aim: The purpose of this study is to investigate whether burnout is associated with the mental health status of health care providers. Material and Methods: The sample in this study consisted of 240 health care employees. The Greek version of Maslach’s Burnout Inventory (MBI) was used to measure burnout levels, and the Greek version of the Symptoms Rating Scale for Depression and Anxiety (SRSDA) questionnaire was used to evaluate health care providers’ mental health status. Descriptive statistics were initially generated for sample characteristics. Normality was checked with the Kolmogorov-Smirnov test, and data were processed with parametric tests. General linear models with MBI dimensions as independent variables and SRSDA subscales as dependent variables were used to determine the relation between burnout and mental health status. Statistics were processed with SPSS v. 17.0 (SPSS, Chicago, IL, USA). Statistical significance was set at p=0.05. Results: The average age of the sample is 40.00±7.95 years. Regarding gender, men make up 21.40% (N=49) and women 78.60% (N=180). Overall, the professional burnout of health care workers is moderate. The mean score for emotional exhaustion is 26.41, for personal accomplishment 36.70 and for depersonalization 9.81. The mean for each subscale of SRSDA is 8.23±6.79 for Depression Beck-21, 3.96±4.26 for Depression Beck-13, 4.91±4.44 for Melancholia, 6.32±4.35 for Asthenia and 6.36±4.72 for Anxiety.
The results of the general linear models, with the MBI dimensions as independent variables and the SRSDA subscales as dependent variables, show that emotional exhaustion and personal accomplishment are statistically correlated with all SRSDA subscales, while depersonalization is not correlated with any SRSDA subscale. Conclusions: Burnout appears to be implicated in the mental health status of health care providers at work. Emotional exhaustion is the burnout dimension most strongly correlated with employees’ mental health. PMID:25870487
Comparing physically-based and statistical landslide susceptibility model outputs - a case study from Lower Austria

NASA Astrophysics Data System (ADS)

Canli, Ekrem; Thiebes, Benni; Petschko, Helene; Glade, Thomas

2015-04-01

By now there is a broad consensus that, due to human-induced global change, the frequency and magnitude of heavy precipitation events are expected to increase in certain parts of the world. Given that rainfall is the most common triggering agent for landslide initiation, increased landslide activity can also be expected there. Landslide occurrence is a globally widespread phenomenon that clearly needs to be addressed. Well-known problems in modelling landslide susceptibility and hazard lead to uncertain predictions. These include the lack of a universally applicable modelling solution for adequately assessing landslide susceptibility (which can be seen as the relative indication of the spatial probability of landslide initiation). Generally speaking, there are three major approaches to landslide susceptibility analysis: heuristic, statistical and deterministic models, each with its own assumptions, distinctive data requirements and differently interpretable outcomes. Still, detailed comparisons of the resulting landslide susceptibility maps are rare. In this presentation, the susceptibility modelling outputs of a deterministic model (Stability INdex MAPping - SINMAP) and a statistical modelling approach (generalized additive model - GAM) are compared. SINMAP is an infinite slope stability model that requires parameterization of soil mechanical parameters. Modelling with the generalized additive model, which is a non-linear extension of a generalized linear model, requires a high-quality landslide inventory that serves as the dependent variable in the statistical approach. Both methods rely on topographical data derived from the DTM. The comparison was carried out in a study area located in the district of Waidhofen/Ybbs in Lower Austria. For the whole district (ca. 132 km²), 1063 landslides have been mapped and partially used in the analysis and the validation of the model outputs.
The respective susceptibility maps have been reclassified to contain three susceptibility classes each. The comparison of the susceptibility maps was performed on a grid cell basis. A match of the maps was observed for grid cells located in the same susceptibility class. In contrast, a mismatch or deviation was observed for locations with different assigned susceptibility classes (up to two classes' difference). Although the modelling approaches differ significantly, more than 70% of the pixels reveal a match in the same susceptibility class. A mismatch by two classes' difference occurred in less than 2% of all pixels. Although the result looks promising and strengthens the confidence in the susceptibility zonation for this area, some of the general drawbacks related to the respective approaches still have to be addressed in further detail. Future work is heading towards an integration of probabilistic aspects into deterministic modelling.
Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing

PubMed Central

Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

2018-01-01

Aims A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R²), using R² as the primary metric of assay agreement. However, the use of R² alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
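To make the distinction between constant and proportional error concrete, here is a minimal Bland-Altman sketch: the mean difference estimates constant error (bias), the 95% limits of agreement are bias ± 1.96 SD, and a non-zero slope of difference against mean suggests proportional error. The function name and the paired values are hypothetical; this is a generic textbook version, not the authors' adapted procedure.

```python
import numpy as np

def bland_altman(reference, test):
    """Bland-Altman summary for paired measurements from two assays.

    Returns the mean difference (constant error / bias), the 95% limits of
    agreement, and the slope of difference vs. mean (proportional-error check).
    """
    ref = np.asarray(reference, dtype=float)
    new = np.asarray(test, dtype=float)
    means = (ref + new) / 2.0
    diffs = new - ref
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    slope, _intercept = np.polyfit(means, diffs, 1)   # degree-1 fit: [slope, intercept]
    return bias, limits, slope


# Hypothetical variant-allele-fraction pairs (%): a pure offset and a pure 10% scaling.
ref = np.array([5.0, 10.0, 20.0, 40.0])
bias_c, limits_c, slope_c = bland_altman(ref, ref + 2.0)   # constant error only
bias_p, limits_p, slope_p = bland_altman(ref, ref * 1.1)   # proportional error only
```

A constant offset of 2 shows up as a bias of 2 with a flat difference plot, while a 10% scaling gives a difference-vs-mean slope of 0.1/1.05. Note that R² would be 1 in both cases, which is the abstract's point about R² alone.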
A comparison of linear and nonlinear statistical techniques in performance attribution.

PubMed

Chan, N H; Genovese, C R

2001-01-01

Performance attribution is usually conducted under the linear framework of multifactor models. Although commonly used by practitioners in finance, linear multifactor models are known to be less than satisfactory in many situations. After a brief survey of nonlinear methods, nonlinear statistical techniques are applied to performance attribution of a portfolio constructed from a fixed universe of stocks, using factors derived from some commonly used cross-sectional linear multifactor models. With the portfolio rebalanced monthly, the cumulative returns for procedures based on the standard linear multifactor model and on three nonlinear techniques (model selection, additive models, and neural networks) are calculated and compared. The first two nonlinear techniques, especially in combination, are found to outperform the standard linear model. The results in the neural-network case are inconclusive because of the great variety of possible models. Although these methods are more complicated and may require some tuning, toolboxes are developed and suggestions on calibration are proposed. This paper demonstrates the usefulness of modern nonlinear statistical techniques in performance attribution.
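For reference, the linear baseline that the nonlinear techniques are compared against can be sketched as an ordinary least-squares regression of portfolio returns on factor returns. Each factor's contribution is then its beta times the factor's mean return. The function name and the synthetic monthly data are hypothetical. The paper's nonlinear alternatives (model selection, additive models, neural networks) are not shown.

```python
import numpy as np

def linear_attribution(portfolio_returns, factor_returns):
    """OLS multifactor attribution: r_t = alpha + sum_k beta_k * f_{k,t} + e_t.

    factor_returns is a T x K array. Returns alpha, the K betas, and each
    factor's average contribution beta_k * mean(f_k) to the mean return.
    """
    r = np.asarray(portfolio_returns, dtype=float)
    F = np.asarray(factor_returns, dtype=float)
    X = np.column_stack([np.ones(len(r)), F])        # intercept + factors
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    alpha, betas = coef[0], coef[1:]
    return alpha, betas, betas * F.mean(axis=0)


# Hypothetical data: 60 months, two factors, no idiosyncratic noise.
rng = np.random.default_rng(0)
F = rng.normal(0.0, 0.02, size=(60, 2))
r = 0.001 + F @ np.array([1.2, -0.5])
alpha, betas, contrib = linear_attribution(r, F)
```

Because the regression includes an intercept, alpha plus the factor contributions adds up exactly to the portfolio's mean return. This additive decomposition is what makes linear attribution easy to read.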
A comparison of optimal MIMO linear and nonlinear models for brain machine interfaces

NASA Astrophysics Data System (ADS)

Kim, S.-P.; Sanchez, J. C.; Rao, Y. N.; Erdogmus, D.; Carmena, J. M.; Lebedev, M. A.; Nicolelis, M. A. L.; Principe, J. C.

2006-06-01

The field of brain-machine interfaces requires the estimation of a mapping from spike trains collected in motor cortex areas to the hand kinematics of the behaving animal. This paper presents a systematic investigation of several linear (Wiener filter, LMS adaptive filters, gamma filter, subspace Wiener filters) and nonlinear models (time-delay neural network and local linear switching models) applied to datasets from two experiments in monkeys performing motor tasks (reaching for food and target hitting). Ensembles of 100-200 cortical neurons were simultaneously recorded in these experiments, and even larger neuronal samples are anticipated in the future. Due to the large size of the models (thousands of parameters), the major issue studied was the generalization performance. Every parameter of the models (not only the weights) was selected optimally using signal processing and machine learning techniques. The models were also compared statistically with respect to the Wiener filter as the baseline. Each of the optimization procedures produced improvements over that baseline for either one of the two datasets or both.
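The Wiener-filter baseline named above is a linear least-squares map from a tap-delay window of binned spike counts to hand kinematics. A minimal sketch, assuming spikes are a T x N array of bin counts and kinematics a T x D array. The function names, the optional ridge term and the synthetic data are illustrative assumptions. The paper's adaptive, gamma, subspace and nonlinear variants are not shown.

```python
import numpy as np

def lagged_design(spikes, n_lags):
    """Tap-delay embedding: row t holds bins t, t-1, ..., t-n_lags+1 of every neuron."""
    T = spikes.shape[0]
    blocks = [spikes[n_lags - 1 - k : T - k] for k in range(n_lags)]
    X = np.hstack(blocks)
    return np.column_stack([np.ones(len(X)), X])     # prepend a bias column

def fit_wiener(spikes, kinematics, n_lags=10, ridge=0.0):
    """Wiener-Hopf / least-squares solution W = (X'X + ridge*I)^-1 X'Y."""
    X = lagged_design(spikes, n_lags)
    Y = kinematics[n_lags - 1:]                      # align targets with full windows
    R = X.T @ X + ridge * np.eye(X.shape[1])         # (regularized) input correlation
    return np.linalg.solve(R, X.T @ Y)

def predict(W, spikes, n_lags=10):
    return lagged_design(spikes, n_lags) @ W


# Hypothetical data: 5 neurons, 3 lags, 2-D kinematics generated by a known filter.
rng = np.random.default_rng(0)
spikes = rng.poisson(2.0, size=(500, 5)).astype(float)
W_true = rng.normal(size=(1 + 5 * 3, 2))
kin = np.zeros((500, 2))
kin[2:] = lagged_design(spikes, 3) @ W_true
W = fit_wiener(spikes, kin, n_lags=3)
```

With 100 to 200 neurons and ten or more lags, the filter has thousands of weights, which is why generalization (for example via the `ridge` term or a reduced subspace) becomes the central issue the abstract describes.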
A comparison of optimal MIMO linear and nonlinear models for brain-machine interfaces.

PubMed

Kim, S-P; Sanchez, J C; Rao, Y N; Erdogmus, D; Carmena, J M; Lebedev, M A; Nicolelis, M A L; Principe, J C

2006-06-01

The field of brain-machine interfaces requires the estimation of a mapping from spike trains collected in motor cortex areas to the hand kinematics of the behaving animal. This paper presents a systematic investigation of several linear (Wiener filter, LMS adaptive filters, gamma filter, subspace Wiener filters) and nonlinear models (time-delay neural network and local linear switching models) applied to datasets from two experiments in monkeys performing motor tasks (reaching for food and target hitting). Ensembles of 100-200 cortical neurons were simultaneously recorded in these experiments, and even larger neuronal samples are anticipated in the future. Due to the large size of the models (thousands of parameters), the major issue studied was the generalization performance. Every parameter of the models (not only the weights) was selected optimally using signal processing and machine learning techniques. The models were also compared statistically with respect to the Wiener filter as the baseline. Each of the optimization procedures produced improvements over that baseline for either one of the two datasets or both.

Multivariate meta-analysis for non-linear and other multi-parameter associations

PubMed Central

Gasparrini, A; Armstrong, B; Kenward, M G

2012-01-01

In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043
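The second stage of the two-stage procedure pools each study's coefficient vector using its full within-study covariance matrix. A minimal fixed-effect sketch (generalized least squares) is shown below. The function name and the toy inputs are hypothetical. The random-effects versions provided by `mvmeta`, which add an estimated between-study covariance to each study's matrix, are not shown.

```python
import numpy as np

def mv_fixed_effect(estimates, covariances):
    """Fixed-effect multivariate pooling of study-specific coefficient vectors.

    b_pooled = (sum_i S_i^-1)^-1 sum_i S_i^-1 b_i,  Cov(b_pooled) = (sum_i S_i^-1)^-1
    """
    k = len(estimates[0])
    precision_sum = np.zeros((k, k))
    weighted_sum = np.zeros(k)
    for b, S in zip(estimates, covariances):
        W = np.linalg.inv(np.asarray(S, dtype=float))   # study precision matrix
        precision_sum += W
        weighted_sum += W @ np.asarray(b, dtype=float)
    cov = np.linalg.inv(precision_sum)
    return cov @ weighted_sum, cov


# Hypothetical spline coefficients from two cities with identical covariance.
pooled, pooled_cov = mv_fixed_effect([[1.0, 2.0], [3.0, 4.0]], [np.eye(2), np.eye(2)])
```

Using the full covariance matrices, rather than pooling each coefficient separately, keeps the correlation between spline coefficients. The pooled curve depends on that correlation.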
Large deformation image classification using generalized locality-constrained linear coding.

PubMed

Zhang, Pei; Wee, Chong-Yaw; Niethammer, Marc; Shen, Dinggang; Yap, Pew-Thian

2013-01-01

Magnetic resonance (MR) imaging has been demonstrated to be very useful for clinical diagnosis of Alzheimer's disease (AD). A common approach to using MR images for AD detection is to spatially normalize the images by non-rigid image registration, and then perform statistical analysis on the resulting deformation fields. Because the deformation field is highly nonlinear, recent studies suggest using the initial momentum instead, as it lies in a linear space and fully encodes the deformation field. In this paper we explore the use of initial momentum for image classification by focusing on the problem of AD detection. Experiments on the public ADNI dataset show that the initial momentum, together with a simple sparse coding technique, locality-constrained linear coding (LLC), can achieve a classification accuracy comparable to or even better than the state of the art. We also show that the performance of LLC can be greatly improved by introducing proper weights to the codebook.
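The core of LLC can be illustrated with the widely used approximated encoder from the original LLC method. It picks the k nearest codewords, solves a small regularized least-squares system, and normalizes the weights to sum to one. The sketch below assumes that formulation. The function name, `k`, the regularizer `beta` and the toy codebook are illustrative, and the paper's codebook weighting is not shown.

```python
import numpy as np

def llc_encode(x, codebook, k=5, beta=1e-4):
    """Approximated locality-constrained linear coding of one descriptor x.

    codebook is an M x d array. Returns an M-vector with at most k non-zeros
    that sum to one (the shift-invariance constraint of LLC).
    """
    x = np.asarray(x, dtype=float)
    B = np.asarray(codebook, dtype=float)
    d2 = np.sum((B - x) ** 2, axis=1)
    idx = np.argsort(d2)[:k]                 # locality: k nearest codewords
    z = B[idx] - x                           # shift codewords to the descriptor
    C = z @ z.T                              # local covariance (k x k)
    C += np.eye(k) * beta * np.trace(C)      # regularize for numerical stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                             # enforce sum-to-one
    code = np.zeros(len(B))
    code[idx] = w
    return code


# Hypothetical 2-D codebook (corners of a unit square) and a descriptor at its centre.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
code = llc_encode([0.5, 0.5], codebook, k=4)
```

Reconstructing `codebook.T @ code` recovers the descriptor here. In a classification pipeline, the codes of all local descriptors (here, initial-momentum features) would typically be pooled and passed to a linear classifier.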
Influence of the nucleus area distribution on the survival fraction after charged particles broad beam irradiation.

PubMed

Wéra, A-C; Barazzuol, L; Jeynes, J C G; Merchant, M J; Suzuki, M; Kirkby, K J

2014-08-07

It is well known that broad-beam irradiation with heavy ions leads to variation in the number of hits received by each cell, as the distribution of particles follows Poisson statistics. Although the nucleus area determines the number of hits received for a given dose, its variation across the irradiated cell population is generally not considered. In this work, we investigate the effect of the nucleus area distribution on the survival fraction. More specifically, this work aims to explain the deviation, or tail, that may be observed in the survival fraction at high irradiation doses. For this purpose, the nucleus area distribution was added to the beam Poisson statistics and the Linear-Quadratic model in order to fit the experimental data. As shown in this study, nucleus size variation, and the associated Poisson statistics, can lead to an upward survival trend after broad-beam irradiation. The influence of the distribution parameters (mean area and standard deviation) was studied using a normal distribution, along with the Linear-Quadratic model parameters (α and β). Finally, the proposed model was successfully tested against the survival fraction of LN18 cells irradiated with an 85 keV µm⁻¹ carbon ion broad beam, for which the distribution of nucleus areas had been determined.
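A simplified Monte Carlo reading of this kind of model may help. Per-cell hits are drawn from a Poisson distribution whose mean scales with nucleus area, areas are drawn from a normal distribution, and LQ survival is computed per cell and then averaged. The relation D[Gy] ≈ 0.1602 × LET[keV/µm] × fluence[µm⁻²] is the standard dose-fluence conversion. The following are assumptions for illustration, not the authors' fitted model: the per-nucleus dose 0.1602 × LET × hits / area, clipping areas at 1 µm², the function name and all parameter values.

```python
import numpy as np

def population_survival(dose_gy, let_kev_um, area_mean, area_sd, alpha, beta,
                        n_cells=200_000, seed=0):
    """Mean LQ survival of a cell population under a Poisson-distributed broad beam.

    Illustrative assumptions (not from the study): nucleus areas in um^2 are
    normal, clipped at 1 um^2; a nucleus hit k times receives
    0.1602 * LET * k / area Gy.
    """
    rng = np.random.default_rng(seed)
    areas = np.clip(rng.normal(area_mean, area_sd, n_cells), 1.0, None)
    fluence = dose_gy / (0.1602 * let_kev_um)   # particles per um^2 for the nominal dose
    hits = rng.poisson(fluence * areas)          # Poisson hits, mean proportional to area
    nuclear_dose = 0.1602 * let_kev_um * hits / areas
    survival = np.exp(-alpha * nuclear_dose - beta * nuclear_dose ** 2)
    return survival.mean()


# Hypothetical parameters: 85 keV/um ions, 150 +/- 40 um^2 nuclei,
# alpha = 1.0 /Gy, beta = 0.05 /Gy^2.
curve = {d: population_survival(d, 85.0, 150.0, 40.0, 1.0, 0.05)
         for d in (0.0, 1.0, 4.0)}
```

Because a population average of exponentials is dominated at high dose by the few cells that received unusually few hits, a sketch like this can show why the survival curve may flatten into a tail.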
Statistics of the stochastically forced Lorenz attractor by the Fokker-Planck equation and cumulant expansions.

PubMed

Allawala, Altan; Marston, J B

2016-11-01

We investigate the Fokker-Planck description of the equal-time statistics of the three-dimensional Lorenz attractor with additive white noise. The invariant measure is found by computing the zero (or null) mode of the linear Fokker-Planck operator as a problem of sparse linear algebra. Two variants are studied: a self-adjoint construction of the linear operator and the replacement of diffusion with hyperdiffusion. We also access the low-order statistics of the system by a perturbative expansion in equal-time cumulants. A comparison is made to statistics obtained by the standard approach of accumulation via direct numerical simulation. Theoretical and computational aspects of the Fokker-Planck and cumulant expansion methods are discussed.
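The "standard approach of accumulation via direct numerical simulation" can be sketched with an Euler-Maruyama integration of the noisy Lorenz system, accumulating equal-time first and second moments after a burn-in. The sketch assumes the classic parameters σ = 10, ρ = 28, β = 8/3 and the convention dX = f(X) dt + noise · dW, so the Fokker-Planck diffusion coefficient is noise²/2. Step size, run length and function names are illustrative choices.

```python
import numpy as np

def lorenz_drift(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def dns_moments(noise=1.0, dt=1e-3, n_steps=100_000, burn_in=10_000, seed=0):
    """Euler-Maruyama integration of dX = f(X) dt + noise dW with additive white
    noise, accumulating the equal-time mean and covariance after a burn-in."""
    rng = np.random.default_rng(seed)
    kicks = noise * np.sqrt(dt) * rng.standard_normal((n_steps, 3))
    s = np.array([1.0, 1.0, 1.0])
    total = np.zeros(3)
    total_outer = np.zeros((3, 3))
    for step in range(n_steps):
        s = s + lorenz_drift(s) * dt + kicks[step]
        if step >= burn_in:
            total += s
            total_outer += np.outer(s, s)
    n = n_steps - burn_in
    mean = total / n
    return mean, total_outer / n - np.outer(mean, mean)


mean, cov = dns_moments(n_steps=30_000, burn_in=5_000)
```

These sampled moments are the kind of low-order statistics that the Fokker-Planck zero mode and the cumulant expansion aim to reproduce without long time integration.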
Estimation for general birth-death processes

PubMed Central

Crawford, Forrest W.; Minin, Vladimir N.; Suchard, Marc A.

2013-01-01

Birth-death processes (BDPs) are continuous-time Markov chains that track the number of “particles” in a system over time. While widely used in population biology, genetics and ecology, statistical inference of the instantaneous particle birth and death rates remains largely limited to restrictive linear BDPs in which per-particle birth and death rates are constant. Researchers often observe the number of particles at discrete times, necessitating data augmentation procedures such as expectation-maximization (EM) to find maximum likelihood estimates. For BDPs on finite state-spaces, there are powerful matrix methods for computing the conditional expectations needed for the E-step of the EM algorithm. For BDPs on infinite state-spaces, closed-form solutions for the E-step are available for some linear models, but most previous work has resorted to time-consuming simulation. Remarkably, we show that the E-step conditional expectations can be expressed as convolutions of computable transition probabilities for any general BDP with arbitrary rates. This important observation, along with a convenient continued fraction representation of the Laplace transforms of the transition probabilities, allows for novel and efficient computation of the conditional expectations for all BDPs, eliminating the need for truncation of the state-space or costly simulation. We use this insight to derive EM algorithms that yield maximum likelihood estimation for general BDPs characterized by various rate models, including generalized linear models. We show that our Laplace convolution technique outperforms competing methods when they are available and demonstrate a technique to accelerate EM algorithm convergence. We validate our approach using synthetic data and then apply our methods to cancer cell growth and estimation of mutation parameters in microsatellite evolution. PMID:25328261
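To make the data setting concrete, a BDP with arbitrary state-dependent rates λₙ and μₙ can be simulated exactly with the Gillespie algorithm. Sampling the path at a few times then produces the "discretely observed" data for which EM is needed. The sketch covers only forward simulation and observation, not the paper's Laplace-convolution E-step. Function names and rate choices are hypothetical.

```python
import numpy as np

def simulate_bdp(x0, birth_rate, death_rate, t_max, seed=0):
    """Exact (Gillespie) simulation of a birth-death process with arbitrary
    state-dependent rates lambda_n = birth_rate(n) and mu_n = death_rate(n).
    Returns jump times and states, starting with (0, x0)."""
    rng = np.random.default_rng(seed)
    t, n = 0.0, x0
    times, states = [0.0], [x0]
    while True:
        lam, mu = birth_rate(n), death_rate(n)
        total = lam + mu
        if total <= 0:
            break                                # absorbing state
        t += rng.exponential(1.0 / total)        # waiting time to next event
        if t > t_max:
            break
        n += 1 if rng.random() < lam / total else -1
        times.append(t)
        states.append(n)
    return np.array(times), np.array(states)

def observe(times, states, obs_times):
    """State of the piecewise-constant path at each observation time."""
    idx = np.searchsorted(times, obs_times, side="right") - 1
    return states[idx]


# Pure linear death from 5 particles, and a hypothetical logistic-type birth rate.
t_death, x_death = simulate_bdp(5, lambda n: 0.0, lambda n: 1.0 * n, t_max=1e9, seed=1)
t_log, x_log = simulate_bdp(2, lambda n: 0.8 * n * max(0.0, 1 - n / 20),
                            lambda n: 0.1 * n, t_max=50.0, seed=2)
snapshots = observe(t_log, x_log, [0.0, 10.0, 25.0, 50.0])
```

Only `snapshots` would be available in a discretely observed study. The unobserved jumps between snapshots are what the E-step's conditional expectations account for.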
Estimation for general birth-death processes.

PubMed

Crawford, Forrest W; Minin, Vladimir N; Suchard, Marc A

2014-04-01

Birth-death processes (BDPs) are continuous-time Markov chains that track the number of "particles" in a system over time. While widely used in population biology, genetics and ecology, statistical inference of the instantaneous particle birth and death rates remains largely limited to restrictive linear BDPs in which per-particle birth and death rates are constant. Researchers often observe the number of particles at discrete times, necessitating data augmentation procedures such as expectation-maximization (EM) to find maximum likelihood estimates. For BDPs on finite state-spaces, there are powerful matrix methods for computing the conditional expectations needed for the E-step of the EM algorithm. For BDPs on infinite state-spaces, closed-form solutions for the E-step are available for some linear models, but most previous work has resorted to time-consuming simulation. Remarkably, we show that the E-step conditional expectations can be expressed as convolutions of computable transition probabilities for any general BDP with arbitrary rates. This important observation, along with a convenient continued fraction representation of the Laplace transforms of the transition probabilities, allows for novel and efficient computation of the conditional expectations for all BDPs, eliminating the need for truncation of the state-space or costly simulation. We use this insight to derive EM algorithms that yield maximum likelihood estimation for general BDPs characterized by various rate models, including generalized linear models. We show that our Laplace convolution technique outperforms competing methods when they are available and demonstrate a technique to accelerate EM algorithm convergence. We validate our approach using synthetic data and then apply our methods to cancer cell growth and estimation of mutation parameters in microsatellite evolution.
Generalized Linear Covariance Analysis

NASA Technical Reports Server (NTRS)

Carpenter, James R.; Markley, F. Landis

2014-01-01

This talk presents a comprehensive approach to filter modeling for generalized covariance analysis of both batch least-squares and sequential estimators. We review and extend in two directions the results of prior work that allowed for partitioning of the state space into "solve-for" and "consider" parameters, accounted for differences between the formal values and the true values of the measurement noise, process noise, and a priori solve-for and consider covariances, and explicitly partitioned the errors into subspaces containing only the influence of the measurement noise, process noise, and solve-for and consider covariances. In this work, we explicitly add sensitivity analysis to this prior work, and relax an implicit assumption that the batch estimator's epoch time occurs prior to the definitive span. We also apply the method to an integrated orbit and attitude problem, in which gyro and accelerometer errors, though not estimated, influence the orbit determination performance. We illustrate our results using two graphical presentations, which we call the "variance sandpile" and the "sensitivity mosaic," and we compare the linear covariance results to confidence intervals associated with ensemble statistics from a Monte Carlo analysis.
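The solve-for/consider split can be illustrated for the simplest case: a batch least-squares estimator without a priori information, with measurement model y = Hx + Gc + v. Here c holds unestimated "consider" parameters with covariance P_cc, and v is noise with covariance R. The estimate error is then S c + noise, with sensitivity S = P Hᵀ R⁻¹ G, and the consider covariance is P + S P_cc Sᵀ. The sketch checks this against a Monte Carlo ensemble, as the talk does. The model, function names and numbers are illustrative assumptions, not the authors' filter models.

```python
import numpy as np

def consider_covariance(H, G, R, Pcc):
    """Formal covariance P, sensitivity S and consider covariance P + S Pcc S'
    for batch least squares on y = H x + G c + v (no a priori information)."""
    Rinv = np.linalg.inv(R)
    P = np.linalg.inv(H.T @ Rinv @ H)      # formal (noise-only) covariance
    S = P @ H.T @ Rinv @ G                 # sensitivity of x_hat to consider params
    return P, S, P + S @ Pcc @ S.T

def monte_carlo_covariance(H, G, R, Pcc, n_trials=20_000, seed=0):
    """Sample covariance of estimate errors over an ensemble of c and v draws."""
    rng = np.random.default_rng(seed)
    Rinv = np.linalg.inv(R)
    A = np.linalg.solve(H.T @ Rinv @ H, H.T @ Rinv)   # maps y - Hx to x_hat - x
    c = rng.multivariate_normal(np.zeros(G.shape[1]), Pcc, n_trials)
    v = rng.multivariate_normal(np.zeros(H.shape[0]), R, n_trials)
    errors = (c @ G.T + v) @ A.T
    return np.cov(errors, rowvar=False)


# Hypothetical 1-D tracking: estimate position and velocity from 21 ranges,
# with an unestimated constant range bias as the consider parameter.
t = np.linspace(0.0, 10.0, 21)
H = np.column_stack([np.ones_like(t), t])
G = np.ones((21, 1))
R = 0.01 * np.eye(21)
Pcc = np.array([[0.04]])
P, S, P_con = consider_covariance(H, G, R, Pcc)
mc = monte_carlo_covariance(H, G, R, Pcc)
```

In this toy case, the range bias aliases entirely into the position estimate (S = [1, 0]ᵀ). The formal covariance P misses that error, but the consider covariance and the Monte Carlo ensemble both capture it.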
SAS macro programs for geographically weighted generalized linear modeling with spatial point data: applications to health research.

PubMed

Chen, Vivian Yi-Ju; Yang, Tse-Chuan

2012-08-01

An increasing interest in exploring spatial non-stationarity has generated several specialized analytic software programs; however, few of these programs can be integrated natively into a well-developed statistical environment such as SAS. We not only developed a set of SAS macro programs to fill this gap, but also expanded the geographically weighted generalized linear modeling (GWGLM) by integrating the strengths of SAS into the GWGLM framework. Three features distinguish our work. First, the macro programs of this study provide more kernel weighting functions than the existing programs. Second, with our code, users can specify the bandwidth selection process better than with existing programs. Third, the development of the macro programs is fully embedded in the SAS environment, providing great potential for future exploration of complicated spatially varying coefficient models in other disciplines. We provided three empirical examples to illustrate the use of the SAS macro programs and demonstrated the advantages explained above. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
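The idea behind GWGLM can be sketched as a local GLM at each target location, fitted by IRLS with every observation's weight multiplied by a distance kernel. The sketch below shows a Poisson (log-link) case in Python, not the authors' SAS macros. The Gaussian and bisquare kernels are two common choices, and all names, bandwidths and data are hypothetical.

```python
import numpy as np

def gaussian_kernel(dist, bandwidth):
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def bisquare_kernel(dist, bandwidth):
    w = (1.0 - (dist / bandwidth) ** 2) ** 2
    return np.where(dist < bandwidth, w, 0.0)

def gw_poisson_at(coords, X, y, target, bandwidth, kernel=gaussian_kernel,
                  n_iter=50, tol=1e-10):
    """Local Poisson GLM (log link) at `target`, fitted by kernel-weighted IRLS."""
    d = np.linalg.norm(coords - target, axis=1)
    k = kernel(d, bandwidth)                    # geographic weights
    Xa = np.column_stack([np.ones(len(y)), X])  # intercept + covariates
    beta = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        eta = Xa @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                 # IRLS working response
        w = k * mu                              # kernel weight x IRLS weight
        XtW = Xa.T * w
        new = np.linalg.solve(XtW @ Xa, XtW @ z)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta


# Hypothetical points whose expected counts follow exp(0.5 + 0.3 x) everywhere.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(200, 2))
x = rng.uniform(-1.0, 1.0, size=200)
y = np.exp(0.5 + 0.3 * x)
target = np.array([5.0, 5.0])
b_gauss = gw_poisson_at(coords, x[:, None], y, target, bandwidth=2.0)
b_bisq = gw_poisson_at(coords, x[:, None], y, target, bandwidth=3.0, kernel=bisquare_kernel)
```

Repeating the fit over a grid of targets gives spatially varying coefficient surfaces. The kernel choice and bandwidth, the two features the abstract emphasizes, control how local those surfaces are.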
Unobtrusive Detection of Mild Cognitive Impairment in Older Adults Through Home Monitoring.

PubMed

Akl, Ahmad; Snoek, Jasper; Mihailidis, Alex

2017-03-01

The early detection of dementias such as Alzheimer's disease can in some cases reverse, stop, or slow cognitive decline and in general greatly reduce the burden of care. This is of increasing significance as demographic studies are warning of an aging population in North America and worldwide. Various smart homes and systems have been developed to detect cognitive decline through continuous monitoring of high-risk individuals. However, most of these smart homes and systems use predefined heuristics to detect changes in cognition, and such heuristics have been shown to focus on the idiosyncratic nuances of individual subjects and thus do not generalize. In this paper, we address this problem by building generalized linear models of home activity of older adults monitored using unobtrusive sensing technologies. We use inhomogeneous Poisson processes to model the presence of the recruited older adults within different rooms throughout the day. We employ an information theoretic approach to compare the generalized linear models learned, and we observe significant statistical differences between the cognitively intact and impaired older adults. Using a simple thresholding approach, we were able to detect mild cognitive impairment in older adults with an average area under the ROC curve of 0.716 and an average area under the precision-recall curve of 0.706 using activity models estimated over a time window of 12 weeks.
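An inhomogeneous Poisson process is specified by a time-varying intensity λ(t). It can be simulated by thinning and scored with the point-process log-likelihood Σ log λ(tᵢ) − ∫λ(t) dt, which is the quantity a GLM of the intensity maximizes. The sketch assumes a hypothetical sinusoidal time-of-day intensity (events per hour) in place of the paper's fitted models. Function names and numbers are illustrative.

```python
import numpy as np

def simulate_inhomogeneous_poisson(rate_fn, t_end, rate_max, seed=0):
    """Thinning: draw a homogeneous process at rate_max on [0, t_end] and keep
    each candidate t with probability rate_fn(t) / rate_max.
    Requires rate_fn(t) <= rate_max everywhere; rate_fn must be vectorized."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(rate_max * t_end)
    candidates = np.sort(rng.uniform(0.0, t_end, n))
    keep = rng.uniform(0.0, rate_max, n) < rate_fn(candidates)
    return candidates[keep]

def log_likelihood(events, rate_fn, t_end, n_grid=100_000):
    """Point-process log-likelihood: sum log rate(t_i) - integral of rate on [0, t_end]."""
    mids = (np.arange(n_grid) + 0.5) * (t_end / n_grid)
    integral = rate_fn(mids).mean() * t_end        # midpoint rule
    return np.sum(np.log(rate_fn(events))) - integral


# Hypothetical daily cycle of room-presence events over 200 days (time in hours).
daily = lambda t: 2.0 + 1.5 * np.sin(2 * np.pi * t / 24.0)
constant = lambda t: np.full_like(t, 2.0)
t_end = 24.0 * 200
events = simulate_inhomogeneous_poisson(daily, t_end, rate_max=3.5)
ll_daily = log_likelihood(events, daily, t_end)
ll_const = log_likelihood(events, constant, t_end)
```

Comparing such log-likelihoods (or information criteria built from them) between candidate intensity models is one concrete form the "information theoretic" comparison could take.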
Statistical Models for the Analysis of Zero-Inflated Pain Intensity Numeric Rating Scale Data.

PubMed

Goulet, Joseph L; Buta, Eugenia; Bathulapalli, Harini; Gueorguieva, Ralitza; Brandt, Cynthia A

2017-03-01

Pain intensity is often measured in clinical and research settings using the 0 to 10 numeric rating scale (NRS). NRS scores are recorded as discrete values, and in some samples they may display a high proportion of zeroes and a right-skewed distribution. Despite this, statistical methods for normally distributed data are frequently used in the analysis of NRS data. We present results from an observational cross-sectional study examining the association of NRS scores with patient characteristics using data collected from a large cohort of 18,935 veterans in Department of Veterans Affairs care diagnosed with a potentially painful musculoskeletal disorder. The mean (variance) NRS pain was 3.0 (7.5), and 34% of patients reported no pain (NRS = 0). We compared the following statistical models for analyzing NRS scores: linear regression, generalized linear models (Poisson and negative binomial), zero-inflated and hurdle models for data with an excess of zeroes, and a cumulative logit model for ordinal data. We examined model fit, interpretability of results, and whether conclusions about the predictor effects changed across models. In this study, models that accommodate zero inflation provided a better fit than the other models. These models should be considered for the analysis of NRS data with a large proportion of zeroes. We examined and analyzed pain data from a large cohort of veterans with musculoskeletal disorders. We found that many reported no current pain on the NRS on the diagnosis date. We present several alternative statistical methods for the analysis of pain intensity data with a large proportion of zeroes. Published by Elsevier Inc.
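A zero-inflated Poisson (ZIP) model mixes a point mass at zero (probability π) with a Poisson(λ) count. Its mean is (1 − π)λ and its variance is (1 − π)λ(1 + πλ). Below is a back-of-envelope moment match to the reported mean 3.0 and variance 7.5. It ignores covariates and the 0 to 10 bound of the NRS, so it illustrates the model rather than reproducing the paper's fits. Function names are illustrative.

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of count k."""
    base = (1 - pi) * math.exp(-lam) * lam ** k / math.factorial(k)
    return base + (pi if k == 0 else 0.0)

def zip_from_moments(mean, var):
    """Method-of-moments ZIP parameters (requires var > mean).

    var / mean = 1 + pi * lam and mean = (1 - pi) * lam, so lam = mean + pi * lam.
    """
    pi_lam = var / mean - 1.0
    lam = mean + pi_lam
    return lam, pi_lam / lam


lam, pi = zip_from_moments(3.0, 7.5)        # lam = 4.5, pi = 1/3
p0_zip = zip_pmf(0, lam, pi)                # about 0.34
p0_poisson = math.exp(-3.0)                 # plain Poisson with the same mean: about 0.05
```

A plain Poisson with mean 3 puts only about 5% of its mass at zero, far below the 34% zeros reported. The moment-matched ZIP implies roughly 34%, which illustrates why zero-inflated models fit these data better.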
An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin

Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least-squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest at high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern-predicted output were smaller for the linear regression method than for the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within 0.5°C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least-squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.
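The two pattern-generation methods can be contrasted per grid cell. The regression method takes the least-squares slope of local temperature on global mean temperature over the whole series. The delta method divides the local change between two epochs by the global change over the same epochs. The sketch below is a minimal illustration; function names, epoch slices and the synthetic series are assumptions, not the library's code.

```python
import numpy as np

def regression_pattern(local, global_mean):
    """Least-squares slope of local (T x cells) on global mean temperature (T,), per cell."""
    g = global_mean - global_mean.mean()
    anomalies = local - local.mean(axis=0)
    return g @ anomalies / (g @ g)

def delta_pattern(local, global_mean, base, future):
    """Local epoch-mean change divided by global epoch-mean change; base/future are slices."""
    dl = local[future].mean(axis=0) - local[base].mean(axis=0)
    dg = global_mean[future].mean() - global_mean[base].mean()
    return dl / dg


# Hypothetical 100-year run: three cells warming at 0.8, 1.5 and 2.0 x the global mean.
global_t = np.linspace(0.0, 3.0, 100)
local_t = np.outer(global_t, [0.8, 1.5, 2.0]) + 0.1
reg = regression_pattern(local_t, global_t)
delta = delta_pattern(local_t, global_t, slice(0, 20), slice(80, 100))
emulated = reg * 2.0          # pattern-scaled local change for a 2 degC global warming
```

On noise-free, exactly linear data both methods return the same pattern. With internal variability, the delta method uses only the two epochs while regression uses every year, which is consistent with the smaller errors reported for the regression method.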
An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

DOE PAGES

Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin; ...

2017-05-15

Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least-squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest at high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern-predicted output were smaller for the linear regression method than for the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within 0.5°C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least-squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.
Can upstaging of ductal carcinoma in situ be predicted at biopsy by histologic and mammographic features?

NASA Astrophysics Data System (ADS)

Shi, Bibo; Grimm, Lars J.; Mazurowski, Maciej A.; Marks, Jeffrey R.; King, Lorraine M.; Maley, Carlo C.; Hwang, E. Shelley; Lo, Joseph Y.

2017-03-01

Reducing the overdiagnosis and overtreatment associated with ductal carcinoma in situ (DCIS) requires accurate prediction of the invasive potential at cancer screening. In this work, we investigated the utility of pre-operative histologic and mammographic features to predict upstaging of DCIS. The goal was to provide intentionally conservative baseline performance using only readily available data from radiologists and pathologists and only linear models. We conducted a retrospective analysis of 99 patients with DCIS. Of those, 25 were upstaged to invasive cancer at the time of definitive surgery. Pre-operative factors, including both the histologic features extracted from stereotactic core needle biopsy (SCNB) reports and the mammographic features annotated by an expert breast radiologist, were investigated with statistical analysis. Furthermore, we built classification models based on those features in an attempt to predict the presence of an occult invasive component in DCIS, with generalization performance assessed by receiver operating characteristic (ROC) curve analysis. Histologic features including nuclear grade and DCIS subtype did not show statistically significant differences between cases with pure DCIS and with DCIS plus invasive disease. However, three mammographic features, i.e., the major axis length of the DCIS lesion, the BI-RADS level of suspicion, and the radiologist's assessment, did achieve statistical significance. Using those three statistically significant features as input, a linear discriminant model was able to distinguish patients with DCIS plus invasive disease from those with pure DCIS, with an AUC-ROC of 0.62. Overall, mammograms used for breast screening contain useful information that can be perceived by radiologists and can help predict occult invasive components in DCIS.
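A two-class linear discriminant and its AUC can be sketched in a few lines. The Fisher direction w = S_w⁻¹(μ₁ − μ₀) projects the features to a score. The AUC is the probability that a random positive case scores above a random negative one, with ties counting one half (the Mann-Whitney form). Function names and toy data are hypothetical. The toy AUC is computed in-sample, whereas the study assessed generalization performance.

```python
import numpy as np

def fisher_lda(X, y):
    """Fisher discriminant direction w = Sw^-1 (m1 - m0) for labels y in {0, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))   # pooled within-class scatter
    return np.linalg.solve(np.atleast_2d(Sw), m1 - m0)

def auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 1/2."""
    s = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos, neg = s[labels == 1], s[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Hypothetical two-feature toy data (0 = pure DCIS, 1 = upstaged).
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
w = fisher_lda(X, y)
toy_auc = auc(X @ w, y)
```

With real, overlapping classes, the AUC would be estimated on held-out cases (for example by cross-validation), as the conservative baseline in the study intends.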
Nonlinear wave chaos: statistics of second harmonic fields.

PubMed

Zhou, Min; Ott, Edward; Antonsen, Thomas M; Anlage, Steven M

2017-10-01

Concepts from the field of wave chaos have been shown to successfully predict the statistical properties of linear electromagnetic fields in electrically large enclosures. The Random Coupling Model (RCM) describes these properties by incorporating both universal features described by Random Matrix Theory and the system-specific features of particular system realizations. In an effort to extend this approach to the nonlinear domain, we add an active nonlinear frequency-doubling circuit to an otherwise linear wave chaotic system, and we measure the statistical properties of the resulting second harmonic fields. We develop an RCM-based model of this system as two linear chaotic cavities coupled by means of a nonlinear transfer function. The harmonic field strengths are predicted to be the product of two statistical quantities and the nonlinearity characteristics. Statistical results from measurement-based calculation, RCM-based simulation, and direct experimental measurements are compared and show good agreement over many decades of power.
Multilevel modelling: Beyond the basic applications.

PubMed

Wright, Daniel B; London, Kamala

2009-05-01

Over the last 30 years statistical algorithms have been developed to analyse datasets that have a hierarchical/multilevel structure. Particularly within developmental and educational psychology these techniques have become common where the sample has an obvious hierarchical structure, like pupils nested within a classroom. We describe two areas beyond the basic applications of multilevel modelling that are important to psychology: modelling the covariance structure in longitudinal designs and using generalized linear multilevel modelling as an alternative to methods from signal detection theory (SDT). Detailed code for all analyses is described using packages for the freeware R.
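The link between signal detection theory and generalized linear models can be shown with the standard indices d′ = z(H) − z(FA) and c = −(z(H) + z(FA))/2, where z is the inverse standard normal CDF. A probit GLM, P(respond "yes") = Φ(b₀ + b₁·signal), fitted to a single participant's 2 x 2 table reproduces the cell proportions, so b₀ = z(FA) and b₁ = d′. The multilevel version lets b₀ and b₁ vary across participants. The sketch below computes the SDT indices in Python rather than the article's R code, with hypothetical counts.

```python
from statistics import NormalDist

def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' and criterion c from a yes/no recognition table.

    A probit GLM on the same table gives intercept z(FA) and slope d'.
    Hit or false-alarm rates of exactly 0 or 1 make z infinite and need correcting.
    """
    z = NormalDist().inv_cdf
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts: 80% hits, 20% false alarms (an unbiased observer).
d_prime, criterion = sdt_indices(80, 20, 20, 80)
```

Here the hit rate of 0.8 and false-alarm rate of 0.2 give d′ ≈ 1.683 and c = 0. Fitting the probit GLM across many participants with random effects is what the multilevel approach adds.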
Note: Modification of the Gay-Berne potential for improved accuracy and speed

NASA Astrophysics Data System (ADS)

Persson, Rasmus A. X.

2012-06-01

A modification of the Gay-Berne (GB) potential is proposed that is about 10% to 20% faster to evaluate and statistically more accurate in reproducing the energy of interaction of two linear Lennard-Jones tetratomics when averaged over all orientations. For the special cases of end-to-end and side-by-side configurations, the new potential is equivalent to the GB one. A simple generalization to dissimilar particles of D∞h symmetry is presented but does not retain the superior agreement with respect to its GB counterpart, except at close range.
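For context, here is a sketch of the standard Gay-Berne potential that the note modifies (not the modified form, which is not given here). It is written in the usual parameterization: κ is the length-to-breadth ratio, κ′ is the side-by-side to end-to-end well-depth ratio, and μ and ν are the energy exponents, with the common choice κ = 3, κ′ = 5, μ = 2, ν = 1. The function name and defaults are illustrative, and the expression is only meaningful where r − σ + σ₀ > 0.

```python
import numpy as np

def gay_berne(r_vec, u1, u2, sigma0=1.0, eps0=1.0, kappa=3.0, kappa_p=5.0,
              mu=2.0, nu=1.0):
    """Standard Gay-Berne energy between two identical uniaxial particles.

    r_vec: separation vector; u1, u2: orientation vectors (normalized here).
    """
    r = np.linalg.norm(r_vec)
    rh = r_vec / r
    u1 = np.asarray(u1, dtype=float) / np.linalg.norm(u1)
    u2 = np.asarray(u2, dtype=float) / np.linalg.norm(u2)
    chi = (kappa ** 2 - 1) / (kappa ** 2 + 1)
    chi_p = (kappa_p ** (1 / mu) - 1) / (kappa_p ** (1 / mu) + 1)
    a, b, c = rh @ u1, rh @ u2, u1 @ u2

    def h(x):  # shared orientation function used in both sigma and eps2
        return 0.5 * x * ((a + b) ** 2 / (1 + x * c) + (a - b) ** 2 / (1 - x * c))

    sigma = sigma0 / np.sqrt(1 - h(chi))            # orientation-dependent contact distance
    eps1 = 1 / np.sqrt(1 - (chi * c) ** 2)
    eps2 = 1 - h(chi_p)
    eps = eps0 * eps1 ** nu * eps2 ** mu            # orientation-dependent well depth
    rho = sigma0 / (r - sigma + sigma0)             # shifted Lennard-Jones argument
    return 4 * eps * (rho ** 12 - rho ** 6)


# Energies at the well minimum for the two special configurations mentioned above.
z_axis = np.array([0.0, 0.0, 1.0])
u_side = gay_berne(np.array([2 ** (1 / 6), 0.0, 0.0]), z_axis, z_axis)
u_end = gay_berne(np.array([0.0, 0.0, 3.0 + 2 ** (1 / 6) - 1.0]), z_axis, z_axis)
```

At these separations the shifted argument equals 2^(−1/6), so each energy equals minus the well depth. The side-by-side and end-to-end depths differ by the factor κ′ = 5, which is the anisotropy any modified potential has to preserve in those two configurations.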
Schrödinger equation revisited

PubMed Central

Schleich, Wolfgang P.; Greenberger, Daniel M.; Kobe, Donald H.; Scully, Marlan O.

2013-01-01

The time-dependent Schrödinger equation is a cornerstone of quantum physics and governs all phenomena of the microscopic world. However, despite its importance, its origin is still not widely appreciated and properly understood. We obtain the Schrödinger equation from a mathematical identity by a slight generalization of the formulation of classical statistical mechanics based on the Hamilton–Jacobi equation. This approach brings out most clearly the fact that the linearity of quantum mechanics is intimately connected to the strong coupling between the amplitude and phase of a quantum wave. PMID:23509260
Improving validation methods for molecular diagnostics: application of Bland-Altman, Deming and simple linear regression analyses in assay comparison and evaluation for next-generation sequencing.

PubMed

Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L

2018-02-01

A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R²), using R² as the primary metric of assay agreement. However, the use of R² alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
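Deming regression, used above for assay comparison, allows for measurement error in both assays rather than only in the response. A minimal closed-form sketch follows, where `delta` is the assumed ratio of the y-error variance to the x-error variance (δ = 1 gives orthogonal regression). In this setting, an intercept away from 0 suggests constant error and a slope away from 1 suggests proportional error. The function name and data are hypothetical.

```python
import numpy as np

def deming(x, y, delta=1.0):
    """Deming regression of y on x; returns (intercept, slope).

    delta = var(error in y) / var(error in x), assumed known; delta = 1 is
    orthogonal regression. Undefined when the sample covariance of x and y is 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x.mean(), y.mean()
    sxx = np.mean((x - xm) ** 2)
    syy = np.mean((y - ym) ** 2)
    sxy = np.mean((x - xm) * (y - ym))
    slope = (syy - delta * sxx
             + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return ym - slope * xm, slope


# Hypothetical paired assay values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
exact = deming(x, 1.0 + 2.0 * x)                   # recovers intercept 1, slope 2
noisy = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
_, b_yx = deming(x, noisy)
_, b_xy = deming(noisy, x)                         # roles swapped
```

Unlike ordinary least squares, orthogonal regression treats both assays symmetrically: swapping the roles of the two methods inverts the slope (`b_yx * b_xy == 1`). This property is convenient when neither assay is a true reference.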
Nonlinear subdiffusive fractional equations and the aggregation phenomenon.

PubMed

Fedotov, Sergei

2013-09-01

In this article we address the problem of the nonlinear interaction of subdiffusive particles. We introduce a random walk model in which statistical characteristics of a random walker, such as the escape rate and jump distribution, depend on the mean density of particles. We derive a set of nonlinear subdiffusive fractional master equations and consider their diffusion approximations. We show that these equations describe the transition from an intermediate subdiffusive regime to an asymptotically normal advection-diffusion transport regime. This transition is governed by a nonlinear tempering parameter that generalizes standard linear tempering. We illustrate the general results with examples from cell and population biology. We find that a nonuniform anomalous exponent has a strong influence on the aggregation phenomenon.
Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood

PubMed Central

Bondell, Howard D.; Stefanski, Leonard A.

2013-01-01

Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although these goals are often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency follows from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove that the estimator attains the maximum possible finite-sample replacement breakdown point and has full asymptotic efficiency for normal errors. Simulation evidence shows that, compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805

Increased skills usage statistically mediates symptom reduction in self-guided internet-delivered cognitive-behavioural therapy for depression and anxiety: a randomised controlled trial.

PubMed

Terides, Matthew D; Dear, Blake F; Fogliati, Vincent J; Gandy, Milena; Karin, Eyal; Jones, Michael P; Titov, Nickolai

2018-01-01

Cognitive-behavioural therapy (CBT) is an effective treatment for clinical and subclinical symptoms of depression and general anxiety, and increases life satisfaction. Patients' usage of CBT skills is a core aspect of treatment, but there is insufficient empirical evidence that skills usage behaviours are a mechanism of clinical change. This study investigated whether an internet-delivered CBT (iCBT) intervention increased the frequency of CBT skills usage behaviours and whether this statistically mediated reductions in symptoms and increases in life satisfaction. A two-group randomised controlled trial was conducted comparing internet-delivered CBT (n = 65) with a waitlist control group (n = 75). Participants were individuals experiencing clinically significant symptoms of depression or general anxiety. Linear mixed-model analyses revealed that, by the end of treatment, the treatment group reported a significantly higher frequency of skills usage, lower symptoms, and higher life satisfaction than the control group. Results from bootstrapping mediation analyses revealed that the increased skills usage behaviours statistically mediated symptom reductions and increased life satisfaction. Although skills usage and symptom outcomes were assessed concurrently, these findings support the notion that iCBT increases the frequency of skills usage behaviours and suggest that this may be an important mechanism of change.
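Bootstrapped mediation analysis is commonly done with a product-of-coefficients estimate (path a from M regressed on X, path b from Y regressed on X and M) and a percentile bootstrap confidence interval. The sketch below is a generic single-time-point version under that assumption; it is not the trial's exact model, which used repeated assessments, and all function names are hypothetical.

```python
import numpy as np


def ols(design, target):
    """Least-squares coefficients for target ~ design (design includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b for the path X -> M -> Y."""
    ones = np.ones_like(x)
    a = ols(np.column_stack([ones, x]), m)[1]      # path a: mediator on treatment
    b = ols(np.column_stack([ones, x, m]), y)[2]   # path b: outcome on mediator, controlling for X
    return a * b


def bootstrap_mediation(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap CI for the indirect effect."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample participants with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return indirect_effect(x, m, y), (lo, hi)
```

A confidence interval for a*b that excludes zero is the usual criterion for "statistically mediated"; as the abstract notes, concurrent measurement of mediator and outcome limits causal interpretation.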
Managing Clustered Data Using Hierarchical Linear Modeling

ERIC Educational Resources Information Center

Warne, Russell T.; Li, Yan; McKyer, E. Lisako J.; Condie, Rachel; Diep, Cassandra S.; Murano, Peter S.

2012-01-01

Nutrition researchers often use cluster or multistage sampling to gather participants for their studies. These sampling methods often produce violations of the assumption of data independence shared by most traditional statistical methods. Hierarchical linear modeling is a statistical method that can overcome violations of the independence…
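To see why clustering matters, a common first step is to estimate the intraclass correlation (ICC) and the resulting Kish design effect, which measures how much cluster sampling inflates the variance of a mean relative to independent sampling. The sketch below assumes balanced clusters and a one-way random-effects ANOVA estimator; it does not fit a full hierarchical linear model, and the function names are illustrative.

```python
import numpy as np


def icc_oneway(groups):
    """ICC(1) from a one-way random-effects ANOVA on balanced clusters.

    groups: 2-D array with one row per cluster and one column per member.
    """
    g = np.asarray(groups, dtype=float)
    k, m = g.shape                                   # k clusters of m members each
    grand = g.mean()
    cluster_means = g.mean(axis=1)
    msb = m * np.sum((cluster_means - grand) ** 2) / (k - 1)          # between-cluster mean square
    msw = np.sum((g - cluster_means[:, None]) ** 2) / (k * (m - 1))   # within-cluster mean square
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(icc, cluster_size):
    """Variance inflation of a mean under cluster sampling: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc
```

Even a small ICC matters with large clusters: an ICC of 0.05 with 21 participants per school doubles the variance, so treating the data as independent would understate standard errors.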
Improving UWB-Based Localization in IoT Scenarios with Statistical Models of Distance Error.

PubMed

Monica, Stefania; Ferrari, Gianluigi

2018-05-17

Interest in the Internet of Things (IoT) is rapidly increasing, as the number of connected devices is growing exponentially. One of the application scenarios envisaged for IoT technologies involves indoor localization and context awareness. In this paper, we focus on a localization approach that relies on a particular type of communication technology, namely Ultra Wide Band (UWB). UWB technology is an attractive choice for indoor localization owing to its high accuracy. Since localization algorithms typically rely on estimated inter-node distances, the goal of this paper is to evaluate the improvement brought by a simple (linear) statistical model of the distance error. On the basis of an extensive experimental measurement campaign, we propose a general analytical framework, based on a Least Squares (LS) method, to derive a novel statistical model for the range estimation error between a pair of UWB nodes. The proposed statistical model is then applied to improve the performance of a few illustrative localization algorithms in various realistic scenarios. The experimental results show that the proposed statistical model improves the accuracy of the considered localization algorithms, reducing the localization error by up to 66%.
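A minimal sketch of fitting a linear distance-error model by least squares, assuming (hypothetically) that the range error is affine in the true distance, e = alpha + beta * d; the abstract does not give the exact form of the authors' model. Calibration pairs of true and measured distances yield alpha and beta, and inverting the model corrects new range estimates before they are passed to a localization algorithm.

```python
import numpy as np


def fit_linear_error_model(true_dist, measured_dist):
    """Least-squares fit of the range error e = alpha + beta * d from calibration pairs."""
    d = np.asarray(true_dist, dtype=float)
    err = np.asarray(measured_dist, dtype=float) - d
    A = np.column_stack([np.ones_like(d), d])       # design matrix [1, d]
    (alpha, beta), *_ = np.linalg.lstsq(A, err, rcond=None)
    return alpha, beta


def correct_range(measured, alpha, beta):
    """Invert measured = d + alpha + beta * d to recover an estimate of d."""
    return (np.asarray(measured, dtype=float) - alpha) / (1.0 + beta)
```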
Statistical Methodology for the Analysis of Repeated Duration Data in Behavioral Studies.

PubMed

Letué, Frédérique; Martinez, Marie-José; Samson, Adeline; Vilain, Anne; Vilain, Coriandre

2018-03-15

Repeated duration data are frequently used in behavioral studies. Classical linear or log-linear mixed models are often inadequate for analyzing such data, because the data usually consist of nonnegative, skew-distributed variables. Therefore, we recommend the use of a statistical methodology specific to duration data. We propose a methodology based on Cox mixed models and implemented in the R language. This semiparametric model is flexible enough to fit duration data. To compare log-linear and Cox mixed models in terms of goodness-of-fit on real data sets, we also provide a procedure based on simulations and quantile-quantile plots. We present two examples from a data set of speech and gesture interactions, which illustrate the limitations of linear and log-linear mixed models compared with Cox models. The linear models are not validated on our data, whereas the Cox models are. Moreover, in the second example, the Cox model exhibits a significant effect that the linear model does not. We provide methods to select the best-fitting models for repeated duration data and to compare statistical methodologies. In this study, we show that Cox models are best suited to the analysis of our data set.
Reversed inverse regression for the univariate linear calibration and its statistical properties derived using a new methodology

NASA Astrophysics Data System (ADS)

Kang, Pilsang; Koo, Changhoi; Roh, Hokyu

2017-11-01

Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
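For context, the two existing approaches the abstract names can be sketched as follows: the classical estimator fits the response on the reference standards and inverts the fitted line, while the inverse estimator regresses the standards directly on the response. The proposed reversed inverse regression is not shown because the abstract does not define it; the function names are illustrative.

```python
import numpy as np


def classical_calibration(x_std, y_obs, y_new):
    """Fit y = b0 + b1 * x on the standards, then invert the line for a new reading."""
    b1, b0 = np.polyfit(x_std, y_obs, 1)      # polyfit returns the highest degree first
    return (np.asarray(y_new, dtype=float) - b0) / b1


def inverse_calibration(x_std, y_obs, y_new):
    """Regress x directly on y and predict x for a new reading."""
    c1, c0 = np.polyfit(y_obs, x_std, 1)
    return c0 + c1 * np.asarray(y_new, dtype=float)
```

Because both fitted lines pass through the means of the data, the inverse estimate equals the classical one shrunk toward the mean of the standards by the factor r² (the squared correlation), so the two estimators agree only when the calibration data are error-free.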
Steady induction effects in geomagnetism. Part 1B: Geomagnetic estimation of steady surficial core motions: A non-linear inverse problem

NASA Technical Reports Server (NTRS)

Voorhies, Coerte V.

1993-01-01

The problem of estimating a steady fluid velocity field near the top of Earth's core, which induces the secular variation (SV) indicated by models of the observed geomagnetic field, is examined in the source-free mantle/frozen-flux core (SFM/FFC) approximation. This inverse problem is non-linear because solutions of the forward problem are deterministically chaotic. The SFM/FFC approximation is inexact, and neither the models nor the observations they represent are complete or perfect. A method is developed for solving the non-linear inverse motional induction problem posed by the hypothesis of (piecewise, statistically) steady core surface flow and the supposition of a complete initial geomagnetic condition. The method features iterative solution of the weighted, linearized least-squares problem and admits optional biases favoring surficially geostrophic flow and/or spatially simple flow. Two types of weights are advanced: radial field weights, for fitting the evolution of the broad-scale portion of the radial field component near Earth's surface implied by the models, and generalized weights, for fitting the evolution of the broad-scale portion of the scalar potential specified by the models.
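The core numerical step described above, repeatedly solving a weighted, linearized least-squares problem, is a weighted Gauss-Newton iteration: each update solves (JᵀWJ + ridge·I) step = JᵀWr − ridge·p. The sketch below applies it to a toy exponential model rather than to core-flow fields; the optional `ridge` term is a hypothetical stand-in for the biases the abstract mentions (it simply penalizes large parameter values), and nothing here reproduces the author's weights.

```python
import numpy as np


def weighted_gauss_newton(residual_fn, jacobian_fn, p0, weights, ridge=0.0,
                          n_iter=50, tol=1e-10):
    """Minimize sum(w * r(p)**2) + ridge * ||p||**2 by repeated linearization.

    residual_fn(p) returns data minus model; jacobian_fn(p) returns d(model)/dp.
    """
    p = np.asarray(p0, dtype=float)
    W = np.diag(np.asarray(weights, dtype=float))
    for _ in range(n_iter):
        r = residual_fn(p)
        J = jacobian_fn(p)                                # shape (n_data, n_params)
        lhs = J.T @ W @ J + ridge * np.eye(p.size)        # weighted normal matrix
        rhs = J.T @ W @ r - ridge * p
        step = np.linalg.solve(lhs, rhs)                  # solve the linearized problem
        p = p + step
        if np.linalg.norm(step) < tol:
            break
    return p


# Toy usage: recover a and b in data = a * exp(b * t).
t = np.linspace(0.0, 1.0, 20)
data = 2.0 * np.exp(-1.5 * t)
resid = lambda p: data - p[0] * np.exp(p[1] * t)
jac = lambda p: np.column_stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)])
p_hat = weighted_gauss_newton(resid, jac, [1.5, -1.0], weights=np.ones_like(t))
```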
Characterization of Perovskite Oxide/Semiconductor Heterostructures

NASA Astrophysics Data System (ADS)

Walker, Phillip

Tools developed for investigating dynamical systems have provided critical understanding of a wide range of physical phenomena. Here these tools are used to gain further insight into scalar transport and how it is affected by mixing. The aim of this research is to investigate the efficiency of several partitioning methods that demarcate flow fields into dynamically distinct regions, and the correlation of finite-time statistics from the advection-diffusion equation with these regions. For autonomous systems, invariant manifold theory can be used to separate the system into dynamically distinct regions. Although there is no equivalent method for nonautonomous systems, a similar analysis can be done. Systems with general time dependencies must resort to finite-time transport barriers for partitioning; these barriers are the edges of Lagrangian coherent structures (LCS), the analog of the stable and unstable manifolds of invariant manifold theory. Using the coherent structures of a flow to analyze the statistics of trapping, flight, and residence times, signatures of anomalous diffusion are obtained. This research also investigates the use of linear models for approximating the elements of the covariance matrix of nonlinear flows, and then applying the covariance matrix approximation over coherent regions. The first- and second-order moments can be used to fully describe the evolution of an ensemble in linear systems; however, there is no direct method for nonlinear systems. The problem is compounded by the fact that the moments for nonlinear flows typically do not have analytic representations, so direct numerical simulations would be needed to obtain the moments throughout the domain. To avoid these many computations, the nonlinear system is approximated as many linear systems for which analytic expressions for the moments exist. The parameters introduced in the linear models are obtained locally from the nonlinear deformation tensor.
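The statement that first- and second-order moments describe a linear system can be made concrete for a linear flow with additive Gaussian noise, dx = A x dt + sqrt(2D) dW: the mean obeys dm/dt = A m and the covariance obeys the Lyapunov equation dP/dt = A P + P A^T + 2D I. The sketch below integrates these equations with forward Euler and solves for the steady state; the matrix A and diffusivity D are hypothetical, not obtained from a deformation tensor as in the thesis.

```python
import numpy as np


def propagate_moments(A, m0, P0, D, t_end, dt=1e-3):
    """Forward-Euler integration of the mean and covariance of dx = A x dt + sqrt(2D) dW."""
    A = np.asarray(A, dtype=float)
    m = np.asarray(m0, dtype=float)
    P = np.asarray(P0, dtype=float)
    Q = 2.0 * D * np.eye(A.shape[0])          # isotropic diffusion contribution
    for _ in range(int(round(t_end / dt))):
        m = m + dt * (A @ m)                   # mean follows the linear flow
        P = P + dt * (A @ P + P @ A.T + Q)     # Lyapunov differential equation
    return m, P


def stationary_covariance(A, D):
    """Solve A P + P A^T + 2D I = 0 (A must be stable) via a Kronecker-product system."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    I = np.eye(n)
    # Row-major vectorisation: vec(A P) = (A kron I) vec(P), vec(P A^T) = (I kron A) vec(P).
    K = np.kron(A, I) + np.kron(I, A)
    p = np.linalg.solve(K, -2.0 * D * I.ravel())
    return p.reshape(n, n)
```

For the scalar case A = -a, the steady-state variance reduces to D / a, the familiar Ornstein-Uhlenbeck result.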
A Statistical Approach to Passive Target Tracking.

DTIC Science & Technology

1981-04-01

...a fixed heading of 90 degrees. For ... 7. F. A. Graybill, An Introduction to Linear Statistical Models, Vol. 1, New York: John Wiley & Sons, Inc. (1961). ... likelihood estimators. ... NCSC TM 311-81 ... The adjustment for a changing error variance is easy using the linear model approach; i.e., use weighted ...
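The snippet's remedy for a changing error variance, weighted least squares, can be sketched as follows, assuming each observation's error standard deviation sigma is known; weighting by 1/sigma² is the standard choice, and the function name is illustrative rather than taken from the report.

```python
import numpy as np


def weighted_least_squares(X, y, sigma):
    """WLS fit with weights 1/sigma**2 for observations of unequal error variance.

    Returns the coefficient vector and its covariance (valid when sigma is known).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2
    sw = np.sqrt(w)
    # Scaling each row by sqrt(w) turns the weighted problem into ordinary least squares.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, cov
```

A noisy observation with a large sigma then has almost no influence on the fit, which is the point of the adjustment.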
Predictors of effects of lifestyle intervention on diabetes mellitus type 2 patients.

PubMed

Jacobsen, Ramune; Vadstrup, Eva; Røder, Michael; Frølich, Anne

2012-01-01

The main aim of the study was to identify predictors of the effects of lifestyle intervention on patients with type 2 diabetes mellitus by means of multivariate analysis. Data were used from a previously published randomised clinical trial that compared a rehabilitation programme, including standardised education and physical training sessions in the municipality's health care centre, with individual counseling of the same duration in the diabetes outpatient clinic. Data from 143 diabetes patients were analysed. The merged lifestyle intervention resulted in statistically significant improvements in patients' systolic blood pressure, waist circumference, exercise capacity, glycaemic control, and some aspects of general health-related quality of life. The linear multivariate regression models explained 45% to 80% of the variance in these improvements. The baseline outcomes, in accordance with the logic of the regression to the mean phenomenon, were the only statistically significant and robust predictors in all regression models. These results are important from a clinical point of view, as they highlight both the more urgent need for lifestyle intervention and the better outcomes following it for patients who have worse general and disease-specific health.
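Regression to the mean alone can make baseline values predict improvement. The simulation below, with hypothetical parameters and no intervention effect at all, measures baseline and follow-up as a stable true level plus independent noise; the slope of change on baseline is then close to -noise_sd² / (true_sd² + noise_sd²), which is -0.5 for the defaults. It illustrates the phenomenon only and is not the study's analysis.

```python
import numpy as np


def simulate_regression_to_mean(n=10_000, true_sd=1.0, noise_sd=1.0, seed=0):
    """Return the OLS slope of (follow-up - baseline) on baseline with no true change."""
    rng = np.random.default_rng(seed)
    true_level = rng.normal(0.0, true_sd, n)                 # stable underlying value
    baseline = true_level + rng.normal(0.0, noise_sd, n)     # measurement noise at baseline
    followup = true_level + rng.normal(0.0, noise_sd, n)     # independent noise at follow-up
    change = followup - baseline
    return np.polyfit(baseline, change, 1)[0]
```

Patients who start with the worst measured values tend to "improve" on remeasurement, which is why baseline outcomes appear as predictors even before any treatment effect is considered.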
Cognition of and Demand for Education and Teaching in Medical Statistics in China: A Systematic Review and Meta-Analysis

PubMed Central

Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong

2015-01-01

Background: Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. Objectives: This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. Methods: We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. Results: There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than that of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. Conclusion: The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent.
PMID:26053876
Cognition of and Demand for Education and Teaching in Medical Statistics in China: A Systematic Review and Meta-Analysis.

PubMed

Wu, Yazhou; Zhou, Liang; Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong

2015-01-01

Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than that of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent.
Non-Gaussian bias: insights from discrete density peaks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Desjacques, Vincent; Riotto, Antonio; Gong, Jinn-Ouk, E-mail: Vincent.Desjacques@unige.ch, E-mail: jinn-ouk.gong@apctp.org, E-mail: Antonio.Riotto@unige.ch

2013-09-01

Corrections induced by primordial non-Gaussianity to the linear halo bias can be computed from a peak-background split or the widespread local bias model. However, numerical simulations clearly support the prediction of the former, in which the non-Gaussian amplitude is proportional to the linear halo bias. To better understand the reasons behind the failure of standard Lagrangian local bias, in which the halo overdensity is a function of the local mass overdensity only, we explore the effect of a primordial bispectrum on the 2-point correlation of discrete density peaks. We show that the effective local bias expansion to peak clustering vastly simplifies the calculation. We generalize this approach to excursion set peaks and demonstrate that the resulting non-Gaussian amplitude, which is a weighted sum of quadratic bias factors, precisely agrees with the peak-background split expectation, which is a logarithmic derivative of the halo mass function with respect to the normalisation amplitude. We point out that statistics of thresholded regions can be computed using the same formalism. Our results suggest that halo clustering statistics can be modelled consistently (in the sense that the Gaussian and non-Gaussian bias factors agree with peak-background split expectations) from a Lagrangian bias relation only if the latter is specified as a set of constraints imposed on the linear density field. This is clearly not the case for standard Lagrangian local bias. Therefore, one is led to consider additional variables beyond the local mass overdensity.
Hybrid regulatory models: a statistically tractable approach to model regulatory network dynamics.

PubMed

Ocone, Andrea; Millar, Andrew J; Sanguinetti, Guido

2013-04-01

Computational modelling of the dynamics of gene regulatory networks is a central task of systems biology. For small- to medium-scale networks, the dominant paradigm is represented by systems of coupled non-linear ordinary differential equations (ODEs). ODEs afford great mechanistic detail and flexibility, but calibrating these models to data is often an extremely difficult statistical problem. Here, we develop a general statistical inference framework for stochastic transcription-translation networks. We use a coarse-grained approach, which represents the system as a network of stochastic (binary) promoter and (continuous) protein variables. We derive an exact inference algorithm and an efficient variational approximation that allows scalable inference and learning of the model parameters. We demonstrate the power of the approach on two biological case studies, showing that the method allows a high degree of flexibility and is capable of generating novel, testable biological predictions. http://homepages.inf.ed.ac.uk/gsanguin/software.html. Supplementary data are available at Bioinformatics online.
Motivation, values, and work design as drivers of participation in the R open source project for statistical computing

PubMed Central

Mair, Patrick; Hofmann, Eva; Gruber, Kathrin; Hatzinger, Reinhold; Zeileis, Achim; Hornik, Kurt

2015-01-01

One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This large number of packages makes an extremely broad range of statistical techniques and other quantitative methods freely available. Thus far, no empirical study has investigated psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), and various socio-demographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or influence only specific aspects of participation. PMID:26554005
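As an illustration of the generalized linear models mentioned above, the sketch below fits a Poisson regression with a log link by iteratively reweighted least squares (IRLS), a natural choice for a count outcome such as number of packages. It is a hypothetical example; the abstract does not state which distribution families or covariates the authors used.

```python
import numpy as np


def poisson_glm(X, y, n_iter=100, tol=1e-10):
    """Poisson regression with log link fitted by iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y + 0.5                      # starting means; keeps log(mu) finite when y == 0
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu       # working response
        w = mu                        # working weights for the canonical log link
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
        eta = X @ beta_new            # updated linear predictor
        mu = np.exp(eta)              # inverse of the log link
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Each IRLS step is a weighted least-squares fit, which is why GLMs are "generalized" linear models: the same linear algebra, applied to a transformed response with data-dependent weights.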
Motivation, values, and work design as drivers of participation in the R open source project for statistical computing.

PubMed

Mair, Patrick; Hofmann, Eva; Gruber, Kathrin; Hatzinger, Reinhold; Zeileis, Achim; Hornik, Kurt

2015-12-01

One of the cornerstones of the R system for statistical computing is the multitude of packages contributed by numerous package authors. This large number of packages makes an extremely broad range of statistical techniques and other quantitative methods freely available. Thus far, no empirical study has investigated psychological factors that drive authors to participate in the R project. This article presents a study of R package authors, collecting data on different types of participation (number of packages, participation in mailing lists, participation in conferences), three psychological scales (types of motivation, psychological values, and work design characteristics), and various socio-demographic factors. The data are analyzed using item response models and subsequent generalized linear models, showing that the most important determinants for participation are a hybrid form of motivation and the social characteristics of the work design. Other factors are found to have less impact or influence only specific aspects of participation.
Correlation and simple linear regression.

PubMed

Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G

2003-06-01

In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
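The two correlation coefficients and the simple regression fit discussed above can be sketched in a few lines of NumPy. Spearman's rho is computed here as the Pearson correlation of ranks, with tied values given their average rank; the helper names are illustrative.

```python
import numpy as np


def average_ranks(v):
    """Ranks starting at 1, with tied values sharing their average rank."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v))
    ranks[order] = np.arange(1, len(v) + 1)
    for value in np.unique(v):
        tie = v == value
        ranks[tie] = ranks[tie].mean()
    return ranks


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    return np.corrcoef(x, y)[0, 1]


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (monotone relationship)."""
    return pearson(average_ranks(x), average_ranks(y))


def simple_linear_regression(x, y):
    """Least-squares slope and intercept of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept
```

For y = x³ on x = 1, ..., 5 the relationship is monotone but not linear, so Spearman's rho is exactly 1 while Pearson's r is about 0.94.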
Near-road air pollutant concentrations of CO and PM2.5: A comparison of MOBILE6.2/CALINE4 and generalized additive models

NASA Astrophysics Data System (ADS)

Zhang, Kai; Batterman, Stuart

2010-05-01

The contribution of vehicular traffic to air pollutant concentrations is often difficult to establish. This paper uses both time-series and simulation models to estimate vehicle contributions to pollutant levels near roadways. The time-series model used generalized additive models (GAMs) and fitted pollutant observations to traffic counts and meteorological variables. A one-year period (2004) was analyzed on a seasonal basis using hourly measurements of carbon monoxide (CO) and particulate matter less than 2.5 μm in diameter (PM2.5) monitored near a major highway in Detroit, Michigan, along with hourly traffic counts and local meteorological data. Traffic counts showed statistically significant and approximately linear relationships with CO concentrations in fall, and piecewise linear relationships in spring, summer and winter. The same period was simulated using emission and dispersion models (Motor Vehicle Emissions Factor Model/MOBILE6.2; California Line Source Dispersion Model/CALINE4). CO emissions derived from the GAM were similar, on average, to those estimated by MOBILE6.2. The same analyses for PM2.5 showed that GAM emission estimates were much higher (by 4-5 times) than the dispersion model results, and that the traffic-PM2.5 relationship varied seasonally. This analysis suggests that the simulation model performed reasonably well for CO but significantly underestimated PM2.5 concentrations, a likely result of underestimating PM2.5 emission factors. Comparisons between statistical and simulation models can help identify model deficiencies and improve estimates of vehicle emissions and near-road air quality.
Statistical image quantification toward optimal scan fusion and change quantification

NASA Astrophysics Data System (ADS)

Potesil, Vaclav; Zhou, Xiang Sean

2007-03-01

Recent advances in imaging technology have brought new challenges and opportunities for automatic and quantitative analysis of medical images. With broader accessibility of more imaging modalities for more patients, fusion of modalities/scans from one time point and longitudinal analysis of changes across time points have become the two most critical differentiators to support more informed, more reliable and more reproducible diagnosis and therapy decisions. Unfortunately, scan fusion and longitudinal analysis are both inherently plagued with increased levels of statistical errors. A lack of comprehensive analysis by imaging scientists and a lack of full awareness by physicians pose potential risks in clinical practice. In this paper, we discuss several key error factors affecting imaging quantification, study their interactions, and introduce a simulation strategy to establish general error bounds for change quantification across time. We quantitatively show that image resolution, voxel anisotropy, lesion size, eccentricity, and orientation all contribute to quantification error, and that there is an intricate relationship between voxel anisotropy and lesion shape in affecting quantification error. Specifically, when two or more scans are to be fused at the feature level, optimal linear fusion analysis reveals that scans with voxel anisotropy aligned with lesion elongation should receive a higher weight than other scans. Such optimal linear fusion achieves a lower variance than naïve averaging. Simulated experiments are used to validate the theoretical predictions. Future work based on the proposed simulation methods may lead to general guidelines and error lower bounds for quantitative image analysis and change detection.
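The variance advantage of optimal linear fusion over naïve averaging can be seen in the textbook inverse-variance weighting rule, sketched below under the assumption of independent, unbiased scan estimates with known error variances. In the paper's setting, a scan whose voxel anisotropy is aligned with the lesion elongation would have a smaller error variance and therefore a larger weight; the function names are illustrative, and the paper's own error model is not reproduced.

```python
import numpy as np


def inverse_variance_fusion(estimates, variances):
    """Minimum-variance unbiased linear fusion of independent unbiased estimates."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    w = (1.0 / var) / np.sum(1.0 / var)       # weights sum to one and favour precise scans
    fused = np.sum(w * est)
    fused_var = 1.0 / np.sum(1.0 / var)
    return fused, fused_var, w


def naive_average_variance(variances):
    """Variance of the plain mean of independent estimates."""
    var = np.asarray(variances, dtype=float)
    return np.sum(var) / len(var) ** 2
```

With variances 1 and 4, for example, the weights are 0.8 and 0.2 and the fused variance is 0.8, compared with 1.25 for the plain average.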
Generalized Polynomial Chaos Based Uncertainty Quantification for Planning MRgLITT Procedures

PubMed Central

Fahrenholtz, S.; Stafford, R. J.; Maier, F.; Hazle, J. D.; Fuentes, D.

2014-01-01

Purpose: A generalized polynomial chaos (gPC) method is used to incorporate constitutive parameter uncertainties within the Pennes representation of bioheat transfer phenomena. The stochastic temperature predictions of the mathematical model are critically evaluated against MR thermometry data for planning MR-guided Laser Induced Thermal Therapies (MRgLITT). Methods: The Pennes bioheat transfer model, coupled with a diffusion theory approximation of laser-tissue interaction, was implemented as the underlying deterministic kernel. A probabilistic sensitivity study was used to identify the parameters that contribute the most variance to the temperature output. Confidence intervals of the temperature predictions are compared to MR temperature imaging (MRTI) obtained during phantom and in vivo canine (n=4) MRgLITT experiments. The gPC predictions were quantitatively compared to MRTI data using probabilistic linear and temporal profiles as well as 2-D 60 °C isotherms. Results: Within the range of physically meaningful constitutive values relevant to the ablative temperature regime of MRgLITT, the sensitivity study indicated that the optical parameters, particularly the anisotropy factor, created the most variance in the stochastic model's output temperature prediction. Further, within the statistical sense considered, a nonlinear model of the temperature- and damage-dependent perfusion, absorption, and scattering is captured within the confidence intervals of the linear gPC method. Multivariate stochastic model predictions using parameters with the dominant sensitivities show good agreement with experimental MRTI data. Conclusions: Given parameter uncertainties and mathematical modeling approximations of the Pennes bioheat model, the statistical framework demonstrates conservative estimates of the therapeutic heating and has potential for use as a computational prediction tool for thermal therapy planning. PMID:23692295
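A one-dimensional sketch of the gPC idea: a model output f(xi) that depends on a single standard-normal parameter xi is projected onto probabilists' Hermite polynomials He_k with Gauss-Hermite quadrature, and the mean and variance are read off the coefficients using E[He_k²] = k!. Here f is a hypothetical stand-in for the Pennes-model temperature, not the authors' solver; the code uses NumPy's `hermite_e` module.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as H


def pce_coefficients(f, order, n_quad=40):
    """Project f(xi), xi ~ N(0, 1), onto He_0 ... He_order."""
    nodes, weights = H.hermegauss(n_quad)          # quadrature for weight exp(-x**2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)        # normalise to the standard normal density
    fx = f(nodes)
    coeffs = []
    for k in range(order + 1):
        basis = H.hermeval(nodes, [0.0] * k + [1.0])    # He_k evaluated at the nodes
        coeffs.append(np.sum(weights * fx * basis) / math.factorial(k))  # c_k = E[f He_k] / k!
    return np.array(coeffs)


def pce_mean_variance(coeffs):
    """Mean and variance implied by the orthogonality of the Hermite basis."""
    var = sum(c ** 2 * math.factorial(k) for k, c in enumerate(coeffs) if k > 0)
    return coeffs[0], var
```

For f(xi) = xi², the coefficients are c0 = 1 and c2 = 1, giving mean 1 and variance 2; in a multi-parameter study the same projection runs over a tensor or sparse basis.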
Beyond δ: Tailoring marked statistics to reveal modified gravity

NASA Astrophysics Data System (ADS)

Valogiannis, Georgios; Bean, Rachel

2018-01-01

Models that attempt to explain the accelerated expansion of the universe through large-scale modifications to General Relativity (GR) must satisfy the stringent experimental constraints on GR in the solar system. Viable candidates invoke a “screening” mechanism that dynamically suppresses deviations in high-density environments, making their overall detection challenging even for ambitious future large-scale structure surveys. We present methods to efficiently simulate the non-linear properties of such theories, and consider how a series of statistics that reweight the density field to accentuate deviations from GR can be applied to enhance the overall signal-to-noise ratio in differentiating the models from GR. Our results demonstrate that the cosmic density field can yield additional, invaluable cosmological information, beyond the simple density power spectrum, that will enable surveys to discriminate more confidently between modified gravity models and ΛCDM.

Performance Equivalence and Validation of the Soleris Automated System for Quantitative Microbial Content Testing Using Pure Suspension Cultures.

PubMed

Limberg, Brian J; Johnstone, Kevin; Filloon, Thomas; Catrenich, Carl

2016-09-01

Using United States Pharmacopeia-National Formulary (USP-NF) general method <1223> guidance, the Soleris(®) automated system and reagents (Nonfermenting Total Viable Count for bacteria and Direct Yeast and Mold for yeast and mold) were validated, using a performance equivalence approach, as an alternative to plate counting for total microbial content analysis using five representative microbes: Staphylococcus aureus, Bacillus subtilis, Pseudomonas aeruginosa, Candida albicans, and Aspergillus brasiliensis. Detection times (DTs) in the alternative automated system were linearly correlated to CFU/sample (R² = 0.94-0.97) with ≥70% accuracy per USP General Chapter <1223> guidance. The LOD and LOQ of the automated system were statistically similar to the traditional plate count method. This system was significantly more precise than plate counting (RSD 1.2-2.9% for DT, 7.8-40.6% for plate counts), was statistically comparable to plate counting with respect to variations in analyst, vial lots, and instruments, and was robust when variations in the operating detection thresholds (dTs; ±2 units) were used. The automated system produced accurate results, was more precise and less labor-intensive, and met or exceeded criteria for a valid alternative quantitative method, consistent with USP-NF general method <1223> guidance.
[Analysis of the technical efficiency of hospitals in the Spanish National Health Service].

PubMed

Pérez-Romero, Carmen; Ortega-Díaz, M Isabel; Ocaña-Riola, Ricardo; Martín-Martín, José Jesús

To analyse the technical efficiency and productivity of general hospitals in the Spanish National Health Service (NHS) (2010-2012) and identify explanatory hospital and regional variables. 230 NHS hospitals were analysed by data envelopment analysis for overall, technical and scale efficiency, and by the Malmquist index. The robustness of the analysis was checked against alternative input-output models. A fixed-effects multilevel cross-sectional linear model was used to analyse the explanatory efficiency variables. The average overall technical efficiency (OTE) was 0.736 in 2012, with considerable variability by region. The Malmquist index (2010-2012) was 1.013. Twenty-three percent of the variability in OTE is attributable to the region. Statistically significant exogenous variables (residents per 100 physicians, aging index, average annual income per household, essential public service expenditure and public health expenditure per capita) explain 42% of the OTE variability between hospitals and 64% between regions. The number of residents showed a statistically significant relationship. At the regional level, there is a statistically significant direct linear association between OTE and annual income per capita and essential public service expenditure, and an indirect association with the aging index and annual public health expenditure per capita. The considerable room for improvement in hospital efficiency is conditioned by region-specific characteristics, specifically aging, wealth and each region's public expenditure policies. Copyright © 2016 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.
Statistical Methodology for the Analysis of Repeated Duration Data in Behavioral Studies

ERIC Educational Resources Information Center

Letué, Frédérique; Martinez, Marie-José; Samson, Adeline; Vilain, Anne; Vilain, Coriandre

2018-01-01

Purpose: Repeated duration data are frequently used in behavioral studies. Classical linear or log-linear mixed models are often inadequate to analyze such data, because they usually consist of nonnegative and skew-distributed variables. Therefore, we recommend use of a statistical methodology specific to duration data. Method: We propose a…
On the Stability of Jump-Linear Systems Driven by Finite-State Machines with Markovian Inputs

NASA Technical Reports Server (NTRS)

Patilkulkarni, Sudarshan; Herencia-Zapana, Heber; Gray, W. Steven; Gonzalez, Oscar R.

2004-01-01

This paper presents two mean-square stability tests for a jump-linear system driven by a finite-state machine with a first-order Markovian input process. The first test is based on conventional Markov jump-linear theory and avoids the use of any higher-order statistics. The second test is developed directly using the higher-order statistics of the machine's output process. The two approaches are illustrated with a simple model for a recoverable computer control system.
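To make the "conventional Markov jump-linear" test concrete, here is a minimal sketch for the simplest case: scalar modes x[k+1] = a[theta_k] * x[k] with theta_k a Markov chain with transition matrix p. It uses the standard second-moment recursion, under which the system is mean-square stable iff the spectral radius of the recursion's matrix is below 1. This is an illustration under those assumptions, not the paper's finite-state-machine construction, and `ms_stability` is a hypothetical name.

```python
def ms_stability(a, p, iterations=500):
    """Mean-square stability test for x[k+1] = a[theta_k] * x[k], theta a Markov chain.

    Second moments q_j(k) = E[x_k^2 ; theta_k = j] evolve as
    q_j(k+1) = sum_i p[i][j] * a[i]**2 * q_i(k).
    The system is mean-square stable iff that linear map has spectral radius < 1.
    Returns (spectral_radius_estimate, is_stable).
    """
    n = len(a)
    # m[j][i] = p_ij * a_i^2 (a transposed, mode-weighted transition matrix).
    m = [[p[i][j] * a[i] ** 2 for i in range(n)] for j in range(n)]
    v = [1.0] * n
    rho = 0.0
    for _ in range(iterations):
        # Power iteration: for a primitive nonnegative m, max(w) -> Perron root.
        w = [sum(m[j][i] * v[i] for i in range(n)) for j in range(n)]
        rho = max(w)
        if rho == 0.0:
            return 0.0, True  # second moments vanish in finitely many steps
        v = [wj / rho for wj in w]
    return rho, rho < 1.0
```

For example, `ms_stability([0.2, 1.5], [[0.5, 0.5], [0.9, 0.1]])` reports a spectral radius of approximately 0.348: the second mode is unstable on its own, but the chain leaves it quickly enough for the switched system to be mean-square stable. Power iteration assumes a primitive matrix; a periodic chain would need an eigenvalue solver instead.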
Coherence solution for bidirectional reflectance distributions of surfaces with wavelength-scale statistics.

PubMed

Hoover, Brian G; Gamiz, Victor L

2006-02-01

The scalar bidirectional reflectance distribution function (BRDF) due to a perfectly conducting surface with roughness and autocorrelation width comparable with the illumination wavelength is derived from coherence theory on the assumption of a random reflective phase screen and an expansion valid for large effective roughness. A general quadratic expansion of the two-dimensional isotropic surface autocorrelation function near the origin yields representative Cauchy and Gaussian BRDF solutions and an intermediate general solution as the sum of an incoherent component and a nonspecular coherent component proportional to an integral of the plasma dispersion function in the complex plane. Plots illustrate agreement of the derived general solution with original bistatic BRDF data due to a machined aluminum surface, and comparisons are drawn with previously published data in the examination of variations with incident angle, roughness, illumination wavelength, and autocorrelation coefficients in the bistatic and monostatic geometries. The general quadratic autocorrelation expansion provides a BRDF solution that smoothly interpolates between the well-known results of the linear and parabolic approximations.
Helical tomotherapy to LINAC plan conversion utilizing RayStation Fallback planning.

PubMed

Zhang, Xin; Penagaricano, Jose; Narayanasamy, Ganesh; Corry, Peter; Liu, TianXiao; Sanjay, Maraboyina; Paudel, Nava; Morrill, Steven

2017-01-01

The RaySearch RayStation Fallback (FB) planning module can generate an equivalent backup radiotherapy treatment plan, facilitating treatment on other linear accelerators. FB plans were generated from the RayStation FB module by simulating the original plan's target and organ-at-risk (OAR) dose distributions and delivered on various backup linear accelerators. In this study, helical tomotherapy (HT) backup plans for a Varian TrueBeam linear accelerator were generated with the RayStation FB module. Thirty patients treated with HT (10 with lung cancer, 10 with head and neck (HN) cancer, and 10 with prostate cancer) were included in this study. Intensity-modulated radiotherapy Fallback plans (FB-IMRT) were generated for all patients, and three-dimensional conformal radiotherapy Fallback plans (FB-3D) were generated only for lung cancer patients. The dosimetric comparison evaluated FB plans based on dose coverage to 95% of the PTV volume (R95), PTV mean dose (Dmean), Paddick's conformity index (CI), and dose homogeneity index (HI). The evaluation showed that all IMRT plans were statistically comparable between HT and FB-IMRT plans, except that PTV HI was worse in prostate plans, and PTV R95 and HI were worse in HN multitarget plans for FB-IMRT. For 3D lung cancer plans, only the PTV R95 was statistically comparable between HT and FB-3D plans; PTV Dmean was higher, and CI and HI were worse compared with HT plans. The FB plans using a TrueBeam linear accelerator generally offered better OAR sparing than HT plans for all patients. In this study, all FB-IMRT plans and 9/10 FB-3D plans were clinically acceptable without further modification and optimization once the FB plans were generated. However, the statistical differences between HT and FB-IMRT/3D plans might not be clinically significant. One FB-3D plan failed to simulate the original plan without further optimization. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.
An open-access CMIP5 pattern library for temperature and precipitation: description and methodology

NASA Astrophysics Data System (ADS)

Lynch, Cary; Hartin, Corinne; Bond-Lamberty, Ben; Kravitz, Ben

2017-05-01

Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess the performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90° N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than those from patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within 0.5 °C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns. The dataset and netCDF data generation code are available at doi:10.5281/zenodo.495632.
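The two pattern generation methods compared above can be sketched for a single grid cell. The delta method divides the local change between two epochs by the global-mean temperature change; the least squares method takes the slope of the local series regressed on global-mean temperature. This is a minimal illustration with hypothetical function names and epoch ranges, not the released library code (which is at the DOI above).

```python
from statistics import fmean


def delta_pattern(local, global_mean, early, late):
    """Delta method: local change between two epochs divided by the global-mean change.

    `early` and `late` are (start, stop) index ranges into the annual series.
    """
    d_local = fmean(local[late[0]:late[1]]) - fmean(local[early[0]:early[1]])
    d_global = fmean(global_mean[late[0]:late[1]]) - fmean(global_mean[early[0]:early[1]])
    return d_local / d_global


def regression_pattern(local, global_mean):
    """Least squares method: slope of local values regressed on global-mean temperature."""
    gx, ly = fmean(global_mean), fmean(local)
    sxy = sum((g - gx) * (y - ly) for g, y in zip(global_mean, local))
    sxx = sum((g - gx) ** 2 for g in global_mean)
    return sxy / sxx
```

For a perfectly linear local response both methods return the same scaling factor; they diverge when interannual noise is large, because the delta method uses only the two epoch means while the regression uses every year.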
Meteorological influences on the interannual variability of meningitis incidence in northwest Nigeria.

NASA Astrophysics Data System (ADS)

Abdussalam, Auwal; Monaghan, Andrew; Dukic, Vanja; Hayden, Mary; Hopson, Thomas; Leckebusch, Gregor

2013-04-01

Northwest Nigeria is a region with high risk of bacterial meningitis. Since the first documented epidemic of meningitis in Nigeria in 1905, the disease has been endemic in the northern part of the country, with epidemics occurring regularly. In this study we examine the influence of climate on the interannual variability of meningitis incidence and epidemics. Monthly aggregate counts of clinically confirmed hospital-reported cases of meningitis were collected in northwest Nigeria for the 22-year period spanning 1990-2011. Several generalized linear statistical models were fit to the monthly meningitis counts, including generalized additive models. Explanatory variables included monthly records of temperature, humidity, rainfall, wind speed, sunshine and dustiness from the weather stations nearest to the hospitals, and a time series of polysaccharide vaccination efficacy. The effects of other confounding factors (i.e., mainly non-climatic factors for which records were not available) were estimated as a smooth, monthly-varying function of time in the generalized additive models. Results reveal that the most important explanatory climatic variables are mean maximum monthly temperature, relative humidity and dustiness. Accounting for confounding factors (e.g., social processes) in the generalized additive models explains more of the year-to-year variation of meningococcal disease than generalized linear models that do not account for such factors. Promising results from several models that included only explanatory variables preceding the meningitis case data by 1 month suggest there may be potential for predicting meningitis in northwest Nigeria to aid decision makers on this time scale.
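A generalized linear model for monthly counts is typically a Poisson regression with a log link. The sketch below fits log E[count] = b0 + b1 * x by iteratively reweighted least squares for a single covariate (for instance, a hypothetical series of monthly mean maximum temperatures). It is a stand-in for illustration only: the study used several climate covariates and, in its GAMs, a smooth function of time, none of which is reproduced here.

```python
import math


def poisson_glm(x, y, iterations=25):
    """Fit log E[y] = b0 + b1 * x by iteratively reweighted least squares (IRLS)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        # Accumulate the weighted normal equations; for the Poisson log link
        # the IRLS weight is mu and the working response is eta + (y - mu) / mu.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            z = eta + (yi - mu) / mu
            s_w += mu
            s_wx += mu * xi
            s_wxx += mu * xi * xi
            s_wz += mu * z
            s_wxz += mu * xi * z
        # Solve the 2x2 weighted least squares system by Cramer's rule.
        det = s_w * s_wxx - s_wx * s_wx
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1
```

With this parameterization, exp(b1) is the multiplicative change in the expected monthly count per unit increase of the covariate.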
Fusion yield: Guderley model and Tsallis statistics

NASA Astrophysics Data System (ADS)

Haubold, H. J.; Kumar, D.

2011-02-01

The reaction rate probability integral is extended from the Maxwell-Boltzmann approach to a more general approach by using the pathway model introduced by Mathai in 2005 (A pathway to matrix-variate gamma and normal densities. Linear Algebr. Appl. 396, 317-328). The extended thermonuclear reaction rate is obtained in closed form via a Meijer's G-function, and the G-function so obtained is represented as a solution of a homogeneous linear differential equation. A physical model for the hydrodynamical process in a fusion plasma-compressed and laser-driven spherical shock wave is used for evaluating the fusion energy integral by integrating the extended thermonuclear reaction rate integral over the temperature. The result obtained is compared with the standard fusion yield obtained by Haubold and John in 1981 (Analytical representation of the thermonuclear reaction rate and fusion energy production in a spherical plasma shock wave. Plasma Phys. 23, 399-411). An interpretation for the pathway parameter is also given.
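The pathway extension can be pictured through the deformed exponential that takes the place of the Boltzmann factor exp(-x). The sketch below assumes the Tsallis-type form [1 - (1 - alpha) x]^(1/(1 - alpha)), which recovers exp(-x) as alpha tends to 1; it does not reproduce the paper's Meijer G-function closed form, and `pathway_exp` is a hypothetical helper name.

```python
import math


def pathway_exp(x, alpha):
    """Pathway (Tsallis-type) deformation of exp(-x); reduces to exp(-x) as alpha -> 1.

    alpha < 1 gives a finite cutoff at x = 1 / (1 - alpha);
    alpha > 1 gives a heavier, power-law tail.
    """
    if alpha == 1.0:
        return math.exp(-x)
    base = 1.0 - (1.0 - alpha) * x
    if base <= 0.0:
        return 0.0  # beyond the cutoff (only reachable when alpha < 1)
    return base ** (1.0 / (1.0 - alpha))
```

Replacing exp(-E/kT) by `pathway_exp(E / kT, alpha)` inside a reaction-rate integral is the kind of substitution the pathway parameter controls; the physical interpretation of alpha is the subject of the paper itself.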
Genetic mixed linear models for twin survival data.

PubMed

Ha, Il Do; Lee, Youngjo; Pawitan, Yudi

2007-07-01

Twin studies are useful for assessing the relative importance of the genetic (heritable) component and the environmental component. In this paper we develop a methodology to study the heritability of age-at-onset or lifespan traits, with application to the analysis of twin survival data. Due to the limited period of observation, the data can be left truncated and right censored (LTRC). Under the LTRC setting we propose a genetic mixed linear model, which allows general fixed predictors and random components to capture genetic and environmental effects. Inferences are based upon the hierarchical likelihood (h-likelihood), which provides a statistically efficient and unified framework for various mixed-effect models. We also propose a simple and fast computation method for dealing with large data sets. The method is illustrated by the survival data from the Swedish Twin Registry. Finally, a simulation study is carried out to evaluate its performance.
Analysis of Cross-Sectional Univariate Measurements for Family Dyads Using Linear Mixed Modeling

PubMed Central

Knafl, George J.; Dixon, Jane K.; O'Malley, Jean P.; Grey, Margaret; Deatrick, Janet A.; Gallo, Agatha M.; Knafl, Kathleen A.

2010-01-01

Outcome measurements from members of the same family are likely correlated. Such intrafamilial correlation (IFC) is an important dimension of the family as a unit but is not always accounted for in analyses of family data. This article demonstrates the use of linear mixed modeling to account for IFC in the important special case of univariate measurements for family dyads collected at a single point in time. Example analyses of data from partnered parents having a child with a chronic condition on their child's adaptation to the condition and on the family's general functioning and management of the condition are provided. Analyses of this kind are reasonably straightforward to generate with popular statistical tools. Thus, it is recommended that IFC be reported as standard practice reflecting the fact that a family dyad is more than just the aggregate of two individuals. Moreover, not accounting for IFC can affect the conclusions. PMID:19307316
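For the balanced dyad case without covariates, the intrafamilial correlation can be illustrated with the classical one-way random-effects ANOVA estimator, ICC = (MSB - MSW) / (MSB + MSW) for groups of size two. This is a method-of-moments stand-in, assuming exchangeable family members; the article itself uses linear mixed modeling, which is needed once covariates, distinguishable roles (e.g., mother vs. father), or missing members enter. `dyad_icc` is a hypothetical name.

```python
from statistics import fmean


def dyad_icc(pairs):
    """One-way random-effects ANOVA estimate of intrafamilial correlation for dyads.

    pairs: list of (member_a, member_b) scores, one tuple per family.
    ICC = (MSB - MSW) / (MSB + MSW) when every family contributes exactly two members.
    """
    n = len(pairs)
    grand = fmean(v for pair in pairs for v in pair)
    family_means = [(a + b) / 2 for a, b in pairs]
    # Between-family sum of squares: each family mean counts for 2 observations.
    ssb = 2 * sum((m - grand) ** 2 for m in family_means)
    # Within-family sum of squares around each family's own mean.
    ssw = sum((a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, family_means))
    msb = ssb / (n - 1)
    msw = ssw / n  # n families * (2 - 1) degrees of freedom
    return (msb - msw) / (msb + msw)
```

Values near 1 mean partners' scores move together; values near 0 (or negative) mean a family score carries little more information than two unrelated individuals.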
Supervised Learning for Dynamical System Learning.

PubMed

Hefny, Ahmed; Downey, Carlton; Gordon, Geoffrey J

2015-01-01

Recently there has been substantial interest in spectral methods for learning dynamical systems. These methods are popular since they often offer a good tradeoff between computational and statistical efficiency. Unfortunately, they can be difficult to use and extend in practice: e.g., they can make it difficult to incorporate prior information such as sparsity or structure. To address this problem, we present a new view of dynamical system learning: we show how to learn dynamical systems by solving a sequence of ordinary supervised learning problems, thereby allowing users to incorporate prior knowledge via standard techniques such as L1 regularization. Many existing spectral methods are special cases of this new framework, using linear regression as the supervised learner. We demonstrate the effectiveness of our framework by showing examples where nonlinear regression or lasso let us learn better state representations than plain linear regression does; the correctness of these instances follows directly from our general analysis.
The three-dimensional structure of cumulus clouds over the ocean. 1: Structural analysis

NASA Technical Reports Server (NTRS)

Kuo, Kwo-Sen; Welch, Ronald M.; Weger, Ronald C.; Engelstad, Mark A.; Sengupta, S. K.

1993-01-01

Thermal channel (channel 6, 10.4-12.5 micrometers) images of five Landsat thematic mapper cumulus scenes over the ocean are examined. These images are thresholded using the standard International Satellite Cloud Climatology Project (ISCCP) thermal threshold algorithm. The individual clouds in the cloud fields are segmented to obtain their structural statistics which include size distribution, orientation angle, horizontal aspect ratio, and perimeter-to-area (PtA) relationship. The cloud size distributions exhibit a double power law with the smaller clouds having a smaller absolute exponent. The cloud orientation angles, horizontal aspect ratios, and PtA exponents are found in good agreement with earlier studies. A technique also is developed to recognize individual cells within a cloud so that statistics of cloud cellular structure can be obtained. Cell structural statistics are computed for each cloud. Unicellular clouds are generally smaller (less than or equal to 1 km) and have smaller PtA exponents, while multicellular clouds are larger (greater than or equal to 1 km) and have larger PtA exponents. Cell structural statistics are similar to those of the smaller clouds. When each cell is approximated as a quadric surface using a linear least squares fit, most cells have the shape of a hyperboloid of one sheet, but about 15% of the cells are best modeled by a hyperboloid of two sheets. Less than 1% of the clouds are ellipsoidal. The number of cells in a cloud increases slightly faster than linearly with increasing cloud size. The mean nearest neighbor distance between cells in a cloud, however, appears to increase linearly with increasing cloud size and to reach a maximum when the cloud effective diameter is about 10 km; then it decreases with increasing cloud size. Sensitivity studies of threshold and lapse rate show that neither has a significant impact upon the results. 
A goodness-of-fit ratio is used to provide a quantitative measure of the individual cloud results. Significantly improved results are obtained after applying a smoothing operator, suggesting that eliminating subresolution-scale variations with higher spatial resolution may yield even better shape analyses.
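The perimeter-to-area (PtA) exponent mentioned above is usually obtained as a log-log least squares slope. The sketch below assumes the power law P = c * A^b and fits b; some studies instead report D = 2b as a boundary fractal dimension, and the exact convention used in this paper is not stated here. `pta_exponent` is a hypothetical name.

```python
import math


def pta_exponent(areas, perimeters):
    """Least squares slope b in log P = log c + b * log A (perimeter-to-area fit)."""
    lx = [math.log(a) for a in areas]
    ly = [math.log(p) for p in perimeters]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    sxx = sum((x - mx) ** 2 for x in lx)
    return sxy / sxx
```

Smooth Euclidean shapes such as squares or circles give b = 0.5; larger exponents indicate boundaries that become more convoluted as clouds grow.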
Equilibrium statistical-thermal models in high-energy physics

NASA Astrophysics Data System (ADS)

Tawfik, Abdel Nasser

2014-05-01

We review some recent highlights from the applications of statistical-thermal models to different experimental measurements and to lattice QCD thermodynamics made during the last decade. We start with a short review of the historical milestones on the path to constructing statistical-thermal models for heavy-ion physics. We found that Heinz Koppe formulated an almost complete recipe for the statistical-thermal models in 1948. In 1950, Enrico Fermi generalized this statistical approach: he started with a general cross-section formula and inserted into it simplifying assumptions about the matrix element of the interaction process, which likely reflect many features of high-energy reactions dominated by the density in the phase space of final states. In 1964, Hagedorn systematically analyzed high-energy phenomena using all the tools of statistical physics and introduced the concept of a limiting temperature based on the statistical bootstrap model. It turns out that many-particle systems can quite often be studied with the help of statistical-thermal methods. The analysis of yield multiplicities in high-energy collisions gives overwhelming evidence for chemical equilibrium in the final state. Strange particles might be an exception, as they are suppressed at lower beam energies; however, their relative yields also fulfill statistical equilibrium. We review the equilibrium statistical-thermal models for particle production, fluctuations and collective flow in heavy-ion experiments, as well as their reproduction of lattice QCD thermodynamics at vanishing and finite chemical potential. During the last decade, five conditions have been suggested to describe the universal behavior of the chemical freeze-out parameters. The higher-order moments of multiplicity are also discussed; they offer deep insights into particle production and critical fluctuations. Therefore, we use them to describe the freeze-out parameters and suggest the location of the QCD critical endpoint. Various extensions have been proposed to take into consideration possible deviations from the ideal hadron gas. We highlight various types of interactions, dissipative properties and location dependences (spatial rapidity). Furthermore, we review three models combining hadronic with partonic phases: the quasi-particle model, the linear sigma model with Polyakov potentials, and the compressible bag model.
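The basic ingredient of an equilibrium statistical-thermal (ideal hadron gas) model is the number density of each species, n = g / (2 pi^2) * integral of p^2 dp / (exp((E - mu) / T) + sign), with E = sqrt(p^2 + m^2) in natural units. The sketch below evaluates that integral numerically and checks the Boltzmann limit against its closed form g / (2 pi^2) * m^2 * T * K2(m / T). It is a textbook building block under stated assumptions (no resonance decays, no excluded-volume or other interactions), not the specific models reviewed here; all names are hypothetical.

```python
import math

HBARC = 0.1973269804  # GeV * fm; divide a GeV^3 density by HBARC**3 to get fm^-3


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def number_density(mass, temperature, degeneracy, mu=0.0, sign=0):
    """Ideal-gas number density in natural units (GeV^3).

    sign: +1 Fermi-Dirac, -1 Bose-Einstein (needs mass > mu), 0 Boltzmann.
    """
    def integrand(p):
        energy = math.sqrt(p * p + mass * mass)
        return p * p / (math.exp((energy - mu) / temperature) + sign)

    p_max = 50 * temperature  # integrand is negligible beyond ~50 T
    return degeneracy / (2 * math.pi ** 2) * simpson(integrand, 0.0, p_max)


def bessel_k(nu, x):
    """Modified Bessel function K_nu(x) from its integral representation (x > 0)."""
    return simpson(lambda t: math.exp(-x * math.cosh(t)) * math.cosh(nu * t), 0.0, 10.0)
```

For instance, `number_density(0.138, 0.155, 3) / HBARC**3` gives a Boltzmann estimate of the pion density in fm^-3 at T = 155 MeV; the Bose-Einstein value (`sign=-1`) is somewhat larger, and a full hadron resonance gas sums such terms over all species.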
A power comparison of generalized additive models and the spatial scan statistic in a case-control setting.

PubMed

Young, Robin L; Weinberg, Janice; Vieira, Verónica; Ozonoff, Al; Webster, Thomas F

2010-07-19

A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power, though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing log-odds with distance from the source. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases.
The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had comparable or greater power estimates and sensitivities exceeding those of the spatial scan statistic.
A power comparison of generalized additive models and the spatial scan statistic in a case-control setting

PubMed Central

2010-01-01

Background: A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. Results: This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power, though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing log-odds with distance from the source. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases.
Conclusions: The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had comparable or greater power estimates and sensitivities exceeding those of the spatial scan statistic. PMID:20642827
An Application of Interactive Computer Graphics to the Study of Inferential Statistics and the General Linear Model

DTIC Science & Technology

1991-09-01

...matrix, the Regression Sum of Squares (SSR) and Error Sum of Squares (SSE) are also displayed as a percentage of the Total Sum of Squares (SSTO) ... vector when the student compares the SSR to the SSE. In addition to the plot, the actual values of SSR, SSE, and SSTO are also provided. Figure 3 gives the ...
[Figure: projection of Y onto the estimation space and onto the error space, with SSR, SSE, and SSTO labeled.]
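The quantities this program displays can be computed directly. For a least squares fit with an intercept, the fitted vector is the projection of Y onto the estimation space and the residual is its projection onto the error space, so the two pieces are orthogonal and SSTO = SSR + SSE. A minimal sketch for simple linear regression, with a hypothetical function name:

```python
def anova_decomposition(x, y):
    """Fit y = b0 + b1*x by least squares and return (SSR, SSE, SSTO)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((f - my) ** 2 for f in fitted)              # regression (explained) part
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # error (residual) part
    ssto = sum((yi - my) ** 2 for yi in y)                # total, about the mean
    return ssr, sse, ssto
```

With x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], this gives SSR = 3.6, SSE = 2.4 and SSTO = 6.0, so the regression accounts for 60% of the total sum of squares.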
[Age index and an interpretation of survivorship curves (author's transl)].

PubMed

Lohmann, W

1977-01-01

Clinical investigations showed that the age dependences of physiological functions do not show a linear increase with age, as generally assumed, but an exponential one. With this result, the survivorship curve of a population (Gompertz plot) is easy to interpret. The only requirement is that the probability of death (death rate) be proportional to a function of ageing given by mu(t) = mu0 exp(alpha t). When survivorship curves derived from annual death statistics are fitted with suitable parameters, the resulting alpha values agree with the clinical data.
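The Gompertz relation above implies that log mu(t) is linear in age with slope alpha, and that survivorship follows S(t) = exp(-(mu0/alpha)(exp(alpha t) - 1)). A minimal sketch, assuming age-specific death rates are available and treating them as the hazard; the fitting procedure and the parameter values in the usage note are illustrative assumptions, not those of the original study.

```python
import math


def gompertz_survival(t, mu0, alpha):
    """Survivorship S(t) = exp(-(mu0/alpha) * (exp(alpha*t) - 1)) for hazard mu0*exp(alpha*t)."""
    return math.exp(-(mu0 / alpha) * (math.exp(alpha * t) - 1.0))


def fit_gompertz(ages, death_rates):
    """Estimate (mu0, alpha) from log mu(t) = log mu0 + alpha*t by least squares."""
    logs = [math.log(r) for r in death_rates]
    n = len(ages)
    mt, ml = sum(ages) / n, sum(logs) / n
    alpha = sum((t - mt) * (l - ml) for t, l in zip(ages, logs)) / sum((t - mt) ** 2 for t in ages)
    mu0 = math.exp(ml - alpha * mt)
    return mu0, alpha
```

For example, death rates generated as 5e-5 * exp(0.085 t) for ages 30 to 90 are fitted back to mu0 = 5e-5 and alpha = 0.085; with real life-table data the log-rates scatter around the line and the slope is the quantity to compare with clinical ageing rates.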
Powerless fluxes and forces, and change of scale in irreversible thermodynamics

NASA Astrophysics Data System (ADS)

Ostoja-Starzewski, M.; Zubelewicz, A.

2011-08-01

We show that the dissipation function of linear processes in continuum thermomechanics may be treated as the average of the statistically fluctuating dissipation rate on either coarse or small spatial scales. The first case involves thermodynamic orthogonality due to Ziegler, while the second one involves powerless forces in a general solution of the Clausius-Duhem inequality according to Poincaré and Edelen. This formulation is demonstrated using the example of parabolic versus hyperbolic heat conduction. The existence of macroscopic powerless heat fluxes is traced here to the hidden dissipative processes at lower temporal and spatial scales.
Zonal flow as pattern formation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parker, Jeffrey B.; Krommes, John A.

2013-10-15

Zonal flows are well known to arise spontaneously out of turbulence. We show that for statistically averaged equations of the stochastically forced generalized Hasegawa-Mima model, steady-state zonal flows and inhomogeneous turbulence fit into the framework of pattern formation. There are many implications. First, the wavelength of the zonal flows is not unique. Indeed, in an idealized, infinite system, any wavelength within a certain continuous band corresponds to a solution. Second, of these wavelengths, only those within a smaller subband are linearly stable. Unstable wavelengths must evolve to reach a stable wavelength; this process manifests as merging jets.

An optimization model to agroindustrial sector in Antioquia (Colombia, South America)

NASA Astrophysics Data System (ADS)

Fernandez, J.

2015-06-01

This paper proposes a general optimization model for the flower industry, defined using discrete simulation and nonlinear optimization, whose mathematical models were solved with the ProModel simulation tools and GAMS optimization. It defines the operations that make up the sector's production and marketing, uses statistically validated data collected directly from each operation through field work, and formulates the discrete simulation model of the operations and the linear optimization model of the entire industry chain. The model is solved with the tools described above, and the results are validated in a case study.
A Method of Relating General Circulation Model Simulated Climate to the Observed Local Climate. Part I: Seasonal Statistics.

NASA Astrophysics Data System (ADS)

Karl, Thomas R.; Wang, Wei-Chyung; Schlesinger, Michael E.; Knight, Richard W.; Portman, David

1990-10-01

Important surface observations such as the daily maximum and minimum temperature, daily precipitation, and cloud ceilings often have localized characteristics that are difficult to reproduce with the current resolution and the physical parameterizations in state-of-the-art General Circulation climate Models (GCMs). Many of the difficulties can be partially attributed to mismatches in scale, local topography, regional geography and boundary conditions between models and surface-based observations. Here, we present a method, called climatological projection by model statistics (CPMS), to relate GCM grid-point free-atmosphere statistics, the predictors, to these important local surface observations. The method can be viewed as a generalization of the model output statistics (MOS) and perfect prog (PP) procedures used in numerical weather prediction (NWP) models. It consists of the application of three statistical methods: 1) principal component analysis (PCA), 2) canonical correlation, and 3) inflated regression analysis. The PCA reduces the redundancy of the predictors. The canonical correlation is used to develop simultaneous relationships between linear combinations of the predictors, the canonical variables, and the surface-based observations. Finally, inflated regression is used to relate the important canonical variables to each of the surface-based observed variables. We demonstrate that even an early version of the Oregon State University two-level atmospheric GCM (with prescribed sea surface temperature) produces free-atmosphere statistics that can, when standardized using the model's internal means and variances (the MOS-like version of CPMS), closely approximate the observed local climate. When the model data are standardized by the observed free-atmosphere means and variances (the PP version of CPMS), however, the model does not reproduce the observed surface climate as well.
Our results indicate that in the MOS-like version of CPMS the differences between the output of a ten-year GCM control run and the surface-based observations are often smaller than the differences between the observations of two ten-year periods. Such positive results suggest that GCMs may already contain important climatological information that can be used to infer the local climate.
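The PCA and canonical-correlation steps of CPMS can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: predictors are reduced to their leading principal-component scores, both variable sets are whitened with Cholesky factors, and the canonical correlations are read off an SVD of the whitened cross-covariance. The inflated-regression step is omitted, and `pca_scores`, `canonical_correlation` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_scores(X, k):
    """Project centred predictors onto their first k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def canonical_correlation(X, Y):
    """Canonical correlations rho and weights A, B; the canonical variables
    (X - mean) @ A and (Y - mean) @ B have unit sample variance."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each set with the inverse of its Cholesky factor (Sxx = Lx Lx^T)
    Lx_inv = np.linalg.inv(np.linalg.cholesky(Sxx))
    Ly_inv = np.linalg.inv(np.linalg.cholesky(Syy))
    # Singular values of the whitened cross-covariance are the canonical correlations
    U, rho, Vt = np.linalg.svd(Lx_inv @ Sxy @ Ly_inv.T, full_matrices=False)
    return rho, Lx_inv.T @ U, Ly_inv.T @ Vt.T

# Hypothetical example: 3 latent circulation modes drive 8 grid-point predictors
# and 2 station observations.
rng = np.random.default_rng(0)
F = rng.normal(size=(120, 3))
X = F @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(120, 8))
Y = F[:, :2] @ rng.normal(size=(2, 2)) + 0.1 * rng.normal(size=(120, 2))
rho, A, B = canonical_correlation(pca_scores(X, 3), Y)
```

The leading canonical variables would then serve as predictors for each station variable in the final regression step.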
Multivariate mixed linear model analysis of longitudinal data: an information-rich statistical technique for analyzing disease resistance data

USDA-ARS?s Scientific Manuscript database

The mixed linear model (MLM) is currently among the most advanced and flexible statistical modeling techniques and its use in tackling problems in plant pathology has begun surfacing in the literature. The longitudinal MLM is a multivariate extension that handles repeatedly measured data, such as r...
Permutation inference for the general linear model

PubMed Central

Winkler, Anderson M.; Ridgway, Gerard R.; Webster, Matthew A.; Smith, Stephen M.; Nichols, Thomas E.

2014-01-01

Permutation methods can provide exact control of false positives and allow the use of non-standard statistics, making only weak assumptions about the data. With the availability of fast and inexpensive computing, their main limitation would be some lack of flexibility to work with arbitrary experimental designs. In this paper, we report results on approximate permutation methods that are more flexible with respect to the experimental design and nuisance variables, and conduct detailed simulations to identify the best method for settings that are typical for imaging research scenarios. We present a generic framework for permutation inference for complex general linear models (GLMs) when the errors are exchangeable and/or have a symmetric distribution, and show that, even in the presence of nuisance effects, these permutation inferences are powerful while providing excellent control of false positives in a wide range of common and relevant imaging research scenarios. We also demonstrate how the inference on GLM parameters, originally intended for independent data, can be used in certain special but useful cases in which independence is violated. Detailed examples of common neuroimaging applications are provided, as well as a complete algorithm, the “randomise” algorithm, for permutation inference with the GLM. PMID:24530839
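One permutation strategy for a GLM with nuisance regressors, the Freedman-Lane procedure, can be sketched as follows. It assumes exchangeable errors and tests a single regressor `x` with an OLS t-statistic while nuisance regressors `Z` stay in the model: residuals of the nuisance-only fit are permuted, the nuisance fit is added back, and the full model is refitted. The function names and data are hypothetical; this is not the `randomise` implementation.

```python
import numpy as np

def t_stat(Y, X, c):
    """OLS t-statistic for contrast c in the model Y = X b + e."""
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    r = Y - X @ b
    dof = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = r @ r / dof
    return (c @ b) / np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))

def freedman_lane(Y, x, Z, n_perm=999, seed=0):
    """Two-sided permutation p-value for regressor x in the presence of nuisance Z."""
    rng = np.random.default_rng(seed)
    M = np.column_stack([x, Z])                # full design: effect of interest first
    c = np.zeros(M.shape[1])
    c[0] = 1.0
    t_obs = t_stat(Y, M, c)
    g, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # reduced (nuisance-only) model
    fit_z = Z @ g
    res_z = Y - fit_z
    count = 1                                  # the unpermuted data count once
    for _ in range(n_perm):
        Y_star = fit_z + rng.permutation(res_z)
        if abs(t_stat(Y_star, M, c)) >= abs(t_obs):
            count += 1
    return t_obs, count / (n_perm + 1)

# Hypothetical example: intercept plus one nuisance covariate, true effect of x = 2
rng = np.random.default_rng(42)
n = 40
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
x = rng.normal(size=n)
Y = 2.0 * x + Z @ np.array([1.0, 0.5]) + rng.normal(size=n)
t_obs, p = freedman_lane(Y, x, Z, n_perm=199)
```

With 199 permutations the smallest attainable p-value is 1/200, which is why imaging analyses typically use thousands of permutations.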
Automatic optimal filament segmentation with sub-pixel accuracy using generalized linear models and B-spline level-sets

PubMed Central

Xiao, Xun; Geyer, Veikko F.; Bowne-Anderson, Hugo; Howard, Jonathon; Sbalzarini, Ivo F.

2016-01-01

Biological filaments, such as actin filaments, microtubules, and cilia, are often imaged using different light-microscopy techniques. Reconstructing the filament curve from the acquired images constitutes the filament segmentation problem. Since filaments have lower dimensionality than the image itself, there is an inherent trade-off between tracing the filament with sub-pixel accuracy and avoiding noise artifacts. Here, we present a globally optimal filament segmentation method based on B-spline vector level-sets and a generalized linear model for the pixel intensity statistics. We show that the resulting optimization problem is convex and can hence be solved with global optimality. We introduce a simple and efficient algorithm to compute such optimal filament segmentations, and provide an open-source implementation as an ImageJ/Fiji plugin. We further derive an information-theoretic lower bound on the filament segmentation error, quantifying how well an algorithm could possibly do given the information in the image. We show that our algorithm asymptotically reaches this bound in the spline coefficients. We validate our method in comprehensive benchmarks, compare with other methods, and show applications from fluorescence, phase-contrast, and dark-field microscopy. PMID:27104582
The Bayesian group lasso for confounded spatial data

USGS Publications Warehouse

Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.

2017-01-01

Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels of multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
Bayesian Inference for Generalized Linear Models for Spiking Neurons

PubMed Central

Gerwinn, Sebastian; Macke, Jakob H.; Bethge, Matthias

2010-01-01

Generalized Linear Models (GLMs) are commonly used statistical methods for modelling the relationship between neural population activity and presented stimuli. When the dimension of the parameter space is large, strong regularization has to be used in order to fit GLMs to datasets of realistic size without overfitting. By imposing properly chosen priors over parameters, Bayesian inference provides an effective and principled approach for achieving regularization. Here we show how the posterior distribution over model parameters of GLMs can be approximated by a Gaussian using the Expectation Propagation algorithm. In this way, we obtain an estimate of the posterior mean and posterior covariance, allowing us to calculate Bayesian confidence intervals that characterize the uncertainty about the optimal solution. From the posterior we also obtain a different point estimate, namely the posterior mean as opposed to the commonly used maximum a posteriori estimate. We systematically compare the different inference techniques on simulated as well as on multi-electrode recordings of retinal ganglion cells, and explore the effects of the chosen prior and the performance measure used. We find that good performance can be achieved by choosing a Laplace prior together with the posterior mean estimate. PMID:20577627
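For orientation, the idea of approximating a GLM posterior by a Gaussian can be sketched with a simpler technique than the one in this paper: a Laplace approximation (Gaussian centred at the posterior mode) for a Poisson GLM with exponential link and a Gaussian prior. This swaps out both Expectation Propagation and the Laplace prior. In this approximation the posterior mean coincides with the MAP estimate, so it cannot reproduce the mean-versus-MAP comparison the authors make. The function name and simulated data are hypothetical.

```python
import numpy as np

def poisson_glm_laplace(X, y, prior_var=1.0, n_iter=50, tol=1e-10):
    """Laplace (Gaussian) approximation to the posterior of a Poisson GLM.

    Assumed model: y_i ~ Poisson(exp(x_i . w)), prior w ~ N(0, prior_var * I).
    Returns the posterior mode and the approximate posterior covariance.
    """
    d = X.shape[1]
    P = np.eye(d) / prior_var                  # prior precision
    w = np.zeros(d)
    for _ in range(n_iter):
        rate = np.exp(X @ w)
        grad = X.T @ (y - rate) - P @ w        # gradient of the log posterior
        H = X.T @ (X * rate[:, None]) + P      # negative Hessian of the log posterior
        step = np.linalg.solve(H, grad)
        w = w + step                           # Newton step; the log posterior is concave
        if np.max(np.abs(step)) < tol:
            break
    rate = np.exp(X @ w)
    cov = np.linalg.inv(X.T @ (X * rate[:, None]) + P)
    return w, cov

# Hypothetical example: 500 time bins, 3 stimulus features
rng = np.random.default_rng(3)
X = rng.normal(scale=0.5, size=(500, 3))
w_true = np.array([0.5, -0.3, 0.2])
y = rng.poisson(np.exp(X @ w_true))            # simulated spike counts
w_map, cov = poisson_glm_laplace(X, y, prior_var=10.0)
sd = np.sqrt(np.diag(cov))                     # approximate posterior standard deviations
```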
Generalized structural equations improve sexual-selection analyses

PubMed Central

Santini, Giacomo; Marchetti, Giovanni Maria; Focardi, Stefano

2017-01-01

Sexual selection is an intense evolutionary force, which operates through competition for the access to breeding resources. There are many cases where male copulatory success is highly asymmetric, and few males are able to sire most females. Two main hypotheses were proposed to explain this asymmetry: “female choice” and “male dominance”. The literature reports contrasting results. This variability may reflect actual differences among studied populations, but it may also be generated by methodological differences and statistical shortcomings in data analysis. A review of the statistical methods used so far in lek studies shows a prevalence of Linear Models (LM) and Generalized Linear Models (GLM), which may be affected by problems in inferring cause-effect relationships, multicollinearity among explanatory variables, and erroneous handling of non-normal and non-continuous distributions of the response variable. In lek breeding, selective pressure is maximal, because large numbers of males and females congregate in small arenas. We used a dataset on lekking fallow deer (Dama dama) to contrast the methods and procedures employed so far, and we propose a novel approach based on Generalized Structural Equations Models (GSEMs). GSEMs combine the power and flexibility of both SEM and GLM in a unified modeling framework. We showed that LMs fail to identify several important predictors of male copulatory success and yield very imprecise parameter estimates. Minor variations in data transformation yield wide changes in results and the method appears unreliable. GLMs improved the analysis, but GSEMs provided better results, because the use of latent variables decreases the impact of measurement errors. Using GSEMs, we were able to test contrasting hypotheses and calculate both direct and indirect effects, and we reached a high precision of the estimates, which implies a high predictive ability.
In summary, we recommend the use of GSEMs in studies on lekking behaviour, and we provide guidelines to implement these models. PMID:28809923
ADME evaluation in drug discovery. 1. Applications of genetic algorithms to the prediction of blood-brain partitioning of a large set of drugs.

PubMed

Hou, Tingjun; Xu, Xiaojie

2002-12-01

In this study, the relationships between the brain-blood concentration ratio of 96 structurally diverse compounds and a large number of structurally derived descriptors were investigated. The linear models were based on molecular descriptors that can be calculated for any compound simply from a knowledge of its molecular structure. The linear correlation coefficients of the models were optimized by genetic algorithms (GAs), and the descriptors used in the linear models were automatically selected from 27 structurally derived descriptors. The GA optimizations resulted in a group of linear models with three or four molecular descriptors with good statistical significance. The change of descriptor use as the evolution proceeds demonstrates that the octanol/water partition coefficient and the partial negative solvent-accessible surface area multiplied by the negative charge are crucial to blood-brain barrier permeability. Moreover, we found that the predictions using multiple QSPR models from GA optimization gave quite good results in spite of the diversity of structures, and were better than the predictions using the best single model. The predictions for the two external sets with 37 diverse compounds using multiple QSPR models indicate that the best linear models with four descriptors are sufficiently effective for predictive use. Considering the ease of computation of the descriptors, the linear models may be used as general utilities to screen the blood-brain barrier partitioning of drugs in a high-throughput fashion.
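GA-based descriptor selection can be illustrated with a toy genetic algorithm. The paper's exact GA settings are not given here, so every choice below is an assumption: fitness is the in-sample R² of an OLS fit with intercept (the squared correlation between fitted and observed values), selection is elitist truncation, crossover draws descriptors from the union of two parents, and mutation swaps in an unused descriptor. Names and data are hypothetical.

```python
import numpy as np

def r_squared(X, y, subset):
    """In-sample R^2 of an OLS fit of y on an intercept and the chosen descriptor columns."""
    A = np.column_stack([np.ones(len(y)), X[:, subset]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    yc = y - y.mean()
    return 1.0 - (resid @ resid) / (yc @ yc)

def ga_select(X, y, k=3, pop=40, gens=60, p_mut=0.5, seed=0):
    """Toy GA searching for the k-descriptor subset with the highest R^2."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    population = [rng.choice(p, size=k, replace=False) for _ in range(pop)]
    for _ in range(gens):
        fitness = np.array([r_squared(X, y, ind) for ind in population])
        order = np.argsort(fitness)[::-1]
        parents = [population[i] for i in order[: pop // 2]]  # elitist truncation selection
        children = []
        while len(children) < pop - len(parents):
            i, j = rng.choice(len(parents), size=2, replace=False)
            pool = np.union1d(parents[i], parents[j])          # crossover: mix parents' descriptors
            child = rng.choice(pool, size=k, replace=False)
            if rng.random() < p_mut:                           # mutation: swap in an unused descriptor
                unused = np.setdiff1d(np.arange(p), child)
                child[rng.integers(k)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    best = max(population, key=lambda ind: r_squared(X, y, ind))
    return np.sort(best), r_squared(X, y, best)

# Hypothetical example: 96 compounds, 27 descriptors, 3 of which drive the response
rng = np.random.default_rng(7)
X = rng.normal(size=(96, 27))
y = X[:, [2, 5, 11]] @ np.array([2.0, -2.0, 2.0]) + 0.1 * rng.normal(size=96)
best, score = ga_select(X, y)
```

Keeping the better half of each generation unchanged guarantees that the best fitness never decreases from one generation to the next.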
Reaction Event Counting Statistics of Biopolymer Reaction Systems with Dynamic Heterogeneity.

PubMed

Lim, Yu Rim; Park, Seong Jun; Park, Bo Jung; Cao, Jianshu; Silbey, Robert J; Sung, Jaeyoung

2012-04-10

We investigate the reaction event counting statistics (RECS) of an elementary biopolymer reaction in which the rate coefficient is dependent on states of the biopolymer and the surrounding environment and discover a universal kinetic phase transition in the RECS of the reaction system with dynamic heterogeneity. From an exact analysis for a general model of elementary biopolymer reactions, we find that the variance in the number of reaction events is dependent on the square of the mean number of the reaction events when the measurement time interval is short compared with the relaxation time of rate coefficient fluctuations, which does not conform to renewal statistics. On the other hand, when the measurement time interval is much longer than the relaxation time of rate coefficient fluctuations, the variance becomes linearly proportional to the mean reaction number in accordance with renewal statistics. Gillespie's stochastic simulation method is generalized for the reaction system with a rate coefficient fluctuation. The simulation results confirm the correctness of the analytic results for the time-dependent mean and variance of the reaction event number distribution. On the basis of the obtained results, we propose a method of quantitative analysis for the reaction event counting statistics of reaction systems with rate coefficient fluctuations, which enables one to extract information about the magnitude and the relaxation times of the fluctuating reaction rate coefficient, without a bias that can be introduced by assuming a particular kinetic model of conformational dynamics and the conformation-dependent reactivity. An exact relationship is established between a higher moment of the reaction event number distribution and the multitime correlation of the reaction rate for the reaction system with a nonequilibrium initial state distribution as well as for the system with the equilibrium initial state distribution.
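Gillespie's algorithm with a fluctuating rate coefficient can be sketched for the simplest assumed case: a symmetric two-state model in which the rate switches between `k[0]` and `k[1]` at rate `gamma`, and a reaction event does not change the conformational state. This toy model and the function names are hypothetical, not the general model analysed in the paper. The Fano factor (variance over mean of the event count in a window of length `T`) equals 1 for Poisson counting and exceeds 1 when the rate coefficient fluctuates.

```python
import numpy as np

def count_events(k, gamma, T, rng):
    """One Gillespie trajectory: number of reaction events in [0, T]."""
    s = rng.integers(2)                  # equilibrium start: both states equally likely
    t, n = 0.0, 0
    while True:
        a0 = k[s] + gamma                # total propensity: reaction + conformational switch
        t += rng.exponential(1.0 / a0)
        if t > T:
            return n
        if rng.random() < k[s] / a0:
            n += 1                       # reaction event; conformational state unchanged
        else:
            s = 1 - s                    # switch changes the rate coefficient

def fano(k, gamma, T, runs=2000, seed=0):
    """Mean event count and Fano factor over independent trajectories."""
    rng = np.random.default_rng(seed)
    counts = np.array([count_events(k, gamma, T, rng) for _ in range(runs)])
    return counts.mean(), counts.var(ddof=1) / counts.mean()

mean_hom, fano_hom = fano((2.0, 2.0), gamma=0.1, T=10.0)   # no heterogeneity: Poisson
mean_het, fano_het = fano((0.5, 5.0), gamma=0.1, T=10.0)   # slow rate fluctuations
```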
Statistical downscaling of general-circulation-model-simulated average monthly air temperature to the beginning of flowering of the dandelion (Taraxacum officinale) in Slovenia

NASA Astrophysics Data System (ADS)

Bergant, Klemen; Kajfež-Bogataj, Lučka; Črepinšek, Zalika

2002-02-01

Phenological observations are a valuable source of information for investigating the relationship between climate variation and plant development. Potential climate change in the future will shift the occurrence of phenological phases. Information about future climate conditions is needed in order to estimate this shift. General circulation models (GCMs) provide the best information about future climate change. They are able to simulate reliably the most important mean features on a large scale, but they fail on a regional scale because of their low spatial resolution. A common approach to bridging the scale gap is statistical downscaling, which was used to relate the beginning of flowering of Taraxacum officinale in Slovenia with the monthly mean near-surface air temperature for January, February and March in Central Europe. Statistical models were developed and tested with NCAR/NCEP Reanalysis predictor data and EARS predictand data for the period 1960-1999. Prior to developing statistical models, empirical orthogonal function (EOF) analysis was employed on the predictor data. Multiple linear regression was used to relate the beginning of flowering with expansion coefficients of the first three EOFs for the January, February and March air temperatures, and a strong correlation was found between them. The developed statistical models were applied to the output of two GCMs (HadCM3 and ECHAM4/OPYC3) to estimate the potential shifts in the beginning of flowering for the periods 1990-2019 and 2020-2049 in comparison with the period 1960-1989. The HadCM3 model predicts, on average, 4 days earlier occurrence and ECHAM4/OPYC3 5 days earlier occurrence of flowering in the period 1990-2019. The analogous results for the period 2020-2049 are a 10- and 11-day earlier occurrence.
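The EOF-plus-regression downscaling step can be sketched as follows, assuming the predictor fields are arranged as a (years × grid points) matrix. EOFs are computed from an SVD of the anomalies, the flowering date is regressed on the leading expansion coefficients, and new (for example GCM-simulated) fields are projected onto the same EOFs before prediction. The function name and synthetic data are hypothetical, not the authors' models.

```python
import numpy as np

def eof_regression(fields, y, n_eof=3):
    """Regress y on the expansion coefficients of the leading EOFs of `fields`.

    fields: (n_years, n_gridpoints) predictor fields; y: (n_years,) predictand.
    Returns the regression coefficients and a predictor for new fields.
    """
    mean = fields.mean(axis=0)
    anomalies = fields - mean
    _, _, Vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = Vt[:n_eof]                          # spatial patterns
    pcs = anomalies @ eofs.T                   # expansion coefficients (one row per year)
    A = np.column_stack([np.ones(len(y)), pcs])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict(new_fields):
        new_pcs = (new_fields - mean) @ eofs.T  # project onto the training EOFs
        return np.column_stack([np.ones(len(new_fields)), new_pcs]) @ coef

    return coef, predict

# Hypothetical example: 40 years, 50 grid points, 3 underlying temperature modes
rng = np.random.default_rng(5)
latent = rng.normal(size=(40, 3))
fields = latent @ rng.normal(size=(3, 50)) + 0.05 * rng.normal(size=(40, 50))
onset = 100.0 - 4.0 * latent[:, 0] + 0.5 * rng.normal(size=40)   # day of year
coef, predict = eof_regression(fields, onset, n_eof=3)
fitted = predict(fields)
```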
Experimental design and data analysis of Ago-RIP-Seq experiments for the identification of microRNA targets.

PubMed

Tichy, Diana; Pickl, Julia Maria Anna; Benner, Axel; Sültmann, Holger

2017-03-31

The identification of microRNA (miRNA) target genes is crucial for understanding miRNA function. Many methods for the genome-wide miRNA target identification have been developed in recent years; however, they have several limitations including the dependence on low-confidence prediction programs and artificial miRNA manipulations. Ago-RNA immunoprecipitation combined with high-throughput sequencing (Ago-RIP-Seq) is a promising alternative. However, appropriate statistical data analysis algorithms taking into account the experimental design and the inherent noise of such experiments are largely lacking. Here, we investigate the experimental design for Ago-RIP-Seq and examine biostatistical methods to identify de novo miRNA target genes. Statistical approaches considered are either based on a negative binomial model fit to the read count data or applied to transformed data using a normal distribution-based generalized linear model. We compare them by a real data simulation study using plasmode data sets and evaluate the suitability of the approaches to detect true miRNA targets by sensitivity and false discovery rates. Our results suggest that simple approaches like linear regression models on (appropriately) transformed read count data are preferable. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Improving medium-range ensemble streamflow forecasts through statistical post-processing

NASA Astrophysics Data System (ADS)

Mendoza, Pablo; Wood, Andy; Clark, Elizabeth; Nijssen, Bart; Clark, Martyn; Ramos, Maria-Helena; Nowak, Kenneth; Arnold, Jeffrey

2017-04-01

Probabilistic hydrologic forecasts are a powerful source of information for decision-making in water resources operations. A common approach is the hydrologic model-based generation of streamflow forecast ensembles, which can be implemented to account for different sources of uncertainties - e.g., from initial hydrologic conditions (IHCs), weather forecasts, and hydrologic model structure and parameters. In practice, hydrologic ensemble forecasts typically have biases and spread errors stemming from errors in the aforementioned elements, resulting in a degradation of probabilistic properties. In this work, we compare several statistical post-processing techniques applied to medium-range ensemble streamflow forecasts obtained with the System for Hydromet Applications, Research and Prediction (SHARP). SHARP is a fully automated prediction system for the assessment and demonstration of short-term to seasonal streamflow forecasting applications, developed by the National Center for Atmospheric Research, University of Washington, U.S. Army Corps of Engineers, and U.S. Bureau of Reclamation. The suite of post-processing techniques includes linear blending, quantile mapping, extended logistic regression, quantile regression, ensemble analogs, and the generalized linear model post-processor (GLMPP). We assess and compare these techniques using multi-year hindcasts in several river basins in the western US. This presentation discusses preliminary findings about the effectiveness of the techniques for improving probabilistic skill, reliability, discrimination, sharpness and resolution.
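Quantile mapping, one of the listed post-processing techniques, can be sketched in its simplest empirical form: each raw forecast value is replaced by the observed value at the same non-exceedance probability in a hindcast period. The plotting positions `(i - 0.5) / n`, linear interpolation, and clamping outside the hindcast range (`np.interp` behaviour) are assumptions, and this is not necessarily the configuration used with SHARP. Names and data are hypothetical.

```python
import numpy as np

def quantile_map(forecast, hindcast_fcst, hindcast_obs):
    """Empirical quantile mapping of raw forecasts onto the observed climatology."""
    hf = np.sort(np.asarray(hindcast_fcst, float))
    ho = np.sort(np.asarray(hindcast_obs, float))
    probs_f = (np.arange(1, len(hf) + 1) - 0.5) / len(hf)   # plotting positions
    probs_o = (np.arange(1, len(ho) + 1) - 0.5) / len(ho)
    p = np.interp(forecast, hf, probs_f)   # forecast value -> non-exceedance probability
    return np.interp(p, probs_o, ho)       # probability -> observed-climatology value

# Hypothetical hindcast in which the raw model flows are biased low by half
obs = np.linspace(10.0, 100.0, 91)
raw = 0.5 * obs
corrected = quantile_map(np.array([25.0, 40.0]), raw, obs)
```

Applied member by member, this corrects the marginal distribution of the ensemble but not, by itself, its spread-skill relationship.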
Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis.

PubMed

Rigaill, Guillem; Balzergue, Sandrine; Brunaud, Véronique; Blondet, Eddy; Rau, Andrea; Rogier, Odile; Caius, José; Maugis-Rabusseau, Cathy; Soubigou-Taconnat, Ludivine; Aubourg, Sébastien; Lurin, Claire; Martin-Magniette, Marie-Laure; Delannoy, Etienne

2018-01-01

Numerous statistical pipelines are now available for the differential analysis of gene expression measured with RNA-sequencing technology. Most of them are based on similar statistical frameworks after normalization, differing primarily in the choice of data distribution, mean and variance estimation strategy and data filtering. We propose an evaluation of the impact of these choices when few biological replicates are available through the use of synthetic data sets. This framework is based on real data sets and allows the exploration of various scenarios differing in the proportion of non-differentially expressed genes. Hence, it provides an evaluation of the key ingredients of the differential analysis, free of the biases associated with the simulation of data using parametric models. Our results show the relevance of a proper modeling of the mean by using linear or generalized linear modeling. Once the mean is properly modeled, the impact of the other parameters on the performance of the test is much less important. Finally, we propose to use the simple visualization of the raw P-value histogram as a practical evaluation criterion of the performance of differential analysis methods on real data sets. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
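The raw P-value histogram check can be sketched with simulated test statistics instead of a real RNA-seq pipeline. The sketch assumes 9,000 hypothetical non-differentially expressed genes with standard-normal z-scores and 1,000 hypothetical differentially expressed genes shifted by 3, then computes two-sided P-values. A well-behaved method is expected to give a roughly flat histogram with an excess of small P-values; strong departures from that shape suggest miscalibration.

```python
import math
import numpy as np

def two_sided_p(z):
    """Two-sided normal P-values, P(|Z| >= |z|) = erfc(|z| / sqrt(2))."""
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])

rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0.0, 1.0, 9000),    # null genes
                    rng.normal(3.0, 1.0, 1000)])   # differentially expressed genes
p = two_sided_p(z)
counts, edges = np.histogram(p, bins=20, range=(0.0, 1.0))
for c, lo in zip(counts, edges[:-1]):
    print(f"{lo:4.2f} {'#' * (int(c) // 20)}")     # text histogram, one '#' per 20 genes
```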
Combined statistical analyses for long-term stability data with multiple storage conditions: a simulation study.

PubMed

Almalik, Osama; Nijhuis, Michiel B; van den Heuvel, Edwin R

2014-01-01

Shelf-life estimation usually requires that at least three registration batches are tested for stability at multiple storage conditions. The shelf-life estimates are often obtained by linear regression analysis per storage condition, an approach implicitly suggested by ICH guideline Q1E. A linear regression analysis combining all data from multiple storage conditions was recently proposed in the literature when variances are homogeneous across storage conditions. The combined analysis is expected to perform better than the separate analysis per storage condition, since pooling data would lead to an improved estimate of the variation and higher numbers of degrees of freedom, but this is not evident for shelf-life estimation. Indeed, the two approaches treat the observed initial batch results, the intercepts in the model, and poolability of batches differently, which may eliminate or reduce the expected advantage of the combined approach with respect to the separate approach. Therefore, a simulation study was performed to compare the distribution of simulated shelf-life estimates on several characteristics between the two approaches and to quantify the difference in shelf-life estimates. In general, the combined statistical analysis estimates the true shelf life more consistently and precisely than the analysis per storage condition, but it does not outperform the separate analysis in all circumstances.
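The per-condition regression approach can be sketched for one batch and a decreasing attribute, following the ICH Q1E idea that shelf life is the earliest time at which the one-sided 95% lower confidence limit for the mean regression line crosses the acceptance criterion. Batch pooling and the combined multi-condition model studied here are not shown. The one-sided t quantile is passed in by the caller (for example from tables), and the function name, grid search, and data are hypothetical.

```python
import numpy as np

def shelf_life(t, y, spec_lower, t_crit, horizon=60.0, step=0.01):
    """Earliest time at which the one-sided lower confidence limit of the mean
    regression line falls below spec_lower (single batch, decreasing attribute).

    t_crit: one-sided t quantile with n - 2 degrees of freedom.
    """
    t = np.asarray(t, float)
    y = np.asarray(y, float)
    n = len(t)
    X = np.column_stack([np.ones(n), t])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)                       # residual variance
    XtX_inv = np.linalg.inv(X.T @ X)
    grid = np.arange(0.0, horizon + step, step)
    G = np.column_stack([np.ones_like(grid), grid])
    se = np.sqrt(s2 * np.einsum("ij,jk,ik->i", G, XtX_inv, G))  # SE of the mean line
    lower = G @ b - t_crit * se
    below = np.nonzero(lower < spec_lower)[0]
    return grid[below[0]] if below.size else horizon

# Hypothetical assay (% label claim) at 7 time points, specification limit 95%
months = [0, 3, 6, 9, 12, 18, 24]
assay = [100.0 - 0.5 * m for m in months]             # noise-free illustration
life = shelf_life(months, assay, spec_lower=95.0, t_crit=2.015)  # t(0.95, df=5)
```

With noisy data the confidence band widens and the estimated shelf life moves earlier than the point where the fitted line itself crosses the limit.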
A constrained regularization method for inverting data represented by linear algebraic or integral equations

NASA Astrophysics Data System (ADS)

Provencher, Stephen W.

1982-09-01

CONTIN is a portable Fortran IV package for inverting noisy linear operator equations. These problems occur in the analysis of data from a wide variety of experiments. They are generally ill-posed problems, which means that errors in an unregularized inversion are unbounded. Instead, CONTIN seeks the optimal solution by incorporating parsimony and any statistical prior knowledge into the regularizor and absolute prior knowledge into equality and inequality constraints. This can greatly increase the resolution and accuracy of the solution. CONTIN is very flexible, consisting of a core of about 50 subprograms plus 13 small "USER" subprograms, which the user can easily modify to specify special-purpose constraints, regularizors, operator equations, simulations, statistical weighting, etc. Special collections of USER subprograms are available for photon correlation spectroscopy, multicomponent spectra, and Fourier-Bessel, Fourier and Laplace transforms. Numerically stable algorithms are used throughout CONTIN. A fairly precise definition of information content in terms of degrees of freedom is given. The regularization parameter can be automatically chosen on the basis of an F-test and confidence region. The interpretation of the latter and of error estimates based on the covariance matrix of the constrained regularized solution are discussed. The strategies, methods and options in CONTIN are outlined. The program itself is described in the following paper.
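The core idea of a regularized inversion can be sketched with unconstrained Tikhonov regularization using a second-difference penalty, which favours smooth solutions. This is a simplified illustration, not CONTIN: the equality and inequality constraints (such as non-negativity) and the F-test choice of the regularization parameter `alpha` are not reproduced, and the function name, kernel grid, and data are hypothetical.

```python
import numpy as np

def tikhonov_invert(K, y, alpha):
    """Solve min_x ||K x - y||^2 + alpha^2 ||D x||^2, D = second-difference operator."""
    n = K.shape[1]
    D = np.diff(np.eye(n), n=2, axis=0)        # (n - 2, n) second differences
    A = np.vstack([K, alpha * D])              # stacked least-squares system
    b = np.concatenate([y, np.zeros(n - 2)])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Usage: invert a discretised Laplace transform y(t) = sum_j exp(-t * s_j) x_j
t = np.linspace(0.0, 5.0, 40)
s = np.logspace(-1, 1, 30)                     # hypothetical grid of decay rates
K = np.exp(-np.outer(t, s))
x_true = np.exp(-0.5 * (np.log(s) / 0.3) ** 2)
y = K @ x_true + 1e-3 * np.random.default_rng(0).normal(size=t.size)
x_hat = tikhonov_invert(K, y, alpha=1e-2)
```

Because the penalty vanishes for linear `x`, larger `alpha` pulls the solution toward the best-fitting straight line rather than toward zero.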
The reliability and reproducibility of cephalometric measurements: a comparison of conventional and digital methods

PubMed Central

AlBarakati, SF; Kula, KS; Ghoneima, AA

2012-01-01

Objective The aim of this study was to assess the reliability and reproducibility of angular and linear measurements of conventional and digital cephalometric methods. Methods A total of 13 landmarks and 16 skeletal and dental parameters were defined and measured on pre-treatment cephalometric radiographs of 30 patients. The conventional and digital tracings and measurements were performed twice by the same examiner with a 6-week interval between measurements. The reliability within the method was determined using Pearson's correlation coefficient (r2). The reproducibility between methods was calculated by paired t-test. The level of statistical significance was set at p < 0.05. Results All measurements for each method were above 0.90 r2 (strong correlation) except maxillary length, which had a correlation of 0.82 for conventional tracing. Significant differences between the two methods were observed in most angular and linear measurements except for ANB angle (p = 0.5), angle of convexity (p = 0.09), anterior cranial base (p = 0.3) and the lower anterior facial height (p = 0.6). Conclusion In general, both methods of conventional and digital cephalometric analysis are highly reliable. Although the reproducibility of the two methods showed some statistically significant differences, most differences were not clinically significant. PMID:22184624
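The two statistics named here can be computed as below. Because the abstract labels Pearson's coefficient as "r2", the sketch returns both r and r² so the distinction stays explicit. The paired t statistic is the mean difference divided by its standard error, with n - 1 degrees of freedom; a P-value then needs a t distribution, which `scipy.stats.ttest_rel` provides directly. The function name and numbers are hypothetical.

```python
import numpy as np

def reliability_summary(a, b):
    """Pearson r and r^2 between paired measurements, plus the paired t statistic."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    r = np.corrcoef(a, b)[0, 1]
    d = a - b                                   # per-patient differences
    n = len(d)
    t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    return r, r ** 2, t, n - 1                  # t is referred to a t distribution on n - 1 df

# Hypothetical measurements (degrees) of one angle by two methods on 4 radiographs
r, r2, t, df = reliability_summary([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.0, 4.5])
```

For these numbers the differences are -0.5, -0.5, 0 and -0.5, giving t = -3 on 3 degrees of freedom even though r is about 0.98: high correlation does not rule out a systematic offset between methods.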
Folded concave penalized sparse linear regression: sparsity, statistical performance, and algorithmic theory for local solutions.

PubMed

Liu, Hongcheng; Yao, Tao; Li, Runze; Ye, Yinyu

2017-11-01

This paper concerns the folded concave penalized sparse linear regression (FCPSLR), a class of popular sparse recovery methods. Although FCPSLR yields desirable recovery performance when solved globally, computing a global solution is NP-complete. Despite some existing statistical performance analyses on local minimizers or on specific FCPSLR-based learning algorithms, it remains an open question whether local solutions that are known to admit fully polynomial-time approximation schemes (FPTAS) may already be sufficient to ensure the statistical performance, and whether that statistical performance can be non-contingent on the specific designs of computing procedures. To address these questions, this paper presents the following threefold results: (i) Any local solution (stationary point) is a sparse estimator, under some conditions on the parameters of the folded concave penalties. (ii) Perhaps more importantly, any local solution satisfying a significant subspace second-order necessary condition (S³ONC), which is weaker than the second-order KKT condition, yields a bounded error in approximating the true parameter with high probability. In addition, if the minimal signal strength is sufficient, the S³ONC solution likely recovers the oracle solution. This result also explicates that the goal of improving the statistical performance is consistent with the optimization criteria of minimizing the suboptimality gap in solving the non-convex programming formulation of FCPSLR. (iii) We apply (ii) to the special case of FCPSLR with minimax concave penalty (MCP) and show that under the restricted eigenvalue condition, any S³ONC solution with a better objective value than the Lasso solution entails the strong oracle property. In addition, such a solution generates a model error (ME) comparable to the optimal but exponential-time sparse estimator given a sufficient sample size, while the worst-case ME is comparable to the Lasso in general.
Furthermore, to guarantee the S³ONC admits FPTAS.
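The MCP and a local solution of MCP-penalized least squares can be sketched with a standard coordinate-descent solver; this solver is a swapped-in illustration and does not check the S³ONC. The sketch assumes the columns of `X` are standardized so that (1/n) X_jᵀX_j = 1, in which case each coordinate update has a closed form: soft-thresholding rescaled by 1/(1 - 1/γ) when |z| ≤ γλ, and no shrinkage beyond γλ. Function names and data are hypothetical.

```python
import numpy as np

def mcp_penalty(b, lam, gamma):
    """Minimax concave penalty: lam|b| - b^2/(2 gamma) up to gamma*lam, then constant."""
    b = abs(b)
    return lam * b - b * b / (2 * gamma) if b <= gamma * lam else gamma * lam ** 2 / 2

def mcp_threshold(z, lam, gamma):
    """Minimiser of 0.5 * (z - b)^2 + MCP(b) for one standardised coordinate (gamma > 1)."""
    if abs(z) > gamma * lam:
        return z                                    # large signals are left unshrunk
    soft = np.sign(z) * max(abs(z) - lam, 0.0)      # soft-thresholding
    return soft / (1.0 - 1.0 / gamma)

def mcp_cd(X, y, lam, gamma=3.0, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + sum_j MCP(b_j); returns a local solution."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()                      # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            z = X[:, j] @ r / n + b[j]              # partial-residual correlation
            new = mcp_threshold(z, lam, gamma)
            r -= X[:, j] * (new - b[j])
            b[j] = new
    return b

# Hypothetical sparse example: 2 active coefficients out of 10
rng = np.random.default_rng(11)
X = rng.normal(size=(200, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardise columns
beta = np.zeros(10)
beta[:2] = [3.0, -2.0]
y = X @ beta + 0.5 * rng.normal(size=200)
b_hat = mcp_cd(X, y, lam=0.3)
```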
Risk prediction for myocardial infarction via generalized functional regression models.

PubMed

Ieva, Francesca; Paganoni, Anna M

2016-08-01

In this paper, we propose a generalized functional linear regression model for a binary outcome indicating the presence/absence of a cardiac disease with multivariate functional data among the relevant predictors. In particular, the motivating aim is the analysis of electrocardiographic traces of patients whose pre-hospital electrocardiogram (ECG) has been sent to 118 Dispatch Center of Milan (the Italian toll-free number for emergencies) by life support personnel of the basic rescue units. The statistical analysis starts with a preprocessing of ECGs treated as multivariate functional data. The signals are reconstructed from noisy observations. The biological variability is then removed by a nonlinear registration procedure based on landmarks. Thus, in order to perform a data-driven dimensional reduction, a multivariate functional principal component analysis is carried out on the variance-covariance matrix of the reconstructed and registered ECGs and their first derivatives. We use the scores of the Principal Components decomposition as covariates in a generalized linear model to predict the presence of the disease in a new patient. Hence, a new semi-automatic diagnostic procedure is proposed to estimate the risk of infarction (in the case of interest, the probability of being affected by Left Bundle Branch Block). The performance of this classification method is evaluated and compared with other methods proposed in the literature. Finally, the robustness of the procedure is checked via leave-j-out techniques. © The Author(s) 2013.
Online and offline tools for head movement compensation in MEG.

PubMed

Stolk, Arjen; Todorovic, Ana; Schoffelen, Jan-Mathijs; Oostenveld, Robert

2013-03-01

Magnetoencephalography (MEG) is measured above the head, which makes it sensitive to variations of the head position with respect to the sensors. Head movements blur the topography of the neuronal sources of the MEG signal, increase localization errors, and reduce statistical sensitivity. Here we describe two novel and readily applicable methods that compensate for the detrimental effects of head motion on the statistical sensitivity of MEG experiments. First, we introduce an online procedure that continuously monitors head position. Second, we describe an offline analysis method that takes into account the head position time-series. We quantify the performance of these methods in the context of three different experimental settings, involving somatosensory, visual and auditory stimuli, assessing both individual and group-level statistics. The online head localization procedure allowed for optimal repositioning of the subjects over multiple sessions, resulting in a 28% reduction of the variance in dipole position and an improvement of up to 15% in statistical sensitivity. Offline incorporation of the head position time-series into the general linear model resulted in improvements of group-level statistical sensitivity between 15% and 29%. These tools can substantially reduce the influence of head movement within and between sessions, increasing the sensitivity of many cognitive neuroscience experiments. Copyright © 2012 Elsevier Inc. All rights reserved.

Managing heteroscedasticity in general linear models.

PubMed

Rosopa, Patrick J; Schaffer, Meline M; Schroeder, Amber N

2013-09-01

Heteroscedasticity refers to a phenomenon where data violate a statistical assumption. This assumption is known as homoscedasticity. When the homoscedasticity assumption is violated, this can lead to increased Type I error rates or decreased statistical power. Because this can adversely affect substantive conclusions, the failure to detect and manage heteroscedasticity could have serious implications for theory, research, and practice. In addition, heteroscedasticity is not uncommon in the behavioral and social sciences. Thus, in the current article, we synthesize extant literature in applied psychology, econometrics, quantitative psychology, and statistics, and we offer recommendations for researchers and practitioners regarding available procedures for detecting heteroscedasticity and mitigating its effects. In addition to discussing the strengths and weaknesses of various procedures and comparing them in terms of existing simulation results, we describe a 3-step data-analytic process for detecting and managing heteroscedasticity: (a) fitting a model based on theory and saving residuals, (b) the analysis of residuals, and (c) statistical inferences (e.g., hypothesis tests and confidence intervals) involving parameter estimates. We also demonstrate this data-analytic process using an illustrative example. Overall, detecting violations of the homoscedasticity assumption and mitigating its biasing effects can strengthen the validity of inferences from behavioral and social science data.
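The three-step process can be illustrated with one widely used remedy, heteroscedasticity-consistent (HC3) standard errors; whether this is the specific procedure the authors recommend in a given situation is not claimed here. The sketch (a) fits OLS and keeps the residuals for (b) inspection, and (c) reports both conventional and HC3 standard errors, where HC3 weights each squared residual by 1/(1 - h_i)² using the leverages h_i. The function name and simulated data are hypothetical.

```python
import numpy as np

def ols_with_hc3(X, y):
    """OLS coefficients, conventional SEs, HC3 robust SEs, and residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b                                        # step (a): save residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)          # leverages (hat-matrix diagonal)
    omega = (e / (1.0 - h)) ** 2                         # HC3 weights
    cov_hc3 = XtX_inv @ (X.T * omega) @ X @ XtX_inv      # sandwich estimator
    s2 = e @ e / (len(y) - X.shape[1])
    cov_ols = s2 * XtX_inv                               # assumes homoscedasticity
    return b, np.sqrt(np.diag(cov_ols)), np.sqrt(np.diag(cov_hc3)), e

# Hypothetical data whose error spread grows with |x|
rng = np.random.default_rng(4)
x = rng.uniform(-5.0, 5.0, 500)
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=500)
X = np.column_stack([np.ones_like(x), x])
b, se_ols, se_hc3, resid = ols_with_hc3(X, y)
```

Step (b) would then plot `resid` (or `resid**2`) against `x` or the fitted values; here the conventional standard error of the slope understates the uncertainty relative to HC3.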
Biological Parametric Mapping: A Statistical Toolbox for Multi-Modality Brain Image Analysis

PubMed Central

Casanova, Ramon; Ryali, Srikanth; Baer, Aaron; Laurienti, Paul J.; Burdette, Jonathan H.; Hayasaka, Satoru; Flowers, Lynn; Wood, Frank; Maldjian, Joseph A.

2006-01-01

In recent years, multiple brain MR imaging modalities have emerged; however, analysis methodologies have mainly remained modality-specific. In addition, when comparing across imaging modalities, most researchers have been forced to rely on simple region-of-interest type analyses, which do not allow the voxel-by-voxel comparisons necessary to answer more sophisticated neuroscience questions. To overcome these limitations, we developed a toolbox for multimodal image analysis called biological parametric mapping (BPM), based on a voxel-wise use of the general linear model. The BPM toolbox incorporates information obtained from other modalities as regressors in a voxel-wise analysis, thereby permitting investigation of more sophisticated hypotheses. The BPM toolbox has been developed in MATLAB with a user-friendly interface for performing analyses, including voxel-wise multimodal correlation, ANCOVA, and multiple regression. It is closely integrated with the SPM (statistical parametric mapping) software, relying on it for visualization and statistical inference. Furthermore, the correlation analysis performs statistical inference on a correlation field, rather than the widely used T-field, for more accurate results. An example with in-vivo data is presented demonstrating the potential of the BPM methodology as a tool for multimodal image analysis. PMID:17070709
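The core BPM idea is a voxel-wise GLM in which the regressor from another modality changes from voxel to voxel. The numpy sketch below illustrates that idea only; it is not the MATLAB toolbox and does not reproduce its SPM-based inference. The array names, the covariate and the synthetic data are hypothetical.

```python
import numpy as np


def voxelwise_glm(primary, secondary, covariates):
    """Fit y_v = [1, m_v, covariates] @ beta separately for each voxel v.

    primary, secondary: arrays of shape (n_subjects, n_voxels)
    covariates: array of shape (n_subjects, k), shared across voxels
    Returns the t-statistic of the imaging regressor m_v at every voxel.
    """
    n_sub, n_vox = primary.shape
    t_map = np.empty(n_vox)
    for v in range(n_vox):
        X = np.column_stack([np.ones(n_sub), secondary[:, v], covariates])
        beta, _, _, _ = np.linalg.lstsq(X, primary[:, v], rcond=None)
        resid = primary[:, v] - X @ beta
        sigma2 = resid @ resid / (n_sub - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t_map[v] = beta[1] / np.sqrt(cov[1, 1])
    return t_map


rng = np.random.default_rng(2)
n_sub, n_vox = 40, 50
gm = rng.normal(size=(n_sub, n_vox))      # hypothetical second modality (e.g. grey-matter density)
age = rng.normal(size=(n_sub, 1))         # hypothetical subject-level covariate
coupling = np.zeros(n_vox)
coupling[:10] = 1.0                       # only the first 10 voxels depend on gm
fmri = gm * coupling + 0.3 * age + rng.normal(size=(n_sub, n_vox))
t_map = voxelwise_glm(fmri, gm, age)
print(t_map[:10].round(1))
```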
Enhancing the Biological Relevance of Secretome-Based Proteomics by Linking Tumor Cell Proliferation and Protein Secretion.

PubMed

Gregori, Josep; Méndez, Olga; Katsila, Theodora; Pujals, Mireia; Salvans, Cándida; Villarreal, Laura; Arribas, Joaquin; Tabernero, Josep; Sánchez, Alex; Villanueva, Josep

2014-07-15

Secretome profiling has become a methodology of choice for the identification of tumor biomarkers. We hypothesized that, owing to the dynamic nature of secretomes, cellular perturbations could affect not only their composition but also the global amount of protein secreted per cell. We confirmed our hypothesis by measuring the levels of secreted proteins while taking into account the amount of proteome produced per cell. We then established a correlation between cell proliferation and protein secretion that explained the observed changes in global protein secretion. Next, we incorporated into a generalized linear model (GLM) a normalization that corrects the statistical results of secretome studies for the global protein secretion of cells. Applying the normalization to two biological perturbations of tumor cells drastically changed the list of statistically significant proteins. Furthermore, we found that known epithelial-to-mesenchymal transition (EMT) effectors were statistically significant only when the normalization was applied. Therefore, the normalization proposed here increases the sensitivity of statistical tests by increasing the number of true positives. From an oncology perspective, the correlation between protein secretion and cellular proliferation suggests that slow-growing tumors could have high protein secretion rates and consequently contribute strongly to tumor paracrine signaling.
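The abstract does not say how the normalization enters the GLM. One common way to adjust a count GLM for a known per-sample scale factor is a log offset. The sketch below assumes that approach: a Poisson GLM with log link, fitted by iteratively reweighted least squares, with offset log(global secretion per cell). It is a hypothetical stand-in, not necessarily the authors' model, and the data are simulated.

```python
import numpy as np


def poisson_glm_offset(X, y, offset, n_iter=100, tol=1e-10):
    """Poisson GLM with log link, log E[y] = X @ beta + offset, fitted by IRLS."""
    mu = y + 0.5                              # conventional starting values (avoid log(0))
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = (eta - offset) + (y - mu) / mu    # working response with the offset removed
        XtW = X.T * mu                        # working weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + offset
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new
    return beta_new


rng = np.random.default_rng(3)
treated = np.repeat([0.0, 1.0], 6)
# Hypothetical global secretion per cell: treated cells secrete about twice as much overall
secretion = np.where(treated == 1.0, 2.0, 1.0) * rng.uniform(0.9, 1.1, 12)
counts = rng.poisson(50 * secretion)          # this protein's share of the secretome is unchanged
X = np.column_stack([np.ones(12), treated])

b_raw = poisson_glm_offset(X, counts, np.zeros(12))
b_norm = poisson_glm_offset(X, counts, np.log(secretion))
print(b_raw[1], b_norm[1])
```

Without the offset, the global change in secretion shows up as an apparent treatment effect of about log 2 for this protein. With the offset, the protein-specific effect is close to zero.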
Linear models: permutation methods

USGS Publications Warehouse

Cade, B.S.; Everitt, B.S.; Howell, D.C.

2005-01-01

Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to the effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well known that estimates of the mean in a linear model are extremely sensitive to even a single outlying value of the dependent variable compared with estimates of the median [7, 19]. Traditionally, linear modeling focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution of responses, providing a more complete statistical picture that has relevance to many biological questions [6]...
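For a single predictor, a permutation test of the slope shuffles the response. Under the null hypothesis of no association, the pairing of x and y is exchangeable. The minimal sketch below uses an OLS slope, but the `statistic` argument can be any estimator, such as a median-based slope. Designs with additional covariates need residual-permutation schemes, which are not shown. The data and names are hypothetical.

```python
import numpy as np


def ols_slope(x, y):
    xc = x - x.mean()
    return xc @ (y - y.mean()) / (xc @ xc)


def permutation_test(x, y, statistic=ols_slope, n_perm=9999, seed=0):
    """Two-sided permutation test of H0: y is unrelated to x.

    The p-value counts permuted |statistic| >= observed |statistic|,
    including the observed arrangement itself (hence the +1 terms).
    """
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)
    exceed = sum(abs(statistic(x, rng.permutation(y))) >= abs(observed)
                 for _ in range(n_perm))
    return observed, (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(4)
x = np.arange(30, dtype=float)
y = 0.3 * x + rng.laplace(scale=1.0, size=30)   # heavier-tailed than Gaussian errors
slope, p = permutation_test(x, y, n_perm=1999)

y_null = rng.laplace(scale=1.0, size=30)        # no relationship with x
_, p_null = permutation_test(x, y_null, n_perm=999)
print(slope, p, p_null)
```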
Adding a Parameter Increases the Variance of an Estimated Regression Function

ERIC Educational Resources Information Center

Withers, Christopher S.; Nadarajah, Saralees

2011-01-01

The linear regression model is one of the most popular, and one of the simplest, models in statistics. It has been applied in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
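In ordinary least squares, the variance of the fitted regression function at a point x0 is σ² x0ᵀ(XᵀX)⁻¹x0. Appending a column z to X raises this quadratic form. Block inversion gives the new value x0ᵀA⁻¹x0 + (w − bᵀA⁻¹x0)²/s, where:

- A = XᵀX and b = Xᵀz,
- s > 0 is the Schur complement,
- w is the new coordinate of the evaluation point.

This is the standard identity, not necessarily the authors' proof. The numerical check below uses a hypothetical design.

```python
import numpy as np


def fitted_value_variance(X, x0, sigma2=1.0):
    """Variance of the OLS fitted regression function at x0: sigma^2 * x0' (X'X)^{-1} x0."""
    return sigma2 * x0 @ np.linalg.solve(X.T @ X, x0)


rng = np.random.default_rng(5)
n = 50
X_small = np.column_stack([np.ones(n), rng.normal(size=n)])
X_big = np.column_stack([X_small, rng.normal(size=n)])   # one additional predictor

ratios = []
for _ in range(1000):
    x0_small = np.array([1.0, rng.normal()])
    x0_big = np.append(x0_small, rng.normal())           # same point plus the new coordinate
    ratios.append(fitted_value_variance(X_big, x0_big)
                  / fitted_value_variance(X_small, x0_small))
print(min(ratios))
```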
Statistical properties of radiation from VUV and X-ray free electron laser

NASA Astrophysics Data System (ADS)

Saldin, E. L.; Schneidmiller, E. A.; Yurkov, M. V.

1998-03-01

The paper presents a comprehensive analysis of the statistical properties of the radiation from a self-amplified spontaneous emission (SASE) free electron laser operating in the linear and nonlinear modes. The investigation has been performed in a one-dimensional approximation, assuming the electron pulse to be much longer than the coherence length of the radiation. The following statistical properties of the SASE FEL radiation have been studied in detail: time and spectral field correlations, distribution of the fluctuations of the instantaneous radiation power, distribution of the energy in the electron bunch, distribution of the radiation energy after the monochromator installed at the FEL amplifier exit, and the radiation spectrum. The linear high-gain limit is studied analytically. It is shown that the radiation from a SASE FEL operating in the linear regime possesses all the features corresponding to completely chaotic polarized radiation. A detailed study of the statistical properties of the radiation from a SASE FEL operating in the linear and nonlinear regimes has been performed by means of time-dependent simulation codes. All numerical results presented in the paper have been calculated for the 70 nm SASE FEL at the TESLA Test Facility, then under construction at DESY.
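Two textbook signatures of completely chaotic polarized radiation are:

- the instantaneous power follows a negative exponential distribution, with normalized variance ⟨(W − ⟨W⟩)²⟩/⟨W⟩² = 1;
- the energy summed over M independent modes follows a gamma distribution, with normalized variance 1/M.

The sketch below reproduces both with a simple random-phasor model, which is an assumption chosen for illustration. It is not the paper's time-dependent FEL simulation, and the emitter and mode counts are hypothetical.

```python
import numpy as np


def chaotic_field_samples(n_samples, n_emitters=100, seed=0):
    """Random-phasor model: each sample sums many unit amplitudes with independent
    uniform phases, which tends to a circular complex Gaussian (chaotic) field."""
    rng = np.random.default_rng(seed)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, n_emitters))
    return np.exp(1j * phases).sum(axis=1) / np.sqrt(n_emitters)


field = chaotic_field_samples(20000)
power = np.abs(field) ** 2
power_nvar = power.var() / power.mean() ** 2      # exponential distribution: about 1

M = 5
energy = power.reshape(-1, M).sum(axis=1)         # sum over M independent "modes"
energy_nvar = energy.var() / energy.mean() ** 2   # gamma distribution: about 1/M
print(power_nvar, energy_nvar)
```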
Reporting quality of statistical methods in surgical observational studies: protocol for systematic review.

PubMed

Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume

2014-06-28

Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting. This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be lower in surgical journals than in medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazards analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. 
A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.
Global strength assessment in oblique waves of a large gas carrier ship, based on a non-linear iterative method

NASA Astrophysics Data System (ADS)

Domnisoru, L.; Modiga, A.; Gasparotti, C.

2016-08-01

At the ship design stage, the first step of the hull structural assessment is the longitudinal strength analysis, using head-wave equivalent loads prescribed by the classification societies' rules. This paper presents an enhancement of the longitudinal strength analysis that considers the general case of oblique quasi-static equivalent waves, based on the authors' own non-linear iterative procedure and in-house program. The numerical approach is developed for mono-hull ships, without restrictions on the non-linearities of the 3D hull offset lines, and involves three interlinked iterative cycles on the floating, pitch and roll trim equilibrium conditions. Besides the ship-wave equilibrium parameters, the wave-induced loads on the ship's girder are obtained. As a numerical case study, we considered a large LPG (liquefied petroleum gas) carrier. The numerical results for the large LPG carrier are compared with the statistical design values from several ship classification societies' rules. This study makes it possible to identify the oblique wave conditions that induce the maximum loads in the large LPG carrier's girder. The numerical results indicate that the non-linear iterative approach is necessary for computing the extreme loads induced by oblique waves, ensuring better accuracy of the large LPG carrier's longitudinal strength assessment.
Generalized Linear Models of Home Activity for Automatic Detection of Mild Cognitive Impairment in Older Adults

PubMed Central

Akl, Ahmad; Snoek, Jasper; Mihailidis, Alex

2015-01-01

With a globally aging population, the burden of care of cognitively impaired older adults is becoming increasingly concerning. Instances of Alzheimer’s disease and other forms of dementia are becoming ever more frequent. Earlier detection of cognitive impairment offers significant benefits, but remains difficult to do in practice. In this paper, we develop statistical models of the behavior of older adults within their homes using sensor data in order to detect the early onset of cognitive decline. Specifically, we use inhomogeneous Poisson processes to model the presence of subjects within different rooms throughout the day in the home using unobtrusive sensing technologies. We compare the distributions learned from cognitively intact and impaired subjects using information-theoretic tools and observe statistical differences between the two populations which we believe can be used to help detect the onset of cognitive decline. PMID:25570050
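The abstract models room presence as an inhomogeneous Poisson process and compares groups with information-theoretic tools. The sketch below makes several assumptions, none of which come from the paper:

- the intensity is piecewise constant by hour of day, and its maximum-likelihood estimate is the count per bin divided by observed time;
- the comparison uses Kullback-Leibler divergence between normalized daily profiles;
- the simulator and all parameter values are hypothetical.

The authors' exact model and divergence may differ.

```python
import numpy as np


def hourly_intensity(event_hours, n_days, n_bins=24):
    """Piecewise-constant MLE of an inhomogeneous Poisson intensity (events per hour).

    event_hours: time of day (0 <= t < 24) of each presence event, pooled over n_days.
    """
    counts, _ = np.histogram(event_hours, bins=n_bins, range=(0.0, 24.0))
    return counts / (n_days * (24.0 / n_bins))


def kl_divergence(p, q, eps=1e-6):
    """KL(p || q) in nats after normalising two intensity profiles to probability vectors."""
    p = (p + eps) / np.sum(p + eps)
    q = (q + eps) / np.sum(q + eps)
    return float(np.sum(p * np.log(p / q)))


def simulate_presence(rng, n_days, events_per_day, peak_hours, spread):
    """Hypothetical presence events: a Poisson number of events whose times of day
    are i.i.d. draws from a mixture of bumps around `peak_hours`."""
    n = rng.poisson(events_per_day * n_days)
    centres = rng.choice(peak_hours, size=n)
    return (centres + rng.normal(0.0, spread, size=n)) % 24.0


rng = np.random.default_rng(6)
n_days = 30
a = simulate_presence(rng, n_days, 40, [8.0, 19.0], 1.0)    # regular routine
b = simulate_presence(rng, n_days, 40, [8.0, 19.0], 1.0)    # same routine, new sample
c = simulate_presence(rng, n_days, 40, [11.0, 23.0], 3.0)   # shifted, more diffuse routine

lam_a, lam_b, lam_c = (hourly_intensity(e, n_days) for e in (a, b, c))
print(kl_divergence(lam_a, lam_b), kl_divergence(lam_a, lam_c))
```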
Generalized Linear Models of home activity for automatic detection of mild cognitive impairment in older adults.

PubMed

Akl, Ahmad; Snoek, Jasper; Mihailidis, Alex

2014-01-01

With a globally aging population, the burden of care of cognitively impaired older adults is becoming increasingly concerning. Instances of Alzheimer's disease and other forms of dementia are becoming ever more frequent. Earlier detection of cognitive impairment offers significant benefits, but remains difficult to do in practice. In this paper, we develop statistical models of the behavior of older adults within their homes using sensor data in order to detect the early onset of cognitive decline. Specifically, we use inhomogeneous Poisson processes to model the presence of subjects within different rooms throughout the day in the home using unobtrusive sensing technologies. We compare the distributions learned from cognitively intact and impaired subjects using information-theoretic tools and observe statistical differences between the two populations which we believe can be used to help detect the onset of cognitive decline.
Health tourism on the rise? Evidence from the Balance of Payments Statistics.

PubMed

Loh, Chung-Ping A

2014-09-01

The study assesses the presence and magnitude of global trends in health tourism using health-related travel (HRT) spending reported in the International Monetary Fund's Balance of Payments Statistics database. Linear regression and quantile regression are applied to estimate secular trends of the import and export of HRT based on a sample of countries from 2003 to 2009. The results show that from 2003 to 2009 the import and export of health tourism rose among countries with a high volume of such activities (accounting for the upper 40% of the countries), but not among those with a low volume. The uneven growth in health tourism has generated greater contrast between countries with high and low volumes of health tourism activities. However, the growth in the total import of health tourism did not outpace the population growth, implying that in general the population's tendency to engage in health tourism remained static.
Sample size in psychological research over the past 30 years.

PubMed

Marszalek, Jacob M; Barber, Carolyn; Kohlhart, Julie; Holmes, Cooper B

2011-04-01

The American Psychological Association (APA) Task Force on Statistical Inference was formed in 1996 in response to a growing body of research demonstrating methodological issues that threatened the credibility of psychological research, and made recommendations to address them. One issue was the small, even dramatically inadequate, size of samples used in studies published by leading journals. The present study assessed the progress made since the Task Force's final report in 1999. Sample sizes reported in four leading APA journals in 1955, 1977, 1995, and 2006 were compared using nonparametric statistics, while data from the last two waves were fit to a hierarchical generalized linear growth model for more in-depth analysis. Overall, results indicate that the recommendations for increasing sample sizes have not been integrated in core psychological research, although results slightly vary by field. This and other implications are discussed in the context of current methodological critique and practice.
Information transport in classical statistical systems

NASA Astrophysics Data System (ADS)

Wetterich, C.

2018-02-01

For "static memory materials" the bulk properties depend on boundary conditions. Such materials can be realized by classical statistical systems which admit no unique equilibrium state. We describe the propagation of information from the boundary to the bulk by classical wave functions. The dependence of wave functions on the location of hypersurfaces in the bulk is governed by a linear evolution equation that can be viewed as a generalized Schrödinger equation. Classical wave functions obey the superposition principle, with local probabilities realized as bilinears of wave functions. For static memory materials the evolution within a subsector is unitary, as is characteristic of time evolution in quantum mechanics. The space-dependence in static memory materials can be used as an analogue representation of the time evolution in quantum mechanics; such materials are "quantum simulators". For example, an asymmetric Ising model on a Euclidean two-dimensional lattice represents the time evolution of free relativistic fermions in two-dimensional Minkowski space.
Nonlinear GARCH model and 1/f noise

NASA Astrophysics Data System (ADS)

Kononovicius, A.; Ruseckas, J.

2015-06-01

Autoregressive conditionally heteroskedastic (ARCH) family models are still used by practitioners in business and economic policy making as conditional volatility forecasting models. Furthermore, ARCH models continue to attract the interest of researchers. In this contribution we consider the well-known GARCH(1,1) process and its nonlinear modifications, reminiscent of the NGARCH model. We investigate the possibility of reproducing power-law statistics, namely the probability density function and the power spectral density, using ARCH family models. For this purpose we derive stochastic differential equations from the GARCH processes under consideration. We find the obtained equations to be similar to a general class of stochastic differential equations known to reproduce power-law statistics. We show that the linear GARCH(1,1) process has a power-law distribution, but its power spectral density is Brownian-noise-like. However, the nonlinear modifications exhibit both a power-law distribution and a power spectral density of the 1/f^β form, including 1/f noise.
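The linear GARCH(1,1) process referred to above evolves as σ²ₜ = ω + α ε²ₜ₋₁ + β σ²ₜ₋₁, with εₜ = σₜ zₜ. The minimal simulation below assumes standard normal innovations and illustrative parameter values. It checks two textbook properties: the unconditional variance ω/(1 − α − β), and excess kurtosis (heavy tails). The paper's nonlinear modifications and its spectral analysis are not reproduced.

```python
import numpy as np


def simulate_garch11(n, omega, alpha, beta, seed=0, burn_in=1000):
    """Simulate a GARCH(1,1) process with standard normal innovations."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n + burn_in)
    eps = np.empty(n + burn_in)
    sigma2 = omega / (1.0 - alpha - beta)        # start at the unconditional variance
    for t in range(n + burn_in):
        eps[t] = np.sqrt(sigma2) * z[t]
        sigma2 = omega + alpha * eps[t] ** 2 + beta * sigma2   # conditional variance for t+1
    return eps[burn_in:]


returns = simulate_garch11(200_000, omega=0.05, alpha=0.1, beta=0.85)
# Population variance: omega / (1 - alpha - beta) = 1.0
# Population kurtosis: 3 * (1 - 0.95**2) / (1 - 0.95**2 - 2 * 0.1**2), about 3.77
kurt = np.mean(returns ** 4) / returns.var() ** 2
print(returns.var(), kurt)
```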
THE DISTRIBUTION OF COOK’S D STATISTIC

PubMed Central

Muller, Keith E.; Mok, Mario Chen

2013-01-01

Cook (1977) proposed a diagnostic to quantify the impact of deleting an observation on the estimated regression coefficients of a General Linear Univariate Model (GLUM). Simulations of models with Gaussian response and predictors demonstrate that his suggestion of comparing the diagnostic to the median of the F distribution for the overall regression test captures an erratically varying proportion of the values. We describe the exact distribution of Cook’s statistic for a GLUM with Gaussian predictors and response. We also present computational forms, simple approximations, and asymptotic results. A simulation supports the accuracy of the results. The methods allow accurate evaluation of a single value or the maximum value from a regression analysis. The approximations work well for a single value, but less well for the maximum. In contrast, the cut-point suggested by Cook provides widely varying tail probabilities. As with all diagnostics, the data analyst must use scientific judgment in deciding how to treat highlighted observations. PMID:24363487
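Cook's D can be computed without refitting the model, using the closed form below, where:

- eᵢ is the residual and hᵢᵢ the leverage of observation i;
- p is the number of coefficients;
- s² is the residual mean square.

Dᵢ = eᵢ² hᵢᵢ / (p s² (1 − hᵢᵢ)²)

The sketch checks this against the deletion definition, (b − b₍ᵢ₎)ᵀXᵀX(b − b₍ᵢ₎)/(p s²). The data are simulated and the function names are hypothetical. It does not implement the exact distribution derived in the paper.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's D for every observation of an OLS fit with design X (n x p)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # leverages
    return resid ** 2 / (p * s2) * h / (1.0 - h) ** 2


def cooks_distance_by_deletion(X, y, i):
    """Same quantity from its definition: refit without observation i."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    d = beta - beta_i
    return d @ X.T @ X @ d / (p * s2)


rng = np.random.default_rng(7)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
y[0] += 8.0                                   # plant one aberrant response
D = cooks_distance(X, y)
print(np.argsort(D)[::-1][:3])                # observations with the largest influence
```

Cook's rule of thumb would compare each Dᵢ with the median of F(p, n − p), available in SciPy as `scipy.stats.f.median(p, n - p)`. The abstract explains why that cut-point gives erratic tail probabilities.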
Simultaneous fitting of genomic-BLUP and Bayes-C components in a genomic prediction model.

PubMed

Iheshiulor, Oscar O M; Woolliams, John A; Svendsen, Morten; Solberg, Trygve; Meuwissen, Theo H E

2017-08-24

The rapid adoption of genomic selection is due to two key factors: the availability of both high-throughput dense genotyping and statistical methods to estimate and predict breeding values. The development of such methods is still ongoing and, so far, there is no consensus on the best approach. Currently, the linear and non-linear methods for genomic prediction (GP) are treated as distinct approaches. The aim of this study was to evaluate the implementation of an iterative method (called GBC) that incorporates aspects of both linear [genomic-best linear unbiased prediction (G-BLUP)] and non-linear (Bayes-C) methods for GP. The iterative nature of GBC makes it less computationally demanding, similar to other non-Markov chain Monte Carlo (MCMC) approaches. However, as a Bayesian method, GBC differs from both MCMC- and non-MCMC-based methods by combining some aspects of G-BLUP and Bayes-C methods for GP. Its relative performance was compared with those of G-BLUP and Bayes-C. We used an imputed 50 K single-nucleotide polymorphism (SNP) dataset based on the Illumina Bovine50K BeadChip, which included 48,249 SNPs and 3244 records. Daughter yield deviations for somatic cell count, fat yield, milk yield, and protein yield were used as response variables. GBC was frequently (marginally) superior to G-BLUP and Bayes-C in terms of prediction accuracy and was significantly better than G-BLUP only for fat yield. On average across the four traits, GBC yielded increases in prediction accuracy of 0.009 and 0.006 over G-BLUP and Bayes-C, respectively. Computationally, GBC was much faster than Bayes-C and similar to G-BLUP. Our results show that incorporating some aspects of G-BLUP and Bayes-C in a single model can improve the accuracy of GP over the commonly used method, G-BLUP. In general, GBC did not perform statistically better than G-BLUP and Bayes-C, probably because of the close relationships between reference and validation individuals. 
Nevertheless, it is a flexible tool in the sense that it simultaneously incorporates some aspects of linear and non-linear models for GP, thereby exploiting family relationships while also accounting for linkage disequilibrium between SNPs and genes with large effects. The application of GBC in GP merits further exploration.
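The linear component mentioned above, G-BLUP, has a well-known equivalent form when every SNP effect shares one variance: ridge regression on genotypes (SNP-BLUP). The push-through identity Z(ZᵀZ + λI)⁻¹Zᵀ = ZZᵀ(ZZᵀ + λI)⁻¹ makes the two predictions identical. The sketch assumes:

- centred genotypes and no fixed effects;
- an unscaled relationship matrix G = ZZᵀ;
- a hypothetical known variance ratio λ.

Bayes-C and GBC themselves are not shown, and the genotypes are simulated.

```python
import numpy as np


def snp_blup(Z, y, lam):
    """Ridge/SNP-BLUP: solve (Z'Z + lam I) a = Z'y; return genomic values Z a."""
    a = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
    return Z @ a


def g_blup(Z, y, lam):
    """G-BLUP with G = Z Z': genomic values g = G (G + lam I)^{-1} y."""
    G = Z @ Z.T
    return G @ np.linalg.solve(G + lam * np.eye(G.shape[0]), y)


rng = np.random.default_rng(8)
n_animals, n_snps = 60, 200
freq = rng.uniform(0.1, 0.9, n_snps)
geno = rng.binomial(2, freq, size=(n_animals, n_snps)).astype(float)  # 0/1/2 allele counts
Z = geno - geno.mean(axis=0)                  # centred genotypes
effects = np.zeros(n_snps)
effects[:10] = rng.normal(scale=0.5, size=10) # a few SNPs with effects
y = Z @ effects + rng.normal(size=n_animals)
y -= y.mean()

lam = 50.0                                    # hypothetical residual-to-SNP variance ratio
g_hat = g_blup(Z, y, lam)
s_hat = snp_blup(Z, y, lam)
print(np.max(np.abs(g_hat - s_hat)))
```

The G-BLUP form solves an n × n system and SNP-BLUP an m × m one. That is why G-BLUP is usually the cheaper of the two when SNPs far outnumber animals.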
The Statistics of Visual Representation

NASA Technical Reports Server (NTRS)

Jobson, Daniel J.; Rahman, Zia-Ur; Woodell, Glenn A.

2002-01-01

The experience of retinex image processing has prompted us to reconsider fundamental aspects of imaging and image processing. Foremost is the idea that a good visual representation requires a non-linear transformation of the recorded (approximately linear) image data. Further, this transformation appears to converge on a specific distribution. Here we investigate the connection between numerical and visual phenomena. Specifically, the questions explored are: (1) Is there a well-defined, consistent statistical character associated with good visual representations? (2) Does an ideal visual image exist? (3) If so, what are its statistical properties?
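The non-linear transformation discussed above can be illustrated with the textbook single-scale retinex, R = log I − log(F ∗ I), where F is a Gaussian surround. The multiscale variants average this output over several surround scales. The sketch below is this generic form, not the authors' processing chain. The scale σ and the synthetic image are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflected borders (output has the input's shape)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def single_scale_retinex(img, sigma=15.0, eps=1e-6):
    """log(image) - log(Gaussian surround): a non-linear, locally normalising transform."""
    img = img.astype(float) + eps
    return np.log(img) - np.log(gaussian_blur(img, sigma))


rng = np.random.default_rng(9)
scene = rng.uniform(0.1, 1.0, size=(64, 64))
dim = 0.2 * scene                             # same scene under a uniformly dimmer illuminant
r_scene = single_scale_retinex(scene, sigma=5.0, eps=0.0)
r_dim = single_scale_retinex(dim, sigma=5.0, eps=0.0)
print(np.max(np.abs(r_scene - r_dim)))
```

Because the surround is a normalized linear filter, a global change of illumination cancels in the log ratio. This is one reason retinex outputs are less sensitive to overall exposure.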
OPR-PPR, a Computer Program for Assessing Data Importance to Model Predictions Using Linear Statistics

USGS Publications Warehouse

Tonkin, Matthew J.; Tiedeman, Claire; Ely, D. Matthew; Hill, Mary C.

2007-01-01

The OPR-PPR program calculates the Observation-Prediction (OPR) and Parameter-Prediction (PPR) statistics that can be used to evaluate the relative importance of various kinds of data to simulated predictions. The data considered fall into three categories: (1) existing observations, (2) potential observations, and (3) potential information about parameters. The first two are addressed by the OPR statistic; the third is addressed by the PPR statistic. The statistics are based on linear theory and measure the leverage of the data, which depends on the location, the type, and possibly the time of the data being considered. For example, in a ground-water system the type of data might be a head measurement at a particular location and time. As a measure of leverage, the statistics do not take into account the value of the measurement. As linear measures, the OPR and PPR statistics require minimal computational effort once sensitivities have been calculated. Sensitivities need to be calculated for only one set of parameter values; commonly these are the values estimated through model calibration. OPR-PPR can calculate the OPR and PPR statistics for any mathematical model that produces the necessary OPR-PPR input files. In this report, OPR-PPR capabilities are presented in the context of using the ground-water model MODFLOW-2000 and the universal inverse program UCODE_2005. The method used to calculate the OPR and PPR statistics is based on the linear equation for prediction standard deviation. 
Using sensitivities and other information, OPR-PPR calculates (a) the percent increase in the prediction standard deviation that results when one or more existing observations are omitted from the calibration data set; (b) the percent decrease in the prediction standard deviation that results when one or more potential observations are added to the calibration data set; or (c) the percent decrease in the prediction standard deviation that results when potential information on one or more parameters is added.
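The OPR calculation above rests on a linear prediction standard deviation. The simplified numpy sketch below uses s_z = √(s² zᵀ(XᵀWX)⁻¹z), where:

- X holds the observation sensitivities to the parameters;
- W is a diagonal matrix of observation weights;
- z holds the prediction sensitivities.

It assumes s² is held fixed when observations are added or removed, so it cancels in the percent change. It is not the OPR-PPR program or its input-file format, and all inputs are hypothetical. Information on a parameter could, under the same assumption, be added as an extra weighted row with a 1 in that parameter's column.

```python
import numpy as np


def prediction_sd(X, w, z, s2=1.0):
    """Linear prediction standard deviation: sqrt(s2 * z' (X' W X)^{-1} z)."""
    XtWX = X.T @ (w[:, None] * X)
    return np.sqrt(s2 * z @ np.linalg.solve(XtWX, z))


def opr_omit(X, w, z, omit):
    """Percent increase in prediction SD when existing observations `omit` are removed."""
    base = prediction_sd(X, w, z)
    keep = np.setdiff1d(np.arange(X.shape[0]), omit)
    return 100.0 * (prediction_sd(X[keep], w[keep], z) - base) / base


def opr_add(X, w, z, X_new, w_new):
    """Percent decrease in prediction SD when potential observations are added."""
    base = prediction_sd(X, w, z)
    augmented = prediction_sd(np.vstack([X, X_new]), np.concatenate([w, w_new]), z)
    return 100.0 * (base - augmented) / base


rng = np.random.default_rng(10)
X = rng.normal(size=(20, 3))      # hypothetical sensitivities: 20 observations, 3 parameters
w = np.ones(20)                   # hypothetical observation weights
z = np.array([0.5, 1.0, -0.2])    # hypothetical prediction sensitivities

omit_scores = np.array([opr_omit(X, w, z, [i]) for i in range(20)])
print(np.argsort(omit_scores)[::-1][:5])      # most important existing observations
gain = opr_add(X, w, z, rng.normal(size=(2, 3)), np.ones(2))
print(gain)
```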
Application of the Conway-Maxwell-Poisson generalized linear model for analyzing motor vehicle crashes.

PubMed

Lord, Dominique; Guikema, Seth D; Geedipally, Srinivas Reddy

2008-05-01

This paper documents the application of the Conway-Maxwell-Poisson (COM-Poisson) generalized linear model (GLM) for modeling motor vehicle crashes. The COM-Poisson distribution, originally developed in 1962, has recently been re-introduced by statisticians for analyzing count data subject to over- and under-dispersion. This innovative distribution is an extension of the Poisson distribution. The objectives of this study were to evaluate the application of the COM-Poisson GLM for analyzing motor vehicle crashes and to compare the results with the traditional negative binomial (NB) model. The comparison was carried out using the most common functional forms employed by transportation safety analysts, which link crashes to the entering flows at intersections or on segments. To accomplish the objectives of the study, several NB and COM-Poisson GLMs were developed and compared using two datasets. The first dataset contained crash data collected at signalized four-legged intersections in Toronto, Ont. The second dataset included data collected for rural four-lane divided and undivided highways in Texas. Several methods were used to assess the statistical fit and predictive performance of the models. The results of this study show that COM-Poisson GLMs perform as well as NB models in terms of goodness-of-fit (GOF) statistics and predictive performance. The COM-Poisson distribution can also handle under-dispersed data, which have sometimes been observed in crash databases and which the NB distribution cannot handle or has difficulty converging on. Given this, the COM-Poisson GLM offers a better alternative to the NB model for modeling motor vehicle crashes, especially given the important limitations of the NB model recently documented in the safety literature.
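The COM-Poisson probability mass function is P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)), where Z(λ, ν) = Σⱼ λ^j / (j!)^ν. The dispersion parameter behaves as follows:

- ν = 1 recovers the Poisson distribution;
- ν > 1 gives under-dispersion;
- ν < 1 gives over-dispersion.

The sketch below evaluates the pmf on a truncated support in log space. The truncation point and parameter values are assumptions, and the GLM step that links λ to covariates such as traffic flows is not shown.

```python
import math

import numpy as np


def com_poisson_pmf(lam, nu, y_max=200):
    """COM-Poisson probabilities P(Y = 0..y_max), normalised over the truncated support."""
    y = np.arange(y_max + 1)
    log_fact = np.array([math.lgamma(k + 1.0) for k in y])
    log_w = y * np.log(lam) - nu * log_fact
    log_w -= log_w.max()                      # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()


def mean_var(p):
    y = np.arange(len(p))
    m = p @ y
    return m, p @ (y - m) ** 2


p_poisson_like = com_poisson_pmf(4.0, 1.0)    # nu = 1: ordinary Poisson(4)
p_under = com_poisson_pmf(4.0, 2.0)           # nu > 1: variance below the mean
p_over = com_poisson_pmf(4.0, 0.5)            # nu < 1: variance above the mean
for p in (p_poisson_like, p_under, p_over):
    print(mean_var(p))
```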
Influenza forecasting with Google Flu Trends.

PubMed

Dugas, Andrea Freyer; Jalalpour, Mehdi; Gel, Yulia; Levin, Scott; Torcaso, Fred; Igusa, Takeru; Rothman, Richard E

2013-01-01

We developed a practical influenza forecast model based on real-time, geographically focused, and easy-to-access data, designed to provide individual medical centers with advance warning of the expected number of influenza cases, thus allowing sufficient time to implement interventions. Second, we evaluated the effects on forecast accuracy of incorporating a real-time influenza surveillance system, Google Flu Trends, and meteorological and temporal information. Forecast models designed to predict one week in advance were developed from weekly counts of confirmed influenza cases over seven seasons (2004-2011) divided into seven training and out-of-sample verification sets. Forecasting procedures using classical Box-Jenkins, generalized linear models (GLM), and generalized linear autoregressive moving average (GARMA) methods were employed to develop the final model and assess the relative contribution of external variables such as Google Flu Trends, meteorological data, and temporal information. A GARMA(3,0) forecast model with a Negative Binomial distribution integrating Google Flu Trends information provided the most accurate influenza case predictions. On average, during 7 out-of-sample outbreaks the model predicts weekly influenza cases to within 7 cases for 83% of estimates. Google Flu Trends data were the only source of external information to provide statistically significant forecast improvements over the base model in four of the seven out-of-sample verification sets. Overall, the p-value of adding this external information to the model is 0.0005. The other exogenous variables did not yield a statistically significant improvement in any of the verification sets. Integer-valued autoregression of influenza cases provides a strong base forecast model, which is enhanced by the addition of Google Flu Trends, confirming the predictive capabilities of search-query-based syndromic surveillance. 
This accessible and flexible forecast model can be used by individual medical centers to provide advance warning of future influenza cases.

Statistical Modeling of Fire Occurrence Using Data from the Tōhoku, Japan Earthquake and Tsunami.

PubMed

Anderson, Dana; Davidson, Rachel A; Himoto, Keisuke; Scawthorn, Charles

2016-02-01

In this article, we develop statistical models to predict the number and geographic distribution of fires caused by earthquake ground motion and tsunami inundation in Japan. Using new, uniquely large, and consistent data sets from the 2011 Tōhoku earthquake and tsunami, we fitted three types of models: generalized linear models (GLMs), generalized additive models (GAMs), and boosted regression trees (BRTs). This is the first time the latter two have been used in this application. A simple conceptual framework guided identification of candidate covariates. Models were then compared based on their out-of-sample predictive power, goodness of fit to the data, ease of implementation, and relative importance of the framework concepts. For the ground motion data set, we recommend a Poisson GAM; for the tsunami data set, a negative binomial (NB) GLM or NB GAM. The best models predict the total number of ignitions in the region out of sample to within one or two ignitions. Prefecture-level prediction errors average approximately three ignitions. All models demonstrate predictive power far superior to that of four models from the literature that were also tested. A nonlinear relationship is apparent between ignitions and ground motion, so for GLMs, which assume a linear response-covariate relationship, instrumental intensity was the preferred ground motion covariate because it captures part of that nonlinearity. Measures of commercial exposure were preferred over measures of residential exposure for both ground motion and tsunami ignition models. This may vary in other regions, but nevertheless highlights the value of testing alternative measures for each concept. Models with the best predictive power included two or three covariates. © 2015 Society for Risk Analysis.
Outcomes associated with preoperative weight loss after laparoscopic Roux-en-Y gastric bypass

PubMed Central

Blackledge, Camille; Graham, Laura A.; Gullick, Allison A.; Richman, Joshua; Stahl, Richard; Grams, Jayleen

2016-01-01

Background Laparoscopic Roux-en-Y gastric bypass (LRYGB) is an effective treatment for achieving and maintaining weight loss and for improving obesity-related comorbidities. As part of the approval process for bariatric surgery, many insurance companies require patients to have documented recent participation in a supervised weight loss program. The goal of this study was to evaluate the relationship of preoperative weight changes with outcomes following LRYGB. Methods A retrospective review was conducted of adult patients undergoing LRYGB between 2008 and 2012 at a single institution. Patients were stratified into quartiles based on % excess weight gain (0–4.99 % and ≥5 % EWG) and % excess weight loss (0–4.99 % and ≥5 % EWL). Generalized linear models were used to examine differences in postoperative weight outcomes at 6, 12, and 24 months. Covariates included in the final adjusted models were determined using backwards stepwise selection. Results Among the 300 patients included in the study, the quartiles did not differ significantly in demographics. However, time to operation was longer for patients who gained or lost ≥5 % excess body weight (p < 0.001). Although the difference in postoperative complications was not statistically significant, the complication rate was higher in patients with ≥5 % EWG than in those with ≥5 % EWL (12.5 vs. 4.8 %, respectively; p = 0.29). Unadjusted and adjusted generalized linear models showed no statistically significant association between preoperative % excess weight change and weight loss outcomes at 24 months. Conclusion Patients with the greatest % preoperative excess weight change had the longest intervals from initial visit to operation. No significant differences were seen in perioperative and postoperative outcomes. This study suggests preoperative weight loss requirements may delay the time to operation without improving postoperative outcomes or weight loss. PMID:26969666
Assessing the role of pavement macrotexture in preventing crashes on highways.

PubMed

Pulugurtha, Srinivas S; Kusam, Prasanna R; Patel, Kuvleshay J

2010-02-01

The objective of this article is to assess the role of pavement macrotexture in preventing crashes on highways in the State of North Carolina. Laser profilometer data obtained from the North Carolina Department of Transportation (NCDOT) for highways comprising four corridors were processed to calculate pavement macrotexture for 100-m (approximately 330-ft) sections according to American Society for Testing and Materials (ASTM) standards. Crash data collected over the same lengths of the corridors were integrated with the calculated pavement macrotexture for each section. Scatterplots were generated to assess the relationship between pavement macrotexture and both crashes and the logarithm of crashes. Regression analyses were then conducted with predictor variables such as million vehicle miles of travel (a function of traffic volume and length), the number of interchanges, the number of at-grade intersections, the number of grade-separated interchanges, and the number of bridges, culverts, and overhead signs, in addition to pavement macrotexture, to assess the statistical significance of the relationship between pavement macrotexture and crashes (both linear and log-linear) relative to the other predictor variables. The scatterplots and regression analyses indicate a more statistically significant relationship between pavement macrotexture and the logarithm of crashes than between pavement macrotexture and crashes. The coefficient for pavement macrotexture is generally negative, indicating that the number of crashes, or its logarithm, decreases as macrotexture increases. The relationship between pavement macrotexture and the logarithm of crashes is generally stronger than that between most other predictor variables and crashes or the logarithm of crashes. Based on these results, maintaining pavement macrotexture at or above a threshold of 1.524 mm (0.06 in.) could possibly reduce crashes and provide safer transportation for road users on highways.
OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

PubMed

Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

2012-01-01

The objective of the present study was to assess the comparative applicability of the orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in investigating the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used for clinical evaluation once in the first week of admission and again six months later. All data were first analyzed using simple linear regression and then considered for multivariate analysis using PLS/OPLS models in the SIMCA P+12 statistical software package. The TCD predictors of stroke prognosis identified by linear regression were confirmed by the OPLS modeling technique. Moreover, compared with linear regression, the OPLS model appeared to have higher sensitivity in detecting predictors of ischemic stroke prognosis and detected several additional predictors. Applying the OPLS model made it possible to use single TCD measures/indicators, arbitrarily dichotomized measures of TCD single-vessel involvement, and the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary to, rather than replacements for, classical regression models such as linear regression.
Correlation Between C-reactive Protein and Non-enzymatic Antioxidants (Albumin, Ferritin, Uric Acid and Bilirubin) in Hemodialysis Patients.

PubMed

Beciragic, Amela; Resic, Halima; Prohic, Nejra; Karamehic, Jasenko; Smajlovic, Ajdin; Masnic, Fahrudin; Ajanovic, Selma; Coric, Aida

2015-04-01

Increased levels of C-reactive protein (CRP) are found in 30-60% of hemodialysis patients and are closely associated with the progression of atherosclerosis, cardiovascular morbidity and mortality. Non-enzymatic antioxidants are antioxidants that primarily retain potentially dangerous iron and copper ions in their inactive form and thereby prevent their participation in the production of free radicals. The aim of the study was to examine the relationship between CRP and non-enzymatic antioxidants (albumin, ferritin, uric acid and bilirubin), i.e. to examine the importance of CRP as a serum biomarker in assessing inflammation and its relationship to antioxidant protection in patients on hemodialysis. The study was cross-sectional, clinical, comparative and descriptive. The study involved 100 non-diabetic patients on chronic hemodialysis. The control group consisted of 50 subjects without subjective or objective indicators of chronic renal disease. In all patients, the concentrations of CRP and of the non-enzymatic antioxidants were determined. In the group of hemodialysis patients, 60% were men and 40% women. The average age of hemodialysis patients was 54.13 ± 11.8 years and the average age of the control group 41.72 ± 9.8 years. The average duration of hemodialysis treatment was 91.42 ± 76.2 months. In the group of hemodialysis patients, a statistically significant negative linear correlation was found between serum CRP and albumin concentration (rho = -0.251, p = 0.012), as well as a negative, statistically non-significant linear correlation between serum CRP and uric acid concentration (r = -0.077, p = 0.448). Furthermore, positive linear correlations were found between serum CRP and ferritin (r = 0.159, p = 0.114) and between CRP and total serum bilirubin (r = 0.121, p = 0.230).
In the control group, statistically significant positive linear correlations were found between serum CRP and uric acid concentration (rho = 0.438, p = 0.001) and between serum CRP and total serum bilirubin (rho = 0.510, p = 0.0001). Statistically significant negative linear correlations were found between CRP and albumin concentration (rho = -0.393, p = 0.005) and between serum CRP and ferritin (rho = -0.391, p = 0.005). An elevated CRP level is a strong and independent predictor of low serum albumin, which indicates that hypoalbuminemia in hemodialysis patients could be due more to inflammation than to malnutrition. In hemodialysis patients there was no statistically significant correlation between CRP and the other non-enzymatic antioxidants (uric acid, ferritin, bilirubin), which shows that indicators of antioxidant defense in hemodialysis patients must be measured individually to determine their actual stores and activity.
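The abstract reports Spearman's rho for some pairs and Pearson's r for others. As a reminder of how the two coefficients differ, here is a minimal sketch (not the authors' analysis): Spearman's rho is Pearson's r computed on ranks, with tied values given their average rank. The CRP and albumin numbers below are made up for illustration, and no p-values are computed.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical values in which albumin falls in rank exactly as CRP rises.
crp = [2.1, 5.4, 0.8, 12.3, 3.3]
albumin = [41, 38, 44, 33, 40]
print(spearman_rho(crp, albumin))  # -1.0: perfectly monotone decreasing
print(pearson_r(crp, albumin))     # magnitude below 1: the relation is not linear
```

Because rho only uses ranks, it captures any monotone association, whereas r measures straight-line association; a rank-based coefficient is the usual choice when the data are skewed, as CRP values often are.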
Do quality indicators for general practice teaching practices predict good outcomes for students?

PubMed

Bartlett, Maggie; Potts, Jessica; McKinley, Bob

2016-07-01

Keele medical students spend 113 days in general practices over our five-year programme. We collect practice data thought to indicate good-quality teaching. We explored the relationships between these data and two outcomes for students: Objective Structured Clinical Examination (OSCE) scores and feedback regarding the placements. Though both are surrogate markers of good teaching, they are widely used. We collated practice and outcome data for one academic year. Two separate statistical analyses were carried out: (1) to determine how much of the variation in OSCE scores was due to the effect of the practice and how much to the individual student; (2) to identify practice characteristics related to student feedback scores. (1) OSCE performance (268 students in 90 practices): six quality indicators independently influenced the OSCE score, though without linear relationships and not to statistical significance. (2) Student satisfaction (144 students in 69 practices): student feedback scores were not influenced by practice characteristics. The relationships between the quality indicators we collect for practices and outcomes for students are not clear. It may be that neither the quality indicators nor the outcome measures are reliable enough to inform decisions about practices' suitability for teaching.
Item Purification in Differential Item Functioning Using Generalized Linear Mixed Models

ERIC Educational Resources Information Center

Liu, Qian

2011-01-01

For this dissertation, four item purification procedures were implemented within the generalized linear mixed model for differential item functioning (DIF) analysis, and the performance of these item purification procedures was investigated through a series of simulations. Among the four procedures, forward and generalized linear mixed model (GLMM)…
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment

ERIC Educational Resources Information Center

Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos

2013-01-01

In order to interpret laboratory experimental data, undergraduate students are accustomed to performing linear regression on linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show the students…
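The abstract is truncated before it names a model, so the following is only an illustration of the general point, using a hypothetical exponential decay y = a·exp(-b·x). The linearized fit regresses ln y on x, which minimizes squared error in log space; the nonlinear fit minimizes squared error in y itself. For the nonlinear fit, this sketch profiles out a analytically (for fixed b the best a has a closed form) and runs a golden-section search over b, assuming the profiled error is unimodal on the search interval. The function names and data are illustrative only.

```python
import math


def fit_linearized(x, y):
    """Fit y = a*exp(-b*x) by ordinary least squares on ln(y) = ln(a) - b*x.

    Requires every y > 0. Minimizes squared error in log space, not in y.
    """
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, mly = sum(x) / n, sum(ly) / n
    slope = sum((xi - mx) * (li - mly) for xi, li in zip(x, ly)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return math.exp(mly - slope * mx), -slope


def best_a(x, y, b):
    """For fixed b, the least-squares amplitude a has a closed form."""
    e = [math.exp(-b * xi) for xi in x]
    return sum(yi * ei for yi, ei in zip(y, e)) / sum(ei * ei for ei in e)


def sse(x, y, a, b):
    """Sum of squared residuals in the original (untransformed) scale."""
    return sum((yi - a * math.exp(-b * xi)) ** 2 for xi, yi in zip(x, y))


def fit_nonlinear(x, y, b_lo=0.0, b_hi=5.0, tol=1e-10):
    """Least squares in the original scale via golden-section search over b,
    with a profiled out. Assumes the profiled SSE is unimodal on [b_lo, b_hi]."""
    g = (math.sqrt(5) - 1) / 2

    def f(b):
        return sse(x, y, best_a(x, y, b), b)

    lo, hi = b_lo, b_hi
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:  # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - g * (hi - lo)
            fc = f(c)
        else:  # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + g * (hi - lo)
            fd = f(d)
    b = (lo + hi) / 2
    return best_a(x, y, b), b


x = [0, 1, 2, 3, 4, 5]
noise = [0.05, -0.04, 0.03, -0.02, 0.02, -0.01]  # made-up additive errors
y = [3.0 * math.exp(-0.7 * xi) + e for xi, e in zip(x, noise)]
a_lin, b_lin = fit_linearized(x, y)
a_nl, b_nl = fit_nonlinear(x, y)
print(a_lin, b_lin, sse(x, y, a_lin, b_lin))
print(a_nl, b_nl, sse(x, y, a_nl, b_nl))  # SSE in y is never larger than the linearized fit's
```

With additive errors of constant size, taking logarithms inflates the relative error of the small late readings, so the linearized fit weights them more heavily than least squares in y would; this is one common source of the bias the abstract refers to.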
Introducing Linear Functions: An Alternative Statistical Approach

ERIC Educational Resources Information Center

Nolan, Caroline; Herbert, Sandra

2015-01-01

The introduction of linear functions is the turning point where many students decide if mathematics is useful or not. This means the role of parameters and variables in linear functions could be considered to be "threshold concepts". There is recognition that linear functions can be taught in context through the exploration of linear…
Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

NASA Technical Reports Server (NTRS)

Stolzer, Alan J.; Halford, Carl

2007-01-01

In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general, data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for further examination of large FOQA databases for operational and safety improvements.
A statistical study of decaying kink oscillations detected using SDO/AIA

NASA Astrophysics Data System (ADS)

Goddard, C. R.; Nisticò, G.; Nakariakov, V. M.; Zimovets, I. V.

2016-01-01

Context. Despite intensive studies of kink oscillations of coronal loops in the last decade, a large-scale statistically significant investigation of the oscillation parameters has not been made using data from the Solar Dynamics Observatory (SDO). Aims: We carry out a statistical study of kink oscillations using extreme ultraviolet imaging data from a previously compiled catalogue. Methods: We analysed 58 kink oscillation events observed by the Atmospheric Imaging Assembly (AIA) on board SDO during its first four years of operation (2010-2014). Parameters of the oscillations, including the initial apparent amplitude, period, length of the oscillating loop, and damping, are studied for 120 individual loop oscillations. Results: Analysis of the initial loop displacement and oscillation amplitude leads to the conclusion that the initial loop displacement generally determines the initial oscillation amplitude. The period is found to scale with the loop length, and a linear fit of the data cloud gives a kink speed of Ck = (1330 ± 50) km s⁻¹. The main body of the data corresponds to kink speeds in the range Ck = (800-3300) km s⁻¹. Measurements of 52 exponential damping times were made, and it was noted that at least 21 of the damping profiles may be better approximated by a combination of non-exponential and exponential profiles than by a purely exponential damping envelope. There are nine additional cases where the profile appears to be purely non-exponential and no damping time was measured. A scaling of the exponential damping time with the period is found, consistent with the previously established linear scaling between these two parameters.
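For the fundamental standing kink mode of a loop of length L, the period is P = 2L/Ck, so a kink speed follows from the slope of P against L. The sketch below assumes a least-squares line forced through the origin; the abstract does not say whether the published fit included an intercept, and the function names, units (Mm for length, s for period) and numbers are illustrative assumptions.

```python
def kink_speed_km_s(lengths_mm, periods_s):
    """Estimate Ck from P = (2 / Ck) * L by least squares through the origin.

    lengths_mm: loop lengths in megametres (Mm); periods_s: periods in seconds.
    Returns the kink speed in km/s.
    """
    slope = sum(l * p for l, p in zip(lengths_mm, periods_s)) / sum(
        l * l for l in lengths_mm
    )  # seconds per Mm
    return 2.0 / slope * 1000.0  # Mm/s -> km/s (1 Mm = 1000 km)


def individual_speeds_km_s(lengths_mm, periods_s):
    """Per-event kink speeds Ck = 2L / P, in km/s."""
    return [2.0 * l / p * 1000.0 for l, p in zip(lengths_mm, periods_s)]


# Hypothetical loop lengths (Mm) and periods (s), not catalogue values.
L = [150.0, 250.0, 400.0]
P = [225.0, 376.0, 602.0]
print(kink_speed_km_s(L, P))
print(individual_speeds_km_s(L, P))
```

The per-event speeds are the quantity whose spread the abstract summarizes as the range of the main body of the data, while the single fitted slope gives one representative value for the whole data cloud.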
A statistical analysis of the daily streamflow hydrograph

NASA Astrophysics Data System (ADS)

Kavvas, M. L.; Delleur, J. W.

1984-03-01

In this study, a periodic statistical analysis of daily streamflow data in Indiana, U.S.A., was performed to gain new insight into the stochastic structure that describes the daily streamflow process. The analysis used the periodic mean and covariance functions of the daily streamflows, the time- and peak-discharge-dependent recession limb of the daily streamflow hydrograph, the time- and discharge-exceedance-level (DEL)-dependent probability distribution of the hydrograph peak interarrival time, and the time-dependent probability distribution of the time to peak discharge. Some new statistical estimators were developed and used in this study. In general terms, this study has shown that: (a) the persistence properties of daily flows depend on the storage state of the basin at the specified time origin of the flow process; (b) the daily streamflow process is time irreversible; (c) the probability distribution of the daily hydrograph peak interarrival time depends both on the occurrence time of the peak from which the interarrival time originates and on the discharge exceedance level; and (d) if the daily streamflow process is modeled as the release from a linear watershed storage, this release should depend on the state of the storage and on the time of the release, because the persistence properties and the recession limb decay rates were observed to change with the state of the watershed storage and with time. Therefore, a time-varying reservoir system needs to be considered if the daily streamflow process is to be modeled as the release from a linear watershed storage.
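In a linear reservoir, release is proportional to storage, Q = S/k, so with no inflow the storage decays as S(t) = S0·exp(-t/k). Finding (d) above says k should be allowed to vary with time and with the storage state. A minimal sketch of such a recession, assuming a hypothetical user-supplied function k_fn(t, S) that is held constant within each daily step (this is not the authors' model):

```python
import math


def linear_reservoir_recession(s0, k_fn, days):
    """Daily releases from a linear reservoir with no inflow.

    Within day t the storage obeys dS/dt = -S / k, with k = k_fn(t, S) held
    fixed for that day, so S shrinks by the factor exp(-1 / k). The day's
    release is the storage lost (mass balance). Letting k_fn depend on t
    and/or S gives a time-varying, state-dependent reservoir.
    """
    storage, releases = float(s0), []
    for t in range(days):
        k = k_fn(t, storage)
        s_next = storage * math.exp(-1.0 / k)
        releases.append(storage - s_next)
        storage = s_next
    return releases, storage


# Constant recession constant (days): the classic time-invariant linear reservoir.
fixed, _ = linear_reservoir_recession(100.0, lambda t, s: 10.0, 5)

# Hypothetical variant: drainage slows as the season advances and when storage is low.
varying, _ = linear_reservoir_recession(
    100.0, lambda t, s: 10.0 + 0.5 * t + (5.0 if s < 50.0 else 0.0), 30
)
```

With constant k, successive daily releases shrink by the same ratio exp(-1/k), so the recession limb has a single decay rate; a k that changes with time or storage produces the state- and time-dependent decay rates the study reports.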
Calculating stage duration statistics in multistage diseases.

PubMed

Komarova, Natalia L; Thalhauser, Craig J

2011-01-01

Many human diseases are characterized by multiple stages of progression. While the typical sequence of disease progression can be identified, there may be large individual variations among patients. Identifying mean stage durations and their variations is critical for the statistical hypothesis testing needed to determine whether a treatment is having a significant effect on progression, or whether a new therapy is delaying progression through a multistage disease. In this paper we focus on two methods for extracting stage duration statistics from longitudinal datasets: an extension of the linear regression technique, and a counting algorithm. Both are non-iterative, non-parametric and computationally cheap methods, which makes them invaluable tools for studying the epidemiology of diseases, with a goal of identifying different patterns of progression by using bioinformatics methodologies. Here we show that the regression method performs well for calculating the mean stage durations under a wide variety of assumptions; however, its generalization to variance calculations fails under realistic assumptions about the data collection procedure. The counting method, on the other hand, yields reliable estimates of both the means and the variances of stage durations. Applications to Alzheimer disease progression are discussed.
A statistical model for investigating binding probabilities of DNA nucleotide sequences using microarrays.

PubMed

Lee, Mei-Ling Ting; Bulyk, Martha L; Whitmore, G A; Church, George M

2002-12-01

There is considerable scientific interest in knowing the probability that a site-specific transcription factor will bind to a given DNA sequence. Microarray methods provide an effective means for assessing the binding affinities of a large number of DNA sequences, as demonstrated by Bulyk et al. (2001, Proceedings of the National Academy of Sciences, USA 98, 7158-7163) in their study of the DNA-binding specificities of Zif268 zinc fingers using microarray technology. In a follow-up investigation, Bulyk, Johnson, and Church (2002, Nucleic Acids Research 30, 1255-1261) studied the interdependence of nucleotides on the binding affinities of transcription proteins. Our article is motivated by this pair of studies. We present a general statistical methodology for analyzing microarray intensity measurements reflecting DNA-protein interactions. The log probability of a protein binding to a DNA sequence on an array is modeled using a linear ANOVA model. This model is convenient because it employs familiar statistical concepts and procedures and also because it is effective for investigating the probability structure of the binding mechanism.
Statistical Signal Models and Algorithms for Image Analysis

DTIC Science & Technology

1984-10-25

In this report, two-dimensional stochastic linear models are used in developing algorithms for image analysis such as classification, segmentation, and object detection in images characterized by textured backgrounds. These models generate two-dimensional random processes as outputs to which statistical inference procedures can naturally be applied. A common thread throughout our algorithms is the interpretation of the inference procedures in terms of linear prediction.
BOOK REVIEW: Statistical Mechanics of Turbulent Flows

NASA Astrophysics Data System (ADS)

Cambon, C.

2004-10-01

This is a handbook for a computational approach to reacting flows, including background material on statistical mechanics. In this sense, the title is somewhat misleading with respect to other books dedicated to the statistical theory of turbulence (e.g. Monin and Yaglom). In the present book, emphasis is placed on modelling (engineering closures) for computational fluid dynamics. The probabilistic (pdf) approach is applied to the local scalar field, motivated first by the nonlinearity of chemical source terms which appear in the transport equations of reacting species. The probabilistic and stochastic approaches are also used for the velocity field and particle position; nevertheless they are essentially limited to Lagrangian models for a local vector, with only single-point statistics, as for the scalar. Accordingly, conventional techniques, such as single-point closures for RANS (Reynolds-averaged Navier-Stokes) and subgrid-scale models for LES (large-eddy simulations), are described and in some cases reformulated using underlying Langevin models and filtered pdfs. Although the theoretical approach to turbulence is not discussed in general, the essentials of probabilistic and stochastic-processes methods are described, with a useful reminder concerning statistics at the molecular level. The book comprises 7 chapters. Chapter 1 briefly states the goals and contents, with a very clear synoptic scheme on page 2. Chapter 2 presents definitions and examples of pdfs and related statistical moments. Chapter 3 deals with stochastic processes, pdf transport equations, from Kramers-Moyal to Fokker-Planck (for Markov processes), and moment equations. Stochastic differential equations are introduced and their relationship to pdfs described. This chapter ends with a discussion of stochastic modelling. The equations of fluid mechanics and thermodynamics are addressed in chapter 4.
Classical conservation equations (mass, velocity, internal energy) are derived from their counterparts at the molecular level. In addition, equations are given for multicomponent reacting systems. The chapter ends with miscellaneous topics, including DNS, (the idea of) the energy cascade, and RANS. Chapter 5 is devoted to stochastic models for the large scales of turbulence. Langevin-type models for velocity (and particle position) are presented, and their various consequences for second-order single-point correlations (Reynolds stress components, Kolmogorov constant) are discussed. These models are then presented for the scalar. The chapter ends with compressible high-speed flows and various models, ranging from k-epsilon to hybrid RANS-pdf. Stochastic models for small-scale turbulence are addressed in chapter 6. These models are based on the concept of a filter density function (FDF) for the scalar, and a more conventional SGS (sub-grid-scale) model for the velocity in LES. The final chapter, chapter 7, is entitled 'The unification of turbulence models' and aims at reconciling large-scale and small-scale modelling. This book offers a timely survey of techniques in modern computational fluid mechanics for turbulent flows with reacting scalars. It should be of interest to engineers, while the discussion of the underlying tools, namely pdfs, stochastic and statistical equations, should also be attractive to applied mathematicians and physicists. The book's emphasis on local pdfs and stochastic Langevin models gives a consistent structure to the book and allows the author to cover almost the whole spectrum of practical modelling in turbulent CFD. On the other hand, one might regret that non-local issues are not mentioned explicitly, or even briefly. These problems range from the presence of pressure-strain correlations in the Reynolds stress transport equations to the presence of two-point pdfs in the single-point pdf equation derived from the Navier-Stokes equations.
(One may recall that, even without scalar transport, a general closure problem for turbulence statistics results from both the non-linearity and the non-locality of the Navier-Stokes equations, the latter coming from, e.g., the nonlocal relationship between velocity and pressure in the quasi-incompressible case. These two aspects are often intricately linked. It is well known that non-linearity alone is not responsible for the 'problem', as evidenced by 1D turbulence without pressure ('Burgulence' from the Burgers equation) and probably 3D (cosmological gas). A local description in terms of a pdf for the velocity can resolve the 'non-linear' problem, which in terms of moments yields an infinite hierarchy of equations. On the other hand, non-locality yields a hierarchy of unclosed equations, with the single-point pdf equation for velocity derived from the incompressible NS equations involving a two-point pdf, and so on. The general relationship was given by Lundgren (1967, Phys. Fluids 10 (5), 969-975), with the equation for the pdf at n points involving the pdf at n+1 points. The nonlocal problem appears in various statistical models which are not discussed in the book. The simplest examples are full RST or ASM models, in which the closure of pressure-strain correlations is pivotal (their counterpart ought to be identified and discussed in equations (5-21) and the following ones). The book does not address more sophisticated non-local approaches, such as two-point (or spectral) non-linear closure theories and models, 'rapid distortion theory' for linear regimes, not to mention scaling and intermittency based on two-point structure functions, etc. The book sometimes mixes theoretical modelling and purely empirical relationships, the empirical character coming from the lack of a nonlocal (two-point) approach.)
In short, the book is orientated more towards applications than towards turbulence theory; it is written clearly and concisely and should be useful to a large community, interested either in the underlying stochastic formalism or in CFD applications.
How do general dentists and orthodontists determine where to refer patients requiring oral and maxillofacial surgical procedures?

PubMed

Schlieve, Thomas; Funderburk, Joseph; Flick, William; Miloro, Michael; Kolokythas, Antonia

2015-03-01

This study investigated the influence of specific criteria on referral selection among general dentists and orthodontists in deciding referrals to oral and maxillofacial surgeons. A cross-sectional study was designed to examine the importance of criteria used by 2 groups of practitioners, general dentists and orthodontists, for deciding on referrals to oral and maxillofacial surgeons. Data were collected by 2 multiple-choice surveys. The surveys were e-mailed to general dentists and orthodontists practicing in the state of Illinois and to graduates from the University of Illinois at Chicago (UIC) College of Dentistry and the UIC Department of Orthodontics. Participants were asked to rate referral criteria from most important to least important. Analysis of variance was used to examine the data for differences in the importance of the criteria for each question, and linear regression analysis was used to determine whether any one criterion was statistically meaningful within each group of practitioners. In total, 235 general dental practitioners and 357 orthodontists completed the survey, with a 100% completion rate. The most important criterion for referral to oral and maxillofacial surgeons in the general dentist group was the personal and professional relationship of the referring doctor to the specialist. In the orthodontist group, no single criterion was statistically meaningful. General dentists tend to develop long-term relationships with their patients, and when deciding on appropriate referrals it appears that personal and professional relationships that promote trust and open communication are key elements. General dentists favor these relationships when making referral decisions across a wide spectrum of procedures. Orthodontists do not place substantial value on any specific referral criterion and therefore may not develop the same patient-doctor and doctor-doctor relationships as general dentists.
Copyright © 2015 American Association of Oral and Maxillofacial Surgeons. Published by Elsevier Inc. All rights reserved.
Does transport time help explain the high trauma mortality rates in rural areas? New and traditional predictors assessed by new and traditional statistical methods

PubMed Central

Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas

2015-01-01

Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent the choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on the choice of statistical model. The multiple GAM and piecewise linear models performed best in terms of AIC, with similar values. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in the GAM or in standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
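The study ranks competing regression models by AIC. The sketch below is a simplified, hypothetical version of that comparison: a straight line against a continuous piecewise-linear fit with one knot, both fitted by ordinary least squares. It assumes Gaussian errors, for which AIC = n·ln(RSS/n) + 2k up to a constant shared by all models, counts the error variance in k, and treats the knot as fixed (estimating it would add a parameter). It does not reproduce the authors' models or data.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(design, y):
    """Fit y on the design matrix by the normal equations; return the RSS."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * v for bj, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def gaussian_aic(rss, n, n_coef):
    """AIC up to an additive constant; the +1 counts the error variance."""
    return n * math.log(rss / n) + 2 * (n_coef + 1)


def compare_linear_vs_hinge(x, y, knot):
    """Return (AIC_linear, AIC_piecewise) for a line and a one-knot hinge fit."""
    n = len(y)
    line = [[1.0, xi] for xi in x]
    hinge = [[1.0, xi, max(0.0, xi - knot)] for xi in x]
    return gaussian_aic(ols_rss(line, y), n, 2), gaussian_aic(ols_rss(hinge, y), n, 3)


# Hypothetical data whose slope triples after x = 5, with small alternating noise.
x = [float(i) for i in range(11)]
y = [(xi if xi <= 5 else 5 + 3 * (xi - 5)) + 0.1 * (-1) ** i for i, xi in enumerate(x)]
print(compare_linear_vs_hinge(x, y, knot=5.0))  # lower AIC is preferred
```

A lower AIC indicates a better trade-off between fit and number of parameters. As the abstract stresses, whether a predictor looks significant can change with the model family, so such comparisons are a guide rather than a verdict.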
The probability density function (PDF) of Lagrangian Turbulence

NASA Astrophysics Data System (ADS)

Birnir, B.

2012-12-01

The statistical theory of Lagrangian turbulence is derived from the stochastic Navier-Stokes equation. Assuming that the noise in fully-developed turbulence is a generic noise determined by the general theorems in probability, the central limit theorem and the large deviation principle, we are able to formulate and solve the Kolmogorov-Hopf equation for the invariant measure of the stochastic Navier-Stokes equations. The intermittency corrections to the scaling exponents of the structure functions require a multiplicative (multiplying the fluid velocity) noise in the stochastic Navier-Stokes equation. We let this multiplicative noise in the equation consist of a simple (Poisson) jump process and then show how the Feynman-Kac formula produces the log-Poissonian processes found by She and Lévêque, Waymire and Dubrulle. These log-Poissonian processes give the intermittency corrections that agree with modern direct Navier-Stokes simulations (DNS) and experiments. The probability density function (PDF) plays a key role when direct Navier-Stokes simulations or experimental results are compared to theory. The statistical theory of turbulence, including the scaling of the structure functions of turbulence, is determined by the invariant measure of the Navier-Stokes equation, and the PDFs for the various statistics (one-point, two-point, N-point) can be obtained by taking the trace of the corresponding invariant measures. Hopf derived in 1952 a functional equation for the characteristic function (Fourier transform) of the invariant measure. In contrast to the nonlinear Navier-Stokes equation, this is a linear functional differential equation. The PDFs obtained from the invariant measures for the velocity differences (two-point statistics) are shown to be the four-parameter generalized hyperbolic distributions found by Barndorff-Nielsen. These PDFs have heavy tails and a convex peak at the origin.
A suitable projection of the Kolmogorov-Hopf equations is the differential equation determining the generalized hyperbolic distributions. Then we compare these PDFs with DNS results and experimental data.
Task-Specific and General Cognitive Effects in Chiari Malformation Type I

PubMed Central

Allen, Philip A.; Houston, James R.; Pollock, Joshua W.; Buzzelli, Christopher; Li, Xuan; Harrington, A. Katherine; Martin, Bryn A.; Loth, Francis; Lien, Mei-Ching; Maleki, Jahangir; Luciano, Mark G.

2014-01-01

Objective Our objective was to use episodic memory and executive function tests to determine whether or not Chiari Malformation Type I (CM) patients experience cognitive dysfunction. Background CM is a neurological syndrome in which the cerebellum descends into the cervical spine, causing neural compression, severe headaches, neck pain, and a number of other physical symptoms. While it is primarily a disorder of the cervico-medullary junction, both clinicians and researchers have suspected deficits in higher-level cognitive function. Design and Methods We tested 24 CM patients who had undergone decompression neurosurgery and 24 age- and education-matched controls on measures of immediate and delayed episodic memory, as well as three measures of executive function. Results The CM group showed performance decrements relative to the controls in response inhibition (Stroop interference), working memory computational speed (Ospan), and processing speed (automated digit symbol substitution task), but group differences in recall did not reach statistical significance. After statistical control for depression and anxiety scores, the group effects for working memory and processing speed were eliminated, but not the effect for response inhibition. This response inhibition difference was not due to overall general slowing in the CM group either, because when the controls' data were transformed using the linear function fit to all of the reaction time tasks, the interaction with group remained statistically significant. Furthermore, there was a multivariate group effect for all of the response time measures and immediate and delayed recall after statistical control of depression and anxiety scores. Conclusion These results suggest that CM patients with decompression surgery exhibit cognitive dysfunction compared to age- and education-matched controls.
While some of these results may be related to anxiety and depression (likely proxies for chronic pain), response inhibition effects in particular, as well as a general cognitive deficit, persisted even after control for anxiety and depression. PMID:24736676

Strategies for Reduced-Order Models in Uncertainty Quantification of Complex Turbulent Dynamical Systems

NASA Astrophysics Data System (ADS)

Qi, Di

Turbulent dynamical systems are ubiquitous in science and engineering. Uncertainty quantification (UQ) in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates for key physical quantities. In the development of a proper UQ scheme for systems characterized by both a high-dimensional phase space and a large number of instabilities, significant model errors relative to the true natural signal are unavoidable, owing both to imperfect understanding of the underlying physical processes and to limited computational resources. One central issue in contemporary research is the development of a systematic methodology for reduced-order models that recover the crucial features, achieving both model fidelity in statistical equilibrium and model sensitivity in response to perturbations. In the first part, we discuss a general mathematical framework to construct statistically accurate reduced-order models that have skill in capturing the statistical variability in the principal directions of a general class of complex systems with quadratic nonlinearity. A systematic hierarchy of simple statistical closure schemes, built through new global statistical energy conservation principles combined with statistical equilibrium fidelity, is designed and tested for UQ of these problems. Second, the capacity of imperfect low-order stochastic approximations to model extreme events in a passive scalar field advected by turbulent flows is investigated. Effects in complicated flow systems, including strong nonlinear and non-Gaussian interactions, are considered, and much simpler and cheaper imperfect models with model error are constructed to capture the crucial statistical features in the stationary tracer field. Several mathematical ideas are introduced to improve the prediction skill of the imperfect reduced-order models. 
Most importantly, empirical information theory and statistical linear response theory are applied in the training phase for calibrating model errors to achieve optimal imperfect model parameters; and total statistical energy dynamics are introduced to improve the model sensitivity in the prediction phase especially when strong external perturbations are exerted. The validity of reduced-order models for predicting statistical responses and intermittency is demonstrated on a series of instructive models with increasing complexity, including the stochastic triad model, the Lorenz '96 model, and models for barotropic and baroclinic turbulence. The skillful low-order modeling methods developed here should also be useful for other applications such as efficient algorithms for data assimilation.
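The Lorenz '96 model named above is a standard test bed for reduced-order UQ. Its equations are $dx_i/dt = (x_{i+1} - x_{i-2})\,x_{i-1} - x_i + F$ with cyclic indices. The sketch below integrates the full model with a classical fourth-order Runge-Kutta scheme and estimates its equilibrium mean and variance, the kind of statistics a reduced-order model would be calibrated against. The forcing F = 8, time step, run lengths and initial perturbation are illustrative assumptions; this is not the thesis's reduced-order closure.

```python
import random


def lorenz96_rhs(x, forcing=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, indices taken cyclically."""
    n = len(x)
    # Python's negative indices wrap x[i - 1] and x[i - 2]; the modulo wraps x[i + 1].
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing for i in range(n)]


def rk4_step(x, dt, forcing=8.0):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)], forcing)
    k3 = lorenz96_rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)], forcing)
    k4 = lorenz96_rhs([xi + dt * k for xi, k in zip(x, k3)], forcing)
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def climatology(n=40, forcing=8.0, dt=0.01, spinup=1000, steps=5000, seed=0):
    """Time-averaged mean and variance of x_i, pooled over sites (the model is homogeneous)."""
    rng = random.Random(seed)
    # Start near the unstable fixed point x_i = F with a small random perturbation.
    x = [forcing + 0.01 * rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(spinup):
        x = rk4_step(x, dt, forcing)
    total, total_sq, count = 0.0, 0.0, 0
    for _ in range(steps):
        x = rk4_step(x, dt, forcing)
        total += sum(x)
        total_sq += sum(xi * xi for xi in x)
        count += n
    mean = total / count
    return mean, total_sq / count - mean * mean
```

Calling `climatology()` gives the "truth" statistics; a reduced-order model would be judged by how well it reproduces them, and how they shift when F is perturbed.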
Smoothed Residual Plots for Generalized Linear Models. Technical Report #450.

ERIC Educational Resources Information Center

Brant, Rollin

Methods for examining the viability of assumptions underlying generalized linear models are considered. By appealing to the likelihood, a natural generalization of the raw residual plot for normal theory models is derived and is applied to investigating potential misspecification of the linear predictor. A smooth version of the plot is also…
Simulation of extreme rainfall and projection of future changes using the GLIMCLIM model

NASA Astrophysics Data System (ADS)

Rashid, Md. Mamunur; Beecham, Simon; Chowdhury, Rezaul Kabir

2017-10-01

In this study, the performance of the Generalized LInear Modelling of daily CLImate sequence (GLIMCLIM) statistical downscaling model in simulating extreme rainfall indices and annual maximum daily rainfall (AMDR) was assessed, using daily rainfall downscaled from National Centers for Environmental Prediction (NCEP) reanalysis and Coupled Model Intercomparison Project Phase 5 (CMIP5) general circulation model (GCM) output datasets (four GCMs and two scenarios); changes in these quantities were then estimated for the future period 2041-2060. The model reproduced the monthly variations in the extreme rainfall indices reasonably well when forced by the NCEP reanalysis datasets. Frequency Adapted Quantile Mapping (FAQM) was used to remove bias in the simulated daily rainfall when forced by CMIP5 GCMs, which reduced the discrepancy between observed and simulated extreme rainfall indices. Although the observed AMDR values were within the 2.5th and 97.5th percentiles of the simulated AMDR, the model consistently under-predicted the inter-annual variability of AMDR. A non-stationary model was developed using the generalized linear model for location, shape and scale to estimate the AMDR with an annual exceedance probability of 0.01. The study shows that, in general, AMDR is likely to decrease in the future. The Onkaparinga catchment will also experience drier conditions, due to an increase in consecutive dry days coinciding with decreases in heavy (> long-term 90th percentile) rainfall days, the empirical 90th quantile of rainfall and the maximum 5-day consecutive total rainfall for the future period (2041-2060) compared with the base period (1961-2000).
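The abstract does not state which distribution the non-stationary AMDR model assumed. As an illustration only, the sketch below assumes a generalized extreme value (GEV) distribution and computes the level with an annual exceedance probability of 0.01 (the 1-in-100-year event) by inverting the GEV distribution function. The linearly drifting location parameter and every number in the usage line are hypothetical.

```python
import math


def gev_return_level(mu, sigma, xi, aep=0.01):
    """Level exceeded with annual probability `aep` under GEV(mu, sigma, xi).

    Inverts F(z) = exp(-[1 + xi * (z - mu) / sigma] ** (-1 / xi)) at F = 1 - aep.
    """
    y = -math.log(1.0 - aep)
    if abs(xi) < 1e-9:
        # Gumbel limit (xi -> 0): F(z) = exp(-exp(-(z - mu) / sigma)).
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


def nonstationary_return_level(year, mu0, mu1, sigma, xi, aep=0.01, ref_year=2000):
    """Hypothetical non-stationary case: the location parameter drifts linearly in time."""
    mu = mu0 + mu1 * (year - ref_year)
    return gev_return_level(mu, sigma, xi, aep)


# Hypothetical parameters (mm/day), for illustration only.
print(nonstationary_return_level(2050, mu0=60.0, mu1=-0.2, sigma=15.0, xi=0.1))
```

A negative trend in the location parameter lowers the 0.01-AEP level over time, which is the kind of change the study reports for AMDR.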
Physiological Aldosterone Concentrations Are Associated with Alterations of Lipid Metabolism: Observations from the General Population.

PubMed

Hannich, M; Wallaschofski, H; Nauck, M; Reincke, M; Adolf, C; Völzke, H; Rettig, R; Hannemann, A

2018-01-01

Aldosterone and high-density lipoprotein cholesterol (HDL-C) are involved in many pathophysiological processes that contribute to the development of cardiovascular diseases. Associations between aldosterone concentrations and certain components of lipid metabolism in the peripheral circulation have previously been suggested, but data from the general population are sparse. We therefore aimed to assess the associations between aldosterone and HDL-C, low-density lipoprotein cholesterol (LDL-C), total cholesterol, triglycerides, or non-HDL-C in the general adult population. Data from 793 men and 938 women aged 25-85 years who participated in the first follow-up of the Study of Health in Pomerania were obtained. The associations of aldosterone with serum lipid concentrations were assessed in multivariable linear regression models adjusted for sex, age, body mass index (BMI), estimated glomerular filtration rate (eGFR), and HbA1c. The linear regression models showed statistically significant positive associations of aldosterone with LDL-C (β-coefficient = 0.022, standard error = 0.010, p = 0.03) and non-HDL-C (β-coefficient = 0.023, standard error = 0.009, p = 0.01), as well as an inverse association of aldosterone with HDL-C (β-coefficient = -0.022, standard error = 0.011, p = 0.04). The present data show that plasma aldosterone is positively associated with LDL-C and non-HDL-C and inversely associated with HDL-C in the general population. Our data thus suggest that aldosterone concentrations within the physiological range may be related to alterations of lipid metabolism.
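The reported p-values can be cross-checked against the β-coefficients and standard errors. A minimal sketch, assuming a two-sided Wald test with a normal reference distribution (reasonable for about 1,700 participants, although the authors may have used a t distribution): z = β/SE and p = 2[1 - Φ(|z|)]. Because β and SE are rounded to three decimals, only approximate agreement is expected; for HDL-C, for instance, z = -2.00 gives p ≈ 0.0455 rather than the reported 0.04.

```python
import math


def wald_p_value(beta, se):
    """Two-sided p-value for H0: beta = 0, using z = beta / se and a normal reference."""
    z = beta / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


reported = {
    "LDL-C": (0.022, 0.010),
    "non-HDL-C": (0.023, 0.009),
    "HDL-C": (-0.022, 0.011),
}
for outcome, (beta, se) in reported.items():
    print(f"{outcome}: z = {beta / se:.2f}, p = {wald_p_value(beta, se):.4f}")
```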
Estimation of genetic variance for macro- and micro-environmental sensitivity using double hierarchical generalized linear models.

PubMed

Mulder, Han A; Rönnegård, Lars; Fikse, W Freddy; Veerkamp, Roel F; Strandberg, Erling

2013-07-04

Genetic variation for environmental sensitivity indicates that animals are genetically different in their response to environmental factors. Environmental factors are either identifiable (e.g. temperature), in which case they are called macro-environmental, or unknown, in which case they are called micro-environmental. The objectives of this study were to develop a statistical method to estimate genetic parameters for macro- and micro-environmental sensitivities simultaneously, to investigate the bias and precision of the resulting estimates of genetic parameters, and to develop and evaluate the use of Akaike's information criterion based on h-likelihood to select the best-fitting model. We assumed that genetic variation in macro- and micro-environmental sensitivities is expressed as genetic variance in the slope of a linear reaction norm and in environmental variance, respectively. A reaction norm model to estimate genetic variance for macro-environmental sensitivity was combined with a structural model for residual variance to estimate genetic variance for micro-environmental sensitivity, using a double hierarchical generalized linear model in ASReml. Akaike's information criterion was constructed as a model selection criterion using approximated h-likelihood. Populations of sires with large half-sib offspring groups were simulated to investigate the bias and precision of estimated genetic parameters. Designs with 100 sires, each with at least 100 offspring, are required for the standard deviations of estimated variances to be lower than 50% of the true values. When the number of offspring increased, standard deviations of estimates across replicates decreased substantially, especially for the genetic variances of macro- and micro-environmental sensitivities. Standard deviations of estimated genetic correlations across replicates were quite large (between 0.1 and 0.4), especially when sires had few offspring. Practically no bias was observed for estimates of any of the parameters. 
Using Akaike's information criterion, the true genetic model was selected as the best statistical model in at least 90% of 100 replicates when the number of offspring per sire was 100. Application of the model to lactation milk yield in dairy cattle showed that genetic variance for micro- and macro-environmental sensitivities existed. The algorithm and model selection criterion presented here can contribute to a better understanding of the genetic control of macro- and micro-environmental sensitivities. Designs or datasets should have at least 100 sires, each with 100 offspring.
Label-free protein quantification using LC-coupled ion trap or FT mass spectrometry: Reproducibility, linearity, and application with complex proteomes.

PubMed

Wang, Guanghui; Wu, Wells W; Zeng, Weihua; Chou, Chung-Lin; Shen, Rong-Fong

2006-05-01

A critical step in protein biomarker discovery is the ability to contrast proteomes, a process generally referred to as quantitative proteomics. While stable-isotope labeling (e.g., ICAT, 18O- or 15N-labeling, or AQUA) remains the core technology used in mass spectrometry-based proteomic quantification, increasing effort has been directed to the label-free approach, which relies on direct comparison of peptide peak areas between LC-MS runs. This latter approach is attractive to investigators for its simplicity as well as its cost-effectiveness. In the present study, the reproducibility and linearity of a label-free approach applied to highly complex proteomes were evaluated. Various amounts of proteins from different proteomes were subjected to repeated LC-MS analyses using an ion trap or Fourier transform mass spectrometer. Highly reproducible data were obtained between replicate runs, as evidenced by nearly ideal Pearson correlation coefficients (for ion peak areas or retention times) and average peak area ratios. In general, more than 50% and nearly 90% of the peptide ion ratios deviated by less than 10% and 20%, respectively, from the average in duplicate runs. In addition, the multiplicity ratios of the protein amounts used correlated well with the observed average ratios of peak areas calculated from detected peptides. Furthermore, removing abundant proteins from the samples improved reproducibility and linearity. A computer program was written to automate the processing of data sets from experiments with groups of multiple samples for statistical analysis. Algorithms for outlier-resistant mean estimation and for adjusting the statistical significance threshold for multiple testing were incorporated to minimize the rate of false positives. The program was applied to quantify changes in the proteomes of parental and p53-deficient HCT-116 human cells and was found to yield reproducible results. 
Overall, this study demonstrates an alternative approach that allows global quantification of differentially expressed proteins in complex proteomes. The utility of this method for biomarker discovery is likely to grow with future improvements in the detection sensitivity of mass spectrometers.
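The duplicate-run reproducibility figures quoted above can be computed as the fraction of peptide ions whose peak area deviates from the two-run average by less than a tolerance. The abstract does not define the deviation exactly, so this sketch assumes the relative deviation |a - avg|/avg, which equals |a - b|/(a + b) and is therefore symmetric in the two runs. The peak areas are made-up illustrative numbers.

```python
def fraction_within(run_a, run_b, tolerance):
    """Fraction of peptide ions whose run-A peak area lies within `tolerance`
    (relative) of the mean of the two duplicate runs."""
    if len(run_a) != len(run_b) or not run_a:
        raise ValueError("need two non-empty runs of equal length")
    hits = 0
    for a, b in zip(run_a, run_b):
        avg = (a + b) / 2.0
        if abs(a - avg) / avg < tolerance:
            hits += 1
    return hits / len(run_a)


# Hypothetical peak areas for four peptide ions measured in duplicate runs.
run_1 = [1.00e6, 2.10e5, 4.00e4, 8.80e5]
run_2 = [1.05e6, 2.60e5, 7.50e4, 9.00e5]
for tol in (0.10, 0.20):
    print(tol, fraction_within(run_1, run_2, tol))
```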
Bayesian generalized linear mixed modeling of Tuberculosis using informative priors.

PubMed

Ojo, Oluwatobi Blessing; Lougue, Siaka; Woldegerima, Woldegebriel Assefa

2017-01-01

TB is rated as one of the world's deadliest diseases, and South Africa ranks 9th among the 22 countries hardest hit by TB. Although much research has been carried out on this subject, this paper goes further by incorporating past knowledge into the model, using a Bayesian approach with an informative prior. The Bayesian statistical approach is gaining popularity in data analysis. However, most applications of Bayesian inference are limited to non-informative priors, used where there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. In this paper, identical regression models are fitted using the classical approach and the Bayesian approach, the latter with both non-informative and informative priors, using South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with an informative prior, South Africa General Household Survey datasets for the years 2011 to 2013 are used to set up the priors for the 2014 model.
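The core idea of an informative prior, letting earlier surveys inform the analysis of a later one, can be shown with the simplest conjugate case. The study itself fits Bayesian generalized linear mixed models; the sketch below is only a simplified illustration for a single proportion with a Beta-binomial model, where earlier counts become the prior and Beta(1, 1) plays the role of the non-informative prior. The down-weighting factor `weight` and all counts are hypothetical.

```python
def informative_beta_prior(cases, n, weight=1.0, a0=1.0, b0=1.0):
    """Beta prior for a proportion built from earlier survey counts.

    Starts from Beta(a0, b0) (Beta(1, 1) is the flat, non-informative prior) and
    adds the earlier counts, scaled by `weight` in (0, 1] to discount older data.
    """
    return a0 + weight * cases, b0 + weight * (n - cases)


def update(prior, cases, n):
    """Conjugate Beta-binomial update with the current year's counts."""
    a, b = prior
    return a + cases, b + (n - cases)


def beta_mean(params):
    """Mean of a Beta(a, b) distribution."""
    a, b = params
    return a / (a + b)


# Hypothetical counts, for illustration only.
flat = (1.0, 1.0)
informative = informative_beta_prior(cases=30, n=1000, weight=0.5)
for name, prior in (("non-informative", flat), ("informative", informative)):
    posterior = update(prior, cases=12, n=400)
    print(name, round(beta_mean(posterior), 4))
```

With the informative prior, the posterior is pulled toward the earlier-survey rate and is more concentrated than with the flat prior.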
The linearized multistage model and the future of quantitative risk assessment.

PubMed

Crump, K S

1996-10-01

The linearized multistage (LMS) model has for over 15 years been the default dose-response model used by the U.S. Environmental Protection Agency (USEPA) and other federal and state regulatory agencies in the United States for calculating quantitative estimates of low-dose carcinogenic risks from animal data. The LMS model is in essence a flexible statistical model that can describe both linear and non-linear dose-response patterns, and that produces an upper confidence bound on the linear low-dose slope of the dose-response curve. Unlike its namesake, the Armitage-Doll multistage model, the parameters of the LMS do not correspond to actual physiological phenomena. Thus the LMS is 'biological' only to the extent that the true biological dose response is linear at low dose and that low-dose slope is reflected in the experimental data. If the true dose response is non-linear the LMS upper bound may overestimate the true risk by many orders of magnitude. However, competing low-dose extrapolation models, including those derived from 'biologically-based models' that are capable of incorporating additional biological information, have not shown evidence to date of being able to produce quantitative estimates of low-dose risks that are any more accurate than those obtained from the LMS model. Further, even if these attempts were successful, the extent to which more accurate estimates of low-dose risks in a test animal species would translate into improved estimates of human risk is questionable. Thus, it does not appear possible at present to develop a quantitative approach that would be generally applicable and that would offer significant improvements upon the crude bounding estimates of the type provided by the LMS model. Draft USEPA guidelines for cancer risk assessment incorporate an approach similar to the LMS for carcinogens having a linear mode of action. 
However, under these guidelines quantitative estimates of low-dose risks would not be developed for carcinogens having a non-linear mode of action; instead dose-response modelling would be used in the experimental range to calculate an LED10* (a statistical lower bound on the dose corresponding to a 10% increase in risk), and safety factors would be applied to the LED10* to determine acceptable exposure levels for humans. This approach is very similar to the one presently used by USEPA for non-carcinogens. Rather than using one approach for carcinogens believed to have a linear mode of action and a different approach for all other health effects, it is suggested herein that it would be more appropriate to use an approach conceptually similar to the 'LED10*-safety factor' approach for all health effects, and not to routinely develop quantitative risk estimates from animal data.
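The low-dose linearity of the multistage model can be made concrete. In the multistage form $P(d) = 1 - \exp[-(q_0 + q_1 d + \dots + q_k d^k)]$ with $q_i \ge 0$, the extra risk over background is $1 - \exp[-(q_1 d + \dots + q_k d^k)] \approx q_1 d$ at low dose; the LMS procedure replaces $q_1$ by an upper confidence bound $q_1^*$. The sketch below (coefficients hypothetical) computes extra risk and the central-estimate dose for a 10% extra risk (ED10) by bisection. It does not fit the model to data or compute confidence bounds such as $q_1^*$ or the LED10*.

```python
import math


def multistage_prob(dose, q):
    """Multistage model P(d) = 1 - exp(-(q0 + q1*d + ... + qk*d**k)), all q_i >= 0."""
    return 1.0 - math.exp(-sum(qi * dose ** i for i, qi in enumerate(q)))


def extra_risk(dose, q):
    """[P(d) - P(0)] / [1 - P(0)], which simplifies to 1 - exp(-(q1*d + ... + qk*d**k))."""
    s = sum(qi * dose ** i for i, qi in enumerate(q) if i >= 1)
    return -math.expm1(-s)  # accurate even when s is tiny (low dose)


def dose_at_extra_risk(target, q, tol=1e-12):
    """Bisection for the dose giving a target extra risk (0.10 for an ED10).

    Assumes some q_i > 0 for i >= 1, so that extra risk increases with dose.
    """
    lo, hi = 0.0, 1.0
    while extra_risk(hi, q) < target:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if extra_risk(mid, q) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q = [0.01, 0.002, 0.5]  # hypothetical background, linear and quadratic terms
print(extra_risk(1e-5, q) / (q[1] * 1e-5))  # close to 1: risk is ~linear at low dose
print(dose_at_extra_risk(0.10, q))
```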
Large-scale galaxy bias

NASA Astrophysics Data System (ADS)

Jeong, Donghui; Desjacques, Vincent; Schmidt, Fabian

2018-01-01

Here, we briefly introduce the key results of the recent review (arXiv:1611.09787), whose abstract is as follows. This review presents a comprehensive overview of galaxy bias, that is, the statistical relation between the distribution of galaxies and matter. We focus on large scales where cosmic density fields are quasi-linear. On these scales, the clustering of galaxies can be described by a perturbative bias expansion, and the complicated physics of galaxy formation is absorbed by a finite set of coefficients of the expansion, called bias parameters. The review begins with a detailed derivation of this very important result, which forms the basis of the rigorous perturbative description of galaxy clustering, under the assumptions of General Relativity and Gaussian, adiabatic initial conditions. Key components of the bias expansion are all leading local gravitational observables, which include the matter density but also tidal fields and their time derivatives. We hence expand the definition of local bias to encompass all these contributions. This derivation is followed by a presentation of the peak-background split in its general form, which elucidates the physical meaning of the bias parameters, and a detailed description of the connection between bias parameters and galaxy (or halo) statistics. We then review the excursion set formalism and peak theory which provide predictions for the values of the bias parameters. In the remainder of the review, we consider the generalizations of galaxy bias required in the presence of various types of cosmological physics that go beyond pressureless matter with adiabatic, Gaussian initial conditions: primordial non-Gaussianity, massive neutrinos, baryon-CDM isocurvature perturbations, dark energy, and modified gravity. Finally, we discuss how the description of galaxy bias in the galaxies' rest frame is related to clustering statistics measured from the observed angular positions and redshifts in actual galaxy catalogs.
Large-scale galaxy bias

NASA Astrophysics Data System (ADS)

Desjacques, Vincent; Jeong, Donghui; Schmidt, Fabian

2018-02-01

This review presents a comprehensive overview of galaxy bias, that is, the statistical relation between the distribution of galaxies and matter. We focus on large scales where cosmic density fields are quasi-linear. On these scales, the clustering of galaxies can be described by a perturbative bias expansion, and the complicated physics of galaxy formation is absorbed by a finite set of coefficients of the expansion, called bias parameters. The review begins with a detailed derivation of this very important result, which forms the basis of the rigorous perturbative description of galaxy clustering, under the assumptions of General Relativity and Gaussian, adiabatic initial conditions. Key components of the bias expansion are all leading local gravitational observables, which include the matter density but also tidal fields and their time derivatives. We hence expand the definition of local bias to encompass all these contributions. This derivation is followed by a presentation of the peak-background split in its general form, which elucidates the physical meaning of the bias parameters, and a detailed description of the connection between bias parameters and galaxy statistics. We then review the excursion-set formalism and peak theory which provide predictions for the values of the bias parameters. In the remainder of the review, we consider the generalizations of galaxy bias required in the presence of various types of cosmological physics that go beyond pressureless matter with adiabatic, Gaussian initial conditions: primordial non-Gaussianity, massive neutrinos, baryon-CDM isocurvature perturbations, dark energy, and modified gravity. Finally, we discuss how the description of galaxy bias in the galaxies' rest frame is related to clustering statistics measured from the observed angular positions and redshifts in actual galaxy catalogs.
An Evaluation of CPRA (Cost Performance Report Analysis) Estimate at Completion Techniques Based Upon AFWAL (Air Force Wright Aeronautical Laboratories) Cost/Schedule Control System Criteria Data

DTIC Science & Technology

1985-09-01

C/SCSC Terms and Definitions; Cost Performance Report Analysis (CPRA) Program; Description of CPRA Terms and Formulas … The hypotheses are: $C_1: \sigma_1^2 = \sigma_2^2$; $C_2: \sigma_1^2 \neq \sigma_2^2$. The test statistic is then calculated as: $F^* = \dfrac{SSE_1/(n_1 - 2)}{SSE_2/(n_2 - 2)}$. The critical F value is: F(c, n1 … 353.90767, SIGNIF F = .0000 … TABLE B.4 General Linear Test for EAC1 and EAC5: MEAN / STD DEV / CASES; EAC1: 827534.056 / 1202737.882 / 1630; EAC5 …
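The test statistic in this excerpt compares the error variances of two fitted regressions, each with two parameters (hence n - 2 degrees of freedom). A minimal sketch, assuming each series is summarized by a simple linear regression y = b0 + b1·x; the report's actual regressions and data are not recoverable from the excerpt. The resulting F* would be compared with critical values of the F distribution with (n1 - 2, n2 - 2) degrees of freedom, which this standard-library-only sketch does not compute.

```python
def simple_linear_sse(x, y):
    """Error sum of squares from an ordinary least-squares fit y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def error_variance_ratio(x1, y1, x2, y2):
    """F* = [SSE1 / (n1 - 2)] / [SSE2 / (n2 - 2)] for two simple linear regressions."""
    mse1 = simple_linear_sse(x1, y1) / (len(x1) - 2)
    mse2 = simple_linear_sse(x2, y2) / (len(x2) - 2)
    return mse1 / mse2
```

Values of F* far from 1 in either direction (for a two-sided alternative such as $C_2$) indicate unequal error variances.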
Development of non-linear models predicting daily fine particle concentrations using aerosol optical depth retrievals and ground-based measurements at a municipality in the Brazilian Amazon region

NASA Astrophysics Data System (ADS)

Gonçalves, Karen dos Santos; Winkler, Mirko S.; Benchimol-Barbosa, Paulo Roberto; de Hoogh, Kees; Artaxo, Paulo Eduardo; de Souza Hacon, Sandra; Schindler, Christian; Künzli, Nino

2018-07-01

Epidemiological studies generally use measurements of particulate matter with a diameter of less than 2.5 μm (PM2.5) from monitoring networks. Satellite aerosol optical depth (AOD) data have considerable potential for predicting PM2.5 concentrations, and thus provide an alternative method for producing knowledge regarding the level of pollution and its health impact in areas where no ground PM2.5 measurements are available. This is the case in the Brazilian Amazon rainforest region, where forest fires are frequent sources of high pollution. In this study, we applied a non-linear model for predicting PM2.5 concentration from AOD retrievals using interaction terms between average temperature, relative humidity, the sine and cosine of the date over a period of 365.25 days, and the square of the lagged relative residual. Regression performance was compared in terms of goodness of fit and R² between linear and non-linear regression for six different models. The non-linear regression showed the best performance, explaining on average 82% of the variability in daily PM2.5 concentrations over the whole study period. In the context of Amazonia, this was the first study to predict PM2.5 concentrations using the latest high-resolution AOD products in combination with testing of a non-linear model. Our results allowed a reliable prediction based on the AOD-PM2.5 relationship and set the basis for further investigations on air pollution impacts in the complex context of the Brazilian Amazon Region.
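The predictors described above can be assembled as in the sketch below. The seasonal terms sin(2πt/365.25) and cos(2πt/365.25) place day t on an annual cycle. The abstract does not say which interaction terms were used or exactly how the lagged relative residual was defined, so the choice to interact AOD with every other covariate and the residual definition (observed - predicted)/predicted are hypothetical.

```python
import math


def seasonal_terms(day_index, period=365.25):
    """Sine and cosine of the date on an annual cycle of `period` days."""
    angle = 2.0 * math.pi * day_index / period
    return math.sin(angle), math.cos(angle)


def relative_residual(observed, predicted):
    """Hypothetical definition of the relative residual: (observed - predicted) / predicted."""
    return (observed - predicted) / predicted


def design_row(aod, temp, rh, day_index, lagged_rel_residual):
    """Main effects plus hypothetical AOD interaction terms for one day."""
    s, c = seasonal_terms(day_index)
    base = {
        "aod": aod,
        "temp": temp,
        "rh": rh,
        "sin_day": s,
        "cos_day": c,
        "lag_rel_resid_sq": lagged_rel_residual ** 2,
    }
    row = dict(base)
    for name in ("temp", "rh", "sin_day", "cos_day", "lag_rel_resid_sq"):
        row["aod_x_" + name] = aod * base[name]
    return row
```

Rows like these, one per day, would then be stacked and passed to a linear or non-linear regression routine.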
New Optical Transforms For Statistical Image Recognition

NASA Astrophysics Data System (ADS)

Lee, Sing H.

1983-12-01

In the optical implementation of statistical image recognition, new optical transforms on large images for real-time recognition are of special interest. Several important linear transformations frequently used in statistical pattern recognition have now been implemented optically, including the Karhunen-Loeve transform (KLT), the Fukunaga-Koontz transform (FKT) and the least-squares linear mapping technique (LSLMT) [1-3]. The KLT performs principal component analysis on one class of patterns for feature extraction. The FKT performs feature extraction for separating two classes of patterns. The LSLMT separates multiple classes of patterns by maximizing the interclass differences and minimizing the intraclass variations.
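The KLT feature-extraction step described above amounts to projecting centered data onto the leading eigenvectors of the class covariance matrix. The sketch is digital rather than optical and assumes only the first principal component is wanted, found by power iteration; the test data lie on a line, so the expected direction (1, 2)/√5 is easy to check.

```python
import math


def mean_and_covariance(samples):
    """Sample mean vector and unbiased covariance matrix of a list of feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def leading_eigenvector(matrix, iterations=200):
    """Power iteration; assumes the all-ones start vector is not orthogonal to the answer."""
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def klt_first_component(samples):
    """Return the first KLT basis vector and each sample's coefficient on it."""
    mean, cov = mean_and_covariance(samples)
    basis = leading_eigenvector(cov)
    coeffs = [sum((s[j] - mean[j]) * basis[j] for j in range(len(basis))) for s in samples]
    return basis, coeffs


basis, coeffs = klt_first_component([(float(t), 2.0 * t) for t in range(-3, 4)])
print(basis, coeffs)
```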
Fractional Gaussian model in global optimization

NASA Astrophysics Data System (ADS)

Dimri, V. P.; Srivastava, R. P.

2009-12-01

The Earth system is inherently non-linear, and it can be characterized well if non-linearity is incorporated in the formulation and solution of the problem. A general tool often used to characterize the Earth system is inversion. Traditionally, inverse problems are solved using least-squares-based inversion by linearizing the formulation. The initial model in such inversion schemes is often assumed to follow a Gaussian posterior probability distribution. It is now well established that most physical properties of the Earth follow a power law (fractal distribution). Thus, selecting the initial model based on a power-law probability distribution will provide a more realistic solution. We present a new method that can draw samples from the posterior probability density function very efficiently using fractal-based statistics. The application of the method has been demonstrated by inverting band-limited seismic data with well control. We used a fractal-based probability density function, which uses the mean, variance and Hurst coefficient of the model space, to draw the initial model. This initial model is then used in a global optimization inversion scheme. Inversion results using initial models generated by our method give higher-resolution estimates of the model parameters than the gradient-based linear inversion method used hitherto.
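One way to draw a correlated initial model from a mean, a variance and a Hurst coefficient H is to sample fractional Gaussian noise, whose autocovariance at lag k is $\gamma(k) = \tfrac{\sigma^2}{2}\left(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right)$. The sketch below does this by Cholesky factorization of the covariance matrix. That this matches the authors' fractal-based density is an assumption, and the O(n³) factorization is only suitable for short model vectors (for example, a few hundred layers).

```python
import math
import random


def fgn_autocovariance(k, hurst, variance=1.0):
    """Autocovariance of fractional Gaussian noise at integer lag k (0 < H < 1)."""
    k = abs(k)
    return 0.5 * variance * ((k + 1) ** (2 * hurst) - 2 * k ** (2 * hurst)
                             + abs(k - 1) ** (2 * hurst))


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][p] * low[j][p] for p in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_fgn_model(n, mean, variance, hurst, seed=None):
    """Draw one length-n initial model with the given mean, variance and Hurst coefficient."""
    cov = [[fgn_autocovariance(i - j, hurst, variance) for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlate independent standard normals: x = mean + L z.
    return [mean + sum(low[i][p] * z[p] for p in range(i + 1)) for i in range(n)]
```

H > 0.5 gives persistent (smoother, long-range correlated) profiles; H = 0.5 reduces to uncorrelated Gaussian noise.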
Strengthen forensic entomology in court--the need for data exploration and the validation of a generalised additive mixed model.

PubMed

Baqué, Michèle; Amendt, Jens

2013-01-01

Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets do not take into account that immature blow flies grow in a non-linear fashion. Linear models do not provide sufficiently reliable age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistical tools such as smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data, to assure their quality and to show the importance of checking it carefully prior to conducting the statistical tests, and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using, for the first time, generalised additive mixed models.
Markov and semi-Markov switching linear mixed models used to identify forest tree growth components.

PubMed

Chaubert-Pereira, Florence; Guédon, Yann; Lavergne, Christian; Trottier, Catherine

2010-09-01

Tree growth is assumed to be mainly the result of three components: (i) an endogenous component assumed to be structured as a succession of roughly stationary phases separated by marked change points that are asynchronous among individuals, (ii) a time-varying environmental component assumed to take the form of synchronous fluctuations among individuals, and (iii) an individual component corresponding mainly to the local environment of each tree. To identify and characterize these three components, we propose to use semi-Markov switching linear mixed models, i.e., models that combine linear mixed models in a semi-Markovian manner. The underlying semi-Markov chain represents the succession of growth phases and their lengths (endogenous component), whereas the linear mixed models attached to each state of the underlying semi-Markov chain represent, in the corresponding growth phase, both the influence of time-varying climatic covariates (environmental component) as fixed effects and interindividual heterogeneity (individual component) as random effects. In this article, we address the estimation of Markov and semi-Markov switching linear mixed models in a general framework. We propose a Monte Carlo expectation-maximization-like algorithm whose iterations decompose into three steps: (i) sampling of state sequences given random effects, (ii) prediction of random effects given state sequences, and (iii) maximization. The proposed statistical modeling approach is illustrated by the analysis of successive annual shoots along Corsican pine trunks influenced by climatic covariates. © 2009, The International Biometric Society.
Optimizing cost-efficiency in mean exposure assessment - cost functions reconsidered

PubMed Central

2011-01-01

Background Reliable exposure data is a vital concern in medical epidemiology and intervention studies. The present study addresses the need of the medical researcher to spend the monetary resources devoted to exposure assessment with optimal cost-efficiency, i.e., to obtain the best possible statistical performance at a specified budget. A few previous studies have suggested mathematical optimization procedures based on very simple cost models; this study extends the methodology to cover non-linear cost scenarios as well. Methods Statistical performance, i.e., efficiency, was assessed in terms of the precision of an exposure mean value, as determined in a hierarchical, nested measurement model with three stages. Total costs were assessed using a corresponding three-stage cost model, allowing costs at each stage to vary non-linearly with the number of measurements according to a power function. Using these models, procedures for identifying the optimally cost-efficient allocation of measurements under a constrained budget were developed and applied to 225 scenarios combining different sizes of unit costs, cost function exponents, and exposure variance components. Results Explicit mathematical rules for identifying the optimal allocation could be developed when cost functions were linear, while non-linear cost functions implied that part or all of the optimization procedure had to be carried out using numerical methods. For many of the 225 scenarios, the optimal strategy consisted of measuring on only one occasion from each of as many subjects as allowed by the budget. Significant deviations from this principle occurred if costs for recruiting subjects were large compared with costs for setting up measurement occasions and, at the same time, the between-subjects to within-subject variance ratio was small. In these cases, non-linearities had a profound influence on the optimal allocation and on the eventual size of the exposure data set. 
Conclusions The analysis procedures developed in the present study can be used for informed design of exposure assessment strategies, provided that data are available on exposure variability and the costs of collecting and processing data. The present shortage of empirical evidence on costs and appropriate cost functions, however, impedes general conclusions on optimal exposure measurement strategies in different epidemiologic scenarios. PMID:21600023
Optimizing cost-efficiency in mean exposure assessment--cost functions reconsidered.

PubMed

Mathiassen, Svend Erik; Bolin, Kristian

2011-05-21

Determination of statistics for any rotation of axes of a bivariate normal elliptical distribution. [of wind vector components]

NASA Technical Reports Server (NTRS)

Falls, L. W.; Crutcher, H. L.

1976-01-01

Transformation of statistics from a dimensional set to another dimensional set involves linear functions of the original set of statistics. Similarly, linear functions will transform statistics within a dimensional set such that the new statistics are relevant to a new set of coordinate axes. A restricted case of the latter is the rotation of axes in a coordinate system involving any two correlated random variables. A special case is the transformation for horizontal wind distributions. Wind statistics are usually provided in terms of wind speed and direction (measured clockwise from north) or in east-west and north-south components. A direct application of this technique allows the determination of appropriate wind statistics parallel and normal to any preselected flight path of a space vehicle. Among the constraints for launching space vehicles are critical values selected from the distribution of the expected winds parallel to and normal to the flight path. These procedures are applied to space vehicle launches at Cape Kennedy, Florida.
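Because the rotated components are linear functions of the original ones, their means, variances and covariance follow directly from the rules for linear combinations of correlated variables. The sketch below assumes (these conventions are not from the source) that `u` is positive toward east, `v` is positive toward north, the flight-path heading is measured clockwise from north, and the normal component is positive to the right of the path.

```python
import math


def rotate_wind_stats(mean_u, mean_v, var_u, var_v, cov_uv, heading_deg):
    """Means, variances and covariance of wind components parallel (p) and
    normal (n) to a flight path, given bivariate statistics of east (u) and
    north (v) components. Conventions are assumptions, see the lead-in."""
    t = math.radians(heading_deg)
    a, b = math.sin(t), math.cos(t)    # p = a*u + b*v (unit vector along the path)
    c, d = math.cos(t), -math.sin(t)   # n = c*u + d*v (unit vector to the right)
    mean_p = a * mean_u + b * mean_v
    mean_n = c * mean_u + d * mean_v
    # Var(a*u + b*v) = a^2 Var(u) + 2ab Cov(u, v) + b^2 Var(v), and similarly.
    var_p = a * a * var_u + 2 * a * b * cov_uv + b * b * var_v
    var_n = c * c * var_u + 2 * c * d * cov_uv + d * d * var_v
    cov_pn = a * c * var_u + (a * d + b * c) * cov_uv + b * d * var_v
    return mean_p, mean_n, var_p, var_n, cov_pn
```

A rotation leaves the total variance `var_p + var_n` and the determinant `var_p * var_n - cov_pn**2` unchanged, which is a quick sanity check on any implementation.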
Statistical Calibration and Validation of a Homogeneous Ventilated Wall-Interference Correction Method for the National Transonic Facility

NASA Technical Reports Server (NTRS)

Walker, Eric L.

2005-01-01

Wind tunnel experiments will continue to be a primary source of validation data for many types of mathematical and computational models in the aerospace industry. The increased emphasis on accuracy of data acquired from these facilities requires understanding of the uncertainty of not only the measurement data but also any correction applied to the data. One of the largest and most critical corrections made to these data is due to wall interference. In an effort to understand the accuracy and suitability of these corrections, a statistical validation process for wall interference correction methods has been developed. This process is based on the use of independent cases which, after correction, are expected to produce the same result. Comparison of these independent cases with respect to the uncertainty in the correction process establishes a domain of applicability based on the capability of the method to provide reasonable corrections with respect to customer accuracy requirements. The statistical validation method was applied to the version of the Transonic Wall Interference Correction System (TWICS) recently implemented in the National Transonic Facility at NASA Langley Research Center. The TWICS code generates corrections for solid and slotted wall interference in the model pitch plane based on boundary pressure measurements. Before validation could be performed on this method, it was necessary to calibrate the ventilated wall boundary condition parameters. Discrimination comparisons are used to determine the most representative of three linear boundary condition models which have historically been used to represent longitudinally slotted test section walls. Of the three linear boundary condition models implemented for ventilated walls, the general slotted wall model was the most representative of the data. 
The TWICS code using the calibrated general slotted wall model was found to be valid to within the process uncertainty for test section Mach numbers less than or equal to 0.60. The scatter among the mean corrected results of the bodies of revolution validation cases was within one count of drag on a typical transport aircraft configuration for Mach numbers at or below 0.80 and two counts of drag for Mach numbers at or below 0.90.

Incidence of childhood leukaemia and non-Hodgkin's lymphoma in the vicinity of nuclear sites in Scotland, 1968-93.

PubMed Central

Sharp, L; Black, R J; Harkness, E F; McKinney, P A

1996-01-01

OBJECTIVES: The primary aims were to investigate the incidence of leukaemia and non-Hodgkin's lymphoma in children resident near seven nuclear sites in Scotland and to determine whether there was any evidence of a gradient in risk with distance of residence from a nuclear site. A secondary aim was to assess the power of statistical tests for increased risk of disease near a point source when applied in the context of census data for Scotland. METHODS: The study data set comprised 1287 cases of leukaemia and non-Hodgkin's lymphoma diagnosed in children aged under 15 years in the period 1968-93, validated for accuracy and completeness. A study zone around each nuclear site was constructed from enumeration districts within 25 km. Expected numbers were calculated, adjusting for sex, age, and indices of deprivation and urban-rural residence. Six statistical tests were evaluated. Stone's maximum likelihood ratio (unconditional application) was applied as the main test for general increased incidence across a study zone. The linear risk score based on enumeration districts (conditional application) was used as a secondary test for declining risk with distance from each site. RESULTS: More cases were observed (O) than expected (E) in the study zones around Rosyth naval base (O/E 1.02), Chapelcross electricity generating station (O/E 1.08), and Dounreay reprocessing plant (O/E 1.99). The maximum likelihood ratio test reached significance only for Dounreay (P = 0.030). The linear risk score test did not indicate a trend in risk with distance from any of the seven sites, including Dounreay. CONCLUSIONS: There was no evidence of a generally increased risk of childhood leukaemia and non-Hodgkin's lymphoma around nuclear sites in Scotland, nor any evidence of a trend of decreasing risk with distance from any of the sites. There was a significant excess risk in the zone around Dounreay, which was only partially accounted for by the sociodemographic characteristics of the area. 
The statistical power of tests for localised increased risk of disease around a point source should be assessed in each new setting in which they are applied. PMID:8994402
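A conditional linear score test for declining risk with distance can be sketched as follows. Conditional on the total number of cases, the null hypothesis places cases in zones multinomially with probabilities proportional to the expected counts; the statistic `T = sum(s_i * O_i)` is then standardized with its exact null mean and variance. The abstract does not give the scores used, so the reciprocal-rank scores in the example (nearest zone first) are an assumption, and the normal approximation for the p-value is a simplification that may be poor for small counts.

```python
import math


def linear_risk_score_test(observed, expected, scores):
    """Conditional linear score test for a trend in risk across zones.

    Under H0, given N = sum(observed), counts are multinomial with
    p_i = E_i / sum(E). T = sum(s_i * O_i) has mean N * sum(s_i p_i) and
    variance N * (sum(s_i^2 p_i) - (sum(s_i p_i))^2).
    Returns (T, z, one-sided upper-tail p-value from the normal approximation).
    """
    n = sum(observed)
    total_e = sum(expected)
    p = [e / total_e for e in expected]
    t = sum(s * o for s, o in zip(scores, observed))
    mean_s = sum(s * pi for s, pi in zip(scores, p))
    mean_s2 = sum(s * s * pi for s, pi in zip(scores, p))
    null_mean = n * mean_s
    null_var = n * (mean_s2 - mean_s ** 2)
    z = (t - null_mean) / math.sqrt(null_var)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return t, z, p_value
```

For instance, with hypothetical counts `observed=[6, 4, 2]`, `expected=[2, 4, 6]` and scores `[1, 1/2, 1/3]`, the excess in the nearest zone gives a positive `z`; when observed counts equal expected counts, `z` is zero.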
Statistical aspects of solar flares

NASA Technical Reports Server (NTRS)

Wilson, Robert M.

1987-01-01

A survey of the statistical properties of 850 H alpha solar flares during 1975 is presented, and the results are compared with those reported elsewhere for different epochs. Distributions of rise time, decay time, and duration are given, as are the mean, mode, median, and 90th percentile values. Proportions by selected groupings are also determined. For flares in general, mean values for rise time, decay time, and duration are 5.2 ± 0.4 min, and 18.1 ± 1.1 min, respectively. Subflares, accounting for nearly 90 percent of the flares, had mean values lower than those found for flares of H alpha importance greater than 1, and the differences are statistically significant. Likewise, flares of bright and normal relative brightness have mean decay times and durations that are significantly longer than those computed for faint flares, and mass-motion related flares are significantly longer than non-mass-motion related flares. Seventy-three percent of the mass-motion related flares are categorized as two-ribbon flares and/or are accompanied by a high-speed dark filament. Slow rise time flares (rise time greater than 5 min) have a mean duration that is significantly longer than that of fast rise time flares, and long-lived flares (duration greater than 18 min) have a mean rise time that is significantly longer than that of short-lived flares, suggesting a positive linear relationship between rise time and duration. Monthly occurrence rates for flares in general and by group are found to be positively and linearly related to monthly sunspot number. Statistical testing shows the association between sunspot number and number of flares to be significant at the 95 percent confidence level, and the t statistic for the slope is significant at greater than the 99 percent confidence level.
Depending on the specific fit, between 58 percent and 94 percent of the variation can be accounted for by the linear fits. A statistically significant Northern Hemisphere flare excess (P less than 1 percent) was found, as was a Western Hemisphere excess (P approx 3 percent). Subflares were more prolific within 45 deg of central meridian (P less than 1 percent), while flares of H alpha importance ≥ 1 were more prolific near the limbs (more than 45 deg from central meridian; P approx 2 percent). Two-ribbon flares were more frequent within 45 deg of central meridian (P less than 1 percent). Slow rise time flares occurred more frequently in the western hemisphere (P approx 2 percent), as did short-lived flares (P approx 9 percent), but fast rise time flares were not preferentially distributed (in terms of east-west or limb-disk). Long-lived flares occurred more often within 45 deg of central meridian (P approx 7 percent). Mean durations for subflares and for flares of H alpha importance ≥ 1 found within 45 deg of central meridian are 14 percent and 70 percent longer, respectively, than those for flares closer to the limb. Compared with flares occurring near cycle maximum, the flares of 1975 (near solar minimum) have significantly shorter mean rise times, decay times, and durations. On average, a flare near solar maximum lasts about 1.6 times as long as one occurring near solar minimum.
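The regression quantities quoted above (the slope of monthly flare counts against monthly sunspot number, the t statistic for that slope, and the share of variation explained) can be computed with an ordinary least-squares sketch like the one below. The data in the example are illustrative only, not the 1975 flare counts.

```python
import math


def simple_linear_fit(x, y):
    """OLS fit y = b0 + b1*x. Returns (b0, b1, R^2, t statistic for the slope);
    the t statistic is referred to a t distribution with n - 2 degrees of freedom."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / syy                       # fraction of variation explained
    se_b1 = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    t = b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1)
    return b0, b1, r2, t
```

For `x = [1, 2, 3, 4]` and `y = [2, 4, 5, 8]` this gives a slope of 1.9, R² of about 0.963 and a slope t statistic of about 7.18 on 2 degrees of freedom.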
Birth weight, current anthropometric markers, and high sensitivity C-reactive protein in Brazilian school children.

PubMed

Boscaini, Camile; Pellanda, Lucia Campos

2015-01-01

Studies have shown associations of birth weight with increased concentrations of high sensitivity C-reactive protein. This study assessed the relationship between birth weight, anthropometric and metabolic parameters during childhood, and high sensitivity C-reactive protein. A total of 612 Brazilian school children aged 5-13 years were included in the study. High sensitivity C-reactive protein was measured by particle-enhanced immunonephelometry. Nutritional status was assessed by body mass index, waist circumference, and skinfolds. Total cholesterol and fractions, triglycerides, and glucose were measured by enzymatic methods. Insulin sensitivity was determined by the homeostasis model assessment method. Statistical analysis included the chi-square test, the General Linear Model, and the General Linear Model for Gamma Distribution. Body mass index, waist circumference, and skinfolds were directly associated with birth weight (P < 0.001, P = 0.001, and P = 0.015, respectively). Large for gestational age children showed higher high sensitivity C-reactive protein levels (P < 0.001) than small for gestational age children. High birth weight is associated with higher levels of high sensitivity C-reactive protein, body mass index, waist circumference, and skinfolds. Being large for gestational age altered high sensitivity C-reactive protein and constituted an additional risk factor for atherosclerosis in these school children, independent of current nutritional status.
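Positive, right-skewed outcomes such as C-reactive protein concentrations are often modeled with a gamma-family generalized linear model. The abstract's "General Linear Model for Gamma Distribution" is assumed here to mean such a model, and the log link and single covariate are further assumptions. The sketch fits it by iteratively reweighted least squares (IRLS); with a gamma family and log link, every IRLS weight equals 1, so each step is an ordinary least-squares fit to the working response `z = eta + (y - mu) / mu`.

```python
import math


def gamma_glm_log_link(x, y, iterations=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1*x for positive y with a gamma GLM via IRLS.

    Gamma variance V(mu) = mu^2 and log link give IRLS weights of 1, so each
    iteration regresses the working response z on x by ordinary least squares.
    """
    n = len(x)
    b0, b1 = math.log(sum(y) / n), 0.0   # start from the intercept-only fit
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        mz = sum(z) / n
        new_b1 = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
        new_b0 = mz - new_b1 * mx
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1
```

On noise-free data generated as `y = exp(0.5 + 0.2 * x)`, the fit recovers `b0 = 0.5` and `b1 = 0.2`; the coefficient `b1` is then read as a log ratio of means per unit of the covariate.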
Comparison Of Eigenvector-Based Statistical Pattern Recognition Algorithms For Hybrid Processing

NASA Astrophysics Data System (ADS)

Tian, Q.; Fainman, Y.; Lee, Sing H.

1989-02-01

The pattern recognition algorithms based on eigenvector analysis (group 2) are compared theoretically and experimentally in this part of the paper. Group 2 consists of the Foley-Sammon (F-S) transform, the Hotelling trace criterion (HTC), the Fukunaga-Koontz (F-K) transform, the linear discriminant function (LDF) and the generalized matched filter (GMF). It is shown that all eigenvector-based algorithms can be represented in a generalized eigenvector form; however, the discriminant vectors are calculated differently for different algorithms. Summaries of how to calculate the discriminant functions for the F-S, HTC and F-K transforms are provided. Especially in the more practical, underdetermined case, where the number of training images is less than the number of pixels in each image, the calculations usually require the inversion of a large, singular pixel correlation (or covariance) matrix. We suggest solving this problem by finding its pseudo-inverse, which requires inverting only the smaller, non-singular image correlation (or covariance) matrix plus multiplying several non-singular matrices. We also compare theoretically the classification effectiveness of the discriminant functions from F-S, HTC and F-K with that of LDF and GMF, and compare the linear-mapping-based algorithms with the eigenvector-based algorithms. Experimentally, we compare the eigenvector-based algorithms using a set of image databases, each image consisting of 64 x 64 pixels.
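One standard way to obtain that pseudo-inverse is the identity `(X X^T)^+ = X (X^T X)^{-2} X^T`, valid when the d x n data matrix `X` (columns are vectorized training images, n < d) has full column rank. Only the n x n image correlation `G = X^T X` is inverted. The sketch below uses this identity; the paper's exact sequence of matrix products may differ, and the helper names are hypothetical.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting for a small non-singular matrix."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def pixel_corr_pinv(x):
    """Pseudo-inverse of the singular d x d pixel correlation X X^T, using only
    the inverse of the n x n image correlation G = X^T X:
        (X X^T)^+ = X G^{-1} G^{-1} X^T   (X of full column rank)."""
    g_inv = inverse(matmul(transpose(x), x))
    return matmul(matmul(x, matmul(g_inv, g_inv)), transpose(x))
```

The result can be checked against the Moore-Penrose conditions, e.g. `A P A = A` and `P A P = P` with `A = X X^T`; for 64 x 64 images this replaces a 4096 x 4096 inversion with an n x n one.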
Determination of the efficiency of diets for larval development in mass rearing Aedes aegypti (Diptera: Culicidae).

PubMed

Gunathilaka, P A D H N; Uduwawala, U M H U; Udayanga, N W B A L; Ranathunge, R M T B; Amarasinghe, L D; Abeyewickreme, W

2017-11-23

Larval diet quality and rearing conditions have a direct and irreversible effect on adult traits. Therefore, the current study was carried out to optimize the larval diet for mass rearing of Aedes aegypti for Sterile Insect Technique (SIT)-based applications in Sri Lanka. Five batches of 750 first instar larvae (L1) of Ae. aegypti were exposed to five different concentrations (2-10%) of the larval diet recommended by the International Atomic Energy Agency (IAEA). Morphological development parameters of larvae, pupae, and adults were recorded at 24 h intervals along with selected growth parameters. Each experiment was replicated five times. General Linear Modeling along with Pearson's correlation analysis was used for the statistical analysis. Using General Linear Modeling, significant differences (P < 0.05) among larvae treated with different concentrations were found at all stages, namely in: total body length and thoracic length of larvae; cephalothoracic length and width of pupae; thoracic length, thoracic width, abdominal length and wing length of adults; along with pupation rate and success, sex ratio, adult success, fecundity and hatching rate of Ae. aegypti. The best-quality adults can be produced at a larval diet concentration of 10%. However, the 8% larval diet concentration was most suitable for adult male survival.
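In its simplest form, a general linear model with diet concentration as the only categorical factor reduces to a one-way analysis of variance. The sketch below computes the F statistic for that case; the study's models may include further terms, and the example data are illustrative.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a general linear model with a
    single categorical factor (e.g. one group per diet concentration)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For two groups `[1, 2, 3]` and `[4, 5, 6]`, the function returns `F = 13.5` on (1, 4) degrees of freedom; the p-value then comes from the F distribution.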
Development of the Complex General Linear Model in the Fourier Domain: Application to fMRI Multiple Input-Output Evoked Responses for Single Subjects

PubMed Central

Rio, Daniel E.; Rawlings, Robert R.; Woltz, Lawrence A.; Gilman, Jodi; Hommer, Daniel W.

2013-01-01

A linear time-invariant model based on statistical time series analysis in the Fourier domain for single subjects is further developed and applied to functional MRI (fMRI) blood-oxygen level-dependent (BOLD) multivariate data. This methodology was originally developed to analyze multiple stimulus input evoked response BOLD data. However, to analyze clinical data generated using a repeated measures experimental design, the model has been extended to handle multivariate time series data and demonstrated on control and alcoholic subjects taken from data previously analyzed in the temporal domain. Analysis of BOLD data is typically carried out in the time domain, where the data have high temporal correlation. These analyses generally employ parametric models of the hemodynamic response function (HRF), where prewhitening of the data is attempted using autoregressive (AR) models for the noise. Alternatively, these data can be analyzed in the Fourier domain. There, the assumptions made on the noise structure are less restrictive, and hypothesis tests can be constructed based on voxel-specific nonparametric estimates of the hemodynamic transfer function (the HRF in the Fourier domain). This is especially important for experimental designs involving multiple states (either stimulus or drug induced) that may alter the form of the response function. PMID:23840281
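The core idea of a Fourier-domain linear model is that, at each frequency, the response is a complex multiple of the input, `Y(f) = H(f) X(f) + noise`, so the transfer function can be estimated nonparametrically by complex least squares pooled over data segments. This generic single-input sketch is not the authors' complex general linear model: it assumes equal-length segments, no windowing or overlap, and constructs no hypothesis tests.

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def transfer_function(input_segments, output_segments, eps=1e-12):
    """Complex least-squares estimate of H(f) in Y(f) = H(f) X(f) + noise,
    pooled over segments: H = sum(conj(X) * Y) / sum(|X|^2) per frequency.
    Frequencies with no input power are returned as NaN."""
    n = len(input_segments[0])
    num = [0j] * n
    den = [0.0] * n
    for xs, ys in zip(input_segments, output_segments):
        X, Y = dft(xs), dft(ys)
        for k in range(n):
            num[k] += X[k].conjugate() * Y[k]
            den[k] += abs(X[k]) ** 2
    return [num[k] / den[k] if den[k] > eps else complex("nan") for k in range(n)]
```

If a segment's output is the circular convolution of its input with an impulse response `h`, the estimate equals the DFT of `h` at every frequency where the input has power, which makes a convenient correctness check.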
Development of the complex general linear model in the Fourier domain: application to fMRI multiple input-output evoked responses for single subjects.

PubMed

Rio, Daniel E; Rawlings, Robert R; Woltz, Lawrence A; Gilman, Jodi; Hommer, Daniel W

2013-01-01

Social networking policies in nursing education.

PubMed

Frazier, Blake; Culley, Joan M; Hein, Laura C; Williams, Amber; Tavakoli, Abbas S

2014-03-01

Social networking use has increased exponentially in the past few years. A literature review related to social networking and nursing revealed a research gap between nursing practice and education. Although information was available on the appropriate use of social networking sites, there was limited research on the use of social networking policies within nursing education. The purpose of this study was to identify the current use of social media by faculty and students and the need for policies within nursing education at one institution. A survey was developed and administered to nursing students (n = 273) and nursing faculty (n = 33). Inferential statistics included χ², Fisher exact test, t test, and General Linear Model. Cronbach's α was used to assess the internal consistency of the social media scales. The χ² results indicate associations between group and several social media items. t Test results indicate significant differences between students and faculty in the averages for "policies are good" (P = .0127), "policies and discipline" (P = .0315), and "policy at the study school" (P = .0013). General Linear Model analyses revealed significant differences by class level for "friend" a patient with a bond, unprofessional posts, policy, and nursing. Results showed that students and faculty supported the development of a social networking policy.
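Cronbach's α, used above for internal consistency, compares the sum of the item variances with the variance of the total scale score: α = k/(k − 1) · (1 − Σ var(item) / var(total)). A minimal sketch, assuming complete responses and using population variances consistently (the ratio is the same with sample variances):

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale of k items.

    items: list of k lists, each holding one score per respondent (same order).
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))
```

Two identical items give α = 1, while two uncorrelated items give α = 0; values in between reflect how strongly the items co-vary.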
Automatic optimal filament segmentation with sub-pixel accuracy using generalized linear models and B-spline level-sets.

PubMed

Xiao, Xun; Geyer, Veikko F; Bowne-Anderson, Hugo; Howard, Jonathon; Sbalzarini, Ivo F

2016-08-01

Biological filaments, such as actin filaments, microtubules, and cilia, are often imaged using different light-microscopy techniques. Reconstructing the filament curve from the acquired images constitutes the filament segmentation problem. Since filaments have lower dimensionality than the image itself, there is an inherent trade-off between tracing the filament with sub-pixel accuracy and avoiding noise artifacts. Here, we present a globally optimal filament segmentation method based on B-spline vector level-sets and a generalized linear model for the pixel intensity statistics. We show that the resulting optimization problem is convex and can hence be solved with global optimality. We introduce a simple and efficient algorithm to compute such optimal filament segmentations, and provide an open-source implementation as an ImageJ/Fiji plugin. We further derive an information-theoretic lower bound on the filament segmentation error, quantifying how well an algorithm could possibly do given the information in the image. We show that our algorithm asymptotically reaches this bound in the spline coefficients. We validate our method in comprehensive benchmarks, compare with other methods, and show applications from fluorescence, phase-contrast, and dark-field microscopy. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Spatio-Chromatic Adaptation via Higher-Order Canonical Correlation Analysis of Natural Images

PubMed Central

Gutmann, Michael U.; Laparra, Valero; Hyvärinen, Aapo; Malo, Jesús

2014-01-01

Independent component and canonical correlation analysis are two general-purpose statistical methods with wide applicability. In neuroscience, independent component analysis of chromatic natural images explains the spatio-chromatic structure of primary cortical receptive fields in terms of properties of the visual environment. Similarly, canonical correlation analysis explains chromatic adaptation to different illuminations. But, as we show in this paper, neither of the two methods generalizes well to explain both spatio-chromatic processing and adaptation at the same time. We propose a statistical method that combines the desirable properties of independent component and canonical correlation analysis: it finds independent components in each data set which, across the two data sets, are related to each other via linear or higher-order correlations. The new method is as widely applicable as canonical correlation analysis, and it also applies to more than two data sets. We call it higher-order canonical correlation analysis. When it was applied to chromatic natural images, we found that it provides a single (unified) statistical framework which accounts for both spatio-chromatic processing and adaptation. Filters with spatio-chromatic tuning properties as in the primary visual cortex emerged, and corresponding-colors psychophysics was reproduced reasonably well. We used the new method to make a theory-driven testable prediction of how the neural response to colored patterns should change when the illumination changes. We predict shifts in the responses which are comparable to the shifts reported for chromatic contrast habituation. PMID:24533049
Spatio-chromatic adaptation via higher-order canonical correlation analysis of natural images.

PubMed

Gutmann, Michael U; Laparra, Valero; Hyvärinen, Aapo; Malo, Jesús

2014-01-01

An Unbiased Estimator of Gene Diversity with Improved Variance for Samples Containing Related and Inbred Individuals of any Ploidy

PubMed Central

Harris, Alexandre M.; DeGiorgio, Michael

2016-01-01

Gene diversity, or expected heterozygosity (H), is a common statistic for assessing genetic variation within populations. Estimation of this statistic decreases in accuracy and precision when individuals are related or inbred, due to increased dependence among allele copies in the sample. The original unbiased estimator of expected heterozygosity underestimates true population diversity in samples containing relatives, as it only accounts for sample size. More recently, a general unbiased estimator of expected heterozygosity was developed that explicitly accounts for related and inbred individuals in samples. Though unbiased, this estimator’s variance is greater than that of the original estimator. To address this issue, we introduce a general unbiased estimator of gene diversity for samples containing related or inbred individuals, which employs the best linear unbiased estimator of allele frequencies, rather than the commonly used sample proportion. We examine the properties of this estimator, H̃_BLUE, relative to alternative estimators using simulations and theoretical predictions, and show that it predominantly has the smallest mean squared error relative to others. Further, we empirically assess the performance of H̃_BLUE on a global human microsatellite dataset of 5795 individuals, from 267 populations, genotyped at 645 loci. Additionally, we show that the improved variance of H̃_BLUE leads to improved estimates of the population differentiation statistic, FST, which employs measures of gene diversity within its calculation. Finally, we provide an R script, BestHet, to compute this estimator from genomic and pedigree data. PMID:28040781
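For reference, the "original" estimator that corrects only for sample size is the familiar n/(n − 1) · (1 − Σ p_i²), where p_i are sample allele proportions and n is the number of sampled allele copies (so ploidy enters only through n). The sketch below implements that baseline under the assumption of unrelated, non-inbred individuals; it is not the H̃_BLUE estimator, which additionally needs kinship or pedigree information and is what the authors' BestHet script computes.

```python
from collections import Counter


def unbiased_gene_diversity(allele_copies):
    """Sample-size-corrected estimator of expected heterozygosity.

    allele_copies: one entry per sampled allele copy (two per diploid, and so on).
    Returns n / (n - 1) * (1 - sum(p_i^2)), with p_i the sample allele proportions.
    """
    n = len(allele_copies)
    counts = Counter(allele_copies)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - homozygosity)
```

For two diploids carrying alleles `A, A, B, B`, the naive value 1 − Σ p_i² is 0.5, and the correction raises it to 2/3.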
Statistical Tests of System Linearity Based on the Method of Surrogate Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hunter, N.; Paez, T.; Red-Horse, J.

When dealing with measured data from dynamic systems, we often make the tacit assumption that the data are generated by linear dynamics. While some systematic tests for linearity and determinism are available (for example, the coherence function, the probability density function, and the bispectrum), further tests that quantify the existence and the degree of nonlinearity are clearly needed. In this paper we demonstrate a statistical test for the nonlinearity exhibited by a dynamic system excited by Gaussian random noise. We perform the usual division of the input and response time series data into blocks, as required by the Welch method of spectrum estimation, and search for significant relationships between a given input frequency and the response at harmonics of the selected input frequency. We argue that systematic tests based on the recently developed statistical method of surrogate data readily detect significant nonlinear relationships. The paper elucidates the method of surrogate data. Typical results are illustrated for a linear single degree-of-freedom system and for a system with polynomial stiffness nonlinearity.
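A common form of the surrogate-data method builds phase-randomized surrogates: series with the same amplitude spectrum as the data but random Fourier phases, i.e. realizations consistent with a linear Gaussian process. A discriminating statistic computed on the data is then ranked against its values on the surrogates. The sketch below tests a single series and uses time-reversal asymmetry as the statistic; both are stand-in assumptions, since the paper's test looks at input-output relationships at harmonics of a selected input frequency.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Plain O(N^2) DFT (unnormalized forward transform, 1/N inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def phase_randomized_surrogate(x, rng):
    """Series with the amplitude spectrum of x but uniformly random phases."""
    n = len(x)
    spec = dft(x)
    out = list(spec)                       # DC (and Nyquist, if n is even) kept
    for k in range(1, (n + 1) // 2):       # positive frequencies below Nyquist
        out[k] = abs(spec[k]) * cmath.exp(1j * rng.uniform(0, 2 * math.pi))
        out[n - k] = out[k].conjugate()    # Hermitian symmetry keeps it real
    return [v.real for v in dft(out, inverse=True)]


def time_asymmetry(x):
    """Time-reversal asymmetry; its expected value is zero for linear Gaussian series."""
    return sum((x[t + 1] - x[t]) ** 3 for t in range(len(x) - 1)) / (len(x) - 1)


def surrogate_test(x, statistic, n_surrogates=39, seed=0):
    """Rank-based p-value: share of surrogates whose |statistic| is at least
    as large as that of the data, with the usual +1 correction."""
    rng = random.Random(seed)
    s0 = abs(statistic(x))
    exceed = sum(abs(statistic(phase_randomized_surrogate(x, rng))) >= s0
                 for _ in range(n_surrogates))
    return (exceed + 1) / (n_surrogates + 1)
```

With 39 surrogates, the smallest attainable p-value is 1/40 = 0.025, which is one reason the number of surrogates is usually chosen together with the desired significance level.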
An analysis of a large dataset on immigrant integration in Spain. The Statistical Mechanics perspective on Social Action

NASA Astrophysics Data System (ADS)

Barra, Adriano; Contucci, Pierluigi; Sandell, Rickard; Vernia, Cecilia

2014-02-01

How does immigrant integration in a country change with immigration density? Guided by a statistical mechanics perspective, we propose a novel approach to this problem. The analysis focuses on classical integration quantifiers such as the percentage of jobs (temporary and permanent) given to immigrants, mixed marriages, and newborns with parents of mixed origin. We find that the average values of different quantifiers may exhibit either linear or non-linear growth with immigrant density, and we suggest that social action, a concept identified by Max Weber, causes the observed non-linearity. Using the statistical mechanics notion of interaction to quantitatively emulate social action, a unified mathematical model for integration is proposed and shown to explain both observed growth behaviors. The linear theory, by contrast, which ignores the possibility of interaction effects, would underestimate the quantifiers by up to 30% when immigrant densities are low and overestimate them by as much when densities are high. The capacity to quantitatively isolate different types of integration mechanisms makes our framework a suitable tool in the quest for more efficient integration policies.
Comparing The Effectiveness of a90/95 Calculations (Preprint)

DTIC Science & Technology

2006-09-01

Nachtsheim, John Neter, William Li, Applied Linear Statistical Models, 5th ed., McGraw-Hill/Irwin, 2005. 5. Mood, Graybill and Boes, Introduction ... curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model: There ... linear. For example, ... is a linear model; ... is not. 2. Uniform variance (homoscedasticity
The Inverse Problem for Confined Aquifer Flow: Identification and Estimation With Extensions

NASA Astrophysics Data System (ADS)

Loaiciga, Hugo A.; Mariño, Miguel A.

1987-01-01

The contributions of this work are twofold. First, a methodology for estimating the elements of parameter matrices in the governing equation of flow in a confined aquifer is developed. The estimation techniques for the distributed-parameter inverse problem pertain to linear least squares and generalized least squares methods. The linear relationship among the known heads and unknown parameters of the flow equation provides the background for developing criteria for determining the identifiability status of unknown parameters. Under conditions of exact identification or overidentification, it is possible to develop statistically consistent parameter estimators and their asymptotic distributions. The estimation techniques, namely two-stage least squares and three-stage least squares, are applied to a specific groundwater inverse problem and compared with each other and with an ordinary least squares estimator. The three-stage estimator provides the closest approximation to the actual parameter values, but it also shows relatively large standard errors compared with the ordinary and two-stage estimators. The estimation techniques provide the parameter matrices required to simulate the unsteady groundwater flow equation. Second, a nonlinear maximum likelihood estimation approach to the inverse problem is presented. The statistical properties of maximum likelihood estimators are derived, and a procedure to construct confidence intervals and carry out hypothesis tests is given. The relative merits of the linear and maximum likelihood estimators are analyzed. Other topics relevant to the identification and estimation methodologies are also discussed: a continuous-time solution to the flow equation, coping with noise-corrupted head measurements, and extension of the developed theory to nonlinear cases. A simulation study is used to evaluate the methods developed in this study.
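The sketch below shows two-stage least squares in its simplest form: one endogenous regressor, one instrument, and synthetic data. It is a generic single-equation illustration with assumed variable names (`z`, `x`, `y`). It is not the authors' aquifer formulation, and it does not cover the three-stage (system) estimator.

```python
import random


def ols_slope_intercept(x, y):
    """Simple least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def two_stage_least_squares(z, x, y):
    """2SLS with one endogenous regressor x and one instrument z.

    Stage 1: regress x on z and keep the fitted values x_hat.
    Stage 2: regress y on x_hat; the slope is the 2SLS estimate.
    """
    a1, b1 = ols_slope_intercept(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    return ols_slope_intercept(x_hat, y)


# Hypothetical data: u is an unobserved error shared by x and y, so OLS is biased.
rng = random.Random(42)
n = 5000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
u = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [zi + ui + rng.gauss(0.0, 1.0) for zi, ui in zip(z, u)]
y = [2.0 * xi + ui + rng.gauss(0.0, 1.0) for xi, ui in zip(x, u)]
ols_b = ols_slope_intercept(x, y)[1]         # pulled toward 2 + cov(x, u) / var(x) = 2 + 1/3
iv_b = two_stage_least_squares(z, x, y)[1]   # close to the true slope 2.0
print(ols_b, iv_b)
```

With a single instrument, the 2SLS slope equals the ratio cov(z, y) / cov(z, x). Writing the two stages out explicitly mirrors the way the method generalizes to several instruments.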
Statistical treatment for the wet bias in tree-ring chronologies: A case study from the Interior West, USA

Treesearch

Yan Sun; Matthew F. Bekker; R. Justin DeRose; Roger Kjelgren; S. -Y. Simon Wang

2017-01-01

Dendroclimatic research has long assumed a linear relationship between tree-ring increment and climate variables. However, ring width frequently underestimates extremely wet years, a phenomenon we refer to as "wet bias". In this paper, we present statistical evidence for wet bias that is obscured by the assumption of linearity. To improve tree-ring-climate modeling, we...
Incorporating signal-dependent noise for hyperspectral target detection

NASA Astrophysics Data System (ADS)

Morman, Christopher J.; Meola, Joseph

2015-05-01

The majority of hyperspectral target detection algorithms are developed from statistical data models employing stationary background statistics or white Gaussian noise models. Stationary background models are inaccurate as a result of two separate physical processes. First, varying background classes that possess different clutter statistics often exist in the imagery. Many algorithms can account for this variability through the use of subspaces or clustering techniques. The second physical process, which is often ignored, is a signal-dependent sensor noise term. For the photon-counting sensors often used in hyperspectral imaging systems, sensor noise increases with the measured signal level as a result of Poisson random processes. This work investigates the impact of this sensor noise on target detection performance. A linear noise model is developed that describes sensor noise variance as a linear function of signal level. The linear noise model is then incorporated into target detection using data collected at Wright-Patterson Air Force Base.
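Here is a minimal sketch of the linear noise model: noise variance is fitted as a + b * (signal level) by least squares from repeated measurements at several levels. The function names, the true coefficients, and the Gaussian approximation to signal-dependent (Poisson-like) noise are all assumptions for illustration. The paper's actual fitting procedure and detector are not reproduced here.

```python
import math
import random


def fit_linear_noise_model(levels, variances):
    """Least-squares fit of variance = a + b * level; returns (a, b)."""
    n = len(levels)
    ml = sum(levels) / n
    mv = sum(variances) / n
    b = (sum((l - ml) * (v - mv) for l, v in zip(levels, variances))
         / sum((l - ml) ** 2 for l in levels))
    return mv - b * ml, b


def noise_std(level, a, b):
    """Predicted sensor-noise standard deviation at a given signal level."""
    return math.sqrt(max(a + b * level, 0.0))


# Hypothetical repeated measurements: variance grows linearly with signal level.
rng = random.Random(7)
true_a, true_b = 4.0, 0.5
levels = [100, 200, 400, 800, 1600]
variances = []
for mu in levels:
    sd = math.sqrt(true_a + true_b * mu)
    samples = [rng.gauss(mu, sd) for _ in range(2000)]
    m = sum(samples) / len(samples)
    variances.append(sum((s - m) ** 2 for s in samples) / (len(samples) - 1))
a_hat, b_hat = fit_linear_noise_model(levels, variances)
print(a_hat, b_hat, noise_std(1000, a_hat, b_hat))
```

In a detector, `noise_std` could be used to scale each pixel's residual before thresholding. Other ways of incorporating the model are possible.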
Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

NASA Astrophysics Data System (ADS)

Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.

2016-01-01

Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.
Dark energy and modified gravity in the Effective Field Theory of Large-Scale Structure

NASA Astrophysics Data System (ADS)

Cusin, Giulia; Lewandowski, Matthew; Vernizzi, Filippo

2018-04-01

We develop an approach to compute observables beyond the linear regime of dark matter perturbations for general dark energy and modified gravity models. We do so by combining the Effective Field Theory of Dark Energy and Effective Field Theory of Large-Scale Structure approaches. In particular, we parametrize the linear and nonlinear effects of dark energy on dark matter clustering in terms of the Lagrangian terms introduced in a companion paper [1], focusing on Horndeski theories and assuming the quasi-static approximation. The Euler equation for dark matter is sourced, via the Newtonian potential, by new nonlinear vertices due to modified gravity and, as in the pure dark matter case, by the effects of short-scale physics in the form of the divergence of an effective stress tensor. The effective fluid introduces a counterterm in the solution to the matter continuity and Euler equations, which allows a controlled expansion of clustering statistics on mildly nonlinear scales. We use this setup to compute the one-loop dark-matter power spectrum.

Feasibility and acceptability of cell phone diaries to measure HIV risk behavior among female sex workers.

PubMed

Roth, Alexis M; Hensel, Devon J; Fortenberry, J Dennis; Garfein, Richard S; Gunn, Jayleen K L; Wiehe, Sarah E

2014-12-01

Individual, social, and structural factors affecting HIV risk behaviors among female sex workers (FSWs) are difficult to assess using retrospective survey methods. To test the feasibility and acceptability of cell phone diaries for collecting information about sexual events, we recruited 26 FSWs in Indianapolis, Indiana (US). Over 4 weeks, FSWs completed twice-daily digital diaries about their mood, drug use, sexual interactions, and daily activities. Feasibility was assessed using repeated-measures general linear modeling, and descriptive statistics were used to examine event-level contextual information and acceptability. Of 1,420 diaries expected, 90.3% were completed by participants, and compliance was stable over time (p > .05 for linear trend). Sexual behavior was captured in 22% of diaries, and participant satisfaction with diary data collection was high. These data provide insight into event-level factors impacting HIV risk among FSWs. We discuss implications for models of sexual behavior and individually tailored interventions to prevent HIV in this high-risk group.
Predicting major element mineral/melt equilibria - A statistical approach

NASA Technical Reports Server (NTRS)

Hostetler, C. J.; Drake, M. J.

1980-01-01

Empirical equations have been developed for calculating the mole fractions of NaO0.5, MgO, AlO1.5, SiO2, KO0.5, CaO, TiO2, and FeO in a solid phase of initially unknown identity, given only the composition of the coexisting silicate melt. The approach involves a linear multivariate regression analysis in which solid composition is expressed as a Taylor series expansion of the liquid compositions. An internally consistent precision of approximately 0.94 is obtained; that is, the nature of the liquidus phase in the input data set can be correctly predicted for approximately 94% of the entries. The composition of the liquidus phase may be calculated to better than 5 mol % absolute. An important feature of this 'generalized solid' model is its reversibility; that is, the dependent and independent variables in the linear multivariate regression may be inverted to permit prediction of the composition of a silicate liquid produced by equilibrium partial melting of a polymineralic source assemblage.
Martian lineaments from Mariner 6 and 7 photographs

NASA Technical Reports Server (NTRS)

Schultz, P. H.; Ingerson, F. E.

1973-01-01

Mariner 6 and 7 photographs were used to investigate the nature and importance of linear surface trends on Mars. Cross correlations of frequency-azimuth distributions of linear trends from different Mariner frames indicate that lineations not recognized as topographic features have a component of pseudoforms, probably introduced during digital reconstruction of the pictures. Similar statistical tests may aid in the analysis of surface trends from future satellites and space probes. The most reliable data were separated into photometrically defined provinces. Meridiani Sinus and Margaritifer Sinus display five major trends in common, which are interpreted as extensions of crustal weaknesses related to the enormous equatorial canyon revealed in Mariner 6 and 9 pictures. Alignments of crater wall segments generally match these trends and suggest structural control of crater plan. Crater chains, however, do not match these trends and are interpreted as secondary impacts. Rose diagrams of lineations in Deucalionis Regio exhibit much more complexity and are believed to reflect a better preserved or more complex geologic history.
Improvements in aircraft extraction programs

NASA Technical Reports Server (NTRS)

Balakrishnan, A. V.; Maine, R. E.

1976-01-01

Flight data from an F-8 Corsair and a Cessna 172 were analyzed to demonstrate specific improvements in the LRC parameter extraction computer program. The Cramér-Rao bounds were shown to provide a satisfactory relative measure of the goodness of parameter estimates. They were not used as an absolute measure because of an inherent uncertainty within a multiplicative factor, traced in turn to the uncertainty in the noise bandwidth in the statistical theory of parameter estimation. The measure was also derived on an entirely nonstatistical basis, which also yields an interpretation of the significance of off-diagonal terms in the dispersion matrix. The distinction between linear and nonlinear coefficients was shown to be important for the recommended order of parameter iteration. Techniques for improving convergence in general were developed and tested on flight data. In particular, an easily implemented modification incorporating a gradient search was shown to improve initial estimates and thus remove a common cause of lack of convergence.
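A small worked example helps explain why a Cramér-Rao bound can serve as a relative measure when the noise level is uncertain. It assumes the simplest case: a linear-in-parameters model with white Gaussian noise, for which the bound is sigma2 * (X^T X)^{-1}. This static two-parameter model is an assumption for illustration, not the aircraft equations used in the paper. Changing the assumed noise variance rescales every bound by the same factor, while ratios and the normalized off-diagonal (correlation) terms stay fixed.

```python
import math


def crb_linear_gaussian(design, sigma2):
    """Cramér-Rao bound sigma2 * (X^T X)^{-1} for y = X @ theta + white Gaussian noise.

    `design` is a list of rows [x1, x2]; with two parameters the 2x2 inverse is explicit.
    """
    s11 = sum(r[0] * r[0] for r in design)
    s12 = sum(r[0] * r[1] for r in design)
    s22 = sum(r[1] * r[1] for r in design)
    det = s11 * s22 - s12 * s12
    return [[sigma2 * s22 / det, -sigma2 * s12 / det],
            [-sigma2 * s12 / det, sigma2 * s11 / det]]


def correlation(bound):
    """Off-diagonal term normalized by the diagonal terms; independent of sigma2."""
    return bound[0][1] / math.sqrt(bound[0][0] * bound[1][1])


design = [[1.0, t] for t in (0.0, 1.0, 2.0, 3.0, 4.0)]  # intercept and slope columns
for sigma2 in (1.0, 4.0):  # two hypothetical guesses of the noise level
    b = crb_linear_gaussian(design, sigma2)
    print(sigma2, b[0][0], b[1][1], correlation(b))
```

For this design, the slope bound is 0.1 * sigma2 and the intercept bound is 0.6 * sigma2. Their ratio of 6 and the correlation of about -0.82 do not depend on sigma2.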
Copula-based model for rainfall and El Niño in Banyuwangi Indonesia

NASA Astrophysics Data System (ADS)

Caraka, R. E.; Supari; Tahmid, M.

2018-04-01

Modelling, describing and measuring the dependence structure between different random events is at the very heart of statistics. Therefore, a broad variety of dependence concepts has been developed in the past. Most often, practitioners rely only on linear correlation to describe the degree of dependence between two or more variables, an approach that can lead to quite misleading conclusions because this measure captures only linear relationships. Copulas go beyond dependence measures and provide a sound framework for general dependence modelling. This paper introduces an application of copulas to estimate, understand, and interpret the dependence structure in El Niño and rainfall data for Banyuwangi, Indonesia. In short, we showed the flexibility of Archimedean copulas in modelling rainfall and capturing the El Niño phenomenon in Banyuwangi, East Java, Indonesia. We also found that the sea surface temperatures (SSTs) of the Niño 3, Niño 4, and Niño 3.4 regions are the most appropriate ENSO indicators for identifying the relationship between El Niño and rainfall.
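The point that linear correlation misses nonlinear dependence can be made concrete with a minimal sketch. Pearson's r and the rank-based Kendall's tau are compared on a monotone but nonlinear relationship. Pairs are then drawn from a Clayton copula, one Archimedean family, by conditional inversion. The choice of Clayton, the parameter theta = 2, and the function names are assumptions for illustration; the abstract does not say which Archimedean families the paper fitted.

```python
import math
import random


def kendall_tau(x, y):
    """O(n^2) Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_clayton(n, theta, rng):
    """Draw n pairs (u, v) from a Clayton copula (theta > 0) by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids u ** -theta blowing up at 0
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


xs = list(range(1, 11))
ys = [t ** 5 for t in xs]
print(kendall_tau(xs, ys))  # 1.0: perfectly monotone dependence
print(pearson(xs, ys))      # below 1, because the relationship is not linear
u, v = zip(*sample_clayton(1000, 2.0, random.Random(3)))
print(kendall_tau(u, v))    # near theta / (theta + 2) = 0.5 for the Clayton family
```

For Clayton copulas, Kendall's tau equals theta / (theta + 2). That identity gives a simple moment-style estimate of theta from rank data.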
Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models

PubMed Central

Chiu, Chi-yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-ling; Xiong, Momiao; Fan, Ruzong

2017-01-01

To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits, adjusting for covariates. The goal is to take advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F-distributions based on the Pillai–Bartlett trace, the Hotelling–Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that, in general, it is more advantageous to perform multivariate analysis than univariate analysis, and more advantageous to perform a meta-analysis of multiple studies than to analyze the individual studies separately. The proposed models require individual observations. The value of the current paper lies in at least two points: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data. PMID:28000696
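The three multivariate statistics named above can be computed from a hypothesis sums-of-squares-and-cross-products (SSCP) matrix H and an error SSCP matrix E. The sketch below assumes two traits, so the matrices are 2x2 and are given directly. It does not show how the paper builds H and E from functional linear models, and it omits the F approximations.

```python
def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def matmul2(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def multivariate_test_statistics(H, E):
    """Wilks's Lambda, Pillai-Bartlett trace and Hotelling-Lawley trace for 2x2 H, E.

    H: hypothesis SSCP matrix; E: error (residual) SSCP matrix.
    """
    HE = [[H[i][j] + E[i][j] for j in range(2)] for i in range(2)]
    wilks = det2(E) / det2(HE)           # det(E) / det(H + E)
    P = matmul2(H, inv2(HE))             # H (H + E)^{-1}
    L = matmul2(H, inv2(E))              # H E^{-1}
    pillai = P[0][0] + P[1][1]
    hotelling_lawley = L[0][0] + L[1][1]
    return wilks, pillai, hotelling_lawley


# Toy matrices; the eigenvalues of H E^{-1} are 1 and 2, so
# Wilks = 1/2 * 1/3, Pillai = 1/2 + 2/3, Hotelling-Lawley = 1 + 2.
H = [[1.0, 0.0], [0.0, 2.0]]
E = [[1.0, 0.0], [0.0, 1.0]]
print(multivariate_test_statistics(H, E))
```

In terms of the eigenvalues lambda_i of H E^{-1}, Wilks's Lambda is the product of 1/(1 + lambda_i), Pillai's trace is the sum of lambda_i/(1 + lambda_i), and the Hotelling-Lawley trace is the sum of lambda_i.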
Reliable probabilities through statistical post-processing of ensemble predictions

NASA Astrophysics Data System (ADS)

Van Schaeybroeck, Bert; Vannitsem, Stéphane

2013-04-01

We develop post-processing or calibration approaches based on linear regression that make ensemble forecasts more reliable. First, we enforce climatological reliability in the sense that the total variability of the prediction is equal to the variability of the observations. Second, we impose ensemble reliability such that the spread of the observation around the ensemble mean coincides with that of the ensemble members. In general, the attractors of the model and reality are inhomogeneous. Therefore, the ensemble spread displays a variability not taken into account in standard post-processing methods. We overcome this by weighting the ensemble by a variable error. The approaches are tested in the context of the Lorenz 96 model (Lorenz 1996). The forecasts become more reliable at short lead times, as reflected by a flatter rank histogram. Our best method turns out to be superior to well-established methods like EVMOS (Van Schaeybroeck and Vannitsem, 2011) and Nonhomogeneous Gaussian Regression (Gneiting et al., 2005). References [1] Gneiting, T., Raftery, A. E., Westveld, A., Goldman, T., 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Weather Rev., 133, 1098-1118. [2] Lorenz, E. N., 1996: Predictability - a problem partly solved. Proceedings, Seminar on Predictability, ECMWF, 1, 1-18. [3] Van Schaeybroeck, B., and S. Vannitsem, 2011: Post-processing through linear regression. Nonlin. Processes Geophys., 18, 147.
Parametric and nonparametric Granger causality testing: Linkages between international stock markets

NASA Astrophysics Data System (ADS)

De Gooijer, Jan G.; Sivarajasingham, Selliah

2008-04-01

This study investigates long-term linear and nonlinear causal linkages among eleven stock markets: six industrialized markets and five emerging markets of South-East Asia. We cover the period 1987-2006, taking into account the onset of the Asian financial crisis of 1997. We first apply a test for the presence of general nonlinearity in vector time series. Substantial differences exist between the pre- and post-crisis periods in terms of the total number of significant nonlinear relationships. We then examine both periods using a new nonparametric test for Granger noncausality and the conventional parametric Granger noncausality test. One major finding is that the Asian stock markets have become more internationally integrated since the Asian financial crisis. An exception is the Sri Lankan market, which has almost no significant long-term linear or nonlinear causal linkages with other markets. To ensure that any causality is strictly nonlinear in nature, we also examine the nonlinear causal relationships of VAR-filtered residuals and VAR-filtered squared residuals for the post-crisis sample. We find quite a few remaining significant bi- and uni-directional nonlinear causal relationships in these series. Finally, after filtering the VAR residuals with GARCH-BEKK models, we show that the nonparametric test statistics are substantially smaller in both magnitude and statistical significance than before filtering. This indicates that nonlinear causality can, to a large extent, be explained by simple volatility effects.
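The conventional parametric Granger noncausality test can be sketched as an F test that compares a restricted autoregression of y on its own lags with an unrestricted one that adds lags of x. This minimal sketch uses ordinary least squares via the normal equations on synthetic data, and all names are hypothetical. It does not implement the nonparametric test used in the study.

```python
import random


def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        beta[r] = (m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))) / m[r][r]
    return beta


def rss(rows, target):
    """Residual sum of squares of the OLS fit of target on the regressor rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * ri for b, ri in zip(beta, r))) ** 2 for r, t in zip(rows, target))


def granger_f(x, y, p=1):
    """F statistic for H0: lags 1..p of x add nothing to predicting y beyond y's own lags."""
    n = len(y)
    target = y[p:]
    restricted = [[1.0] + [y[t - i] for i in range(1, p + 1)] for t in range(p, n)]
    unrestricted = [row + [x[t - i] for i in range(1, p + 1)]
                    for row, t in zip(restricted, range(p, n))]
    rss_r, rss_u = rss(restricted, target), rss(unrestricted, target)
    dof = len(target) - (2 * p + 1)  # observations minus unrestricted-model parameters
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Hypothetical bivariate series in which x drives y with a one-step lag.
rng = random.Random(0)
n = 500
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.2 * y[-1] + 0.8 * x[t - 1] + 0.5 * rng.gauss(0.0, 1.0))
print(granger_f(x, y))  # large: x Granger-causes y
print(granger_f(y, x))  # small: y does not Granger-cause x
```

Under the null, the statistic is approximately F-distributed with p and (T - 2p - 1) degrees of freedom, where T is the number of usable observations.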
Velocity and displacement statistics in a stochastic model of nonlinear friction showing bounded particle speed

NASA Astrophysics Data System (ADS)

Menzel, Andreas M.

2015-11-01

Diffusion of colloidal particles in complex environments such as polymer networks or biological cells is a highly complex topic with significant biological and medical relevance. In such situations, the interaction between the surroundings and the particle motion has to be taken into account. We analyze a simplified diffusion model that includes some aspects of a complex environment in the framework of a nonlinear friction process: at low particle speeds, friction grows linearly with the particle velocity, as for regular viscous friction; it grows more than linearly at higher particle speeds; and at a maximum possible particle speed, the friction diverges. In addition to bare diffusion, we study the influence of a constant drift force acting on the diffusing particle. While the corresponding stationary velocity distributions can be derived analytically, the displacement statistics generally must be determined numerically. However, as a benefit of our model, analytical progress can be made for one special value of the maximum particle speed. The effect of a drift force in this case is determined analytically by perturbation theory. It will be interesting in the future to compare our results with real experimental systems. One realization could be magnetic colloidal particles diffusing through a shear-thickening environment such as starch suspensions, possibly exposed to an external magnetic field gradient.
Does raising type 1 error rate improve power to detect interactions in linear regression models? A simulation study.

PubMed

Durand, Casey P

2013-01-01

Statistical interactions are a common component of data analysis across a broad range of scientific disciplines. However, the statistical power to detect interactions is often undesirably low. One solution is to elevate the Type 1 error rate so that important interactions are not missed in a low power situation. To date, no study has quantified the effects of this practice on power in a linear regression model. A Monte Carlo simulation study was performed. A continuous dependent variable was specified, along with three types of interactions: continuous variable by continuous variable; continuous by dichotomous; and dichotomous by dichotomous. For each of the three scenarios, the interaction effect sizes, sample sizes, and Type 1 error rate were varied, resulting in a total of 240 unique simulations. In general, power to detect the interaction effect was either so low or so high at α = 0.05 that raising the Type 1 error rate only served to increase the probability of including a spurious interaction in the model. A small number of scenarios were identified in which an elevated Type 1 error rate may be justified. Routinely elevating Type 1 error rate when testing interaction effects is not an advisable practice. Researchers are best served by positing interaction effects a priori and accounting for them when conducting sample size calculations.
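The trade-off studied here can be illustrated with a small Monte Carlo sketch for the dichotomous-by-dichotomous case. It assumes a 2x2 design with unit-variance Gaussian noise, uses a pooled-variance z test for the interaction contrast with normal critical values, and compares rejection rates at alpha = 0.05 and 0.10. The cell size, effect size, and function names are hypothetical. This is not the author's simulation design, which also varied continuous interactions and sample sizes.

```python
import math
import random
from statistics import NormalDist


def interaction_z(n_per_cell, effect, rng):
    """z statistic for the interaction in a 2x2 (dichotomous by dichotomous) design.

    Assumed data model: y = effect * a * b + N(0, 1) noise in each cell (a, b).
    The contrast (m11 - m10) - (m01 - m00) has variance 4 * sigma^2 / n_per_cell.
    """
    cells = {(a, b): [effect * a * b + rng.gauss(0.0, 1.0) for _ in range(n_per_cell)]
             for a in (0, 1) for b in (0, 1)}
    means = {k: sum(v) / n_per_cell for k, v in cells.items()}
    # Pooled within-cell variance estimate of sigma^2.
    ss = sum((yi - means[k]) ** 2 for k, v in cells.items() for yi in v)
    pooled_var = ss / (4 * (n_per_cell - 1))
    contrast = (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])
    return contrast / math.sqrt(4 * pooled_var / n_per_cell)


def rejection_rates(n_per_cell, effect, alphas, reps=2000, seed=0):
    """Share of replicates with |z| above the two-sided normal critical value, per alpha."""
    rng = random.Random(seed)
    zs = [abs(interaction_z(n_per_cell, effect, rng)) for _ in range(reps)]
    crit = {a: NormalDist().inv_cdf(1 - a / 2) for a in alphas}
    return {a: sum(z > crit[a] for z in zs) / reps for a in alphas}


null_rates = rejection_rates(50, 0.0, [0.05, 0.10])  # effect = 0: Type 1 error rates
power = rejection_rates(50, 0.5, [0.05, 0.10])       # effect = 0.5: power
print(null_rates)
print(power)
```

Raising alpha lowers the critical value, so power can only rise. The same change raises the Type 1 error rate under a null effect, which is the cost the abstract weighs.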
Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models.

PubMed

Chiu, Chi-Yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-Ling; Xiong, Momiao; Fan, Ruzong

2017-02-01

To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits, adjusting for covariates. The goal is to take advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F-distributions based on the Pillai-Bartlett trace, the Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that, in general, it is more advantageous to perform multivariate analysis than univariate analysis, and more advantageous to perform a meta-analysis of multiple studies than to analyze the individual studies separately. The proposed models require individual observations. The value of the current paper lies in at least two points: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data.
Does systemic administration of casein phosphopeptides affect orthodontic movement and root resorption in rats?

PubMed

Crowther, Lachlan; Shen, Gang; Almuzian, Mohammed; Jones, Allan; Walsh, William; Oliver, Rema; Petocz, Peter; Tarraf, Nour E; Darendeliler, M Ali

2017-10-01

To assess the potential effects of casein phosphopeptides (CPPs) on orthodontically induced iatrogenic root resorption (OIIRR) and orthodontic tooth movement. Forty Wistar rats (aged 11 weeks) were randomly divided into an experimental group (EG; n = 20) that received a diet supplemented with CPP and a control group (CG; n = 20) that received no dietary supplement. A 150 g force was applied using a nickel titanium (NiTi) coil that was bonded to the maxillary incisors and extended unilaterally to a maxillary first molar. At Day 28, animals in both groups were euthanized. Volumetric assessment of root resorption craters and linear measurement of maxillary first molar movement were performed blindly using micro-computed tomography. Nine rats were excluded from the experiment because of loss during general anesthesia or appliance failure. Intra-operator reproducibility was high for both volumetric and linear measurements, 92.8 per cent and 98.5-97.6 per cent, respectively. The results reveal that dietary CPP had no statistically significant effect on overall OIIRR and orthodontic movement. CPP seems to have no statistically significant effect on the volume of OIIRR and orthodontic movement in rats. A long-term study with a larger sample size using a different concentration of CPP is required to clarify the dentoalveolar effect of CPP. © The Author 2017. Published by Oxford University Press on behalf of the European Orthodontic Society. All rights reserved. For permissions, please email: journals.permissions@oup.com
Time-frequency Features for Impedance Cardiography Signals During Anesthesia Using Different Distribution Kernels.

PubMed

Muñoz, Jesús Escrivá; Gambús, Pedro; Jensen, Erik W; Vallverdú, Montserrat

2018-01-01

This work investigates the time-frequency content of impedance cardiography signals during propofol-remifentanil anesthesia. In recent years, impedance cardiography (ICG) has gained much attention; however, ICG signals need further investigation. Time-frequency distributions (TFDs) with 5 different kernels are used to analyze ICG signals before the start of anesthesia and after the loss of consciousness. In total, ICG signals from one hundred and thirty-one consecutive patients undergoing major surgery under general anesthesia were analyzed. Several features were extracted from the calculated TFDs to characterize the time-frequency content of the ICG signals, and differences in those features before and after the loss of consciousness were studied. The Extended Modified Beta Distribution (EMBD) was the kernel for which most features showed statistically significant changes from before to after the loss of consciousness. Among all analyzed features, those based on entropy showed a sensitivity, specificity, and area under the receiver operating characteristic curve above 60%. The anesthetic state of the patient is reflected in linear and nonlinear features extracted from the TFDs of the ICG signals. In particular, the EMBD is a suitable kernel for the analysis of ICG signals and offers a wide range of features that change with the patient's anesthetic state in a statistically significant way. Schattauer GmbH.
Diagnosis checking of statistical analysis in RCTs indexed in PubMed.

PubMed

Lee, Paul H; Tse, Andy C Y

2017-11-01

Statistical analysis is essential for reporting the results of randomized controlled trials (RCTs), as well as for evaluating their effectiveness. However, the validity of a statistical analysis also depends on whether its assumptions hold. We aimed to review all RCTs published in journals indexed in PubMed during December 2014 to provide a complete picture of how RCTs handle the assumptions of statistical analysis. We reviewed all RCTs published in December 2014 that appeared in journals indexed in PubMed, using the Cochrane highly sensitive search strategy. The 2014 impact factors of the journals were used as proxies for their quality. The type of statistical analysis used and whether the assumptions of the analysis were tested were reviewed. In total, 451 papers were included. Of the 278 papers that reported a crude analysis for the primary outcomes, 31 (27·2%) reported whether the outcome was normally distributed. Of the 172 papers that reported an adjusted analysis for the primary outcomes, diagnostic checking was rarely conducted: only 20%, 8·6% and 7% of analyses using a generalized linear model, a Cox proportional hazards model and a multilevel model, respectively, were checked. Study characteristics (study type, drug trial, funding sources, journal type and endorsement of CONSORT guidelines) were not associated with the reporting of diagnostic checking. Diagnostic checking of statistical analyses in RCTs published in PubMed-indexed journals was usually absent. Journals should provide guidelines on reporting the diagnostic checking of assumptions. © 2017 Stichting European Society for Clinical Investigation Journal Foundation.
Rank-based testing of equal survivorship based on cross-sectional survival data with or without prospective follow-up.

PubMed

Chan, Kwun Chuen Gary; Qin, Jing

2015-10-01

Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up, since all subjects are essentially censored. However, partial survival information is available from backward recurrence times, which are frequently collected in health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed that is based only on backward recurrence times, without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that uses follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed best among the proposed statistics. The method is applied to a cross-sectional health survey without follow-up and to a study of Alzheimer's disease with prospective follow-up. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Jackknife Variance Estimator for Two Sample Linear Rank Statistics

DTIC Science & Technology

1988-11-01

Keywords: strong consistency; linear rank test; influence function.
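This record gives only keywords, so the following is a generic sketch of a two-sample delete-one jackknife variance estimator applied to one familiar linear rank statistic. The statistic is the Mann-Whitney form of the Wilcoxon rank sum, theta_hat = U / (mn), which estimates P(X < Y). The estimator form is the common "drop one observation from each sample in turn" version. Both the choice of statistic and all function names are assumptions, not the report's estimator.

```python
import random


def _kernel(xi, yj):
    """Mann-Whitney kernel: 1 if xi < yj, 0.5 for ties, 0 otherwise."""
    return 1.0 if xi < yj else (0.5 if xi == yj else 0.0)


def mann_whitney_theta(x, y):
    """theta_hat = U / (m n); the y-sample Wilcoxon rank sum is U + n(n+1)/2 without ties."""
    return sum(_kernel(a, b) for a in x for b in y) / (len(x) * len(y))


def jackknife_variance(x, y):
    """Two-sample delete-one jackknife variance of theta_hat.

    V = (m-1)/m * sum_i (theta_(-i) - mean)^2 + (n-1)/n * sum_j (theta_(-j) - mean)^2,
    where theta_(-i) drops x_i and theta_(-j) drops y_j.
    """
    m, n = len(x), len(y)
    k = [[_kernel(a, b) for b in y] for a in x]
    total = sum(map(sum, k))
    row = [sum(r) for r in k]                               # contributions of each x_i
    col = [sum(k[i][j] for i in range(m)) for j in range(n)]  # contributions of each y_j
    drop_x = [(total - r) / ((m - 1) * n) for r in row]
    drop_y = [(total - c) / (m * (n - 1)) for c in col]
    mx, my = sum(drop_x) / m, sum(drop_y) / n
    return ((m - 1) / m * sum((t - mx) ** 2 for t in drop_x)
            + (n - 1) / n * sum((t - my) ** 2 for t in drop_y))


print(mann_whitney_theta([1, 2, 3], [4, 5, 6]), jackknife_variance([1, 2, 3], [4, 5, 6]))  # 1.0 0.0
rng = random.Random(11)
x = [rng.gauss(0.0, 1.0) for _ in range(60)]
y = [rng.gauss(0.0, 1.0) for _ in range(60)]
print(jackknife_variance(x, y), (60 + 60 + 1) / (12 * 60 * 60))  # estimate vs exact null variance
```

When both samples come from the same continuous distribution, the exact variance of U / (mn) is (m + n + 1) / (12mn). That formula gives a convenient sanity check for the jackknife estimate.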
Maternal hypothyroxinemia during pregnancy and growth of the fetal and infant head.

PubMed

van Mil, Nina H; Steegers-Theunissen, Régine P M; Bongers-Schokking, Jacoba J; El Marroun, Hanan; Ghassabian, Akhgar; Hofman, Albert; Jaddoe, Vincent W V; Visser, Theo J; Verhulst, Frank C; de Rijke, Yolanda B; Steegers, Eric A P; Tiemeier, Henning

2012-12-01

Severe maternal thyroid dysfunction during pregnancy affects fetal brain growth and corticogenesis. This study focused on the effect of maternal hypothyroxinemia during early pregnancy on growth of the fetal and infant head. In a population-based birth cohort, we assessed thyroid status in early pregnancy (median 13.4, 90% range 10.8-17.2) in 4894 women and measured the prenatal and postnatal head size of their children at 5 time points. Hypothyroxinemia was defined as normal thyroid-stimulating hormone levels with free thyroxine (FT4) concentrations below the 10th percentile. Statistical analysis was performed using linear generalized estimating equations. Maternal hypothyroxinemia was associated with larger fetal and infant head size (overall estimate β: 1.38, 95% confidence interval 0.56 to 2.19, P = .001). In conclusion, in the general population, even small variations in maternal thyroid function during pregnancy may affect the developing head of the young child.
An introduction to analyzing dichotomous outcomes in a longitudinal setting: a NIDRR traumatic brain injury model systems communication.

PubMed

Pretz, Christopher R; Ketchum, Jessica M; Cuthbert, Jeffery P

2014-01-01

An untapped wealth of temporal information is captured within the Traumatic Brain Injury Model Systems National Database. Appropriate longitudinal analyses can help unlock the value of this information. This article highlights 2 statistical methods for assessing change over time in noncontinuous outcomes, focusing on dichotomous responses. Specifically, the intent of this article is to familiarize the rehabilitation community with the application of generalized estimating equations and generalized linear mixed models in longitudinal studies. An introduction to each method is provided, and the similarities and differences between the 2 are discussed. In addition, to reinforce the ideas and concepts embodied in each approach, we illustrate each method using examples based on data from the Rocky Mountain Regional Brain Injury System.
General purpose graphic processing unit implementation of adaptive pulse compression algorithms

NASA Astrophysics Data System (ADS)

Cai, Jingxiao; Zhang, Yan

2017-07-01

This study introduces a practical approach to implementing real-time signal processing algorithms for general surveillance radar on NVIDIA graphics processing units (GPUs). The pulse compression algorithms are implemented using compute unified device architecture (CUDA) libraries such as CUDA basic linear algebra subroutines and the CUDA fast Fourier transform library, which are adopted from open source libraries and optimized for NVIDIA GPUs. More advanced adaptive processing algorithms, such as adaptive pulse compression, require customized kernel optimization, which is investigated here. A statistical optimization approach is developed for this purpose that does not require detailed knowledge of the physical configurations of the kernels. The kernel optimization approach was found to improve performance significantly. Benchmark performance is compared with CPU performance in terms of processing speedup. The proposed implementation framework can be used in various radar systems, including ground-based phased array radar, airborne sense-and-avoid radar, and aerospace surveillance radar.
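For orientation, conventional (non-adaptive) pulse compression is a matched filter: the received samples are cross-correlated with the transmitted code, and a target echo appears as a sharp peak at its delay. The sketch below is a CPU-only, pure-Python illustration; it is not the paper's adaptive pulse compression algorithm or its CUDA implementation, and the Barker-13 code, the delay of 10 samples, and the noise-free echo are illustrative assumptions. GPU implementations typically compute the same correlation via FFTs.

```python
# 13-element Barker code, a standard binary phase code for pulse compression.
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def matched_filter(received, code):
    """Cross-correlate real-valued received samples with the transmitted code.

    Output index k corresponds to the code starting at sample k
    ("valid" lags only, no zero padding at the edges)."""
    m = len(code)
    return [
        sum(received[k + i] * code[i] for i in range(m))
        for k in range(len(received) - m + 1)
    ]


# Simulated noise-free echo: the code arrives after a delay of 10 samples.
received = [0.0] * 10 + [float(c) for c in BARKER13] + [0.0] * 10
out = matched_filter(received, BARKER13)
peak = max(range(len(out)), key=out.__getitem__)
print(peak, out[peak])  # 10 13.0
```

The peak equals the code length (13) while the range sidelobes stay at magnitude 1 or less, which is why Barker codes are a common textbook example. For complex baseband data the code would be conjugated before correlating.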
[Quality-of-life-related factors in adolescents].

PubMed

Lima-Serrano, Marta; Martínez-Montilla, José Manuel; Guerra-Martín, María Dolores; Vargas-Martínez, Ana Magdalena; Lima-Rodríguez, Joaquín S

To determine quality of life (QoL) and its relationship to lifestyles in high school adolescents. A cross-sectional, observational study was conducted with 256 students aged 12 to 17 in Seville (Spain). Multiple linear regression models were tested (p < 0.05). Boys had higher scores in most QoL areas. Female gender was inversely related to the physical, psychological, and family QoL areas and to the general QoL index. Family functionality and physical activity were the factors most associated with better QoL in all areas. All multivariate models were statistically significant and explained from 11% of the variability in social QoL to 35% of the variability in the general QoL index. The findings could be useful for developing school-based health interventions that promote healthy lifestyles and QoL. Copyright © 2016 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.

Prediction of air pollutant concentration based on sparse response back-propagation training feedforward neural networks.

PubMed

Ding, Weifu; Zhang, Jiangshe; Leung, Yee

2016-10-01

In this paper, we predict air pollutant concentration using a feedforward artificial neural network, inspired by the mechanism of the human brain, as a useful alternative to traditional statistical modeling techniques. The neural network is trained with sparse response back-propagation, in which only a small number of neurons respond to a given stimulus at the same time; this gives the trained network a high convergence rate as well as low energy consumption and better generalization. Our method is evaluated on data from Hong Kong air monitoring stations and the corresponding meteorological variables, with five air quality parameters gathered at four monitoring stations over 4 years (2012-2015). Our results show that our training method outperforms traditional linear regression algorithms, and a feedforward artificial neural network trained with traditional back-propagation, in terms of prediction precision, effectiveness, and generalization.
General Aviation Avionics Statistics : 1975

DOT National Transportation Integrated Search

1978-06-01

This report presents avionics statistics for the 1975 general aviation (GA) aircraft fleet and updates a previous publication, General Aviation Avionics Statistics: 1974. The statistics are presented in a capability group framework which enables one ...
On spurious detection of linear response and misuse of the fluctuation-dissipation theorem in finite time series

NASA Astrophysics Data System (ADS)

Gottwald, Georg A.; Wormell, J. P.; Wouters, Jeroen

2016-09-01

Using a sensitive statistical test, we determine whether one can detect the breakdown of linear response given observations of deterministic dynamical systems. A goodness-of-fit statistic is developed for a linear statistical model of the observations, based on central limit theorems for deterministic dynamical systems, and is used to detect linear response breakdown. We apply the method to discrete maps that do not obey linear response and show that successful detection of breakdown depends on the length of the time series, the magnitude of the perturbation, and the choice of observable. We find that sufficiently large data sets are needed to reliably reject the assumption of linear response for typical observables. Even for simple systems such as the logistic map, one needs on the order of 10^6 observations to reliably detect the breakdown at a confidence level of 95%; if fewer observations are available, one may be falsely led to conclude that linear response theory is valid. The smaller the applied perturbation, the more data are required. For judiciously chosen observables the necessary amount of data can be drastically reduced, but this requires detailed a priori knowledge of the invariant measure, which is typically not available for complex dynamical systems. Furthermore, we explore the use of the fluctuation-dissipation theorem (FDT) in cases with limited data length or coarse-graining of observations. The FDT, if applied naively to a system without linear response, is shown to be very sensitive to the details of the sampling method, resulting in erroneous predictions of the response.
Geometrical effects on the electron residence time in semiconductor nano-particles.

PubMed

Koochi, Hakimeh; Ebrahimi, Fatemeh

2014-09-07

We have used random walk (RW) numerical simulations to investigate the influence of geometry on the statistics of the electron residence time τ_r in a trap-limited diffusion process through semiconductor nano-particles. This is an important parameter in coarse-grained modeling of charge carrier transport in nano-structured semiconductor films. The traps have been distributed randomly on the surface (r² model) or through the whole particle (r³ model) with a specified density. The trap energies have been taken from an exponential distribution, and the trap release time is assumed to be a stochastic variable. We have carried out RW simulations to study the effect of the coordination number, the spatial arrangement of the neighbors, and the size of the nano-particles on the statistics of τ_r. It has been observed that, as the coordination number n increases, the mean electron residence time τ̅_r rapidly decreases to an asymptotic value. For a fixed coordination number n, the electron's mean residence time does not depend on the neighbors' spatial arrangement. In other words, τ̅_r is a porosity-dependent, local parameter that generally varies considerably from site to site, unless we are dealing with highly ordered structures. We have also examined the effect of nano-particle size d on the statistical behavior of τ̅_r. Our simulations indicate that for a volume distribution of traps, τ̅_r scales as d², whereas for a surface distribution of traps τ_r increases almost linearly with d. This leads to the prediction of a linear dependence of the diffusion coefficient D on the particle size d in ordered structures, or in random structures above the critical concentration, which is in accordance with experimental observations.
Statistical Neurodynamics.

NASA Astrophysics Data System (ADS)

Paine, Gregory Harold

1982-03-01

The primary objective of the thesis is to explore the dynamical properties of small nerve networks by means of the methods of statistical mechanics. To this end, a general formalism is developed and applied to elementary groupings of model neurons which are driven by either constant (steady state) or nonconstant (nonsteady state) forces. Neuronal models described by a system of coupled, nonlinear, first-order, ordinary differential equations are considered. A linearized form of the neuronal equations is studied in detail. A Lagrange function corresponding to the linear neural network is constructed which, through a Legendre transformation, provides a constant of motion. By invoking the Maximum-Entropy Principle with the single integral of motion as a constraint, a probability distribution function for the network in a steady state can be obtained. The formalism is implemented for some simple networks driven by a constant force; accordingly, the analysis focuses on a study of fluctuations about the steady state. In particular, a network composed of N noninteracting neurons, termed Free Thinkers, is considered in detail, with a view to interpretation and numerical estimation of the Lagrange multiplier corresponding to the constant of motion. As an archetypical example of a net of interacting neurons, the classical neural oscillator, consisting of two mutually inhibitory neurons, is investigated. It is further shown that in the case of a network driven by a nonconstant force, the Maximum-Entropy Principle can be applied to determine a probability distribution functional describing the network in a nonsteady state. The above examples are reconsidered with nonconstant driving forces which produce small deviations from the steady state. Numerical studies are performed on simplified models of two physical systems: the starfish central nervous system and the mammalian olfactory bulb. 
The thesis also discusses how statistical neurodynamics can be used to gain a better understanding of the behavior of these systems.
An Interactive Tool For Semi-automated Statistical Prediction Using Earth Observations and Models

NASA Astrophysics Data System (ADS)

Zaitchik, B. F.; Berhane, F.; Tadesse, T.

2015-12-01

We developed a semi-automated statistical prediction tool applicable to concurrent analysis or seasonal prediction of any time series variable in any geographic location. The tool was developed using Shiny, JavaScript, HTML and CSS. A user can extract a predictand by drawing a polygon over a region of interest on the provided user interface (global map). The user can select the Climatic Research Unit (CRU) precipitation or Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) as predictand. They can also upload their own predictand time series. Predictors can be extracted from sea surface temperature, sea level pressure, winds at different pressure levels, air temperature at various pressure levels, and geopotential height at different pressure levels. By default, reanalysis fields are applied as predictors, but the user can also upload their own predictors, including a wide range of compatible satellite-derived datasets. The package generates correlations of the variables selected with the predictand. The user also has the option to generate composites of the variables based on the predictand. Next, the user can extract predictors by drawing polygons over the regions that show strong correlations (composites). Then, the user can select some or all of the statistical prediction models provided. Provided models include Linear Regression models (GLM, SGLM), Tree-based models (bagging, random forest, boosting), Artificial Neural Network, and other non-linear models such as Generalized Additive Model (GAM) and Multivariate Adaptive Regression Splines (MARS). Finally, the user can download the analysis steps they used, such as the region they selected, the time period they specified, the predictand and predictors they chose and preprocessing options they used, and the model results in PDF or HTML format. Key words: Semi-automated prediction, Shiny, R, GLM, ANN, RF, GAM, MARS
Additive scales in degenerative disease--calculation of effect sizes and clinical judgment.

PubMed

Riepe, Matthias W; Wilkinson, David; Förstl, Hans; Brieden, Andreas

2011-12-16

The therapeutic efficacy of an intervention is often assessed in clinical trials by scales that measure multiple diverse activities and add them to produce a cumulative global score. Medical communities and health care systems subsequently use these data to calculate pooled effect sizes to compare treatments. This is done because major doubt has been cast on the clinical relevance of statistically significant findings that rely on p values and may therefore report chance findings. To overcome this, pooling the results of clinical studies into meta-analyses with a statistical calculus has been assumed to be a more definitive way of deciding on efficacy. We simulate therapeutic effects as measured with additive scales in patient cohorts with different disease severity and assess the limitations, which we prove mathematically, of calculating effect sizes from additive scales. We demonstrate that the major problem, which cannot be overcome by current numerical methods, is the complex nature and neurobiological foundation of clinical psychiatric endpoints in particular and of additive scales in general. This is particularly relevant for endpoints used in dementia research. 'Cognition' is composed of functions such as memory, attention, orientation, and many more. These individual functions decline in varied and non-linear ways. Here we demonstrate that in progressive diseases, cumulative values from multidimensional scales are subject to distortion by the limitations of the additive scale. The non-linearity of the decline of function impedes the calculation of effect sizes based on cumulative values from these multidimensional scales. Statistical analysis needs to be guided by the boundaries of the biological condition. Alternatively, we suggest a different approach that avoids the error imposed by over-analysis of cumulative global scores from additive scales.
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.

PubMed

Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S

2017-06-01

The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models assume a linear dependence on the dose-averaged LET (LET_d) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate whether biological dose models including non-linear LET dependencies should be considered, by introducing a dose model based on the LET spectrum. The RBE-LET relationship was investigated by fitting polynomials of 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking experimental uncertainties into account. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data than lower degrees. The newly developed models were compared with three published LET_d-based models for a simulated spread-out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependence could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a mean RBE (1.14) similar to those of the three established models (1.13-1.17), whereas the unweighted model calculated a considerably higher RBE (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as including the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models.
As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum-based and linear LET_d-based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
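The abstract does not say which test was used to compare polynomial degrees. One common choice for nested least-squares models is the extra-sum-of-squares F test, sketched below with weighted least squares in pure Python. The weights (for example 1/σ² from experimental uncertainties), the function names, and the toy data are assumptions for illustration, not the authors' method.

```python
def polyfit_wls(x, y, deg, w=None):
    """Weighted least-squares polynomial fit via the normal equations.

    Returns (coefficients in increasing power order, weighted residual sum of squares)."""
    w = list(w) if w is not None else [1.0] * len(x)
    p = deg + 1
    # Build the normal equations (X^T W X) b = X^T W y for the Vandermonde matrix X.
    A = [[sum(wi * xi ** (r + c) for xi, wi in zip(x, w)) for c in range(p)]
         for r in range(p)]
    b = [sum(wi * yi * xi ** r for xi, yi, wi in zip(x, y, w)) for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * p
    for r in reversed(range(p)):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, p))) / A[r][r]
    rss = sum(
        wi * (yi - sum(cj * xi ** j for j, cj in enumerate(coef))) ** 2
        for xi, yi, wi in zip(x, y, w)
    )
    return coef, rss


def nested_f(x, y, d_low, d_high, w=None):
    """F statistic for H0: the extra terms of the degree-d_high polynomial are zero."""
    _, rss_low = polyfit_wls(x, y, d_low, w)
    _, rss_high = polyfit_wls(x, y, d_high, w)
    df_num = d_high - d_low
    df_den = len(x) - (d_high + 1)
    return ((rss_low - rss_high) / df_num) / (rss_high / df_den)


# Toy data with a clear quadratic component and a small alternating perturbation.
x = [float(v) for v in range(10)]
noise = [0.01 * (-1) ** i for i in range(10)]
y = [1.0 + 0.5 * xi + 0.3 * xi ** 2 + e for xi, e in zip(x, noise)]
print(nested_f(x, y, 1, 2))  # large F: the quadratic term is clearly needed
```

The F statistic is referred to an F distribution with (d_high - d_low, n - d_high - 1) degrees of freedom, for example via `scipy.stats.f.sf`. The normal equations are adequate for this illustration, but a QR-based solver such as `numpy.polynomial.polynomial.polyfit` is numerically safer for degree 5 on real data; note that its `w` argument multiplies the unsquared residuals, so it takes 1/σ rather than 1/σ².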
General Aviation Avionics Statistics : 1976

DOT National Transportation Integrated Search

1979-11-01

This report presents avionics statistics for the 1976 general aviation (GA) aircraft fleet and is the third in a series titled "General Aviation Avionics Statistics." The statistics are presented in a capability group framework which enables one to r...
General Aviation Avionics Statistics : 1978 Data

DOT National Transportation Integrated Search

1980-12-01

The report presents avionics statistics for the 1978 general aviation (GA) aircraft fleet and is the fifth in a series titled "General Aviation Statistics." The statistics are presented in a capability group framework which enables one to relate airb...
General Aviation Avionics Statistics : 1979 Data

DOT National Transportation Integrated Search

1981-04-01

This report presents avionics statistics for the 1979 general aviation (GA) aircraft fleet and is the sixth in a series titled General Aviation Avionics Statistics. The statistics are presented in a capability group framework which enables one to relate...
Generalized Multilevel Structural Equation Modeling

ERIC Educational Resources Information Center

Rabe-Hesketh, Sophia; Skrondal, Anders; Pickles, Andrew

2004-01-01

A unifying framework for generalized multilevel structural equation modeling is introduced. The models in the framework, called generalized linear latent and mixed models (GLLAMM), combine features of generalized linear mixed models (GLMM) and structural equation models (SEM) and consist of a response model and a structural model for the latent…
Statistical analysis and interpretation of prenatal diagnostic imaging studies, Part 2: descriptive and inferential statistical methods.

PubMed

Tuuli, Methodius G; Odibo, Anthony O

2011-08-01

The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.
Doubly robust estimation of generalized partial linear models for longitudinal data with dropouts.

PubMed

Lin, Huiming; Fu, Bo; Qin, Guoyou; Zhu, Zhongyi

2017-12-01

We develop a doubly robust estimation of generalized partial linear models for longitudinal data with dropouts. Our method extends the highly efficient aggregate unbiased estimating function approach proposed in Qu et al. (2010) to a doubly robust one in the sense that under missing at random (MAR), our estimator is consistent when either the linear conditional mean condition is satisfied or a model for the dropout process is correctly specified. We begin with a generalized linear model for the marginal mean, and then move forward to a generalized partial linear model, allowing for nonparametric covariate effect by using the regression spline smoothing approximation. We establish the asymptotic theory for the proposed method and use simulation studies to compare its finite sample performance with that of Qu's method, the complete-case generalized estimating equation (GEE) and the inverse-probability weighted GEE. The proposed method is finally illustrated using data from a longitudinal cohort study. © 2017, The International Biometric Society.
Oral Health and Quality of Life in Old Age: A Cross-Sectional Pilot Project in Germany and Poland.

PubMed

Skośkiewicz-Malinowska, Katarzyna; Noack, Barbara; Kaderali, Lars; Malicka, Barbara; Lorenz, Katrin; Walczak, Katarzyna; Weber, Marie-Theres; Mendak-Ziółko, Magdalena; Hoffmann, Thomas; Ziętek, Marek; Walter, Michael; Kaczmarek, Urszula; Hannig, Christian; Radwan-Oczko, Małgorzata; Raedel, Michael

2016-01-01

The process of ageing influences all dimensions of social life and personal well-being, but the influence of health on different dimensions of quality of life (QoL) among the elderly is rarely examined. The aim of this pilot study was to test the feasibility of a comprehensive study design to evaluate general and dental health as well as QoL in a bi-national sample. In addition, the pilot study was intended to allow exploration of potential interactions between QoL, socioeconomic, health, and oral health variables. Individuals aged 64 years and older (n = 100) from the university dental clinics of Wroclaw Medical University, Poland (n = 50) and the University Hospital in Dresden, Germany (n = 50) were examined. The oral health status of participants was assessed by clinical examination. Socio-demographic, environmental, and general health status were evaluated during the medical interview. General quality of life (GQoL) was assessed by an overall question with a visual analogue scale (VAS) from -5 (worst) to +5 (best). Health-related quality of life (HRQoL) and oral health-related quality of life (OHRQoL) were measured with the EQ-5D and OHIP-14 questionnaires. Statistical analyses comprised Pearson's χ² test, the Wilcoxon test, linear regression, and several multivariate linear regression analyses. The QoL scores (mean ± SD) were 1.22 ± 2.62 for the GQoL-VAS score, 7.45 ± 2.25 for the EQ-5D score, and 11.04 ± 13.56 for the OHIP-14-ADD score. Differences between the Polish and German populations were observed. The study design proved feasible for a senior population. The overall GQoL question, the EQ-5D, and the OHIP-14 were regarded as appropriate instruments. Subjective and objective (oral) health measures differed between Germany and Poland. For methodological reasons, these differences are not generalizable, but they are of value for generating study hypotheses in larger samples.
The role of environmental variables in structuring landscape-scale species distributions in seafloor habitats.

PubMed

Kraan, Casper; Aarts, Geert; Van der Meer, Jaap; Piersma, Theunis

2010-06-01

Ongoing statistical sophistication allows a shift from describing species' spatial distributions toward statistically disentangling the possible roles of environmental variables in shaping species distributions. Based on a landscape-scale benthic survey in the Dutch Wadden Sea, we show the merits of spatially explicit generalized estimating equations (GEE). The intertidal macrozoobenthic species Macoma balthica, Cerastoderma edule, Marenzelleria viridis, Scoloplos armiger, Corophium volutator, and Urothoe poseidonis served as test cases, with median grain-size and inundation time as typical environmental explanatory variables. GEEs outperformed spatially naive generalized linear models (GLMs) and removed much residual spatial structure, indicating the importance of median grain-size and inundation time in shaping landscape-scale species distributions in the intertidal. GEE regression coefficients were smaller than those attained with GLM, and GEE standard errors were larger. The best fitting GEE for each species was used to predict species' density in relation to median grain-size and inundation time. Although no drastic changes were noted compared to previous work that described habitat suitability for benthic fauna in the Wadden Sea, our predictions provided more detailed and unbiased estimates of the determinants of species-environment relationships. We conclude that spatial GEEs offer the methodological advances needed to take further steps toward linking pattern to process.
An introduction to modeling longitudinal data with generalized additive models: applications to single-case designs.

PubMed

Sullivan, Kristynn J; Shadish, William R; Steiner, Peter M

2015-03-01

Single-case designs (SCDs) are short time series that assess intervention effects by measuring units repeatedly over time in both the presence and absence of treatment. This article introduces a statistical technique for analyzing SCD data that has seldom been used in psychological and educational research: generalized additive models (GAMs). In parametric regression, the researcher must choose a functional form to impose on the data, for example, that trend over time is linear. GAMs reverse this process by letting the data inform the choice of functional form. In this article we review the problem that trend poses in SCDs, discuss how current SCD analytic methods approach trend, describe GAMs as a possible solution, suggest a GAM model testing procedure for examining the presence of trend in SCDs, present a small simulation to show the statistical properties of GAMs, and illustrate the procedure on 3 examples of different lengths. Results suggest that GAMs may be very useful both as a form of sensitivity analysis for checking the plausibility of assumptions about trend and as a primary data analysis strategy for testing treatment effects. We conclude with a discussion of some problems with GAMs and some future directions for research on the application of GAMs to SCDs. (c) 2015 APA, all rights reserved.
Statistical assessment of changes in extreme maximum temperatures over Saudi Arabia, 1985-2014

NASA Astrophysics Data System (ADS)

Raggad, Bechir

2018-05-01

In this study, two statistical approaches were adopted to analyze observed maximum temperature data collected from fifteen stations over Saudi Arabia during the period 1985-2014. In the first step, the behavior of extreme temperatures was analyzed and their changes were quantified with respect to the Expert Team on Climate Change Detection Monitoring indices. The results showed a general warming trend in maximum temperature-related indices at most stations during the period of analysis. In the second step, stationary and non-stationary extreme-value analyses were conducted on the temperature data. The results revealed that the non-stationary model with an increasing linear trend in its location parameter outperforms the other models for two-thirds of the stations. Additionally, the 10-, 50-, and 100-year return levels were found to change considerably with time, and the maximum temperature could start to reappear in the different T-year return periods for most stations. This analysis shows the importance of taking into account the change over time in the estimation of return levels and therefore justifies the use of the non-stationary generalized extreme value distribution model to describe most of the data. Furthermore, these findings are in line with the significant warming trends found in the climate indices analyses.
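To see why return levels "change with time" under the non-stationary model, one can evaluate the standard GEV return-level formula with a location parameter that drifts linearly, μ(t) = μ0 + μ1·t. The sketch below assumes constant scale and shape parameters, which the abstract does not state; all parameter values are hypothetical, and fitting the model (typically by maximum likelihood) is not shown.

```python
import math


def gev_return_level(T, mu, sigma, xi):
    """T-year return level of a GEV(mu, sigma, xi) distribution for annual maxima:
    the value exceeded with probability 1/T in any one year."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:  # Gumbel limit as xi -> 0
        return mu - sigma * math.log(y)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


def nonstationary_return_level(T, t, mu0, mu1, sigma, xi):
    """Return level in year t when the location parameter drifts linearly: mu(t) = mu0 + mu1 * t."""
    return gev_return_level(T, mu0 + mu1 * t, sigma, xi)


# Hypothetical parameters (degrees C, years since start of record), for illustration only.
for t in (0, 29):
    print(t, round(nonstationary_return_level(50, t, mu0=45.0, mu1=0.04, sigma=1.2, xi=-0.2), 2))
```

With only the location parameter trending, every T-year return level shifts by exactly μ1·t, so a 30-year record with a modest trend can move the 50-year level noticeably; a trend in the scale parameter would additionally change the spacing between return levels.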
Resampling-Based Empirical Bayes Multiple Testing Procedures for Controlling Generalized Tail Probability and Expected Value Error Rates: Focus on the False Discovery Rate and Simulation Study

PubMed Central

Dudoit, Sandrine; Gilbert, Houston N.; van der Laan, Mark J.

2014-01-01

This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of Type I error rates, defined as generalized tail probability (gTP) error rates, gTP(q, g) = Pr(g(V_n, S_n) > q), and generalized expected value (gEV) error rates, gEV(g) = E[g(V_n, S_n)], for arbitrary functions g(V_n, S_n) of the numbers of false positives V_n and true positives S_n. Of particular interest are error rates based on the proportion g(V_n, S_n) = V_n/(V_n + S_n) of Type I errors among the rejected hypotheses, such as the false discovery rate (FDR), FDR = E[V_n/(V_n + S_n)]. The proposed procedures offer several advantages over existing methods. They provide Type I error control for general data generating distributions, with arbitrary dependence structures among variables. Gains in power are achieved by deriving rejection regions based on guessed sets of true null hypotheses and null test statistics randomly sampled from joint distributions that account for the dependence structure of the data. The Type I error and power properties of an FDR-controlling version of the resampling-based empirical Bayes approach are investigated and compared to those of widely used FDR-controlling linear step-up procedures in a simulation study. The Type I error and power trade-off achieved by the empirical Bayes procedures under a variety of testing scenarios allows this approach to be competitive with, or to outperform, the Storey and Tibshirani (2003) linear step-up procedure, as an alternative to the classical Benjamini and Hochberg (1995) procedure. PMID:18932138
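For reference, the classical Benjamini and Hochberg (1995) linear step-up procedure named above as the comparator can be sketched in a few lines. This is the standard textbook procedure, not the resampling-based empirical Bayes method the article proposes; the p-values in the usage example are made up.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg linear step-up procedure.

    Returns a list of booleans: True where the null hypothesis is rejected
    at target FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m ...
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    # ... and reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


print(benjamini_hochberg([0.001, 0.03, 0.032, 0.2]))  # [True, True, True, False]
```

The "step-up" behavior shows in the example: p = 0.03 exceeds its own threshold (2 × 0.05 / 4 = 0.025) but is still rejected because a larger p-value (0.032) passes its threshold at rank 3. Under independence this procedure keeps the FDR at or below q.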
Introducing linear functions: an alternative statistical approach

NASA Astrophysics Data System (ADS)

Nolan, Caroline; Herbert, Sandra

2015-12-01

The introduction of linear functions is the turning point where many students decide whether mathematics is useful or not. This means the role of parameters and variables in linear functions could be considered to be 'threshold concepts'. There is recognition that linear functions can be taught in context through the exploration of linear modelling examples, but this has its limitations. Currently, statistical data are easily obtainable, and graphics or computer algebra system (CAS) calculators are common in many classrooms. The use of this technology provides easy access to different representations of linear functions as well as the ability to fit a least-squares line to real-life data. This means these calculators could support an alternative approach to the introduction of linear functions. This study compares the results of an end-of-topic test for two classes of Australian middle secondary students at a regional school to determine whether such an alternative approach is feasible. In this study, test questions were grouped by concept, and the mean test results of the two classes were analyzed concept by concept. This analysis revealed that the students following the alternative approach demonstrated greater competence with non-standard questions.
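The least-squares line that such calculators fit has a closed form: the slope is S_xy / S_xx and the intercept is ȳ minus the slope times x̄. The sketch below shows that arithmetic; the data points are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares; return (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation of x and y
    sxx = sum((x - mx) ** 2 for x in xs)                    # variation of x
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy "real-life" data that happen to lie exactly on y = 1 + 2x.
slope, intercept = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```

The slope and intercept returned here are exactly the parameters students manipulate when a linear function is introduced, which is what links the statistical fit to the algebraic form.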

Regression assumptions in clinical psychology research practice-a systematic review of common misconceptions.

PubMed

Ernst, Anja F; Albers, Casper J

2017-01-01

Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated the employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongly held to be a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA recommendations. This paper calls for heightened awareness of, and increased transparency in the reporting of, statistical assumption checking.
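The distinction between normality of the variables and normality of the errors can be made concrete: a strongly skewed predictor is no problem for linear regression as long as the residuals behave. The toy sketch below fits a line to a right-skewed predictor with small symmetric errors and compares moment-based skewness for the predictor and the residuals. The data are invented, and skewness is only a rough screen; residual Q-Q plots are the usual diagnostic.

```python
def ols_residuals(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b, residuals)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b, [y - (a + b * x) for x, y in zip(xs, ys)]


def skewness(v):
    """Sample skewness (biased, moment-based)."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((u - m) ** 2 for u in v) / n
    m3 = sum((u - m) ** 3 for u in v) / n
    return m3 / m2 ** 1.5


# A strongly right-skewed predictor with small, symmetric errors (toy data).
xs = [1, 1, 1, 2, 2, 3, 5, 8, 13, 21]
errors = [0.1 * (-1) ** i for i in range(len(xs))]
ys = [2 + 3 * x + e for x, e in zip(xs, errors)]

a, b, resid = ols_residuals(xs, ys)
print(round(skewness(xs), 2), round(skewness(resid), 2))
```

The predictor is clearly skewed while the residuals are close to symmetric, and the slope is still recovered accurately; judging this model by the distribution of x alone would wrongly flag a violated assumption.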
Regression assumptions in clinical psychology research practice—a systematic review of common misconceptions

PubMed Central

Ernst, Anja F.

2017-01-01

Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated the employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongly held to be a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA recommendations. This paper calls for heightened awareness of, and increased transparency in the reporting of, statistical assumption checking. PMID:28533971
Assessing exotic plant species invasions and associated soil characteristics: A case study in eastern Rocky Mountain National Park, Colorado, USA, using the pixel nested plot design

USGS Publications Warehouse

Kalkhan, M.A.; Stafford, E.J.; Woodly, P.J.; Stohlgren, T.J.

2007-01-01

Rocky Mountain National Park (RMNP), Colorado, USA, contains a diversity of plant species. However, many exotic plant species have become established, potentially impacting the structure and function of native plant communities. Our goal was to quantify patterns of exotic plant species in relation to native plant species, soil characteristics, and other abiotic factors that may indicate or predict their establishment and success. Our research approach for field data collection was based on a field plot design called the pixel nested plot. The pixel nested plot provides a link to multi-phase and multi-scale spatial modeling-mapping techniques that can be used to estimate total species richness and patterns of plant diversity at finer landscape scales. Within the eastern region of RMNP, in an area of approximately 35,000 ha, we established a total of 60 pixel nested plots in 9 vegetation types. We used canonical correspondence analysis (CCA) and multiple linear regressions to quantify relationships between soil characteristics and native and exotic plant species richness and cover. We also used linear correlation, spatial autocorrelation and cross correlation statistics to test for the spatial patterns of variables of interest. CCA showed that exotic species were significantly (P < 0.05) associated with photosynthetically active radiation (r = 0.55), soil nitrogen (r = 0.58) and bare ground (r = -0.66). Pearson's correlation statistic showed significant linear relationships between exotic species, organic carbon, soil nitrogen, and bare ground. While spatial autocorrelations indicated that our 60 pixel nested plots were spatially independent, the cross correlation statistics indicated that exotic plant species were spatially associated with bare ground. In general, exotic plant species were most abundant in areas of high native species richness. 
This indicates that resource managers should focus on the protection of relatively rare native rich sites with little canopy cover, and fertile soils. Using the pixel nested plot approach for data collection can facilitate the ecological monitoring of these vulnerable areas at the landscape scale in a time- and cost-effective manner. © 2006 Elsevier B.V. All rights reserved.
Efficient Implementation of an Optimal Interpolator for Large Spatial Data Sets

NASA Technical Reports Server (NTRS)

Memarsadeghi, Nargess; Mount, David M.

2007-01-01

Interpolating scattered data points is a problem of wide-ranging interest. A number of approaches for interpolation have been proposed both from theoretical domains such as computational geometry and in application fields such as geostatistics. Our motivation arises from geological and mining applications. In many instances data can be costly to compute and are available only at nonuniformly scattered positions. Because of the high cost of collecting measurements, high accuracy is required in the interpolants. One of the most popular interpolation methods in this field is called ordinary kriging. It is popular because it is a best linear unbiased estimator. The price for its statistical optimality is that the estimator is computationally very expensive. This is because the value of each interpolant is given by the solution of a large dense linear system. In practice, kriging problems have been solved approximately by restricting the domain to a small local neighborhood of points that lie near the query point. The proper size for this neighborhood is determined by ad hoc methods, and it has been shown that this approach leads to undesirable discontinuities in the interpolant. Recently a more principled approach to approximating kriging has been proposed based on a technique called covariance tapering. This process achieves its efficiency by replacing the large dense kriging system with a much sparser linear system. This technique has been applied to a restriction of our problem, called simple kriging, which is not unbiased for general data sets. In this paper we generalize these results by showing how to apply covariance tapering to the more general problem of ordinary kriging. Through experimentation we demonstrate the space and time efficiency and accuracy of approximating ordinary kriging through the use of covariance tapering combined with iterative methods for solving large sparse systems. 
We demonstrate our approach on large data sizes arising both from synthetic sources and from real applications.
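To make the tapering idea concrete, here is a minimal NumPy sketch of ordinary kriging at a single query point, with the covariance optionally multiplied elementwise by a compactly supported taper. The exponential covariance, the spherical taper, and the function names are assumptions chosen for illustration, not the authors' implementation. The dense `np.linalg.solve` stands in for the sparse iterative solvers the abstract describes.

```python
import numpy as np

def exponential_cov(h, sill=1.0, range_=1.0):
    """Exponential covariance model C(h) = sill * exp(-h / range_) (assumed model)."""
    return sill * np.exp(-h / range_)

def spherical_taper(h, theta):
    """Spherical taper: compactly supported, exactly zero for h >= theta."""
    r = np.minimum(h / theta, 1.0)
    return (1.0 - 1.5 * r + 0.5 * r**3) * (h < theta)

def ordinary_kriging(points, values, query, theta=None, range_=1.0):
    """Ordinary kriging prediction and weights at one query point.

    If theta is given, covariances are tapered so most off-diagonal entries of
    the kriging matrix become exactly zero (the source of sparsity).
    """
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d0 = np.linalg.norm(points - query, axis=-1)
    C = exponential_cov(d, range_=range_)
    c0 = exponential_cov(d0, range_=range_)
    if theta is not None:
        C = C * spherical_taper(d, theta)
        c0 = c0 * spherical_taper(d0, theta)
    # Bordered system: the Lagrange-multiplier row/column forces sum(w) == 1,
    # which makes ordinary kriging unbiased for an unknown constant mean.
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = C
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    b = np.append(c0, 1.0)
    sol = np.linalg.solve(A, b)  # dense here; a sparse solver would exploit the taper
    w = sol[:n]
    return float(w @ values), w
```

Because kriging without a nugget is an exact interpolator, querying at a data location should return that location's value. This property also gives a quick check that the tapered system was assembled consistently.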
Truncated Linear Statistics Associated with the Eigenvalues of Random Matrices II. Partial Sums over Proper Time Delays for Chaotic Quantum Dots

NASA Astrophysics Data System (ADS)

Grabsch, Aurélien; Majumdar, Satya N.; Texier, Christophe

2017-06-01

Invariant ensembles of random matrices are characterized by the distribution of their eigenvalues $\{\lambda_1,\ldots,\lambda_N\}$. We study the distribution of truncated linear statistics of the form $\tilde{L}=\sum_{i=1}^{p} f(\lambda_i)$ with p
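As a rough illustration of what a truncated linear statistic is, the sketch below draws a matrix from the Gaussian Orthogonal Ensemble and sums f over only p of its eigenvalues. Two choices here are assumptions because the abstract is cut off: the eigenvalues are ordered from largest to smallest, and the sum runs over the p largest. The GOE itself is just a convenient invariant ensemble; the proper-time-delay problem in the title involves a different ensemble.

```python
import numpy as np

def goe_sample(n, rng):
    """Draw an n x n real symmetric matrix from the Gaussian Orthogonal Ensemble."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / 2.0

def truncated_linear_statistic(eigenvalues, f, p):
    """Sum f over the p largest eigenvalues (ordering convention assumed)."""
    lam = np.sort(eigenvalues)[::-1]  # lambda_1 >= lambda_2 >= ... >= lambda_N
    return float(np.sum(f(lam[:p])))
```

Repeating this over many samples gives a Monte Carlo estimate of the distribution of the statistic. With p = N and f(x) = x it reduces to the ordinary (untruncated) linear statistic, the trace.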
Notes on power of normality tests of error terms in regression models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Střelec, Luboš

2015-03-10

Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary to avoid misleading inferences, which explains the need for, and importance of, robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
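The key point is that normality is tested on the regression residuals, not on the raw variables. The sketch below shows this with the classical Jarque-Bera test applied to OLS residuals. It does not implement the robust RT class introduced in the paper; Jarque-Bera is simply one of the classical tests such comparisons usually include. The function names are illustrative.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals of an OLS fit with an intercept column prepended."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return y - Xd @ beta

def jarque_bera(resid):
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4).

    Under normal errors it is asymptotically chi-squared with 2 degrees of
    freedom, so values above about 5.99 reject normality at the 5% level.
    """
    e = resid - resid.mean()
    n = len(e)
    m2 = np.mean(e**2)
    skew = np.mean(e**3) / m2**1.5
    kurt = np.mean(e**4) / m2**2
    return n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
```

The test targets the distribution of the error terms. A predictor with a skewed or uniform distribution does not by itself violate the assumption.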
The impact of workplace risk factors on long-term musculoskeletal sickness absence: a registry-based 5-year follow-up from the Oslo health study.

PubMed

Foss, Line; Gravseth, Hans Magne; Kristensen, Petter; Claussen, Bjørgulf; Mehlum, Ingrid Sivesind; Knardahl, Stein; Skyberg, Knut

2011-12-01

To determine the influence of work-related risk factors by gender on long-term sickness absence with musculoskeletal diagnoses (LSM). Data from the Oslo Health Study were linked to the historical event database of Statistics Norway. Eight thousand three hundred thirty-three participants were followed from 2001 through 2005. Generalized linear models were used to compute risk differences for LSM. In total, 12.6% of the women and 8.8% of the men experienced at least one LSM. Statistically significant LSM risk increases between 0.039 and 0.086 in association with work environment were found for heavy physical work, low job control (men only), low support from superiors (women only), and having shift/night work (men only). Women exhibited a higher LSM risk, but the associations with job exposures were stronger for men. This should be addressed when occupational health services give advice on preventive measures.
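A risk difference is the difference in outcome probability between exposed and unexposed groups. For a single binary exposure, a binomial GLM with an identity link reproduces exactly the crude difference computed below. The study's estimates were adjusted for other factors, which this sketch does not attempt. The counts in the usage example are hypothetical.

```python
import math

def risk_difference(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Crude risk difference p1 - p0 with a Wald confidence interval."""
    p1 = cases_exposed / n_exposed
    p0 = cases_unexposed / n_unexposed
    rd = p1 - p0
    # Standard error of a difference of two independent proportions
    se = math.sqrt(p1 * (1 - p1) / n_exposed + p0 * (1 - p0) / n_unexposed)
    return rd, (rd - z * se, rd + z * se)
```

For example, `risk_difference(30, 200, 20, 250)` gives a risk difference of 0.07 (15% versus 8%). This is on the same absolute-risk scale as the 0.039 to 0.086 increases reported above.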
Circular RNAs Are the Predominant Transcript Isoform from Hundreds of Human Genes in Diverse Cell Types

PubMed Central

Wang, Peter Lincoln; Lacayo, Norman; Brown, Patrick O.

2012-01-01

Most human pre-mRNAs are spliced into linear molecules that retain the exon order defined by the genomic sequence. By deep sequencing of RNA from a variety of normal and malignant human cells, we found RNA transcripts from many human genes in which the exons were arranged in a non-canonical order. Statistical estimates and biochemical assays provided strong evidence that a substantial fraction of the spliced transcripts from hundreds of genes are circular RNAs. Our results suggest that a non-canonical mode of RNA splicing, resulting in a circular RNA isoform, is a general feature of the gene expression program in human cells. PMID:22319583
A Statistical Method of Identifying Interactions in Neuron–Glia Systems Based on Functional Multicell Ca2+ Imaging

PubMed Central

Nakae, Ken; Ikegaya, Yuji; Ishikawa, Tomoe; Oba, Shigeyuki; Urakubo, Hidetoshi; Koyama, Masanori; Ishii, Shin

2014-01-01

Crosstalk between neurons and glia may constitute a significant part of information processing in the brain. We present a novel method of statistically identifying interactions in a neuron–glia network. We attempted to identify neuron–glia interactions from neuronal and glial activities via maximum-a-posteriori (MAP)-based parameter estimation by developing a generalized linear model (GLM) of a neuron–glia network. The interactions of interest included functional connectivity and response functions. We evaluated the cross-validated likelihood of GLMs that resulted from the addition or removal of connections to confirm the existence of specific neuron-to-glia or glia-to-neuron connections. We accepted an addition or removal only when the modification improved the cross-validated likelihood. We applied the method to a high-throughput, multicellular in vitro Ca2+ imaging dataset obtained from the CA3 region of a rat hippocampus, and then evaluated the reliability of connectivity estimates using a statistical test based on a surrogate method. Our findings based on the estimated connectivity were in good agreement with currently available physiological knowledge, suggesting our method can elucidate undiscovered functions of neuron–glia systems. PMID:25393874
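The accept/reject rule described above can be sketched in two parts. First, fit a GLM by MAP, where a Gaussian prior on the weights acts as an L2 penalty. Second, keep a candidate connection only if it raises the cross-validated log-likelihood. The sketch assumes a Bernoulli (logistic) GLM for binarized Ca2+ events, contiguous folds, and a fixed prior variance. All of these are illustrative choices rather than the authors' model.

```python
import numpy as np

def fit_map_logistic(X, y, prior_var=10.0, n_iter=50):
    """MAP weights for a logistic GLM with a zero-mean Gaussian prior (Newton-Raphson)."""
    w = np.zeros(X.shape[1])
    P = np.eye(X.shape[1]) / prior_var  # prior precision = L2 penalty
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - mu) - P @ w                  # gradient of log-posterior
        H = X.T @ (X * (mu * (1 - mu))[:, None]) + P   # negative Hessian
        w = w + np.linalg.solve(H, grad)
    return w

def log_lik(X, y, w):
    """Bernoulli log-likelihood sum(y*z - log(1 + e^z)), computed stably."""
    z = X @ w
    return float(np.sum(y * z - np.logaddexp(0.0, z)))

def cv_log_lik(X, y, k=5):
    """Sum of held-out log-likelihoods over k contiguous folds."""
    folds = np.array_split(np.arange(len(y)), k)
    total = 0.0
    for idx in folds:
        train = np.setdiff1d(np.arange(len(y)), idx)
        w = fit_map_logistic(X[train], y[train])
        total += log_lik(X[idx], y[idx], w)
    return total
```

To test a hypothetical glia-to-neuron connection, compare `cv_log_lik` for design matrices with and without the glial regressor. Keep the connection only if the score improves.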
The Local Stellar Velocity Field via Vector Spherical Harmonics

NASA Technical Reports Server (NTRS)

Markarov, V. V.; Murphy, D. W.

2007-01-01

We analyze the local field of stellar tangential velocities for a sample of 42,339 nonbinary Hipparcos stars with accurate parallaxes, using a vector spherical harmonic formalism. We derive simple relations between the parameters of the classical linear model (Ogorodnikov-Milne) of the local systemic field and low-degree terms of the general vector harmonic decomposition. Taking advantage of these relationships, we determine the solar velocity with respect to the local stars of (V_X, V_Y, V_Z) = (10.5, 18.5, 7.3) ± 0.1 km s⁻¹, not corrected for the asymmetric drift with respect to the local standard of rest. If only stars more distant than 100 pc are considered, the peculiar solar motion is (V_X, V_Y, V_Z) = (9.9, 15.6, 6.9) ± 0.2 km s⁻¹. The adverse effects of harmonic leakage, which occurs between the reflex solar motion represented by the three electric vector harmonics in the velocity space and higher degree harmonics in the proper-motion space, are eliminated in our analysis by direct subtraction of the reflex solar velocity in its tangential components for each star. The Oort parameters determined by a straightforward least-squares adjustment in vector spherical harmonics are A = 14.0 ± 1.4, B = 13.1 ± 1.2, K = 1.1 ± 1.8, and C = 2.9 ± 1.4 km s⁻¹ kpc⁻¹. The physical meaning and the implications of these parameters are discussed in the framework of a general linear model of the velocity field. We find a few statistically significant higher degree harmonic terms that do not correspond to any parameters in the classical linear model. One of them, a third-degree electric harmonic, is tentatively explained as the response to a negative linear gradient of rotation velocity with distance from the Galactic plane, which we estimate at approximately -20 km s⁻¹ kpc⁻¹. 
A similar vertical gradient of rotation velocity has been detected for more distant stars representing the thick disk (z > 1 kpc), but here we surmise its existence in the thin disk at z < 200 pc. The most unexpected and unexplained term within the Ogorodnikov-Milne model is the first-degree magnetic harmonic, representing a rigid rotation of the stellar field about the axis -Y pointing opposite to the direction of rotation. This harmonic comes out with a statistically robust coefficient of 6.2 ± 0.9 km s⁻¹ kpc⁻¹ and is also present in the velocity field of more distant stars. The ensuing upward vertical motion of stars in the general direction of the Galactic center and the downward motion in the anticenter direction are opposite to the vector field expected from the stationary Galactic warp model.
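The paper fits Oort parameters in a vector-spherical-harmonic basis. A much simpler classical version of the same least-squares idea uses stars in the Galactic plane (b = 0) with the reflex solar motion already removed. The sketch assumes one common in-plane form of the Oort-Milne model, v_l / d = B + A cos 2l - C sin 2l; sign conventions for B and C differ between authors. It uses unweighted least squares and is not the paper's method.

```python
import numpy as np

def fit_oort_tangential(l, v_l, d):
    """Least-squares Oort A, B, C from in-plane tangential velocities.

    Assumed model (reflex solar motion removed, stars at b = 0):
        v_l / d = B + A cos 2l - C sin 2l
    l in radians, v_l in km/s, d in kpc  ->  A, B, C in km/s/kpc.
    """
    design = np.column_stack([np.cos(2 * l), np.ones_like(l), -np.sin(2 * l)])
    coef, *_ = np.linalg.lstsq(design, v_l / d, rcond=None)
    A, B, C = coef
    return A, B, C
```

K does not appear here because it affects only radial velocities and, off the plane, latitude proper motions. That is one reason the full-sky harmonic treatment in the abstract recovers more parameters than this planar toy.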
Estimation of group means when adjusting for covariates in generalized linear models.

PubMed

Qu, Yongming; Luo, Junxiang

2015-01-01

Generalized linear models are commonly used to analyze categorical data such as binary, count, and ordinal outcomes. Adjusting for important prognostic factors or baseline covariates in generalized linear models may improve the estimation efficiency. The model-based mean for a treatment group produced by most software packages estimates the response at the mean covariate, not the mean response for this treatment group for the studied population. Although this is not an issue for linear models, the model-based group mean estimates in generalized linear models could be seriously biased estimates of the true group means. We propose a new method to estimate the group mean consistently with the corresponding variance estimation. Simulation showed that the proposed method produced an unbiased estimator for the group means and provided the correct coverage probability. The proposed method was applied to analyze hypoglycemia data from clinical trials in diabetes. Copyright © 2014 John Wiley & Sons, Ltd.
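The bias described above comes from the nonlinear link: averaging predictions is not the same as predicting at the average covariate. The sketch contrasts the two quantities for a logistic model with given (hypothetical) coefficients. It shows only the point-estimate idea, commonly called marginal standardization. It is not necessarily the authors' estimator and omits their variance estimation.

```python
import numpy as np

def expit(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + np.exp(-z))

def group_means(beta0, beta_trt, beta_x, x, treated):
    """Two candidate 'group means' from a logistic model with one covariate x.

    at_mean_cov: inverse link evaluated at the mean covariate (typical software output).
    marginal:    average of individual predicted probabilities over the sample.
    """
    lin = beta0 + beta_trt * treated + beta_x * x
    at_mean_cov = expit(beta0 + beta_trt * treated + beta_x * np.mean(x))
    marginal = float(np.mean(expit(lin)))
    return float(at_mean_cov), marginal
```

With covariate values [-3, -1, 0, 1, 3], intercept -2, and covariate slope 1.5, the value at the mean covariate is about 0.12. The average predicted probability is about 0.29. When the covariate slope is 0, or when the link is the identity, the two coincide, which matches the abstract's remark that linear models do not have this problem.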
Advanced Statistical Analyses to Reduce Inconsistency of Bond Strength Data.

PubMed

Minamino, T; Mine, A; Shintani, A; Higashi, M; Kawaguchi-Uemura, A; Kabetani, T; Hagino, R; Imai, D; Tajiri, Y; Matsumoto, M; Yatani, H

2017-11-01

This study was designed to clarify the interrelationship of factors that affect the value of microtensile bond strength (µTBS), focusing on nondestructive testing by which information about the specimens can be stored and quantified. µTBS test specimens were prepared from 10 noncarious human molars. Six factors of µTBS test specimens were evaluated: presence of voids at the interface, X-ray absorption coefficient of resin, X-ray absorption coefficient of dentin, length of dentin part, size of adhesion area, and individual differences of teeth. All specimens were observed nondestructively by optical coherence tomography and micro-computed tomography before µTBS testing. After µTBS testing, the effect of these factors on µTBS data was analyzed by the general linear model, linear mixed effects regression model, and nonlinear regression model with 95% confidence intervals. By the general linear model, a significant difference in individual differences of teeth was observed (P < 0.001). A significantly positive correlation was shown between µTBS and length of dentin part (P < 0.001); however, there was no significant nonlinearity (P = 0.157). Moreover, a significantly negative correlation was observed between µTBS and size of adhesion area (P = 0.001), with significant nonlinearity (P = 0.014). No correlation was observed between µTBS and X-ray absorption coefficient of resin (P = 0.147), and there was no significant nonlinearity (P = 0.089). Additionally, a significantly positive correlation was observed between µTBS and X-ray absorption coefficient of dentin (P = 0.022), with significant nonlinearity (P = 0.036). A significant difference was also observed between the presence and absence of voids by linear mixed effects regression analysis. Our results showed correlations between various parameters of tooth specimens and µTBS data. 
To evaluate the performance of the adhesive more precisely, the effect of tooth variability and a method to reduce variation in bond strength values should also be considered.
Assessment of bias correction under transient climate change

NASA Astrophysics Data System (ADS)

Van Schaeybroeck, Bert; Vannitsem, Stéphane

2015-04-01

Calibration of climate simulations is necessary since large systematic discrepancies are generally found between the model climate and the observed climate. Recent studies have cast doubt upon the common assumption of the bias being stationary when the climate changes. This led to the development of new methods, mostly based on linear sensitivity of the biases as a function of time or forcing (Kharin et al. 2012). However, recent studies uncovered more fundamental problems using both low-order systems (Vannitsem 2011) and climate models, showing that the biases may display complicated non-linear variations under climate change. This last analysis focused on biases derived from the equilibrium climate sensitivity, thereby ignoring the effect of the transient climate sensitivity. Based on the linear response theory, a general method of bias correction is therefore proposed that can be applied to any climate forcing scenario. The validity of the method is addressed using twin experiments with a climate model of intermediate complexity LOVECLIM (Goosse et al., 2010). We evaluate to what extent the bias change is sensitive to the structure (frequency) of the applied forcing (here greenhouse gases) and whether the linear response theory is valid for global and/or local variables. To answer these questions we perform large-ensemble simulations using different 300-year scenarios of forced carbon-dioxide concentrations. Reality and simulations are assumed to differ by a model error emulated as a parametric error in the wind drag or in the radiative scheme. References [1] H. Goosse et al., 2010: Description of the Earth system model of intermediate complexity LOVECLIM version 1.2, Geosci. Model Dev., 3, 603-633. [2] S. Vannitsem, 2011: Bias correction and post-processing under climate change, Nonlin. Processes Geophys., 18, 911-924. [3] V.V. Kharin, G. J. Boer, W. J. Merryfield, J. F. Scinocca, and W.-S. Lee, 2012: Statistical adjustment of decadal predictions in a changing climate, Geophys. Res. Lett., 39, L19705.
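Linear response theory makes any forcing scenario tractable. If the bias responds linearly, its response to an arbitrary forcing history is the convolution of that forcing with a response (Green's) function. The Green's function can be estimated once from a reference experiment such as a step forcing. The sketch below demonstrates this on a hypothetical one-variable linear toy model in discrete time. It is a generic illustration, not LOVECLIM or the authors' procedure. For a nonlinear system the convolution prediction would be only approximate, which is exactly what the twin experiments probe.

```python
import numpy as np

def step_response_to_green(step_resp):
    """Discrete Green's function (impulse response) from a unit-step experiment."""
    return np.diff(step_resp, prepend=0.0)

def linear_response(green, forcing):
    """Predicted response r[t] = sum_k G[k] * f[t - k] (causal convolution)."""
    return np.convolve(green, forcing)[: len(forcing)]

def toy_model(forcing, a=0.9):
    """Hypothetical linear stand-in for the bias dynamics: x[t] = a*x[t-1] + f[t]."""
    x = np.zeros(len(forcing))
    prev = 0.0
    for t, f in enumerate(forcing):
        prev = a * prev + f
        x[t] = prev
    return x
```

Run `toy_model` once with a unit step to obtain the Green's function. The response to any other scenario, for example a ramp with an oscillation, then follows from `linear_response` without rerunning the model. For this linear toy the result matches the direct simulation to rounding error.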
Modelling leaf photosynthetic and transpiration temperature-dependent responses in Vitis vinifera cv. Semillon grapevines growing in hot, irrigated vineyard conditions

PubMed Central

Greer, Dennis H.

2012-01-01

Background and aims Grapevines growing in Australia are often exposed to very high temperatures and the question of how the gas exchange processes adjust to these conditions is not well understood. The aim was to develop a model of photosynthesis and transpiration in relation to temperature to quantify the impact of the growing conditions on vine performance. Methodology Leaf gas exchange was measured along the grapevine shoots in accordance with their growth and development over several growing seasons. Using a general linear statistical modelling approach, photosynthesis and transpiration were modelled against leaf temperature separated into bands and the model parameters and coefficients applied to independent datasets to validate the model. Principal results Photosynthesis, transpiration and stomatal conductance varied along the shoot, with early emerging leaves having the highest rates, but these declined as later emerging leaves increased their gas exchange capacities in accordance with development. The general linear modelling approach applied to these data revealed that photosynthesis at each temperature was additively dependent on stomatal conductance, internal CO2 concentration and photon flux density. The temperature-dependent coefficients for these parameters applied to other datasets gave a predicted rate of photosynthesis that was linearly related to the measured rates, with a 1 : 1 slope. Temperature-dependent transpiration was multiplicatively related to stomatal conductance and the leaf to air vapour pressure deficit and applying the coefficients also showed a highly linear relationship, with a 1 : 1 slope between measured and modelled rates, when applied to independent datasets. Conclusions The models developed for the grapevines were relatively simple but accounted for much of the seasonal variation in photosynthesis and transpiration. 
The goodness of fit in each case demonstrated that explicitly selecting leaf temperature as a model parameter, rather than including temperature intrinsically as is usually done in more complex models, was warranted. PMID:22567220
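The general linear modelling approach can be sketched in two steps. First, split observations into leaf-temperature bands and fit photosynthesis as an additive linear function of stomatal conductance, internal CO2, and photon flux density separately within each band. Second, validate by regressing measured on predicted rates and checking for a slope near 1. The band edges, the minimum band size, and the function names are hypothetical. Ordinary least squares is assumed as the fitting method.

```python
import numpy as np

def fit_by_band(leaf_temp, gs, ci, pfd, photo, band_edges):
    """Fit A = b0 + b1*gs + b2*Ci + b3*PFD separately within each temperature band.

    Returns {band_index: coefficients}; bands with fewer than 5 points are skipped.
    """
    band = np.digitize(leaf_temp, band_edges)
    coefs = {}
    for b in np.unique(band):
        m = band == b
        if m.sum() < 5:
            continue
        X = np.column_stack([np.ones(m.sum()), gs[m], ci[m], pfd[m]])
        coefs[int(b)] = np.linalg.lstsq(X, photo[m], rcond=None)[0]
    return coefs

def predict(coefs, band_edges, leaf_temp, gs, ci, pfd):
    """Apply each observation's band-specific coefficients."""
    band = np.digitize(leaf_temp, band_edges)
    X = np.column_stack([np.ones(len(gs)), gs, ci, pfd])
    return np.array([X[i] @ coefs[int(b)] for i, b in enumerate(band)])
```

On an independent dataset, `np.polyfit(predicted, measured, 1)[0]` gives the slope of measured against modelled rates. That slope plays the role of the 1 : 1 validation described in the abstract.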
Inference of directed climate networks: role of instability of causality estimation methods

NASA Astrophysics Data System (ADS)

Hlinka, Jaroslav; Hartman, David; Vejmelka, Martin; Paluš, Milan

2013-04-01

Climate data are increasingly analyzed by complex network analysis methods, including graph-theoretical approaches [1]. For such analysis, links between localized nodes of the climate network are typically quantified by some statistical measures of dependence (connectivity) between measured variables of interest. To obtain information on the directionality of the interactions in the networks, a wide range of methods exists. These can be broadly divided into linear and nonlinear methods, with some of the latter having the theoretical advantage of being model-free, and in principle a generalization of the former [2]. However, as a trade-off, this generality comes together with lower accuracy, in particular if the system is close to linear. In an overall stationary system, this may potentially lead to higher variability in the nonlinear network estimates. Therefore, with the same control of false alarms, this may lead to lower sensitivity for detection of real changes in the network structure. These problems are discussed using the example of daily SAT and SLP data from the NCEP/NCAR reanalysis dataset. We first reduce the dimensionality of data using PCA with VARIMAX rotation to detect several dozen components that together explain most of the data variability. We further construct directed climate networks applying a selection of the most widely used methods: variants of linear Granger causality and conditional mutual information. Finally, we assess the stability of the detected directed climate networks by computing them in sliding time windows. To understand the origin of the observed instabilities and their range, we also apply the same procedure to two types of surrogate data: either with non-stationarity in network structure removed, or imposed in a controlled way. In general, the linear methods show stable results in terms of overall similarity of directed climate networks inferred. 
For instance, for different decades of SAT data, the Spearman correlation of edge weights in the networks is ~ 0.6. The networks constructed using nonlinear measures were in general less stable both in real data and stationarized surrogates. Interestingly, when the nonlinear method parameters are optimized with respect to temporal stability of the networks, the networks seem to converge close to those detected by linear Granger causality. This provides further evidence for the hypothesis of overall sparsity and weakness of nonlinear coupling in climate networks on this spatial and temporal scale [3] and sufficient support for the use of linear methods in this context, unless specific clearly detectable nonlinear phenomena are targeted. Acknowledgement: This study is supported by the Czech Science Foundation, Project No. P103/11/J068. [1] Boccaletti, S.; Latora, V.; Moreno, Y.; Chavez, M. & Hwang, D. U.: Complex networks: Structure and dynamics, Physics Reports, 2006, 424, 175-308 [2] Barnett, L.; Barrett, A. B. & Seth, A. K.: Granger Causality and Transfer Entropy Are Equivalent for Gaussian Variables, Physical Review Letters, 2009, 103, 238701 [3] Hlinka, J.; Hartman, D.; Vejmelka, M.; Novotná, D.; Paluš, M.: Non-linear dependence and teleconnections in climate data: sources, relevance, nonstationarity, submitted preprint (http://arxiv.org/abs/1211.6688)
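Linear Granger causality from a source series to a target series asks whether the source's past reduces the prediction error of the target beyond what the target's own past achieves. A common strength measure is the log ratio of the restricted to the full residual variance. The sketch below fits both autoregressions by least squares. The lag order p = 2 and the function names are assumptions for illustration. A real network analysis would apply this to every pair of components (for example the VARIMAX-rotated PCA components) and assess significance against surrogates.

```python
import numpy as np

def _lagged(series, p):
    """Columns series[t-1], ..., series[t-p] for t = p .. T-1."""
    T = len(series)
    return np.column_stack([series[p - k : T - k] for k in range(1, p + 1)])

def granger_strength(source, target, p=2):
    """Linear Granger causality source -> target: log(restricted var / full var)."""
    y = target[p:]
    ones = np.ones((len(y), 1))
    X_r = np.hstack([ones, _lagged(target, p)])                        # own past only
    X_f = np.hstack([ones, _lagged(target, p), _lagged(source, p)])    # plus source past
    res_r = y - X_r @ np.linalg.lstsq(X_r, y, rcond=None)[0]
    res_f = y - X_f @ np.linalg.lstsq(X_f, y, rcond=None)[0]
    return float(np.log(res_r.var() / res_f.var()))
```

Applied to a pair where x drives y with a one-step lag, the x to y strength is clearly positive and the reverse direction stays near zero. A directed network edge would then point from x to y.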
Quality of life in breast cancer patients--a quantile regression analysis.

PubMed

Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma

2008-01-01

Quality of life assessment has an important role in health care, especially in chronic diseases, in clinical judgment and in supplying medical resources. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients using a quantile regression model, and to compare the results with linear regression. A cross-sectional study was conducted on 119 breast cancer patients who were admitted and treated in the chemotherapy ward of Namazi hospital in Shiraz. We used the QLQ-C30 questionnaire to assess quality of life in these patients. A quantile regression was employed to assess the associated factors and the results were compared to linear regression. All analyses were carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92 ± 11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In contrast to linear regression, financial difficulties were not significant in quantile regression analysis, and dyspnea was only significant for the first quartile. Also, emotional functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of the breast cancer patient quality of life.
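Quantile regression replaces squared error with the asymmetric "check" loss rho_tau(r) = r (tau - 1[r < 0]). Its coefficients therefore describe a chosen quantile of the response, such as the first or third quartile, rather than the mean. The study used SAS. The sketch below is instead a dependency-free NumPy approximation by iteratively reweighted least squares; an exact fit would normally use linear programming. The iteration count and the `eps` floor are assumptions.

```python
import numpy as np

def quantile_regression(X, y, tau=0.5, n_iter=200, eps=1e-6):
    """Approximate linear quantile regression by iteratively reweighted least squares.

    Minimizes sum_i rho_tau(y_i - x_i' b) using |r| ~ r^2 / |r_old|, so each
    step is a weighted least-squares problem. An intercept column is prepended.
    """
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]  # OLS starting values
    for _ in range(n_iter):
        r = y - Xd @ beta
        w = np.where(r >= 0, tau, 1 - tau) / np.maximum(np.abs(r), eps)
        XtW = Xd.T * w  # scales column i of Xd.T by w_i, i.e. X' W
        beta = np.linalg.solve(XtW @ Xd, XtW @ y)
    return beta
```

With heteroscedastic data, fitting `tau=0.25` and `tau=0.75` separately can give different slopes for the same predictor. A single linear regression cannot reveal that pattern, and it is the kind of quartile-specific effect reported above for dyspnea.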
The use and misuse of statistical analyses [in geophysics and space physics]

NASA Technical Reports Server (NTRS)

Reiff, P. H.

1983-01-01

The statistical techniques most often used in space physics include Fourier analysis, linear correlation, auto- and cross-correlation, power spectral density, and superposed epoch analysis. Tests are presented which can evaluate the significance of the results obtained through each of these. Data presented without some form of error analysis are frequently useless, since they offer no way of assessing whether a bump on a spectrum or on a superposed epoch analysis is real or merely a statistical fluctuation. In many of the published linear correlations, for instance, the uncertainties in the intercept and slope are not given, so that the significance of the fitted parameters cannot be assessed.
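Reporting the uncertainty of a fitted line takes only a few extra lines. The sketch computes the ordinary least-squares slope and intercept together with their standard errors, using the residual variance with n - 2 degrees of freedom. It assumes independent, equal-variance errors in y only.

```python
import numpy as np

def fit_line_with_errors(x, y):
    """OLS slope and intercept with their standard errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    slope = np.sum((x - xbar) * (y - ybar)) / sxx
    intercept = ybar - slope * xbar
    resid = y - (intercept + slope * x)
    s2 = np.sum(resid**2) / (n - 2)  # residual variance, n - 2 degrees of freedom
    se_slope = np.sqrt(s2 / sxx)
    se_intercept = np.sqrt(s2 * (1.0 / n + xbar**2 / sxx))
    return slope, intercept, se_slope, se_intercept
```

For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], this returns a slope of 0.6 ± 0.28 and an intercept of 2.2 ± 0.94. Reported that way, a reader can immediately see that the slope is only about two standard errors from zero.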
Visual field progression in glaucoma: estimating the overall significance of deterioration with permutation analyses of pointwise linear regression (PoPLR).

PubMed

O'Leary, Neil; Chauhan, Balwantray C; Artes, Paul H

2012-10-01

To establish a method for estimating the overall statistical significance of visual field deterioration from an individual patient's data, and to compare its performance to pointwise linear regression. The Truncated Product Method was used to calculate a statistic S that combines evidence of deterioration from individual test locations in the visual field. The overall statistical significance (P value) of visual field deterioration was inferred by comparing S with its permutation distribution, derived from repeated reordering of the visual field series. Permutation of pointwise linear regression (PoPLR) and pointwise linear regression were evaluated in data from patients with glaucoma (944 eyes, median mean deviation -2.9 dB, interquartile range: -6.3, -1.2 dB) followed for more than 4 years (median 10 examinations over 8 years). False-positive rates were estimated from randomly reordered series of this dataset, and hit rates (proportion of eyes with significant deterioration) were estimated from the original series. The false-positive rates of PoPLR were indistinguishable from the corresponding nominal significance levels and were independent of baseline visual field damage and length of follow-up. At P < 0.05, the hit rates of PoPLR were 12, 29, and 42%, at the fifth, eighth, and final examinations, respectively, and at matching specificities they were consistently higher than those of pointwise linear regression. In contrast to population-based progression analyses, PoPLR provides a continuous estimate of statistical significance for visual field deterioration individualized to a particular patient's data. This allows close control over specificity, essential for monitoring patients in clinical practice and in clinical trials.
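The PoPLR procedure has three steps. First, run a linear regression at every test location and obtain one-sided p-values for a negative slope. Second, combine them with the Truncated Product Method statistic S = -Σ log p over locations with p ≤ τ. Third, compare S with its distribution under random reorderings of the visit series. The sketch below follows that outline with several assumptions that may differ from the published method: τ = 0.05, a normal approximation to the t distribution for the per-location p-values (to avoid a SciPy dependency), and a fixed number of random permutations.

```python
import math
import numpy as np

def slope_pvalues(times, fields):
    """One-sided p-values (slope < 0) of pointwise linear regression.

    fields: (n_visits, n_locations) sensitivities in dB. Uses a normal
    approximation to the t distribution (an assumption of this sketch).
    """
    t = times - times.mean()
    sxx = np.sum(t**2)
    yc = fields - fields.mean(axis=0)
    slopes = t @ yc / sxx
    resid = yc - np.outer(t, slopes)
    s2 = np.sum(resid**2, axis=0) / (len(times) - 2)
    tstat = slopes / np.sqrt(s2 / sxx)
    return np.array([0.5 * math.erfc(-z / math.sqrt(2)) for z in tstat])  # Phi(z)

def truncated_product_stat(pvals, tau=0.05):
    """S = -sum(log p) over locations with p <= tau (Truncated Product Method)."""
    kept = pvals[pvals <= tau]
    return float(-np.sum(np.log(kept)))

def poplr(times, fields, n_perm=999, tau=0.05, seed=0):
    """Permutation p-value for overall deterioration (PoPLR-style sketch)."""
    rng = np.random.default_rng(seed)
    s_obs = truncated_product_stat(slope_pvalues(times, fields), tau)
    exceed = 0
    for _ in range(n_perm):
        order = rng.permutation(len(times))
        # Reorder the visual-field series while keeping the visit dates fixed.
        s_perm = truncated_product_stat(slope_pvalues(times, fields[order]), tau)
        exceed += s_perm >= s_obs
    return (exceed + 1) / (n_perm + 1)
```

The permutation distribution is computed from the patient's own series, so the resulting p-value is individualized. That is the property the abstract credits for close control over specificity.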
Steganalysis of recorded speech

NASA Astrophysics Data System (ADS)

Johnson, Micah K.; Lyu, Siwei; Farid, Hany

2005-03-01

Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
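LSB embedding, one of the two schemes tested, overwrites the least significant bit of each 16-bit sample with a message bit. Each sample therefore changes by at most one quantization step, which is inaudible but perturbs the signal's fine-scale statistics that the classifier exploits. The sketch below shows only the embedding and extraction, not the authors' linear-basis features or SVM detector. The function names are illustrative.

```python
import numpy as np

def lsb_embed(samples, bits):
    """Write message bits into the least significant bit of int16 PCM samples."""
    out = samples.copy()
    n = len(bits)
    # Clear the LSB (& ~1) and set it to the message bit; works for negative
    # samples too because int16 uses two's complement.
    out[:n] = (out[:n] & ~1) | bits.astype(out.dtype)
    return out

def lsb_extract(samples, n_bits):
    """Read back the first n_bits least significant bits."""
    return (samples[:n_bits] & 1).astype(np.uint8)
```

At 44,100 samples per second and one bit per sample, a single mono channel could carry up to about 5.5 kB of payload per second. This illustrates the abstract's point about audio's capacity for large messages.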
Composite Linear Models | Division of Cancer Prevention

Cancer.gov

By Stuart G. Baker The composite linear models software is a matrix approach to compute maximum likelihood estimates and asymptotic standard errors for models for incomplete multinomial data. It implements the method described in Baker SG. Composite linear models for incomplete multinomial data. Statistics in Medicine 1994;13:609-622. The software includes a library of thirty

A study of the linear free energy model for DNA structures using the generalized Hamiltonian formalism

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yavari, M., E-mail: yavari@iaukashan.ac.ir

2016-06-15

We generalize the results of Nesterenko [13, 14] and Gogilidze and Surovtsev [15] for DNA structures. Using the generalized Hamiltonian formalism, we investigate solutions of the equilibrium shape equations for the linear free energy model.
Solar granulation and statistical crystallography: A modeling approach using size-shape relations

NASA Technical Reports Server (NTRS)

Noever, D. A.

1994-01-01

The irregular polygonal pattern of solar granulation is analyzed for size-shape relations using statistical crystallography. In contrast to previous work, which assumed perfectly hexagonal patterns for granulation, more realistic accounting of cell (granule) shapes reveals a broader basis for quantitative analysis. Several features emerge as noteworthy: (1) a linear correlation between number of cell-sides and neighboring shapes (called Aboav-Weaire's law); (2) a linear correlation between both average cell area and perimeter and the number of cell-sides (called Lewis's law and a perimeter law, respectively) and (3) a linear correlation between cell area and squared perimeter (called convolution index). This statistical picture of granulation is consistent with a finding of no correlation in cell shapes beyond nearest neighbors. A comparative calculation between existing model predictions taken from luminosity data and the present analysis shows substantial agreement for cell-size distributions. A model for understanding grain lifetimes is proposed which links convective times to cell shape using crystallographic results.
Preterm birth and dyscalculia.

PubMed

Jaekel, Julia; Wolke, Dieter

2014-06-01

The aim was to evaluate whether the risk of dyscalculia in preterm children increases with decreasing gestational age (GA) and whether small-for-gestational-age birth is associated with dyscalculia. A total of 922 children ranging from 23 to 41 weeks' GA were studied as part of a prospective, geographically defined longitudinal investigation of neonatal at-risk children in South Germany. At 8 years of age, children's cognitive and mathematic abilities were measured with the Kaufman Assessment Battery for Children and with a standardized mathematics test. Dyscalculia diagnoses were evaluated with discrepancy-based residuals of a linear regression predicting children's math scores from IQ, and with fixed cut-off scores. We estimated ORs for general cognitive impairment, general mathematic impairment, and dyscalculia in each GA group by using binary logistic regressions. The risk for general cognitive and mathematic impairment increased with lower GA. In contrast, preterm children were not at increased risk of dyscalculia after statistically adjusting for child sex, family socioeconomic status, and small-for-gestational-age birth. The risk of general cognitive and mathematic impairments increases with lower GA, but preterm children are not at increased risk of dyscalculia. Copyright © 2014 Elsevier Inc. All rights reserved.
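The discrepancy-based definition above can be illustrated with a minimal sketch. It regresses math scores on IQ and standardizes the residuals by the regression's residual standard error. It then flags children whose math score falls far below the level their IQ predicts. The cut-off of -1.5 residual SDs and the function name `discrepancy_flags` are hypothetical choices for illustration, not the study's criteria, and the example data are invented.

```python
import math

def discrepancy_flags(iq, math_score, cutoff=-1.5):
    """Flag children whose math score is far below the level predicted from IQ.

    cutoff is a hypothetical threshold in residual-SD units, not the study's value.
    """
    n = len(iq)
    mx = sum(iq) / n
    my = sum(math_score) / n
    sxx = sum((x - mx) ** 2 for x in iq)
    sxy = sum((x - mx) * (y - my) for x, y in zip(iq, math_score))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual = observed math score minus the score predicted from IQ.
    resid = [y - (intercept + slope * x) for x, y in zip(iq, math_score)]
    # Residual standard error with n - 2 degrees of freedom (two fitted coefficients).
    se = math.sqrt(sum(r * r for r in resid) / (n - 2))
    z = [r / se for r in resid]
    return [zi <= cutoff for zi in z], z

# Invented toy data: the fourth child scores far below the IQ-based prediction.
iq = [85, 90, 95, 100, 105, 110, 115]
scores = [85, 90, 95, 70, 105, 110, 115]
flags, z = discrepancy_flags(iq, scores)
print(flags)
```

With these toy data only the fourth child is flagged (z ≈ -2.07). All other residuals are small and positive.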
Unscented Kalman Filter for Brain-Machine Interfaces

PubMed Central

Li, Zheng; O'Doherty, Joseph E.; Hanson, Timothy L.; Lebedev, Mikhail A.; Henriquez, Craig S.; Nicolelis, Miguel A. L.

2009-01-01

Brain-machine interfaces (BMIs) are devices that convert neural signals into commands to directly control artificial actuators, such as limb prostheses. Previous real-time methods applied to decoding behavioral commands from the activity of populations of neurons have generally relied upon linear models of neural tuning and were limited in the way they used the abundant statistical information contained in the movement profiles of motor tasks. Here, we propose an n-th order unscented Kalman filter which implements two key features: (1) use of a non-linear (quadratic) model of neural tuning which describes neural activity significantly better than commonly used linear tuning models, and (2) augmentation of the movement state variables with a history of n-1 recent states, which improves prediction of the desired command even before incorporating neural activity information and allows the tuning model to capture relationships between neural activity and movement at multiple time offsets simultaneously. This new filter was tested in BMI experiments in which rhesus monkeys used their cortical activity, recorded through chronically implanted multielectrode arrays, to directly control computer cursors. The 10th-order unscented Kalman filter outperformed the standard Kalman filter and the Wiener filter in both off-line reconstruction of movement trajectories and real-time, closed-loop BMI operation. PMID:19603074
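The filter's nonlinear step can be illustrated with the unscented transform that gives the UKF its name. The transform draws 2n+1 sigma points from the state mean and covariance, passes them through the tuning function, and recombines the results with fixed weights. The sketch assumes the basic symmetric sigma-point set with a single tuning parameter `kappa` and a hypothetical quadratic tuning function. It does not implement the paper's n-th order state augmentation or the full predict-update cycle. For a quadratic function of a Gaussian state, the weighted average reproduces the true mean exactly, so the transform adds no error to the predicted mean of a quadratic tuning model.

```python
import math

def cholesky(p):
    """Lower-triangular L with L L^T = p (p symmetric positive definite)."""
    n = len(p)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(p[i][i] - s)
            else:
                l[i][j] = (p[i][j] - s) / l[j][j]
    return l

def unscented_mean(f, mu, p, kappa=1.0):
    """Approximate E[f(x)] for x ~ N(mu, p) using 2n+1 symmetric sigma points."""
    n = len(mu)
    l = cholesky(p)
    scale = math.sqrt(n + kappa)
    points = [list(mu)]
    weights = [kappa / (n + kappa)]
    for i in range(n):
        col = [scale * l[r][i] for r in range(n)]      # i-th column of sqrt((n+kappa) P)
        points.append([m + c for m, c in zip(mu, col)])
        points.append([m - c for m, c in zip(mu, col)])
        weights += [0.5 / (n + kappa)] * 2
    return sum(w * f(x) for w, x in zip(weights, points))

def tuning(x):
    """Hypothetical quadratic tuning model of a firing rate in a 2-D state."""
    return x[0] ** 2 + x[0] * x[1]

mu = [1.0, 2.0]
p = [[0.5, 0.1], [0.1, 0.3]]
print(unscented_mean(tuning, mu, p))
```

The exact mean here is mu0² + mu0·mu1 + P00 + P01 = 3.6, and the transform returns that value up to rounding.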
Equivalence between a generalized dendritic network and a set of one-dimensional networks as a ground of linear dynamics.

PubMed

Koda, Shin-ichi

2015-05-28

Existing studies have shown that, in special cases, some linear dynamical systems defined on a dendritic network are equivalent to systems defined on a set of one-dimensional networks. This transformation to a simpler picture, which we call linear chain (LC) decomposition, offers a significant advantage in understanding the properties of dendrimers. In this paper, we expand the class of LC-decomposable systems through several generalizations. In addition, we propose two general sufficient conditions for LC decomposability, together with a procedure to systematically realize the LC decomposition. Some examples of LC-decomposable linear dynamical systems are also presented with their graphs. The LC decomposition is generalized in three respects: (i) the type of linear operators; (ii) the shape of the dendritic networks on which the linear operators are defined; and (iii) the type of symmetry operations representing the symmetry of the systems. For generalization (iii), symmetry groups that represent the symmetry of dendritic systems are defined. The LC decomposition is realized by changing the basis of a linear operator defined on a dendritic network into bases of irreducible representations of the symmetry group. These results make it easier to apply the LC decomposition in various cases. This may lead to a further understanding of the relation between the structure and functions of dendrimers in future studies.
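A toy case shows the idea behind LC decomposition. Consider a star-shaped network: a core site joined to k identical branch ends, the simplest dendritic shape. Restricted to the core and the totally symmetric combination of branch ends, the operator is a two-site chain with coupling √k. Every non-symmetric combination of branch ends decouples as an isolated site. The sketch assumes the adjacency matrix as the linear operator and k = 3, and checks this numerically. It illustrates the symmetry argument only and is not the paper's general procedure.

```python
import math

def star_adjacency(k):
    """Adjacency matrix of a star: node 0 is the core, nodes 1..k are branch ends."""
    a = [[0.0] * (k + 1) for _ in range(k + 1)]
    for leaf in range(1, k + 1):
        a[0][leaf] = a[leaf][0] = 1.0
    return a

def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

k = 3
a = star_adjacency(k)
core = [1.0] + [0.0] * k                               # core site
sym = [0.0] + [1.0 / math.sqrt(k)] * k                 # totally symmetric branch combination
anti = [0.0, 1.0 / math.sqrt(2), -1.0 / math.sqrt(2)] + [0.0] * (k - 2)  # non-symmetric combination

# In the basis (core, sym) the operator acts as a 2-site chain with coupling sqrt(k):
#   A core = sqrt(k) * sym,  A sym = sqrt(k) * core,
# while the non-symmetric combination is an isolated site: A anti = 0.
print(matvec(a, core))
print(matvec(a, sym))
print(matvec(a, anti))
```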
Bayesian generalized linear mixed modeling of Tuberculosis using informative priors

PubMed Central

Woldegerima, Woldegebriel Assefa

2017-01-01

TB is rated as one of the world's deadliest diseases, and South Africa ranks 9th of the 22 countries hardest hit by TB. Although much research has been carried out on this subject, this paper goes further by incorporating past knowledge into the model, using a Bayesian approach with an informative prior. Bayesian statistical approaches are becoming popular in data analysis. However, most applications of Bayesian inference are limited to non-informative priors, which are used when there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. In this paper, identical regression models are fitted under the classical approach and under the Bayesian approach with both non-informative and informative priors, using South Africa General Household Survey (GHS) data for 2014. For the Bayesian model with an informative prior, the GHS datasets for 2011 to 2013 are used to set up the priors for the 2014 model. PMID:28257437
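The idea of carrying earlier surveys into the prior can be sketched with the simplest conjugate analogue, a Beta-Binomial model for a prevalence. This is not the paper's generalized linear mixed model. The helper `informative_prior`, its down-weighting parameter `weight`, and all counts are hypothetical.

```python
def beta_posterior(cases, n, prior_a=1.0, prior_b=1.0):
    """Conjugate Beta-Binomial update: Beta(a, b) prior, `cases` out of `n` observed."""
    return prior_a + cases, prior_b + n - cases

def informative_prior(past_cases, past_n, weight=1.0):
    """Hypothetical helper: turn earlier survey counts into a Beta prior.

    weight < 1 discounts the older data; weight = 1 treats it like current data.
    """
    return 1.0 + weight * past_cases, 1.0 + weight * (past_n - past_cases)

def beta_mean(a, b):
    return a / (a + b)

# Hypothetical counts, not GHS figures.
flat = beta_posterior(5, 50)
informed = beta_posterior(5, 50, *informative_prior(10, 100))
print(beta_mean(*flat), beta_mean(*informed))
```

Here the informative prior pulls the posterior mean (about 0.105) toward the earlier rate of 0.10, whereas the flat prior gives about 0.115. Setting `weight` below 1 makes the older data count for less.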
An Explicit Linear Filtering Solution for the Optimization of Guidance Systems with Statistical Inputs

NASA Technical Reports Server (NTRS)

Stewart, Elwood C.

1961-01-01

The determination of optimum filtering characteristics for guidance system design is generally a tedious process which cannot usually be carried out in general terms. In this report a simple explicit solution is given which is applicable to many different types of problems. It is shown to be applicable to problems which involve optimization of constant-coefficient guidance systems and time-varying homing type systems for several stationary and nonstationary inputs. The solution is also applicable to off-design performance, that is, the evaluation of system performance for inputs for which the system was not specifically optimized. The solution is given in generalized form in terms of the minimum theoretical error, the optimum transfer functions, and the optimum transient response. The effects of input signal, contaminating noise, and limitations on the response are included. From the results given, it is possible in an interception problem, for example, to rapidly assess the effects on minimum theoretical error of such factors as target noise and missile acceleration. It is also possible to answer important questions regarding the effect of type of target maneuver on optimum performance.
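For orientation only, consider the classical non-causal Wiener result for a stationary signal with power spectral density Φ_s corrupted by independent additive noise with density Φ_n. It shows the kind of quantities involved: an optimum transfer function and a minimum theoretical error. This textbook case is an assumption made for illustration. It is not Stewart's explicit solution, which also covers time-varying homing systems and nonstationary inputs. Minimizing the integrand pointwise in H gives:

```latex
\begin{aligned}
\varepsilon^2(H) &= \frac{1}{2\pi}\int_{-\infty}^{\infty}\Big(|1-H(\omega)|^2\,\Phi_s(\omega) + |H(\omega)|^2\,\Phi_n(\omega)\Big)\,d\omega,\\
H_{\mathrm{opt}}(\omega) &= \frac{\Phi_s(\omega)}{\Phi_s(\omega)+\Phi_n(\omega)},\\
\varepsilon^2_{\min} &= \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\Phi_s(\omega)\,\Phi_n(\omega)}{\Phi_s(\omega)+\Phi_n(\omega)}\,d\omega .
\end{aligned}
```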
Testing alternative ground water models using cross-validation and other methods

USGS Publications Warehouse

Foglia, L.; Mehl, S.W.; Hill, M.C.; Perona, P.; Burlando, P.

2007-01-01

Many methods can be used to test alternative ground water models. Of concern in this work are methods able to (1) rank alternative models (also called model discrimination) and (2) identify observations important to parameter estimates and predictions (equivalent to the purpose served by some types of sensitivity analysis). Some of the measures investigated are computationally efficient; others are computationally demanding. The latter are generally needed to account for model nonlinearity. The efficient model discrimination methods investigated include the information criteria: the corrected Akaike information criterion, Bayesian information criterion, and generalized cross-validation. The efficient sensitivity analysis measures used are dimensionless scaled sensitivity (DSS), composite scaled sensitivity, and parameter correlation coefficient (PCC); the other statistics are DFBETAS, Cook's D, and the observation-prediction statistic. Acronyms are explained in the introduction. Cross-validation (CV) is a computationally intensive nonlinear method that is used for both model discrimination and sensitivity analysis. The methods are tested using up to five alternative parsimoniously constructed models of the ground water system of the Maggia Valley in southern Switzerland. The alternative models differ in their representation of hydraulic conductivity. A new method for graphically representing CV and sensitivity analysis results for complex models is presented and used to evaluate the utility of the efficient statistics. The results indicate that for model selection, the information criteria produce similar results at much smaller computational cost than CV. For identifying important observations, the only obviously inferior linear measure is DSS. Its poor performance was expected because DSS does not include the effects of parameter correlation, and PCC reveals large parameter correlations. © 2007 National Ground Water Association.
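The information criteria named above can be computed directly from a least-squares fit. The sketch uses the common Gaussian least-squares forms, which drop additive constants shared by all candidate models. Conventions differ between codes, for example on whether the error variance counts as a parameter. Absolute values may therefore not match those in the paper, and only differences between models fitted to the same observations are meaningful. The two models and their SSE values are hypothetical.

```python
import math

def least_squares_criteria(sse, n, k):
    """AIC, AICc and BIC for a least-squares fit with Gaussian errors.

    sse: sum of squared (weighted) residuals; n: observations; k: estimated parameters.
    Uses the common forms that drop additive constants shared by all models.
    """
    aic = n * math.log(sse / n) + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = n * math.log(sse / n) + k * math.log(n)
    return aic, aicc, bic

# Two hypothetical models fitted to the same 20 observations.
model_a = least_squares_criteria(sse=20.0, n=20, k=3)
model_b = least_squares_criteria(sse=18.0, n=20, k=5)
print(model_a, model_b)
```

Model B fits slightly better, but its two extra parameters cost more than the reduction in SSE. Model A therefore has the lower AICc and BIC.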
Generalized theory of semiflexible polymers.

PubMed

Wiggins, Paul A; Nelson, Philip C

2006-03-01

DNA bending on length scales shorter than a persistence length plays an integral role in the translation of genetic information from DNA to cellular function. Quantitative experimental studies of these biological systems have led to a renewed interest in the polymer mechanics relevant for describing the conformational free energy of DNA bending induced by protein-DNA complexes. Recent experimental results from DNA cyclization studies have cast doubt on the applicability of the canonical semiflexible polymer theory, the wormlike chain (WLC) model, to DNA bending on biologically relevant length scales. This paper develops a theory of the chain statistics of a class of generalized semiflexible polymer models. Our focus is on the theoretical development of these models and the calculation of experimental observables. To illustrate our methods, we focus on a specific, illustrative model of DNA bending. We show that the WLC model generically describes the long-length-scale chain statistics of semiflexible polymers, as predicted by renormalization group arguments. In particular, we show that either the WLC or our present model adequately describes force-extension, solution scattering, and long-contour-length cyclization experiments, regardless of the details of DNA bend elasticity. In contrast, experiments sensitive to short-length-scale chain behavior can in principle reveal dramatic departures from the linear elastic behavior assumed in the WLC model. We demonstrate this explicitly by showing that our toy model can reproduce the anomalously large short-contour-length cyclization factors recently measured by Cloutier and Widom. Finally, we discuss the applicability of these models to DNA chain statistics in the context of future experiments.
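The WLC baseline referred to above has a closed-form mean-square end-to-end distance. The sketch evaluates the standard Kratky-Porod expression. It does not include the paper's generalized bending energy. The 50 nm persistence length in the example is the commonly quoted value for double-stranded DNA, used here only as an illustrative input.

```python
import math

def wlc_mean_square_end_to_end(contour_length, persistence_length):
    """<R^2> for the wormlike chain (Kratky-Porod) model, in squared length units."""
    L, P = contour_length, persistence_length
    return 2.0 * P * L * (1.0 - (P / L) * (1.0 - math.exp(-L / P)))

P = 50.0  # nm, illustrative DNA persistence length
short = wlc_mean_square_end_to_end(5.0, P)     # L << P: nearly a rigid rod, <R^2> ~ L^2
long = wlc_mean_square_end_to_end(5000.0, P)   # L >> P: random-coil limit, <R^2> ~ 2 P L
print(short / 5.0 ** 2, long / (2 * P * 5000.0))
```

The two ratios approach 1 in the rigid-rod and random-coil limits. Departures from this curve at short contour lengths are where the paper's generalized models differ from the WLC.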
Detecting anomalies in CMB maps: a new method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Neelakanta, Jayanth T., E-mail: jayanthtn@gmail.com

2015-10-01

Ever since WMAP announced its first results, different analyses have shown that there is weak evidence for several large-scale anomalies in the CMB data. While the evidence for each anomaly appears to be weak, the fact that there are multiple seemingly unrelated anomalies makes it difficult to account for them via a single statistical fluke. So, one is led to considering a combination of these anomalies. But if we "hand-pick" the anomalies (test statistics) to consider, we are making an a posteriori choice. In this article, we propose two statistics that do not suffer from this problem. The statistics are linear and quadratic combinations of the a_ℓm's with random coefficients. They test the null hypothesis that the a_ℓm's are independent, normally distributed, zero-mean random variables with an m-independent variance. The motivation for considering multiple modes is this: because most physical models that lead to large-scale anomalies result in coupling multiple ℓ and m modes, the "coherence" of this coupling should get enhanced if a combination of different modes is considered. In this sense, the statistics are much more generic than those hitherto considered in the literature. Using fiducial data, we demonstrate that the method works and discuss how it can be used with actual CMB data to make quite general statements about the incompatibility of the data with the null hypothesis.
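The linear statistic can be sketched as a standardized random-coefficient combination of modes. Under the null hypothesis it is standard normal for any fixed coefficient draw, so the coefficients need not be hand-picked. For simplicity the sketch assumes real-valued modes with a variance that depends only on ℓ (3 modes at ℓ = 1, 5 at ℓ = 2). Real a_ℓm are complex with a reality constraint, and the quadratic statistic is not shown. The variances, seeds, and function names are hypothetical.

```python
import math
import random

def linear_statistic(a, variances, coeffs):
    """Standardized linear combination z = sum(c*a) / sqrt(sum(c^2 * var)).

    Under the null (independent zero-mean Gaussians with the given variances),
    z is standard normal for any fixed coefficients.
    """
    num = sum(c * x for c, x in zip(coeffs, a))
    den = math.sqrt(sum(c * c * v for c, v in zip(coeffs, variances)))
    return num / den

def random_coeffs(n, seed):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical m-independent variances: 3 modes at l = 1, 5 modes at l = 2.
variances = [1.0] * 3 + [0.5] * 5
coeffs = random_coeffs(len(variances), seed=1)

# Null simulation: the mean of z^2 should be close to 1.
rng = random.Random(2)
zs = []
for _ in range(20000):
    a = [rng.gauss(0.0, math.sqrt(v)) for v in variances]
    zs.append(linear_statistic(a, variances, coeffs))
print(sum(z * z for z in zs) / len(zs))
```

Values of |z| far out in the normal tail for observed data would count as evidence against the null.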
Absence of race- or gender-specific income disparities among full-time white and Asian general internists working for the Veterans Administration.

PubMed

Weeks, William B; Wallace, Amy E

2010-02-01

Gender-based, but not race-based, income disparities exist among general internists who practice medicine in the private sector. The aim of this study was to assess whether race- or gender-based income disparities existed among full-time white and Asian general internists who worked for the Veterans Health Administration of the US Department of Veterans Affairs (VA) between fiscal years 2004 and 2007, and whether any disparities changed after the VA enacted physician pay reform in early 2006. A retrospective study was conducted of all nonsupervisory, board-certified, full-time white or Asian VA general internists who did not change their location of practice between fiscal years 2004 and 2007. A longitudinal cohort design and linear regression modeling, adjusted for physician characteristics, were used to compare race- and gender-specific incomes in fiscal years 2004-2007. A total of 176 physicians were included in the study: 82 white males, 33 Asian males, 30 white females, and 31 Asian females. In all fiscal years examined, white males had the highest mean annual incomes, though not statistically significantly so. Regression analyses for fiscal years 2004 through 2006 revealed that physician age and years of service were predictive of total income. After physician pay reform was enacted, Asian male VA primary care physicians had higher annual incomes than did physicians in all other race or gender categories, after adjustment for age and years of VA service, though these differences were not statistically significant. No significant gender-based income disparities were noted among these white and Asian VA physicians. Our findings for white and Asian general internists suggest that the VA's goal of maintaining a racially diverse workforce may have been effected, in part, through use of market pay among primary care general internists. Copyright 2010. Published by Elsevier Inc.
Inhaled corticosteroids in children with persistent asthma: effects on growth.

PubMed

Zhang, Linjie; Prietsch, Sílvio O M; Ducharme, Francine M

2014-07-17

Treatment guidelines for asthma recommend inhaled corticosteroids (ICS) as first-line therapy for children with persistent asthma. Although ICS treatment is generally considered safe in children, the potential systemic adverse effects related to regular use of these drugs have been and continue to be a matter of concern, especially the effects on linear growth. To assess the impact of ICS on the linear growth of children with persistent asthma and to explore potential effect modifiers such as characteristics of available treatments (molecule, dose, length of exposure, inhalation device) and of treated children (age, disease severity, compliance with treatment). We searched the Cochrane Airways Group Specialised Register of trials (CAGR), which is derived from systematic searches of bibliographic databases including CENTRAL, MEDLINE, EMBASE, CINAHL, AMED and PsycINFO; we handsearched respiratory journals and meeting abstracts. We also conducted a search of ClinicalTrials.gov and manufacturers' clinical trial databases to look for potential relevant unpublished studies. The literature search was conducted in January 2014. Parallel-group randomised controlled trials comparing daily use of ICS, delivered by any type of inhalation device for at least three months, versus placebo or non-steroidal drugs in children up to 18 years of age with persistent asthma. Two review authors independently performed study selection, data extraction and assessment of risk of bias in included studies. We conducted meta-analyses using the Cochrane statistical package RevMan 5.2 and Stata version 11.0. We used the random-effects model for meta-analyses. We used mean differences (MDs) and 95% CIs as the metrics for treatment effects. A negative value for MD indicates that ICS have suppressive effects on linear growth compared with controls. 
We performed a priori planned subgroup analyses to explore potential effect modifiers, such as ICS molecule, daily dose, inhalation device and age of the treated child. We included 25 trials involving 8471 (5128 ICS-treated and 3343 control) children with mild to moderate persistent asthma. Six molecules (beclomethasone dipropionate, budesonide, ciclesonide, flunisolide, fluticasone propionate and mometasone furoate) [corrected] were given at low or medium daily doses over periods ranging from three months to four to six years. Most trials were blinded, and over half of the trials had dropout rates of over 20%. Compared with placebo or non-steroidal drugs, ICS produced a statistically significant reduction in linear growth velocity (14 trials with 5717 participants, MD -0.48 cm/y, 95% CI -0.65 to -0.30, moderate quality evidence) and in the change from baseline in height (15 trials with 3275 participants; MD -0.61 cm/y, 95% CI -0.83 to -0.38, moderate quality evidence) during a one-year treatment period. Subgroup analysis showed a statistically significant group difference between six molecules in the mean reduction of linear growth velocity during one-year treatment (Chi² = 26.1, degrees of freedom (df) = 5, P value < 0.0001). The group difference persisted even when analysis was restricted to the trials using doses equivalent to 200 μg/d hydrofluoroalkane (HFA)-beclomethasone. Subgroup analyses did not show a statistically significant impact of daily dose (low vs medium), inhalation device or participant age on the magnitude of ICS-induced suppression of linear growth velocity during a one-year treatment period. However, head-to-head comparisons are needed to assess the effects of different drug molecules, dose, inhalation device or patient age.
No statistically significant difference in linear growth velocity was found between participants treated with ICS and controls during the second year of treatment (five trials with 3174 participants; MD -0.19 cm/y, 95% CI -0.48 to 0.11, P value 0.22). Two trials reported linear growth velocity in the third year of treatment. One, involving 667 participants, showed similar growth velocity in the budesonide and placebo groups (5.34 cm/y vs 5.34 cm/y). The other, involving 1974 participants, showed lower growth velocity in the budesonide group than in the placebo group (MD -0.33 cm/y, 95% CI -0.52 to -0.14, P value 0.0005). Among four trials reporting data on linear growth after treatment cessation, three did not describe statistically significant catch-up growth in the ICS group two to four months after treatment cessation. One trial showed accelerated linear growth velocity in the fluticasone group at 12 months after treatment cessation, but there remained a statistically significant difference of 0.7 cm in height between the fluticasone and placebo groups at the end of the three-year trial. One trial with follow-up into adulthood showed that participants of prepubertal age treated with budesonide 400 μg/d for a mean duration of 4.3 years had a mean reduction of 1.20 cm (95% CI -1.90 to -0.50) in adult height compared with those treated with placebo. Regular use of ICS at low or medium daily doses is associated with a mean reduction of 0.48 cm/y in linear growth velocity and a 0.61-cm change from baseline in height during a one-year treatment period in children with mild to moderate persistent asthma. The effect size of ICS on linear growth velocity appears to be associated more strongly with the ICS molecule than with the device or dose (low to medium dose range). ICS-induced growth suppression seems to be maximal during the first year of therapy and less pronounced in subsequent years of treatment.
However, additional studies are needed to better characterise the molecule dependency of growth suppression, particularly with newer molecules (mometasone, ciclesonide), to specify the respective role of molecule, daily dose, inhalation device and patient age on the effect size of ICS, and to define the growth suppression effect of ICS treatment over a period of several years in children with persistent asthma.
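The pooling step of a random-effects meta-analysis of mean differences can be illustrated with the classical DerSimonian-Laird estimator, a common random-effects method. This is a minimal sketch that assumes inverse-variance weights and a normal 95% interval. It is not the review's RevMan or Stata output, the example inputs are invented, and it needs at least two trials.

```python
import math

def dersimonian_laird(effects, std_errors, z=1.96):
    """Random-effects pooled mean difference with the DerSimonian-Laird tau^2.

    Returns (pooled estimate, (lower, upper) confidence limits, tau^2).
    Requires at least two trials.
    """
    w = [1.0 / se ** 2 for se in std_errors]                     # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                                 # between-trial variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]        # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return pooled, (pooled - z * se_pooled, pooled + z * se_pooled), tau2

# Invented inputs: two trials (MD in cm/y) that disagree far more than their SEs allow.
pooled, ci, tau2 = dersimonian_laird([0.0, 4.0], [1.0, 1.0])
print(pooled, ci, tau2)
```

In this example the between-trial variance τ² = 7 widens the interval to 2.0 ± 3.92. Consistent trials give τ² = 0, and the method reduces to the fixed-effect estimate.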
A weighted generalized score statistic for comparison of predictive values of diagnostic tests

PubMed Central

Kosinski, Andrzej S.

2013-01-01

Positive and negative predictive values are important measures of the performance of a medical diagnostic test. We consider testing the equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have rather complex formulas that offer little intuitive insight. We propose reformulations that are mathematically equivalent but algebraically simple and intuitive. As the new reformulation makes clear, the generalized score statistic does not always reduce to the commonly used score statistic in the independent-samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates the empirical covariance matrix with newly proposed weights. This statistic is simple to compute and always reduces to the score statistic in the independent-samples situation. As simulations demonstrate, it also preserves the type I error rate better than the other statistics. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for the corresponding sample size computations. The new formulas for the Wald statistics may be useful for easy computation of confidence intervals for the difference of predictive values. The concepts introduced here have the potential to lead to a weighted generalized score test statistic in a general GEE setting. PMID:22912343
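The quantities being compared can be made concrete with a small sketch that computes each test's PPV and NPV from paired records. Because both tests are applied to the same patients, the two estimates are correlated; this is what the paired-design test statistics must account for. The sketch only computes the point estimates, not the WGS statistic itself, whose formula is not given here. The record layout, function name, and data are hypothetical.

```python
def predictive_values(records, test_index):
    """PPV and NPV of one test in a paired design.

    records: (test1, test2, disease) tuples of 0/1, one per patient.
    test_index: 0 for the first test, 1 for the second.
    """
    pos = [r[2] for r in records if r[test_index] == 1]
    neg = [r[2] for r in records if r[test_index] == 0]
    ppv = sum(pos) / len(pos)                 # P(disease | test positive)
    npv = sum(1 - d for d in neg) / len(neg)  # P(no disease | test negative)
    return ppv, npv

# Invented paired data: every patient receives both tests.
records = [(1, 1, 1), (1, 0, 1), (0, 1, 0), (1, 1, 0), (0, 0, 0), (0, 0, 1)]
print(predictive_values(records, 0), predictive_values(records, 1))
```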
Outcomes associated with preoperative weight loss after laparoscopic Roux-en-Y gastric bypass.

PubMed

Blackledge, Camille; Graham, Laura A; Gullick, Allison A; Richman, Joshua; Stahl, Richard; Grams, Jayleen

2016-11-01

Laparoscopic Roux-en-Y gastric bypass (LRYGB) is an effective treatment for achieving and maintaining weight loss and for improving obesity-related comorbidities. As part of the approval process for bariatric surgery, many insurance companies require patients to have documented recent participation in a supervised weight loss program. The goal of this study was to evaluate the relationship of preoperative weight changes with outcomes following LRYGB. A retrospective review was conducted of adult patients undergoing LRYGB between 2008 and 2012 at a single institution. Patients were stratified into quartiles based on % excess weight gain (0-4.99 % and ≥5 % EWG) and % excess weight loss (0-4.99 % and ≥5 % EWL). Generalized linear models were used to examine differences in postoperative weight outcomes at 6, 12, and 24 months. Covariates included in the final adjusted models were determined using backward stepwise selection. Of the 300 patients included in the study, there were no significant demographic differences among the quartiles. However, time to operation was longer for patients who gained or lost ≥5 % excess body weight (p < 0.001). The difference in postoperative complications was not statistically significant. Even so, the complication rate was higher in patients with ≥5 % EWG than in those with ≥5 % EWL (12.5 vs. 4.8 %, respectively; p = 0.29). Unadjusted and adjusted generalized linear models showed no statistically significant association between preoperative % excess weight change and weight loss outcomes at 24 months. Patients with the greatest % preoperative excess weight change had the longest intervals from initial visit to operation. No significant differences were seen in perioperative and postoperative outcomes. This study suggests preoperative weight loss requirements may delay the time to operation without improving postoperative outcomes or weight loss.
[Infant mortality by cause of death in the Rio de Janeiro metropolitan area, 1976-1986: association with socioeconomic, climatic and air pollution variables].

PubMed

Duchiade, M P; Beltrao, K I

1992-01-01

The Metropolitan Region of Rio de Janeiro (RMRJ) consists of the capital (the city of Rio de Janeiro) and 13 surrounding cities. The city of Rio de Janeiro itself was divided into 24 administrative regions (RAS). These are rather heterogeneous with respect to the income level of their inhabitants, the supply of public services such as water and sewerage, and population density or air pollution. Three different socioeconomic covariables were selected, and three residential zones (ZONA), or subareas, were considered: the central rich nucleus, the intermediate transition zone, and the distant periphery. Cause-specific infant, neonatal, and postneonatal mortality rates were used as dependent variables. Mortality data from the RMRJ Civil Register were used. A correction factor was estimated with the Brass technique, using the fertility rate and the delivery rate for specific 5-year age groups of mothers. A multivariate analysis, the adjusted generalized linear model (MLG), was used to study associations between socioeconomic, climatic, and air pollution variables and mortality levels. The MLG was fitted with the statistical package GLIM (Generalized Linear Interactive Modelling). Analysis of infant mortality trends during 1976-1986 for the large subareas of the RMRJ and the outlying region showed that the peak months of total neonatal and perinatal mortality were March and February, while the lowest months were November and October. May and June had the highest rates of postneonatal mortality from pneumonia, diarrhea, other respiratory infections, malnutrition, and other diseases. The MLG indicated a statistically significant association between the annual mortality rate for selected causes and the socioeconomic indicators (INS, FS and ZONA). Mortality rates also varied with time (ANO and ANOQ) and appeared to be associated with variations in the log of average pollution (LPM).
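GLIM fits generalized linear models by iteratively reweighted least squares (IRLS). The sketch below shows one plausible specification for mortality rates: a Poisson log-linear model with log population as an offset and a single covariate such as log pollution. The distribution, link, covariate, and function name are assumptions, since the abstract does not state them. The data are synthetic and built to follow the model exactly, so the fit should recover the generating coefficients.

```python
import math

def fit_poisson_rate(x, deaths, population, max_iter=50, tol=1e-10):
    """Fit log(E[deaths] / population) = a + b * x by IRLS (Fisher scoring).

    log(population) enters as an offset, so a and b describe the mortality rate.
    """
    a = math.log(sum(deaths) / sum(population))  # start at the pooled log-rate
    b = 0.0
    for _ in range(max_iter):
        s00 = s01 = s11 = t0 = t1 = 0.0
        for xi, yi, pi in zip(x, deaths, population):
            lin = a + b * xi              # linear predictor without the offset
            mu = pi * math.exp(lin)       # expected deaths
            z = lin + (yi - mu) / mu      # working response (offset removed)
            s00 += mu                     # weighted normal equations, weights = mu
            s01 += mu * xi
            s11 += mu * xi * xi
            t0 += mu * z
            t1 += mu * xi * z
        det = s00 * s11 - s01 * s01
        a_new = (s11 * t0 - s01 * t1) / det
        b_new = (s00 * t1 - s01 * t0) / det
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b

# Synthetic data generated from a = -5.0, b = 0.3 (not Rio de Janeiro data).
x = [0.0, 0.5, 1.0, 1.5, 2.0]
pop = [1000.0, 2000.0, 1500.0, 1200.0, 800.0]
deaths = [p * math.exp(-5.0 + 0.3 * xi) for p, xi in zip(pop, x)]
print(fit_poisson_rate(x, deaths, pop))
```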
Using the Coefficient of Determination R² to Test the Significance of Multiple Linear Regression

ERIC Educational Resources Information Center

Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.

2013-01-01

This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
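The link between R² and the usual significance test can be sketched as follows. With k predictors, n cases, and Gaussian errors, R² under the null hypothesis of no linear relation follows a Beta(k/2, (n−k−1)/2) distribution with mean k/(n−1). The overall F statistic is a monotone function of R², so both lead to the same test. The Monte Carlo check uses simple regression (k = 1) and hypothetical settings. It is an illustration, not the authors' beta-sampling procedure.

```python
import random

def f_from_r2(r2, n, k):
    """Overall F statistic of a regression with k predictors (plus intercept) on n cases."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

def r2_simple(x, y):
    """R^2 of a simple linear regression (the squared Pearson correlation)."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Monte Carlo check of the null mean k/(n-1) for k = 1, n = 10.
rng = random.Random(0)
n, reps = 10, 20000
draws = [r2_simple([rng.gauss(0, 1) for _ in range(n)],
                   [rng.gauss(0, 1) for _ in range(n)]) for _ in range(reps)]
print(sum(draws) / reps, 1 / (n - 1))
print(f_from_r2(0.5, 23, 2))
```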
Psychosocial issues on-orbit: results from two space station programs

NASA Astrophysics Data System (ADS)

Kanas, N. A.; Salnitskiy, V. P.; Ritsher, J. B.; Gushin, V. I.; Weiss, D. S.; Saylor, S. A.; Marmar, C. R.

PURPOSE: Psychosocial issues affecting people working in isolated and confined environments, such as spacecraft, can jeopardize mental health and mission safety. Our team has completed two large NASA-funded studies involving missions to the Mir and International Space Stations, where crewmembers were on-orbit for four to seven months. Combining these two datasets allows us to generalize across these two settings and maximize statistical power in testing our hypotheses. This paper presents results from three sets of hypotheses. They concern possible changes in mood and social climate over time, displacement of negative emotions to outside monitoring personnel, and the task and support roles of the leader. METHODS: The combined sample of 216 participants included 13 American astronauts, 17 Russian cosmonauts, and 150 U.S. and 36 Russian mission control personnel. Subjects completed a weekly questionnaire that included items from the Profile of Mood States, the Group Environment Scale, and the Work Environment Scale, producing 20 subscale scores. The analytic strategy included piecewise linear regression and general linear modeling, and it accounted for the effects of multiple observations per person and multiple analyses. RESULTS: There was little evidence to suggest that universal changes in levels of mood and group climate occurred among astronauts and cosmonauts over time. Although a few individuals experienced decrements in the second half of the mission, the majority did not. However, there was evidence that subjects displaced negative emotions to outside
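Piecewise linear regression of a mood or climate score against mission time can be sketched as a broken-stick least-squares fit with a single knot. Placing the knot at the mission midpoint, for example, lets the second-half slope differ from the first. The knot location, function names, and synthetic data are hypothetical. The sketch also ignores the within-person correlation from repeated weekly observations, which the study's analysis accounted for.

```python
def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_piecewise_linear(t, y, knot):
    """OLS fit of y = b0 + b1*t + b2*max(0, t - knot).

    The slope is b1 before the knot and b1 + b2 after it.
    """
    rows = [[1.0, ti, max(0.0, ti - knot)] for ti in t]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)

# Synthetic weekly scores whose slope changes from +0.5 to -0.5 at week 5.
weeks = [float(w) for w in range(11)]
score = [2.0 + 0.5 * w - 1.0 * max(0.0, w - 5.0) for w in weeks]
print(fit_piecewise_linear(weeks, score, knot=5.0))
```

A significantly non-zero b2 would indicate a change in trend in the second half, the kind of decrement the study looked for.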
A Reanalysis of Curvature in the Dose Response for Cancer and Modifications by Age at Exposure Following Radiation Therapy for Benign Disease

DOE Office of Scientific and Technical Information (OSTI.GOV)

Little, Mark P., E-mail: mark.little@nih.gov; Stovall, Marilyn; Smith, Susan A.

Purpose: To assess the shape of the dose response for various cancer endpoints and modifiers by age and time. Methods and Materials: Reanalysis of the US peptic ulcer data testing for heterogeneity of radiogenic risk by cancer endpoint (stomach, pancreas, lung, leukemia, all other). Results: There are statistically significant (P<.05) excess risks for all cancers and for lung cancer, and borderline statistically significant risks for stomach cancer (P=.07) and leukemia (P=.06). The corresponding excess relative risks per Gy are 0.024 (95% confidence interval [CI] 0.011, 0.039), 0.559 (95% CI 0.221, 1.021), 0.042 (95% CI -0.002, 0.119), and 1.087 (95% CI -0.018, 4.925), respectively. There is a statistically significant (P=.007) excess risk of pancreatic cancer when adjusted for dose-response curvature. General downward curvature is apparent in the dose response, statistically significant (P<.05) for all cancers, pancreatic cancer, and all other cancers (i.e., other than stomach, pancreas, lung, leukemia). There are indications of reduction in relative risk with increasing age at exposure (for all cancers, pancreatic cancer), but no evidence for quadratic variations in relative risk with age at exposure. If a linear-exponential dose response is used, there is no significant heterogeneity in the dose response among the 5 endpoints considered or in the speed of variation of relative risk with age at exposure. The risks are generally consistent with those observed in the Japanese atomic bomb survivors and in groups of nuclear workers. Conclusions: There are excess risks for various malignancies in this data set. Generally there is a marked downward curvature in the dose response and significant reduction in relative risk with increasing age at exposure. The consistency of risks with those observed in the Japanese atomic bomb survivors and in groups of nuclear workers implies that there may be little sparing effect of fractionation of dose or low-dose-rate exposure.
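The linear-exponential dose response mentioned above can be written as ERR(D) = αD·exp(βD), with relative risk 1 + ERR(D). A negative β produces the downward curvature reported. The sketch assumes this common parameterization. The parameter values are hypothetical, not the fitted estimates, and the age-at-exposure modification is not modeled.

```python
import math

def excess_relative_risk(dose_gy, alpha, beta=0.0):
    """Linear-exponential ERR model: alpha * D * exp(beta * D).

    beta = 0 gives the purely linear model; beta < 0 bends the curve downward.
    """
    return alpha * dose_gy * math.exp(beta * dose_gy)

# Hypothetical parameters: compare linear and downward-curving responses.
for dose in (0.5, 1.0, 2.0, 4.0):
    linear = excess_relative_risk(dose, alpha=0.5)
    curved = excess_relative_risk(dose, alpha=0.5, beta=-0.1)
    print(dose, linear, curved)
```

At low doses the two curves nearly coincide. The gap widens with dose, which is why curvature matters most when high-dose data dominate the fit.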
Extreme value statistics analysis of fracture strengths of a sintered silicon nitride failing from pores

NASA Technical Reports Server (NTRS)

Chao, Luen-Yuan; Shetty, Dinesh K.

1992-01-01

An analysis of the correlation between the pore-size distribution and the fracture strength distribution of a sintered silicon nitride, based on the theory of extreme-value statistics, is presented. The pore-size distribution on a polished surface of this material was characterized using an automatic optical image analyzer. The distribution measured on the two-dimensional plane surface was transformed to a population (volume) distribution using the Schwartz-Saltykov diameter method. The population pore-size distribution and the distribution of pore size at the fracture origin were correlated by extreme-value statistics. The fracture strength distribution was then predicted from the extreme-value pore-size distribution, using a linear elastic fracture mechanics model of an annular crack around a pore and the fracture toughness of the ceramic. The predicted strength distribution was in good agreement with strength measurements in bending. In particular, the extreme-value statistics analysis explained the nonlinear trend in the linearized Weibull plot of measured strengths without postulating a lower-bound strength.
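To make the extreme-value argument concrete, the sketch below maps the largest pore in a specimen to a failure probability under a weakest-link assumption, P_f(σ) = 1 − F(a*)^N, where F is the population pore-radius CDF, N is the number of pores in the stressed volume, and a* is the pore radius that fails at stress σ. It assumes a hypothetical exponential pore-radius distribution, a simple Griffith-type relation σ = K_Ic/(Y√(πc)) with an effective flaw size c equal to the pore radius plus a fixed annular crack depth, and placeholder values for K_Ic, Y, and the crack depth. None of these come from the paper, whose annular-crack model is more detailed.

```python
import math

# Placeholder material/flaw parameters (assumptions, not values from the paper).
K_IC = 6.0            # fracture toughness, MPa*m^0.5
Y = 1.3               # crack geometry factor
CRACK_DEPTH_UM = 2.0  # depth of the assumed annular crack around the pore, um
A0_UM = 2.0           # scale of the hypothetical exponential pore-radius CDF, um


def critical_pore_radius_um(stress_mpa):
    """Pore radius that fails at stress_mpa, inverting sigma = K_IC / (Y*sqrt(pi*c)).

    The effective flaw size c is assumed to be pore radius + annular crack depth.
    """
    c_m = (K_IC / (Y * stress_mpa)) ** 2 / math.pi
    return c_m * 1e6 - CRACK_DEPTH_UM


def neg_log_pore_cdf(a_um):
    """-ln F(a) for the assumed population CDF F(a) = 1 - exp(-a / A0)."""
    if a_um <= 0:
        return math.inf  # F = 0: any pore (or the crack alone) is critical
    return -math.log1p(-math.exp(-a_um / A0_UM))


def failure_probability(stress_mpa, n_pores):
    """Weakest link: failure if the largest of n_pores exceeds the critical radius.

    P_f = 1 - F(a*)^N = 1 - exp(-N * (-ln F(a*))), computed stably with expm1.
    """
    h = n_pores * neg_log_pore_cdf(critical_pore_radius_um(stress_mpa))
    return -math.expm1(-h)


def weibull_coordinates(stress_mpa, n_pores):
    """(ln sigma, ln ln(1 / (1 - P_f))) for a linearized Weibull plot."""
    h = n_pores * neg_log_pore_cdf(critical_pore_radius_um(stress_mpa))
    return math.log(stress_mpa), math.log(h)
```

Evaluating `weibull_coordinates` at, say, 700, 800 and 900 MPa gives points that, under these assumptions, generally do not fall on a straight line, which illustrates how a curved Weibull plot can arise from the extreme-value pore distribution alone rather than from a lower-bound strength.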
Use of bias correction techniques to improve seasonal forecasts for reservoirs - A case-study in northwestern Mediterranean.

PubMed

Marcos, Raül; Llasat, Ma Carmen; Quintana-Seguí, Pere; Turco, Marco

2018-01-01

In this paper, we have compared different bias correction methodologies to assess whether they can improve the performance of a seasonal prediction model for volume anomalies in the Boadella reservoir (northwestern Mediterranean). The bias correction adjustments have been applied to precipitation and temperature from the European Centre for Medium-Range Weather Forecasts (ECMWF) System 4 (S4). We have used three bias correction strategies: two linear (mean bias correction, BC, and linear regression, LR) and one non-linear (Model Output Statistics analogs, MOS-analog). The results have been compared with climatology and persistence. The volume-anomaly model is a previously fitted multiple linear regression that ingests precipitation, temperature, and inflow anomaly data to simulate monthly volume anomalies. The potential utility for end users has been assessed using economic value curve areas. We have studied the S4 hindcast period 1981-2010 for each month of the year and for lead times of up to seven months, considering a 15-member ensemble. We have shown that the MOS-analog and LR bias corrections can improve on the original S4 forecasts. The application to volume anomalies points towards the possibility of introducing bias correction methods as a tool for improving seasonal water-resource forecasts for end users of climate services. In particular, the MOS-analog approach generally gives better results than the other approaches in late autumn and early winter. Copyright © 2017 Elsevier B.V. All rights reserved.
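The two linear corrections named above, and a simple analog method, can be sketched for a single variable as follows. These are generic forms assumed for illustration: BC subtracts the mean hindcast error, LR applies an ordinary least-squares fit of observations on forecasts, and `analog_correction` is a simplified nearest-neighbour stand-in for MOS-analog, not the exact procedure used in the study. The function names and the default `k=3` are hypothetical.

```python
from statistics import mean


def mean_bias_correction(train_fc, train_obs, new_fc):
    """BC: shift forecasts by the mean hindcast error over the training period."""
    bias = mean(train_fc) - mean(train_obs)
    return [f - bias for f in new_fc]


def linear_regression_correction(train_fc, train_obs, new_fc):
    """LR: ordinary least-squares fit obs = a + b * fc, applied to new forecasts."""
    mf, mo = mean(train_fc), mean(train_obs)
    sxx = sum((f - mf) ** 2 for f in train_fc)
    sxy = sum((f - mf) * (o - mo) for f, o in zip(train_fc, train_obs))
    b = sxy / sxx
    a = mo - b * mf
    return [a + b * f for f in new_fc]


def analog_correction(train_fc, train_obs, new_fc, k=3):
    """Simplified analog method (hypothetical stand-in for MOS-analog):
    replace each forecast by the mean observation paired with its k
    closest hindcast forecasts."""
    out = []
    for f in new_fc:
        nearest = sorted(range(len(train_fc)),
                         key=lambda i: abs(train_fc[i] - f))[:k]
        out.append(mean(train_obs[i] for i in nearest))
    return out
```

In a setup like the one described, each correction would presumably be trained separately per calendar month and lead time on the hindcast period and applied to every ensemble member; that outer loop is omitted here. Unlike BC and LR, the analog method can return only values seen in the training observations, which is one reason it behaves non-linearly.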
